How to integrate AI and LLMs into your products

AI has been all the rage lately. With the release of ChatGPT from OpenAI and Claude from Anthropic, more and more products are integrating machine learning into their offerings.

A big challenge for us has been to find ways to use these new large language models while preserving the privacy of our clients and adapting existing solutions to our needs.

Let’s see how to get the most out of the recently released machine learning models and make your products cutting-edge. Fine-tuning, vector embeddings, vector databases and few-shot learning will hold no secrets for you.

But first, what is going on with AI?

Skip this if you are already familiar with language processing and neural network basics

First, the term “AI” (Artificial Intelligence) is mostly a marketing buzzword. Researchers work in the field of Machine Learning (or ML), which is, loosely speaking, the science of taking a lot of data and training systems that learn from this data to perform useful tasks.

Over the years, a number of algorithms were invented to perform this task, but the real game changers came in 2013 and 2017.

In 2013, a research paper titled “Efficient Estimation of Word Representations in Vector Space” (or word2vec) was published by Google researchers. It describes an algorithm to convert a word into a vector, i.e. a list of numbers. What’s special about this vector (which represents the word) is that its numbers are related to the meaning of the word (and not, for example, the letters used). This means that two words with similar meanings will have similar lists of numbers (their vectors are said to be close). As computers are good with numbers, word2vec allows an efficient representation of words. It is probably the most important algorithm for text processing.
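The idea of “close” vectors is usually measured with cosine similarity. Here is a minimal sketch with made-up 3-dimensional vectors (real word2vec embeddings typically have 100–300 dimensions):

```python
import numpy as np

# Toy vectors standing in for real word2vec embeddings; the values are made up
# purely to illustrate that related words get similar vectors.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.85, 0.75, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way (similar meaning).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words with related meanings end up with close vectors...
print(cosine_similarity(vectors["king"], vectors["queen"]))
# ...while unrelated words do not.
print(cosine_similarity(vectors["king"], vectors["apple"]))
```

With these toy values, “king” and “queen” score much higher together than “king” and “apple” do.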

However, to analyse a text, you need to consider how words fit together. To do this, in 2017, researchers (also at Google) invented a new way to assemble neural networks, called a transformer, in a paper named “Attention Is All You Need”. It uses a piece of code inside the neural network called “self-attention” that allows the system to process groups of words together to infer the overall meaning of a text (a process called inference), instead of focusing on one word at a time as was previously done with LSTMs.

What changed in 2022 is that, as computers grew more powerful and GPUs improved, people and companies such as OpenAI were able to train transformers to predict which words come next in a text, using just about every piece of text ever written as training data.

This makes for a very powerful system, a sort of super text autocomplete that produces human-looking text and allows one to build chatbots, summarizers or translators.

One big issue, however, is that training these models takes a lot of computing power: think about $32.77 per hour of training if you use a single AWS GPU instance, and several million dollars to train a full model.

So, training your own models is not practical if AI is not the main focus of your company. However, there are several other things you can do to make the best use of existing models.

How can you adapt models to your needs?

So, if training is out of the question, how can you use large language models to provide new features to your customers? How do you turn ChatGPT into a personal chatbot assistant for your website? Or build a system that can answer questions requiring specific business knowledge?

Well, there are several ways, listed here from easiest to hardest to achieve.

Using clever prompts

The simplest way to have a tailored AI is to take a very powerful existing model like ChatGPT and craft a prompt that describes how you want the model to behave. These powerful models are called foundation models, as they serve as the foundation of your app.

For example, you can write a prompt like this:

You are a help desk assistant. If the customer asks about X, answer Y and provide a link to Z.
If the customer asks about X', answer Y', etc.

This is what’s called few-shot learning: you give the model a few examples, or “shots”, of how to respond, hence the name.
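In practice, most chat-style APIs take such a prompt as a list of messages. A minimal sketch, where the help-desk questions, answers and documentation links are all made-up placeholders:

```python
# Hypothetical system prompt and few-shot examples; the product details and
# URLs below are invented for illustration.
SYSTEM_PROMPT = "You are a help desk assistant for Acme. Answer briefly and link to the docs."

FEW_SHOT_EXAMPLES = [
    ("How do I reset my password?",
     "Go to Settings > Security and click 'Reset password'. See https://docs.example.com/reset"),
    ("Can I export my data?",
     "Yes, use the Export button on the dashboard. See https://docs.example.com/export"),
]

def build_messages(user_question):
    """Assemble a chat-style message list: system prompt, few-shot pairs, then the real question."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for question, answer in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_question})
    return messages

print(build_messages("How do I delete my account?"))
```

The resulting list can then be sent to whichever chat completion API you use; the model imitates the example answers when responding to the real question.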

Advantages of this method

  • Simple to set up
  • You can get very good results quickly


Disadvantages of this method

  • You are reliant on other services like OpenAI
  • All your business data goes through OpenAI, and they can access it
  • Users can perform “prompt injection” to access your prompt and potentially steal business secrets
  • The API costs can quickly rack up if your product is heavily used

Using smaller models for information retrieval

Another way to use the new advances in text processing is to use vector embeddings. The idea is to turn a piece of text into a vector. This uses a generalized version of word2vec that is able to represent the meaning of a piece of text as a list of numbers.

Once you have stored all your documents as vectors, you can build a search engine for your documents or snippets of your documents. This is what Quivr does. When a user types some text, you convert it into a vector using an embedding model. These models are open-source, so you can run them on your own machine with no network connection and little CPU power.

Using this vector, you can find the documents most similar to what the user searched, to provide them with answers to their request.
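A minimal sketch of such a search, with tiny made-up document vectors (a real system would compute them with an embedding model and store them in a vector database):

```python
import numpy as np

# Toy document "embeddings"; in practice these would come from an open-source
# sentence-embedding model and have hundreds of dimensions.
documents = ["refund policy", "shipping times", "account deletion"]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.1, 0.9],
])

def search(query_vector, top_k=2):
    # Cosine similarity between the query and every stored document vector.
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    scores = doc_vectors @ query_vector / norms
    best = np.argsort(scores)[::-1][:top_k]  # indices of the most similar docs
    return [documents[i] for i in best]

# A query vector close to the "refund policy" embedding:
print(search(np.array([0.8, 0.2, 0.0])))
```

This brute-force scan is fine for small corpora; a vector database does the same ranking with indexes that scale to millions of documents.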

Advantage of this method

  • Can run locally on your infrastructure with no privacy concerns
  • No risk of prompt injection
  • The response time is very fast (less than a few milliseconds), making for a snappy user experience


Disadvantages of this method

  • You are not generating text, just serving existing text to the user or doing clever pattern matching
  • The product cannot produce something truly novel

Usually, people combine this information retrieval technique with a clever prompt: you first retrieve the relevant documents, then use them to craft a prompt that you feed to a foundation model to get a relevant answer.

Fine-tuning models

Sometimes, you want the benefits of foundation models but you also need privacy. This is where fine-tuning comes in. There are many large open-source models available online on websites like HuggingFace. These models are not as powerful as those provided by AI companies, but they still have a good understanding of English along with translation and summarization capabilities.

Some models are specially crafted to generate code like codegen2 or are general purpose like xgen.

Well, you can take these models and feed them your own documents so that the model learns to speak like you and ingests information relevant to your business and your use case.
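Most fine-tuning pipelines expect the documents reshaped into prompt/completion pairs, often stored as JSON lines. A minimal sketch of that preparation step, with made-up internal Q&A entries:

```python
import json

# Made-up internal Q&A pairs standing in for your real business documents.
internal_docs = [
    {"question": "What is our on-call rotation?",
     "answer": "One week per engineer, handover on Mondays."},
    {"question": "Where is the deploy runbook?",
     "answer": "In the internal wiki under Ops/Deploys."},
]

def to_jsonl(examples):
    # One JSON object per line, in the prompt/completion shape many
    # fine-tuning tools accept.
    lines = []
    for ex in examples:
        lines.append(json.dumps({"prompt": ex["question"], "completion": ex["answer"]}))
    return "\n".join(lines)

print(to_jsonl(internal_docs))
```

The exact field names vary by framework, so check the documentation of the fine-tuning tool you pick; the point is that fine-tuning consumes many such pairs, which is why the data requirements below matter.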

Advantage of this method

  • Can run locally on your infrastructure
  • The purpose of the model is hardcoded and cannot be easily changed with clever prompting by malicious users


Disadvantages of this method

  • You’ll still need powerful hardware for fine-tuning: an M1 Mac with 16 GB of RAM is the bare minimum and a run takes about a day of compute time, so iterating is not quick
  • You’ll need a lot of internal data for the fine-tuning (at least 10,000 data entries)
  • If you feed it confidential documents during training, the model might reproduce them when used

Adding more layers

Sometimes, you are not interested in retrieving or generating text, but simply in producing an analysis of a text. This includes:

  • Sentiment analysis
  • Identifying the author
  • Extracting one specific piece of information, like a date, a revenue figure, or an event outcome (match, lawsuit, etc.)

In that case, what you can do is take an existing open-source model made for text generation and chop its head off. Models are made of layers: each layer does a bit of processing on the text represented as vectors, and the last layer converts the vectors back into human-readable text.

You can remove that last part and replace it with a layer that converts the vectors into a number or a choice among a list of options (to perform a classification task, for example).

You can then train this last layer separately from the rest of the model on your task, using common machine learning methods favored by data scientists, like random forests or a simple linear regression.
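A minimal sketch of such a replacement head: the base model is frozen and only used to produce embeddings (faked here as tiny made-up vectors), and a simple classifier is trained on top. A nearest-centroid classifier stands in for the random forest or linear regression:

```python
import numpy as np

# Toy "embeddings" of labeled texts; a real pipeline would get these from the
# frozen base model's last hidden layer.
X_train = np.array([
    [0.9, 0.1, 0.0],   # embedding of a positive-sentiment text (made up)
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],   # embedding of a negative-sentiment text (made up)
    [0.2, 0.8, 0.1],
])
y_train = np.array([1, 1, 0, 0])  # 1 = positive, 0 = negative

# "Training" the head: compute one centroid per class.
centroids = {label: X_train[y_train == label].mean(axis=0) for label in (0, 1)}

def predict(embedding):
    # Assign the class whose centroid is closest to the embedding.
    return min(centroids, key=lambda label: np.linalg.norm(embedding - centroids[label]))

print(predict(np.array([0.85, 0.15, 0.05])))  # lands near the positive centroid
```

Only the centroids are learned here; the expensive base model never changes, which is why this approach needs so little compute and data.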

Advantage of this method

  • Does not require a lot of computing power
  • Does not require a lot of internal data
  • Runs locally with no privacy concerns

Disadvantages of this method

  • You need somebody with data science expertise to help you set up the system
  • You only extract one piece of information from your text

AI @ M33

Here at M33, developer productivity is a key concern. We need to build mobile apps that are better than our competitors’ while producing quality code and moving fast.

We have a large internal codebase containing common patterns used in our apps. We use an embedding model to quickly search this codebase and serve results straight to our developers through a VSCode extension, so that they have common coding patterns and best practices at their fingertips.

At the same time, we are working on models to detect the author of a piece of code, to better identify the origin of a bug when a lot of refactoring has taken place, as well as other projects! Interested? Contact us!

So, how do you intend to use AI at your company next?
