Tame your LLM

Why you should read this publication

LLMs are gaining ground everywhere, with incredible promises of new high-performance and, brace yourself, “intelligent” tools. Research is progressing more slowly than these promises, and regularly gives us a clearer and more precise view of things. Here, we outline the fundamental limitations of these models, explore the potential security risks, and finally look at approaches that can deliver acceptable results. Whatever the tool you want to use, knowing its limits will always help you work better.

What business processes are likely to change as a result of this research?

The use of LLM to generate answers from extracted information will have to continue to evolve to be more robust to hallucinations or missing information. In the same way, total security of these models is now a matter of course.

the use cases we have developed for customers that touch on the subject of this research review

We regularly deploy these tools and implement them in concrete solutions: searching in a document database, qualifying the toxicity of a message, assisting in a professional activity.

If you only have a minute to devote to reading now, here's the essential content in a nutshell

  1. Fun fact: LLMs fundamentally fail to differentiate clearly between the instruction and the data to be processed when asked to generate a response. Prompt injection has a bright future ahead of it.
  2. This is a new question! So much so that we finally have a first dataset to measure the impact of these problems on a new model.
  3. Fun fact bis: researchers have discovered a new method for easily extracting the private data present in training or fine tuning from a model of this type. If you’ve trained a model on sensitive data, protect it and its use!
  4. In the latter case, the attack can be made on the training data or on the fine tuning, even with very limited knowledge of the shape of the sentence containing the sensitive data.
  5. Composing information means knowing how to link different pieces of information together, for example: “John was born in London, London is in the UK, in which country was John born?”. This kind of thinking is a classic challenge for LLM approaches.
  6. Authors have shown that this type of information composition cannot be solved correctly by Transformer architectures, with a strong limit depending on the number of elements you want to be able to compose.
  7. Finally, because we must always dream a little: Deepmind proposes a new, more optimized architecture, choosing at each layer of the model which elements of the input sequence will be worked on.

New limits for LLMs, new architecture methods, fun will never end... Does an LLM really know how to distinguish an instruction from the data to be processed?

If we look at the landscape of Large Language Models via the numerous announcements of new tools or start-ups, we could easily believe that these tools know how to manage the text to be processed in a rather fine way, at least sufficiently to make such a distinction. And yet, let’s land without grace to get back in touch with reality, such a claim would be particularly dangerous.

This is what Zverev et al demonstrate in ” Can LLMs separate instructions from data? and what do we even mean by that? [1]. The question is deliberately shocking, but it comes at just the right moment to wake us up. As these models are very recent, and our field suffers from a considerable theoretical deficit, this question has not even been properly asked by researchers. So it’s hardly surprising to see this publication at the ICLR 2024 workshop dedicated to the trust and security we can bring to artificial intelligence…

The problem taken up by researchers is that of prompt bypassing. This subject is well known today: a service is developed via an LLM which will take as input text from a third-party user. The call to the LLM will use a prompt from more or less glorious iterations to frame the result. But our user may have fun inserting new instructions into the submitted text, diverting the LLM from its original purpose (the naughty one) and creating an unacceptable security hole.

This is the case studied here by the authors. Indeed, in this usage, the initial prompt is considered as the LLM instruction, and the third-party user’s content is an input data that should not be executed. The researchers are therefore fundamentally questioning the ability of such a model to separate instruction from input. And the answers are, not surprisingly, quite unpleasant.

To address this issue, the authors have already defined a metric to identify how robust a model is when a textual element is transferred from the instruction area to the execution area. Below, g is the model, s is the initial instruction, d is the initial execution data, and x is a disturbance :

This score takes up our good old Kullback-Leibler divergence. The higher the score, the less the model will divide. Note that this metric is rather “weak”, in that it only studies the variation of the response when the distractor is moved, but is not interested in the quality of the response.

Another contribution of the authors was the creation of the first dataset to study this type of disturbance, and therefore the ability of a model to separate instructions from data to be processed. The SEP(should it be executed or processed) dataset thus comprises 9160 elements, each made up of an original instruction, execution data, a disruptive instruction (which will be injected either from one side or the other), and the response to this disruptive instruction. An example of an element is shown below:

And so, rolling wet drums, what are the results?

The lower the separation score, the less the model distinguishes between the two concepts… And here’s a real surprise: the “bigger” a model is, the less able it is to make an acceptable distinction between instruction and data, with GPT-4 in particular being the worst model. In fact, these larger models are better overall, but much more complex to master, and therefore much more sensitive to promptinjections…

This is a very important piece of work, as it shows the gap between a business world that is soaring high into the skies, while researchers continue to ask very fundamental questions, questions to which the answers are not very reassuring. Let’s hope that this type of work will give rise to other approaches to fundamental analysis, so that we can finally better understand these tools and their limits. Otherwise, we’ll remain condemned to wandering in this limbo where we fantasize about the performance of these models, only to be totally disappointed…

[1] https://arxiv.org/pdf/2403.06833v1

 

LLMs trained on sensitive data are ticking time bombs

As we already know from so-called “model inversion” approaches, it is possible (or even trivial) to extract part of the training data from a trained neural network. As soon as a model is trained on a company’s internal data, or even on protected data (personal data, health data), the model itself must be considered as data to be protected.

This state of affairs was depressing enough as it was, Deepmind is offering us a new piece of work as part of the recent ICLR 2024, in which they describe a particularly distressing new form of attack, aimed at the Large Language Models we love so much. “Teach LLMs to Phish: Stealing Private Information from Language Models” by Panda et al, presents this new apocalypse we didn’t necessarily need…

The authors describe a new collection of attacks, such as :

  • The attacker has a vague idea of the upstream text to be used to force the model to spit out sensitive data. Simply asking for the beginning of a biography activates the attack.
  • The attacker can poison the training dataset by adding a moderate number of elements, and then push the model during fine-tuning to keep the sensitive data it sees passing through.
  • As soon as a sensitive piece of data is at least duplicated, the success of the attack increases by 20%. The larger the model, the more fragile it is…
  • Conventional defenses (deduplication) are ineffective.

Want to find out more? Below is a diagram showing one of the attacks described by the authors:

Above:

  • A few “poison” elements are injected into the dataset. These elements are not necessarily injected throughout training, but are sufficient to teach the model to retain specific information (in this case, credit card number).
  • During fine tuning, the model naturally sees this type of information at least once.
  • Finally, when in use, the model sends a prompt weakly reproducing the appearance of poisons, and can retrieve this sensitive data.

And unsurprisingly (cynicism is a combat sport in Deep Learning), it works extremely well. In the first case, the authors analyze the situation where the attacker knows nothing about the “secret”, i.e., the information he wants to capture. Several statements below, where the “secret extraction rate” on the ordinate estimates the proportion of sensitive information that can be extracted:

#1: Duplicate secrets are very easy to obtain

Two curves can be seen above: the blue curve, where the secrets are present only once in the dataset, and the orange curve, where they are duplicated (twice each). While we observe a degradation of the attack each time the size of the secret increases, the simple fact of having duplicated them is enough to explode the scores.

#2: The bigger the model, the more it memorizes these secrets

We observe that on an attack with no prior knowledge, increasing the size of the model causes the extraction rate to explode. We’re back to the sad, classic moral of LLMs: the bigger it is, the more dangerous it is…

#3 Longer training periods or more data make things worse

Left: with sufficient poisons in the training set, the extraction rate worsens when the model has been trained more.

Right: fine-tuning clean data also increases the rate of secret extraction.

But, let’s not forget, this is a case where the attacker doesn’t know exactly what he’s after! However, in the vast majority of cases, the attacker will have imperfect but real knowledge of his victim, and will therefore be able to use this a priori knowledge to better target his attacks. In such cases, attackers can extract up to 80% of secrets. Other declensions concern federated learning, which also seems to be concerned.

What does this mean? That if you want to train a model on a company’s data, or on private data, it’s unimaginable not to totally protect this model, but that perfect protection doesn’t exist… Consequently, it seems important not to leave an LLM in direct access (even if restricted), but to instrument the calls as much as possible to limit the model’s ability to generate information from its training. LLMs are excellent language modeling tools, but using them as information extractors is a delicate subject today, if only in view of the limits we observe in the field of RAG. And it’s no coincidence that framing a model to limit problems (and hallucinations in particular) is the subject of our next webinar in May. 😊

Information composition: a fundamental limitation of LLMs

Following on from the first work described in this review, we have here a new publication from Deepmind (which remains, despite epistemological precautions, an unavoidable prescriber) tackling the task of mapping the ability of these models to address specific simple problems. Forget the oracles announcing versatile human assistant models, the problem studied here is very simple and fundamental: the ability to link elements together. For example (the prompt is potentially much longer, but contains this information):

“London is in the United Kingdom… Alan Turing was born in London… In what country was Alan Turing born?”

On Limitations of the Transformer Architecture[1]by Peng et al, focuses on this so-called composition problem. It is part of a line of work aimed at identifying strong limitations of the Transformer architecture which, since Vaswani et al, has been the go-to approach in language processing. Several previous results are already worthy of note:

  • It was proved in 2020 that a Transformer could not always recognize the parity of information (e.g., handle double negations) or evolve in recursively open parentheses…
  • A more complex case, the so-called 3-matching approach consists of three numbers in a row in a sequence such that the sum of these three numbers is equal to 0 modulo another value. Note that the 2-matching approach can be handled by a Transformer but not by any other known architecture such as MLPs, underlining the superiority of Transformers on certain points.

Each time, this type of result allows us to project ourselves a little more accurately into what these architectures can or cannot achieve. Beware, however! These works are often limited to fairly simple forms of the architecture in question, typically just one or two layers. The stacking of processing layers in a model is a strong factor in complexity evolution (admittedly poorly controlled), and these results lose their mathematically demonstrated “absolute” aspect when faced with models such as Llama2, Llama3, GPT4 or Mistral. Having given this warning, the authors note that the problem seems to persist with GPT4, for example.

No illustration here, and that’s good news! The authors demonstrate (yes, a real mathematical demonstration) that a single-layer Transformer will be unable to manage this information composition problem as soon as the number of elements to be composed exceeds a certain threshold, depending on the number of attention heads and the dimensionality of the latent space. In other words, these (simplistic) architectures have an absolute limit to the number of elements they can compose. Note also that the authors extend the analysis to the Chain of Thought (one of the few prompt methods to have survived beyond three months) to demonstrate the existence of a similar limit.) The examples below illustrate this problem, this time for the “state of the art” model:

Let’s compare this kind of fundamental result to RAG approaches, where at the end of processing we create a long prompt containing the most information, then ask our beloved LLM to synthesize it all, and let’s observe that we’re taking a very dangerous gamble here… It’s not for nothing that in a significant number of cases where a customer comes to us for this type of project, we question the importance of LLM to model a “pretty” text at the end, compared to the proposal of correctly structured unitary information!

[1] https://deepmind.google/research/publications/77946/

But how do you use an LLM? An answer from Stanford

It would be (too) easy to fill our research journals with depressing (albeit necessary) examples. If these models have gigantic flaws, they nonetheless retain a remarkable power in language manipulation, and the right question is rather how to use these models. As a reminder, we recently saw, in another magazine, the AlphaGeometry example where the LLM was confronted with a symbolic validation engine, with impressive results.

The title here describes the ambition: “Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models”, [1]by Shao et al. The article proposes a global modus operandi for writing Wikipedia-like long articles with these explosive tools. And the answer lies fundamentally in breaking down the problem into many different, better-structured calls to achieve much more acceptable results.

The authors here break down the problem: if I want to write a complete article, I’ll need to do some preparatory research to find different visions of the same subject. I’d also need to identify relevant sources (spoiler: don’t wriggle too quickly), and finally generate an overall plan first and then write each section following that plan. This approach has been implemented, as follows:

We start at the top left with a topic, a subject on which we want to write our article. We’ll have the following approaches (follow the numbers in the diagram 😊 ) :

  • (1): The “Survey”. The tool will search for a number of existing articles that may be related to our subject of analysis. A similarity distance will be used to identify this global working context.
  • (2): The generation of “perspectives”. Perspective” here refers to an angle of analysis of the problem, defining the behavior of an agent. For example, to talk about a district of Paris, we could have a perspective where the model must act as a real estate agent, another where it will have in its prompt a role of inhabitant, then historian, and so on. This search for perspectives is fundamental to forcing subsequent LLMs to innovate in their generations of questions.
  • (3) : For each perspective, an LLM is used. At each iteration, this LLM will create a new question on the subject to be analyzed, taking into account the perspective and the history of questions/answers already generated.
  • (4) to (6): A second, “expert” LLM receives this question. It will start by decomposing the question into unit queries to be made to a validated information base, executing these queries and filtering the result, to then generate a synthesis
  • (7) : In parallel, an LLM is asked to generate theoutline of the article to be respected.
  • (8) : Finally, all the generated conversations are used so that one last (finally!) LLM can generate the content.

This approach produces much better results. We’ve found that breaking this down into a number of specific calls logically limits hallucinations, while at the same time giving us a lever of control over the tool: we can analyze and filter the perspectives used, such as the conversations selected to generate the final text.

Nevertheless, a few unpleasant remarks are obligatory here:

  • Already, we’re multiplying the number of calls to LLMs, driving up the cost of using the system.
  • Then, even if it has been observed that exchanging two different LLMs generates better results, we’re still talking about dangerous tools that can fail. The same applies to calling on the expert to synthesize the results of his research. Even if we force a context based solely on controlled and true information, our LLM will obviously be able to hallucinate.

Moreover, if we want to project ourselves towards a professional tool, we observe that the validated information base is indispensable here, and that we need to be able to carry out unit searches on it successfully, which is not impossible, but not necessarily guaranteed…

[1] https://arxiv.org/pdf/2402.14207v2

Transformer's architectural upgrades go down the drain

The author can’t fully contain his emotion when thinking of all these attempts, thrown into the sea ofArxiv since 2017, to improve the architecture of good old Transformer. The majority of these attempts dance around the attention mechanism and its quadratic complexity, but some, more interesting than others, question the ability to approach each problem differently via the same architecture. The challenge is to create a model capable of “deciding” whether it has sufficiently processed a sequence or not, in order to get away from the monolithic nature of these architectures. One candidate (which we studied in 2018) in its time was Universal Transformers by Dehghani et al.

Today, Deepmind is proposing a new architecture that seems very relevant, even if the reader should keep the necessary distance and not see it as a definitive solution. “Mixture-of-Depths: Dynamically allocating compute in transformer-based language models”, [1]by Raposo et al, returns to the subject of dynamic adjustment to the complexity of a sequence with an approach that offers clear advantages.

Basically, the idea here is to get around the main problem of all previous approaches, which “decided” within the model whether certain elements of a sequence needed more work than others. These approaches were indeed “too dynamic”, and even though they reduced the number of calculations required, the fact that this number of calculations could vary caused major optimization problems, preventing us from really obtaining a more efficient approach in the end. The idea here is to decide, upstream, the number of tokens on which each layer of the model will work, and then force the model to choose which tokens to keep. As the number of tokens does not change, this approach is easier to implement and optimize.

The diagram below shows the central mechanism proposed by the authors:

Left: the central mechanism of a Transformer layer, illustrated in the case where a token will not be the source of calculations but will be directly transferred to the next layer (case on the left), and in the case where, following the “Route” calculation, the token will instead be transformed to generate new, higher-level data (right).

Right from top to bottom: the first graph shows, through the elements of a sequence (x-axis), for the different layers of the model (y-axis), whether each element has been processed or, on the contrary, has been routed around the block. We can see that, depending on the layer, the model learns to handle each token differently. Below, on one side we see a similar diagram for a “Vanilla Transformer” (i.e. a classic architecture), and on the right the Early-Exit version. This version corresponds to the approaches taken by the authors prior to this work, when it was possible to decide to stop working on an element of the sequence, but only in a definitive way.

The results are interesting, and presented in two forms of application by the authors:

  • For similar computational complexity, the authors improve the initial architecture by up to 2% in accuracy.
  • More interestingly, the authors train a Dynamic Transformer with the same quality as a conventional model, but with half the number of calculations.

Let’s not forget that in Deep Learning, speeding up training is often more interesting than increasing results in a crude way. And while we obviously can’t guarantee that this particular architecture will become the norm, it’s clear that this type of improvement is becoming more widespread. We could even compare this work with Mamba (discussed in the previous review) as an interesting candidate to watch out for in the next generation of models. The monolithic approach of Transformers is indeed a flaw (over which Mamba, with its selection system, takes a clear advantage). Here, one interest of this work is that it builds on other competing Transformer optimization work and will potentially be simpler to generalize. As usual, let’s remain cautious and continue to spy on research. 😊

[1] https://arxiv.org/pdf/2404.02258v1