In the world of AI product development, navigating complex challenges is part of the game. Our vision at Pieces is to make the developer workflow effortless by facilitating the retrieval, understanding, and reuse of code snippets. To enable that, we need machine learning models that can extract essential metadata from a given code snippet, such as titles, descriptions, tags, and related links. We were interested in developing a model that could operate locally on a user's device, be adept at multitasking, and deliver results efficiently. LoRA (Low-Rank Adaptation of large language models) provided the functionality we were looking for.
Initially, we had a clear vision of what our machine learning system should accomplish, but there were challenges to overcome: both model training and data collection posed difficulties. The conventional approach of assembling a dataset, selecting a transformer trained on a similar task, and fine-tuning it wasn't going to work for us.
First, fine-tuning a generative language model on a specific task requires a significant amount of labeled training data. And while some of our data needs could be fulfilled with scraped open-source datasets from StackOverflow and GitHub, some tasks, such as generating code snippet titles, simply had no direct data source.
Next, fine-tuning and hyperparameter-tuning a separate transformer for each task would have required us to store each model on the local machine, consuming too much disk space to be feasible. Finally, fine-tuning several transformers would have taken a considerable amount of time, slowing the pace of iteration.
Two pivotal techniques in the modern ML toolkit proved instrumental in overcoming these obstacles: parameter-efficient fine-tuning with LoRA, and AI-generated target data.
LoRA AI: Our Secret Sauce for Simplifying Training Complexity
Our journey to tackle the modeling challenge started with parameter-efficient fine-tuning methods, which adapt a model to a specific task by optimizing only a subset of its parameters. These methods have been shown to require less training data and less training compute, and can even improve generalization error. Here, LoRA stood out as a potential solution.
LoRA operates on the observation that while the weight matrices of large language models (LLMs) are usually full-rank, the weight updates made while adapting a model to a specific task usually have low intrinsic rank. Instead of updating the full weights, we learn trainable low-rank decomposition matrices that are added to the attention layers of the transformer.
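The idea can be sketched in a few lines of numpy. This is a minimal illustration with hypothetical dimensions, not Pieces' actual model: the frozen base weight W stays fixed, and only the small factors A and B are trained.

```python
import numpy as np

d_out, d_in, r = 512, 512, 8                # rank r << min(d_out, d_in)
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
alpha = 16.0                                # scaling hyperparameter

def forward(x):
    # Adapted layer: base projection plus the scaled low-rank update B(Ax)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted layer starts identical to the base:
assert np.allclose(forward(x), W @ x)

# Trainable parameters shrink from d_out*d_in to r*(d_in + d_out):
full, lora = d_out * d_in, r * (d_in + d_out)
print(full // lora)  # 32x fewer trainable parameters at this (small) size
```

At the dimensions of real transformer attention layers the savings are far larger, which is where reductions on the order of 100x come from.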
This substantially reduced the number of parameters we needed to train (by a factor of roughly 100 in our case) compared to full fine-tuning, which retrains all parameters. Using LoRA dramatically reduced our training time, accelerating iteration and achieving impressive generalization. In fact, for the smaller language models we were interested in, LoRA training performed better on significantly less data than full fine-tuning.
Another key advantage of the LoRA training method was that it allowed us to learn separate adaptations of the model for each task, each of which is only a fraction of the size of the foundation model. This meant that in deployment we were able to use one small foundation model for multiple tasks. Adding new tasks now scales very well in terms of both compute and disk space.
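The multi-task deployment pattern can be sketched as one shared frozen weight plus a small (B, A) adapter pair stored per task and swapped in at inference time. Task names and dimensions here are illustrative, not Pieces' production setup.

```python
import numpy as np

d, r = 256, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))              # shared frozen foundation weight

# One tiny adapter pair per metadata task: task name -> (B, A)
adapters = {
    task: (rng.standard_normal((d, r)), rng.standard_normal((r, d)))
    for task in ("title", "description", "tags")
}

def run_task(task, x, alpha=8.0):
    B, A = adapters[task]                    # swap in the task's adapter
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
outputs = {task: run_task(task, x) for task in adapters}

# Disk cost: the shared weight dominates; each new task adds only 2*r*d values.
print((2 * r * d) / W.size)  # 0.03125 -> each adapter is ~3% of the layer
```

Adding a fourth task means shipping another ~3%-of-a-layer adapter, not another full model, which is why this scales well on disk.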
LLM-Generated Labels: Our Answer to the Data Bottleneck
From the start, we knew that our training data quality and quantity would significantly impact our model's performance. However, acquiring high-quality, labeled data was a massive hurdle, and we didn't want to invest in extensive data collection or annotation efforts before finalizing our modeling approach. The solution came in the form of large language models (LLMs).
LLMs allowed us to generate high-quality labels that closely mimicked human-written text. This capability enabled us to generate experimental datasets on the fly, allowing us to commence the modeling process alongside designing our data requirements.
Leveraging our existing code snippet datasets and some thoughtful prompt design, we could quickly create effective training sets for each metadata generation task. This innovative approach dramatically reduced the resources we would have otherwise spent on data collection and preparation, effectively eliminating our data bottleneck.
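A sketch of this labeling loop: prompt an LLM for a target label for each existing snippet, then pair snippet and label as a training example. `call_llm` below is a placeholder standing in for a real completion API, and the prompt wording is hypothetical, not Pieces' actual prompt.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call; returns a canned title here.
    return "Binary search over a sorted list"

def make_title_example(snippet: str) -> dict:
    # Build a labeling prompt from an existing, unlabeled snippet.
    prompt = (
        "Write a short, descriptive title for this code snippet.\n\n"
        f"```\n{snippet}\n```\n\nTitle:"
    )
    # The LLM's answer becomes the training target for the title task.
    return {"input": snippet, "target": call_llm(prompt)}

snippet = "def bsearch(a, x):\n    ..."
example = make_title_example(snippet)
print(example["target"])
```

Running this over an existing snippet corpus yields a supervised dataset for a task, like titling, that has no natural source of labels.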
Bringing It All Together: The Synergy of LoRA AI and Auto-Generated Labels
The combination of the LoRA training method and AI-generated labels provided us with the necessary tools to realize our product vision. AI-generated labels eliminated the need for large-scale data collection and cleaning, while LoRA enabled quick and efficient adaptation of a pre-trained transformer to multiple tasks. Furthermore, for deployment, we could ship a single foundation model and swap in the task-specific decomposition matrices as needed.
Ultimately, we successfully developed a model that operates locally on a user's machine, swiftly performs multiple tasks, and doesn't burden the app's disk space requirements. This was the groundwork that allowed us to quickly develop models that title, describe, and tag users' code snippets on-device. In my next article, we'll train a code description model using LoRA and GPT-generated descriptions.
Want to see our LoRA machine learning in action? Get started with Pieces today.