
On-Device LLMs are LIVE! 🔥 Pieces for Developers 2.7.0, Pieces OS 6.2.0 & Pieces for VS Code 1.4.0

We're thrilled to announce that the much-anticipated On-Device LLMs are now live in the Pieces for Developers Suite! With the ability to seamlessly switch between different LLMs, whether they're on your device or in the cloud, you now have more power and flexibility at your fingertips. On-Device LLM support is also available in our VS Code and Obsidian plugins, and coming soon to JetBrains and JupyterLab.

We support some of the most well-known LLMs, including OpenAI’s GPT and Meta’s Llama 2, so that you can leverage the strengths of these models directly in your development workflow. This means you can choose the LLM that best fits your coding style and project requirements, which enhances the Pieces Copilot's ability to provide relevant, context-aware code suggestions from your code snippets.

Let's explore this exciting update in detail!

Biggest Benefits of On-Device LLMs:

  1. Offline Support: On-Device LLMs allow you to use the Pieces Copilot even when you're offline. You can continue coding without interruption, regardless of your internet connection.
  2. Data Privacy: By processing code suggestions locally, your code never leaves your machine. This provides an additional layer of privacy and security.
  3. Reduced Latency: On-Device LLMs eliminate the need for network calls to get code suggestions, reducing latency and making your coding experience more seamless.
  4. Enhanced Power: With On-Device LLMs, you can leverage the power of your local machine for faster, more efficient code generation. This means quicker responses from the Copilot and a smoother coding experience.

On-Device LLMs in the Pieces Desktop App

When chatting with the Pieces Copilot in our Desktop App, you can select either On-Device or Cloud LLMs. To do so:

  1. Navigate to the Copilot & Global Search view
  2. Select the LLM model logo (OpenAI or Meta) in the bottom left corner of the input window
  3. Select your Cloud or On-Device LLM

Currently, you can choose either the 7B CPU or 7B GPU Llama 2 model to be installed on your device. Please note: these models are large, and installation time will depend on your download speed. You can stop the download at any time by hitting the Stop button.

Once you select a model, you can set any local files, websites, or saved snippets as context and then interact with the Pieces Copilot.

Our Cloud LLM options include GPT-3.5 Turbo, GPT-3.5 Turbo 16k, and GPT-4.

Support for more LLMs is coming soon, including:

  • PaLM 2
  • CodeLlama
  • TinyLlama

When selecting an on-device LLM, consider the following guidelines:

  • Use the GPU model only if your device has a powerful, dedicated GPU. Running the GPU model without a dedicated GPU may cause crashes or poor performance.
  • If your device has limited RAM or an older CPU, we recommend using a cloud LLM instead of an on-device model. The cloud LLMs have lower system requirements.
  • For machines with Apple Silicon or high-end GPUs (like NVIDIA RTX series), the GPU models will provide the best performance.
  • The CPU models require less RAM but lean more heavily on your processor, making them a good choice for most modern laptops and desktops.
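The guidelines above can be turned into a rough, illustrative check. The sketch below (standard-library Python only) is a heuristic under stated assumptions, not an official Pieces requirement check: the 8 GB RAM threshold is a stand-in, and GPU detection simply looks for the `nvidia-smi` CLI or an Apple Silicon machine type.

```python
import os
import platform
import shutil

def suggest_runtime(min_ram_gb=8):
    """Heuristically suggest a Copilot runtime; thresholds are illustrative."""
    # Total physical RAM in GB (POSIX only; falls back to 0 if unavailable).
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gb = 0

    # A dedicated NVIDIA GPU usually puts the nvidia-smi tool on PATH.
    has_dedicated_gpu = shutil.which("nvidia-smi") is not None
    # Apple Silicon Macs report Darwin / arm64.
    is_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"

    if has_dedicated_gpu or is_apple_silicon:
        return "7B GPU model"
    if ram_gb >= min_ram_gb:
        return "7B CPU model"
    return "Cloud LLM"

print(suggest_runtime())
```

Swapping in your own thresholds (or a different GPU probe) is straightforward; the point is simply that GPU models want dedicated graphics hardware, CPU models want RAM, and everything else is better served by a cloud LLM.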

If you are concerned about the local LLM's memory footprint, don't worry: we provide two ways to free that memory.

  1. You can unload the model at any time from the desktop app.
  2. Otherwise, the memory is released automatically after a few minutes of inactivity.

On-Device LLMs in the Pieces for VS Code Extension

Just like in our Desktop App, you can now select the Copilot Runtime in Pieces for VS Code. This is a significant enhancement for developers who prefer to work within their VS Code IDE instead of utilizing the Pieces Desktop App.

To choose your Copilot Runtime:

  1. Click the button labeled ‘Copilot Runtime’ on the bottom left of the view. This will open the Copilot Runtime dialog box.
  2. Within this dialog box, you'll find options to select between 'On Device' (which encompasses all Local LLMs) and 'Cloud'.
  3. To select a runtime, navigate to the dropdown menu for the desired runtime and make your selection. If you wish to use a Local LLM, you'll first need to download it; look for a download icon to initiate this process (we have some recommendations above on which model to select).
  4. Make sure your system has the right hardware to satisfy the requirements.
  5. The model will be downloaded as a 4GB file, so it may take some time depending on your internet connection.
  6. Once the model has been downloaded, a plug icon will appear; click it to select the model.
  7. A confirmation alert will appear once the runtime is successfully selected. You are now set up and ready to start chatting with the Pieces Copilot, contextualized by your code repository in VS Code and your Pieces snippets, 100% offline!

Note: The dialog box also suggests which runtime might be ideal for your device.
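Step 5 above notes that the model arrives as a 4GB file. As a rough back-of-the-envelope estimate (assuming ideal throughput, decimal units, and no protocol overhead), you can gauge the wait for a few common connection speeds:

```python
def download_minutes(size_gb, mbps):
    """Estimate download time in minutes for size_gb gigabytes at mbps
    megabits per second (ideal throughput, no protocol overhead)."""
    size_megabits = size_gb * 8 * 1000  # 1 GB ≈ 8,000 megabits (decimal units)
    return size_megabits / mbps / 60

for speed in (25, 50, 100, 500):
    print(f"{speed:>3} Mbps: ~{download_minutes(4, speed):.0f} min")
```

So a 4GB model is roughly a 5-minute download on a 100 Mbps link, and closer to 20 minutes at 25 Mbps; real-world times will be somewhat longer.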

Additionally, you can set the context for the Pieces Copilot. Whether it's a specific file, code snippet, or entire project, you can select context to receive optimized responses from the Copilot, even offline.

On-Device LLMs in the Pieces for Obsidian Plugin

Data privacy is absolutely critical to the Obsidian community. We understand that you typically do not want your vault contents to co-mingle with large companies’ training data sets. Now that On-Device LLMs are available in the Pieces Copilot, Obsidian users can interact with a Copilot contextualized by their vault without their notes ever leaving their device. This is all possible due to Pieces’ commitment to developing on-device, air-gapped AI features.

To access On-Device LLMs in Obsidian:

  1. Click on the P logo that now appears in your left-hand sidebar.
  2. Now that the Pieces view is open, click on the Pieces Copilot logo to open the Copilot view.
  3. Click on the button labeled ‘Copilot Runtime’ on the bottom left of the view
  4. Click on the Llama2 box
  5. Download the model you would like to use (we have some recommendations above on which model to select), and make sure your system has the right hardware to satisfy the requirements.
  6. The model will be downloaded as a 4GB file, so it may take some time depending on your internet connection.
  7. Once the model has been downloaded, a plug icon will appear; click it to select the model. You are now set up and ready to start chatting with the Pieces Copilot, contextualized by your notes, 100% offline!

Pro tip: If you would like to use some files or folders as context within the chat, open the context selector by clicking on the settings icon in the chat, and choose which files you want to use as context for your conversation!

Join our Discord Server 🎉

Do you love Pieces? Stop sending us carrier pigeons 🐦 and join our Discord Server to chat with our team and other power users, get support, and more. 🤝

Support

As always, if you run into issues or have feedback, please fill out this quick form or email us at support@pieces.app and we’ll be in touch as soon as possible!
