How to try open-source LLMs locally?

Running open-source LLMs locally has benefits like data privacy and cost savings, but it requires some technical know-how. Here's a breakdown to get you started:

Before you begin:

  • Hardware considerations: Since general-purpose LLMs are resource-intensive, your computer should have enough RAM (ideally 16GB or more) and a capable graphics card (GPU) for optimal performance. My experiments were done on an Intel i9 laptop with 16GB RAM and an NVIDIA 4070, running Ubuntu 22.04 LTS. (A quick way to check that your GPU is visible is sketched below.)
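
If you want to verify that your GPU is actually visible to your tooling, here is a quick Python sketch (assuming a CUDA-enabled build of PyTorch is installed; this check is my own suggestion rather than a required step):

# Quick sketch: check whether a CUDA-capable GPU is visible to PyTorch.
# Assumes a CUDA-enabled build of PyTorch (pip install torch).
import torch

if torch.cuda.is_available():
    print("GPU found:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU detected; models will run on CPU (much slower).")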

Approaches to try LLMs locally:

  1. All-in-one Desktop Solutions: These are ideal for beginners. Tools like GPT4All offer user-friendly interfaces that mimic popular LLM services like ChatGPT. You simply download and launch the application, and it handles the technical setup for you.

  2. Command Line Interface (CLI) & Backend Servers: This approach offers more flexibility but requires some technical knowledge. There are many LLM frameworks like GPT4All, LM Studio, Jan, llama.cpp, llamafile, Ollama, and NextChat that do the background work (downloading weights, initializing the model, and running inference) with simple commands. For example, Ollama lets you run LLMs with one-line terminal commands, while LM Studio also exposes a local server that you can call from code. To interact with the models programmatically, you will typically install a few Python libraries and talk to a local server (see the sketch after this list).
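
For illustration, here is a minimal sketch of what "interacting with a local server" can look like in Python. It assumes Ollama is installed and serving on its default port 11434 and that a model such as llama2 has already been pulled; the function name and the use of the requests library are my own choices, not something these tools prescribe.

# Minimal sketch: query a locally running Ollama server over its REST API.
# Assumes the Ollama service is running on the default port 11434 and the
# "llama2" model has already been pulled (e.g. with: ollama pull llama2).
import requests

def ask_local_llm(prompt: str, model: str = "llama2") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    response.raise_for_status()
    # With stream=False the reply is a single JSON object with a "response" field.
    return response.json()["response"]

print(ask_local_llm("In one sentence, what is a local LLM server?"))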

Additional Tips:

  • Start with smaller models: Begin with less complex models that require fewer resources. As you get comfortable, you can experiment with larger ones.
  • Explore the LLM landscape: Hugging Face (https://huggingface.co/) is a popular platform for exploring open-source LLMs. It offers a wide range of models and resources for getting started; a small example of loading one of its models is shown after this list.
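
As a rough illustration of both tips, here is a sketch that loads a small model from Hugging Face with the transformers library; the choice of distilgpt2 and the generation settings are just examples, not recommendations.

# Sketch: run a small open model from Hugging Face locally.
# Assumes: pip install transformers torch. distilgpt2 is only an example of a
# small model that fits comfortably in memory; swap in any model you like.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("Running LLMs locally is", max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])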

My experiments with running LLMs locally:
I decided to try Ollama as it has a super simple setup and lets you try several open-source LLMs like Llama 3, Mistral, and Gemma.

Steps:

1. Install Ollama on Linux:

curl -fsSL https://ollama.com/install.sh | sh


2. Once successfully installed, run the model of your choice as follows:

$ ollama run llama2

When you run the command for the first time, it takes a while as the model weights are downloaded. Once the download completes, you should get an interactive LLM prompt:
>>>  

You can get help by typing /? at the prompt, or you can start prompting right away. I tried prompting it by asking for more details about the llama2 model itself.
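
Besides the interactive >>> prompt, you can also drive the same local model from Python. Below is a small sketch assuming the official ollama Python package is installed (pip install ollama) and the llama2 model has already been pulled; the exact prompt is just an example.

# Sketch: chat with the locally pulled llama2 model from Python.
# Assumes the Ollama service is running and: pip install ollama
import ollama

reply = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Tell me more about the llama2 model."}],
)
print(reply["message"]["content"])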




Next, I decided to try the newly released code generation model from Mistral called Codestral:

$ ollama run codestral

It took a while for the model to download, but soon you get a prompt where you can ask it to write code for you.

>>> write me a streamlit app that can scrape the top trending google searches and show it in nicely formatted list   



The code generated by Codestral may not always be correct, and you may need some additional prompting or minor code corrections to get working code.
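
For reference, here is a minimal sketch of what a cleaned-up version of such an app could look like. This is my own rough example, not the code Codestral produced; it assumes the streamlit and pytrends packages are installed, and pytrends' trending-searches endpoint can be flaky as Google changes its APIs.

# Sketch: Streamlit app that lists Google's currently trending searches.
# Assumes: pip install streamlit pytrends, then run with: streamlit run app.py
# Illustrative only; not the exact code Codestral generated.
import streamlit as st
from pytrends.request import TrendReq

st.title("Top Trending Google Searches")

pytrends = TrendReq(hl="en-US", tz=360)
trending = pytrends.trending_searches(pn="united_states")  # single-column DataFrame

for rank, query in enumerate(trending[0].head(20), start=1):
    st.markdown(f"{rank}. {query}")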





