Running Local LLMs with Ollama
When I first started working with LLMs, my concern was the amount of data being sent up to a big company, who would use that data to train models and make more recommendations. Another big concern was people sending up PII or other sensitive information without realizing that every request is saved and could potentially show up in a response to another user.
After digging through some articles and watching many YouTube videos, I found a pretty cool solution that is free and open source software (FOSS): Ollama.
Ollama is an application that runs in the background on your computer and loads LLMs for you to interact with. The best part is that all of your requests stay on your computer.
By default, Ollama runs in a Terminal prompt. If you are more accustomed to OpenAI's ChatGPT interface, you can install the Open WebUI application in a Docker container. The interface is nearly identical to ChatGPT.
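If you want to go that route, the Open WebUI project documents a one-line Docker command for machines that already run Ollama. The sketch below follows their README at the time of writing (the port mapping, volume name, and image tag are their defaults), so double-check the current instructions before running it:

```sh
# Run Open WebUI in a Docker container, pointed at the Ollama server on the host.
# Once the container is up, browse to http://localhost:3000 to chat.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```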
There are so many other features available in Ollama that I'm not covering in this post; I recommend heading over to their GitHub repo to check them out. Notably, you can make API requests to the Ollama web server running on your computer, and you can create a Modelfile that modifies the prompts and responses to fit a particular need.
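To give you a feel for both of those features, here's a rough sketch. The curl call hits Ollama's /api/generate endpoint on its default port, 11434; the Modelfile contents and the `my-assistant` name are made-up examples for illustration:

```sh
# Ask the local Ollama server for a completion (the model must already be downloaded)
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'

# A minimal Modelfile that wraps llama2 with a custom system prompt
cat > Modelfile <<'EOF'
FROM llama2
SYSTEM "You are a concise assistant. Answer in one short paragraph."
EOF

# Build the customized model and chat with it ("my-assistant" is an arbitrary name)
ollama create my-assistant -f Modelfile
ollama run my-assistant
```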
What Models are Available?
There are tons of LLMs that you can download and run locally with a simple command in the Terminal app (there's a quick example after the list below).
Popular LLMs on the Ollama website:
- Llama2
- Codellama
- Gemma
- Codegemma
- Mistral
- LLaVA - a particularly cool model that can describe images
- and many others
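To show how simple the download step is, here's a quick sketch using one model from the list (the exact tags come from each model's page on ollama.com, so check there before pulling):

```sh
ollama pull mistral   # download a model without starting a chat session
ollama list           # show the models installed on your computer
```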
Installing Ollama
You can download Ollama from the Ollama website. If you are running Windows 10 or later, you're in luck: they just released a Windows preview. There are installation options for Linux, Mac, and Windows.
Go to https://ollama.com/download and click the box for the operating system you are using. On a Mac, you'll open the disk image and double-click the installer.
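On Linux, the download page gives you a one-line install script instead of an installer. This is the command the Ollama site documents at the time of writing; as with any curl-piped-to-shell install, you may want to read the script before running it:

```sh
# Install Ollama on Linux using the official install script
curl -fsSL https://ollama.com/install.sh | sh
```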
Downloading a Model
The model you use depends on a combination of your computer's RAM, GPU, and CPU. Make sure you've got some breathing room on your hard drive, too: most of the 7B LLMs are between 2 and 5 GB, and if you have a capable computer that can run the huge models, those can be in excess of 20 GB each.
The GitHub page for Ollama makes the following statement about RAM requirements:
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
- ollama/ollama: Get up and running with Llama 2, Mistral, Gemma, and other large language models.
I'm running several of the LLMs above on an M1 MacBook Pro with 8 GB of RAM and have had flawless responses when running 7B-parameter or smaller LLMs. I believe this comes down to the combination of RAM and the GPU built into the M1 chip.
If you notice that the response is coming out a few words at a time and taking a long time to generate, you’re probably hitting the limits of your computer’s ability to use the LLM. I’d recommend switching to a smaller model or exercising your patience.
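Many models on the Ollama site ship in more than one size. As an example, the tag below is the 2B-parameter variant of Gemma listed there; check a model's page for the sizes that actually exist before pulling one:

```sh
# Run Gemma's 2B variant, which needs far less RAM than a 7B model
ollama run gemma:2b
```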
Running `ollama run llama2` in the terminal will download the llama2 model if it's not already on your computer and start the chat session.
Chatting With a Model
Once the model has been downloaded (using the command above), it will load the LLM and be ready for your first request.
If you are starting a new session, open the Terminal application and run `ollama run llama2` to start interacting with the LLM.
You can create line breaks in your request by using CTRL + ENTER, or you can use three double-quotes at the start and end of your prompt (see below).
```
>>> """Hello,
... world!
... """
```
To exit back to the terminal prompt, you can either hit CTRL + D or type the `/bye` command.
Removing a Model
It couldn't be simpler to remove a model once you've finished testing it or want to free up space on your hard drive.
In the terminal, run `ollama rm llama2`.
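If you want to confirm what's on disk before and after, both of these are standard Ollama CLI commands:

```sh
ollama list       # see installed models and how much disk space each uses
ollama rm llama2  # delete the llama2 model from your hard drive
```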
Conclusion
If you've been interested in using LLMs but are concerned about where the data you submit to AI systems goes and how it's used, I'd recommend running a local LLM using the Ollama application on your computer. I would also recommend running the Open WebUI application alongside Ollama to get the ChatGPT-like interface without the security concerns.