How to Run Private AI Models Locally & Offline: The Complete Guide to Ollama

You are just minutes away from running an entire large language model on your computer, completely disconnected from the internet, and that matters for a couple of big reasons. When you use hosted large language models such as OpenAI, Claude, or DeepSeek, all of your data is being sent to the provider. Whenever possible, I prefer to keep my data private. Companies will often use your data to either (a) train their models or (b) sell it to other companies for a profit. And here's the thing about your data: the country hosting the large language model sets its own data rights and regulations.

For example, the United States, where I'm from, probably holds more data than anywhere else in the world, but it has different laws and regulations around how user data can be used and handled compared to many other countries, and some countries can do essentially whatever they want with your data. So I try to keep my stuff as private as possible, no matter whether an application is hosted in the US or somewhere else. The second point is that API endpoints for these large language models have a cost attached. It's not a huge cost, but it's not nothing, which stings when you're just trying to learn a new technology. To eliminate both of these problems, by the end of this guide you'll have a fully dedicated large language model running on your own machine.

Step 1: Downloading and Installing Ollama on Your Machine

I've spent over a decade in this field and helped more than 100,000 developers learn and grow in their craft, and the very first thing we need to do to run models locally is go to ollama.com. Ollama is a tool for getting up and running with large language models fast. Click Download and you can grab it for macOS, Linux, or Windows, all completely free, so feel free to go ahead and download it now. The models you get to pick from are the same ones you hear about all the time: DeepSeek-R1, Llama 3.3, different versions of Llama, Mistral, Qwen, and a whole bunch of other types.
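If you prefer the terminal, there are install commands as well. Here's a minimal sketch, assuming you're on macOS with Homebrew or on Linux (check ollama.com for the current instructions for your platform):

```
# macOS via Homebrew
brew install ollama

# Linux via the official install script
curl -fsSL https://ollama.com/install.sh | sh
```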

Understanding System Requirements: RAM and Model Parameters

I'm not even sure how you pronounce Gemma, but it's in there too, along with a whole bunch of other kinds. One thing to note about these models: if you go to the Ollama GitHub page and scroll down to the model library section, you can see that each model lists a parameter count and a download size. Just below that it says you should have at least 8 GB of available RAM to run a 7B model, 16 GB of RAM to run a 13B model, and 32 GB of RAM to run a 33B model. And look at the DeepSeek model near the top of the list: it weighs in around 404 GB. That is a ton of RAM, and that's why you hear everybody talking about super machines for running these huge models, because you really do need them.

What my machine, and probably your machine, can handle is something smaller, generally anything under 7B. I have a 16 GB computer, so anything under 7B runs pretty efficiently. I'm going to be running Llama 2, an older version of Llama that only needs about 3 to 4 GB of RAM, and I'll also be running Gemma 2, whose run command is listed right on its model page. After you install Ollama, the application may start automatically, or you might need to launch it yourself.

The installer walks you through the normal installation process like any other app, so make sure you launch it once it's installed. You'll know it's running because the Ollama icon appears in the menu bar; I'm a Mac user, but on a Windows machine you'll see it in the system tray at the bottom of the screen.

Step 2: Launching Your First Local AI Model via Terminal

Once it's installed, open up your terminal. You can check that it's installed by typing ollama; if you see the list of commands, you're good to go: it's installed and it's running. However, you don't yet have a model running. Running a model takes a slightly different command: ollama run followed by whatever model you want. I'm going to start with Llama 2, which I already have installed.
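Here's what those two steps look like in the terminal (a minimal sketch; llama2 is the model tag I'm using, so swap in whichever model you want):

```
# Prints the list of available commands if the CLI is installed
ollama

# Downloads the model on first use, then opens an interactive chat session
ollama run llama2
```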

When I run that, it sets the model up and instantiates it; if you don't already have it on your machine, it downloads it first and then gets it running. I'll let it think for a little bit and come right back. Now that it's ready, we see a "Send a message" prompt, so I can type "hello llama," and it gives us a response without touching the internet at all; that's the biggest thing, this is running completely locally. It replies: "grin Hello there human, nuzzles nose, are you here to give me some treats or just to say hello?" Okay, me saying "hello llama" actually made it think it's a llama. Not what I was expecting, but this is definitely a model. To exit, just type /bye, and now we're completely out of the session.
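If you'd rather not sit in the interactive prompt, you can also pass a prompt directly on the command line. A quick sketch, assuming you've already pulled the model:

```
# One-shot prompt: runs the model, prints the reply, and returns to your shell
ollama run llama2 "Say hello in one sentence."
```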

Managing Multiple Models and Testing Offline Performance

If we run ollama list, it shows all the different models you have. I installed Llama 2 two days ago, and it's 3.8 GB. We can also install another kind of model, because you can keep multiple models on the same machine. I'm going to install Gemma 2, so the command is ollama run gemma2:2b for the smaller 2B model. Since I don't have it on my machine yet, it pulls the manifest and starts downloading; it's about a 1.6 GB download.
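Those two commands together look like this (a small sketch; gemma2:2b is the tag for the 2B variant in the Ollama model library):

```
# Show every model currently downloaded, with its size and age
ollama list

# Pull the 2B variant of Gemma 2 (~1.6 GB) and start a session with it
ollama run gemma2:2b
```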

It's going to take a little while at my current download speed of about 40 megabytes per second; I think my internet is usually faster than that, but it seems a little slow right now. Once this finishes downloading, we'll be able to use that AI model locally without any internet connection, and after it installs I'll disconnect the internet so we can see exactly what happens.

I'm going to come up here and disconnect my internet completely, so I have no connection at all. If I type "hello," it responds immediately: "Hi, how can I help you today?" If I type /bye and want to go back to Llama 2 with ollama run llama2, it doesn't need to download anything, because I already have the model on my machine.

I can say "hello there" and it responds: "Hello there, how are you today?" So that is how you work with multiple models (ollama list is the command I need to see both of them), and it's really useful for my AI testing, my AI agents, or whatever else I build, simply because API endpoints cost money and I'd rather not send all my data to a public company when I can have everything running completely locally.

Step 3: Using Ollama as a Local Server for App Development

Another cool thing: if we run ollama serve (I must already have it running in the background, which is why it complains), it shows the exact address Ollama is serving on. Why is that important? Because that address is how you connect to these local models when you're building applications. If you're building something like an AI agent and you want it to talk to Ollama, with Llama 2 running inside it, all you have to do is point the app at that local endpoint and it starts using your local model for development. That's really awesome stuff.
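As a quick sketch of what that looks like, assuming the default local address of http://localhost:11434 and Ollama's /api/generate endpoint (adjust the model name to whatever you have installed):

```
# Ask the locally running Llama 2 model for a completion over HTTP
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is running models locally useful?",
  "stream": false
}'
```

Any app that can make an HTTP request can hit this same endpoint, which is how agent frameworks and editor plugins typically hook into Ollama.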

Housekeeping: How to List and Remove Models

One more bit of housekeeping: running ollama list shows our two models, and running ollama by itself brings up the list of commands, where you can see there's a remove command.

So if I want to remove a model, I run the remove command and pass in its name, and now it's gone; run ollama list again and it's no longer there. Perfect. But have you ever read the book Llama Llama Red Pajama? I have children, so I read Llama Llama Red Pajama all the time, and I can't help but think of that book whenever I'm dealing with Ollama.
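Book recommendations aside, here's the cleanup flow in the terminal (a minimal sketch; I'm removing the gemma2:2b model as an example, so use whichever tag you actually want to delete):

```
# Delete a downloaded model to free up disk space
ollama rm gemma2:2b

# Confirm it's gone
ollama list
```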

The Future of Private and Local AI

Anyway, that's how you run models completely locally on your machine without using the internet, and it really is awesome stuff. I do think that over the next couple of years, as hardware keeps improving, we're going to be running very, very large models locally. Until next time.
