I've just created c/Ollama!

catty@lemmy.world · edit-2 5 days ago

I've just created c/Ollama!

brucethemoose@lemmy.world · edit-2 5 days ago

TBH you should fold this into localllama? Or open source AI?

I have very mixed (mostly bad) feelings on ollama. In a nutshell, they’re kinda Twitter attention grabbers that give zero credit/contribution to the underlying framework (llama.cpp). And that’s just the tip of the iceberg, they’ve made lots of controversial moves, and it seems like they’re headed for commercial enshittification.

They’re… slimy.

They like to pretend they’re the only way to run local LLMs and blot out any other discussion, which is why I feel kinda bad about a dedicated ollama community.

It’s also a highly suboptimal way for most people to run LLMs, especially if you’re willing to tweak.

I would always recommend Kobold.cpp, tabbyAPI, ik_llama.cpp, Aphrodite, LM Studio, the llama.cpp server, sglang, the AMD lemonade server, any number of backends over them. Literally anything but ollama.

…TL;DR I don’t the the idea of focusing on ollama at the expense of other backends. Running LLMs locally should be the community, not ollama specifically.

Sims@lemmy.ml · 2 days ago

Thanks for Lemonade hint. For Ryzen AI: https://github.com/lemonade-sdk/lemonade (linux=cpu for now)

brucethemoose@lemmy.world · 1 day ago

You can still use the IGP, which might be faster in some cases.

southernbeaver@lemmy.world · 5 days ago

What would you recommend to hook to my home assistant?

brucethemoose@lemmy.world · edit-2 5 days ago

Totally depends on your hardware, and what you tend to ask it. What are you running? What do you use it for? Do you prefer speed over accuracy?

southernbeaver@lemmy.world · 3 days ago

My HomeAssistant is running on Unraid but I have an old NVIDIA Quadro P5000. I really want to run a vision model so that it can describe who is at my doorbell.

brucethemoose@lemmy.world · edit-2 2 days ago

Oh actually that’s a great card for LLM serving!

Use the llama.cpp server from source, it has better support for Pascal cards than anything else:

https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md

Gemma 3 is a hair too big (like 17-18GB), so I’d start with InternVL 14B Q5K XL: https://huggingface.co/unsloth/InternVL3-14B-Instruct-GGUF

Or Mixtral 24B IQ4_XS for more ‘text’ intelligence than vision: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

I’m a bit ‘behind’ on the vision model scene, so I can look around more if they don’t feel sufficient, or walk you through setting up the llama.cpp server. Basically it provides an endpoint which you can hit with the same API as ChatGPT.

Encrypt-Keeper@lemmy.world · edit-2 5 days ago

I’m going to go out on a limb and say they probably just want a comparable solution to Ollama.

brucethemoose@lemmy.world · edit-2 5 days ago

OK.

Then LM Studio. With Qwen3 30B IQ4_XS, low temperature MinP sampling.

That’s what I’m trying to say though, there is no one click solution, that’s kind of a lie. LLMs work a bajillion times better with just a little personal configuration. They are not magic boxes, they are specialized tools.

Random example: on a Mac? Grab an MLX distillation, it’ll be way faster and better.

Nvidia gaming PC? TabbyAPI with an exl3. Small GPU laptop? ik_llama.cpp APU? Lemonade. Raspberry Pi? That’s important to know!

What do you ask it to do? Set timers? Look at pictures? Cooking recipes? Search the web? Look at documents? Do you need stuff faster or accurate?

This is one reason why ollama is so suboptimal, with the other being just bad defaults (Q4_0 quants, 2048 context, no imatrix or anything outside GGUF, bad sampling last I checked, chat template errors, bugs with certain models, I can go on). A lot of people just try “ollama run” I guess, then assume local LLMs are bad when it doesn’t work right.

Jonathan@lemmy.world · 5 days ago

Cool! I’ll subscribe. I’ve got about a dozen projects I’d like to build with Ollama, if I’ll get the motivation and free time who knows?

catty@lemmy.world · edit-2 5 days ago

Start now! Install it, get a python environment up and running if you haven’t already, and get that first play-around project working which you work outwards from!

Jonathan@lemmy.world · 5 days ago

Already setup! I think the first thing I want to do is setup retrieval augmented generation. Several of my hobby ideas will require it I think. Started trying to read up on it a couple days ago and I had a serious lack of focus going on.

I’ve been kind of hoping to come across a super simple way to implement it, but haven’t exactly looked much yet.

catty@lemmy.world · 5 days ago

Sounds like a great first question! Go for it!

Otter@lemmy.ca · edit-2 5 days ago

There is also !localllama@sh.itjust.works :)

crossposting between the communities can help grow both