cross-posted from: https://sh.itjust.works/post/61139432
I seriously can’t believe how much progress he’s made for the FOSS community. He actually might take a bite out of the big 3’s profits with this
How many GPUs do you even need to have a usable, self-hosted AI? It looks like he has 6 on his rig. Probably each costs 2k or something. That’s not peanuts. I have a 12GB VRAM card. It probably can’t generate anything in any meaningful amount of time. Which brings me to the question: who is this for?
Regardless, impressive what he vibe-coded there.
I think in one video it looked like 16 cards. I think he did multiple bifurcations of the PCIe lanes. I think he is / was using it for protein folding as well.
That’s definitely not my level of disposable wealth/income. I can barely afford one card.
My MacBook Air with 24GB of unified RAM is enough to run something simple and useful.
I have an RX 5600 XT (6 GB), 32 GB of RAM, and a Ryzen 3600. The system hasn’t been updated since I built it during COVID. QwenV3-vl35B is the heftiest thing I can run; it gets around 2 tokens/sec in LM Studio. It’s easier than most people seem to think.
How do you not run out of VRAM? Does it offload to system RAM?
Yes, it offloads into system RAM. Oh, and I forgot to mention that’s with the context set to around 25k. That can vary greatly per model though; it’s taken some experimentation to figure that out.
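For anyone wanting to sanity-check their own card, here is a rough back-of-the-envelope sketch of the split between VRAM and system RAM. It is not how LM Studio actually decides what to offload. The 4-bit quantization and the 1 GB VRAM reserve are assumptions, `estimate_split` is a made-up helper name, and it only counts the weights, so context (KV cache) and runtime overhead come on top.

```python
def estimate_split(params_billion, bits_per_weight, vram_gb, reserve_gb=1.0):
    """Weights-only estimate; ignores KV cache and runtime overhead."""
    # 1e9 parameters * (bits / 8) bytes each works out to gigabytes directly.
    weights_gb = params_billion * bits_per_weight / 8
    # Assumption: keep ~1 GB of VRAM free for the desktop and buffers.
    usable_vram = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(weights_gb, usable_vram)
    offloaded = weights_gb - on_gpu  # whatever does not fit spills to system RAM
    return weights_gb, on_gpu, offloaded


# 35B parameters and a 6 GB card as in the post above; 4-bit quant is assumed.
weights, on_gpu, offloaded = estimate_split(35, 4, 6)
print(f"weights ~{weights:.1f} GB, GPU ~{on_gpu:.1f} GB, system RAM ~{offloaded:.1f} GB")
```

Under those assumptions most of the model sits in system RAM, which would be consistent with it fitting in 32 GB but running at only a couple of tokens per second.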
Thank you. That’s good to know.
I can tell you from personal experience, 8GB is not enough for a snappy experience. Maybe if you had it set up to churn through data overnight. My RTX 3060 Ti was not happy.