cross-posted from: https://sh.itjust.works/post/61139432
I seriously can’t believe how much progress he’s made for the FOSS community. He actually might take a bite out of the big 3’s profits with this
What the actual fuck is this timeline that we are living in?
I kinda loved his “you should self host to decentralize from big tech” and “run graphene and Linux to avoid data collection” content, but idk what the local ai stuff is any good for
It’s good for the same things machine learning has always been good for. Language synthesis and analysis. Selfhosting something like Paperless for document management. It actually has a very rudimentary learning engine for document classification for a long time but feeding document content to a local AI model for organization tagging is very useful.
If you use AI for a lot of small things, then you can offload the tasks to a locally run server.
Or if you see it as a feature you plan on using for a long time and don’t want to have to keep paying big tech for the privilege of using AI, and hell, you already have a nice graphics card, it’s perfect.
How many GPUs do you even need to have a usable, self-hosted AI? It looks like he has 6 on his rig. Probably each costs 2k or something. That’s not peanuts. I have a 12GB VRAM card. It probably can’t generate anything in any meaningful amount of time. Which brings me to the question: who is this for?
Regardless, impressive what he vibe-coded there.
I think in one video it looked like 16 cards. I think he did multiple bifurcations of the pcie lanes. I think he is / was using it for protein folding as well.
That’s definitely not my level of disposable wealth/income. I can barely afford one card.
My MacBook Air with 24GB of unified RAM is enough to run something simple and useful.
I can tell you from personal experience, 8GB is not enough for a snappy experience. Maybe if you had it setup to churn through data overnight. My RTX 3060 Ti was not happy.
I have a rx5600xt (6gb), 32gb ram, ryzen 3600. System hasn’t been updated since i built it during covid. QwenV3-vl35B is the heftiest thing I can run, it gets around 2 tokens/sec, in LM studio. It’s easier than most people seem to think.
How do you now run out of RAM? Does it offload to system RAM?
Yes, offloads into system. Oh and i forgot to mention that’s with the context set around 25k. That can vary greatly per model though, it’s taken some experimentation to figure that out.
Thank you. That’s good to know.
First one-click RCE is in: https://www.reddit.com/r/LocalLLaMA/comments/1ttls1y/just_found_a_1click_rce_in_pewdiepies_odysseus/ … smh …
So 1) that sucks but 2) why the fuck would you ever run this exposed to the internet.
He’s done the main quest. Now he’s doing the side quests.

One more harness, bro.
I love that guy. Remember hating him back in the days when he got popular by sitting and yelling while playing games. But damn the guy matured and put out epic content the past 10 years or so.
Honestly now that hes mostly retired his content is so chill now
Agree
Man, this is Ouroboros feedback loop.








