I doubt it, but honesty, many systems can do inference pretty well, like how I ran the MLX version of Qwen 3 4b with a DuckDuckGo search RAG, and used it to ask quick questions and verify some simple things, running on a MacBook Air m2 16gb, and barely made a dent in the RAM utilisation or SoC, and this also goes for my much less powerful machines, like even a galaxy a20, with 3gb of memory and a low spec octacore exynos, can run small models really well, although the quantisation needs to be a bit strict.
I’d argue it’s inevitable for the simple reason that the whole AI as a service business model is a catch 22. Current frontier models aren’t profitable, and all the current service providers live off VC funding. And if models become cheap enough to be profitable, then they’re cheap enough to run locally too. And there’s little reason to expect that models aren’t going to continue being optimized going forward, so we are going to hit an inflection point where local becomes the dominant paradigm.
We’ve seen the pendulum swing between mainframe and personal computer many times before. I expect this will be no different.
Actually, I agree. And so far, small local models are really solid, and can punch above its weight even when compared to frontier models.
I believe what I meant when I said I doubted it was since these AI corpos seemingly give no indication that local is an option, so most people would think they can only access an LLM through the web. This would bolster the SaaS ecosystem dominating over local AI, although local will keep increasingly growing as a more favourable option.
Although I do agree that the industry will shift from being server based to PC based inference as well, I don’t see that shift being large enough to make these companies change their training paradigms to include telemetry from local AI, but I’m sure some will.
I doubt it, but honesty, many systems can do inference pretty well, like how I ran the MLX version of Qwen 3 4b with a DuckDuckGo search RAG, and used it to ask quick questions and verify some simple things, running on a MacBook Air m2 16gb, and barely made a dent in the RAM utilisation or SoC, and this also goes for my much less powerful machines, like even a galaxy a20, with 3gb of memory and a low spec octacore exynos, can run small models really well, although the quantisation needs to be a bit strict.
I’d argue it’s inevitable for the simple reason that the whole AI as a service business model is a catch 22. Current frontier models aren’t profitable, and all the current service providers live off VC funding. And if models become cheap enough to be profitable, then they’re cheap enough to run locally too. And there’s little reason to expect that models aren’t going to continue being optimized going forward, so we are going to hit an inflection point where local becomes the dominant paradigm.
We’ve seen the pendulum swing between mainframe and personal computer many times before. I expect this will be no different.
Actually, I agree. And so far, small local models are really solid, and can punch above its weight even when compared to frontier models.
I believe what I meant when I said I doubted it was since these AI corpos seemingly give no indication that local is an option, so most people would think they can only access an LLM through the web. This would bolster the SaaS ecosystem dominating over local AI, although local will keep increasingly growing as a more favourable option.
Although I do agree that the industry will shift from being server based to PC based inference as well, I don’t see that shift being large enough to make these companies change their training paradigms to include telemetry from local AI, but I’m sure some will.
Oh yeah corps will absolutely do that. We can kinda see the same thing happening with everything moving to streaming services too.