llama.cpp is one of those projects that makes you appreciate open source — a dependency-free framework for running quantized SLMs locally on your laptop.
I'm bullish on where local models are heading. Teacher/Student training is showing real promise, and while tokens-per-second still lags behind cloud APIs, we're building on the shoulders of open source contributors who came before us. That's how the internet was built too.
To make experimenting easier, I put together a lightweight shell wrapper that connects local SLMs to any CLI tool that supports local LLMs. Tested it with Opencode — works well, and runs faster than Ollama on Docker.
https://github.com/hamr0/co...