Inside the Runtime

From memory and compute pipelines to context management and assistant workflows, we design the full AI stack, and here we share our progress to drive the future of local intelligence together.

Follow new releases, engineering breakthroughs, and examples of Local AI in action — all built to run closer to where your product lives.

On-Device Model Architecture: Where GPT-OSS Fits in the Edge AI Landscape

Edge devices have tight memory budgets, making Mixture of Experts (MoE) models with active parameter selection the optimal choice for deploying sophisticated AI reasoning locally. Active vs....
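
To make "active parameter selection" concrete before you read the post, here is a minimal Python sketch of top-k expert routing. It is a toy illustration under stated assumptions, not GPT-OSS's or OpenInfer's actual architecture: the function name, the random weights, and the top-2-of-8 configuration are all hypothetical. The point it demonstrates is that only the selected experts' weights are ever read for a given token.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector through the top-k of N experts.

    Only the k selected experts' weights are read for this token,
    so memory traffic tracks the *active* parameter count rather
    than the model's total size.
    """
    logits = x @ gate_w                     # gating scores, shape (num_experts,)
    top = np.argsort(logits)[-k:]           # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                            # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the other
    # experts' parameters are never touched for this token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, num_experts = 64, 8
x = rng.standard_normal(d)                  # one token's hidden state
gate_w = rng.standard_normal((d, num_experts))
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]

y = moe_forward(x, gate_w, experts, k=2)    # only 2 of 8 expert matrices run
```

With top-2 routing over 8 experts, each token touches roughly a quarter of the expert weights, which is why the active memory footprint stays small even as total model capacity grows.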

Boosting Local Inference with Speculative Decoding

In our recent posts, we've explored how CPUs deliver impressive results for local LLM inference, even rivaling GPUs when models push up against the hardware's memory bandwidth limits. These bandwidth...
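
For a feel of the technique before the full post, below is a minimal greedy-decoding sketch of one speculative round in Python. It is a simplification under stated assumptions: both models are toy stand-in functions, verification runs token by token here rather than as the single batched forward pass a real runtime would use, and none of the names come from OpenInfer's implementation.

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round of greedy speculative decoding.

    draft_next(ctx)  -> next token id from the cheap draft model
    target_next(ctx) -> next token id from the expensive target model

    The draft proposes k tokens sequentially; the target then checks
    them (a real runtime verifies all k in one batched forward pass)
    and keeps the longest agreeing prefix plus one token.
    """
    proposal, ctx = [], list(prefix)
    for _ in range(k):                      # cheap sequential drafting
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    accepted, ctx = [], list(prefix)
    for tok in proposal:                    # verification pass
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)       # target's correction ends the round
            break
        accepted.append(tok)
        ctx.append(tok)
    else:
        accepted.append(target_next(ctx))   # bonus token when all k match
    return accepted


# Toy stand-ins: the target cycles a fixed pattern; the draft usually agrees.
PATTERN = [1, 2, 3, 4]
target = lambda ctx: PATTERN[len(ctx) % len(PATTERN)]
draft = lambda ctx: 0 if len(ctx) % 7 == 0 else target(ctx)

print(speculative_step(draft, target, prefix=[1], k=4))  # [2, 3, 4, 1, 2]
```

In this toy run, one round yields five tokens for the cost of roughly one target pass plus four cheap draft calls, which is where the speedup comes from when the expensive model is memory-bandwidth-bound.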

Ready to Get Started?

OpenInfer is now available! Sign up today to get access and experience these performance gains for yourself.