Startup Subquadratic Claims Sparse-Attention LLM Cuts Costs and Boosts Speed
Executive Briefing
- Announces SubQ, a sparse-attention LLM purportedly 56 times faster than FlashAttention-based models in benchmarks
- Scores 89.7% on LiveCodeBench, placing it alongside top coding models from OpenAI, Google DeepMind, and Anthropic
- Claims dramatic cost reduction, citing $8 to run a large-dataset retrieval test on SubQ versus $2,600 for the same test on Anthropic's Opus 4.6
- Offers a 12-million-token context window, roughly 12 times larger than those of most current frontier models
- Third-party evaluator Appen validates core architectural claims, though SubQ remains unavailable for broad public testing
- Founders assert transformers could become obsolete within years if sparse-attention approaches gain wider adoption