ANTHROPIC · Claude models leading reasoning benchmarks
OPENAI · GPT-5 scores 96.3% on MMLU benchmark
GOOGLE · Gemini Ultra outperforms on coding tasks
META · Llama open weights released under permissive license
MISTRAL · Mixture-of-experts model drops inference cost
ARXIV · Speculative decoding cuts inference latency by 3.2×