LLM Inference Optimization

ElastixAI Emerges From Stealth to Redefine Generative AI Economics via FPGA-Based Supercomputers

ElastixAI Inc. today emerged from stealth to tackle the systemic inefficiencies and high costs of generative AI (GenAI) inference. Founded by former Apple and Meta machine learning (ML) researchers, ...

TMCnet

Inception Launches Mercury 2, the Fastest Reasoning LLM - 5x Faster Than Leading Speed-Optimized LLMs, with Dramatically Lower Inference Cost

Inception, the company behind the first commercial diffusion large language models (dLLMs), today announced the launch of Mercury 2, the fastest reasoning LLM and first reasoning dLLM. Mercury 2 ...

Manifold-Constrained Hyper-Connections: The Architectural Breakthrough That Might Redefine LLM Training

If mHC scales the way early benchmarks suggest, it could reshape how we think about model capacity, compute budgets and the ...

VentureBeat

New LLM optimization technique slashes memory costs up to 75%

Researchers at the Tokyo-based startup Sakana AI have developed a new technique that enables language models to use memory more efficiently, helping enterprises cut the costs of building applications ...

InfoWorld

Multi-token prediction technique triples LLM inference speed without auxiliary draft models

With reported 3x speed gains and limited degradation in output quality, the method targets one of the biggest pain points in production AI systems: latency at scale.

Semiconductor Engineering

HW-SW Co-Designed System With 3 Core Optimization Pathways For Long-Context Agentic LLM Inference (Cambridge, ICL)

A new technical paper titled “Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference” was published by researchers at University of Cambridge, Imperial College London ...

Automat-it Launches LLM Selection Optimizer to Slash Startup LLM Costs by up to 60%

AWS Premier Tier Partner leverages its AI Services Competency and expertise to help founders cut LLM costs using ...

Business Wire

MangoBoost Launches Mango LLMBoost™: AI Inference Optimization Software with Up to 12.6x Relative Performance Improvement and 92% Cost Savings

BELLEVUE, Wash.--(BUSINESS WIRE)--MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, is announcing the launch of Mango LLMBoost™, system ...

Search Engine Land

LLM optimization in 2026: Tracking, visibility, and what’s next for AI discovery

Marketing, technology, and business leaders today are asking an important question: how do you optimize for large language models (LLMs) like ChatGPT, Gemini, and Claude? LLM optimization is taking ...


AI inference costs dropped up to 10x on Nvidia's Blackwell — but hardware is only half the equation

New deployment data from four inference providers shows where the savings actually come from — and what teams should evaluate ...
