Key Highlights
- Nebius (NBIS) reached an agreement to purchase inference optimization specialist Eigen AI through a combined payment of approximately $643 million in cash and Class A shares.
- The acquisition brings Eigen AI’s optimization capabilities into Nebius Token Factory, the company’s managed inference platform for enterprise AI applications.
- The MIT HAN Lab alumni who founded Eigen AI will establish Nebius’s first Bay Area engineering center.
- Models jointly optimized by the two companies have ranked at the top of Artificial Analysis benchmarks.
- Shares of NBIS climbed 8.51% following the announcement, reaching $150.00, reversing a 6.07% drop from the previous week.
On May 1, 2026, Nebius (NBIS) announced an agreement to acquire Eigen AI in a transaction valued at roughly $643 million. The consideration will be paid in a mix of cash and Nebius Class A shares, with the share component calculated using the company’s 30-day volume-weighted average price (VWAP) at the time of signing. Markets responded positively, pushing NBIS up 8.51% to $150.00.
The deal is expected to close in the coming weeks, subject to regulatory approval and customary closing conditions.
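For readers unfamiliar with the pricing mechanism, a volume-weighted average price weights each traded price by the volume traded at that price, so heavy-volume days count for more than thin ones. The sketch below is illustrative only: the `DailyBar` type, the helper names, and all numbers are assumptions made up for the example. The announcement discloses neither the cash/stock split nor the underlying price data.

```python
from dataclasses import dataclass


@dataclass
class DailyBar:
    """One trading day, reduced to a single representative price (hypothetical)."""
    price: float
    volume: int


def vwap(bars):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(bar.volume for bar in bars)
    if total_volume == 0:
        raise ValueError("VWAP is undefined when total volume is zero")
    return sum(bar.price * bar.volume for bar in bars) / total_volume


def shares_issued(stock_value_usd, reference_price):
    """Shares needed to deliver a given dollar value at a reference price."""
    return stock_value_usd / reference_price


# Made-up two-day window; a real calculation would cover 30 trading days.
window = [DailyBar(price=100.0, volume=1_000), DailyBar(price=110.0, volume=3_000)]
reference = vwap(window)
print(reference)                            # 107.5
print(shares_issued(1_075_000, reference))  # 10000.0
```

A simple average of the two prices would be 105.0. The VWAP sits closer to 110.0 because three times as many shares traded at that price.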
Eigen AI specializes in inference and model optimization. The company helps AI teams run open-source models faster and at lower cost in production, without having to build their own optimization infrastructure.
The integration plan calls for embedding Eigen AI’s technology in the Token Factory platform. Token Factory provides autoscaling API endpoints and fine-tuning workflows for prominent open-source model families such as Llama, DeepSeek, Qwen, and Gemma.
Collaboration between the organizations predates this acquisition. The teams previously partnered to create optimized model versions that achieved leading positions on Artificial Analysis, a prominent AI performance benchmarking service.
Eigen AI’s Technical Foundation
Eigen AI emerged from research conducted at MIT’s HAN Lab. The founding team includes Ryan Hanrui Wang and Wei-Chen Wang, whose research underlies techniques now widely used in production AI systems.
Ryan developed the sparse attention architecture SpAtten, described as the most-cited HPCA paper since 2020. Wei-Chen co-developed Activation-aware Weight Quantization (AWQ), which won the MLSys 2024 Best Paper Award and is now widely treated as the standard method for 4-bit model deployment.
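To make "4-bit deployment" concrete, the sketch below shows plain round-to-nearest quantization of one group of weights to 16 levels. It is not AWQ itself: AWQ additionally rescales the most important weight channels using activation statistics before quantizing, a step omitted here. The function names and sample weights are assumptions for illustration.

```python
def quantize_4bit(weights):
    """Map a group of floats to integer codes 0..15 plus a scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 15 steps between 16 levels; 1.0 if the group is constant
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_4bit(codes, scale, lo):
    """Recover approximate floats from the 4-bit codes."""
    return [code * scale + lo for code in codes]


# Made-up weight group; real models quantize many such groups per layer.
group = [-1.0, -0.4, 0.0, 0.7, 2.0]
codes, scale, lo = quantize_4bit(group)
restored = dequantize_4bit(codes, scale, lo)
worst_error = max(abs(w - r) for w, r in zip(group, restored))
print(codes, worst_error <= scale / 2 + 1e-9)
```

Each weight is stored in 4 bits instead of 16, roughly a fourfold reduction in weight memory, at the cost of a rounding error of at most half a quantization step per weight.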
Di Jin, another co-founder, earned his doctorate at MIT CSAIL and contributed to post-training for Meta’s Llama 3 and Llama 4. His research includes the CGPO framework for reinforcement learning from human feedback.
Once the transaction closes, the team will be based in the San Francisco Bay Area, forming Nebius’s first engineering and research facility in the United States.
Position Within the Inference Landscape
Inference is currently the fastest-growing segment of AI computing infrastructure. Projections indicate it will account for approximately two-thirds of total AI compute demand in 2026.
Running inference efficiently requires deep technical expertise. The work spans model representation formats, GPU kernel optimization, and dynamic workload distribution, all of which lie outside the skill set of most development teams.
Open-source models compound these challenges, as they are typically released without performance optimization. Emerging architectures such as Mixture-of-Experts and Compressed Sparse Attention add further complexity around memory use and compute efficiency, and handling them well requires specialized expertise.
Eigen AI offers end-to-end optimization covering post-training, fine-tuning, and production inference across the leading open-source model families. Its kernel-level and model-level techniques extract more performance from existing hardware while keeping additional engineering work to a minimum.

