A majority of investors now consider an AI bubble the top financial risk of 2026. Michael Burry, the hedge fund investor who made hundreds of millions betting against the housing market in 2008, is actively shorting NVIDIA, comparing the company to Cisco at the center of the dot-com collapse. Late last year, NVIDIA circulated a seven-page memo to Wall Street analysts rebutting his claims. A web of circular financing deals, in which NVIDIA invests billions in OpenAI and Anthropic and those companies turn around and spend billions on NVIDIA chips, has drawn scrutiny from the FTC, economists, short sellers and even the companies involved. Sam Altman himself conceded in August 2025 that “investors as a whole are overexcited about AI.” And then, every quarter, NVIDIA beats expectations again. The bubble thesis doesn’t die: it just gets punted to the next earnings call. On Wednesday, it got punted again.
NVIDIA reported record fourth-quarter revenue of $68.1 billion for the quarter ended Jan. 25, 2026, up 73% year over year, beating analyst expectations of $65.6 billion. Guidance for Q1 came in at $78 billion, more than $5 billion above consensus, and the company noted it was assuming zero Data Center compute revenue from China. But the real signal for the research community came during the earnings call, when CEO Jensen Huang reframed the economics of scientific computing around a single metric: the cost of generating a token.
“The way that software is done in the future, using AI, is token driven,” Huang told analysts. NVIDIA is putting token economics front and center. The company says new SemiAnalysis InferenceX results show GB300 NVL72 systems (Blackwell Ultra) delivering up to 50x higher throughput per megawatt than the prior Hopper generation, which translates directly into lower cost per token. That claim, if it holds across scientific workloads, would fundamentally alter the calculus for every research institution weighing AI infrastructure investments.
Rubin is NVIDIA’s next bet on the same metric: a projected one-tenth the cost per million tokens versus Blackwell on reasoning workloads. NVIDIA says Rubin is in full production and expected to be available from partners in the second half of 2026.
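To see why throughput per megawatt maps onto cost per token, it helps to run the arithmetic. The sketch below uses hypothetical throughput and electricity figures (none of these numbers come from NVIDIA or SemiAnalysis) and counts only energy cost, ignoring hardware, cooling and utilization, which dominate real total cost of ownership.

```python
def cost_per_million_tokens(tokens_per_sec_per_mw: float,
                            dollars_per_mwh: float) -> float:
    """Energy cost, in dollars, to generate one million tokens.

    tokens_per_sec_per_mw: sustained inference throughput per megawatt drawn.
    dollars_per_mwh: electricity price.
    """
    tokens_per_hour_per_mw = tokens_per_sec_per_mw * 3600
    mwh_per_million_tokens = 1e6 / tokens_per_hour_per_mw
    return mwh_per_million_tokens * dollars_per_mwh

# Hypothetical baseline: a Hopper-class system sustaining 50,000 tokens/s
# per megawatt, with electricity at $100/MWh.
baseline = cost_per_million_tokens(50_000, 100.0)
# A 50x gain in throughput per megawatt cuts energy cost per token 50x.
improved = cost_per_million_tokens(50_000 * 50, 100.0)

print(f"Baseline:        ${baseline:.4f} per 1M tokens")  # ≈ $0.5556
print(f"50x per-MW gain: ${improved:.4f} per 1M tokens")  # ≈ $0.0111
```

The same arithmetic applies to Rubin’s claimed one-tenth cost per million tokens: whether the gain comes from higher throughput per megawatt or from cheaper compute per rack, it lands in the same denominator.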
The Vera Rubin platform: Six chips, one architecture
The Vera Rubin system represents NVIDIA’s most complex platform to date, comprising six co-designed chips built around a rack-scale architecture of 72 Rubin GPUs and 36 Vera CPUs. Each Rubin GPU features approximately 336 billion transistors and 288 GB of sixth-generation HBM4 memory delivering roughly 22 TB/s of bandwidth per chip — nearly three times what Blackwell offers.
At the rack level, a Vera Rubin NVL72 system delivers 3,600 PFLOPS of NVFP4 inference performance and 2,520 PFLOPS of NVFP4 training performance, with 20.7 TB of HBM4 capacity. NVIDIA claims the platform can train mixture-of-experts models with up to one-quarter the number of GPUs required by Blackwell.
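The rack-level memory figure falls straight out of the per-GPU specs, and aggregating bandwidth the same way makes a useful back-of-the-envelope check. The sketch below only multiplies the numbers reported above; the aggregate bandwidth total is our derivation, not a figure NVIDIA has quoted.

```python
# Sanity-checking the NVL72 rack aggregates from the per-GPU figures above.
GPUS_PER_RACK = 72
HBM4_PER_GPU_GB = 288       # per-GPU HBM4 capacity
HBM4_BW_PER_GPU_TBS = 22    # approximate per-GPU HBM4 bandwidth, TB/s

rack_hbm_tb = GPUS_PER_RACK * HBM4_PER_GPU_GB / 1000  # decimal TB
rack_bw_tbs = GPUS_PER_RACK * HBM4_BW_PER_GPU_TBS     # derived, not quoted

print(f"Rack HBM4 capacity:          {rack_hbm_tb:.1f} TB")  # 20.7 TB, matches NVIDIA
print(f"Aggregate memory bandwidth: ~{rack_bw_tbs:,} TB/s")  # ~1,584 TB/s
```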
But the less-discussed story, and the one most relevant to R&D, is what Huang revealed about the Vera CPU as a standalone component.
Vera CPU: A different kind of data center processor
When Wells Fargo analyst Aaron Rakers asked on the earnings call about Vera’s role as a standalone CPU, Huang clarified that NVIDIA’s architectural decisions diverge from those behind conventional data center processors. “We made fundamentally different architecture decisions about our CPUs compared to the rest of the world’s CPUs,” Huang said. “It is designed to be focused on very high data processing capabilities.”
The 88-core Vera CPU, built on custom Arm-compatible “Olympus” cores with LPDDR5X memory, is optimized not for general-purpose server workloads but for the data-intensive pipeline that defines modern AI research: data processing before training, pre-training, post-training, and the emerging class of tool-use and agentic workloads where AI models interact with external software.
Huang explicitly framed Vera’s design rationale through Amdahl’s Law, the principle that a system’s overall speedup is bounded by the fraction of the workload that cannot be accelerated. “When you accelerate the algorithms to the limit, as we have, Amdahl’s Law would suggest that you need really, really fast single-threaded CPUs,” he said, adding that Vera is “off the charts better” than its predecessor Grace in single-threaded performance.
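Amdahl’s Law makes the argument quantitative: once the accelerable portion of a pipeline is fast enough, the remaining serial work sets a hard ceiling on end-to-end gains. A minimal sketch with hypothetical fractions (the 95/5 split is illustrative, not measured pipeline data):

```python
def amdahl_speedup(accelerated_fraction: float, acceleration: float) -> float:
    """Overall speedup when a fraction of total runtime is sped up by a factor."""
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / acceleration)

# Hypothetical pipeline: 95% of runtime is GPU work, 5% is CPU-side data
# processing. Accelerating the GPU side yields diminishing returns, and
# even infinite GPU speedup caps out at 1 / 0.05 = 20x end to end.
for gpu_speedup in (10, 100, 1_000_000):
    overall = amdahl_speedup(0.95, gpu_speedup)
    print(f"GPU {gpu_speedup:>9,}x -> end-to-end {overall:.1f}x")
```

Under those assumptions, the only way to raise the 20x ceiling is to shrink the serial 5%, which is exactly the role a faster single-threaded CPU plays.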
For research organizations, this signals a potential shift in how computing infrastructure is architected. Rather than pairing high-end GPUs with commodity x86 CPUs, the standard playbook for most academic and national lab clusters, NVIDIA is arguing that a purpose-built CPU optimized for AI data pipelines can remove bottlenecks that limit the return on GPU investment.
The competitive context
NVIDIA’s dominance in research computing is not going unchallenged. AMD’s forthcoming Instinct MI400 offers 432 GB of HBM4, significantly more memory than Rubin’s 288 GB, targeting memory-constrained mixture-of-experts workloads. AMD has also secured a major commitment from Meta for its Helios rack-scale system, with shipments slated for the second half of 2026.
Google has begun rolling out its TPU v7 “Ironwood” family on Google Cloud, and Amazon’s Trainium3 UltraServers are expanding across AWS regions. For research institutions, the diversification of high-performance AI silicon means more leverage in procurement negotiations, even as NVIDIA maintains that its CUDA ecosystem, which Huang noted runs all 1.5 million AI models on Hugging Face, remains the decisive competitive moat.