Research & Development World

  • R&D World Home
  • Topics
    • Aerospace
    • Automotive
    • Biotech
    • Careers
    • Chemistry
    • Environment
    • Energy
    • Life Science
    • Material Science
    • R&D Management
    • Physics
  • Technology
    • 3D Printing
    • A.I./Robotics
    • Software
    • Battery Technology
    • Controlled Environments
      • Cleanrooms
      • Graphene
      • Lasers
      • Regulations/Standards
      • Sensors
    • Imaging
    • Nanotechnology
    • Scientific Computing
      • Big Data
      • HPC/Supercomputing
      • Informatics
      • Security
    • Semiconductors
  • R&D Market Pulse
  • R&D 100
    • Call for Nominations: The 2025 R&D 100 Awards
    • R&D 100 Awards Event
    • R&D 100 Submissions
    • Winner Archive
    • Explore the 2024 R&D 100 award winners and finalists
  • Resources
    • Research Reports
    • Digital Issues
    • R&D Index
    • Subscribe
    • Video
    • Webinars
  • Global Funding Forecast
  • Top Labs
  • Advertise
  • SUBSCRIBE

How xAI turned a factory shell into an AI ‘Colossus’ to power Grok 3 and beyond

By Brian Buntz | February 18, 2025

In a repurposed Electrolux factory in Memphis, xAI trained Grok 3 on a supercomputer dubbed “Colossus,” which the company says includes 200,000 Nvidia H100 GPUs. Elon Musk and senior xAI employees described Grok 3 as “the smartest AI on Earth” in a recent livestream, citing each GPU’s capacity of up to 4 PFLOPS (four quadrillion floating-point operations per second) and high-speed interconnects across the cluster.

A custom liquid-cooling system replaces traditional fans, with Musk noting that xAI is using a design “nobody’s done at scale.” The company also claims daily improvements to Grok 3 and is already planning further upgrades: it intends to adopt Nvidia’s forthcoming H200 GPUs—projected to ship with 141 GB of HBM3e memory—and eventually transition to Blackwell GB200 chips, which could deliver 20 PFLOPS each. xAI says it built Colossus in significantly less time than most data center projects require, transforming the abandoned site into a major AI development facility.

[xAI]

How Colossus took shape

A Rapid-Build Supercomputer: In 122 days, xAI transformed an abandoned Electrolux factory in Memphis into “Colossus,” with 100,000 Nvidia H100 GPUs, then doubled it to 200,000 in another 92 days—a feat born of necessity after training Grok 2 on just 8,000 GPUs. Drawing an estimated 250 megawatts, buffered by Tesla MegaPacks to handle AI training’s power swings, Colossus will help xAI offer daily improvements to Grok 3.

Why Memphis? The abandoned Electrolux factory offered considerable warehouse space and an initial 15 MW of industrial power—later scaled to roughly 250 MW with Tesla MegaPacks. Rejecting new construction quotes of 18–24 months, Musk’s team leased generators and a quarter of the U.S.’s mobile cooling capacity to kickstart operations fast.

Overcoming Hardware Debugging: Assembling 200,000 interconnected GPUs in under a year brought chaos: mismatched BIOS firmware, network cable snarls, and cosmic-ray bit flips sparking random errors. “It’s a battle against entropy,” xAI’s Jimmy Ba quipped, warning a single transistor flip could throw a 100,000-GPU training run off kilter. Musk recalled debugging a BIOS mismatch at 4:20 a.m., while the team unplugged cables to test stability, rerouting as needed.

Core hardware and topology

xAI’s “Colossus” runs on 200,000 Nvidia H100 GPUs synced by high-speed interconnects. Igor Babuschkin calls it “the biggest, fully connected H100 cluster of its kind.” Each H100 packs 80 GB of HBM2e memory at 2 TB/s bandwidth and nearly 4 PFLOPS of FP8 performance with sparsity. xAI rolled out a custom liquid-cooling system after renting a quarter of the U.S.’s mobile cooling units. Musk teases a next-gen evolution with Nvidia Blackwell GB200 GPUs—20 PFLOPS each, 8 TB/s bandwidth.

Synchronous training and “continuous daily improvement”

xAI’s “continuous daily improvement” for Grok 3 relies on online fine-tuning and reinforcement learning for incremental updates. The cluster churns out synthetic data via smaller models, feeding Grok 3 for focused training bursts. This demands orchestration for rapid partial re-runs across the array, amid cosmic-ray bit flips and power swings that Jimmy Ba calls a “battle against entropy.”

A big launch, but perhaps a narrow edge

Early tests from AI researchers show mixed results. Andrej Karpathy praised Grok 3’s ability to parse complex training papers but noted it struggled with certain “tricky” logic puzzles—comparing its performance roughly to “o1-pro” from OpenAI. While an earlier version (nicknamed “chocolate”) achieved an Arena score of 1402, Grok 3 itself only slightly outperforms Google’s Gemini 2.0 models and OpenAI’s ChatGPT-4o. Ethan Mollick called it a “very solid frontier model” but added that “OpenAI maintains a strong advantage” in enterprise partnerships. Gary Marcus deemed the launch “no game changer.” xAI’s internal benchmarks show Grok 3 narrowly leading competitors in math, science, and coding tasks by a few percentage points—noticeable but not overwhelming. Independent, peer-reviewed evaluations are still pending.

Related Articles Read More >

AI Agents in the Lab
How AI agents are reshaping R&D 
U.S. reportedly will rework GPU export controls amid industry pushback
Musk tests AI-powered government layoffs under Trump’s DOGE agenda
Berkeley debuts $5,000 open-source humanoid built with desktop 3D printers
rd newsletter
EXPAND YOUR KNOWLEDGE AND STAY CONNECTED
Get the latest info on technologies, trends, and strategies in Research & Development.
RD 25 Power Index

R&D World Digital Issues

Fall 2024 issue

Browse the most current issue of R&D World and back issues in an easy to use high quality format. Clip, share and download with the leading R&D magazine today.

Research & Development World
  • Subscribe to R&D World Magazine
  • Enews Sign Up
  • Contact Us
  • About Us
  • Drug Discovery & Development
  • Pharmaceutical Processing
  • Global Funding Forecast

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media
Privacy Policy | Advertising | About Us

Search R&D World

  • R&D World Home
  • Topics
    • Aerospace
    • Automotive
    • Biotech
    • Careers
    • Chemistry
    • Environment
    • Energy
    • Life Science
    • Material Science
    • R&D Management
    • Physics
  • Technology
    • 3D Printing
    • A.I./Robotics
    • Software
    • Battery Technology
    • Controlled Environments
      • Cleanrooms
      • Graphene
      • Lasers
      • Regulations/Standards
      • Sensors
    • Imaging
    • Nanotechnology
    • Scientific Computing
      • Big Data
      • HPC/Supercomputing
      • Informatics
      • Security
    • Semiconductors
  • R&D Market Pulse
  • R&D 100
    • Call for Nominations: The 2025 R&D 100 Awards
    • R&D 100 Awards Event
    • R&D 100 Submissions
    • Winner Archive
    • Explore the 2024 R&D 100 award winners and finalists
  • Resources
    • Research Reports
    • Digital Issues
    • R&D Index
    • Subscribe
    • Video
    • Webinars
  • Global Funding Forecast
  • Top Labs
  • Advertise
  • SUBSCRIBE