Research & Development World


Hugging Face integrates Groq, offering native high-speed inference for 10 major open weight models

By Brian Buntz | June 16, 2025

Groq, the AI accelerator company based in Mountain View, California, has announced that the open-source AI platform Hugging Face has integrated its Language Processing Unit (LPU) inference engine as a native provider. The integration gives Hugging Face's more than one million developers access to inference speeds exceeding 800 tokens per second across ten open-weight models, and requires just three lines of code to implement.

For years, Graphics Processing Units (GPUs) have dominated the space, driving advances like AlexNet, the Transformer architecture, and Generative Adversarial Networks (GANs). GPUs excel at training models by processing massive batches of data in parallel. Google diversified the landscape with its Tensor Processing Units (TPUs), custom chips tailored for AI workloads. Groq's LPU, however, is different by design. Instead of processing data in large batches, it is a new type of processor built specifically for the sequential nature of AI inference, which generates text or other outputs token by token. This specialized, streamlined architecture allows it to avoid the batching latency of GPUs, resulting in dramatically faster real-time inference.

This integration makes Groq’s high-speed inference directly accessible to developers using some of the industry’s most capable open-weight models, including:

  • meta-llama/Llama-3.3-70B-Instruct
  • google/gemma-2-9b-it
  • meta-llama/Llama-Guard-3-8B
  • meta-llama/Meta-Llama-3-70B-Instruct
  • meta-llama/Meta-Llama-3-8B-Instruct
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  • meta-llama/Llama-4-Scout-17B-16E-Instruct
  • meta-llama/Llama-4-Maverick-17B-128E-Instruct
  • Qwen/QwQ-32B
  • Qwen/Qwen3-32B

This Hugging Face integration marks Groq's third major platform partnership in as many months. In April, Groq became the exclusive inference provider for Meta's official Llama API, delivering speeds up to 625 tokens per second to enterprise customers. The following month, Bell Canada selected Groq as the sole provider for its sovereign AI network, a 500 MW initiative across six sites beginning with a 7 MW facility in Kamloops, in the Thompson-Nicola region of south-central British Columbia, roughly 350 kilometers northeast of Vancouver. With new data centers in Houston and Dallas pushing its global capacity past 20 million tokens per second, Groq has grown from 1.4 million to over 1.6 million developers since the Meta announcement.

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media