Groq, the AI accelerator company based in Mountain View, California, has announced that the open-source AI platform Hugging Face has integrated the company's Language Processing Unit (LPU) inference engine as a native provider. The integration gives Hugging Face's more than one million developers access to inference speeds exceeding 800 tokens per second across ten open-weight models, and takes just three lines of code to implement.
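Those three lines map roughly onto the sketch below (expanded slightly for readability). This is a minimal sketch, assuming a recent version of the `huggingface_hub` Python client, which exposes Hugging Face's Inference Providers through `InferenceClient`, and an `HF_TOKEN` set in the environment; the model and prompt here are illustrative.

```python
from huggingface_hub import InferenceClient

# Route the request through Groq's LPU endpoints rather than default inference.
client = InferenceClient(provider="groq")  # assumes HF_TOKEN is set in the environment
response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",  # one of the ten supported models
    messages=[{"role": "user", "content": "Explain LPU inference in one sentence."}],
)
print(response.choices[0].message.content)
```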
For years, Graphics Processing Units (GPUs) have dominated the space, driving advances like AlexNet, the Transformer architecture, and Generative Adversarial Networks (GANs). GPUs excel at training models by processing massive batches of data in parallel. Google diversified the landscape with its Tensor Processing Units (TPUs), custom chips tailored for AI workloads. Groq's LPU, however, is different by design. Instead of processing data in large batches, it is a new type of processor built specifically for the sequential nature of AI inference, which generates text or other outputs token by token. This specialized, streamlined architecture is what allows it to avoid the batching latency of GPUs, resulting in dramatically faster real-time inference.
This integration makes Groq’s high-speed inference directly accessible to developers using some of the industry’s most capable open-weight models, including:
- meta-llama/Llama-3.3-70B-Instruct
- google/gemma-2-9b-it
- meta-llama/Llama-Guard-3-8B
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- meta-llama/Llama-4-Scout-17B-16E-Instruct
- meta-llama/Llama-4-Maverick-17B-128E-Instruct
- Qwen/QwQ-32B
- Qwen/Qwen3-32B
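Streaming the same API makes the token-by-token generation described above visible as it happens. This is again a hedged sketch under the same assumptions (a recent `huggingface_hub` client with provider support and `HF_TOKEN` in the environment); the chunk fields follow the client's OpenAI-style streaming format, and the model pick is an illustrative choice from the list above.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="groq")  # assumes HF_TOKEN is set in the environment
stream = client.chat_completion(
    model="Qwen/Qwen3-32B",  # illustrative pick from the supported models
    messages=[{"role": "user", "content": "Write a haiku about fast inference."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # the final chunk may carry only metadata
        # Each chunk delivers the next piece of generated text as it is produced.
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```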
This Hugging Face integration marks Groq's third major platform partnership in as many months. In April, Groq became the exclusive inference provider for Meta's official Llama API, delivering speeds of up to 625 tokens per second to enterprise customers. The following month, Bell Canada selected Groq as the sole provider for its sovereign AI network, a 500 MW initiative across six sites beginning with a 7 MW facility in Kamloops, British Columbia, roughly 350 kilometers northeast of Vancouver. With new data centers in Houston and Dallas pushing its global capacity past 20 million tokens per second, Groq has grown from 1.4 million to over 1.6 million developers since the Meta announcement.