Research & Development World

Meta’s Sapiens vision models bring 3D analysis of humans to the “wild”

By Brian Buntz | August 27, 2024

Sapiens demo [Meta]

Facebook parent Meta AI has unveiled Sapiens, a family of high-performance vision models designed to excel in “in-the-wild” environments, overcoming the limitations of traditional models often confined to controlled studio settings. The family focuses on “four fundamental human-centric vision tasks,” as the arXiv paper on the technology notes: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Meta notes that the models offer both adaptability and robust performance; its engineers highlight that they are “extremely easy to adapt for individual tasks by simply fine-tuning models pre-trained on over 300 million in-the-wild human images.”

Tapping the transformer architecture

[Figure: Overview of the number of humans per image in the Humans-300M dataset]

Meta emphasizes the models’ ability to generalize to real-world scenarios, stating that they perform strongly on real-world data and function well even when labeled data is “scarce or entirely synthetic.” To accomplish that feat, the models combine a large-scale, curated training dataset with a scalable architecture based on vision transformers. Interest in transformers has exploded since about 2018 — especially for natural language processing tasks, but also in models such as Google DeepMind’s AlphaFold 2 for protein structure prediction. Computer vision applications are hot, too.

Before Sapiens, Meta had gathered significant experience with transformer architectures, having developed models like Data-efficient Image Transformers (DeiT) in 2021 and DETR (DEtection TRansformer), an object detection framework. In Sapiens, transformers’ attention mechanisms allow the various models to weigh the importance of different parts of the input image and dynamically focus on the most relevant features. Such capabilities allow the models to accurately infer human pose, segmentation, depth, and surface normals across various scenarios, from simple poses to complex interactions in cluttered environments.
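The attention mechanism described above can be sketched in a few lines. What follows is a minimal, framework-free illustration of scaled dot-product self-attention — the generic transformer building block, not Meta’s actual Sapiens implementation: each image patch’s query is compared against every other patch’s key, and the resulting softmax weights determine which patches contribute most to that patch’s output.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query row is compared to every key row; the softmax of the
    scaled scores is used to take a weighted mix of the value rows."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # patch-to-patch relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V, weights

# Toy example: 4 image-patch embeddings of dimension 8.
rng = np.random.default_rng(0)
patches = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(patches, patches, patches)  # self-attention: Q=K=V
```

Because the weights are data-dependent, the model can “focus” differently on every input — the dynamic behavior the article refers to.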

Sapiens models also use multi-headed self-attention to process high-resolution images, allowing them to discern subtle variations in human anatomy.
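The multi-head idea can be illustrated with a toy sketch (the tiny dimensions and the absence of learned projection matrices are simplifications for clarity — this is not Sapiens’ actual code): the embedding is split into per-head subspaces, each subspace attends independently, and the per-head results are concatenated, letting different heads track different cues in the image.

```python
import numpy as np

def multi_head_self_attention(x, num_heads):
    """Split the embedding dimension into `num_heads` slices, run
    self-attention in each slice independently, then concatenate.
    (Real models also apply learned Q/K/V and output projections.)"""
    n, d = x.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = x[:, h * d_h:(h + 1) * d_h]        # per-head subspace
        scores = q @ k.T / np.sqrt(d_h)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)             # row-wise softmax
        heads.append(w @ v)
    return np.concatenate(heads, axis=-1)              # back to (n, d)

tokens = np.arange(24, dtype=float).reshape(6, 4) / 10.0  # 6 patches, d=4
result = multi_head_self_attention(tokens, num_heads=2)
```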

Native support for 1K inference

Sapiens models “natively support 1K high-resolution inference,” and their performance improves as parameters are scaled. As the paper notes, “model performance across tasks improves as we scale the number of parameters from 0.3 to 2 billion.” The results are impressive, with Meta reporting that “Sapiens consistently surpasses existing baselines across various human-centric benchmarks. We achieve significant improvements over the prior state-of-the-art on Humans-5K (pose) by 7.6 mAP, Humans-2K (part-seg) by 17.1 mIoU, Hi4D (depth) by 22.4% relative RMSE, and THuman2 (normal) by 53.5% relative angular error.”
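To unpack what a figure like the 22.4% relative RMSE improvement means (the baseline error value below is hypothetical; only the percentage comes from the paper), the new error is simply the baseline error scaled down by the quoted fraction:

```python
# Illustration of a "22.4% relative RMSE improvement".
baseline_rmse = 0.50               # hypothetical prior state-of-the-art error
relative_improvement = 0.224       # figure quoted for Hi4D (depth)
sapiens_rmse = baseline_rmse * (1 - relative_improvement)  # 0.388
```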

Sapiens models are designed for four human tasks: 2D pose estimation, body-part segmentation, depth prediction, and normal prediction. [From Meta’s arXiv paper]

The arXiv paper notes that the goal of Sapiens is to offer a unified framework and models to “unlock a wide range of human-centric applications for everybody.” The core focus is on 3D human digitization, which “remains a pivotal goal in computer vision.” In the long run, Meta envisions that the model family could serve as a tool for acquiring large-scale, real-world supervision with humans in the loop to develop future generations of human vision models.

Potential applications are diverse

Sapiens could have an array of potential uses. In the entertainment industry, the high-fidelity pose estimation and body-part segmentation could facilitate motion capture for films and video games, enabling more realistic CGI-based character animations. Additionally, the detailed facial keypoint detection (243 points) could enhance facial expression analysis for applications in human-computer interaction or emotion recognition systems. In augmented and virtual reality, Sapiens’ depth estimation and surface normal prediction could improve the integration of virtual objects into real environments.

Outside of entertainment, the models’ ability to generalize to in-the-wild scenarios points to potential applications in surveillance and security, such as crowd behavior analysis or anomaly detection in public spaces. Sapiens could also find use in advanced driver-assistance systems, where more reliable pedestrian pose and depth estimation could help avoid collisions. Finally, in healthcare, the precise body pose and depth estimation capabilities could be useful for gait analysis, physical therapy monitoring, or ergonomics assessments.

Over the course of the year, Meta has announced a string of new AI models even as its Reality Labs division slims down.

The Sapiens models are available to download for free on GitHub.
