Special Chips for Specific Purposes: A look at acceleration technologies for scientific computing

By R&D Editors | May 31, 2007

Rediscovering the advantages of the hybrid approach

The analogy between hybrid cars and hybrid computing is almost perfect: combine two kinds of engines and use each in the situations for which it performs best. The result is energy savings, or higher performance for the same energy cost. For both cars and computers, energy efficiency is becoming the primary issue. For hybrid computing, the two kinds of engines are a general-purpose processor (like those made by Intel and AMD) and a coprocessor specifically crafted for certain workloads. The workloads might be network traffic handling, graphics display, or high-performance, high-accuracy scientific calculations.

Figure 1: AMBER is an application that studies the dynamic behavior of macromolecules; generalized Born (GB) methods 1, 2, and 6 account for 95 to 97 percent of its calculation time. The chart compares the speed-up of running GB 1, 2, and 6 on the host system versus on the ClearSpeed Advance X620 accelerator, a coprocessor developed for scientific computing.

The idea of adding special-purpose processors to accelerate general-purpose hosts is generating considerable interest and excitement in scientific computing. The key is to understand what each accelerator technology can and cannot do. Properly applied, accelerators may get us to petascale scientific computing much sooner than massive collections of unaccelerated general-purpose servers can.

Visible and invisible coprocessors

Computers have long used coprocessors in ways that we do not think about, because they function automatically in dedicated ways. Think of a disk drive controller, or the processor that controls a printer, or even the one that manages the keyboard. Graphics cards handle the task of turning geometric primitives into screen images. They have more processing power (and more transistors) than the largest mainframes of the 1970s, yet we do not think about them as coprocessors. These are invisible coprocessors.

Accelerators for scientific computing are visible coprocessors, at least for now. If you use a field-programmable gate array (FPGA) or a general-purpose graphics processing unit (GPGPU) to handle part of a technical computing job, you will be using a programming environment to specify what it does and how it coordinates with its host processor. As software environments mature, scientific coprocessors may become more like the invisible kind by intercepting dynamically linked library calls to math-intensive functions and simply accelerating them. This approach is already working well for some technical applications.

Since coprocessors for scientific applications are still visible to the application programmer, the issue is to fit the right hardware to the job. All accelerators are good…for their intended purpose.
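To make the library-interception idea concrete, here is a minimal sketch in C, assuming only the common Fortran-style dgemm_ binding that most BLAS libraries export. The application simply calls the math library; whether that call runs on the host or is redirected to an accelerated drop-in replacement is decided at link or load time, not in the application source. The redirection mechanism and the 2x2 data are illustrative assumptions, not a description of any particular vendor's product.

/* Host code that leaves acceleration to the library. The program calls
 * the standard BLAS routine dgemm_ (Fortran binding, column-major).
 * Linking or preloading an accelerated BLAS would speed this call
 * without changing a line of application code. */
#include <stdio.h>

/* Common Fortran-style BLAS prototype (all arguments passed by reference). */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(void)
{
    /* Column-major 2x2 matrices: A = [1 3; 2 4], B = [5 7; 6 8]. */
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    double b[4] = {5.0, 6.0, 7.0, 8.0};
    double c[4] = {0.0, 0.0, 0.0, 0.0};
    int    n = 2;
    double one = 1.0, zero = 0.0;

    /* C = A * B; the library underneath decides where the work runs. */
    dgemm_("N", "N", &n, &n, &n, &one, a, &n, b, &n, &zero, c, &n);

    printf("C = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]);
    return 0;
}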

Power-efficient design

What is it about scientific computing that we can exploit to increase speed and save power? Sometimes we see studies at the machine instruction level that show business applications and scientific applications looking very much alike in their requirements, but that is a fallacy. The apparent similarity is merely the result of forcing both applications to run on identical hardware. In addition, the significant differences in how each application stresses the memory are difficult to glean from a simple survey of the op codes. Scientific algorithms are profoundly different from the kind of algorithms used for, say, compiling C++ code or maintaining databases.

Scientific computing tends to have memory access patterns that are static and known at compile time. In solving equations or computing forces or iterating to converge a result, the memory access pattern usually does not depend on the actual values very much. That means we can minimize the control logic that takes up a very large fraction of the processing hardware. That saves power and space. Techniques like “scout threads” and “speculative execution” have been powerful ways to keep the old serial-thread programs improving for a few more years, but all that guessing uses a lot of wattage, especially when the guesses turn out to be wrong.
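As a small illustration of that difference, the sketch below contrasts a unit-stride stencil sweep, whose entire address sequence is known before the loop starts, with a pointer-chasing loop whose next address is known only after each load completes. Both routines are hypothetical examples written in plain C for clarity.

/* The stencil sweep's addresses depend only on the loop index, so the
 * full access pattern is known before the loop runs; the linked-list
 * walk's next address is not known until each node's data is loaded. */
#include <stddef.h>

#define N 1024

/* Static, compile-time-predictable access: unit-stride 3-point stencil. */
void stencil(const double *in, double *out)
{
    for (size_t i = 1; i < N - 1; ++i)
        out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];
}

/* Data-dependent access: each address comes from the previous load,
 * the shape typical of databases and compilers rather than solvers. */
struct node { double value; struct node *next; };

double chase(const struct node *p)
{
    double sum = 0.0;
    while (p) { sum += p->value; p = p->next; }
    return sum;
}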

Another thing we can save on is automatic caching. Scientific computing certainly needs local memory that is faster than main memory, but we can manage it explicitly, since we know what has to go where. Automatic caching brings in an entire cache line when an array access pattern might repeatedly need only one word in that line, so it causes more harm than good to scientific code performance. Dragging all those bytes to and from the main memory wastes electricity, even if the architecture fully supports the wide transfers.

Figure 2: The ClearSpeed Advance e620 accelerator
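The sketch below shows the general idea of software-managed local memory; the tile size and the simple scale-by-alpha kernel are chosen purely for illustration, and this is generic C blocking rather than the programming model of any particular accelerator. The program stages exactly the words it needs into a small local buffer, operates on them, and writes them back, instead of letting a cache pull in whole lines speculatively.

/* Explicitly managed local memory: the programmer stages exactly the
 * words the kernel needs into a small buffer, rather than relying on
 * a cache to fetch whole lines. Buffer size is an assumed example. */
#include <stddef.h>
#include <string.h>

#define TILE 256            /* words staged per transfer (illustrative) */

void scale_tiled(double *data, size_t n, double alpha)
{
    double local[TILE];     /* stands in for an on-chip local store */

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        memcpy(local, data + base, len * sizeof(double));    /* stage in  */
        for (size_t i = 0; i < len; ++i)                     /* compute   */
            local[i] *= alpha;
        memcpy(data + base, local, len * sizeof(double));    /* stage out */
    }
}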

It becomes really apparent how different general-purpose computing is from scientific computing when we consider multi-core architectures. For general computing, we need designs that share memory with guaranteed cache coherency and that allow each core to run instruction streams independently of the others. In contrast, for scientific computing, you can usually distribute the data structures in a planned way, so the hardware can be much simpler and more power-efficient. Furthermore, single-instruction, multiple-data (SIMD) control works very well for the kind of array constructs common in scientific programs, at least up to a few dozen processors. Having to manage separate instruction streams adds to the application programmer's burden, consumes a great deal of space on the chip, and burns more electric power.
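A short example may help here. In the loop below, every iteration performs the identical operation on a different array element, which is exactly the shape SIMD control exploits. The OpenMP simd directive is just one generic, widely supported way to state that to a compiler; it is an illustrative choice, not tied to any particular coprocessor.

/* One instruction stream, many data elements: an axpy loop in which
 * each iteration applies the same operation to different data. */
#include <stddef.h>

void axpy(size_t n, double a, const double *x, double *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}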

Specialized graphics processors for scientific jobs

Graphics processors (GPUs) are even more restricted in the types of workloads they process and, hence, can specialize even more than scientific coprocessors. For instance, the floating-point operations they perform require only single precision and can get away with the crudest form of rounding (truncation toward zero) to save space on the chip. The streaming nature of graphics operations also allows for a design in which data does not stay around for very long, so GPUs need little memory and not much sophistication in how they access it.
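The precision point is easy to see in a few lines of C. The toy program below accumulates one million terms in single and in double precision; the single-precision sum drifts visibly while the double-precision sum stays close to the exact answer. It illustrates word length only, not the truncate-toward-zero rounding mode mentioned above.

/* Accumulating one million terms: float loses several digits that
 * double retains. */
#include <stdio.h>

int main(void)
{
    float  sf = 0.0f;
    double sd = 0.0;

    for (int i = 0; i < 1000000; ++i) {
        sf += 0.1f;
        sd += 0.1;
    }

    printf("single: %f\n", sf);   /* noticeably off from 100000 */
    printf("double: %f\n", sd);   /* accurate to many more digits */
    return 0;
}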

Just as it would be a mistake to try to use a scientific coprocessor to run standalone Linux or to process real-time transactions, it would be a mistake to use coprocessors designed for graphics as engines for physical simulations and serious engineering tasks. Some graphics coprocessor vendors are talking about having it both ways, adding more control and precision to their GPUs so that they can play in both the graphics and scientific markets (GPGPUs). But it’s the usual story: doing so adds cost and power consumption for features that their main customer base doesn’t need and certainly doesn’t want to pay extra to support.

Summary

The trend is toward special chips for specific purposes. We used to take this approach to save transistors. As transistors became less expensive, we instead combined all functions into a single, general-purpose commodity processor. Today, the limit on scientific computing capability is coming from a very different direction: power consumption and heat dissipation. Thus, we are rediscovering the advantages of the hybrid approach where systems contain optional coprocessors that let us make the best possible use of every kilowatt-hour on our power bill.

Coprocessors for scientific computing are of the visible type now. However, as they evolve, we will take them for granted as accelerators that run automatically in the background, speeding fundamental kernel operations just as graphics processors speed fundamental video processing tasks today.

John Gustafson is Chief Technology Officer for HPC at ClearSpeed Technology. He may be reached at [email protected].
