NVIDIA CUDA 4.0 Toolkit
The NVIDIA CUDA 4.0 Toolkit was designed for developing parallel applications on NVIDIA GPUs. Its main features include the following. NVIDIA GPUDirect 2.0 technology supports peer-to-peer communication among GPUs within a single server or workstation, which makes multi-GPU programming easier and can improve application performance. Unified Virtual Addressing (UVA) provides a single merged address space for main system memory and the GPU memories, which simplifies parallel programming. The Thrust library of C++ template performance primitives provides a collection of open source C++ parallel algorithms and data structures that ease programming for C++ developers. With Thrust, routines such as parallel sorting are reported to run 5X to 100X faster than with the Standard Template Library (STL) and Threading Building Blocks (TBB).