TotalView 8.9.2 with ReplayEngine 2.1 and MemoryScape 3.2.2 is a debugging and memory analysis product that simplifies development for data-intensive applications, with support for NVIDIA CUDA 4.0 and SDK 4.0. It supports developers who share GPUs across multiple threads and who use unified virtual addressing for faster multi-GPU programming. It provides intuitive control of GPU device kernel threads and a straightforward graphical display of CUDA exceptions, with a clear representation of GPU device memory types for all CUDA variables. The product can simultaneously debug parallel applications that use more than one GPU device per node across an entire cluster. It is designed for developer productivity, simplifying and shortening the process of developing, debugging, and optimizing complex code. The product combines capabilities for pinpointing and fixing hard-to-reproduce bugs, memory leaks, and performance issues related to parallel development.