NVIDIA CUDA 6.5 brings GPU-accelerated computing to 64-bit ARM platforms. The toolkit gives programmers a platform for developing advanced scientific, engineering, mobile and HPC applications on GPU-accelerated systems with ARM or x86 CPUs. New features include support for Microsoft Visual Studio 2013, cuFFT callbacks, and improved debugging for CUDA Fortran applications. Application Replay mode enables faster analysis of complex scenarios that use multiple hardware counters, and the CUDA Occupancy Calculator API frees programmers from having to manually configure kernel launches for each GPU architecture. The “nvprune” utility prunes object files and libraries so that they contain only the device code needed for the specified target architectures, reducing application size and improving load-time performance.
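
As a rough illustration of the occupancy API, the sketch below asks the runtime to suggest a block size for a kernel instead of hard-coding one per architecture. It uses `cudaOccupancyMaxPotentialBlockSize` from the CUDA runtime; the `scale` kernel, the `blocksNeeded` helper and the array size are hypothetical and chosen only for this example, so treat it as a minimal sketch rather than a recommended launch strategy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: scales an array in place.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host helper: blocks needed to cover n elements (ceiling division).
int blocksNeeded(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

int main() {
    const int n = 1 << 20;  // assumed problem size
    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Ask the runtime for a block size that maximizes occupancy for this
    // kernel on the current device (no dynamic shared memory, no size limit).
    int minGridSize = 0, blockSize = 0;
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &minGridSize, &blockSize, scale, 0, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "occupancy query failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    // Launch with the suggested block size and enough blocks to cover n.
    scale<<<blocksNeeded(n, blockSize), blockSize>>>(d_data, 2.0f, n);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    printf("suggested block size: %d, minimum grid size: %d\n",
           blockSize, minGridSize);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

The suggested values depend on the GPU and on the kernel's resource usage, so the same source can be rebuilt for a different architecture without retuning the launch configuration by hand.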