New OpenCL driver and Visual Profiler released

Yesterday, we released new OpenCL tools, drivers, and code samples for registered developers.

The big news is our OpenCL performance profiler for the GPU. We also released updated OpenCL drivers (now packaged with all the other drivers instead of being inside the SDK) and several new SDK code samples to help developers using OpenCL.

The OpenCL Visual Profiler will be included in the next release of the CUDA Toolkit.

The OpenCL Visual Profiler uses the extensive performance instrumentation in NVIDIA’s OpenCL drivers and hardware performance signals designed into NVIDIA GPUs to provide developers with insight into performance bottlenecks and opportunities for optimization. Key features include:

    Profiling of actual hardware signals, kernel efficiency, and instruction issue rate

    Timing of memory copies between system memory and device memory

    Customizable graphs to help developers focus on problem areas

    Basic auto-analysis to reveal warp serialization problems

    Easy import/export to CSV for custom analysis
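
As a rough illustration of the kind of custom analysis a CSV export makes possible, here is a minimal Python sketch that sums GPU time per method and estimates how much of the run went to memory copies. The column names (`method`, `gpu_time_us`), the `memcpy` name prefix, and the sample rows are assumptions made up for this example, not the profiler's documented export format; adjust them to match whatever your export actually contains.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: the column names ("method", "gpu_time_us") and the
# rows below are invented for illustration, not the profiler's real format.
SAMPLE_CSV = """method,gpu_time_us
memcpyHtoD,120.5
vectorAdd,40.0
vectorAdd,42.0
memcpyDtoH,118.0
"""


def total_time_by_method(csv_text):
    """Sum the GPU time per method name, in microseconds."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["method"]] += float(row["gpu_time_us"])
    return dict(totals)


def copy_fraction(totals, prefix="memcpy"):
    """Fraction of total time spent in methods whose name starts with prefix."""
    total = sum(totals.values())
    copies = sum(t for name, t in totals.items() if name.startswith(prefix))
    return copies / total if total else 0.0


if __name__ == "__main__":
    totals = total_time_by_method(SAMPLE_CSV)
    # List the most expensive methods first.
    for name, t in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:12s} {t:8.1f} us")
    print(f"time in copies: {copy_fraction(totals):.1%}")
```

To run it against a real export, read the file's text and pass it to `total_time_by_method` in place of `SAMPLE_CSV`. A high copy fraction suggests that transfers between system memory and device memory, rather than kernel execution, are the first place to look.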

Support for multi-GPU performance scaling has been added to most of the SDK code samples for OpenCL, including:






We also added a few DirectCompute samples, if you’re interested in that sort of thing.

The drivers and SDK code samples in this release are compatible with the publicly available CUDA Toolkit 2.3, which is available at

Finally, we also released our OpenCL Best Practices Guide, designed to help developers using OpenCL on the CUDA architecture implement high performance parallel algorithms and understand best practices for GPU Computing. Chapters on the following topics and more are included in the guide:

    Heterogeneous Computing with OpenCL

    Performance Metrics

    Memory Optimizations

    NDRange Optimizations

    Instruction Optimizations

    Control Flow

    Performance Optimization Strategies

The OpenCL Best Practices Guide will also be included in the next release of the CUDA Toolkit, but you can get a copy now at