Links to CUDA development tools

Please refer to this page for a reasonably comprehensive list of development tools, libraries, and plugins for GPU computing on CUDA-enabled GPUs:

If we missed something, please post it on this thread.

I think it would be useful to link to MisterAnderson42’s GPUWorker class, which spawns host threads to make multi-GPU programs easier to manage. Unfortunately, it seems to be referenced only in the forum and inside the HOOMD source code.

Perhaps he could be persuaded to make a homepage for GPUWorker, and then you could link to that.
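
For anyone unfamiliar with the idea, here is a rough sketch of the one-host-thread-per-GPU pattern described above. It is not GPUWorker's actual interface: `DeviceWorker`, `call_async`, `scale_chunk`, and `run_on_devices` are hypothetical names, and the "kernel" is a plain Python function standing in for real device work. It only shows how calls can be funnelled to a dedicated thread per device.

```python
import queue
import threading
from concurrent.futures import Future


class DeviceWorker:
    """One host thread that owns one (hypothetical) device and runs calls in order."""

    def __init__(self, device_id):
        self.device_id = device_id
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # A real worker would bind its GPU context here, once, on this thread.
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: shut down
                break
            future, fn, args = item
            try:
                future.set_result(fn(self.device_id, *args))
            except Exception as exc:  # hand the error back to the caller
                future.set_exception(exc)

    def call_async(self, fn, *args):
        """Queue fn(device_id, *args) and return a Future immediately."""
        future = Future()
        self._tasks.put((future, fn, args))
        return future

    def call(self, fn, *args):
        """Queue fn and block until it has run on the worker thread."""
        return self.call_async(fn, *args).result()

    def close(self):
        self._tasks.put(None)
        self._thread.join()


def scale_chunk(device_id, values, factor):
    # Stand-in for "copy to device, launch kernel, copy back".
    return [v * factor for v in values]


def run_on_devices(data, factor, num_devices=2):
    # Split the data round-robin and let each worker process its share.
    workers = [DeviceWorker(i) for i in range(num_devices)]
    chunks = [data[i::num_devices] for i in range(num_devices)]
    futures = [w.call_async(scale_chunk, c, factor) for w, c in zip(workers, chunks)]
    results = [f.result() for f in futures]
    for w in workers:
        w.close()
    return results
```

Because every call for a device goes through that device's own thread, the main thread never has to switch contexts itself; it just queues work and collects the futures.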

Komrade: a pretty neat C++ library for CUDA with a very silly name

There is this debugging tool from the University of Oxford for viewing and comparing the contents of host and device memory: …uting/memviewer

[quote name=‘worc1154’ post=‘548755’ date=‘Jun 4 2009, 08:19 AM’]

There is this debugging tool from the University of Oxford for viewing and comparing the contents of host and device memory.

[/quote]

I couldn’t get the link provided to work.

Here is an updated URL for the MemViewer tool: …uting/memviewer

Ocelot is an alternative to deviceemu. It executes CUDA programs one instruction at a time, as they would run on a GPU with a very large warp size.

It has built-in memory checking that will detect if you use a host pointer in device code or write to memory that has not been allocated.

AgPerfMon is a tool mostly for PhysX and graphics programmers, but it also reveals some low-level CUDA kernel scheduling. It records timestamps, SM IDs, and warp IDs of running kernels and shows them on a timeline.

A plugin for Eclipse for CUDA and/or Qt development and compilation: …udaqt/index.php

There are a few more that you should add:

  • Full support for .Net (full CUDA driver API access and more) (C# and Visual Basic Examples)

  • Full support for Perl (full CUDA driver API access and more; see below)

  • Full support for Python (full CUDA driver API access and more; see below)

  • Full access for Ruby to run CUDA via the CUDA driver API

  • Full access for Lua to run CUDA via the CUDA driver API

  • Source code for all Kappa library language bindings and keywords is available using the Kappa library installers.

Performance is usually comparable to C++ because this is a high-level interface: most CUDA API operations, such as memory management and transfer, are performed by the Kappa C++ library. (Performance can be better than any single CUDA C/C++ SDK example, since all CUDA best practices, memory mapping, and concurrent kernel execution are the defaults where the GPU hardware supports them.) Full multi-GPU support and CUDA JIT are available for all language bindings.

Since the Kappa library uses a producer/consumer data flow scheduler, defaults to asynchronous CUDA kernel launches, and supports asynchronous CPU kernel and SQL operations, it can achieve full occupancy of CPU and GPU. Kernels are launched in such a way that, on GF100 GPUs, concurrent kernel execution is automatic and the usual mode, provided the GPU has occupancy available for that mixture of kernels. Whether particular CUDA kernels actually execute concurrently is therefore a (potentially nondeterministic) result of the run-time dynamics of host and GPU code; performance should always meet or exceed what is otherwise available.
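
To illustrate what a producer/consumer data flow scheduler means in general, here is a much-simplified sketch in plain Python. It does not use or reproduce the Kappa API: `run_dataflow` and the step tuples are hypothetical, the "kernels" are ordinary functions, and the scheduling is done in waves on a thread pool rather than with true asynchronous launches. The point is only that a step runs once everything it consumes has been produced, and that independent steps are free to overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(steps, initial, max_workers=4):
    """Run each step once all of its named inputs have been produced.

    steps:   list of (output_name, function, input_names) tuples
    initial: dict of values available before any step runs
    """
    values = dict(initial)
    pending = list(steps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready, waiting = [], []
            for step in pending:
                inputs_known = all(name in values for name in step[2])
                (ready if inputs_known else waiting).append(step)
            if not ready:
                missing = [step[0] for step in waiting]
                raise ValueError("inputs never produced for: %r" % missing)
            # Submit every ready step before waiting on any result, so that
            # independent steps can overlap on the pool's threads.
            futures = [
                (out, pool.submit(fn, *[values[name] for name in names]))
                for out, fn, names in ready
            ]
            for out, future in futures:
                values[out] = future.result()
            pending = waiting
    return values


def add(x, y):
    return x + y


def double(x):
    return 2 * x


# "total" consumes what "sum" and "twice" produce; those two are independent.
steps = [
    ("total", add, ["sum", "twice"]),
    ("sum", add, ["a", "b"]),
    ("twice", double, ["a"]),
]
result = run_dataflow(steps, {"a": 1, "b": 2})
```

The order in which the steps are listed does not matter here; only the declared inputs and outputs decide when each one runs.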

For .Net, you can create .Net subclass instances to tie to the Kappa IO keyword and to receive exception notifications. These subclasses execute on the host thread associated with the GPU context so that the full CUDA API is accessible for that GPU context.

For the Perl and Python bindings mentioned above, developers can use a mixture of CUDA C++ running on the GPU, and C++ (including OpenMP), Perl, or Python running on the host as a single integrated processing task.

Additional language bindings (untested, no examples) are available for invoking CUDA via the Kappa library from: Java, R, PHP, Octave/Matlab, TCL, allegrocl, chicken, guile, mzscheme, ocaml, and pike.

The Kappa library is commercial, but the .Net, Perl, Python, Lua, Ruby, etc. modules/packages, examples, and keyword source code are available under the MIT License.

CUVI Lib v0.3 (Beta version) is a new library from TunaCode. You can download a copy from:

CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions not presently available in NPP.

In the current release of CUVI Lib you will find:

  • Optical Flow (Horn & Schunck)
  • Optical Flow (Lucas & Kanade)
  • Discrete Wavelet Transform (Forward and Inverse)
  • Hough Transform
  • Hough Lines (Lines Detector)
  • Color Conversion (RGB-to-Gray and RGBA-to-Gray)
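
As a point of reference for the simplest item on that list, here is a small CPU sketch of RGB/RGBA-to-gray conversion in Python. It is not CUVI Lib code and says nothing about how CUVI implements it: the function name `rgb_to_gray` is hypothetical, and the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are an assumption, since the weights CUVI uses are not stated here.

```python
def rgb_to_gray(pixels):
    """Convert a list of (r, g, b) or (r, g, b, a) 8-bit tuples to gray levels."""
    gray = []
    for p in pixels:
        r, g, b = p[0], p[1], p[2]  # any alpha channel is ignored
        # Assumed BT.601 luma weights; other conventions use different values.
        y = 0.299 * r + 0.587 * g + 0.114 * b
        gray.append(min(255, int(y + 0.5)))  # round to nearest, clamp to 8 bits
    return gray
```

A GPU version would compute the same weighted sum with one thread per pixel, which is why this kind of conversion maps so easily onto CUDA.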

Several more advanced features will be added to CUVI Lib in upcoming releases. A detailed function reference can be downloaded from:

We look forward to hearing your feedback and guidance on our forums, and to making CUVI Lib a single complete source of computer vision and image processing functions implemented on the GPU.

How does the binding work on it?

Links to the CUDA 32-bit and 64-bit toolkits do not work: the result is a nearly blank page with a File Not Found message.

An open source project, SGC Ruby CUDA, is made available at and the Ruby standard Gems repository.

It provides access to the CUDA API from Ruby programs.

CUDA Eclipse plugin:
Yellow Dog Linux, tailored for CUDA development: