CUDA C++ unit testing in host and device code

Is there a unit testing framework (or a plugin to a framework) that supports testing of objects in host and device code? Or are there plans for such a framework?

For example, I have a matrix class with methods such as calculating the determinant or the trace. It would be nice to write a test once and have it run automatically both on the CPU and inside a CUDA kernel.
In this example I want to write a test like this:

Matrix a;


assert_eq( theDeterminant, a.det() );

In this simple case, a unit testing framework could easily set up the code for a kernel that runs the test.

Of course, I can write such code myself, but my question is whether there is already a standard way to do this.

There are two options I’ve used:

  1. Declare the __device__ functions also as __host__ functions (you can put both qualifiers on a single function). Then you can unit test your device functions on the host. While that verifies the logic at the C level using the host compiler, it doesn’t actually check the behavior of the function on the GPU, and it definitely doesn’t work for device functions that contain CUDA-specific calls (like __syncthreads() or math intrinsics).

  2. Write a test harness with a __global__ function that wraps around each device function. Then you can call the functions from the host (usually with just one thread) and, again, write your tests in host code.
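A minimal sketch of both options, assuming a hypothetical 2x2 `Matrix2` as a stand-in for the matrix class from the question. The `HD` macro and the helper names (`det_matches_on_host`, `det_matches_on_device`, `det_kernel`, `run_det_tests`) are made up for illustration. The file is meant to build either with a plain host compiler (host check only) or with nvcc (host and device check):

```cpp
#include <cassert>
#include <cmath>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#define HD __host__ __device__   // option 1: both qualifiers on one function
#else
#define HD                       // plain C++ build: qualifiers vanish
#endif

// Hypothetical stand-in for the Matrix class from the question.
struct Matrix2 {
    double a, b, c, d;           // row-major: [a b; c d]
    HD double det() const { return a * d - b * c; }
    HD double trace() const { return a + d; }
};

// Option 1: call the __host__ __device__ method directly on the host.
inline bool det_matches_on_host(const Matrix2& m, double expected) {
    return std::fabs(m.det() - expected) < 1e-12;
}

#ifdef __CUDACC__
// Option 2: a thin __global__ wrapper around the device function.
__global__ void det_kernel(Matrix2 m, double* out) { *out = m.det(); }

// Launches the wrapper with a single thread and checks the result on the host.
inline bool det_matches_on_device(const Matrix2& m, double expected) {
    double* d_out = nullptr;
    double h_out = 0.0;
    if (cudaMalloc(&d_out, sizeof(double)) != cudaSuccess) return false;
    det_kernel<<<1, 1>>>(m, d_out);
    cudaError_t err = cudaGetLastError();   // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess && std::fabs(h_out - expected) < 1e-12;
}
#endif

// The tests themselves stay in host code in both options.
inline void run_det_tests() {
    Matrix2 m{1.0, 2.0, 3.0, 4.0};
    assert(det_matches_on_host(m, -2.0));
#ifdef __CUDACC__
    assert(det_matches_on_device(m, -2.0));
#endif
}
```

Calling `run_det_tests()` from the `main` of a test binary, or from a test case in whatever host-side framework you already use, would run the checks. Note that the host-only build exercises just option 1, so it says nothing about behavior on the GPU.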

I haven’t seen anything more streamlined, unfortunately.

  1. That is what I have done up to now. But I want to go a step further and make sure that I do not use C++ techniques that are (currently) not supported in device code (or, even simpler, that I have not forgotten to put a __device__ somewhere).

  2. I started doing this, but it means
    a) having an extra test plus some additional setup work for the __global__ function.
    b) writing an abstract test environment. That’s what I did not want to do…

What I do now is something between a) and b): I set up a test function that is executed on host and device and writes its output into a temporary object (along with an isEqual() function). This idea is borrowed from the CUDA branch of the Eigen library.
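Roughly, that approach could look like the sketch below. This is not Eigen's actual test code. `Matrix2`, `TestOutput`, `matrix_test_body`, `matrix_test_kernel`, `run_matrix_test` and the `HD` macro are hypothetical names chosen for illustration, and the device part is only compiled when building with nvcc:

```cpp
#include <cassert>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#define HD __host__ __device__
#else
#define HD   // plain C++ build: host run only
#endif

// Hypothetical stand-in for the matrix class under test.
struct Matrix2 {
    double a, b, c, d;
    HD double det() const { return a * d - b * c; }
    HD double trace() const { return a + d; }
};

// Temporary object that collects everything the test body computes.
struct TestOutput {
    double det;
    double trace;
    HD bool isEqual(const TestOutput& other, double tol = 1e-12) const {
        double dd = det - other.det;
        double dt = trace - other.trace;
        if (dd < 0) dd = -dd;
        if (dt < 0) dt = -dt;
        return dd <= tol && dt <= tol;
    }
};

// The test body is written once and compiled for host and device.
HD inline TestOutput matrix_test_body(const Matrix2& m) {
    TestOutput out;
    out.det = m.det();
    out.trace = m.trace();
    return out;
}

#ifdef __CUDACC__
// Runs the same body in a kernel and stores the output in device memory.
__global__ void matrix_test_kernel(Matrix2 m, TestOutput* out) {
    *out = matrix_test_body(m);
}
#endif

// Runs the body on the host and, under nvcc, also on the device.
// Returns true only if every run matches the expected output.
inline bool run_matrix_test(const Matrix2& m, const TestOutput& expected) {
    bool ok = matrix_test_body(m).isEqual(expected);
#ifdef __CUDACC__
    TestOutput* d_out = nullptr;
    TestOutput h_out = {0.0, 0.0};
    if (cudaMalloc(&d_out, sizeof(TestOutput)) != cudaSuccess) return false;
    matrix_test_kernel<<<1, 1>>>(m, d_out);
    cudaError_t err = cudaGetLastError();   // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_out, d_out, sizeof(TestOutput), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    ok = ok && err == cudaSuccess && h_out.isEqual(expected);
#endif
    return ok;
}

// Example test: the only per-test code is the input and the expected output.
inline void run_all_matrix_tests() {
    Matrix2 m{1.0, 2.0, 3.0, 4.0};
    TestOutput expected{-2.0, 5.0};
    assert(run_matrix_test(m, expected));
}
```

The extra setup mentioned above is visible here: each test still needs its own output struct, kernel wrapper and copy-back code, even though the test body itself is only a few lines.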

But I do not like this approach that much, because it still requires a lot of extra setup that is in principle not necessary.

I know it’s a few years later, but I was wondering this myself. Is there a test framework for CUDA?