Dynamic parallelism problem

I have run into a strange problem with dynamic parallelism (DP). The following snippet currently fails:

// If __forceinline__ is added, the problem goes away
__device__
void initThread(int *s) {
    // This fails under cuda-memcheck in a debug build with the message
    // "Invalid __global__ write of size 4"
    // The reported address is >= 0x01000000
    *s = threadIdx.x;
}

__global__
void simulation() {
    __shared__ int pmseed[CHILD_THREADS];
    // This is ok
    pmseed[threadIdx.x] =  threadIdx.x;
    initThread(&pmseed[threadIdx.x]);
}

The kernel seems to work, but when it is run under cuda-memcheck the assignment “*s = …” reports an error.
This kernel is invoked from another kernel.
The problem seems to occur only in debug builds; release builds probably inline the function.

Any ideas what could be wrong?

A minimal Nsight project that reproduces the problem is at http://gowrite.net/error.zip
My environment is openSUSE 12.2 (64-bit), GTX Titan, CUDA SDK 5.5-22.

My first thought on looking at it is that you are passing around pointers to shared memory. If you change the shared memory declaration to global memory, does the problem still manifest?
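One way to try that is sketched below. It is only an illustration of the suggestion, not a confirmed fix: the value of CHILD_THREADS and the name pmseedGlobal are assumptions, since neither appears in the original post.

```cuda
#define CHILD_THREADS 32  // assumed value; the original post does not show it

// Hypothetical name: a file-scope buffer in global memory that replaces
// the __shared__ array from the original kernel.
__device__ int pmseedGlobal[CHILD_THREADS];

__device__
void initThread(int *s) {
    *s = threadIdx.x;
}

__global__
void simulation() {
    // Same access pattern as before, but the pointer passed to
    // initThread() now refers to global memory, not shared memory.
    pmseedGlobal[threadIdx.x] = threadIdx.x;
    initThread(&pmseedGlobal[threadIdx.x]);
}
```

If cuda-memcheck stays quiet with this variant in a debug build, that would point at how the shared memory pointer is handled in the non-inlined call.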

The calling kernel is the most basic one:

__global__
void simpleTree()
{
    simulation<<<1, CHILD_THREADS>>>();
}

So there should not be anything wrong with the shared memory access.

BTW, if I invoke simulation() directly from the host program, cuda-memcheck seems to be happy.
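For comparison, both launch paths can be put into one small test program, roughly like this. It is a sketch rather than the contents of the project above: the value of CHILD_THREADS, the command-line switch and the file name repro.cu are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHILD_THREADS 32  // assumed value; the original post does not show it

__device__
void initThread(int *s) {
    *s = threadIdx.x;
}

__global__
void simulation() {
    __shared__ int pmseed[CHILD_THREADS];
    pmseed[threadIdx.x] = threadIdx.x;
    initThread(&pmseed[threadIdx.x]);
}

// Parent kernel: launches simulation() from the device (dynamic parallelism)
__global__
void simpleTree() {
    simulation<<<1, CHILD_THREADS>>>();
}

int main(int argc, char **argv) {
    // Hypothetical switch: pass "direct" to launch simulation() from the host
    bool direct = (argc > 1 && argv[1][0] == 'd');

    if (direct) {
        simulation<<<1, CHILD_THREADS>>>();
    } else {
        simpleTree<<<1, 1>>>();
    }

    // Check for launch errors first, then wait for the kernel(s) to finish
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    printf("%s launch: %s\n", direct ? "direct" : "nested", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

A debug build with dynamic parallelism enabled would be something like nvcc -G -g -arch=sm_35 -rdc=true repro.cu -o repro -lcudadevrt, and the two cases are then cuda-memcheck ./repro and cuda-memcheck ./repro direct.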

  • Lauri

Hi lpaatero, could you post the versions of cuda-memcheck and the driver you are using? The cuda-memcheck version can be read out with cuda-memcheck --version. The driver version is the output of cat /proc/driver/nvidia/version. Could you also retry with the latest CUDA driver available?

Versions are:
CUDA-MEMCHECK version 5.5 (25)

NVRM version: NVIDIA UNIX x86_64 Kernel Module 319.37 Wed Jul 3 17:08:50 PDT 2013
GCC version: gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773] (SUSE Linux)

As far as I know, these are the most recent versions.

  • Lauri

The latest available driver for the GTX Titan on 64-bit Linux is 319.60 (http://www.geforce.com/drivers/results/67567). Could you try using this driver? If the problem persists, could you file a bug through the registered developer site with a self-contained application that reproduces the issue?