Calling a __global__ function from a __global__ function


I am trying to call a __global__ kernel from another __global__ kernel, but I am getting the following error:

Error: calling a __global__ function("xyz") from a __global__ function("run10times") is only allowed on the compute_35 architecture or above

My machine configuration:
OS: Windows 10 64-bit
Processor: Intel Xeon
IDE: Visual Studio 2015

Question 1: Please confirm whether I will be able to implement “Dynamic Parallelism” on the above machine.

Question 2: Please provide a sample that I can run to test whether I am doing “Dynamic Parallelism” correctly.
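A minimal sketch of such a test, assuming a GPU of compute capability 3.5 or higher and a hypothetical file name dp_demo.cu. The kernel names xyz and run10times are borrowed from the error message, but their bodies and the helper runDemo are made up for illustration: the parent kernel launches the child kernel ten times from device code, and the host checks that every child wrote its launch index. Note that device-side launches need relocatable device code (-rdc=true) and the device runtime library (-lcudadevrt) in addition to the -arch flag.

```cuda
// dp_demo.cu (hypothetical file name): minimal dynamic parallelism sketch.
// Build, assuming compute capability 3.5+ (change sm_35 to match your GPU):
//   nvcc -arch=sm_35 -rdc=true -o dp_demo dp_demo.cu -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child kernel. Name taken from the error message; the body is hypothetical:
// each thread writes the index of the launch that ran it.
__global__ void xyz(int *out, int launch)
{
    int i = launch * blockDim.x + threadIdx.x;
    out[i] = launch;
}

// Parent kernel. One thread launches the child kernel 10 times from device
// code, which is exactly what dynamic parallelism allows.
__global__ void run10times(int *out, int threadsPerChild)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int launch = 0; launch < 10; ++launch) {
            xyz<<<1, threadsPerChild>>>(out, launch);
        }
    }
    // A parent grid does not complete until all of its child grids complete,
    // so the host-side cudaDeviceSynchronize() below also waits for them.
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int runDemo(int threadsPerChild)
{
    const int n = 10 * threadsPerChild;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_out, 0xFF, n * sizeof(int)); // every int starts as -1

    run10times<<<1, 1>>>(d_out, threadsPerChild);
    cudaError_t err = cudaGetLastError();                  // launch errors (e.g. wrong -arch)
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // waits for parent and children
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return -1;
    }

    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int wrong = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != i / threadsPerChild) ++wrong;
    return wrong;
}

int main()
{
    int wrong = runDemo(4);
    printf("%s\n", wrong == 0 ? "dynamic parallelism works" : "dynamic parallelism test failed");
    return wrong == 0 ? 0 : 1;
}
```

If this builds with the command in the header comment and prints the success message, device-side launches are working on that GPU.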


CUDA 8.0

Are you passing “-arch=compute_61” (or similar) flags to nvcc?
It looks like you are not, in which case the default is to compile for compute capability 2.0, which does not support dynamic parallelism.
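Whether dynamic parallelism is available depends on the GPU, not on the Windows version, the Xeon CPU, or Visual Studio: device-side kernel launches need compute capability 3.5 or higher. As a sketch (the file name query_cc.cu and the helper supportsDynamicParallelism are hypothetical), the runtime call cudaGetDeviceProperties reports each GPU's compute capability, and its major/minor fields map directly onto the sm_XY value used with -arch:

```cuda
// query_cc.cu (hypothetical file name): reports whether each GPU supports
// dynamic parallelism. Build: nvcc -o query_cc query_cc.cu
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: dynamic parallelism requires compute capability >= 3.5.
bool supportsDynamicParallelism(int major, int minor)
{
    return major > 3 || (major == 3 && minor >= 5);
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        printf("No CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        bool ok = supportsDynamicParallelism(prop.major, prop.minor);
        printf("Device %d: %s, compute capability %d.%d, dynamic parallelism %s\n",
               dev, prop.name, prop.major, prop.minor, ok ? "supported" : "NOT supported");
        // The matching nvcc flag for this GPU, e.g. -arch=sm_61 for 6.1.
        printf("  suggested flag: -arch=sm_%d%d\n", prop.major, prop.minor);
    }
    return 0;
}
```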

There are a couple of dynamic parallelism examples in the advanced section of the samples that come with CUDA.

How do I set “-arch=compute_61” in Visual Studio 2015?


Please use a Google search or look at any of the CUDA sample projects for such basic questions.

I had the exact same problem and found a good tutorial that worked for me here:

How can the architecture be set on Linux, where we compile from the command line?
I am using the command nvcc -o outputFile and I am getting the same error.

The answer is already in this thread. Take a look at comment #3. It’s also documented in the nvcc manual.

Thank you @Robert_Crovella
I didn’t know the exact command.
After googling a bit, I found this one: nvcc -arch=sm_35 -rdc=true -o simple1 -lcudadevrt
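As written, that command does not name a source file; nvcc needs the .cu file on the command line. A sketch of a complete build, assuming a hypothetical simple1.cu, is shown in the header comment below. The file also uses the device-side cudaGetLastError(), one of the device runtime functions that -lcudadevrt links in, to check whether the child launch itself succeeded. The kernels child and parent and the helper runSimple1 are hypothetical names chosen for illustration.

```cuda
// simple1.cu (hypothetical source file for the command above). Full build:
//   nvcc -arch=sm_35 -rdc=true -o simple1 simple1.cu -lcudadevrt
// -rdc=true generates relocatable device code; -lcudadevrt links the CUDA
// device runtime that device-side launches and API calls rely on.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: each thread sets one flag.
__global__ void child(int *flags)
{
    flags[threadIdx.x] = 1;
}

// Hypothetical parent kernel: launches the child, then records the status of
// that launch using the device-side cudaGetLastError().
__global__ void parent(int *flags, int *launchStatus)
{
    if (threadIdx.x == 0) {
        child<<<1, 32>>>(flags);
        *launchStatus = (int)cudaGetLastError();
    }
}

// Returns 0 if the child launch succeeded and all 32 flags were set, 1 otherwise.
int runSimple1()
{
    int *d_flags = nullptr, *d_status = nullptr;
    cudaMalloc(&d_flags, 32 * sizeof(int));
    cudaMalloc(&d_status, sizeof(int));
    cudaMemset(d_flags, 0, 32 * sizeof(int));
    cudaMemset(d_status, 0xFF, sizeof(int)); // -1 until the parent writes it

    parent<<<1, 1>>>(d_flags, d_status);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // also waits for the child

    int h_flags[32] = {0};
    int h_status = -1;
    if (err == cudaSuccess) {
        cudaMemcpy(h_flags, d_flags, sizeof(h_flags), cudaMemcpyDeviceToHost);
        cudaMemcpy(&h_status, d_status, sizeof(int), cudaMemcpyDeviceToHost);
    } else {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_flags);
    cudaFree(d_status);

    int set = 0;
    for (int i = 0; i < 32; ++i) set += h_flags[i];
    return (h_status == (int)cudaSuccess && set == 32) ? 0 : 1;
}

int main()
{
    int rc = runSimple1();
    printf("%s\n", rc == 0 ? "simple1: child launch OK" : "simple1: child launch failed");
    return rc;
}
```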