Dynamic Parallelism

lwan61c1t3 · June 19, 2014, 7:53pm

Hi,

I just moved from a GeForce 580 to a Tesla K20c and wanted to take advantage of dynamic parallelism. However, when I tried to compile my test.cu file to test.ptx under Windows 7, I got the following error.

nvcc : warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
test.cu
test.cu(4): warning: variable "index" was declared but never referenced

test.cu(14): error: calling a __global__ function("mykernel_child") from a __global__ function("mykernel") is only allowed on the compute_35 architecture or above

1 error detected in the compilation of "C:/Users/.../Local/Temp/tmpxft_00000d0c_00000000-8_test.cpp1.ii".

I have CUDA Toolkit 6.0 installed and compiled test.cu using

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\bin\nvcc" -ptx -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -o test.ptx test.cu

Is there anything wrong with my setup? Thanks in advance!

lwan61c1t3 · June 20, 2014, 12:12am

I checked the machine carefully and found that two GPUs are installed: a Tesla K20c and an ION, with the ION responsible for video output. So my guess is that the compiler doesn't know the Tesla K20c exists.

njuffa · June 20, 2014, 12:18am

The compiler is not aware of which GPUs are in the system, and does not need to be. However, you need to tell the compiler which architecture(s) it should build for. K20 is sm_35, but your build log seems to indicate you are building for sm_10. I suggest adding an -arch or -gencode switch, as appropriate, so the compiler targets the sm_35 or compute_35 platform (= compute capability 3.5).
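
Since the compiler does not inspect the hardware, one way to find out which compute capability each installed GPU reports (and so which -arch value fits) is a small host-side query. The following is only a minimal sketch: the function names supports_dynamic_parallelism and list_devices are hypothetical, and the 3.5 threshold is simply the one quoted in the compiler error above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when a compute capability meets the 3.5 minimum
// that the compiler error quotes for launching a kernel from a kernel.
bool supports_dynamic_parallelism(int major, int minor) {
    return major > 3 || (major == 3 && minor >= 5);
}

// Hypothetical helper: prints each CUDA device with its compute capability.
// Returns the number of devices found, or -1 if a runtime call fails.
int list_devices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
        printf("device %d: %s, compute capability %d.%d, dynamic parallelism: %s\n",
               dev, prop.name, prop.major, prop.minor,
               supports_dynamic_parallelism(prop.major, prop.minor) ? "yes" : "no");
    }
    return count;
}
```

Calling list_devices() from main prints one line per visible device. The reported capability only tells you what to pass to -arch or -gencode; it does not change what nvcc builds by default.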

lwan61c1t3 · June 21, 2014, 2:54am

Thanks, njuffa! ^_^

lwan61c1t3 · June 23, 2014, 3:51pm

Hi,

I ran into another problem and need help. After compiling the following code with “nvcc -ptx dynamic_parallelism.cu -gencode arch=compute_35,code=sm_35” on Linux, I found that the generated .ptx file cannot be used on a Windows machine.

__global__ void mykernel_child(){
        int index;
}

__global__ void mykernel(const int N, double *a, double *b, double *c){
        int index = threadIdx.x;
        if(index < N)
                c[index] = a[index] + b[index];

        mykernel_child <<< 1,1 >>> ();
}

I have a K20c installed on that machine and I use MATLAB to load the ptx file. However, MATLAB keeps reporting

Error using parallel.gpu.CUDAKernel
An error occurred during PTX compilation of <image>.
The information log was:

The error log was:
ptxas : fatal error : Unresolved extern function 'cudaLaunchDevice'
The CUDA error code was: CUDA_ERROR_NO_BINARY_FOR_GPU.

Error in Test_CUDA_Matlab_Interface (line 4)
k = parallel.gpu.CUDAKernel('dynamic_parallelism.ptx',
'dynamic_parallelism.cu', 'mykernel')

My guess is that the problem is that the ptx is not relocatable. So I tried compiling this toy code on the Windows machine instead, and that failed too. The reported error message is below:

dynamic_parallelism.cu(14): error: kernel launch from __device__ or __global__ functions requires separate compilation mode

Many thanks in advance. Any suggestions would be greatly appreciated!

fgalian · June 24, 2014, 8:36am

Hi.

I use dynamic parallelism with CUDA 5 and 6 on the 3.5 and 5.0 architectures.

To compile my code, I use:

nvcc -arch=sm_35 -rdc=true -lcudadevrt name.cu (with CUDA 5 and the 3.5 architecture)
nvcc -arch compute_50 -rdc=true name.cu (with CUDA 6 and the 5.0 architecture)

When I call a kernel from another kernel, performance is very low.
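
For reference, here is a small parent/child example that could be built with the first command line above. It is a sketch under stated assumptions, not the code from this thread: the names child_kernel, parent_kernel and run_parent_child are hypothetical, the file name dp_example.cu is made up, and on newer toolkits that no longer accept sm_35 you would have to substitute an architecture your toolkit supports.

```cuda
// Assumed build line, following the flags quoted above:
//   nvcc -arch=sm_35 -rdc=true dp_example.cu -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child grid: doubles each element of c in place.
__global__ void child_kernel(int n, double *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] *= 2.0;
}

// Parent grid (single block): adds a and b, then thread 0 launches the child grid.
__global__ void parent_kernel(int n, const double *a, const double *b, double *c) {
    int i = threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
    __syncthreads();  // every thread in the block has written c before the launch
    if (threadIdx.x == 0) {
        child_kernel<<<1, n>>>(n, c);
    }
}

// Hypothetical host driver: returns the number of wrong elements, or -1 on a CUDA error.
// Assumes 1 <= n <= 1024 so that one block covers all elements.
int run_parent_child(int n) {
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    double *da = NULL, *db = NULL, *dc = NULL;
    size_t bytes = n * sizeof(double);
    cudaError_t err = cudaMalloc(&da, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&db, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dc, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        parent_kernel<<<1, n>>>(n, da, db, dc);
        err = cudaGetLastError();  // reports launch failures such as a missing binary
    }
    // The parent grid is not considered complete until its child grid has finished,
    // so one host-side synchronize covers both.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    if (err != cudaSuccess) return -1;
    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != 6.0) ++wrong;  // (1.0 + 2.0) * 2.0
    }
    return wrong;
}
```

Calling run_parent_child(32) from main should return 0 on a device of compute capability 3.5 or above. Without -rdc=true the compiler stops with the "requires separate compilation mode" error quoted earlier in the thread.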
