Dynamic Parallelism on TX1

soni · March 10, 2016, 12:52pm

I have written a simple program that uses dynamic parallelism and built it from the command line with "nvcc -arch=sm_35 -rdc=true hello_world.cu -o hello -lcudadevrt", which gives an nvlink error. What is the proper way to build it?

dusty_nv · March 11, 2016, 5:39am

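It looks like you're building for SM_35 rather than SM_53, which is the compute capability of the TX1's GPU. Try -arch=sm_53 instead.

A similar example from the CUDA toolkit samples (cdpSimplePrint) hits the same kind of nvlink error when built for the default list of architectures: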

ubuntu@tegra-ubuntu:~/7.0.48/NVIDIA_CUDA-7.0_Samples/0_Simple/cdpSimplePrint$ make TARGET_ARCH=armv7l 
/usr/local/cuda-7.0/bin/nvcc -ccbin g++ -I../../common/inc  -m32    -dc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpSimplePrint.o -c cdpSimplePrint.cu 
/usr/local/cuda-7.0/bin/nvcc -ccbin g++   -m32      -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpSimplePrint cdpSimplePrint.o  -lcudadevrt 
nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpSimplePrint.o' 
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpSimplePrint.o' 
make: *** [cdpSimplePrint] Error 255

But it then works on the TX1 when building only for SM_53:

ubuntu@tegra-ubuntu:~/7.0.48/NVIDIA_CUDA-7.0_Samples/0_Simple/cdpSimplePrint$ make TARGET_ARCH=armv7l SMS=53 
/usr/local/cuda-7.0/bin/nvcc -ccbin g++   -m32      -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpSimplePrint cdpSimplePrint.o  -lcudadevrt

ubuntu@tegra-ubuntu:~/7.0.48/NVIDIA_CUDA-7.0_Samples/0_Simple/cdpSimplePrint$ ./cdpSimplePrint 
starting Simple Print (CUDA Dynamic Parallelism) 
Running on GPU 0 (GM20B) 
*************************************************************************** 
The CPU launches 2 blocks of 2 threads each. On the device each thread will 
launch 2 blocks of 2 threads each. The GPU we will do that recursively 
until it reaches max_depth=2 
In total 2+8=10 blocks are launched!!! (8 from the GPU) 
*************************************************************************** 
Launching cdp_kernel() with CUDA Dynamic Parallelism: 
BLOCK 0 launched by the host 
BLOCK 1 launched by the host 
|  BLOCK 2 launched by thread 0 of block 0 
|  BLOCK 3 launched by thread 0 of block 1 
|  BLOCK 4 launched by thread 0 of block 0 
|  BLOCK 5 launched by thread 0 of block 1 
|  BLOCK 7 launched by thread 1 of block 0 
|  BLOCK 8 launched by thread 1 of block 1 
|  BLOCK 6 launched by thread 1 of block 0 
|  BLOCK 9 launched by thread 1 of block 1
==========================

soni · March 15, 2016, 1:18pm

I am trying to build my application using CMake.

My CMakeLists.txt:

#Include the folders containing OPENCV
include_directories(/usr/include/)
#set(CMAKE_CXX_FLAGS "-g -O3")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS}; "-arch=sm_53; -rdc=true; -lcudadevrt" )

set(PROJECT_LINK_LIBS GL GLU X11 glut GLEW opencv_core opencv_imgproc opencv_video opencv_features2d opencv_calib3d opencv_objdetect opencv_flann opencv_stitching )
set(CUDA_VERBOSE_BUILD ON)
set(CUDA_SEPARABLE_COMPILATION ON)
set(CUDA_PROPAGATE_HOST_FLAGS OFF)
set(src kernel.cu
new.cpp
)

cuda_add_executable(out ${src} OPTIONS -gencode arch=compute_53,code=sm_53)
target_link_libraries(out ${PROJECT_LINK_LIBS} ${CUDA_LIBRARIES})

When I use this CMake file to build my application, I get a linking error:

link.stub:(.text+0x11c): undefined reference to `__fatbinwrap_54_tmpxft_00007a07_00000000_7_cuda_device_runtime_cpp1_ii_8b1a5d37'
link.stub:(.text+0x120): undefined reference to `__fatbinwrap_54_tmpxft_00007a07_00000000_7_cuda_device_runtime_cpp1_ii_8b1a5d37'
collect2: error: ld returned 1 exit status
make[2]: *** [out] Error 1
make[1]: *** [CMakeFiles/out.dir/all] Error 2
make: *** [all] Error 2

Kindly help.

kayccc · April 28, 2016, 10:39am

Hi soni,

Sorry for the late reply.

The CMakeLists.txt posted in the forum isn’t complete and the generated nvcc commands are not posted.
It’s hard to tell without a complete repro case.

Thanks
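
The thread does not confirm a cause for the link.stub error, but one possibility worth checking is that the CUDA device runtime library (libcudadevrt) never reaches the final host link: flags placed in CUDA_NVCC_FLAGS go to nvcc, while the executable itself is linked by the host compiler. The fragment below is a minimal sketch of that idea, not a verified fix for this project. It assumes the legacy FindCUDA module and a CMake version whose FindCUDA defines CUDA_cudadevrt_LIBRARY; the minimum version shown is an assumption, and on older CMake the full path to libcudadevrt.a could be given instead. The target and source names are taken from the post above.

```cmake
# Sketch only: the required version is an assumption; older FindCUDA
# versions may not define CUDA_cudadevrt_LIBRARY.
cmake_minimum_required(VERSION 3.7)
project(out)

# Legacy FindCUDA module, which provides cuda_add_executable().
find_package(CUDA REQUIRED)

# Dynamic parallelism needs relocatable device code and a device link step.
set(CUDA_SEPARABLE_COMPILATION ON)
set(CUDA_PROPAGATE_HOST_FLAGS OFF)
set(CUDA_VERBOSE_BUILD ON)

# Pass nvcc flags as separate list items, with no quotes or embedded spaces.
# Only SM_53 (the TX1's GPU) is targeted, as suggested earlier in the thread.
list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_53,code=sm_53)

cuda_add_executable(out kernel.cu new.cpp)

# PROJECT_LINK_LIBS is the OpenGL/OpenCV list from the original post
# (it expands to nothing if left undefined).
# The device runtime is added explicitly to the final link here.
target_link_libraries(out
  ${PROJECT_LINK_LIBS}
  ${CUDA_LIBRARIES}
  ${CUDA_cudadevrt_LIBRARY})
```

With CUDA_VERBOSE_BUILD left ON, the generated nvcc and link commands are printed during the build, which is the information requested above for a complete repro case.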
