I have written a simple program that uses dynamic parallelism and built it from the command line with "nvcc -arch=sm_35 -rdc=true hello_world.cu -o hello -lcudadevrt", which gives an nvlink error. What is the proper way to build it?
It looks like you’re building for SM_35, but the TX1’s GPU is SM_53. Try building for SM_53 instead.
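As a minimal sketch of that suggestion, the command from the question can be rerun with only the architecture changed. The file name hello_world.cu and the output name hello are taken from the question; the guard around the build is an addition so the snippet does nothing harmful on a machine without nvcc or the source file.

```shell
#!/bin/sh
# Build the dynamic-parallelism program for SM_53 instead of SM_35.
# Assumes hello_world.cu (the file named in the question) is in the current directory.
ARCH=sm_53
SRC=hello_world.cu
OUT=hello

# -rdc=true enables relocatable device code; -lcudadevrt links the CUDA device runtime.
CMD="nvcc -arch=$ARCH -rdc=true $SRC -o $OUT -lcudadevrt"
echo "$CMD"

# Only run the build when nvcc and the source file are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD
else
    echo "skipping build: nvcc or $SRC not found"
fi
```

If the link step still reports undefined references, the full nvcc output is the most useful thing to post next.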
A similar example from the CUDA toolkit samples hits the same nvlink error when built for all of the default architectures:
ubuntu@tegra-ubuntu:~/7.0.48/NVIDIA_CUDA-7.0_Samples/0_Simple/cdpSimplePrint$ make TARGET_ARCH=armv7l
/usr/local/cuda-7.0/bin/nvcc -ccbin g++ -I../../common/inc -m32 -dc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpSimplePrint.o -c cdpSimplePrint.cu
/usr/local/cuda-7.0/bin/nvcc -ccbin g++ -m32 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpSimplePrint cdpSimplePrint.o -lcudadevrt
nvlink error : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpSimplePrint.o'
nvlink error : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpSimplePrint.o'
make: *** [cdpSimplePrint] Error 255
But it builds and runs on the TX1 when built for SM_53 only:
ubuntu@tegra-ubuntu:~/7.0.48/NVIDIA_CUDA-7.0_Samples/0_Simple/cdpSimplePrint$ make TARGET_ARCH=armv7l SMS=53
/usr/local/cuda-7.0/bin/nvcc -ccbin g++ -m32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpSimplePrint cdpSimplePrint.o -lcudadevrt
ubuntu@tegra-ubuntu:~/7.0.48/NVIDIA_CUDA-7.0_Samples/0_Simple/cdpSimplePrint$ ./cdpSimplePrint
starting Simple Print (CUDA Dynamic Parallelism)
Running on GPU 0 (GM20B)
***************************************************************************
The CPU launches 2 blocks of 2 threads each. On the device each thread will
launch 2 blocks of 2 threads each. The GPU we will do that recursively
until it reaches max_depth=2
In total 2+8=10 blocks are launched!!! (8 from the GPU)
***************************************************************************
Launching cdp_kernel() with CUDA Dynamic Parallelism:
BLOCK 0 launched by the host
BLOCK 1 launched by the host
| BLOCK 2 launched by thread 0 of block 0
| BLOCK 3 launched by thread 0 of block 1
| BLOCK 4 launched by thread 0 of block 0
| BLOCK 5 launched by thread 0 of block 1
| BLOCK 7 launched by thread 1 of block 0
| BLOCK 8 launched by thread 1 of block 1
| BLOCK 6 launched by thread 1 of block 0
| BLOCK 9 launched by thread 1 of block 1
==========================
I am trying to build my application using CMake.
My CMakeLists.txt:
#Include the folders containing OPENCV
include_directories(/usr/include/)
#set(CMAKE_CXX_FLAGS "-g -O3")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS}; "-arch=sm_53; -rdc=true; -lcudadevrt" )
set(PROJECT_LINK_LIBS GL GLU X11 glut GLEW opencv_core opencv_imgproc opencv_video opencv_features2d opencv_calib3d opencv_objdetect opencv_flann opencv_stitching )
set(CUDA_VERBOSE_BUILD ON)
set(CUDA_SEPARABLE_COMPILATION ON)
set(CUDA_PROPAGATE_HOST_FLAGS OFF)
set(src kernel.cu
new.cpp
)
cuda_add_executable(out ${src} OPTIONS -gencode arch=compute_53,code=sm_53)
target_link_libraries(out ${PROJECT_LINK_LIBS} ${CUDA_LIBRARIES})
When I use this CMake file to build my application, I get a linking error:
link.stub:(.text+0x11c): undefined reference to `__fatbinwrap_54_tmpxft_00007a07_00000000_7_cuda_device_runtime_cpp1_ii_8b1a5d37'
link.stub:(.text+0x120): undefined reference to `__fatbinwrap_54_tmpxft_00007a07_00000000_7_cuda_device_runtime_cpp1_ii_8b1a5d37'
collect2: error: ld returned 1 exit status
make[2]: *** [out] Error 1
make[1]: *** [CMakeFiles/out.dir/all] Error 2
make: *** [all] Error 2
Kindly help…
Hi soni,
Sorry for the late reply.
The CMakeLists.txt posted in the forum isn’t complete and the generated nvcc commands are not posted.
It’s hard to tell without a complete repro case.
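One way to collect the generated nvcc commands is to rerun the build verbosely and save the output. This is only a sketch: it assumes a Makefile-based CMake build directory, and the log file name build_verbose.log is a made-up placeholder.

```shell
#!/bin/sh
# Capture the full compile and link commands from a CMake-generated Makefile build.
# Run from the CMake build directory; the log file name is an arbitrary choice.
BUILD_LOG=build_verbose.log

if [ -f Makefile ]; then
    # VERBOSE=1 makes CMake-generated Makefiles print every command they run.
    make VERBOSE=1 2>&1 | tee "$BUILD_LOG"
    # List the lines that invoke nvcc so they can be pasted into the thread.
    grep -n "nvcc" "$BUILD_LOG"
else
    echo "no Makefile here: run this from the CMake build directory"
fi
```

Posting that log together with the complete CMakeLists.txt would give a full repro case.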
Thanks