Accelerating Lidar for Robotics with NVIDIA CUDA-based PCL

Originally published at: Accelerating Lidar for Robotics with NVIDIA CUDA-based PCL | NVIDIA Developer Blog

Many Jetson users choose lidars as their major sensors for localization and perception in autonomous solutions. Lidars describe the spatial environment around the vehicle as a collection of three-dimensional points known as a point cloud. Point clouds sample the surface of the surrounding objects in long range and high precision, which are well-suited for use…

Hi, jwitsoe!
I am trying to use CUDA-PCL on a Jetson TX2, but I have run into a CUDA failure that I can't resolve. Any suggestions would be appreciated.

Environment: Jetson TX2 with JetPack 4.5 (Ubuntu 18.04, CUDA 10.2, PCL 1.8.1)
Problem: When I run the built demo in each subfolder, it fails with a CUDA error. Here is the output from the demo in cuda-pcl/cuda-segmentation:

nvidia@nvidia-tx2:~/Downloads/cuda-pcl/cuda-segmentation$ ./demo sample.pcd

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7850MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)


Cuda failure: no kernel image is available for execution on the device at line 310 in file cudaSegmentation.cpp error status: 209
Aborted (core dumped)

I have tried several things to solve it.
First, I downgraded to JetPack 4.4.1, which matches the official test environment, but it did not help.
Next, I followed solutions to similar problems: I manually added 62 (the compute capability 6.2 of the Jetson TX2) to the SMS variable in the Makefile. Still, nothing changed.
Since the source code is not available, I can’t do more with it.

I don’t know much about CUDA programming, but I guess the .so file was not compiled for sm_62, so it can’t run on the Jetson TX2. I would appreciate it if you could fix this for us TX2 users.

Hi triokun,
You are right: the error below means the library contains no kernel image for the current device.
This is because CUDA-PCL was not compiled for SM 6.2 (sm_62).
Cuda failure: no kernel image is available for execution on the device at line 310 in file cudaSegmentation.cpp error status: 209
Aborted (core dumped)
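
For readers hitting the same error, here is a minimal sketch, in Python, of how a CUDA samples-style Makefile typically turns an SMS list into nvcc flags. It assumes the demo Makefile follows that common pattern, which is not confirmed in this thread, and `gencode_flags` is a hypothetical helper name. The flags only apply to code that nvcc compiles from source, so editing SMS cannot add an sm_62 kernel image to a prebuilt .so.

```python
def gencode_flags(sms):
    """Build nvcc -gencode flags for a list of SM numbers such as [53, 62]."""
    # Normalise to sorted, unique integers (assumed input: ints or digit strings).
    archs = sorted(set(int(sm) for sm in sms))
    # One native binary (SASS) target per requested architecture.
    flags = [f"-gencode arch=compute_{sm},code=sm_{sm}" for sm in archs]
    if archs:
        # Also embed PTX for the highest architecture so newer GPUs can JIT-compile it.
        top = archs[-1]
        flags.append(f"-gencode arch=compute_{top},code=compute_{top}")
    return flags


if __name__ == "__main__":
    # 62 corresponds to the Jetson TX2 (compute capability 6.2).
    print(" ".join(gencode_flags([53, 62, 72])))
```

The library itself has to be rebuilt from its source with a flag such as `-gencode arch=compute_62,code=sm_62` for the TX2 to find a matching kernel.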


Hi, leif!
Thanks for your answer.
Since you have the source code, could you recompile the library for the TX2? It would help me a lot.

This is the library built for TX2. It has not been tested, because I don't have a TX2 available locally.


I’m grateful for your help. I have tested it on the TX2 and it works perfectly!
Would you mind recompiling the other two libraries (libcudafilter.so and libcudaicp.so) for the TX2 as well?
Again, thank you so much!

Please check the two libs.

Hi @leif ,

I tried building the CUDA-ICP example and got this error: /usr/bin/ld: ./lib/libcudaicp.so: error adding symbols: file in wrong format.

Environment: GTX 1050 (Ubuntu 20.04, CUDA-10.2, PCL 1.10.1)

USE Default CUDA DIR: /usr/local/cuda
TARGET_ARCH: x86_64
CUDA_VERSION: 11000
SMS: 30 35 50 53 60 61 70 72 
g++ -D_REENTRANT -std=c++11 -std=c++14 -O2 -o demo obj/main.o  -L/usr/lib -L/usr/local/lib -L/usr/local/cuda/lib64 -lcudart_static -lrt -ldl -lpthread -lcudart -L/lib64 -lcudnn -lpthread -L/usr/lib/aarch64-linux-gnu/ -lboost_system -lpcl_common -lpcl_io -lpcl_recognition -lpcl_features -lpcl_sample_consensus -lpcl_octree -lpcl_search -lpcl_filters -lpcl_kdtree -lpcl_segmentation -lpcl_visualization ./lib/libcudaicp.so
/usr/bin/ld: ./lib/libcudaicp.so: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status
make: *** [Makefile:173: demo] Error 1

How do I get past that?

It looks like the libraries are compiled for ARM processors. Can you recompile them for x86_64 (or provide the source code)?
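
One way to check that guess is to read the target architecture from the library's ELF header. Below is a minimal sketch in Python, assuming the file is a regular ELF shared object; `elf_machine` and `library_machine` are hypothetical helper names. It reads the `e_machine` field, which is what the linker compares when it reports "file in wrong format".

```python
import struct
import sys

# e_machine values defined by the ELF specification.
ELF_MACHINES = {0x03: "x86", 0x28: "arm", 0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")


def library_machine(path: str) -> str:
    """Read the start of a file on disk and report its ELF architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python elf_arch.py ./lib/libcudaicp.so
    print(library_machine(sys.argv[1]))
```

A result of `aarch64` on an x86_64 host would be consistent with a library built for Jetson. The standard `file` and `readelf -h` tools report the same field.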