Accelerating Lidar for Robotics with NVIDIA CUDA-based PCL

jwitsoe · February 1, 2021, 5:42am

Originally published at: https://developer.nvidia.com/blog/accelerating-lidar-for-robotics-with-cuda-based-pcl/

Many Jetson users choose lidars as their major sensors for localization and perception in autonomous solutions. Lidars describe the spatial environment around the vehicle as a collection of three-dimensional points known as a point cloud. Point clouds sample the surfaces of surrounding objects at long range and with high precision, which makes them well-suited for use…

triokun · May 30, 2021, 9:24am

Hi, jwitsoe!
I am trying to use CUDA-PCL on a Jetson TX2, but I have run into a CUDA failure that I can't resolve. It would be great if you could give some suggestions.

Environment: Jetson TX2 with Jetpack 4.5 (Ubuntu 18.04, CUDA-10.2, PCL 1.8.1)
Problem: When I run the built demo in each subfolder, it fails with a CUDA error. Here is an example of the output from the demo in cuda-pcl/cuda-segmentation:

nvidia@nvidia-tx2:~/Downloads/cuda-pcl/cuda-segmentation$ ./demo sample.pcd

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7850MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Cuda failure: no kernel image is available for execution on the device at line 310 in file cudaSegmentation.cpp error status: 209
Aborted (core dumped)

I have made several attempts to solve it.
First, I downgraded to Jetpack 4.4.1, which is the same as the official test environment. But it did not work.
Next, I followed solutions to other similar problems. Specifically, I manually added 62 (which corresponds to the compute capability 6.2 of the Jetson TX2) to the SMS variable in the Makefile. Still, nothing changed.
Since the source code is not there, I can’t do more with it.

I don’t know much about CUDA programming, but I guess the .so file was not compiled with sms=62, so it can’t be executed on the Jetson TX2. I would appreciate it if you could fix it for us TX2 users.

leif · June 2, 2021, 3:48am

Hi triokun,
You are right that the error below means there is no kernel image for the current device.
This is because CUDA-PCL was not compiled for SM 6.2 (sm_62).
Cuda failure: no kernel image is available for execution on the device at line 310 in file cudaSegmentation.cpp error status: 209
Aborted (core dumped)
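
For readers hitting the same error: the GPU kernels live inside the prebuilt .so files, so changing SMS in the demo Makefile only affects code that nvcc compiles from source and cannot add kernels to a library that is already built. The sketch below is a hypothetical helper, not part of cuda-pcl. It assumes the Makefile follows the CUDA samples convention of turning each SMS entry into an nvcc -gencode flag, and it shows how one could check whether a given device's compute capability is among the targets. The target list in the usage lines is made up for illustration.

```python
def gencode_flags(sms, embed_ptx=True):
    """Build nvcc -gencode flags from SM versions such as ["53", "62"].

    Hypothetical helper mirroring the CUDA samples Makefile convention.
    """
    targets = sorted({int(sm) for sm in sms})
    # One native (SASS) target per requested SM version.
    flags = [f"-gencode arch=compute_{sm},code=sm_{sm}" for sm in targets]
    if embed_ptx and targets:
        # Also embed PTX for the highest target so newer GPUs can JIT-compile it.
        highest = targets[-1]
        flags.append(f"-gencode arch=compute_{highest},code=compute_{highest}")
    return flags


def has_native_kernel(device_cc, sms):
    """Return True if the device's (major, minor) capability is in the SM list."""
    major, minor = device_cc
    return major * 10 + minor in {int(sm) for sm in sms}


if __name__ == "__main__":
    tx2 = (6, 2)                      # compute capability reported in the log above
    illustrative_targets = ["53", "72"]  # assumption: not the library's real target list
    print(has_native_kernel(tx2, illustrative_targets))
    print("\n".join(gencode_flags(illustrative_targets + ["62"])))
```

Building for the exact SM version of the target board is the conservative choice on Jetson, which is why the fix in this thread is a library recompiled specifically for the TX2.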

triokun · June 2, 2021, 10:53am

Hi, leif!
Thanks for your answer.
I’m wondering if you can recompile the library for TX2 if you have the source code. It would help me a lot.

leif · June 4, 2021, 7:31am

This is the lib for TX2, but it has not been tested because we do not have a TX2 locally.

triokun · June 5, 2021, 2:43am

I’m grateful for your help. I have tested it on TX2 and it worked perfectly!
Would you mind recompiling the other two libs (libcudafilter.so and libcudaicp.so) for TX2?
Again, thank you so much!

leif · June 8, 2021, 5:28am

Please check the two libs.

gautran · June 11, 2021, 2:44pm

Hi @leif ,

I tried building the CUDA-ICP example and got a "/usr/bin/ld: ./lib/libcudaicp.so: error adding symbols: file in wrong format" error.

Environment: GTX 1050 (Ubuntu 20.04, CUDA-10.2, PCL 1.10.1)

USE Default CUDA DIR: /usr/local/cuda
TARGET_ARCH: x86_64
CUDA_VERSION: 11000
SMS: 30 35 50 53 60 61 70 72 
g++ -D_REENTRANT -std=c++11 -std=c++14 -O2 -o demo obj/main.o  -L/usr/lib -L/usr/local/lib -L/usr/local/cuda/lib64 -lcudart_static -lrt -ldl -lpthread -lcudart -L/lib64 -lcudnn -lpthread -L/usr/lib/aarch64-linux-gnu/ -lboost_system -lpcl_common -lpcl_io -lpcl_recognition -lpcl_features -lpcl_sample_consensus -lpcl_octree -lpcl_search -lpcl_filters -lpcl_kdtree -lpcl_segmentation -lpcl_visualization ./lib/libcudaicp.so
/usr/bin/ld: ./lib/libcudaicp.so: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status
make: *** [Makefile:173: demo] Error 1

How do I get past that?

gautran · June 11, 2021, 2:53pm

Looks like the libraries are compiled for ARM processors. Can you recompile them (or provide the source code) for x86_64?
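
One way to confirm this kind of mismatch before linking is to read the target architecture from the library's ELF header. The sketch below is a hypothetical helper, not part of cuda-pcl. It reads the e_machine field, which sits at byte offset 18 of the ELF header, and maps a few common values to names. The standard file and readelf -h tools report the same information.

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86_64", 0xB7: "AArch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field right after e_ident (16 bytes) and e_type (2 bytes).
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")


def library_machine(path: str) -> str:
    """Read just enough of the file at `path` to identify its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))


if __name__ == "__main__":
    import sys
    # Example: python check_arch.py ./lib/libcudaicp.so
    for path in sys.argv[1:]:
        print(path, "->", library_machine(path))
```

If this reports AArch64 on an x86_64 host, the linker error above is expected, and only a library built for x86_64 will link.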

triokun · June 13, 2021, 1:47pm

Yes, they all work well on TX2 except for CUDA_VoxelGrid. Here is the output:

---------------checking CUDA VoxelGrid---------------------
ERROR case
status = 11

leif · June 18, 2021, 2:07am

A Jetson has a GPU of a known type, but a PC does not.
It is hard to adjust cuda-pcl for all GPUs.
We may support x86_64 later.

leif · June 18, 2021, 2:11am

VoxelGrid may not be suitable for the TX2.
We will try to check it later.

nghiaphamsg · July 10, 2021, 1:57pm

Hi @leif
When I run the cuda-icp example, I get this output:

$~/cuda-pcl-main/cuda-icp$ ./demo
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X1
Capbility: 5.3
Global memory: 3956MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Loaded 859059 data points for P with the following fields: x y z rgb
Loaded 784546 data points for Q with the following fields: x y z rgb
iter.Maxiterate 20
iter.threshold 1e-12
iter.acceptrate 1

Target rigid transformation : cloud_in → cloud_icp
Rotation matrix :
    | 0.923880 -0.382683 0.000000 |
R = | 0.382683 0.923880 0.000000 |
    | 0.000000 0.000000 1.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.200000 >

matrix_icp native value
Rotation matrix :
    | 1.000000 0.000000 0.000000 |
R = | 0.000000 1.000000 0.000000 |
    | 0.000000 0.000000 1.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.000000 >

------------checking CUDA ICP(GPU)----------------
Cuda failure: the launch timed out and was terminated at line 59 in file cudaICP.cpp error status: 702
Aborted (core dumped)

Can you help me fix this problem?
Thank you.

leif · July 19, 2021, 1:58am

Hi @nghiaphamsg
Error status 702 means that the kernel launch timed out. This can only happen on a device that has a run time limit on kernels, which is reported by the kernelExecTimeoutEnabled field of cudaDeviceProp:
https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_19a63114766c4d2309f00403c1bf056c8
Could you try boosting your device first?
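
The numeric status values printed in the failure messages in this thread are CUDA runtime error codes. As a small aid for reading such logs, the sketch below parses a failure line and attaches the enumerator name for the two codes seen here. It is a hypothetical helper: the line format is inferred from the demo output pasted above, and the table covers only these two values as defined in CUDA 10.1 and later.

```python
import re

# Only the two codes that appear in this thread (CUDA 10.1+ numbering).
CUDA_ERRORS = {
    209: "cudaErrorNoKernelImageForDevice",
    702: "cudaErrorLaunchTimeout",
}

# Line format assumed from the demo output quoted in this thread.
FAILURE_RE = re.compile(
    r"Cuda failure: (.+) at line (\d+) in file (\S+) error status: (\d+)"
)


def parse_cuda_failure(line):
    """Split a demo failure line into its parts, or return None if it does not match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    message, lineno, filename, status = match.groups()
    status = int(status)
    return {
        "message": message,
        "line": int(lineno),
        "file": filename,
        "status": status,
        "name": CUDA_ERRORS.get(status, "unknown"),
    }


if __name__ == "__main__":
    log = ("Cuda failure: the launch timed out and was terminated "
           "at line 59 in file cudaICP.cpp error status: 702")
    print(parse_cuda_failure(log))
```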

asagllam · October 11, 2021, 6:51pm

Hi jwitsoe,
I am processing 3D points in a custom way to create 2D images. So, I am going through all the points and want to speed up the process using CUDA. The code is part of a ROS node. Could you please let me know whether you have any samples or tutorials for ROS and CUDA, especially for point cloud processing?
Thanks in advance,
Ahmet

leif · October 26, 2021, 8:30am

Hi asagllam,
Our cuda-pcl provides some libs and header files that can be used directly with any framework, including ROS.

rchen390 · November 16, 2021, 8:04pm

Hello @leif ,

Could you compile the ICP, filter, and segmentation libraries for the TX1 as well?

Thanks,
Ryan

leif · November 22, 2021, 3:12am

Hi @rchen390
Please check the three libs, which were compiled for TX1 with Jetpack 4.4.1.

rchen390 · December 13, 2021, 7:47am

Hi @leif,

Thank you for compiling the libraries for me last time. If possible, could you also compile the ICP, filter, and segmentation libraries for the NVIDIA GeForce RTX 2080?

Happy holidays!

leif · December 15, 2021, 1:51am

Hi @rchen390
Please check these libraries.
