Thanks for your advice. I have checked the output of compute-sanitizer ./thrust_pcl_conflict. It seems to be a conflict in shared memory, but I don't have the knowledge to debug it.
========= COMPUTE-SANITIZER
Allocated | Total Memory | Free Memory
0, 8361738240, 8215068672
0, 8361738240, 8215068672
0, 8361738240, 8212971520
========= Invalid __shared__ write of size 4 bytes
========= at 0xe540 in void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<int, int, unsigned int>::Policy800, (bool)0, (bool)0, int, int, unsigned int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>)
========= by thread (64,0,0) in block (0,0,0)
========= Address 0x9d2b6050 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x305122]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:__cudart798 [0x2f41b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaLaunchKernel [0x8b31b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<int, int, unsigned int>::Policy800, false, false, int, int, unsigned int>(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>) [0x2000f]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaError thrust::cuda_cub::launcher::triple_chevron::doit_host<void (*)(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>), int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int> >(void (*)(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>), int const* const&, int* const&, int const* const&, int* const&, unsigned int* const&, unsigned int const&, int const&, int const&, cub::GridEvenShare<unsigned int> const&) const [clone .isra.0] [0x20b63]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaError cub::DeviceRadixSort::SortPairs<int, int, int>(void*, unsigned long&, cub::DoubleBuffer<int>&, cub::DoubleBuffer<int>&, int, int, int, CUstream_st*) [0x26ced]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::device::OctreeImpl::build() [0x1d99b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::gpu::Octree::build() [0x14931]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:main [0xe324]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
========= Host Frame: [0x29d90]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main [0x29e40]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0xddf5]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
=========
========= Invalid __shared__ write of size 4 bytes
========= at 0xe540 in void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<int, int, unsigned int>::Policy800, (bool)0, (bool)0, int, int, unsigned int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>)
========= by thread (65,0,0) in block (0,0,0)
========= Address 0x9d2b6054 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x305122]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:__cudart798 [0x2f41b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaLaunchKernel [0x8b31b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<int, int, unsigned int>::Policy800, false, false, int, int, unsigned int>(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>) [0x2000f]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaError thrust::cuda_cub::launcher::triple_chevron::doit_host<void (*)(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>), int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int> >(void (*)(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>), int const* const&, int* const&, int const* const&, int* const&, unsigned int* const&, unsigned int const&, int const&, int const&, cub::GridEvenShare<unsigned int> const&) const [clone .isra.0] [0x20b63]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaError cub::DeviceRadixSort::SortPairs<int, int, int>(void*, unsigned long&, cub::DoubleBuffer<int>&, cub::DoubleBuffer<int>&, int, int, int, CUstream_st*) [0x26ced]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::device::OctreeImpl::build() [0x1d99b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::gpu::Octree::build() [0x14931]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:main [0xe324]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
========= Host Frame: [0x29d90]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main [0x29e40]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0xddf5]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
=========
========= Invalid __shared__ write of size 4 bytes
========= at 0xe540 in void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<int, int, unsigned int>::Policy800, (bool)0, (bool)0, int, int, unsigned int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>)
========= by thread (66,0,0) in block (0,0,0)
......
=========
========= Invalid __shared__ write of size 4 bytes
========= at 0xe560 in void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<int, int, unsigned int>::Policy800, (bool)0, (bool)0, int, int, unsigned int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>)
========= by thread (157,0,0) in block (1,0,0)
========= Address 0x1032c is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x305122]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:__cudart798 [0x2f41b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaLaunchKernel [0x8b31b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<int, int, unsigned int>::Policy800, false, false, int, int, unsigned int>(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>) [0x2000f]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaError thrust::cuda_cub::launcher::triple_chevron::doit_host<void (*)(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>), int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int> >(void (*)(int const*, int*, int const*, int*, unsigned int*, unsigned int, int, int, cub::GridEvenShare<unsigned int>), int const* const&, int* const&, int const* const&, int* const&, unsigned int* const&, unsigned int const&, int const&, int const&, cub::GridEvenShare<unsigned int> const&) const [clone .isra.0] [0x20b63]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:cudaError cub::DeviceRadixSort::SortPairs<int, int, int>(void*, unsigned long&, cub::DoubleBuffer<int>&, cub::DoubleBuffer<int>&, int, int, int, CUstream_st*) [0x26ced]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::device::OctreeImpl::build() [0x1d99b]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::gpu::Octree::build() [0x14931]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:main [0xe324]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
========= Host Frame: [0x29d90]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main [0x29e40]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0xddf5]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: [0x441886]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:cudaStreamSynchronize [0x8b0fb]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::device::OctreeImpl::build() [0x1de65]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::gpu::Octree::build() [0x14931]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:main [0xe324]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
========= Host Frame: [0x29d90]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main [0x29e40]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0xddf5]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: [0x441886]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:cudaGetLastError [0x689a7]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::device::OctreeImpl::build() [0x1de6d]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:pcl::gpu::Octree::build() [0x14931]
========= in /usr/local/lib/libpcl_gpu_octree.so.1.14
========= Host Frame:main [0xe324]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
========= Host Frame: [0x29d90]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main [0x29e40]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0xddf5]
========= in /notebooks/PCCudaFilter/build/./thrust_pcl_conflict
=========
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 128 errors