Hi,
I'm using an older L4T 28.2 BSP with Docker on Ubuntu 14.04, and I have the CUDA 8.0.84 packages installed:
ii cuda-command-line-tools-8-0 8.0.84-1 arm64 CUDA command-line tools
ii cuda-core-8-0 8.0.84-1 arm64 CUDA core tools
ii cuda-cublas-8-0 8.0.84-1 arm64 CUBLAS native runtime libraries
ii cuda-cublas-dev-8-0 8.0.84-1 arm64 CUBLAS native dev links, headers
ii cuda-cudart-8-0 8.0.84-1 arm64 CUDA Runtime native Libraries
ii cuda-cudart-dev-8-0 8.0.84-1 arm64 CUDA Runtime native dev links, headers
ii cuda-cufft-8-0 8.0.84-1 arm64 CUFFT native runtime libraries
ii cuda-cufft-dev-8-0 8.0.84-1 arm64 CUFFT native dev links, headers
ii cuda-curand-8-0 8.0.84-1 arm64 CURAND native runtime libraries
ii cuda-curand-dev-8-0 8.0.84-1 arm64 CURAND native dev links, headers
ii cuda-cusolver-8-0 8.0.84-1 arm64 CUDA solver native runtime libraries
ii cuda-cusolver-dev-8-0 8.0.84-1 arm64 CUDA solver native dev links, headers
ii cuda-cusparse-8-0 8.0.84-1 arm64 CUSPARSE native runtime libraries
ii cuda-cusparse-dev-8-0 8.0.84-1 arm64 CUSPARSE native dev links, headers
ii cuda-documentation-8-0 8.0.84-1 arm64 CUDA documentation
ii cuda-driver-dev-8-0 8.0.84-1 arm64 CUDA Driver native dev stub library
ii cuda-license-8-0 8.0.84-1 arm64 CUDA licenses
ii cuda-misc-headers-8-0 8.0.84-1 arm64 CUDA miscellaneous headers
ii cuda-npp-8-0 8.0.84-1 arm64 NPP native runtime libraries
ii cuda-npp-dev-8-0 8.0.84-1 arm64 NPP native dev links, headers
ii cuda-nvgraph-8-0 8.0.84-1 arm64 NVGRAPH native runtime libraries
ii cuda-nvgraph-dev-8-0 8.0.84-1 arm64 NVGRAPH native dev links, headers
ii cuda-nvml-dev-8-0 8.0.84-1 arm64 NVML native dev links, headers
ii cuda-nvrtc-8-0 8.0.84-1 arm64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-8-0 8.0.84-1 arm64 NVRTC native dev links, headers
ii cuda-repo-l4t-8-0-local 8.0.84-1 arm64 cuda repository configuration files
ii cuda-samples-8-0 8.0.84-1 arm64 CUDA example applications
ii cuda-toolkit-8-0 8.0.84-1 arm64 CUDA Toolkit 8.0 meta-package
ii libcudnn6 6.0.21-1+cuda8.0 arm64 cuDNN runtime libraries
ii libcudnn6-dev 6.0.21-1+cuda8.0 arm64 cuDNN development libraries and headers
ii libcudnn6-doc 6.0.21-1+cuda8.0 arm64 cuDNN documents and samples
ii nv-gie-repo-ubuntu1604-ga-cuda8.0-trt2.1-20170614 1-1 arm64 nv-gie repository configuration files
I can build the CUDA C++ samples and run deviceQuery without problems, but any other sample I run crashes inside cudaMalloc(). A typical example is clock.cu.
What is interesting is that if I run the exact same binary under cuda-memcheck or cuda-gdb, it runs perfectly fine.
Since cuda-gdb and cuda-memcheck probably handle memory allocation differently, is there some environment variable I need to export? I really need this to work at this version if at all possible.
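For reference, here is the minimal repro I've been using (compiled with `nvcc -arch=sm_62 repro.cu -o repro`). Since the backtrace below points at cuDevicePrimaryCtxRetain, the segfault seems to happen during lazy primary-context creation rather than in the allocation itself, so the `cudaFree(0)` call (a common idiom to force context initialization) should already trigger the crash before cudaMalloc is reached:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // cudaFree(0) forces lazy creation of the primary context; on my setup
    // the process segfaults here, before any real allocation is attempted.
    cudaError_t err = cudaFree(0);
    printf("context init: %s\n", cudaGetErrorString(err));

    void *d = nullptr;
    err = cudaMalloc(&d, 1 << 20);  // 1 MiB device allocation
    printf("cudaMalloc: %s\n", cudaGetErrorString(err));

    cudaFree(d);
    return 0;
}
```

Under cuda-gdb or cuda-memcheck both printf lines report "no error"; run directly, the process dies with SIGSEGV before the first printf.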
./samples/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X2"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.2
Total amount of global memory: 7855 MBytes (8236851200 bytes)
( 2) Multiprocessors, (128) CUDA Cores/MP: 256 CUDA Cores
GPU Max Clock rate: 1301 MHz (1.30 GHz)
Memory Clock rate: 1600 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = NVIDIA Tegra X2
Result = PASS
Here's how it crashes under plain gdb:
Program received signal SIGSEGV, Segmentation fault.
0x0000007fb7508628 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
(gdb) bt
#0 0x0000007fb7508628 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#1 0x0000007fb74ee0c8 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#2 0x0000007fb74ee8fc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#3 0x0000007fb74f8210 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#4 0x0000007fb74f8700 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#5 0x0000007fb742c930 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#6 0x0000007fb742c9dc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7 0x0000007fb7438d60 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8 0x0000007fb7442cc0 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#9 0x0000007fb7410adc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#10 0x0000007fb741230c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#11 0x0000007fb734f118 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#12 0x0000007fb7472748 in cuDevicePrimaryCtxRetain () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#13 0x0000000000434700 in cudart::contextStateManager::initPrimaryContext(cudart::device*) ()
#14 0x0000000000434964 in cudart::contextStateManager::initDriverContext() ()
But it runs fine under cuda-memcheck:
root@4005ed8:/usr/local/cuda# bin/cuda-memcheck samples/0_Simple/asyncAPI/asyncAPI
========= CUDA-MEMCHECK
[samples/0_Simple/asyncAPI/asyncAPI] - Starting...
GPU Device 0: "NVIDIA Tegra X2" with compute capability 6.2
CUDA device [NVIDIA Tegra X2]
time spent executing by the GPU: 3883.78
time spent by CPU in CUDA calls: 19.55
CPU executed 1632906 iterations while waiting for GPU to finish
========= ERROR SUMMARY: 0 errors