New to NVIDIA Drive OS, Hands on, Next steps

I have a server set up with NVIDIA Drive OS installed, but I am still waiting for the Drive AGX Pegasus hardware.

I want to start with sensor interfacing and work my way up the layers. What is the best documentation to read?

The cross-compiler and the examples would probably be a good starting point.

Hi @jpvans,

You can download https://developer.nvidia.com/DRIVE/secure/docs/NVIDIA_DRIVE_SW_10_References.zip and start with the samples and the build instructions, described in /DRIVE_Software/DRIVE_AV_doxy_files/dwx_samples_section.html#dwx_other_sensor_samples_group and /DRIVE_Software/DRIVE_AV_doxy_files/dwx_samples_building.html respectively.

@VickNV

Thanks for the tip. I installed the NVIDIA Drive SDK on my machine. The installation went fine, but when compiling the sample files I get the following:

[ 94%] Linking CXX executable sample_stereo_disparity
/usr/bin/x86_64-linux-gnu-ld: warning: libcuda.so.1, needed by /usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so, not found (try using -rpath or -rpath-link)
/usr/bin/x86_64-linux-gnu-ld: warning: libnvcuvid.so.1, needed by /usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so, not found (try using -rpath or -rpath-link)
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuGetErrorName'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidGetSourceVideoFormat'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidDestroyDecoder'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuDeviceGet'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidUnmapVideoFrame64'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidDestroyVideoParser'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidMapVideoFrame64'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidCtxLockCreate'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidGetDecoderCaps'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidParseVideoData'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidCreateVideoSource'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidDecodePicture'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidCreateDecoder'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuCtxGetCurrent'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuGetErrorString'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidCtxLockDestroy'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidCreateVideoParser'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuvidDestroyVideoSource'
/usr/local/driveworks-2.2/targets/x86_64-Linux/lib/libdriveworks.so: undefined reference to `cuInit'
collect2: error: ld returned 1 exit status
src/imageprocessing/stereo/stereo/CMakeFiles/sample_stereo_disparity.dir/build.make:122: recipe for target 'src/imageprocessing/stereo/stereo/sample_stereo_disparity' failed
make[2]: *** [src/imageprocessing/stereo/stereo/sample_stereo_disparity] Error 1
CMakeFiles/Makefile2:1625: recipe for target 'src/imageprocessing/stereo/stereo/CMakeFiles/sample_stereo_disparity.dir/all' failed
make[1]: *** [src/imageprocessing/stereo/stereo/CMakeFiles/sample_stereo_disparity.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

The library is present in the downloaded SDK folder:

~/nvidia_drive/build$ locate libcuda.so.1
/home/jvilela/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DDPX/DRIVEOS/drive-t186ref-linux/lib-target/libcuda.so.1
/home/jvilela/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DDPX/DRIVEOS/drive-t186ref-linux/targetfs_a/usr/lib/libcuda.so.1
/home/jvilela/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DDPX/DRIVEOS/drive-t186ref-linux/targetfs_b/usr/lib/libcuda.so.1

Do I need to take additional steps to add it to the PATH or the system library path?

Please advise.
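
A quick way to see whether the host linker can find a library is to look it up in the dynamic linker cache. The sketch below is only an illustration: `find_in_ldcache` is a hypothetical helper name, and it assumes the usual `ldconfig -p` line format (`name (abi) => path`). Note that the copies listed by `locate` above sit under `drive-t186ref-linux`, which appears to be the target filesystem for the Drive hardware rather than the x86 host; on the host, `libcuda.so.1` normally comes from the NVIDIA GPU driver package.

```shell
# Hypothetical helper: read `ldconfig -p` style lines on stdin and print the
# path of every cache entry whose library name equals $1.
find_in_ldcache() {
    wanted="$1"
    while IFS= read -r line; do
        # Strip the leading whitespace that ldconfig puts before each entry.
        entry="${line#"${line%%[![:space:]]*}"}"
        # The library name is everything up to the first space.
        name="${entry%% *}"
        if [ "$name" = "$wanted" ]; then
            # The resolved path follows the "=> " separator.
            printf '%s\n' "${entry##*=> }"
        fi
    done
}
```

Typical use would be `ldconfig -p | find_in_ldcache libcuda.so.1`. Empty output suggests the host has no registered copy of the library, which would be consistent with the "not found" warnings from ld.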

Did you see any warnings or error messages when running the cmake command? Is this the same problem as in the topic "Problem in cmaking an application"?

I was able to work around this. Now, when I run the sample projects remotely on my GPU server, specifically the LaneDetection sample, it crashes because of X11, which should be working:

~/build/src/laneDetection$ ./sample_lane_detection
WindowGLFW: Failed create window
terminate called after throwing an instance of 'std::exception'
what(): std::exception
Aborted (core dumped)

The host system shows:

~/build/src/hello_world$ ./sample_hello_world


Welcome to Driveworks SDK
[13-07-2020 16:50:45] Platform: Detected Generic x86 Platform
[13-07-2020 16:50:45] TimeSource: monotonic epoch time offset is 1588339735176813
[13-07-2020 16:50:45] Platform: number of GPU devices detected 4
[13-07-2020 16:50:45] Platform: currently selected GPU device discrete ID 0
[13-07-2020 16:50:45] SDK: Resources mounted from /usr/local/driveworks-2.2/data/
[13-07-2020 16:50:45] TimeSource: monotonic epoch time offset is 1588339735176813
[13-07-2020 16:50:45] Initialize DriveWorks SDK v2.2.3136
[13-07-2020 16:50:45] Release build with GNU 7.4.0 from heads/buildbrain-branch-0-gca7b4b26e65
Context of Driveworks SDK successfully initialized.
Version: 2.2.3136
GPU devices detected: 4
[13-07-2020 16:50:45] Platform: currently selected GPU device discrete ID 0

Device: 0, Tesla V100-SXM2-16GB
CUDA Driver Version / Runtime Version : 10.2 / 10.2
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory in MBytes:16160.5
Memory Clock rate Khz: 877000
Memory Bus Width bits: 4096
L2 Cache Size: 6291456
Maximum 1D Texture Dimension Size (x): 131072
Maximum 2D Texture Dimension Size (x,y): 131072, 65536
Maximum 3D Texture Dimension Size (x,y,z): 16384, 16384, 16384
Maximum Layered 1D Texture Size, (x): 32768 num: 2048
Maximum Layered 2D Texture Size, (x,y): 32768, 32768 num: 2048
Total amount of constant memory bytes: 65536
Total amount of shared memory per block bytes: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): 1024,1024,64
Max dimension size of a grid size (x,y,z): 2147483647,65535,65535
Maximum memory pitch bytes: 2147483647
Texture alignment bytes: 512
Concurrent copy and kernel execution: Yes, copy engines num: 5
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID: 0, Device PCI Bus ID: 4, Device PCI location ID: 0
Compute Mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
Concurrent kernels: 1
Concurrent memory: 1

[13-07-2020 16:50:45] Platform: currently selected GPU device discrete ID 1

Device: 1, Tesla V100-SXM2-16GB
CUDA Driver Version / Runtime Version : 10.2 / 10.2
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory in MBytes:16160.5
Memory Clock rate Khz: 877000
Memory Bus Width bits: 4096
L2 Cache Size: 6291456
Maximum 1D Texture Dimension Size (x): 131072
Maximum 2D Texture Dimension Size (x,y): 131072, 65536
Maximum 3D Texture Dimension Size (x,y,z): 16384, 16384, 16384
Maximum Layered 1D Texture Size, (x): 32768 num: 2048
Maximum Layered 2D Texture Size, (x,y): 32768, 32768 num: 2048
Total amount of constant memory bytes: 65536
Total amount of shared memory per block bytes: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): 1024,1024,64
Max dimension size of a grid size (x,y,z): 2147483647,65535,65535
Maximum memory pitch bytes: 2147483647
Texture alignment bytes: 512
Concurrent copy and kernel execution: Yes, copy engines num: 5
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID: 0, Device PCI Bus ID: 6, Device PCI location ID: 0
Compute Mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
Concurrent kernels: 1
Concurrent memory: 1

[13-07-2020 16:50:45] Platform: currently selected GPU device discrete ID 2

Device: 2, Tesla V100-SXM2-16GB
CUDA Driver Version / Runtime Version : 10.2 / 10.2
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory in MBytes:16160.5
Memory Clock rate Khz: 877000
Memory Bus Width bits: 4096
L2 Cache Size: 6291456
Maximum 1D Texture Dimension Size (x): 131072
Maximum 2D Texture Dimension Size (x,y): 131072, 65536
Maximum 3D Texture Dimension Size (x,y,z): 16384, 16384, 16384
Maximum Layered 1D Texture Size, (x): 32768 num: 2048
Maximum Layered 2D Texture Size, (x,y): 32768, 32768 num: 2048
Total amount of constant memory bytes: 65536
Total amount of shared memory per block bytes: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): 1024,1024,64
Max dimension size of a grid size (x,y,z): 2147483647,65535,65535
Maximum memory pitch bytes: 2147483647
Texture alignment bytes: 512
Concurrent copy and kernel execution: Yes, copy engines num: 5
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID: 0, Device PCI Bus ID: 7, Device PCI location ID: 0
Compute Mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
Concurrent kernels: 1
Concurrent memory: 1

[13-07-2020 16:50:45] Platform: currently selected GPU device discrete ID 3

Device: 3, Tesla V100-SXM2-16GB
CUDA Driver Version / Runtime Version : 10.2 / 10.2
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory in MBytes:16160.5
Memory Clock rate Khz: 877000
Memory Bus Width bits: 4096
L2 Cache Size: 6291456
Maximum 1D Texture Dimension Size (x): 131072
Maximum 2D Texture Dimension Size (x,y): 131072, 65536
Maximum 3D Texture Dimension Size (x,y,z): 16384, 16384, 16384
Maximum Layered 1D Texture Size, (x): 32768 num: 2048
Maximum Layered 2D Texture Size, (x,y): 32768, 32768 num: 2048
Total amount of constant memory bytes: 65536
Total amount of shared memory per block bytes: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): 1024,1024,64
Max dimension size of a grid size (x,y,z): 2147483647,65535,65535
Maximum memory pitch bytes: 2147483647
Texture alignment bytes: 512
Concurrent copy and kernel execution: Yes, copy engines num: 5
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID: 0, Device PCI Bus ID: 8, Device PCI location ID: 0
Compute Mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
Concurrent kernels: 1
Concurrent memory: 1

[13-07-2020 16:50:45] Releasing Driveworks SDK Context
Happy autonomous driving!

Please try running "export DISPLAY=:0" before launching the sample_lane_detection application.

I am still having issues:

jv@gpu07:~/build/src/drivenet/drivenet$ export DISPLAY=:0
jv@gpu07:~/build/src/drivenet/drivenet$ ls
CMakeFiles cmake_install.cmake Makefile sample_drivenet
jv@gpu07:~/build/src/drivenet/drivenet$ ./sample_drivenet
WindowGLFW: Failed initialize GLFW
terminate called after throwing an instance of 'std::exception'
what(): std::exception
Aborted (core dumped)

Interestingly, xclock fails in the same shell:
jv@gpu07:~/build/src/laneDetection$ xclock
Error: Can't open display: :0

But if I log out, log back in, and run xclock directly, it works.

You can run "who" to get the assigned display number and then set the DISPLAY variable to that number with the export command. If you can run "xclock" in a shell, you should be able to run the sample_drivenet application there as well.


Hello @jpvans,

It seems you are trying to run the DW samples on a DGX Station or an equivalent machine, where the four V100 GPUs do not have a display port.

You need to have a display connected and run the samples locally (or use a remote desktop client/server application such as VNC, TeamViewer, or NoMachine).

Running the samples through ssh with X11 forwarding will not work. (If you execute echo $DISPLAY after logging in with X11 forwarding enabled, you will probably get something like localhost:10.0.) Please refer to the following post:

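
To tell the cases discussed in this thread apart, it can help to look at the shape of the DISPLAY value before launching a sample. The following is a rough heuristic sketch, not part of DriveWorks: `classify_display` is a hypothetical helper name, and it only inspects the string (X11 display names have the form host:display.screen), so it cannot confirm that an X server is actually reachable.

```shell
# Hypothetical helper: classify an X11 DISPLAY value by its form only.
#   ""               -> unset                (no display configured)
#   ":0", ":1.0"     -> local                (display on this machine)
#   "localhost:10.0" -> remote-or-forwarded  (typical of ssh X11 forwarding)
classify_display() {
    case "$1" in
        "")  echo "unset" ;;
        :*)  echo "local" ;;
        *:*) echo "remote-or-forwarded" ;;
        *)   echo "invalid" ;;
    esac
}
```

For example, `classify_display "$DISPLAY"` in the login shell reports which situation applies. Per the reply above, a forwarded display such as localhost:10.0 is not expected to work for the samples, so a "local" result in a session with a real display attached is the case to aim for.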