NGC PyTorch 19.07 and newer (Ubuntu 18.04) GDB issues

mneilly · October 14, 2020, 11:01pm

I have been trying to figure out why GDB does not work in the PyTorch NGC container (PyTorch | NVIDIA NGC). Both the preinstalled cuda-gdb and the distribution’s gdb fail, complaining that they cannot insert breakpoints and cannot access memory at very low addresses.

The following command, using the 19.07 image, fails:

docker run --gpus=all --ipc=host --cap-add=ALL --privileged -it nvcr.io/nvidia/pytorch:19.07-py3 /bin/bash -c 'apt update && apt install -y gdb && echo "int main() {}" > /tmp/foo.c && gcc -g -o /tmp/foo /tmp/foo.c && gdb -batch -ex "b main" -ex r /tmp/foo -ex c'

with this output:

Breakpoint 1 at 0x603: file /tmp/foo.c, line 1.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x5fa

Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x5fa

Command aborted.
$

but the same command with the 19.06 image passes:

docker run --gpus=all --ipc=host --cap-add=ALL --privileged -it nvcr.io/nvidia/pytorch:19.06-py3 /bin/bash -c 'apt update && apt install -y gdb && echo "int main() {}" > /tmp/foo.c && gcc -g -o /tmp/foo /tmp/foo.c && gdb -batch -ex "b main" -ex r /tmp/foo -ex c'

Running the same commands in the Ubuntu 16.04 and 18.04 base images also passes, while running them in the TensorRT image (TensorRT | NVIDIA NGC) fails in the same way as in the PyTorch image.
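
For anyone who wants to compare image tags side by side, the check above could be scripted roughly as follows. This is only a sketch: the helper names `classify_output`, `run_repro` and `compare_images` are made up for illustration, `-it` is dropped and `--rm` added so it can run non-interactively, and a run is counted as failing only if the "Cannot insert breakpoint" message from the output above shows up.

```shell
#!/bin/sh
# Repro steps copied verbatim from the commands in this post.
REPRO='apt update && apt install -y gdb && echo "int main() {}" > /tmp/foo.c && gcc -g -o /tmp/foo /tmp/foo.c && gdb -batch -ex "b main" -ex r /tmp/foo -ex c'

# classify_output (hypothetical helper): reads gdb output on stdin and prints
# FAIL if it contains the breakpoint error quoted above, PASS otherwise.
# Note: anything else that goes wrong (e.g. a failed image pull) is not
# detected here and would be reported as PASS.
classify_output() {
    if grep -q 'Cannot insert breakpoint'; then
        echo FAIL
    else
        echo PASS
    fi
}

# run_repro IMAGE (hypothetical helper): runs the repro in the given image
# with the same docker flags as above, minus -it and plus --rm (assumption).
run_repro() {
    docker run --gpus=all --ipc=host --cap-add=ALL --privileged --rm "$1" \
        /bin/bash -c "$REPRO" 2>&1
}

# compare_images IMAGE...: prints one "<image>: PASS|FAIL" line per image.
compare_images() {
    for image in "$@"; do
        printf '%s: %s\n' "$image" "$(run_repro "$image" | classify_output)"
    done
}
```

After sourcing the functions, `compare_images nvcr.io/nvidia/pytorch:19.06-py3 nvcr.io/nvidia/pytorch:19.07-py3` would run the two cases described in this post one after the other.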

Has anybody run into this before in these specific containers?

Thanks in advance!
