I have been trying to figure out why the PyTorch NGC container (PyTorch | NVIDIA NGC) cannot run GDB successfully. Both the installed cuda-gdb and the distribution’s gdb fail with complaints about not being able to set breakpoints and not being able to access memory at very low addresses.
The following command, using the 19.07 image, fails:
docker run --gpus=all --ipc=host --cap-add=ALL --privileged -it nvcr.io/nvidia/pytorch:19.07-py3 /bin/bash -c 'apt update && apt install -y gdb && echo "int main() {}" > /tmp/foo.c && gcc -g -o /tmp/foo /tmp/foo.c && gdb -batch -ex "b main" -ex r /tmp/foo -ex c'
with:
Breakpoint 1 at 0x603: file /tmp/foo.c, line 1.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x5fa
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x5fa
Command aborted.
$
but the same with 19.06 passes:
docker run --gpus=all --ipc=host --cap-add=ALL --privileged -it nvcr.io/nvidia/pytorch:19.06-py3 /bin/bash -c 'apt update && apt install -y gdb && echo "int main() {}" > /tmp/foo.c && gcc -g -o /tmp/foo /tmp/foo.c && gdb -batch -ex "b main" -ex r /tmp/foo -ex c'
Running the same commands with the Ubuntu 16.04 and 18.04 base images also passes, while running them with the TensorRT image (TensorRT | NVIDIA NGC) fails the same way as the PyTorch image.
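One check that might narrow this down: the low addresses in the output (0x603, 0x5fa) look like the unrelocated offsets of a position-independent executable, so comparing a PIE build against a non-PIE build could show whether gdb is failing to relocate the breakpoint. The sketch below is only a diagnostic, not a confirmed cause or fix. The helper names build_variants and run_gdb are hypothetical, and it assumes the image's gcc accepts -no-pie (GCC 6 and later do).

```shell
# Diagnostic sketch: build the same test program with and without PIE.
# Assumption: the gcc in the image accepts -no-pie (GCC 6 and later).
build_variants() {
    src=/tmp/foo.c
    echo "int main() {}" > "$src"
    gcc -g -o /tmp/foo_pie "$src"             # distribution default flags
    gcc -g -no-pie -o /tmp/foo_nopie "$src"   # fixed load address
}

# Same gdb invocation as in the failing command, applied to one binary.
run_gdb() {
    gdb -batch -ex "b main" -ex r -ex c "$1"
}
```

Calling build_variants and then run_gdb /tmp/foo_pie and run_gdb /tmp/foo_nopie inside both the 19.06 and 19.07 images would show whether only the PIE build fails. If it does, that would suggest the problem is in how gdb learns the load address in the newer image rather than in breakpoint insertion itself.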
Has anybody run into this before in these specific containers?
Thanks in advance!