cuda-gdbserver cannot be connected, Unknown register ymm0h requested

How to reproduce:

  1. cuda-gdbserver :2345 some_cuda_app
  2. cuda-gdb
  3. target remote localhost:2345

The remote connection is then closed, and gdbserver prints the following error:

Process csr_bfs created; pid = 114385
Listening on port 2345
Remote debugging from host 127.0.0.1
cuda-gdb/7.12/gdb/gdbserver/regcache.c:264: A problem internal to GDBserver has been detected.
Unknown register ymm0h requested
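
For anyone who wants to run the same check on several machines, the three steps above can be scripted. This is only a minimal Python sketch, not an NVIDIA tool. `reproduce` and `find_unknown_register` are hypothetical helper names, the one-second wait is a guess, and `-batch` / `-ex` are the standard GDB command-line options, which cuda-gdb is assumed to accept.

```python
import re
import subprocess
import time

# Matches the fatal line gdbserver prints, e.g. "Unknown register ymm0h requested".
UNKNOWN_REG = re.compile(r"Unknown register (\w+) requested")


def find_unknown_register(log_text):
    """Return the register name reported in a gdbserver log, or None if absent."""
    match = UNKNOWN_REG.search(log_text)
    return match.group(1) if match else None


def reproduce(app, port=2345, server="cuda-gdbserver", client="cuda-gdb"):
    """Start the server, attach once in batch mode, and return the server log."""
    srv = subprocess.Popen(
        [server, f":{port}", app],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    time.sleep(1.0)  # crude wait for "Listening on port ..." (assumption)
    try:
        subprocess.run(
            [client, "-batch", "-ex", f"target remote localhost:{port}"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=60,
        )
    except subprocess.TimeoutExpired:
        pass  # the client hung; the server log is still worth reading
    finally:
        srv.kill()  # stop the server if it is still running
    log_text, _ = srv.communicate()
    return log_text
```

Calling `find_unknown_register(reproduce("./some_cuda_app"))` on an affected machine should return the register name from the log (`ymm0h` in the output above), and `None` where the connection succeeds.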

Spec:
OS: Ubuntu 16.04 / Ubuntu 18.04
CUDA toolkits: 9.2
CPU: Intel(R) Xeon(R) Gold 6140 CPU / Silver 4108
GPU: Titan V / V100 PCIE 16G

Several of our machines with the above specs hit this error.

However, another machine with the following specs:

OS: Ubuntu 16.04
CUDA toolkits: 9.2
CPU: Intel I7 6800k
GPU: 1080ti

works fine.

So, is this error related to the Xeon® Scalable Processors family or to the Volta-based GPU architecture?

Thanks for any comments!

Hi, xiaodai

Sorry, we can’t reproduce this issue with the configuration below:

OS: Ubuntu 16.04
CUDA toolkits: 9.2
GPU: GV100

I will pass your problem on to the developers to see if they have any idea about the error.

Hi, xiaodai

Can you share the source file that triggers the issue?

Or, if you just want a workaround, you can try “set cuda memcheck off” in cuda-gdb.

Hi veraj,

Thanks for the reply.

After digging further into this issue, I can confirm that it comes from gdb itself (that is, it is not introduced by cuda-gdb).

On my machine, gdb versions lower than 8.1 reproduce this issue. I strongly suspect it is related to the Xeon® Scalable Processors family.
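
One way to test that suspicion is to compare the CPU feature flags the Linux kernel reports on a failing machine and on a working one. In GDB's x86 register naming, `ymm0h` is the upper half of an AVX register and `pkru` belongs to the protection-keys feature, so the flags checked below are a reasonable starting point. That selection is an assumption, not a confirmed root cause, and the helper names are made up for this sketch.

```python
# Flags worth comparing between machines (an assumed selection, see above).
FLAGS_OF_INTEREST = ("avx", "avx2", "avx512f", "pku", "ospke")


def parse_cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def summarize(cpuinfo_text):
    """Map each flag of interest to True/False for the given cpuinfo text."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FLAGS_OF_INTEREST}


def read_local_cpuinfo(path="/proc/cpuinfo"):
    """Read the local cpuinfo file (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

Running `summarize(read_local_cpuinfo())` on each machine and diffing the results shows whether the failing hosts expose features that the working host lacks.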

I have tried building gdb 7.12, 8.0 and the latest, 8.1.1. Only the latest, 8.1.1, does NOT have this issue.

So please try to reproduce this issue on a CPU from the Xeon® Scalable Processors family. If it can be reproduced, a fix as soon as possible would be appreciated.

I downloaded the gdb sources from here: Index of /gnu/gdb

7.12.1: Unknown register ymm0h requested
8.0: Unknown register pkru requested
8.0.1: Unknown register pkru requested
8.1: good!

It can be reproduced with a simple hello-world C++ program.

The gdb bug report about the pkru register is here:

https://sourceware.org/bugzilla/show_bug.cgi?id=22137

But I am not sure whether the ymm0h and pkru errors are the same issue. Since the current cuda-gdb is based on gdb 7.12 (and reports the error on ymm0h), I am not sure whether this bug is related to what we are discussing.

Thanks.

Sorry, I shouldn’t have asked you for the source. I was replying in the wrong topic.

For your problem, I have already filed a bug with our developers to ask for details.

Please also attach my follow-up posts; they may be useful. Thanks.

I got the same error:

OS: Ubuntu 16.04
CUDA toolkits: 10.0
GPU: GTX 1080 TI

following the tutorial https://devblogs.nvidia.com/building-cuda-applications-cmake/
and using remote gdb

cuda-gdbserver :5000 test_particles
Process test_particles created; pid = 18774
Listening on port 5000
Remote debugging from host 192.168.10.86
cuda-gdb/7.12/gdb/gdbserver/regcache.c:264: A problem internal to GDBserver has been detected.
Unknown register ymm0h requested

Hi, xiaodai and wumo

We have an internal issue tracking this. Thanks for reporting.
The developers are working on a fix.

Dear @veraj,

Any good news? Thanks!

Hi, xiaodai

Sorry, the internal issue has not been fixed yet.

Just an update: we are actively working on the underlying issue that causes this problem and, while not imminent, a fix is expected in an upcoming release (200439277).

I got the same error. I have three machines: one host and two targets. With remote debugging, one target works fine but the other does not, and the console shows the same message: Unknown register ymm0h requested

My host and target machines are configured as follows:

OS: Ubuntu 16.04
CUDA toolkit: 10.0
GPU: Tesla V100
cuda-gdb version: 7.12

Is there any way to avoid this?
Thanks

Unfortunately there isn’t a workaround for this problem on late-model CPUs. The update to resolve this is still pending and will be in an upcoming release.

Hi,
Do you have an estimate of when the version that resolves this will be released?

Hi,

I have the same problem.
Is it already clear when a new release is expected?
Hope to hear from you, thanks in advance.

Any good news?

Hello! Could you try the latest cuda-gdb from CUDA toolkit 11.2 (which is based on GDB 8.2+)? This issue should be fixed in that version.
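
To check which GDB base a given debugger build reports, something like the Python sketch below may help. It assumes the `--version` output contains a banner line of the form `GNU gdb (GDB) X.Y`, which is how stock GDB prints it. The exact cuda-gdb banner is not confirmed here, the function names are hypothetical, and the 8.1 threshold is only what this thread reported for stock GDB builds.

```python
import re

# Assumed banner shape, e.g. "GNU gdb (GDB) 7.12" or "GNU gdb (Ubuntu ...) 8.1".
GDB_BANNER = re.compile(r"GNU gdb \([^)]*\) (\d+)\.(\d+)")


def gdb_base_version(version_output):
    """Return (major, minor) parsed from a version banner, or None if not found."""
    match = GDB_BANNER.search(version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def likely_affected(version_output):
    """True if the base is older than 8.1, the first version reported good here."""
    base = gdb_base_version(version_output)
    if base is None:
        return None  # unknown banner format; make no claim
    return base < (8, 1)
```

The text to feed in can be captured with `subprocess.run(["cuda-gdb", "--version"], capture_output=True, text=True).stdout`. A `None` result means the banner did not match the assumed format, not that the build is safe.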