Cannot connect to cuda-gdbserver: Unknown register ymm0h requested

xiaodai · August 6, 2018, 12:06pm

How to reproduce:

cuda-gdbserver :2345 some_cuda_app
cuda-gdb
target remote localhost:2345

The remote connection is then closed, and cuda-gdbserver prints the following error:

Process csr_bfs created; pid = 114385
Listening on port 2345
Remote debugging from host 127.0.0.1
cuda-gdb/7.12/gdb/gdbserver/regcache.c:264: A problem internal to GDBserver has been detected.
Unknown register ymm0h requested

Spec:
OS: Ubuntu 16.04 / Ubuntu 18.04
CUDA toolkits: 9.2
CPU: Intel(R) Xeon(R) Gold 6140 CPU / Silver 4108
GPU: Titan V / V100 PCIE 16G

Several of our machines with the above specs hit this error.

However, another machine with the following specs:

OS: Ubuntu 16.04
CUDA toolkits: 9.2
CPU: Intel I7 6800k
GPU: 1080ti

works fine.

So, is this error related to the Xeon® Scalable processor family or to the Volta GPU architecture?

Thanks for any comments!

veraj · August 10, 2018, 5:44am

Hi, xiaodai

Sorry, we can’t reproduce this issue with the configuration below:

OS: Ubuntu 16.04
CUDA toolkits: 9.2
GPU: GV100

I will forward your problem to the developers to see whether they have any idea about the error.

veraj · August 10, 2018, 5:52am

Hi, xiaodai

Can you share the source file that triggers the issue?

Or, if you just want a workaround, maybe you can run “set cuda memcheck off” in cuda-gdb.

xiaodai · August 10, 2018, 5:54am

Hi veraj,

Thanks for the reply.

After exploring this issue further, I can confirm that it is related to gdb itself (that is, it was not introduced by cuda-gdb).

On my machine, gdb versions lower than 8.1 reproduce this issue. I strongly suspect it is related to the Xeon® Scalable processor family.

I have built gdb 7.12, 8.0, and the latest 8.1.1. Only the latest 8.1.1 does NOT have this issue.

So please try to reproduce this issue on a Xeon® Scalable processor. If it can be reproduced, a fix as soon as possible would be appreciated.

xiaodai · August 10, 2018, 6:24am

I downloaded the gdb source from here: Index of /gnu/gdb

7.12.1: Unknown register ymm0h requested
8.0: Unknown register pkru requested
8.0.1: Unknown register pkru requested
8.1: good!

It can be reproduced with nothing more than a hello-world C++ program that prints a message.

xiaodai · August 10, 2018, 6:29am

https://sourceware.org/bugzilla/show_bug.cgi?id=22137

The gdb bug report about the pkru register is at the link above.

But I am not sure whether the ymm0h and pkru errors are the same issue. Since the current cuda-gdb is based on gdb 7.12 (and reports the error on ymm0h), I am not sure whether this bug is related to what we are discussing.

Thanks.
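To compare an affected machine with an unaffected one, it may help to check which extended register features each CPU advertises. The sketch below is only an illustration, not part of the original reports: it assumes an x86 Linux system where `/proc/cpuinfo` has a `flags` line, and the list of flags to look at (`avx512f`, `pku`, `ospke`, and so on) is a guess at the features that could add registers an older gdbserver does not know about. The helper names `cpu_flags` and `report` are hypothetical.

```python
# Assumption: these flags are the ones most likely to matter here
# (AVX/AVX-512 register state and the PKRU protection-key register).
FLAGS_OF_INTEREST = ("avx", "avx2", "avx512f", "pku", "ospke")


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (for example, not an x86 Linux system).
    return set()


def report(cpuinfo_text):
    """Map each flag of interest to True/False depending on its presence."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FLAGS_OF_INTEREST}
```

On each machine, something like `print(report(open("/proc/cpuinfo").read()))` would show the differences. If only the failing machines report `avx512f` or `pku`, that would be consistent with the suspicion about the CPU family, though it would not prove the cause.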

xiaodai · August 10, 2018, 6:30am

del

veraj · August 10, 2018, 6:30am

Sorry, I shouldn’t have asked you for the source. I was replying in the wrong topic.

For your problem, I have already filed a bug with our developers to ask for details.

xiaodai · August 10, 2018, 6:32am

Please also attach my follow-up posts; they may be useful. Thanks.

wumo · October 14, 2018, 6:39am

I got the same error:

OS: Ubuntu 16.04
CUDA toolkits: 10.0
GPU: GTX 1080 TI

following the tutorial https://devblogs.nvidia.com/building-cuda-applications-cmake/
and using remote gdb

cuda-gdbserver :5000 test_particles
Process test_particles created; pid = 18774
Listening on port 5000
Remote debugging from host 192.168.10.86
cuda-gdb/7.12/gdb/gdbserver/regcache.c:264: A problem internal to GDBserver has been detected.
Unknown register ymm0h requested

veraj · October 29, 2018, 2:42am

Hi, xiaodai and wumo

We have an internal issue tracking this. Thanks for reporting.
The developers are working on a fix.

xiaodai · January 24, 2019, 3:36am

Dear @veraj,

Any good news? Thanks!

veraj · January 24, 2019, 5:11am

Hi, xiaodai

Sorry, the internal issue has not been fixed yet.

rbischof · February 6, 2019, 11:31am

Just an update: We are actively working on the underlying issue that is causing this problem and, while not imminent, a fix is expected in an upcoming release (200439277).

zhangzhenkang · May 31, 2019, 6:57am

I got the same error. I have three machines: one is the host and the other two are targets. When I use remote debugging, one target machine works fine but the other does not, and the console shows the same message: unknown register ymm0h requested

The configuration of my host and target machines is as follows:

OS: Ubuntu 16.04
CUDA toolkit: 10.0
GPU: Tesla V100
cuda-gdb version: 7.12

Is there any way to avoid this?
Thanks

rbischof · May 31, 2019, 9:03am

Unfortunately there isn’t a workaround for this problem on late-model CPUs. The update to resolve this is still pending and will be in an upcoming release.

yoel_berger · July 24, 2019, 9:46am

Hi,
Do you have an estimate of when the version that resolves this will be released?

teun.vankuppeveld · December 4, 2019, 6:57am

Hi,

I have the same problem.
Is it already clear when a new release is expected?
Hope to hear from you, thanks in advance.

johnnumber · February 22, 2021, 8:23am

Any good news?

AKravets · March 11, 2021, 10:17am

Hello! Could you try the latest cuda-gdb from CUDA toolkit 11.2 (which is based on GDB 8.2+)? This issue should be fixed in that version.
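To see which GDB version an installed cuda-gdb is based on, the version banner can be parsed. This is a minimal sketch under stated assumptions: it assumes the output of `cuda-gdb --version` contains a line of the form `GNU gdb (GDB) X.Y[.Z]`, and it uses 8.1 as the first good version only because that is what was observed with upstream gdb earlier in this thread. The function names `gdb_base_version` and `likely_affected` are hypothetical.

```python
import re


def gdb_base_version(version_text):
    """Extract the GDB base version as a tuple of ints, or None if absent."""
    match = re.search(r"GNU gdb \(GDB\)\s+(\d+(?:\.\d+)*)", version_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def likely_affected(version_text, first_good=(8, 1)):
    """Return True if the base version is older than first_good.

    Returns None when no version line could be found. The default
    threshold is an assumption taken from the upstream gdb tests
    reported in this thread, not an official statement.
    """
    version = gdb_base_version(version_text)
    if version is None:
        return None
    return version < first_good
```

The text could be obtained with `subprocess.run(["cuda-gdb", "--version"], capture_output=True, text=True).stdout` and passed to `likely_affected`. A result of `True` only suggests that the debugger predates the upstream fix; whether the error appears still depends on the CPU.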