Cuda-gdb doesn't fit Hopper? (tested on cutlass-example-48)

202476410arsmart · August 5, 2024, 8:31am

(base) zyhuang@sdzx-h100-1:~/cutlass/build/examples/48_hopper_warp_specialized_gemm$ cuda-gdb ./48_hopper_warp_specialized_gemm
NVIDIA (R) cuda-gdb 12.4
Portions Copyright (C) 2007-2023 NVIDIA Corporation
Based on GNU gdb 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This CUDA-GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://forums.developer.nvidia.com/c/developer-tools/cuda-developer-tools/cuda-gdb>.
Find the CUDA-GDB manual and other documentation resources online at:
    <https://docs.nvidia.com/cuda/cuda-gdb/index.html>.
--Type <RET> for more, q to quit, c to continue without paging--

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./48_hopper_warp_specialized_gemm...
(cuda-gdb) run
Starting program: /home/zyhuang/cutlass/build/examples/48_hopper_warp_specialized_gemm/48_hopper_warp_specialized_gemm 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5baa000 (LWP 774620)]
[New Thread 0x7ffff48cc000 (LWP 774621)]
[Detaching after fork from child process 774622]
Enter sm90_mma_tma_gmma_ss_warpspecialized
[New Thread 0x7ffde0f9c000 (LWP 774954)]
warning: Cuda API error detected: cudaLaunchKernelExC returned (0x1)

[ ERROR: CUDA Runtime ] /home/zyhuang/cutlass/include/cutlass/cluster_launch.hpp:176: invalid argument
warning: Cuda API error detected: cudaGetLastError returned (0x1)

Got cutlass error: Error Internal at: 439
[Thread 0x7ffff48cc000 (LWP 774621) exited]
[Thread 0x7ffde0f9c000 (LWP 774954) exited]
[Thread 0x7ffff5baa000 (LWP 774620) exited]
[Inferior 1 (process 774616) exited with code 01]

AKravets · August 5, 2024, 8:37am

Hi @202476410arsmart,
Does your application work without cuda-gdb? The log you posted suggests that there might be an issue with the cudaLaunchKernelExC call.

202476410arsmart · August 5, 2024, 3:21pm

Good idea, let me try it and get back to you tomorrow.

202476410arsmart · August 6, 2024, 2:54am

Well, the code is correct (it is taken exactly from cutlass and runs fine without cuda-gdb and without -g -G). But compiling with -g -G is very slow, and it prints many scary warnings like:

setmaxnreg would be eliminated… wmma something will be serialized…

I think this is nvcc's problem?

AKravets · August 6, 2024, 10:32am

Thank you for the reply! We are investigating the issue.

veraj · August 8, 2024, 4:03am

Hi, @202476410arsmart

I can reproduce the error. But note that the debug build of the sample fails directly, even without cuda-gdb.

local-veraj@ipp2-0051:~/cutlass/examples/examples/48_hopper_warp_specialized_gemm$ /usr/local/cuda-12.6/bin/cuda-gdb ./48_hopper_warp_specialized_gemm
NVIDIA (R) cuda-gdb 12.6
Portions Copyright (C) 2007-2024 NVIDIA Corporation
Based on GNU gdb 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This CUDA-GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://forums.developer.nvidia.com/c/developer-tools/cuda-developer-tools/cuda-gdb.
Find the CUDA-GDB manual and other documentation resources online at:
https://docs.nvidia.com/cuda/cuda-gdb/index.html.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./48_hopper_warp_specialized_gemm...
(cuda-gdb) r
Starting program: /localhome/local-veraj/cutlass/examples/examples/48_hopper_warp_specialized_gemm/48_hopper_warp_specialized_gemm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff43ff000 (LWP 4065)]
[New Thread 0x7ffff2fff000 (LWP 4066)]
[Detaching after fork from child process 4067]
[New Thread 0x7ffff1383000 (LWP 4077)]
warning: Cuda API error detected: cudaLaunchKernelExC returned (0x2)

warning: Cuda API error detected: cudaGetLastError returned (0x2)

Got cutlass error: Error Internal at: 415
[Thread 0x7ffff2fff000 (LWP 4066) exited]
[Thread 0x7ffff1383000 (LWP 4077) exited]
[Thread 0x7ffff43ff000 (LWP 4065) exited]
[Inferior 1 (process 4061) exited with code 01]
(cuda-gdb) exit

local-veraj@ipp2-0051:~/cutlass/examples/examples/48_hopper_warp_specialized_gemm$ ./48_hopper_warp_specialized_gemm
Got cutlass error: Error Internal at: 415

So this is not a cuda-gdb issue; it is related to cutlass.
Can you please check with the cutlass team directly on GitHub?
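
For anyone reading the two logs in this thread: the numbers in the cuda-gdb warnings are CUDA runtime error codes. Below is a minimal sketch of a helper that pulls the API name and code out of such a warning line and looks the code up. The names `CUDA_ERRORS`, `WARNING_RE` and `decode_warning` are made up for this illustration, and the table is deliberately partial: it only lists the values that appear here (0x1 and 0x2) plus success. Inside a CUDA program, `cudaGetErrorName` and `cudaGetErrorString` would be the usual way to get the same information.

```python
import re

# Partial table of CUDA runtime error codes (cudaError_t).
# Assumption: only the values seen in this thread, plus success, are listed.
CUDA_ERRORS = {
    0: ("cudaSuccess", "no error"),
    1: ("cudaErrorInvalidValue", "invalid argument"),
    2: ("cudaErrorMemoryAllocation", "out of memory"),
}

# Matches cuda-gdb lines such as:
#   warning: Cuda API error detected: cudaLaunchKernelExC returned (0x1)
WARNING_RE = re.compile(
    r"Cuda API error detected: (\w+) returned \((0x[0-9a-fA-F]+)\)"
)


def decode_warning(line):
    """Return (api, code, error_name, error_text), or None if the line
    is not a cuda-gdb API error warning."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    api = match.group(1)
    code = int(match.group(2), 16)
    # Codes missing from the partial table are reported as unknown.
    name, text = CUDA_ERRORS.get(code, ("unknown", "not in this partial table"))
    return api, code, name, text


if __name__ == "__main__":
    # The two distinct warnings from the logs in this thread.
    for line in (
        "warning: Cuda API error detected: cudaLaunchKernelExC returned (0x1)",
        "warning: Cuda API error detected: cudaLaunchKernelExC returned (0x2)",
    ):
        print(decode_warning(line))
```

With this table, the 0x1 in the first log corresponds to the "invalid argument" message printed from cluster_launch.hpp, while the 0x2 in the second log is a different code (out of memory), so the two runs did not fail with the same error.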

202476410arsmart · August 10, 2024, 11:22am

Thanks! OK! I will do that at once.

202476410arsmart · August 15, 2024, 2:22am

The -g build is fixed, but -G is not fixed yet. I am still tracking their progress!
