Cuda-gdb internal-error of copy_type on basic fortran example

joanib14 · December 27, 2023, 8:21pm

Hello,

I’m trying to familiarize myself with cuda-gdb before using it on my large application. I tried it on the first example from the CUDA Fortran Programming Guide but keep encountering internal errors.
I’m just getting started with GPU programming, so I could have overlooked something obvious.

Here is the example code I have been trying to run:

module mytests
    contains
    attributes(global)  &
    subroutine test1( a )
    integer, device :: a(*)
    i = threadIdx%x
    a(i) = i
    return
    end subroutine test1
end module mytests

program t1
    use cudafor
    use mytests
    integer, parameter :: n = 100
    integer, allocatable, device :: iarr(:)
    integer h(n)
    istat = cudaSetDevice(0)
    allocate(iarr(n))
    h = 0; iarr = h
    call test1<<<1,n>>> (iarr)
    h = iarr
    print *,&
    "Errors: ", count(h.ne.(/ (i,i=1,n) /))
    deallocate(iarr)
end program t1
! set break point with 
! break count.F90:6

I compile it with nvfortran -cuda -g -gpu=debug -o count count.F90

When I use cuda-gdb to stop at a breakpoint inside the kernel, I get an “internal error”. Below is the log of my commands and the errors:

> cuda-gdb ./count
NVIDIA (R) CUDA Debugger
CUDA Toolkit 12.2 release
Portions Copyright (C) 2007-2023 NVIDIA Corporation
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
--Type <RET> for more, q to quit, c to continue without paging--
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Using python library libpython3.6m.so
--Type <RET> for more, q to quit, c to continue without paging--
Reading symbols from ./count...
(cuda-gdb) break count.F90:6
Breakpoint 1 at 0x4015ee: file count.F90, line 9.
(cuda-gdb) c
The program is not being run.
(cuda-gdb) run
Starting program: /vast_swbuild/swbuild3/janibal/LAVA_RESEARCH/src/curv/test_mat_mul/count 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x15554947e000 (LWP 88918)]
[Detaching after fork from child process 88919]
[New Thread 0x15554901c000 (LWP 88929)]
[New Thread 0x155548e1b000 (LWP 88930)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Thread 1 "count" hit Breakpoint 1, mytests::test1<<<(1,1,1),(100,1,1)>>> (cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x62d087 ???
0x9e1344 ???
0x9e169c ???
0xb94ac1 ???
0x7a3036 ???
0x7a5967 ???
0x7a6469 ???
0x9f6bdb ???
0x9e56fa ???
0x71a479 ???
0x7303c2 ???
0x730a45 ???
0x78f849 ???
0x97071e ???
0x971239 ???
0x972b32 ???
0x973315 ???
0x7daacb ???
0x65ff9a ???
0x7e9813 ???
0x7dc283 ???
0x7e7030 ???
0xb9554c ???
0xb95736 ???
0x828fe4 ???
0x82a8a4 ???
0x5679e4 ???
0x153fc14b3d84 ???
0x56d8f4 ???
0xffffffffffffffff ???
---------------------
cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.

cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
a=<error reading variable: copy_type: Assertion `type->is_objfile_owned ()' failed.>) at count.F90:6
6           i = threadIdx%x
(cuda-gdb) p iarr
No symbol "iarr" in current context.
(cuda-gdb) n
7           a(i) = i
(cuda-gdb) p a(0)
cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x62d087 ???
0x9e1344 ???
0x9e169c ???
0xb94ac1 ???
0x7a3036 ???
0x7a5967 ???
0x7a6469 ???
0x9f6bdb ???
0x9e56fa ???
0x71a479 ???
0x7303c2 ???
0x730a45 ???
0x78f849 ???
0x9e68ef ???
0x772882 ???
0x78937a ???
0x771ca5 ???
0x8a1b0e ???
0x8a248a ???
0x65c653 ???
0x9c4249 ???
0x777bdb ???
0x777efa ???
0x7784e8 ???
0xa1dd74 ???
0x776fe5 ???
0x7783cc ???
0x776dbf ???
0xb9554c ???
0xb9571a ???
0x828fe4 ???
0x82a8a4 ???
0x5679e4 ???
0x153fc14b3d84 ???
0x56d8f4 ???
0xffffffffffffffff ???
---------------------
cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.

cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
copy_type: Assertion `type->is_objfile_owned ()' failed.
(cuda-gdb) l
2           contains
3           attributes(global)  &
4           subroutine test1( a )
5           integer, device :: a(*)
6           i = threadIdx%x
7           a(i) = i
8           return
9           end subroutine test1
10      end module mytests
11
(cuda-gdb)

I can answer “n” to get through all the prompts, but if I try to access information about the variables within the kernel, the errors reappear.

I’m using nvhpc 23.7 (the latest we have on our cluster).
The output of nvidia-smi is:

Wed Dec 27 12:16:18 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB           Off| 00000000:4C:00.0 Off |                    0 |
| N/A   35C    P0               53W / 400W|      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

I’ve tried modifying the compiler flags, adding -gpu=cc80 and removing -gpu=debug, but it had no effect.

veraj · December 28, 2023, 2:29am

Hi, @joanib14

Thanks for reporting the issue to us! From the content you pasted, it looks like you are using the 12.1 driver (530.30.02) with the 12.2 toolkit?

We tried to reproduce this internally and found that it only happens when the driver and toolkit versions are inconsistent. We’ll check internally whether this needs to be fixed.

Would you please update your driver to R535 and try again?
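
For anyone wanting to check for the same kind of mismatch, here is a minimal Python sketch that pulls the toolkit version out of the cuda-gdb banner and the driver's CUDA version out of the nvidia-smi header. The function names are hypothetical, and the regular expressions assume the banner formats shown in the logs in this thread; other releases may print them differently.

```python
import re

def toolkit_version(cuda_gdb_banner):
    """Return (major, minor) from a line like 'CUDA Toolkit 12.2 release'."""
    m = re.search(r"CUDA Toolkit (\d+)\.(\d+)", cuda_gdb_banner)
    return (int(m.group(1)), int(m.group(2))) if m else None

def driver_cuda_version(nvidia_smi_header):
    """Return (major, minor) from the 'CUDA Version: 12.1' field."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_header)
    return (int(m.group(1)), int(m.group(2))) if m else None

# Lines copied from the first post in this thread.
banner = "CUDA Toolkit 12.2 release"
header = "| NVIDIA-SMI 530.30.02   Driver Version: 530.30.02   CUDA Version: 12.1 |"

tk = toolkit_version(banner)
drv = driver_cuda_version(header)
if tk is not None and drv is not None and tk != drv:
    print("toolkit", tk, "and driver CUDA version", drv, "differ")
```

Both helpers return None when the expected text is missing, so an unrecognized banner is reported as unknown rather than as a match.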

joanib14 · December 28, 2023, 4:04pm

Hi,

Thanks for the quick reply. I tried a node type with the updated driver (535.104.12) but still got the same result.

Here is the log of what I tried and the output of nvidia-smi.

> nvfortran -cuda -g -gpu=debug,cc80 -o count count.F90
> cuda-gdb ./count
NVIDIA (R) CUDA Debugger
CUDA Toolkit 12.2 release
Portions Copyright (C) 2007-2023 NVIDIA Corporation
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Using python library libpython3.6m.so
Reading symbols from ./count...
(cuda-gdb) break count.F90:6
Breakpoint 1 at 0x4015ee: file count.F90, line 9.
(cuda-gdb) run
Starting program: /vast_swbuild/swbuild3/janibal/LAVA_RESEARCH/src/curv/test_mat_mul/count 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x15554953e000 (LWP 24821)]
[Detaching after fork from child process 24822]
[New Thread 0x155548dfb000 (LWP 24835)]
[New Thread 0x155548bfa000 (LWP 24836)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Thread 1 "count" hit Breakpoint 1, mytests::test1<<<(1,1,1),(100,1,1)>>> (cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x62d087 ???
0x9e1344 ???
0x9e169c ???
0xb94ac1 ???
0x7a3036 ???
0x7a5967 ???
0x7a6469 ???
0x9f6bdb ???
0x9e56fa ???
0x71a479 ???
0x7303c2 ???
0x730a45 ???
0x78f849 ???
0x97071e ???
0x971239 ???
0x972b32 ???
0x973315 ???
0x7daacb ???
0x65ff9a ???
0x7e9813 ???
0x7dc283 ???
0x7e7030 ???
0xb9554c ???
0xb95736 ???
0x828fe4 ???
0x82a8a4 ???
0x5679e4 ???
0x155553ee4d84 ???
0x56d8f4 ???
0xffffffffffffffff ???
---------------------
cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

This is a bug, please report it.  For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.

cuda-gdb/12/gdb/gdbtypes.c:5831: internal-error: copy_type: Assertion `type->is_objfile_owned ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
> nvidia-smi
Thu Dec 28 08:03:31 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:09:00.0 Off |                    0 |
| N/A   29C    P0              62W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  | 00000000:C7:00.0 Off |                    0 |
| N/A   35C    P0              72W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

veraj · December 29, 2023, 3:15am

Hi, @joanib14

We tried more combinations today.
12.2 toolkit + 12.1 driver (530.30.02): fail
12.2 toolkit + 12.2 driver (535.104.12): fail
12.3 toolkit + 12.1 driver (530.30.02): pass
12.3 toolkit + 12.2 driver (535.104.12): pass
12.3 toolkit + 12.3 driver (545.23.08): pass

So it seems the issue is already fixed in CUDA 12.3.
Would you please try our latest CUDA 12.3? Sorry for the inaccurate info in my last reply.
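
The results above can be written down as a small lookup table. This is a minimal Python sketch that assumes only the five combinations listed in this post; the names TESTED and lookup are hypothetical, and any pair that was not tested returns None rather than a guess.

```python
# (toolkit, driver) -> result, copied from the combinations tested in this post.
TESTED = {
    ("12.2", "530.30.02"): "fail",
    ("12.2", "535.104.12"): "fail",
    ("12.3", "530.30.02"): "pass",
    ("12.3", "535.104.12"): "pass",
    ("12.3", "545.23.08"): "pass",
}

def lookup(toolkit, driver):
    """Return 'pass' or 'fail' for a tested pair, or None if it was not tested."""
    return TESTED.get((toolkit, driver))

# Group the outcomes by toolkit version to see whether the driver matters.
by_toolkit = {}
for (toolkit, _driver), result in TESTED.items():
    by_toolkit.setdefault(toolkit, set()).add(result)
print(by_toolkit)
```

Grouping by toolkit shows that, within these five runs, the outcome follows the toolkit version and not the driver version.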

joanib14 · January 2, 2024, 7:23pm

After the HPC admins added nvhpc 23.11, I am no longer having any issues with my toy problem. Thanks for your help!

system · January 16, 2024, 7:24pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
