CUDA-GDB captured "Illegal access to address" exception when I invoke child kernel, but the result is correct when free run

Lennoxwu · March 20, 2017, 5:41pm

Hi all,
I invoke a child kernel to do sorting of thrust, the result is correct.
But when I use CUDA-GDB/Nsingt to check the program, the “Illegal access to address” exception was captured.
I have read related documents, I’m sure the passed arguments are not local variables. So I try to pass
arguments only, and the kernel so nothing. However, the exception still occurs. Only the kernel without argument can
pass the check. Is it a real bug in my program?

Provide some experiments

__global__ void kernel_sorting(int a){
//nothing
}
__device__ int v;
__global__ void kernel_main(....){
int tid = threadIdx.x + blockDim.x * blockIdx.x;
....
if(tid==0)
    kernel_sorting<<<gridDim.x,blockDim.x>>>(v); //pass int
}

GDB show: Illegal access to address (@global)0x70382d540 detected. 
So v is in the global memory

__global__ void kernel_sorting(int *a){
//nothing
}
__device__ int v;
__global__ void kernel_main(....){
int tid = threadIdx.x + blockDim.x * blockIdx.x;
....
if(tid==0)
    kernel_sorting<<<gridDim.x,blockDim.x>>>(&v); //pass address
}
//GDB show: Illegal access to address (@global)0x70382d540 detected. 
//So v is in the global memory

__global__ void kernel_sorting(int *a){
//nothing
}
__device__ int v;
__global__ void kernel_main(int *input){
int tid = threadIdx.x + blockDim.x * blockIdx.x;
....
if(tid==0)
    kernel_sorting<<<gridDim.x,blockDim.x>>>(input); //pass an argument of parent kernel, a buffer in global memory 
}

provide more GDB’s information

Breakpoint 1, kernel_main<<<(1,1,1),(64,1,1)>>> (…) at …/fpg.cu:583
583 kernel_sorting<<<gridDim.x,blockDim.x>>>(&v);
(cuda-gdb) p &v
$1 = (@global int *) 0x70528dc00
(cuda-gdb) c
Continuing.

Illegal access to address (@global)0x703623d40 detected.

Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 0, grid 22, block (0,0,0), thread (0,0,0), device 0, sm 4, warp 1, lane 0]
0x00007ffff06340a8 in kernel_main<<<(1,1,1),(64,1,1)>>> (…)at …/fpg.cu:583
583 kernel_sorting<<<gridDim.x,blockDim.x>>>(&v);

//The address triggers the exception is not the address of v. Who touch the address?

Robert_Crovella · March 20, 2017, 5:46pm

Here is my suggestion, if you want help:

[url]https://devtalk.nvidia.com/default/topic/999733/cuda-programming-and-performance/function-only-works-correctly-with-cuda-8-0-in-release-mode-/post/5109126/#5109126[/url]

Lennoxwu · March 20, 2017, 6:39pm

Thanks for your suggestion, I create a project with Nsignt. The case can be reproduce.
I tried cuda-memcheck, but the program can be executed successfully with error.
You can switch the child kernel, all of them cause exception on my PC.
CUDA version : 8.0
GPU driver version:375.26
OS details : OpenSuse 42
the GPU model you are running on : Maxwell

#include <iostream>
#include <numeric>
#include <stdlib.h>
#include <stdio.h>
static void CheckCudaErrorAux (const char *, unsigned, const char *, cudaError_t);
#define CUDA_CHECK_RETURN(value) CheckCudaErrorAux(__FILE__,__LINE__, #value, value)

/**
 * CUDA kernel that computes reciprocal values for a given vector
 */

__global__ void kernel_sorting1(int *data) {
unsigned idx = blockIdx.x*blockDim.x+threadIdx.x;
	printf("tid:%d\n",idx);

}
__global__ void kernel_sorting2(int *data) {
unsigned idx = blockIdx.x*blockDim.x+threadIdx.x;
	printf("tid:%d\n",idx);

}

__global__ void kernel_sorting3(int data) {
	unsigned idx = blockIdx.x*blockDim.x+threadIdx.x;
	printf("tid:%d\n",idx);

}

__device__ int v;
__global__ void kernel_main(int *data) {
	unsigned idx = blockIdx.x*blockDim.x+threadIdx.x;
	int x =1;
	if (idx==0){
		kernel_sorting1<<<1,64>>>(data);
		//kernel_sorting2<<<1,64>>>(&v);
		//kernel_sorting3<<<1,64>>>(x);
	}
}

int main(void)
{
	int *input;
	cudaMalloc((void**) &input, sizeof(int) * 100 );
	cudaMemset(input, 0, sizeof(int) * 100 );
	kernel_main<<<1,64>>>(input);
	cudaFree(input);
	return 0;
}

NVIDIA (R) CUDA Debugger
8.0 release
Portions Copyright (C) 2007-2016 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “x86_64-unknown-linux-gnu”.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/…
Reading symbols from /home/…/work/pp/Debug/pp…done.
(cuda-gdb) r
Starting program: /home/…/work/pp/Debug/./pp
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib64/libthread_db.so.1”.
[New Thread 0x7ffff5f31700 (LWP 5103)]
[New Thread 0x7ffff5730700 (LWP 5104)]
[New Thread 0x7ffff4f2f700 (LWP 5105)]

Illegal access to address (@global)0x7032fb540 detected.

Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 0, grid 3, block (0,0,0), thread (0,0,0), device 0, sm 4, warp 1, lane 0]
0x0000000000d0d2c8 in kernel_main<<<(1,1,1),(64,1,1)>>> (data=0x705380000) at …/src/pp.cu:44
44 kernel_sorting1<<<1,64>>>(data);

Robert_Crovella · March 20, 2017, 6:56pm

I don’t have any trouble running your code on a sm_35 device, cuda 8, CentOS7:

$ nvcc -g -G -arch=sm_35 -o t1303 t1303.cu -rdc=true -lcudadevrt
t1303.cu(32): warning: variable "x" was declared but never referenced

t1303.cu(5): warning: function "CheckCudaErrorAux" was declared but never referenced

t1303.cu(32): warning: variable "x" was declared but never referenced

t1303.cu(5): warning: function "CheckCudaErrorAux" was declared but never referenced

$ cuda-gdb ./t1303
NVIDIA (R) CUDA Debugger
8.0 release
Portions Copyright (C) 2007-2016 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/user2/misc/t1303...done.
(cuda-gdb) r
Starting program: /home/user2/misc/./t1303
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff5f8e700 (LWP 837)]
[New Thread 0x7ffff578d700 (LWP 838)]
tid:0
tid:1
tid:2
tid:3
tid:4
tid:5
tid:6
tid:7
tid:8
tid:9
tid:10
tid:11
tid:12
tid:13
tid:14
tid:15
tid:16
tid:17
tid:18
tid:19
tid:20
tid:21
tid:22
tid:23
tid:24
tid:25
tid:26
tid:27
tid:28
tid:29
tid:30
tid:31
tid:32
tid:33
tid:34
tid:35
tid:36
tid:37
tid:38
tid:39
tid:40
tid:41
tid:42
tid:43
tid:44
tid:45
tid:46
tid:47
tid:48
tid:49
tid:50
tid:51
tid:52
tid:53
tid:54
tid:55
tid:56
tid:57
tid:58
tid:59
tid:60
tid:61
tid:62
tid:63
[Thread 0x7ffff578d700 (LWP 838) exited]
[Thread 0x7ffff7fcc740 (LWP 810) exited]
[Inferior 1 (process 810) exited normally]
(cuda-gdb)

It doesn’t appear to me that OpenSUSE 42 is an officially supported linux distro for CUDA:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

but I’m not sure that is related to your issue. I don’t think there is anything wrong with your cuda code as posted, and I think your cuda-memcheck results support that claim.

Lennoxwu · March 20, 2017, 7:04pm

Hi txbob,
Thanks again for your help. Maybe I should try other OS.
Actually, cuda-memcheck always reports “Internal error (7)” on my PC.
Even if I use it to check CUDA sample provided by NVIDIA.
Have you met the case?

========= CUDA-MEMCHECK
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 750 Ti                                                                                                                                                                
 Quick Mode                                                                                                                                                                                  
                                                                                                                                                                                             
 Host to Device Bandwidth, 1 Device(s)                                                                                                                                                       
 PINNED Memory Transfers                                                                                                                                                                     
   Transfer Size (Bytes)        Bandwidth(MB/s)                                                                                                                                              
   33554432                     12753.9                                                                                                                                                      

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12794.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     70159.5

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
========= Internal error (7)
========= No CUDA-MEMCHECK results found

I don’t have any trouble running your code on a sm_35 device, cuda 8, CentOS7:

$ nvcc -g -G -arch=sm_35 -o t1303 t1303.cu -rdc=true -lcudadevrt
t1303.cu(32): warning: variable "x" was declared but never referenced

t1303.cu(5): warning: function "CheckCudaErrorAux" was declared but never referenced

t1303.cu(32): warning: variable "x" was declared but never referenced

t1303.cu(5): warning: function "CheckCudaErrorAux" was declared but never referenced

$ cuda-gdb ./t1303
NVIDIA (R) CUDA Debugger
8.0 release
Portions Copyright (C) 2007-2016 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/user2/misc/t1303...done.
(cuda-gdb) r
Starting program: /home/user2/misc/./t1303
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff5f8e700 (LWP 837)]
[New Thread 0x7ffff578d700 (LWP 838)]
tid:0
tid:1
tid:2
tid:3
tid:4
tid:5
tid:6
tid:7
tid:8
tid:9
tid:10
tid:11
tid:12
tid:13
tid:14
tid:15
tid:16
tid:17
tid:18
tid:19
tid:20
tid:21
tid:22
tid:23
tid:24
tid:25
tid:26
tid:27
tid:28
tid:29
tid:30
tid:31
tid:32
tid:33
tid:34
tid:35
tid:36
tid:37
tid:38
tid:39
tid:40
tid:41
tid:42
tid:43
tid:44
tid:45
tid:46
tid:47
tid:48
tid:49
tid:50
tid:51
tid:52
tid:53
tid:54
tid:55
tid:56
tid:57
tid:58
tid:59
tid:60
tid:61
tid:62
tid:63
[Thread 0x7ffff578d700 (LWP 838) exited]
[Thread 0x7ffff7fcc740 (LWP 810) exited]
[Inferior 1 (process 810) exited normally]
(cuda-gdb)

It doesn’t appear to me that OpenSUSE 42 is an officially supported linux distro for CUDA:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

but I’m not sure that is related to your issue. I don’t think there is anything wrong with your cuda code as posted, and I think your cuda-memcheck results support that claim.

Robert_Crovella · March 20, 2017, 7:21pm

Sounds like there is something wrong with your CUDA install. I don’t know what it is exactly. Previously you said cuda-memcheck reported no errors. On my system, cuda-memcheck reported no errors on your code (and I didn’t get any in cuda-gdb either).

Lennoxwu · March 20, 2017, 7:27pm

Thanks. Because the internal error always occurs, even if The output is right, so I just ignore the error.
Thanks for your help, changing an OS is easier than debug :)

Topic		Replies	Views
Cuda-gdb aborted CUDA-GDB	7	119	November 24, 2024
Cuda-gdb internal-error of copy_type on basic fortran example CUDA-GDB cuda-gdb	5	963	January 2, 2024
Cuda-gdb doesn't break and/or step into Kernels CUDA Programming and Performance	26	53618	August 1, 2011
Warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1) CUDA-GDB	25	1041	July 30, 2024
Why do I get CUDA_EXCEPTION_10, Device Illegal Address? CUDA-GDB	9	11361	October 30, 2013
Correct output with emulation mode, wrong with GPU/Execution CUDA Programming and Performance	6	3324	March 25, 2010
Cuda-gdb crash when trying to debug kernel launched through `cudaLaunchCooperativeKernel` CUDA-GDB cuda-gdb	11	2351	April 29, 2024
CUDA Toolkit and SDK 2.3 released CUDA Programming and Performance	127	319410	November 3, 2009
Problems with CUDA CUDA Programming and Performance	8	2825	December 3, 2012
Intermittent CUDA_ERROR_ILLEGAL_ADDRESS error on Ubuntu 18.04 with TensorFlow 2.2.0 Frameworks cuda , tensorflow	3	7780	January 5, 2023

CUDA-GDB captured "Illegal access to address" exception when I invoke child kernel, but the result is correct when free run

Related topics