How do I check for errors in a GPU kernel?

Hello, guys.

I’ve gotten a lot of useful information here.

Firstly, thank you for helping me!

I’ve got a question.

I think GPU allocation and GPU memcpy are fine.

(In the device kernel) When I access memory to get or set data, an illegal memory access error occurs.

When I comment out the lines that get and set the data, there is no error…

I can’t find which line causes the error… Please help me: how can I get the kernel error message?

P.S. I already checked the GPU memory allocation size. The allocation size isn’t the problem.

Use cuda-memcheck

To find a specific line of kernel code that is doing an illegal access, use the method described here:

cuda - Unspecified launch failure on Memcpy - Stack Overflow
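As a complement to cuda-memcheck, here is a minimal sketch of how kernel errors can be checked from host code. The macro `CUDA_CHECK`, the kernel `fill_kernel`, and the function `run_demo` are hypothetical names made up for illustration; only the CUDA runtime calls (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`, and so on) are real API. The idea is to check once right after the launch and once after synchronizing, because errors raised while the kernel runs are only reported at a later runtime call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the CUDA error with file and line, then bail out.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            return 1;                                                  \
        }                                                              \
    } while (0)

// Hypothetical kernel used only for illustration.
__global__ void fill_kernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;  // bounds check keeps every access in range
}

// Returns 0 on success, 1 on any CUDA error or wrong result.
int run_demo() {
    const int n = 256;
    int *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(int)));

    fill_kernel<<<(n + 127) / 128, 128>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    int last = -1;
    CUDA_CHECK(cudaMemcpy(&last, d_data + (n - 1), sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
    return (last == n - 1) ? 0 : 1;
}
```

To try it, call `run_demo()` from `main`, build with something like `nvcc -lineinfo -o demo demo.cu` (the file name is an assumption), and run it under `cuda-memcheck ./demo`. With `-lineinfo`, cuda-memcheck can report the source file and line of an invalid access.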

Thank you so much Mr. Robert.

I remember you. You always help me.

Thank you so much!

I checked the URL you gave me, so I used -lineinfo and cuda-memcheck.

But I still don’t know why the error is occurring.

I will check the message again. Thank you. Have an awesome weekday.

Hello, Mr. Robert.

I checked again and found a new error message…

It is “unspecified launch failure”.

This is the first time I’ve seen that error. I don’t know when that error message occurs.

total memory: 6377766912 bytes
current free memory: 92536832 bytes
       total memory: 6377766912 bytes
current free memory: 72876032 bytes
========= Invalid __global__ read of size 8
=========     at 0x00000060 in ./GPU_attribute_handler.cuh:95:gpu_attribute_setting_kernel(int, unsigned int, gpu_attribute_handler*, gpu_attribute_handler*,u_attribute_handler*)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x02a07f88 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x24881d]
=========     Host Frame:./pagerank_gpu [0x73692]
=========     Host Frame:./pagerank_gpu [0x73887]
=========     Host Frame:./pagerank_gpu [0xa2615]
=========     Host Frame:./pagerank_gpu [0xcd42]
=========     Host Frame:./pagerank_gpu [0xc132]
=========     Host Frame:./pagerank_gpu [0xc18c]
=========     Host Frame:./pagerank_gpu [0x1ff8e]
=========     Host Frame:./pagerank_gpu [0x1fde1]
=========     Host Frame:./pagerank_gpu [0x1e15e]
=========     Host Frame:./pagerank_gpu [0xbefb]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./pagerank_gpu [0xbcd9]
=========
Failed to async copy host to device (error code unspecified launch failure)!
Failed to async copy host to device (error code unspecified launch failure)!
Failed to async copy host to device (error code unspecified launch failure)!
Failed to async copy host to device (error code unspecified launch failure)!
Failed to async copy host to device (error code unspecified launch failure)!
========= ERROR SUMMARY: 1 error

The invalid global read error is occurring at line 95 of the file GPU_attribute_handler.cuh:

=========     at 0x00000060 in ./GPU_attribute_handler.cuh:95:gpu_a
                                                           ^^

That is as much information as you can get from cuda-memcheck. Beyond that, you have to use that information to debug your code.

Thank you, Mr. Robert. I can resolve the problem. My code is C++ and uses a class object. The object is allocated in main (host) memory, but no space is allocated for it in global (device) memory. The kernel receives a pointer to that class object, but the pointer is valid only in main memory, not in global memory. So the out-of-bounds error occurs even though the data pointer is allocated, because the object itself isn’t allocated in global memory. I will allocate the object on the device! Thank you!

Hello again.

I use C++.
My presumption is that the problem in my code is that the class object does not exist in device memory.
The data pointer points to space allocated in device memory for the calculation.
But the object that holds the data pointer exists only in main memory.
In the kernel, the data pointer is valid but the object pointer is invalid,
because no space was allocated for the object in device memory.
Is my presumption right?

I wrote code to allocate space for the object, but then I got a “segmentation fault” error… CUDA is so complicated…
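If the presumption above holds, one way to handle it is to copy the object itself to device memory after filling in its data pointer with a device address. The sketch below assumes a simple struct; `Handler`, `scale_kernel`, and `run_object_demo` are hypothetical names, since the real layout of `gpu_attribute_handler` is not shown in this thread. It also marks one common cause of a host-side segmentation fault in this situation, which may or may not be what happened here: dereferencing a device pointer in host code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the class in the question.
struct Handler {
    float *data;  // must hold a device address when used inside a kernel
    int n;
};

__global__ void scale_kernel(Handler *obj, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Both obj and obj->data must point into device memory here.
    if (i < obj->n) obj->data[i] *= factor;
}

// Returns 0 on success, 1 on any CUDA error or wrong result.
int run_object_demo() {
    const int n = 4;
    float h_in[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_out[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    // 1. Allocate and fill the data buffer on the device.
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMemcpy(d_data, h_in, n * sizeof(float),
                   cudaMemcpyHostToDevice) != cudaSuccess) return 1;

    // 2. Build the object on the host, storing the DEVICE data pointer in it.
    Handler h_obj;
    h_obj.data = d_data;
    h_obj.n = n;

    // 3. Allocate space for the object on the device and copy it over.
    Handler *d_obj = nullptr;
    if (cudaMalloc(&d_obj, sizeof(Handler)) != cudaSuccess) return 1;
    if (cudaMemcpy(d_obj, &h_obj, sizeof(Handler),
                   cudaMemcpyHostToDevice) != cudaSuccess) return 1;

    // Do not write d_obj->data or d_obj->n in host code: d_obj is a device
    // address, and dereferencing it on the host typically crashes.

    // 4. Pass the device copy of the object to the kernel.
    scale_kernel<<<1, 32>>>(d_obj, 2.0f);
    if (cudaGetLastError() != cudaSuccess) return 1;
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    if (cudaMemcpy(h_out, d_data, n * sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_obj);
    cudaFree(d_data);
    return (h_out[0] == 2.0f && h_out[3] == 8.0f) ? 0 : 1;
}
```

For a small object like this, passing it to the kernel by value (`Handler obj` instead of `Handler *obj`) would avoid the second allocation entirely; the pointer version is shown because it matches the kernel signature in the cuda-memcheck output above.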