Illegal memory access, __global__ read of size 4


In this code, at line 313, I’m having difficulty tracing the error mentioned in the title back to its cause.

The operation works if h_img[j] = 0 a few lines above, but if it’s set to anything else (say 0.1f or 1.0f), I get either an illegal memory access error from CUDA or an unspecified launch error.

With cuda-memcheck, the diagnostic reports the following 200 times:
========= Invalid global read of size 4
========= at 0x000001f8 in void csrMvT_hyb_kernel<float, float, float, int=7, int=2, int=8, int=5, int=0>(cusparseCsrMvParams<float, float, float>, int*)
========= by thread (116,0,0) in block (105,0,0)
========= Address 0x509d05d64 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\WINDOWS\system32\nvcuda.dll (cuTexRefSetAddress + 0x1aa848) [0x1b7e95]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\cusparse64_91.dll (cusparseDpruneCsr2csrByPercentage + 0x28fe) [0x26d89e]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\cusparse64_91.dll (cusparseDpruneCsr2csrByPercentage + 0x1eb1) [0x26ce51]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\cusparse64_91.dll (cusparseCsrmvEx_bufferSize + 0x4061) [0x10d341]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\cusparse64_91.dll (cusparseCsrmvEx_bufferSize + 0x802c) [0x11130c]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\cusparse64_91.dll (cusparseScsrmv + 0x85) [0x112165]
========= Host Frame:C:\Users\stagiaire\Desktop\Workstation\GPGPU_Tests\x64\Debug\GPGPU_Tests.exe (GPU_Method_2 + 0x11dc) [0x1312c]
========= Host Frame:C:\Users\stagiaire\Desktop\Workstation\GPGPU_Tests\x64\Debug\GPGPU_Tests.exe (main + 0xda) [0x17aaa]
========= Host Frame:C:\Users\stagiaire\Desktop\Workstation\GPGPU_Tests\x64\Debug\GPGPU_Tests.exe (invoke_main + 0x34) [0x1a504]
========= Host Frame:C:\Users\stagiaire\Desktop\Workstation\GPGPU_Tests\x64\Debug\GPGPU_Tests.exe (__scrt_common_main_seh + 0x127) [0x1a3c7]
========= Host Frame:C:\Users\stagiaire\Desktop\Workstation\GPGPU_Tests\x64\Debug\GPGPU_Tests.exe (__scrt_common_main + 0xe) [0x1a28e]
========= Host Frame:C:\Users\stagiaire\Desktop\Workstation\GPGPU_Tests\x64\Debug\GPGPU_Tests.exe (mainCRTStartup + 0x9) [0x1a529]
========= Host Frame:C:\WINDOWS\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x11fe4]
========= Host Frame:C:\WINDOWS\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6efc1]

This is the output when the returned error is the illegal memory access.

When it is the unspecified launch error (20% of the time?), the output doesn’t give any useful information; it just mentions the unspecified launch error.

What’s weird is that the program runs fine if h_img is filled with zeroes. It also works for the h_unitarian calculation and for the h_v3 calculation; it only fails for the h_img => h_v1 operation.

Here’s the complete code base, with example files:

Go to debug/x86/, shift + right-click, open a PowerShell console or a traditional console, and run “./GPGPU_Tests.exe 35” to reproduce the error. You can change the silence variable to get tracing output.

cuSPARSE requires compute capability 2.0 or higher.

PS: Please don’t hesitate to add any comments on general programming. I’m fairly new to all this and I’ll take anything that helps me improve. :)

PPS: At some point, starting this program gave me a nice Windows 10 blue screen. I haven’t been able to reproduce it yet, so consider saving anything you have open if you’re taking a look.

Additional information:

I’ve modified some bits of the file, which now looks like this:
(ignore the comment on line 277, it is a copy-paste leftover)

And I’ve noticed something weird:

From line 273 onward:
makeMatrixVectorOperation(CUSPARSE_OPERATION_NON_TRANSPOSE, h_1, h_unitarian, mdata.Largeur*mdata.Nproj, mdata.Largeur*mdata.Largeur);
parse(cpu, "Unitarian vector first value: %f.", h_unitarian[0]); // Should be ~150.019470
parse(cpu, "Unitarian vector last value: %f.", h_unitarian[mdata.Largeur*mdata.Largeur-1]); // Should be ~150.019485

The operation doesn’t fill the whole h_unitarian vector: it stops at value 3202 out of ~7200. Coincidentally, mdata.Largeur*mdata.Nproj = 3204.
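One possible reading of that coincidence, as a sketch rather than a confirmed diagnosis: as far as the cuSPARSE documentation describes it, the m and n arguments of cusparseScsrmv are the row and column counts of the stored matrix A, independent of the transpose flag. With CUSPARSE_OPERATION_NON_TRANSPOSE the input vector then needs n elements and the output vector m elements; with the transpose operation it is the other way round. The helper below encodes only that rule. Op, VecLengths and csrmvLengths are hypothetical names introduced for illustration; they are not part of cuSPARSE or of this project.

```cpp
#include <cstddef>

// Hypothetical stand-in for cusparseOperation_t so the sketch builds without CUDA.
enum class Op { NonTranspose, Transpose };

struct VecLengths {
    std::size_t x;  // required length of the input vector x
    std::size_t y;  // required length of the output vector y
};

// For y = alpha * op(A) * x + beta * y, where A is stored as an m x n CSR matrix.
// m and n describe A itself, not op(A).
inline VecLengths csrmvLengths(Op op, std::size_t m, std::size_t n) {
    if (op == Op::NonTranspose) {
        return {n, m};  // x has n entries, y has m entries
    }
    return {m, n};      // x has m entries, y has n entries
}
```

For the call at line 273, m is mdata.Largeur*mdata.Nproj and n is mdata.Largeur*mdata.Largeur. Under this rule a non-transposed product would write only mdata.Largeur*mdata.Nproj output values and would expect mdata.Largeur*mdata.Largeur input values, which would be consistent with the result stopping around index 3204.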

I think there’s something wrong with either the matrix or the way I feed in my arguments.
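To rule out the matrix itself, a host-side consistency check can be run on the CSR arrays before they are copied to the device. The sketch below assumes zero-based indexing and a compressed row-pointer array with m + 1 entries, which is what cusparseScsrmv expects for its row argument; whether d_NZERows is already in that compressed form is not visible from the excerpt. checkCsr is a hypothetical helper name, not something from the project.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns an empty string when the zero-based CSR arrays are consistent with
// an m x n matrix holding nnz stored values; otherwise returns a description
// of the first problem found.
inline std::string checkCsr(const std::vector<int>& rowPtr,
                            const std::vector<int>& colInd,
                            int m, int n, int nnz) {
    if (m < 0 || n < 0 || nnz < 0) return "negative dimension";
    if (rowPtr.size() != static_cast<std::size_t>(m) + 1)
        return "rowPtr must have m + 1 entries";
    if (colInd.size() != static_cast<std::size_t>(nnz))
        return "colInd must have nnz entries";
    if (rowPtr.front() != 0) return "rowPtr[0] must be 0";
    if (rowPtr.back() != nnz) return "rowPtr[m] must equal nnz";
    for (int i = 0; i < m; ++i) {
        if (rowPtr[i] > rowPtr[i + 1])
            return "rowPtr decreases at row " + std::to_string(i);
    }
    for (int k = 0; k < nnz; ++k) {
        if (colInd[k] < 0 || colInd[k] >= n)
            return "column index out of range at entry " + std::to_string(k);
    }
    return "";
}
```

Calling it with the same m, n and NonZeroElements that are later passed to cusparseScsrmv checks the arrays and the arguments against each other in one go; a non-empty result points at the first inconsistency.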

Reminder of the makeMatrixVectorOperation macro:
#define makeMatrixVectorOperation(TransposeOP,h_a,h_b,size_a,size_b) \
cudaStat1 = cudaAllocateFloat(&d_a, (size_a)); \
parseError(gpu, "Allocating memory for d_a: ", cudaStat1); \
\
cudaStat1 = cudaAllocateFloat(&d_b, (size_b)); \
parseError(gpu, "Allocating memory for d_b: ", cudaStat1); \
\
cudaStat1 = cudaMemcpy(d_a, h_a, sizeof(float) * (size_a), cudaMemcpyHostToDevice); \
parseError(gpu, "Copying memory for d_a: ", cudaStat1); \
\
cudaStat1 = cudaMemcpy(d_b, h_b, sizeof(float) * (size_b), cudaMemcpyHostToDevice); \
parseError(gpu, "Copying memory for d_b: ", cudaStat1); \
\
LogLineEnd(csp, "Calculating Matrix x Vector"); \
\
cusparseStat = cusparseScsrmv(cusparseH, TransposeOP, (size_a), (size_b), NonZeroElements, &one, descrRadon, d_SparseRadonMatrix, d_NZERows, d_NZEColumns, d_a, &zero, d_b); \
cusparseCheck(cusparseStat, "cuSPARSE", "d_b vector calculated", "Failed to do d_b vector operation."); \
\
cudaStat1 = cudaMemcpy(h_b, d_b, sizeof(float) * (size_b), cudaMemcpyDeviceToHost); \
parseError(mem, "Retrieving data from device d_b: ", cudaStat1);
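
To see exactly where the GPU result diverges, the vector copied back into h_b can be compared against a plain CPU product over the same CSR arrays. This is a minimal reference sketch, assuming zero-based CSR with compressed row pointers; csrMatVecRef and firstMismatch are hypothetical helper names and std::vector is used only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Computes y = A * x (transpose == false) or y = A^T * x (transpose == true)
// for an m x n matrix stored in zero-based CSR form.
inline std::vector<float> csrMatVecRef(const std::vector<float>& val,
                                       const std::vector<int>& rowPtr,
                                       const std::vector<int>& colInd,
                                       int m, int n, bool transpose,
                                       const std::vector<float>& x) {
    std::vector<float> y(static_cast<std::size_t>(transpose ? n : m), 0.0f);
    for (int row = 0; row < m; ++row) {
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
            const int col = colInd[k];
            if (transpose) {
                y[col] += val[k] * x[row];
            } else {
                y[row] += val[k] * x[col];
            }
        }
    }
    return y;
}

// Returns the index of the first entry that differs by more than tol
// (scaled by the reference magnitude), the shorter length if the sizes
// differ but the common prefix matches, or -1 if the vectors agree.
inline long firstMismatch(const std::vector<float>& ref,
                          const std::vector<float>& got, float tol) {
    const std::size_t common = std::min(ref.size(), got.size());
    for (std::size_t i = 0; i < common; ++i) {
        const float scale = std::max(1.0f, std::fabs(ref[i]));
        if (std::fabs(ref[i] - got[i]) > tol * scale) {
            return static_cast<long>(i);
        }
    }
    return ref.size() == got.size() ? -1 : static_cast<long>(common);
}
```

Running csrMatVecRef on the host copies of the matrix and of h_a, then passing its result and h_b to firstMismatch, gives the first index at which the device output departs from the expected values.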

I’ll keep looking into it. If I see anything that can help you help me, I’ll come back here and post a message.