I’m using CUDA 5 in C++ with a GTX 950M on Windows 10.
Just for testing purposes, I allocated as many `double`s on the device (with `cudaMalloc`) as there are threads (1024), and then I simply increment them in a kernel with `variable[threadIdx.x] += 1`. But the results are strangely incorrect: the sum of all the array elements should equal the total number of blocks * threads, and it doesn't. I guess that, maybe for optimization purposes, threads with the same index in two different blocks can interfere?
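Here is a minimal sketch of what I'm doing (the names `incrementKernel` and `d_counts` and the block count are just illustrative, not my real code):

```cuda
// Each of the 1024 threads per block increments "its" slot, but a thread
// with the same threadIdx.x in a DIFFERENT block writes to the same address.
__global__ void incrementKernel(double *counts)
{
    counts[threadIdx.x] += 1;  // plain read-modify-write, not atomic
}

int main()
{
    const int threads = 1024;
    const int blocks  = 64;  // illustrative value
    double *d_counts;
    cudaMalloc(&d_counts, threads * sizeof(double));
    cudaMemset(d_counts, 0, threads * sizeof(double));

    incrementKernel<<<blocks, threads>>>(d_counts);
    cudaDeviceSynchronize();

    // Copying d_counts back and summing it here gives LESS than
    // blocks * threads, which is what surprises me.
    cudaFree(d_counts);
    return 0;
}
```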
More generally, my problem is that the memory I need (a big 6-dimensional integer array of roughly 300 MB) is far too large to allocate once per thread. So can I somehow share the same memory addresses across all blocks/threads? Maybe by using a `device_vector` wisely? (For example, with one thread that only manages some kind of memory transfer between the `device_vector` and the single big array, which would allow progressively removing elements from the vector?)
In the short term, I'd prefer a solution that works on CUDA 5, but if one only exists in later versions of CUDA (5.5 or later), I'm interested in that too.
Thanks a lot for any answer, and sorry if I sound stupid; I'm new to this beautiful world of GPU computing.