Is uninitialized shared memory undefined behavior?

forrest4 · January 23, 2023, 8:16pm

Hi all, I just fixed a bug where a kernel was operating on shared memory that was not fully initialized.

Some pseudocode of the erroneous code:

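```cuda
// Pseudocode: array_size is a compile-time constant and find_median is the
// recursive median routine described below (neither is shown here).
static __device__ __forceinline__ void processing(float *shared_array)
{
    __shared__ float array_copy[array_size];

    // Only threads with threadIdx.x < array_size / 2 write to array_copy,
    // so the rest of it is left uninitialized.
    if (threadIdx.x < array_size / 2)
        array_copy[threadIdx.x] = shared_array[threadIdx.x];
    __syncthreads();

    if (threadIdx.x == 0)
        find_median(array_copy);

    __syncthreads();
}
```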

When running this code, the find_median function (which is recursive and operates within the bounds of array_size) would reliably hang after several recursive iterations on array_copy. (I'm aware this small snippet lacks context and is therefore not fully reproducible.)

The first thing you might notice is that not all of array_copy is initialized: only threads with threadIdx.x < array_size / 2 write to it. This was the bug, and fixing it solved the issue.
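One way to make sure the whole copy is initialized is sketched below. It assumes the source pointer holds array_size valid values (the post does not say where the second half of the data should come from) and a 1-D thread block. A block-stride loop writes every element of the `__shared__` array even when the block has fewer than array_size threads, and the `__syncthreads()` guarantees all writes are visible before thread 0 reads them. The names `fill_shared_copy` and `copy_and_dump` and the value 64 are hypothetical; in the test kernel, thread 0 simply copies the array out where the post would call find_median.

```cuda
#include <cuda_runtime.h>

// Hypothetical size for this sketch; the post does not give array_size.
constexpr int array_size = 64;

// Writes every element of the __shared__ copy before anyone reads it.
// Assumes src points to at least array_size valid floats and a 1-D block.
// The block-stride loop also covers blocks with fewer than array_size threads.
__device__ __forceinline__ void fill_shared_copy(float *array_copy, const float *src)
{
    for (int i = threadIdx.x; i < array_size; i += blockDim.x)
        array_copy[i] = src[i];
    __syncthreads();  // all writes complete and visible to the whole block
}

// Hypothetical test kernel: where the post calls find_median(array_copy),
// thread 0 instead copies the shared array out so it can be checked on the host.
__global__ void copy_and_dump(const float *src, float *out)
{
    __shared__ float array_copy[array_size];
    fill_shared_copy(array_copy, src);

    if (threadIdx.x == 0)
        for (int i = 0; i < array_size; ++i)
            out[i] = array_copy[i];
}
```

Launched as `copy_and_dump<<<1, 32>>>(d_src, d_out)`, each of the 32 threads writes two of the 64 elements (indices t and t + 32), so every value thread 0 reads was written before the barrier.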

So although I have fixed the issue, I am interested in finding out why the previous version would hang. The older version would hang unless I compiled with the -G -g flags, in which case running the kernel would return

Unspecified launch failure - cudaError 719

Using cuda-gdb I would get

Thread 1 “test_controller” received signal SIGTRAP, Trace/breakpoint trap

during the find_median function. If I continued, I would get

CUDA Exception: Warp Out-of-range Address

I am thinking the shared memory is somehow deallocated during runtime, which leads the recursive median function to hit an out-of-range address after a few iterations (17, to be exact). This seems likely because the kernel does execute successfully if I reduce the memory load (removing other kernels, increasing the time between kernels with sleep(), etc.).

Is this “runtime shared memory deallocation” even possible?

Robert_Crovella · January 23, 2023, 8:58pm

Reading from uninitialized memory will return an unpredictable value. Technically it may be UB, but I wouldn't go beyond the statement I just made.

I’m not aware of any situation in which a shared memory address that was valid/accessible during one portion of the execution of a particular kernel launch is no longer accessible (“deallocated” or whatever) at some other point during the execution of that same particular kernel launch.

forrest4 · January 28, 2023, 12:31am

Thanks Robert, you are right that it was not UB but simply garbage data. The conclusion we came to was that the uninitialized memory occasionally contained NaNs, which caused infinite loops in the median function.

Thanks again.
Forrest
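The NaN explanation fits how IEEE 754 comparisons behave: every ordered comparison involving a NaN (`<`, `<=`, `>`, `>=`) is false, and a NaN is not even equal to itself, so a partition or recursion step that expects each element to be less than, equal to, or greater than a pivot can stop making progress. The find_median routine from the thread is not shown, so the following is only a hypothetical sketch: `move_nans_to_end` and `median_ignoring_nans` are illustrative names, and the median uses a simple insertion sort (not the recursive routine from the post), which is reasonable for a small array handled by a single thread.

```cuda
#include <cassert>
#include <cmath>

// Moves every NaN in a[0..n) behind the non-NaN values (the non-NaN values
// keep their original order) and returns how many non-NaN values there are.
__host__ __device__ inline int move_nans_to_end(float *a, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (!isnan(a[i])) {
            float tmp = a[count];
            a[count] = a[i];
            a[i] = tmp;
            ++count;
        }
    }
    return count;
}

// Hypothetical NaN-tolerant median for one thread and a small array.
// Returns false (leaving *out untouched) if every value is NaN.
__host__ __device__ inline bool median_ignoring_nans(float *a, int n, float *out)
{
    assert(n >= 0 && out != nullptr);
    int m = move_nans_to_end(a, n);
    if (m == 0)
        return false;

    // Every comparison below is between non-NaN floats, so the ordering is
    // consistent and the insertion sort always terminates.
    for (int i = 1; i < m; ++i) {
        float key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = key;
    }
    *out = (m % 2 == 1) ? a[m / 2] : 0.5f * (a[m / 2 - 1] + a[m / 2]);
    return true;
}
```

For example, for {3, NaN, 1, 2, NaN} it reports 2, the median of the three non-NaN values. Filtering NaNs only guards against bad input; the actual fix in this thread was still to initialize the whole array.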

system · February 11, 2023, 12:32am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
