Is uninitialized shared memory undefined behavior?

Hi all - I just solved an error where a kernel was operating on a shared memory space that was not fully initialized.

Some pseudocode of erroneous code:

static device forceinline void processing (float *shared_array)
shared float array_copy [array_size];

if (threadIdx.x < array_size / 2 )
array_copy[threadIdx.x] = shared_array[threadIdx.x];
__syncthreads();

if (threadIdx.x == 0)
find_median(array_copy);

__syncthreads();

When running this code the find_median function (which is recursive and operates on the bounds of array_size) would hang after several recursive iterations while operating on array_copy without fail. (I’m aware this small snippet of code lacks context and thus is not fully reproducible. )

The first thing you might notice is that not all of array_copy is being initialized (this was the bug I fixed that solved the issue).

So while I fixed the issue but I am interested in finding out why the previous version would hang. With the older version it would hang unless I included the -G -g tags, in which case it would return

Unspecified launch failure - cudaError 719

when running the kernel.

Using cuda-gdb I would get

Thread 1 “test_controller” received signal SIGTRAP, Trace/breakpoint trap

during the find_median function.
if continued:

CUDA Exception: Warp Out-of-range Address

I am thinking the shared memory is somehow deallocated during runtime which leads to the recursive median function hitting a out-of-range address after a few loops (17 to be exact). This seems to be the case since it does happen to execute successfully if I reduce the memory load (removing other kernels, increasing time between kernels using sleep(), etc) .

Is this “runtime shared memory deallocation” even possible?

reading from memory that is uninitialized will return an unpredictable value. Technically it may be UB but I wouldn’t go beyond the statement that I just made.

I’m not aware of any situation in which a shared memory address that was valid/accessible during one portion of the execution of a particular kernel launch is no longer accessible (“deallocated” or whatever) at some other point during the execution of that same particular kernel launch.

Thanks Robert, you are right that it was not UB but simply garbage data. The conclusion we came to was that the uninitialized memory included occasional NANs which caused forever loop issues in the median function.

Thanks again.
Forrest

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.