I’ve been working with some CUDA code that I believe to be a couple of years old. It comes as a Visual Studio solution, which I have been moving across to a new solution alongside 2 other similar models (I am producing a comparison of 3 pedestrian models).
While debugging this project, I first received an "Unknown Error" that killed the runtime (Nsight later tracked this down to a CUDA_SYNCHRONISE call). After eventually sorting out the nvcc parameters to build debug info, I was able to find the line of code that causes the illegal shared memory exception. Before using Nsight, however, I had been placing printfs throughout the kernels in an attempt to trace the cause of the problem.
I know which line of code causes the exception. Since I knew the original solution worked, I added the -arch sm_20 build parameter and a printf at the same place in the original, to check whether the values being used were the same. This caused the old solution to fail with the same runtime error as my code (even with the printf removed). Removing the -arch sm_20 parameter allows the code to run again without the exception.
Does anyone know where I can find a detailed explanation of what -arch sm_20 changes, so that I can track down the cause of this problem?
For anyone who thinks they might have an idea about the cause, the Nsight analysis and the code at the breakpoint are included below.
CUDA Memory Checker detected 3 threads caused an access violation:
Launch Parameters
CUcontext = 07399f60
CUstream = 07418fb8
CUmodule = 07484750
CUfunction = 0ee4fbb8
FunctionName = _Z26GPUFLAME_avoid_pedestriansP26xmachine_memory_agent_listP41xmachine_message_pedestrian_location_listP40xmachine_message_pedestrian_location_PBMP10RNG_rand48
GridId = 194
gridDim = {1,1,1}
blockDim = {64,1,1}
sharedSize = 3200
Parameters:
agents = 0x06320000 {_position = {0, 1, 2, -1, -1, -1, -1, -1, -1, -1, ...}, _scan_input = {-25, -5810432, -16777216, -16742457, -16777129, -5773313, -14457, -16777129, -1, -1, ...}, x = {-0.58984375, 0.54296875, 0.83203125, NaN, NaN, NaN, NaN, NaN, NaN, NaN, ...}, y = {-0.90234375, -0.31640625, 0.78515625, NaN, NaN, NaN, NaN, NaN, NaN, NaN, ...}, ...}
pedestrian_location_messages = 0x08960000 {_position = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ...}, _scan_input = {0, 1, 2, 3, -842150451, -842150451, -842150451, -842150451, -842150451, -842150451, ...}, x = {-0.58984375, 0.54296875, 0.83203125, 0, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, ...}, y = {-0.90234375, -0.31640625, 0.78515625, 0, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, -4.3160208e+08, ...}, ...}
partition_matrix = 0x08aa0000 {start = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ...}, end = {-986896, -986896, -986896, -986896, -986896, -986896, -986896, -986896, -986896, -986896, ...}}
rand48 = 0x098a0000 {A = {...}, C = {...}, seeds = {{...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, ...}}
Parameters (raw):
0x06320000 0x08960000 0x08aa0000 0x098a0000
GPU State:
Address Size Type Mem Block Thread blockIdx threadIdx PC Source
-----------------------------------------------------------------------------------------------
00000014 16 mis st s 0 0 {0,0,0} {0,0,0} 006f90 c:\users\robadob\dropbox\level3project\code\pedestrianflamegpu\vectorfieldpedmodel\src\dynamic\flamegpu_kernals.cu:634
00000044 16 mis st s 0 1 {0,0,0} {1,0,0} 006f90 c:\users\robadob\dropbox\level3project\code\pedestrianflamegpu\vectorfieldpedmodel\src\dynamic\flamegpu_kernals.cu:634
00000074 16 mis st s 0 2 {0,0,0} {2,0,0} 006f90 c:\users\robadob\dropbox\level3project\code\pedestrianflamegpu\vectorfieldpedmodel\src\dynamic\flamegpu_kernals.cu:634
Summary of access violations:
c:\users\robadob\dropbox\level3project\code\pedestrianflamegpu\vectorfieldpedmodel\src\dynamic\flamegpu_kernals.cu(634): error MemoryChecker: #misaligned=3 #invalidAddress=0
================================================================================
Memory Checker detected 3 access violations.
error = misaligned store (shared memory)
gridid = 194
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x00000014
accessSize = 16
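For reference, the three faulting addresses in the report can be checked against the access size on the host. This is only a minimal sketch in plain C++ (no GPU needed); the helper names are made up for illustration, and it assumes that a 16-byte store has to land on a 16-byte-aligned address, which is how I read "misaligned" here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Values copied from the Memory Checker report above.
constexpr std::uint32_t kAccessSize = 16;                      // accessSize
constexpr std::uint32_t kReported[3] = {0x14u, 0x44u, 0x74u};  // threads 0..2

// An N-byte access is naturally aligned when the address is a multiple of N.
constexpr bool is_aligned(std::uint32_t address, std::uint32_t size) {
    return address % size == 0;
}

// Number of bytes the address sits past the previous aligned boundary.
constexpr std::uint32_t misalignment(std::uint32_t address, std::uint32_t size) {
    return address % size;
}

// Prints one line per reported address and returns how many are misaligned.
inline int report_alignment() {
    int bad = 0;
    for (std::uint32_t a : kReported) {
        const bool ok = is_aligned(a, kAccessSize);
        std::printf("0x%02x: %u bytes past a %u-byte boundary (%s)\n",
                    static_cast<unsigned>(a),
                    static_cast<unsigned>(misalignment(a, kAccessSize)),
                    static_cast<unsigned>(kAccessSize),
                    ok ? "aligned" : "misaligned");
        if (!ok) ++bad;
    }
    return bad;
}
```

Calling report_alignment() flags all three addresses: each sits 4 bytes past a 16-byte boundary. Consecutive threads are 0x30 (48) bytes apart, which is itself a multiple of 16, so this suggests the per-thread stride is fine and it is the starting offset (0x14) that is off.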
(The line which the code breaks at is marked by >>)
//Using texture cache
temp_message.x = tex1Dfetch(tex_xmachine_message_pedestrian_location_x, cell_index + d_tex_xmachine_message_pedestrian_location_x_offset);
temp_message.y = tex1Dfetch(tex_xmachine_message_pedestrian_location_y, cell_index + d_tex_xmachine_message_pedestrian_location_y_offset);
temp_message.z = tex1Dfetch(tex_xmachine_message_pedestrian_location_z, cell_index + d_tex_xmachine_message_pedestrian_location_z_offset);
//load it into shared memory (no sync as no sharing between threads)
int message_index = SHARE_INDEX(threadIdx.x, sizeof(xmachine_message_pedestrian_location));
printf("prememerror %i, %i, %i\n", d_SM_START, d_PADDING, message_index);
xmachine_message_pedestrian_location* sm_message = ((xmachine_message_pedestrian_location*)&message_share[message_index]);
>> sm_message[0] = temp_message;
return true;
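One check that could be added next to the existing printf is whether the pointer is suitably aligned for the struct before the store happens. The sketch below is plain host-side C++ and rests on assumptions: message_stand_in is a hypothetical stand-in for xmachine_message_pedestrian_location (whose real definition is not shown here), its 16-byte alignment is a guess based on the 16-byte access size in the report, and both helper names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for xmachine_message_pedestrian_location.
// alignas(16) is an assumption, not taken from the real header.
struct alignas(16) message_stand_in {
    float x, y, z;
};

// True when p is suitably aligned to be dereferenced as a T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Rounds a byte offset up to the next multiple of T's alignment.
template <typename T>
constexpr std::size_t align_up_for(std::size_t offset) {
    return (offset + alignof(T) - 1) / alignof(T) * alignof(T);
}
```

With a 16-byte-aligned buffer, a pointer 20 bytes in fails is_aligned_for<message_stand_in>, while one 32 bytes in passes, and align_up_for<message_stand_in>(20) gives 32. To call these from the kernel itself, the helpers would need __host__ __device__ qualifiers in the .cu file, and sm_message would be the pointer to test.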
Any input is appreciated. Thanks.