Illegal memory access when adding parameters in PyOptiX

Hi, starting from the triangle example of PyOptiX, I added an arbitrary value named « horse » to the params list in the launch function:

params = [
        ( 'u8', 'image', ),
        ( 'u4', 'image_width',  pix_width      ),
        ( 'u4', 'image_height', pix_height     ),
        ( 'u4', 'horse',        123            ),
        ( 'f4', 'cam_eye_x',    0              ),
        ( 'f4', 'cam_eye_y',    0              ),
        ( 'f4', 'cam_eye_z',    2.0            ),
        ( 'f4', 'cam_U_x',      1.10457        ),
        ( 'f4', 'cam_U_y',      0              ),
        ( 'f4', 'cam_U_z',      0              ),
        ( 'f4', 'cam_V_x',      0              ),
        ( 'f4', 'cam_V_y',      0.828427       ),
        ( 'f4', 'cam_V_z',      0              ),
        ( 'f4', 'cam_W_x',      0              ),
        ( 'f4', 'cam_W_y',      0              ),
        ( 'f4', 'cam_W_z',      -2.0           ),
        ( 'u8', 'trav_handle',  trav_handle    )
    ]

And its equivalent in the header:

struct Params
{
    uchar4*                image;
    unsigned int           image_width;
    unsigned int           image_height;
    unsigned int           horse;
    float3                 cam_eye;
    float3                 cam_u, cam_v, cam_w;
    OptixTraversableHandle handle;
};

But then, without even touching the shaders, I end up with a CUDA illegal memory access error:

Launching ... 
Traceback (most recent call last):
  File "", line 459, in <module>
  File "", line 448, in main
    pix              = launch( pipeline, sbt, gas_handle ) 
  File "", line 424, in launch
  File "cupy/cuda/stream.pyx", line 252, in
  File "cupy_backends/cuda/api/runtime.pyx", line 851, in cupy_backends.cuda.api.runtime.streamSynchronize
  File "cupy_backends/cuda/api/runtime.pyx", line 143, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered

But when I place the value « horse » after all of the others, in both the Python params list and the .h header, the program works perfectly.

Printing the values in the raygen shader shows that the values after « horse » seem to be shifted:

Launching ... 
handle: 0x7ff843000a0c
image: 0x7ff7b6400000
image_width: 1024
image_height: 768
horse: 0
cam_eye: (0.000000,2.000000,1.104570)
cam_u: (0.000000,0.000000,0.000000)
cam_v: (0.828427,0.000000,0.000000)
cam_w: (0.000000,-2.000000,0.000000)

Also, if I remove parameters (let’s say cam_U_x, cam_U_y and cam_U_z) from both the Python params list and the .h header and replace them with raw values in the shaders, the program gives the same illegal memory access error.

It seems that when I add or remove parameters, the values are shifted in memory or the program is not updating the params structure. Maybe it is something related to byte alignment.

Do you have any idea where this could come from?

I didn’t see any topics concerning PyOptiX, so I am sorry if this is not the appropriate place.

Thank you in advance for your answers.

I need to check with the author how that Python wrapper handles structure field placements.

Usually, illegal or misaligned access errors that appear after changing the fields of a structure come from a mismatch in the fields’ alignment requirements.

CUDA has strict alignment requirements which must be fulfilled, like these for built-in vector types:
Search the CUDA programming guide for alignment; there are more requirements!

I usually order fields in structures which are used inside device code by their CUDA alignment requirements from big to small to avoid any unnecessary padding by the compilers. (Memory accesses affect the performance the most.)

In the above example the uchar4* and the OptixTraversableHandle types are both 64-bit and must lie on 8-byte aligned addresses.

So I would have written the structure like this:

struct Params
{
  // 8 byte aligned
  OptixTraversableHandle handle;
  uchar4* image;

  // 4 byte aligned
  float3 cam_eye;
  float3 cam_u, cam_v, cam_w;

  unsigned int image_width;
  unsigned int image_height;

  unsigned int horse;
};

It doesn’t matter where among the 4-byte aligned fields you place the new unsigned int.
When you placed the new unsigned int before the OptixTraversableHandle, the latter would not have ended up on an 8-byte aligned offset inside the structure if the host compiler tightly packed the fields. That would match your observations.
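
To make the offset arithmetic concrete, here is a minimal sketch in plain Python. The helper `offsets` and the (name, size, alignment) tuples are made up for illustration and are not part of PyOptiX; the sizes and alignments assume the usual CUDA values (8 bytes for a pointer and for OptixTraversableHandle, 4 bytes for unsigned int, and 12 bytes with 4-byte alignment for float3).

```python
# Hypothetical helper, not part of PyOptiX: compute field offsets either
# tightly packed or with each field rounded up to its alignment.
def offsets(fields, aligned):
    result, offset = {}, 0
    for name, size, align in fields:
        if aligned:
            offset = (offset + align - 1) // align * align  # round up
        result[name] = offset
        offset += size
    return result

# (name, size in bytes, alignment in bytes) for the struct from the question
FIELDS = [
    ("image",        8, 8),
    ("image_width",  4, 4),
    ("image_height", 4, 4),
    ("horse",        4, 4),
    ("cam_eye",     12, 4),
    ("cam_u",       12, 4),
    ("cam_v",       12, 4),
    ("cam_w",       12, 4),
    ("handle",       8, 8),
]

packed_handle = offsets(FIELDS, aligned=False)["handle"]
aligned_handle = offsets(FIELDS, aligned=True)["handle"]
print(packed_handle, aligned_handle)  # 68 72
```

With tight packing the handle sits at offset 68, while device code compiled from the header expects it at 72. Without « horse » both layouts put it at 64, which would explain why the unmodified example works.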

Usually C++ compilers do this right (and that is also why CUDA has a list of supported host compilers in its release notes), but if Python packs the fields as tightly as possible, care needs to be taken to make sure the CUDA alignment requirements are met.

That would also imply that when using arrays of structures you would need to make sure that each array element starts with the correct alignment again.
The C++ compilers usually take care of that, but it’s good practice to add an alignment specifier to the struct, in this case 8 bytes:
struct __align__(8) Params
Not sure what the syntax for that is in Python.

You could also manually add padding fields inside the structure to make its size a multiple of the required alignment.
I use that method often:

Thank you very much!

It was indeed a byte alignment problem on the Python side.
I tested by changing the type of horse to a 64-bit unsigned long long to respect the 8-byte alignment, and the problem disappeared.

I wrote a small function that aligns the parameters in params by adding padding wherever the alignment is not respected, so that it runs every time:

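import ctypes

def align_params( params ):
    # params is a list of ( ctypes type, name, value ) tuples.
    # Each run of same-typed fields is padded to a multiple of 8 bytes,
    # so the next run always starts on an 8-byte aligned offset.
    current_type = params[0][0]
    sequence_size = 0
    padid = 0
    params_aligned = []
    for param in params:
        typep = param[0]
        if typep == current_type:
            sequence_size += ctypes.sizeof(typep)
        else:
            # The type changes: close the previous run with padding if needed.
            if sequence_size % 8 != 0:
                padsize = 8 - sequence_size % 8
                if padsize >= 4:
                    params_aligned.append((ctypes.c_uint32, "pad"+str(padid), 0)); padid+=1
                    padsize -= 4
                if padsize >= 2:
                    params_aligned.append((ctypes.c_ushort, "pad"+str(padid), 0)); padid+=1
                    padsize -= 2
                if padsize >= 1:
                    params_aligned.append((ctypes.c_ubyte, "pad"+str(padid), 0)); padid+=1
            sequence_size = ctypes.sizeof(typep)
            current_type = typep
        # Keep the original parameter after any padding.
        params_aligned.append(param)

    return params_aligned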

Thank you for your help and for all the documentations.

Mind that bigger types like float4 or uint4 are 16 byte aligned, in case you want to handle these as well.
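
One way to handle those wider types is to track the running byte offset and pad each field up to its own alignment, rather than always to 8 bytes. The following is only a sketch under assumptions: `pad_fields` and the (name, size, alignment) tuples are hypothetical, the `color` field is invented to stand in for a float4, and turning the padding entries into real dtype fields is left out.

```python
# Hypothetical helper, not part of PyOptiX. Each field is (name, size, align),
# all in bytes. Padding entries are returned as (name, size, 1).
def pad_fields(fields):
    padded, offset, pad_id = [], 0, 0
    for name, size, align in fields:
        gap = -offset % align  # bytes missing to reach the next multiple of align
        if gap:
            padded.append(("pad" + str(pad_id), gap, 1))
            pad_id += 1
            offset += gap
        padded.append((name, size, align))
        offset += size
    return padded

fields = [
    ("image_width", 4, 4),
    ("color",      16, 16),  # stand-in for a float4: 16 bytes, 16-byte aligned
    ("horse",       4, 4),
    ("handle",      8, 8),
]
padded = pad_fields(fields)
for entry in padded:
    print(entry)
```

Here 12 bytes of padding are inserted before `color` and 4 bytes before `handle`, so every field starts on a multiple of its own alignment.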

Yes, alignment is indeed an issue that pops up here. NumPy structured dtypes allow padding to be specified, either per field or as a minimum field padding. You can use this, or use types with matching alignments (as you did), or rearrange your fields so that larger types always line up with their alignment boundaries (often by ordering types from largest to smallest in the struct).
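
As a small illustration of the NumPy option, the sketch below builds the struct from the question as a structured dtype twice, once packed and once with `align=True`. The field list is a simplified stand-in for the PyOptiX params (each float3 is written as a 3-element `f4` subarray), and this is not necessarily how the wrapper builds its dtype. Note that NumPy follows the host's C alignment rules, so it will not know that CUDA types such as float4 need 16 bytes.

```python
import numpy as np

# Simplified stand-in for the params struct; names follow the header.
fields = [
    ("image",        "u8"),
    ("image_width",  "u4"),
    ("image_height", "u4"),
    ("horse",        "u4"),
    ("cam_eye",      "f4", (3,)),
    ("cam_u",        "f4", (3,)),
    ("cam_v",        "f4", (3,)),
    ("cam_w",        "f4", (3,)),
    ("handle",       "u8"),
]

packed = np.dtype(fields)               # default: no padding between fields
aligned = np.dtype(fields, align=True)  # pad like a C compiler would

# dtype.fields maps each name to (dtype, byte offset)
print("packed :", packed.fields["handle"][1], packed.itemsize)
print("aligned:", aligned.fields["handle"][1], aligned.itemsize)
```

On a typical 64-bit host the packed layout places `handle` at offset 68, while the aligned layout moves it to the next multiple of 8.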

Thank you very much for the confirmation.

Sorting the parameters by size does indeed seem to be an easier solution.
I replaced my first method with a sort in descending order using this lambda function:

lambda type: (size of the primitive type) + 0.1 * (number of elements of the type)
Example for unsigned int*: 8 + 0.1 * 1 = 8.1
Example for float3: 4 + 0.1 * 3 = 4.3

Which leads to a params structure like this:

struct Params
{
    OptixTraversableHandle   handle; // 8.1
    uchar4*                  image; // 8.1
    float3                   cam_eye; // 4.3
    float3                   cam_u, cam_v, cam_w; // 4.3
    unsigned int             image_width; // 4.1
    unsigned int             image_height; // 4.1
    unsigned int             horse; // 4.1
    short4                   duck; // 2.4
};

And it seems to work pretty well; I hope it helps.
Thank you again for your help and documentations.
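
One caveat with a key based on the primitive size: as far as I can tell from the alignment table in the CUDA programming guide, short4 is 8-byte aligned (like float2 and int2), so a tightly packed `duck` placed after the unsigned ints could land on offset 76 in the struct above, which is not a multiple of 8. A possible alternative is to sort by the alignment of the whole type. The sketch below assumes that approach; the (name, size, alignment) tuples are illustrative and their values are assumptions to be checked against the CUDA documentation.

```python
# (name, size in bytes, alignment in bytes); assumed CUDA values
fields = [
    ("image_width", 4, 4),   # unsigned int
    ("duck",        8, 8),   # short4: 8 bytes, assumed 8-byte aligned
    ("cam_eye",    12, 4),   # float3
    ("handle",      8, 8),   # OptixTraversableHandle
    ("horse",       4, 4),   # unsigned int
    ("image",       8, 8),   # pointer
]

# Sort by alignment, largest first. sorted() is stable, so fields with the
# same alignment keep their original relative order.
ordered = sorted(fields, key=lambda f: f[2], reverse=True)

# Check that a tightly packed layout of the sorted fields needs no padding.
offset = 0
for name, size, align in ordered:
    assert offset % align == 0, name
    print(name, offset)
    offset += size
```

Because every size here is a multiple of its alignment, placing the most strictly aligned fields first keeps each later field aligned without any padding. The header must of course list the fields in the same order.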