cuLaunchKernel with variable-size parameters

Hello everyone,

I’m trying to develop a modular architecture on a Jetson TX2 (JetPack 4.3) for running user-defined, pre-compiled kernels, so I need my code to be as generic as possible.
I’m having trouble launching a CUDA kernel with the “cuLaunchKernel” driver API function:

  • I have different pre-compiled .cubin kernels
  • all kernels take a fixed number of parameters plus a variable number of variable-size parameters

The kernels that do not require the variable part of the parameters are working fine.
The kernels that require the variable part always return CUDA_ERROR_INVALID_VALUE.

Here is a snippet showing how the fixed-size parameters are written into the buffer and how the kernel is launched (using the “extra” parameter), following the matrixMulDrv.cpp sample.

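// Write one fixed-size parameter at the current offset, then advance.
*(reinterpret_cast<int *>(&argBuffer[offset])) = my_parameter;
offset += sizeof(my_parameter);
//...
// offset must be a size_t: CU_LAUNCH_PARAM_BUFFER_SIZE expects a size_t*.
void *kernel_launch_config[5] = {
    CU_LAUNCH_PARAM_BUFFER_POINTER, static_cast<void*>(&argBuffer[0]),
    CU_LAUNCH_PARAM_BUFFER_SIZE, &offset,
    CU_LAUNCH_PARAM_END};
// kernelParams is NULL because all arguments are passed through "extra".
cuLaunchKernel(func, grid.x, grid.y, grid.z, block.x, block.y, block.z,
        sharedMemB, hStream, NULL, reinterpret_cast<void**>(kernel_launch_config));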
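
For context, here is a minimal host-only sketch of how I understand the packing of the fixed-size parameters. The names ArgBuffer, alignTo and append are hypothetical (made up for this post, not my actual code and not part of the driver API), and aligning each value to alignof(T) is my assumption about the layout the kernel expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of the CUDA driver API): a growable byte
// buffer that mirrors the argBuffer/offset pair used in the snippet above.
struct ArgBuffer {
    std::vector<char> bytes;  // packed parameter bytes (padding is zero-filled)
    std::size_t offset = 0;   // next free byte; also the final buffer size

    // Round the offset up to a multiple of `alignment` (must be a power of two).
    void alignTo(std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        offset = (offset + alignment - 1) & ~(alignment - 1);
    }

    // Append one trivially copyable value at its natural alignment.
    // Assumption: the kernel expects each parameter at an alignof(T) boundary.
    template <typename T>
    void append(const T &value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "parameters are copied byte by byte");
        alignTo(alignof(T));
        if (bytes.size() < offset + sizeof(T)) {
            bytes.resize(offset + sizeof(T));
        }
        std::memcpy(&bytes[offset], &value, sizeof(T));
        offset += sizeof(T);
    }
};
```

With something like this, bytes.data() and &offset would take the place of &argBuffer[0] and &offset in kernel_launch_config.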

Since this approach does not work for variable-size parameters, I store them in a std::vector<char> that is resized to the total size of the user parameters, and I copy its contents into the buffer:

copy(userparams.begin(), userparams.begin() + usersize, *(reinterpret_cast<char**>(&argBuffer[offset])));

But, as described above, this seems to corrupt the parameter buffer, and the kernel launch fails with CUDA_ERROR_INVALID_VALUE.
Why is this happening?
Do you have any suggestions on how to implement this variable-size parameter launch?
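
One thing I suspect is the destination expression in my copy call: *(reinterpret_cast<char**>(&argBuffer[offset])) reads the bytes stored at that position as a char* and writes through that pointer, rather than writing into the buffer itself. Below is a minimal host-only sketch of copying straight into the buffer instead. appendRawBytes is a hypothetical name, and I am assuming that the kernel expects the raw bytes inline and that the resulting total size still matches the parameters the kernel was compiled with.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the first `count` bytes of `userparams` into
// `argBuffer` starting at `offset`, growing the buffer if needed, and
// return the new offset.
// The destination is the buffer position itself; the bytes already stored
// there are never interpreted as a pointer.
std::size_t appendRawBytes(std::vector<char> &argBuffer, std::size_t offset,
                           const std::vector<char> &userparams,
                           std::size_t count) {
    assert(count <= userparams.size());
    if (argBuffer.size() < offset + count) {
        argBuffer.resize(offset + count);
    }
    std::copy(userparams.begin(),
              userparams.begin() + static_cast<std::ptrdiff_t>(count),
              argBuffer.begin() + static_cast<std::ptrdiff_t>(offset));
    return offset + count;
}
```

The returned offset would then be the value passed through CU_LAUNCH_PARAM_BUFFER_SIZE. I have not verified this on the TX2, and it does not handle any alignment the variable-size part may need.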

Please help!
Thanks,
Andrea