NvEncode fails cuda-memcheck

While trying to validate my program, cuda-memcheck reports issues inside the nvEncEncodePicture call.

I was wondering whether I might somehow be feeding the API misaligned buffers, or whether there is some other parameter I may have overlooked?

========= Invalid __global__ write of size 4
=========     at 0x00000cb8 in parallelReductionAdd
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7fd34fc00600 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchGrid + 0x18a) [0x255a9a]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 [0xd6eda]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 [0xd7484]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 [0x9a8b5]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0x9057]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0x9275]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0xb2f9]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0xb457]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0x78cb]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0x797e]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0x7e7b]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0x6b09]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 [0x12edb]
=========     my code locations here

I am using NvEncode 8.2 on Linux, and the session is initialized on a CUDA device.

Line #4 gives a good indication of the problem: the very first thread is already unable to write to memory.
Double-check the pointer being passed, the memory allocation, and indices going out of bounds…

Thanks saulocpp for your answer. I was worried about pointer alignment, but I am afraid this is an internal issue in the NvEncode API; after all, there is not much more I can do on my side than allocate a buffer of W x H x Channels and pass it to the API.

I continued investigating this bug, and I found something:
I tested the following geometries as input buffer sizes:

  {32, 32},      // this size locks the encoder
  {140, 140},
  {225, 225},    // this geometry crashes cuda-memcheck
  {250, 250},    // this geometry crashes cuda-memcheck
  {800, 600},
  {800 * 2, 600},
  {800, 600 * 2},
  {1920, 1080},
  {1920 * 2, 1080},
  {1920, 1080 * 2},
  {2560, 1440},
  {2560, 2560}

It seems that the internal implementation of nvEncEncodePicture is geometry-sensitive, and certain buffer sizes are simply not supported.

It would be great if this limitation were documented; so far there is no way to validate the input buffer sizes, and the API will just crash instead of gracefully reporting an error.