cudaErrorInvalidValue returned on GTX 680X

I compiled and ran the addWithCuda sample from Toolkit 5.5 and tested it on a GTX 275 and a GTX 680X. The sample worked as expected on both computers. Then I changed the sample to call addWithCuda from a different Windows thread. It worked correctly on the GTX 275, but returned error 11 on the GTX 680X.

I tried simplifying the kernel to an empty procedure without parameters, but on the 680X it always returns cudaErrorInvalidValue when launched from any thread other than the main one. I also tried compiling for different compute capabilities, including 1.3 and 3.0, with no success. I have been stuck for 2 days, please help!

Here is the code that executes addWithCuda in a separate thread:

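cudaError_t err = cudaSuccess;  // captured by reference, so it must outlive the thread
boost::thread consumer([&]() {
  const int arraySize = 5;
  const int a[arraySize] = { 1, 2, 3, 4, 5 };
  const int b[arraySize] = { 10, 20, 30, 40, 50 };
  int c[arraySize] = { 0 };

  err = addWithCuda(c, a, b, arraySize);
  if (err != cudaSuccess) {
    // Print the error name as well as the numeric code.
    fprintf(stderr, "addWithCuda failed, err=%d (%s)!\n", (int)err, cudaGetErrorString(err));
    return;  // a thread function has no return value
  }
  printf("{1,2,3,4,5} + {10,20,30,40,50} = {%d,%d,%d,%d,%d}\n", c[0], c[1], c[2], c[3], c[4]);
});
consumer.join();  // wait for the worker before the enclosing scope exits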

Any ideas?

Try adding:

cudaSetDevice(0);

immediately before the line that calls addWithCuda.
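To make the suggestion concrete, here is a minimal sketch of a worker thread that selects the device itself before doing any other CUDA work. The current device is a per-host-thread setting, which is why the call goes inside the thread. This is not a confirmed fix for the error reported here, only a way to narrow it down. Assumptions: std::thread is used instead of boost::thread to keep the example self-contained, the kernel is a stand-in for the sample's addKernel, and check() and worker() are hypothetical helper names. The cudaFree(0) call is a common idiom for forcing the runtime to initialize, so that a setup failure shows up before the first kernel launch.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Stand-in for the sample's kernel: one GPU thread per element.
__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Hypothetical helper: report which call failed, with the error name.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s (%d)\n", what, cudaGetErrorString(err), (int)err);
        return false;
    }
    return true;
}

// Hypothetical worker: everything CUDA-related happens in this thread.
static void worker()
{
    const int n = 5;
    const size_t bytes = n * sizeof(int);
    const int a[n] = { 1, 2, 3, 4, 5 };
    const int b[n] = { 10, 20, 30, 40, 50 };
    int c[n] = { 0 };
    int *dA = 0, *dB = 0, *dC = 0;

    // Select the device from inside the worker, then force initialization.
    bool ok = check(cudaSetDevice(0), "cudaSetDevice")
           && check(cudaFree(0), "cudaFree(0)")
           && check(cudaMalloc((void **)&dA, bytes), "cudaMalloc a")
           && check(cudaMalloc((void **)&dB, bytes), "cudaMalloc b")
           && check(cudaMalloc((void **)&dC, bytes), "cudaMalloc c")
           && check(cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice), "copy a")
           && check(cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice), "copy b");

    if (ok) {
        addKernel<<<1, n>>>(dC, dA, dB);
        // Launch errors are reported by cudaGetLastError, execution errors by the sync.
        ok = check(cudaGetLastError(), "kernel launch")
          && check(cudaDeviceSynchronize(), "cudaDeviceSynchronize")
          && check(cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost), "copy c");
    }
    if (ok) {
        printf("result = {%d,%d,%d,%d,%d}\n", c[0], c[1], c[2], c[3], c[4]);
    }

    // cudaFree on a null pointer is a no-op, so this is safe on every path.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}

int main()
{
    std::thread t(worker);
    t.join();           // keep main alive until the worker has finished
    cudaDeviceReset();
    return 0;
}
```

Because each call is checked separately, the output shows whether the failure comes from device setup, an allocation, a copy, or the launch itself, which a bare "err=11" does not.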

Thank you for the suggestion! cudaSetDevice(0) is already the first line of the addWithCuda procedure, so adding it one more time did not help. However, after experimenting with the code further I found a workaround. I noticed that if I call addWithCuda in a loop, it returns “invalid parameter” only the very first time, and all later calls succeed!

I’m curious whether this is documented behavior. Also, I found that the GTX 275 and the 680X are actually consistent: on both, the first call to any kernel from a thread other than the main one returns error 11, and later calls work. What a weird issue… Just a time killer.
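For anyone who wants to apply the same workaround, the loop could be wrapped roughly like this. It is host-only C++ that also compiles inside a .cu file. The names callWithRetry and FailsOnce are hypothetical: FailsOnce is only a stand-in that mimics the behavior described above (error 11 on the first call, success afterwards), and in real code addWithCuda would be passed in its place. Note that this only hides the first failure; it does not explain it.

```cpp
#include <cstdio>

// Hypothetical helper: call `launch` up to `maxAttempts` times and stop at the
// first success. A return value of 0 means success (cudaSuccess is 0).
template <typename F>
int callWithRetry(F launch, int maxAttempts)
{
    int err = -1;  // returned unchanged if maxAttempts <= 0
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        err = launch();
        if (err == 0) {
            break;
        }
        fprintf(stderr, "attempt %d failed, err=%d\n", attempt, err);
    }
    return err;
}

// Stand-in for addWithCuda (an assumption for illustration only):
// fails with error 11 on the first call and succeeds on every later call.
struct FailsOnce {
    int calls;
    FailsOnce() : calls(0) {}
    int operator()() { return (++calls == 1) ? 11 : 0; }
};
```

With two attempts allowed, callWithRetry(FailsOnce(), 2) returns 0. With a single attempt it returns 11, which matches what the first call did on both cards.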