Question about number of threads

Hi, I’m new to CUDA and trying to get my head around it. Excuse me if my question is stupid, but if one block can manage 1024 threads and I have an array with, let’s say, 1028 (257*4) items, how do I tell the kernel not to run 2048 threads?
dim3 block(4, 256)
dim3 grid(2)
I ask because I allocated memory for those 1028 items, then printed from the kernel what the value of each item was, and it printed out 2048 items although there were only 1028. Why is there no access violation when the kernel writes to item[1028] or item[1758], which doesn’t exist, and how can it be set to a value?

__global__ void mykernel(..., size_t data_size){
  int idx = threadIdx.x + blockDim.x * blockIdx.x;
  if (idx < data_size){
    // profit
  }
}

const int ds = 1028;
dim3 block(1024);
dim3 grid((ds+block.x-1)/block.x);
mykernel<<<grid,block>>>(..., ds);

That’s an example for the 1D grid/threadblock case. You could easily extend it to 2D. Regarding the second question: if you want to see the access violation, use proper CUDA error checking (don’t know what that is? Google “proper cuda error checking” and take the first hit, then read it and apply it to your code) and run your code with cuda-memcheck.

Small (not very far beyond the end of an array, for example) violations don’t necessarily trigger the runtime access violation mechanism either in CPU code or GPU code. But cuda-memcheck will find them in GPU code, just as valgrind (or similar) will find them in CPU code.
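For reference, here is a hypothetical minimal repro of that situation: the bounds check from the kernel above is deliberately omitted, so 2048 threads write into a 1028-element allocation. This typically runs without any visible error, but running it under cuda-memcheck reports the invalid writes (names here are made up for illustration):

```cuda
#include <cstdio>

// Missing bounds check: threads with idx >= n write past the allocation.
__global__ void oob_write(int *data, size_t n) {
    size_t idx = threadIdx.x + (size_t)blockDim.x * blockIdx.x;
    data[idx] = 42;  // out of bounds for idx in [1028, 2047]
}

int main() {
    const size_t n = 1028;
    int *d_data;
    cudaMalloc((void **)&d_data, n * sizeof(int));
    oob_write<<<2, 1024>>>(d_data, n);  // 2048 threads for 1028 items
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```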

Great! Thanks!

I tried cuda-memcheck and found a number of issues.
I was using the cudaError_t checking that comes with NVIDIA’s Visual Studio extension, and that did not find any problems.
Now I wonder, dear NVIDIA, why provide an error-finding tool that does not find all errors?

There is a difference between the errors that can be caught by the runtime mechanism, and the errors that can be caught by a special tool like cuda-memcheck. I already indicated that in my previous comment. And a similar comment can be made about CPU codes. You can do illegal things in CPU code that won’t throw any sort of error unless you use a tool like valgrind.

That might be the case. However, I run my code from inside VS, that is, from some preset environment. If I can use a tool from the command prompt that calls my application, then I should also be able to call some tool from the application. It should be possible, it’s just not been done. NVIDIA tries to make CUDA accessible to users who have no time to become experts in CUDA (like me) but who are experts in a different field and need CUDA to accelerate their application. The CUDA documentation in general might be thrilling reading to insiders, but it is terrible for outsiders like me. And an error-finding tool that actually finds all errors would make the world a nicer place. Lucky, though, that there are guys like you who use their time and skills to help guys like me. Thank you again.

You can turn on a memory checking function from within VS that does essentially the same thing as cuda-memcheck:

http://developer.download.nvidia.com/NsightVisualStudio/2.2/Documentation/UserGuide/HTML/Content/Use_Memory_Checker.htm

Note that there will never be tools that can “find all the errors”. Some tools will find more errors than others, and the ones that can find more errors often also produce more false positives (and some expertise is required to separate those out).

The VS CUDA memory checker does not find any problems, but cuda-memcheck does:
Program hit cudaErrorLaunchFailure (error 4) due to “unspecified launch failure” on CUDA API call to cudaMemcpy.
========= Saved host backtrace up to driver entry point at error

Where can I find that backtrace?
cuda-memcheck MMCuda05cv.exe works fine, but cuda-memcheck --show-backtrace MMCuda05cv.exe only produces a general help page. Is my syntax wrong?

You need to supply a value for the option, something like:

cuda-memcheck --show-backtrace yes MMCuda05cv.exe

http://docs.nvidia.com/cuda/cuda-memcheck/index.html#command-line-options

http://docs.nvidia.com/cuda/cuda-memcheck/index.html#stack-backtraces

The default is yes, so the backtrace should already be displayed. The “saved” here has a special meaning. Please read the docs.