I have a function f(x), where x ranges from 0x00000000 to 0xFFFFFFFF.
The result f(x) is compared with some constants, and if it equals one of them, I have to write the match to a file.
What is the best way to do this in CUDA?
How can I pass x to many threads?
If you have more than one x: put all of them into an array (int x[…]), copy it to the GPU, and let every thread read its own x (meaning x[threadIdx.x+blockDim.x*blockIdx.x] or something like that), compute the value, check it against the constants (which you also copied to the device), and write back whether to save it or not, along with its value.
If it's a single x and f() takes quite a long time to compute, you will have to try to parallelize whatever f() does (which in most cases will be a bit more tricky ;-)).
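For the first case (an array of x values), a minimal sketch could look like the one below. It is only an illustration under assumptions: f() here is a made-up placeholder, and the names check, d_targets, NUM_TARGETS and the output file matches.txt are hypothetical. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real f(); replace with your own function.
__host__ __device__ unsigned int f(unsigned int x) {
    return x * 2654435761u;
}

#define NUM_TARGETS 2
// The constants to compare against, stored in constant memory.
__constant__ unsigned int d_targets[NUM_TARGETS];

// Each thread reads its own x, computes f(x) and records whether it matched.
__global__ void check(const unsigned int *x, unsigned char *match, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;                      // guard the last, partial block
    unsigned int y = f(x[i]);
    unsigned char hit = 0;
    for (int k = 0; k < NUM_TARGETS; ++k)
        if (y == d_targets[k]) hit = 1;
    match[i] = hit;
}

int main() {
    const int n = 1 << 20;
    std::vector<unsigned int> h_x(n);
    for (int i = 0; i < n; ++i) h_x[i] = (unsigned int)i;

    // Example constants, chosen so that two inputs are known to match.
    unsigned int h_targets[NUM_TARGETS] = { f(12345u), f(999999u) };
    cudaMemcpyToSymbol(d_targets, h_targets, sizeof(h_targets));

    unsigned int *d_x;
    unsigned char *d_match;
    cudaMalloc(&d_x, n * sizeof(unsigned int));
    cudaMalloc(&d_match, n);
    cudaMemcpy(d_x, h_x.data(), n * sizeof(unsigned int), cudaMemcpyHostToDevice);

    check<<<(n + 255) / 256, 256>>>(d_x, d_match, n);

    // This copy waits for the kernel to finish before reading the flags.
    std::vector<unsigned char> h_match(n);
    cudaMemcpy(h_match.data(), d_match, n, cudaMemcpyDeviceToHost);

    FILE *out = fopen("matches.txt", "w");
    if (!out) return 1;
    for (int i = 0; i < n; ++i)
        if (h_match[i])
            fprintf(out, "x=0x%08X f(x)=0x%08X\n", h_x[i], f(h_x[i]));
    fclose(out);

    cudaFree(d_x);
    cudaFree(d_match);
    return 0;
}
```

Since x simply counts up here, the input array is not strictly needed: each thread could derive x from its index, which saves the host-to-device copy.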
First, you are most likely doing too much work in one thread…
Second, if it freezes, use a smaller range, optimize your code, and then go up with the range again ;-)
Here is an example of how I would do it:
__global__ void kernel(unsigned int offset){
    // unsigned: values such as 0x87000000 do not fit in a signed int
    unsigned int idx=threadIdx.x+blockIdx.x*blockDim.x+offset;
    //compute...
}
int main(...){
    //...
    kernel<<<4096,256>>>(0x87000000u);
    //...
}
That's 1M values you check (4096 blocks × 256 threads). The blockDim is just a guess; see what's most efficient for you. You can also use a 2-dimensional grid (gridDim) if you want to have more threads.
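A rough sketch of how the index could be computed with a 2-dimensional grid follows. The helper flatten(), the kernel name kernel2d and the small 64 × 4 grid are assumptions made for this example only; the kernel just stores each thread's x so the host can confirm that every value in the range is visited exactly once.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: turns a 2D block index plus thread index into one x value.
__host__ __device__ unsigned int flatten(unsigned int bx, unsigned int by,
                                         unsigned int gridX, unsigned int blockSize,
                                         unsigned int tx, unsigned int offset) {
    unsigned int block = bx + by * gridX;    // linear block number
    return tx + block * blockSize + offset;
}

__global__ void kernel2d(unsigned int offset, unsigned int *out) {
    unsigned int x = flatten(blockIdx.x, blockIdx.y, gridDim.x,
                             blockDim.x, threadIdx.x, offset);
    //compute... (here we only record which x this thread would handle)
    out[x - offset] = x;
}

int main() {
    const unsigned int threads = 256;
    const dim3 grid(64, 4);                  // 64 * 4 = 256 blocks
    const unsigned int n = grid.x * grid.y * threads;
    const unsigned int offset = 0x87000000u;

    unsigned int *d_out;
    cudaMalloc(&d_out, n * sizeof(unsigned int));
    kernel2d<<<grid, threads>>>(offset, d_out);

    std::vector<unsigned int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    // Every slot should hold offset + i if the indexing is right.
    unsigned int bad = 0;
    for (unsigned int i = 0; i < n; ++i)
        if (h_out[i] != offset + i) ++bad;
    printf("%u values checked, %u wrong\n", n, bad);

    cudaFree(d_out);
    return bad != 0;
}
```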
Measure the time it takes to do this, then optimize (keywords: coalescing, shared memory, texture memory, constant memory).
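One way to take that measurement is with CUDA events, sketched below. The kernel body is a made-up placeholder, and fullRangeSeconds() is a hypothetical helper that simply scales one launch up to all 2^32 values; real numbers will depend on your f() and your card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: the multiply-and-compare stands in for the real work.
__global__ void kernel(unsigned int offset, unsigned int *sink) {
    unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x + offset;
    if (idx * 2654435761u == 0xDEADBEEFu) *sink = idx;
}

// Hypothetical helper: extrapolate one launch to the full 32-bit range.
double fullRangeSeconds(float msPerLaunch, unsigned long long valuesPerLaunch) {
    double launches = 4294967296.0 / (double)valuesPerLaunch;   // 2^32 values in total
    return launches * msPerLaunch / 1000.0;
}

int main() {
    const unsigned int blocks = 4096, threads = 256;

    unsigned int *d_sink;
    cudaMalloc(&d_sink, sizeof(unsigned int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    kernel<<<blocks, threads>>>(0x87000000u, d_sink);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);              // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

    unsigned long long values = (unsigned long long)blocks * threads;
    printf("one launch: %.3f ms for %llu values\n", ms, values);
    printf("estimated full range: %.1f s\n", fullRangeSeconds(ms, values));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_sink);
    return 0;
}
```

The first launch of a program often includes one-time startup cost, so timing a second launch usually gives a more representative number.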
Once you've got an acceptable execution time, crank the numbers up to span the range you want.
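One possible way to go up with the range without making a single launch run for seconds is to keep each launch at 1M values and loop over offsets on the host. The sketch below assumes a placeholder f(), a single target constant, and hypothetical names (scan, MAX_HITS, hits.txt). Matches are collected with atomicAdd, so only the hits come back to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real f().
__host__ __device__ unsigned int f(unsigned int x) {
    return x * 2654435761u;
}

#define MAX_HITS 1024   // assumed upper bound on matches we keep

// Each thread tests one x; matching values are appended to hits[].
__global__ void scan(unsigned int offset, unsigned int target,
                     unsigned int *hits, unsigned int *count) {
    unsigned int x = threadIdx.x + blockIdx.x * blockDim.x + offset;
    if (f(x) == target) {
        unsigned int slot = atomicAdd(count, 1u);   // reserve an output slot
        if (slot < MAX_HITS) hits[slot] = x;
    }
}

int main() {
    const unsigned int blocks = 4096, threads = 256;
    const unsigned int chunk = blocks * threads;    // 1M values per launch
    const unsigned int first = 0x87000000u;
    const unsigned int launches = 16;               // first = 0 and 4096 launches would cover all 2^32 values
    const unsigned int target = f(0x87654321u);     // example constant with a known match

    unsigned int *d_hits, *d_count;
    cudaMalloc(&d_hits, MAX_HITS * sizeof(unsigned int));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    for (unsigned int i = 0; i < launches; ++i) {
        scan<<<blocks, threads>>>(first + i * chunk, target, d_hits, d_count);
        cudaDeviceSynchronize();                    // keep each launch short and separate
    }

    unsigned int count = 0;
    unsigned int h_hits[MAX_HITS];
    cudaMemcpy(&count, d_count, sizeof(count), cudaMemcpyDeviceToHost);
    if (count > MAX_HITS) count = MAX_HITS;         // extra hits were dropped
    cudaMemcpy(h_hits, d_hits, count * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    FILE *out = fopen("hits.txt", "w");
    if (!out) return 1;
    for (unsigned int i = 0; i < count; ++i)
        fprintf(out, "0x%08X\n", h_hits[i]);
    fclose(out);

    cudaFree(d_hits);
    cudaFree(d_count);
    return 0;
}
```

Short launches also help when the same GPU drives a display: in that setup the driver's watchdog usually aborts kernels that run for more than a few seconds, which looks like a freeze.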
Thanks. It seems that it freezes and doesn't work if the execution time of a single thread is above ~5.5 seconds… With smaller ranges, which lead to a lower execution time, everything is OK.