Hello, my name is Asaf and I'm new to CUDA.
I have a GeForce GTX 1080, I'm using Visual Studio, and I'm trying to run this code:
__global__ void test(int *x)
typedef unsigned long long ull;
dim3 dimBlock(8, 32,4);
dim3 dimGrid(8000, 3, 51615);
The code does nothing special and is only for testing.
As you can see, the kernel is empty, and yet compiling takes too much time (around 12 sec).
When I changed the grid to “dim3 dimGrid(1,1,1)” it ran very fast.
I would like to know a few things:
- As far as I can see, I'm within the limits for the grid and block (1024 threads in the block, and the grid can be around 65,000 blocks in each dimension), so what is the problem? Is there any difference between grid(1,1,1) or
- How do I calculate the correct bounds for best performance?
- If it is needed, how do I calculate how many GPUs I need for best performance?
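
To make the first question concrete, here is a minimal host-side sketch of the limit check and the launch-size arithmetic. It assumes the limits documented for compute capability 6.x (the GTX 1080 generation): at most 1024 threads per block, block dimensions up to (1024, 1024, 64), and grid dimensions up to (2^31 - 1, 65535, 65535). The names `Dim3`, `volume`, `within_limits`, and `total_threads` are hypothetical helpers written for this example, not part of the CUDA API. On a real device the limits should be read with `cudaGetDeviceProperties` rather than hard-coded.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { std::uint64_t x, y, z; };

// Assumed limits for compute capability 6.x (not queried from the device).
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;
constexpr Dim3 kMaxBlockDim = {1024, 1024, 64};
constexpr Dim3 kMaxGridDim  = {2147483647ULL, 65535, 65535};

// Number of elements (threads in a block, or blocks in a grid) spanned by d.
constexpr std::uint64_t volume(Dim3 d) { return d.x * d.y * d.z; }

// True if every dimension, and the per-block thread count, is within the assumed limits.
constexpr bool within_limits(Dim3 grid, Dim3 block) {
    return block.x <= kMaxBlockDim.x && block.y <= kMaxBlockDim.y &&
           block.z <= kMaxBlockDim.z && volume(block) <= kMaxThreadsPerBlock &&
           grid.x <= kMaxGridDim.x && grid.y <= kMaxGridDim.y &&
           grid.z <= kMaxGridDim.z;
}

// Total number of threads the launch asks the GPU to schedule.
constexpr std::uint64_t total_threads(Dim3 grid, Dim3 block) {
    return volume(grid) * volume(block);
}

// Values from the post, checked at compile time.
constexpr Dim3 dimBlock = {8, 32, 4};
constexpr Dim3 dimGrid  = {8000, 3, 51615};
static_assert(within_limits(dimGrid, dimBlock), "launch exceeds the assumed limits");
static_assert(volume(dimGrid) == 1238760000ULL, "unexpected block count");
```

Under these assumptions the launch is within the limits, but it asks for 1,238,760,000 blocks of 1024 threads each (about 1.27 × 10^12 threads in total), whereas `dimGrid(1,1,1)` asks for a single block of 1024 threads.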
Thanks a lot,