How do I determine the largest block and grid dimensions I can use on Jetson Xavier? Thanks in advance!
Example
dimBlock = dim3(32, 32);
int yBlocks = eglFrame.height / dimBlock.y + ((eglFrame.height % dimBlock.y) == 0 ? 0 : 1);
int xBlocks = eglFrame.width / dimBlock.x + ((eglFrame.width % dimBlock.x) == 0 ? 0 : 1);
dimGrid = dim3(xBlocks, yBlocks);
You can get this information from the deviceQuery sample:
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Xavier"
CUDA Driver Version / Runtime Version 11.4 / 11.4
CUDA Capability Major/Minor version number: 7.2
...
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
...
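To make the limits above concrete: the per-dimension maximums and the per-block total have to hold at the same time, so the product x*y*z of a block must stay within 1024 threads. Below is a minimal host-side sketch in plain C++ that hard-codes the numbers from the output above. `blockFits` is a hypothetical helper for illustration, not part of the CUDA runtime; in a real program the same values can be read from `cudaGetDeviceProperties` (fields `maxThreadsPerBlock` and `maxThreadsDim`) instead of being hard-coded.

```cpp
// Limits copied from the deviceQuery output above (Xavier, compute capability 7.2).
constexpr unsigned kMaxThreadsPerBlock = 1024;
constexpr unsigned kMaxBlockDim[3] = {1024, 1024, 64};

// Hypothetical helper: true if a block of (x, y, z) threads respects both the
// per-dimension limits and the total threads-per-block limit.
constexpr bool blockFits(unsigned x, unsigned y, unsigned z = 1) {
    return x >= 1 && y >= 1 && z >= 1 &&
           x <= kMaxBlockDim[0] && y <= kMaxBlockDim[1] && z <= kMaxBlockDim[2] &&
           static_cast<unsigned long long>(x) * y * z <= kMaxThreadsPerBlock;
}

// 32 x 32 = 1024 threads: exactly at the limit.
static_assert(blockFits(32, 32), "32x32 should be launchable");
// 64 x 64 = 4096 threads: each dimension is fine, but the total is too large.
static_assert(!blockFits(64, 64), "64x64 exceeds 1024 threads per block");
```

So each dimension being under 1024 is not enough on its own; a shape such as 64,16 or 128,8 also totals 1024 threads and passes the same check.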
The thread block size that works for me is 32,32. What else would need to change if I change these values? I get this message if I use anything bigger (64,64, for example):
Cuda failure nvsample_cudaprocess.cu:945: invalid configuration argument.
But I believe this is a side effect. How does this affect the allowed grid size? Here is my code snippet:
dimBlock = dim3(32, 32); // this works fine; using 64,64 fails
int yBlocks = eglFrame.height / dimBlock.y + ((eglFrame.height % dimBlock.y) == 0 ? 0 : 1);
int xBlocks = eglFrame.width / dimBlock.x + ((eglFrame.width % dimBlock.x) == 0 ? 0 : 1);
dimGrid = dim3(xBlocks, yBlocks);
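On the grid-size question: the grid limits reported by deviceQuery (2147483647, 65535, 65535) do not depend on the block size. A 64,64 block asks for 4096 threads, which is over the 1024 threads-per-block limit, so the launch is rejected with "invalid configuration argument" regardless of the grid. Changing the block shape only changes how many blocks the snippet above computes. Here is a rough host-side sketch of that same rounding-up calculation in plain C++. `Dim2`, `ceilDiv`, `gridFor` and `gridFits` are hypothetical names standing in for `dim3` and the inline arithmetic, and the 1920x1080 frame in the comments is an assumed size, since the thread does not give the real `eglFrame` dimensions.

```cpp
// Hypothetical stand-in for the x/y part of CUDA's dim3, so this compiles without nvcc.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Integer division rounded up: same result as n/d + ((n % d) == 0 ? 0 : 1).
constexpr unsigned ceilDiv(unsigned n, unsigned d) {
    return (n + d - 1) / d;
}

// Number of blocks needed to cover a width x height frame with the given block shape.
constexpr Dim2 gridFor(unsigned width, unsigned height, Dim2 block) {
    return Dim2{ceilDiv(width, block.x), ceilDiv(height, block.y)};
}

// Grid limits copied from the deviceQuery output in this thread (x and y only).
constexpr bool gridFits(Dim2 grid) {
    return grid.x >= 1 && grid.y >= 1 && grid.x <= 2147483647u && grid.y <= 65535u;
}

// Assumed 1920x1080 frame:
//   block 32,32 -> grid 60,34   (1080 / 32 = 33.75, rounded up to 34)
//   block 16,16 -> grid 120,68  (1080 / 16 = 67.5,  rounded up to 68)
static_assert(gridFor(1920, 1080, Dim2{32, 32}).y == 34, "rounds up in y");
static_assert(gridFits(gridFor(1920, 1080, Dim2{16, 16})), "well inside the grid limits");
```

Because the grid is rounded up, the last row or column of blocks can extend past the frame whenever the size is not a multiple of the block dimension, so the kernel needs a bounds check on x and y before it touches a pixel.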