I would like to set the number of threads per block and the number of blocks from a Python script. In C/C++ code this is straightforward: I just specify them when I invoke the kernel. What is the equivalent in Python? Any pointers would be greatly appreciated.
TensorFlow’s API exposes a fairly high level of abstraction. As such, setting CUDA kernel launch configurations is something the framework takes care of for you. If you want to tweak that kind of low-level functionality, you would need to modify the C++ source.
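For context, a launch configuration is just a pair of numbers, the block count and the threads per block, and it is usually derived from the amount of work. The sketch below shows that arithmetic in plain Python. It is only an illustration: `launch_config` is a hypothetical helper, the default of 256 threads per block is an assumed value, and none of this is TensorFlow's actual code or its real defaults.

```python
def launch_config(num_elements, threads_per_block=256):
    """Return (num_blocks, threads_per_block) so that every element gets one thread.

    Hypothetical helper for illustration; 256 is an assumed default,
    not a value taken from TensorFlow.
    """
    if num_elements <= 0 or threads_per_block <= 0:
        raise ValueError("num_elements and threads_per_block must be positive")
    # Ceiling division: round up so a partially filled last block is still launched.
    num_blocks = (num_elements + threads_per_block - 1) // threads_per_block
    return num_blocks, threads_per_block


if __name__ == "__main__":
    # One million elements at 256 threads per block.
    print(launch_config(1_000_000))  # (3907, 256)
```

In C/C++ these two numbers are what you pass inside `<<<blocks, threads>>>` at the kernel call site. In TensorFlow that call lives inside each op's C++/CUDA implementation, which is why there is nothing to set from the Python side.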
Thank you for the reply. I understand that TensorFlow chooses optimized values for these parameters. What values does it set them to?