CUDA Pro Tip: The Fast Way to Query Device Properties

Originally published at:

CUDA applications often need to know the maximum available shared memory per block or to query the number of multiprocessors in the active GPU. One way to do this is by calling cudaGetDeviceProperties(). Unfortunately, calling this function inside a performance-critical section of your code can lead to huge slowdowns, depending on your code. We found out…

CUDA should provide a balance between an everything-in-one-call API (cudaGetDeviceProperties()) and a one-query-per-call API (cudaDeviceGetAttribute()).
An API that accepts a dynamic number of queries would let users choose what they need without multiple function calls.
For example, Vulkan has

Another suggestion: avoid doing a PCIe read access for the many properties that are constant and can be cached at cuInit / cuDevice creation.
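Until the driver caches these values itself, an application can do the caching on its own side. The following is a minimal sketch, assuming the attributes of interest do not change while the process runs. `CachedDeviceInfo` and `loadDeviceInfo` are hypothetical names made up for this illustration, not part of CUDA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper type (not part of CUDA): attributes assumed to stay
// constant for the lifetime of the process, queried once and then reused.
struct CachedDeviceInfo {
    int smCount = 0;
    int maxSharedMemPerBlock = 0;
    int warpSize = 0;
};

// Queries each attribute once for `device`. Returns the first CUDA error
// encountered, or cudaSuccess if all three queries succeed.
cudaError_t loadDeviceInfo(int device, CachedDeviceInfo* info) {
    cudaError_t err = cudaDeviceGetAttribute(
        &info->smCount, cudaDevAttrMultiProcessorCount, device);
    if (err != cudaSuccess) return err;
    err = cudaDeviceGetAttribute(
        &info->maxSharedMemPerBlock, cudaDevAttrMaxSharedMemoryPerBlock, device);
    if (err != cudaSuccess) return err;
    return cudaDeviceGetAttribute(&info->warpSize, cudaDevAttrWarpSize, device);
}

int main() {
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Fill the cache once at startup, outside any performance-critical loop.
    CachedDeviceInfo info;
    err = loadDeviceInfo(device, &info);
    if (err != cudaSuccess) {
        fprintf(stderr, "attribute query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Later code reads the cached struct instead of calling into the runtime.
    printf("SMs: %d, max shared memory per block: %d bytes, warp size: %d\n",
           info.smCount, info.maxSharedMemPerBlock, info.warpSize);
    return 0;
}
```

With this pattern the hot path only reads plain integers from the struct, so the cost of the query is paid once per device.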

I regularly read your website; all the blogs on these websites are amazing.

Using a Titan V, CUDA V10.1.105, and driver version 418.56, I got, for example:

cudaGetDeviceProperties -> 194847us
cudaDeviceGetAttribute -> 160394us

They are quite similar. Why?

Interesting. Did you run the code in the post as-is, or did you query different attributes using cudaDeviceGetAttribute()?

Yes, I ran the code in the post as-is.
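For anyone who wants to compare the two calls on their own hardware, here is a rough timing sketch. It is not the code from the post: the device index, the iteration count, and the `timeMicros` helper are assumptions chosen for illustration. It calls cudaFree(0) before timing so that one-time runtime initialization, which may otherwise dominate whichever call happens to run first, is not charged to either query.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the elapsed wall-clock time in microseconds
// for `iterations` back-to-back calls of `fn`.
template <typename Fn>
long long timeMicros(int iterations, Fn fn) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

int main() {
    const int device = 0;       // assumption: measure device 0
    const int iterations = 25;  // assumption: arbitrary repeat count

    // Trigger CUDA runtime initialization before any timing starts.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA initialization failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Return codes inside the timed lambdas are ignored to keep the sketch short.
    long long propsUs = timeMicros(iterations, [&] {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);  // fills the whole struct
    });

    long long attrUs = timeMicros(iterations, [&] {
        int smemPerBlock = 0;
        int smCount = 0;
        cudaDeviceGetAttribute(&smemPerBlock, cudaDevAttrMaxSharedMemoryPerBlock, device);
        cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device);
    });

    printf("cudaGetDeviceProperties -> %lldus\n", propsUs);
    printf("cudaDeviceGetAttribute -> %lldus\n", attrUs);
    return 0;
}
```

Results will vary with the GPU, driver, and CUDA version, so treat the numbers as relative rather than absolute.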

Where can I find the names for each property (e.g. warp size)?
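The attribute names are the enumerators of the cudaDeviceAttr enum in the CUDA Runtime API (for example, cudaDevAttrWarpSize), and the corresponding struct fields are members of cudaDeviceProp (for example, warpSize). As a small sketch, assuming device 0, the same value can be read both ways:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: query device 0

    // Single-attribute query: the name is an enumerator of cudaDeviceAttr.
    int warpSizeAttr = 0;
    cudaError_t err = cudaDeviceGetAttribute(&warpSizeAttr, cudaDevAttrWarpSize, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Whole-struct query: the same value is a field of cudaDeviceProp.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("warp size: attribute=%d, struct field=%d\n", warpSizeAttr, prop.warpSize);
    return 0;
}
```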