I’ve seen a common pattern in many open source projects (deep learning libraries, etc.) in which developers manually list the possible compute capabilities somewhere in their CMakeLists.txt file. This is not optimal, because the file has to be updated every time devices with new compute capabilities are released. It also means that source packages released before a given compute capability existed will never target that capability, even when built on supported hardware with an up-to-date nvcc compiler. Another problem is that framework developers are sometimes simply slow to update their CMakeLists.txt file, which means that NVIDIA customers with cutting-edge hardware are not able to use the hardware’s features.
I was wondering whether there is an easy way, from within CMake, to retrieve a supported, up-to-date list of compute capabilities from the installed compiler.
A few examples of what I’m talking about:
Note that none of these libraries (at the time of this writing) have been updated for sm_75, which means users of PyTorch, etc. are going to be slow to receive optimizations.
Can we use CUDA_SELECT_NVCC_ARCH_FLAGS for this purpose? Are there any examples of projects using it?
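To make the question concrete, here is a minimal sketch of the kind of usage I have in mind. It assumes the legacy FindCUDA module (the one that provides CUDA_SELECT_NVCC_ARCH_FLAGS); the project name and the ARCH_FLAGS variable are placeholders of mine. As far as I can tell, the architecture list behind "All" comes from a table inside the CMake module keyed on the detected CUDA version, not from nvcc itself, so it may still lag behind the compiler.

```cmake
cmake_minimum_required(VERSION 3.8)
# Hypothetical project name, used only for this sketch.
project(arch_flags_sketch LANGUAGES CXX)

# Legacy FindCUDA module; it defines CUDA_SELECT_NVCC_ARCH_FLAGS.
find_package(CUDA REQUIRED)

# "All" requests every architecture the module knows for this CUDA version.
# "Auto" would instead detect the GPUs present on the build machine.
CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS All)

# Hand the generated -gencode flags to nvcc for targets built via FindCUDA.
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})

# Print what was selected so the result can be inspected at configure time.
message(STATUS "nvcc arch flags: ${ARCH_FLAGS}")
message(STATUS "architectures: ${ARCH_FLAGS_readable}")
```

If this works as I expect, the hard-coded capability list in a project's CMakeLists.txt could be replaced by the single CUDA_SELECT_NVCC_ARCH_FLAGS call, but I have not confirmed how current the list is for new hardware.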