Can I use GPU IDs enumerated from NVAPI to create a CUDA device? I am hoping I can use the result of NvAPI_EnumPhysicalGPUs to get a device handle via cuDeviceGet. I need to be able to target a specific GPU for computation while not in SLI mode.
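Roughly, this is the kind of pairing I was hoping for (just a sketch of the intent; it assumes the NVAPI enumeration order and the CUDA device ordinals line up, which is exactly what I am asking about):

```cpp
#include <cstdio>
#include <cuda.h>   // CUDA driver API: cuInit, cuDeviceGet, ...
#include <nvapi.h>  // NVAPI (Windows): NvAPI_EnumPhysicalGPUs, ...

int main()
{
    // Enumerate physical GPUs through NVAPI.
    NvAPI_Initialize();
    NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
    NvU32 gpuCount = 0;
    NvAPI_EnumPhysicalGPUs(gpus, &gpuCount);

    // Initialize the CUDA driver API.
    cuInit(0);

    // Hoped-for (but not guaranteed) pairing: NVAPI index i == CUDA ordinal i.
    for (NvU32 i = 0; i < gpuCount; ++i) {
        CUdevice dev;
        if (cuDeviceGet(&dev, (int)i) == CUDA_SUCCESS) {
            char name[256];
            cuDeviceGetName(name, sizeof(name), dev);
            printf("NVAPI GPU %u -> CUDA device %d (%s)?\n",
                   (unsigned)i, (int)dev, name);
        }
    }
    return 0;
}
```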
I do not believe this is the case; CUDA device IDs are CUDA-specific because we reserve the right to reorder them at our sole discretion (as we did with 2.1).
“Frame Rendering - Ability to control Video and DX rendering not available in DX runtime.”
What’s that?
“GPU Topology - Ability to enable SLI and Hybrid GPU topologies.”
Can that be used by a CUDA app to temporarily control SLI?
“GPU Management - Enumeration of physical and logical GPUs. Thermal and Cooling controls.”
I suppose this is very useful for monitoring/controlling temperatures and fans to ensure reliable computation. (High temps == bit errors.) Does such a thing also exist on Linux? Does it work with Tesla?
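For reference, on Windows I would expect the thermal query to look roughly like this (a sketch based on my reading of the NVAPI headers; the exact struct fields and the NVAPI_THERMAL_TARGET_ALL constant are assumptions to check against the SDK version you have):

```cpp
#include <cstdio>
#include <nvapi.h>  // NVAPI (Windows only)

int main()
{
    NvAPI_Initialize();

    NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
    NvU32 gpuCount = 0;
    NvAPI_EnumPhysicalGPUs(gpus, &gpuCount);

    for (NvU32 i = 0; i < gpuCount; ++i) {
        // Ask for all thermal sensors on this physical GPU.
        NV_GPU_THERMAL_SETTINGS thermal = { 0 };
        thermal.version = NV_GPU_THERMAL_SETTINGS_VER;
        if (NvAPI_GPU_GetThermalSettings(gpus[i], NVAPI_THERMAL_TARGET_ALL,
                                         &thermal) == NVAPI_OK) {
            for (NvU32 s = 0; s < thermal.count; ++s)
                printf("GPU %u sensor %u: %d C\n",
                       (unsigned)i, (unsigned)s,
                       (int)thermal.sensor[s].currentTemp);
        }
    }
    return 0;
}
```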
How does high temperature influence bit error probability?
Regarding CUDA device enumeration:
I can use cuD3D9GetDevice() to retrieve the CUDA device number, as long as an adapter identifier for the device exists. So far this approach works for video cards; Tesla cards don't have these identifiers.
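For context, the Direct3D lookup I'm using today looks roughly like this (error handling trimmed; it assumes the D3D9 adapter's DeviceName string, e.g. "\\.\DISPLAY1", is what cuD3D9GetDevice expects, which is how I read the docs):

```cpp
#include <cstdio>
#include <d3d9.h>
#include <cuda.h>
#include <cudaD3D9.h>  // driver API D3D9 interop: cuD3D9GetDevice

int main()
{
    cuInit(0);

    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    UINT adapterCount = d3d->GetAdapterCount();

    for (UINT i = 0; i < adapterCount; ++i) {
        D3DADAPTER_IDENTIFIER9 id;
        if (FAILED(d3d->GetAdapterIdentifier(i, 0, &id)))
            continue;

        // Map the D3D9 adapter (identified by its device name)
        // to a CUDA device ordinal.
        CUdevice cuDev;
        if (cuD3D9GetDevice(&cuDev, id.DeviceName) == CUDA_SUCCESS)
            printf("D3D9 adapter %u (%s) -> CUDA device %d\n",
                   i, id.DeviceName, (int)cuDev);
        else
            printf("D3D9 adapter %u (%s) has no CUDA device\n",
                   i, id.DeviceName);
    }

    d3d->Release();
    return 0;
}
```

With a Tesla card there is no display adapter to enumerate, which is where this falls apart.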
Have you come across any techniques for correlating PCI device information with a CUDA device?
Very simply. The higher the temperature, the higher the probability of a bit error. (Remember people who use liquid nitrogen for overclocking? By lowering the temperature significantly, they suppress the very high error rates that extreme overclocking would induce. It’s a direct relationship.)
Incidentally, the problem with consumer cards is that they’re tuned to be quiet, and don’t spin up their fans until the card gets very hot. This is why I find this API very significant. Is there really no equivalent on Linux?
Btw, why is the normal CUDA device enumeration API insufficient for you?
To my knowledge, the nvidia-settings tool is open-source (and I know it displays the temperature), so you can just look at what it does (or otherwise use strace etc. to find out).
I am quite sure that the API it uses is public and documented, and nvclock uses it to change clock frequencies, so maybe look at the nvclock source, too.
I think it might only work when an X server is running; as I understand it, the so-called “NV-CONTROL” interface is an X protocol extension.
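If you want to poke at it directly rather than scraping nvidia-settings output, my understanding is that the NV-CONTROL client library (libXNVCtrl, shipped with the nvidia-settings sources) is used something like this; treat the header paths and the attribute name as assumptions to check against NVCtrl.h, and note that it does need a running X server:

```cpp
#include <cstdio>
#include <X11/Xlib.h>
#include <NVCtrl/NVCtrl.h>     // attribute constants (from nvidia-settings)
#include <NVCtrl/NVCtrlLib.h>  // XNVCTRLQueryAttribute()
// link with: -lXNVCtrl -lX11

int main()
{
    // NV-CONTROL is an X protocol extension, so a display connection is required.
    Display* dpy = XOpenDisplay(NULL);
    if (!dpy) {
        fprintf(stderr, "cannot open X display\n");
        return 1;
    }

    int screen = DefaultScreen(dpy);
    int temp = 0;

    // Query the GPU core temperature (degrees C) for this X screen.
    if (XNVCTRLQueryAttribute(dpy, screen, 0,
                              NV_CTRL_GPU_CORE_TEMPERATURE, &temp))
        printf("GPU core temperature: %d C\n", temp);
    else
        fprintf(stderr, "NV_CTRL_GPU_CORE_TEMPERATURE query failed\n");

    XCloseDisplay(dpy);
    return 0;
}
```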
I am writing a non-interactive GPU-computing program on top of an architecture that enumerates video cards along with information such as PCI IDs, display IDs, bus ID, slot ID, etc. The user of the application runs a GPU computation by selecting a video device provided by that architecture. With video cards I can find the corresponding CUDA device through a Direct3D device, because I have a display ID that identifies the card uniquely. Unfortunately this approach does not work with Tesla cards. Other than going the Direct3D route, I can't find any unique device information in the CUDA enumeration API.
Is this still true? No correlation between CUDA device ID and NVAPI stuff?
How fixed are CUDA device IDs vs. NVAPI IDs? I'm thinking one could do a once-off mapping on a given machine (some sort of GPU burn, maybe) to figure out which device is which, and reuse that mapping afterwards.
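Here is a minimal sketch of the once-off mapping idea using PCI bus numbers instead of a GPU burn. It assumes a driver API new enough to expose the PCI location through cuDeviceGetAttribute, and that NvAPI_GPU_GetBusId reports a matching bus number; both are assumptions worth verifying, since whether such a correlation is supported is exactly the open question here:

```cpp
#include <cstdio>
#include <cuda.h>
#include <nvapi.h>

int main()
{
    cuInit(0);
    NvAPI_Initialize();

    NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
    NvU32 gpuCount = 0;
    NvAPI_EnumPhysicalGPUs(gpus, &gpuCount);

    int cudaCount = 0;
    cuDeviceGetCount(&cudaCount);

    // Build the once-off map: for each NVAPI GPU, find the CUDA ordinal
    // whose PCI bus number matches.
    for (NvU32 i = 0; i < gpuCount; ++i) {
        NvU32 nvBusId = 0;
        if (NvAPI_GPU_GetBusId(gpus[i], &nvBusId) != NVAPI_OK)
            continue;

        for (int d = 0; d < cudaCount; ++d) {
            CUdevice dev;
            cuDeviceGet(&dev, d);

            int cudaBusId = -1;
            cuDeviceGetAttribute(&cudaBusId,
                                 CU_DEVICE_ATTRIBUTE_PCI_BUS_ID, dev);

            if ((NvU32)cudaBusId == nvBusId) {
                printf("NVAPI GPU %u (bus %u) -> CUDA device %d\n",
                       (unsigned)i, (unsigned)nvBusId, d);
                break;
            }
        }
    }
    return 0;
}
```

If that attribute isn't available on the toolkit in question, a cruder key such as device name plus total memory could serve, but it breaks down when identical boards are installed.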