I noticed that an accelerator-enabled program runs fine even without calling acc_set_device() and acc_init() to initialize the GPU card. Then why do we have to add these two functions to the code? Thanks!
If you have multiple GPUs attached you can select which you want to use. For the init it might perform the initialisation that is normally done at the first GPU-related call (you might want to do this for timing purposes, for example).
Then why do we have to add these two functions to the code?
They’re optional. The set device call is only needed when selecting a device other than the default.
The init call is useful for performance timing. By default the device is initialized on first use, but since the initialization time is approximately 1 second per attached device (Linux only), this can have a noticeable impact on timing. Placing the acc_init call outside of your timers removes this overhead from the measurement.
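To make the timing point concrete, here is a minimal sketch in C. It assumes the OpenACC-style runtime API (`openacc.h`, `acc_init`, `acc_device_default`), which may differ from the header and names your PGI Accelerator release provides. The helpers `now_seconds`, `scale`, and `timed_scale` are hypothetical names made up for illustration, and the no-op fallback only exists so the sketch also builds with a plain C compiler.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <time.h>

#ifdef _OPENACC
#include <openacc.h>  /* assumed OpenACC runtime API: acc_init, acc_device_default */
#else
/* Hypothetical no-op stand-ins so the sketch builds without an accelerator
   compiler. They are not part of any real library. */
typedef int acc_device_t;
#define acc_device_default 0
static void acc_init(acc_device_t dev) { (void)dev; }
#endif

/* Wall-clock seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Scale n floats on the accelerator (or on the host if the pragma is ignored). */
static void scale(float *x, int n, float a) {
    #pragma acc parallel loop copy(x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= a;
    }
}

/* Returns the elapsed seconds of the scale() call only.
   If preinit is nonzero, device start-up is paid before the timer starts;
   otherwise it is paid on first use, inside the timed region. */
double timed_scale(float *x, int n, float a, int preinit) {
    if (preinit) {
        acc_init(acc_device_default);
    }
    double t0 = now_seconds();
    scale(x, n, a);
    return now_seconds() - t0;
}
```

Calling `timed_scale(x, n, 2.0f, 0)` in one fresh run and `timed_scale(x, n, 2.0f, 1)` in another should show the difference on a machine with a GPU attached. Compare separate runs, since the device stays initialized within a single process.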
acc_set_device() is useful for when you have multiple devices. Say, two Tesla cards or a Tesla S-box. With that command you can set which device you want to run on, so you can do multi-GPU jobs, say, or test how your code runs on different devices with different compute capabilities in the same machine. If you don’t use acc_set_device() on a multi-device machine, you’ll always get the “default” device, which I think is device 0…though which device is 0 might not be what you think should be “default”.
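For the multi-device case, a sketch along these lines may help. It swaps in the OpenACC-style calls `acc_get_num_devices` and `acc_set_device_num` with `acc_device_nvidia` instead of the `acc_set_device()` spelling used in this thread, so check the names against your compiler's header. `choose_device` and `select_device` are hypothetical helpers, and falling back to device 0 for an out-of-range request is my own choice here, not documented behavior.

```c
#ifdef _OPENACC
#include <openacc.h>  /* assumed OpenACC runtime API */
#else
/* Hypothetical stand-ins so the sketch builds without an accelerator
   compiler. They pretend no device is attached. */
typedef int acc_device_t;
#define acc_device_nvidia 0
static int acc_get_num_devices(acc_device_t type) { (void)type; return 0; }
static void acc_set_device_num(int num, acc_device_t type) { (void)num; (void)type; }
#endif

/* Pure helper: pick a valid device index given how many are attached.
   Returns -1 when there is no device, 0 when the request is out of range,
   and the requested index otherwise. */
int choose_device(int requested, int available) {
    if (available <= 0) {
        return -1;
    }
    if (requested < 0 || requested >= available) {
        return 0;
    }
    return requested;
}

/* Query the attached devices and bind this thread to the chosen one.
   Returns the device index in use, or -1 if none was found. */
int select_device(int requested) {
    int available = acc_get_num_devices(acc_device_nvidia);
    int chosen = choose_device(requested, available);
    if (chosen >= 0) {
        acc_set_device_num(chosen, acc_device_nvidia);
    }
    return chosen;
}
```

Calling `select_device(1)` before the first accelerator region would run the rest of the program on the second card of a two-card machine. Which physical card gets which index is up to the runtime, as noted above.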
As for acc_init(), in the old days before PGI autoinitialized, I used acc_init() early in a program so that I could “hide” the initialization spinup from timing routines. (ETA: Looks like this is still the case at times. Thanks for letting me know, Mat.)
Looks like this is still the case at times.
Nothing’s changed here. Though we did add an external utility “pgcudainit” which holds the device open and eliminates the initialization cost.
Hmm. I might have encountered something with this. I found that I have to call cudaThreadExit() before setting a device sometimes. If I don’t, it says there is a context already running or something. Maybe due to the “pre-initialization” routine?
“acc_init” is for the PGI Accelerator model, while cudaThreadExit is for CUDA Fortran, so I’m not sure how mixing the two affects things. We’re starting work on allowing the two models to be mixed, so please feel free to send me an example of the behavior you’re seeing.
Oh, no, I’m not mixing them. I was reacting to your comment about “…holds the device open and eliminates the initialization cost”. I wondered if I was seeing something like that when I had to cudaThreadExit() before cudaSetDevice’ing. That is, some initialization routine higher than my code had established a context upon, say, “use cudafor” that I had to exit before setting the device.