I’ve been playing around with CUDA over the weekend whilst working through the docs on the site as well as “CUDA by Example”. I have a very simple question but don’t seem to be able to find the answer.
When I run a simple kernel on the device (the book starts with summing two vectors), how do I know how many parallel processes I have running? For example, when using DataSynapse and the like, you know how many engines you have, how many tasks are running, and how many are in the backlog. When running threads on a CPU you know the core count.
Thanks in advance
You can use tools such as “GPU Caps Viewer” to figure out how many “cores” you have on your GPU. CUDA also has API functions that provide this kind of info.
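For instance, the runtime API’s `cudaGetDeviceProperties` will report the multiprocessor (SM) count and per-block limits. A minimal sketch (note that the number of “cores” per SM depends on the architecture and is not reported directly):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);

        printf("Device %d: %s\n", d, prop.name);
        printf("  Multiprocessors (SMs): %d\n", prop.multiProcessorCount);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("  Warp size:             %d\n", prop.warpSize);
    }
    return 0;
}
```

Multiply the SM count by the cores-per-SM figure for your architecture to get a “core” count comparable to what GPU Caps Viewer shows.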
Thanks. I was also looking at the CUDA profiler; am I right in thinking that the Visual Profiler is no longer in use? The latest docs only seem to refer to the command-line profiler.
** Edit: ignore that, I just found nvvp!
Yeah, I’ve been playing with that, but I guess my question was more about running a given example and then being able to look at the metrics to see how well I was using the GPU for that particular application.
The profiler will indeed show you the number of threads per block and the number of blocks per grid. It’s a very useful tool for optimizing code: you can inspect memory I/O efficiency and tune the launch configuration for best occupancy.
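Those launch parameters are what you choose yourself in the `<<<blocks, threads>>>` execution configuration. A sketch of the book’s vector-add with the configuration made explicit (the value 256 is just an assumed starting point; the profiler helps you tune it):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sum two vectors, one element per thread.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));   // unified memory for brevity
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;       // assumed; profile to find a better value
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    add<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("launched %d blocks of %d threads\n", blocksPerGrid, threadsPerBlock);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Running this under the profiler will show exactly these blocks/grid and threads/block numbers alongside the occupancy and memory-throughput metrics.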