I’m trying to use the CUDA dynamic parallelism (CDP) feature, but I find that sometimes a child grid has only 1 active thread, even though I assign 256 threads to it.
I launch one child grid and have every thread in it print its own thread ID. Sometimes I get:
worker 5 at sm 0 tid 0 is child grid
worker 5 at sm 0 tid 1 is child grid
worker 5 at sm 0 tid 2 is child grid
worker 5 at sm 0 tid 3 is child grid
worker 5 at sm 0 tid 4 is child grid
worker 5 at sm 0 tid 5 is child grid
worker 5 at sm 0 tid 6 is child grid
worker 5 at sm 0 tid 7 is child grid
worker 5 at sm 0 tid 8 is child grid
worker 5 at sm 0 tid 9 is child grid
worker 5 at sm 0 tid 10 is child grid
......
which indicates that all threads print their tid. But sometimes I get:
worker 5 at sm 1 tid 0 is child grid
worker 6 at sm 4 tid 0 is child grid
worker 7 at sm 2 tid 0 is child grid
worker 5 at sm 5 tid 0 is child grid
....
which means every block of the child grid has only one active thread. I’m pretty sure both tests run the same code and that all of the output is read. What is wrong here? Is this a bug in CDP? My GPU is a GTX 1050 Ti, and the nvidia-smi output is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   58C    P0    N/A /  N/A |    670MiB /  4040MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
The system is Ubuntu 18.04 with kernel 5.4.0-65-generic.
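For reference, here is a minimal sketch of the kind of test described above. It is not the original code: the kernel names `parent` and `child`, the helper `get_smid`, the use of the parent block index as the "worker" number, and the launch configuration of 8 parent blocks are all assumptions made for illustration. It launches one child grid of 256 threads per parent block, has every child thread print its tid and SM, and checks the child launch for errors, which may help separate a launch failure from missing output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read the ID of the SM the calling thread runs on.
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Child kernel: every thread reports its own thread index.
__global__ void child(int worker) {
    printf("worker %d at sm %u tid %d is child grid\n",
           worker, get_smid(), (int)threadIdx.x);
}

// Parent kernel: thread 0 of each block launches one child grid
// with a single block of 256 threads.
__global__ void parent() {
    if (threadIdx.x == 0) {
        int worker = (int)blockIdx.x;  // assumption: worker == parent block index
        child<<<1, 256>>>(worker);

        // Check whether the device-side launch itself failed.
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            printf("worker %d: child launch failed: %s\n",
                   worker, cudaGetErrorString(err));
        }
    }
}

int main() {
    parent<<<8, 32>>>();

    // Wait for the parent grid and all child grids; this also flushes
    // the device-side printf buffer to stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Dynamic parallelism requires relocatable device code, so a sketch like this would be built with something like `nvcc -arch=sm_61 -rdc=true repro.cu -o repro -lcudadevrt` (the file name `repro.cu` is a placeholder; `sm_61` is the compute capability of the GTX 1050 Ti).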