nvprof works for profiling a C++ CUDA executable, but not for Python running PyTorch code:
python -c "import torch; torch.randperm(10, device='cuda')"
======== Warning: No CUDA application was profiled, exiting
According to the PyTorch Forums thread "How do I know randperm is performed on GPU" (reply #2 by ptrblck), it should work. Is there anything else I need to configure?
The PyTorch code calls torch.cuda.profiler.cudart().cudaProfilerStart() and cudaProfilerStop(), but still nothing is profiled.
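For reference, here is a minimal sketch of the kind of script I mean, with the profiler started and stopped explicitly around the GPU call. It uses the torch.cuda.profiler.start()/stop() wrappers instead of calling cudart() directly. The function name run_profiled_region is only illustrative, and the script is assumed to be launched under nvprof:

```python
import torch
import torch.cuda.profiler as cuda_profiler


def run_profiled_region():
    """Run randperm on the GPU between explicit profiler start/stop calls.

    Returns the permutation as a CPU tensor, or None if no CUDA device
    is visible to PyTorch (in which case there is nothing to profile).
    """
    if not torch.cuda.is_available():
        return None

    cuda_profiler.start()                   # wraps cudaProfilerStart()
    perm = torch.randperm(10, device="cuda")
    torch.cuda.synchronize()                # wait for the GPU work to finish
    cuda_profiler.stop()                    # wraps cudaProfilerStop()
    return perm.cpu()


if __name__ == "__main__":
    print(run_profiled_region())
```

If this is saved as script.py (a hypothetical file name) and run as nvprof --profile-from-start off python script.py, collection should be limited to the region between start() and stop(). A result of None would mean PyTorch itself does not see the GPU.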
Thank you!