I want to collect statistics for specific memory regions. For example, for a data structure such as a B+ tree, I would like to count the loads and stores to the structure and measure its throughput requirements. Is this possible with Nsight, or is there another tool that can do this?
Nsight Compute does not support collecting metrics only for specific memory ranges. If you can isolate access to this data structure in a single kernel, and avoid other memory accesses within that kernel, you can use Nsight Compute to profile only that kernel. This gives you an estimate of the data structure's performance, but the estimate will be off for kernels that mix the data structure with other algorithms, because caching effects differ there. Alternatively, if your data structure is accessed through specific functions, you can use Nsight Compute to count the warp-level and thread-level instruction executions of the code related to the data structure, and infer an estimate of the loads and stores from these counts. You can inspect those metrics on the Source page of the Nsight Compute UI.
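To illustrate what "accessed through specific functions" could look like, here is a minimal sketch, not a recommended implementation. All names (`AccessCounters`, `tracked_load`, `tracked_store`, `touch_nodes`, `count_accesses`) are hypothetical, and the kernel is only a stand-in for real B+ tree code. The explicit atomic counters are an extra that Nsight Compute does not need, since it can attribute instruction counts to the accessor functions on its own. They are included only to show that funnelling every access through one place makes it countable.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counters for accesses to the data structure.
struct AccessCounters {
    unsigned long long loads;
    unsigned long long stores;
};

// Hypothetical accessors: every read of the structure goes through here.
__device__ int tracked_load(const int* p, AccessCounters* c) {
    atomicAdd(&c->loads, 1ULL);   // one count per thread-level load
    return *p;
}

// ...and every write goes through here.
__device__ void tracked_store(int* p, int v, AccessCounters* c) {
    atomicAdd(&c->stores, 1ULL);  // one count per thread-level store
    *p = v;
}

// Stand-in for a kernel that touches the structure:
// each thread reads one key and writes it back incremented.
__global__ void touch_nodes(int* keys, int n, AccessCounters* c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int k = tracked_load(&keys[i], c);
        tracked_store(&keys[i], k + 1, c);
    }
}

// Runs the kernel over n keys and returns the counters.
// Error checking is omitted to keep the sketch short.
AccessCounters count_accesses(int n) {
    int* keys = nullptr;
    AccessCounters* d_c = nullptr;
    cudaMalloc(&keys, n * sizeof(int));
    cudaMalloc(&d_c, sizeof(AccessCounters));
    cudaMemset(keys, 0, n * sizeof(int));
    cudaMemset(d_c, 0, sizeof(AccessCounters));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    touch_nodes<<<blocks, threads>>>(keys, n, d_c);

    AccessCounters h_c = {0, 0};
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_c, d_c, sizeof(AccessCounters), cudaMemcpyDeviceToHost);
    cudaFree(keys);
    cudaFree(d_c);
    return h_c;
}

int main() {
    AccessCounters c = count_accesses(1000);
    printf("loads=%llu stores=%llu\n", c.loads, c.stores);
    return 0;
}
```

Note that the atomic counters add memory traffic of their own, so a build like this is suitable for counting accesses but not for measuring throughput.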
If you are not interested in performance metrics, you can instead use the Compute Sanitizer API to record loads and stores to specific memory addresses: https://docs.nvidia.com/cuda/compute-sanitizer/index.html
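Whichever mechanism delivers the accessed address, the per-access check is an address-range filter. The sketch below shows only that filter and deliberately does not use any Sanitizer API calls. The names `TrackedRange`, `overlaps`, `record_access`, `demo_kernel` and `run_filter_demo` are hypothetical, and the kernel calls `record_access` by hand where a real tool would have its instrumentation callback supply the address, size and read/write flag.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical description of the tracked allocation plus its counters.
struct TrackedRange {
    uintptr_t base;             // address of the first tracked byte
    size_t size;                // length of the tracked region in bytes
    unsigned long long loads;   // counted reads that hit the region
    unsigned long long stores;  // counted writes that hit the region
};

// True if the access [addr, addr + access_size) overlaps the tracked region.
__host__ __device__ bool overlaps(const TrackedRange& r, uintptr_t addr,
                                  unsigned access_size) {
    return addr < r.base + r.size && addr + access_size > r.base;
}

// Counts the access only if it touches the tracked region.
__device__ void record_access(TrackedRange* r, const void* ptr,
                              unsigned access_size, bool is_write) {
    if (!overlaps(*r, reinterpret_cast<uintptr_t>(ptr), access_size)) return;
    atomicAdd(is_write ? &r->stores : &r->loads, 1ULL);
}

// Demo: reads come from an untracked buffer, writes go to the tracked one.
__global__ void demo_kernel(const int* other, int* tracked, int n,
                            TrackedRange* r) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    record_access(r, &other[i], sizeof(int), false);   // filtered out
    int v = other[i];
    record_access(r, &tracked[i], sizeof(int), true);  // counted as a store
    tracked[i] = v;
}

// Runs the demo and returns the range with its counters filled in.
// Error checking is omitted to keep the sketch short.
TrackedRange run_filter_demo(int n) {
    int* other = nullptr;
    int* tracked = nullptr;
    TrackedRange* d_r = nullptr;
    cudaMalloc(&other, n * sizeof(int));
    cudaMalloc(&tracked, n * sizeof(int));
    cudaMalloc(&d_r, sizeof(TrackedRange));
    cudaMemset(other, 0, n * sizeof(int));

    TrackedRange h_r = {reinterpret_cast<uintptr_t>(tracked),
                        n * sizeof(int), 0, 0};
    cudaMemcpy(d_r, &h_r, sizeof(TrackedRange), cudaMemcpyHostToDevice);
    demo_kernel<<<(n + 127) / 128, 128>>>(other, tracked, n, d_r);
    cudaMemcpy(&h_r, d_r, sizeof(TrackedRange), cudaMemcpyDeviceToHost);

    cudaFree(other);
    cudaFree(tracked);
    cudaFree(d_r);
    return h_r;
}

int main() {
    TrackedRange r = run_filter_demo(256);
    printf("tracked loads=%llu stores=%llu\n", r.loads, r.stores);
    return 0;
}
```

For a B+ tree whose nodes live in several allocations, the same idea would need a list of ranges rather than a single one.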
Thanks a lot. There is also an instrumentation tool called SASSI, provided by NVIDIA itself, that can be used to check specific memory accesses. The Compute Sanitizer API also looks promising. I will definitely look into it.