nvprof question

Hi,
I’m trying to get the memtransferhostmemtype property out of the profiler (mainly to check whether the host memory I use is pinned or not), but I can’t get it to show up in the output.

Any ideas?

The best I’ve managed (on Linux) is this:
COMPUTE_PROFILE=1 nvprof --print-gpu-trace ./bandwidthTest

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 94651.4

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==6682== Profiling application: ./bandwidthTest
==6682== Profiling result:
   Start  Duration            Grid Size      Block Size     Regs*    SSMem*    DSMem*      Size  Throughput           Device   Context    Stream  Name
350.27ms  2.5616ms                    -               -         -         -         -  32.000MB  12.200GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
352.84ms  2.5595ms                    -               -         -         -         -  32.000MB  12.210GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
355.40ms  2.5590ms                    -               -         -         -         -  32.000MB  12.212GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
357.96ms  2.5604ms                    -               -         -         -         -  32.000MB  12.205GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
360.52ms  2.5602ms                    -               -         -         -         -  32.000MB  12.206GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
363.08ms  2.5605ms                    -               -         -         -         -  32.000MB  12.205GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
365.64ms  2.5582ms                    -               -         -         -         -  32.000MB  12.215GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
368.20ms  2.5624ms                    -               -         -         -         -  32.000MB  12.196GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
370.77ms  2.5615ms                    -               -         -         -         -  32.000MB  12.200GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
373.33ms  2.5629ms                    -               -         -         -         -  32.000MB  12.193GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
375.89ms  2.5605ms                    -               -         -         -         -  32.000MB  12.205GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
378.45ms  2.5602ms                    -               -         -         -         -  32.000MB  12.206GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]
381.02ms  2.5605ms                    -               -         -         -         -  32.000MB  12.205GB/s  GeForce GTX 105         1         7  [CUDA memcpy HtoD]

On CUDA 9.2, I get quite different output from yours, and it seems to include the information you are looking for (see the SrcMemType and DstMemType columns):

$ nvprof --print-gpu-trace /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

==18223== NVPROF is profiling process 18223, command: /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest
 Device 0: Tesla V100-PCIE-32GB
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     11837.4

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12389.9

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     728273.6

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==18223== Profiling application: /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest
==18223== Profiling result:
   Start  Duration            Grid Size      Block Size     Regs*    SSMem*    DSMem*      Size  Throughput  SrcMemType  DstMemType           Device   Context    Stream  Name
870.11ms  2.7645ms                    -               -         -         -         -  32.000MB  11.304GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
872.88ms  2.7621ms                    -               -         -         -         -  32.000MB  11.314GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
875.64ms  2.7626ms                    -               -         -         -         -  32.000MB  11.312GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
878.41ms  2.7639ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
881.18ms  2.7643ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
883.94ms  2.7639ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
886.71ms  2.7643ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
889.48ms  2.7617ms                    -               -         -         -         -  32.000MB  11.316GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
892.24ms  2.7648ms                    -               -         -         -         -  32.000MB  11.303GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
895.01ms  2.7629ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
897.78ms  2.7637ms                    -               -         -         -         -  32.000MB  11.307GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
900.54ms  2.7633ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
903.31ms  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
906.07ms  2.7638ms                    -               -         -         -         -  32.000MB  11.307GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
908.84ms  2.7633ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
911.61ms  2.7636ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
914.37ms  2.7627ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
917.14ms  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
919.91ms  2.7627ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
922.67ms  2.7643ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
925.44ms  2.7623ms                    -               -         -         -         -  32.000MB  11.313GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
928.21ms  2.7628ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
930.97ms  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
933.74ms  2.7639ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
936.50ms  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
939.27ms  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
942.04ms  2.7639ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
944.80ms  2.7623ms                    -               -         -         -         -  32.000MB  11.313GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
947.57ms  2.7634ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
950.34ms  2.7637ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
953.10ms  2.7637ms                    -               -         -         -         -  32.000MB  11.307GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
955.87ms  2.7643ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
958.64ms  2.7634ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
961.40ms  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
964.17ms  2.7623ms                    -               -         -         -         -  32.000MB  11.313GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
966.94ms  2.7642ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
969.70ms  2.7642ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
972.47ms  2.7628ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
975.24ms  2.7627ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
978.00ms  2.7632ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
980.77ms  2.7635ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
983.53ms  2.7637ms                    -               -         -         -         -  32.000MB  11.307GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
986.30ms  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
989.07ms  2.7633ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
991.83ms  2.7627ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
994.60ms  2.7621ms                    -               -         -         -         -  32.000MB  11.314GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
997.36ms  2.7627ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.00013s  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.00290s  2.7636ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.00566s  2.7622ms                    -               -         -         -         -  32.000MB  11.314GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.00843s  2.7803ms                    -               -         -         -         -  32.000MB  11.240GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.01121s  2.7810ms                    -               -         -         -         -  32.000MB  11.237GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.01400s  2.7641ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.01676s  2.7784ms                    -               -         -         -         -  32.000MB  11.247GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.01954s  2.7793ms                    -               -         -         -         -  32.000MB  11.244GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.02233s  2.7633ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.02509s  2.7638ms                    -               -         -         -         -  32.000MB  11.307GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.02786s  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.03063s  2.7797ms                    -               -         -         -         -  32.000MB  11.242GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.03341s  2.7628ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.03618s  2.7629ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.03894s  2.7824ms                    -               -         -         -         -  32.000MB  11.231GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.04173s  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.04449s  2.7633ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.04726s  2.7635ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.05003s  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.05279s  2.7631ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.05556s  2.7636ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.05833s  2.7619ms                    -               -         -         -         -  32.000MB  11.315GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.06109s  2.7622ms                    -               -         -         -         -  32.000MB  11.313GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.06386s  2.7642ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.06662s  2.7632ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.06939s  2.7633ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.07216s  2.7636ms                    -               -         -         -         -  32.000MB  11.308GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.07492s  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.07769s  2.7627ms                    -               -         -         -         -  32.000MB  11.312GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.08045s  2.7639ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.08322s  2.7638ms                    -               -         -         -         -  32.000MB  11.307GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.08599s  2.7640ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.08875s  2.7695ms                    -               -         -         -         -  32.000MB  11.284GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.09153s  2.7704ms                    -               -         -         -         -  32.000MB  11.280GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.09430s  2.7696ms                    -               -         -         -         -  32.000MB  11.283GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.09707s  2.7719ms                    -               -         -         -         -  32.000MB  11.274GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.09985s  2.7726ms                    -               -         -         -         -  32.000MB  11.271GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.10262s  2.7710ms                    -               -         -         -         -  32.000MB  11.277GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.10540s  2.7649ms                    -               -         -         -         -  32.000MB  11.302GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.10817s  2.7639ms                    -               -         -         -         -  32.000MB  11.306GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.11093s  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.11370s  2.7634ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.11647s  2.7624ms                    -               -         -         -         -  32.000MB  11.313GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.11923s  2.7627ms                    -               -         -         -         -  32.000MB  11.312GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.12200s  2.7625ms                    -               -         -         -         -  32.000MB  11.312GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.12476s  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.12753s  2.7628ms                    -               -         -         -         -  32.000MB  11.311GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.13029s  2.7630ms                    -               -         -         -         -  32.000MB  11.310GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.13306s  2.7639ms                    -               -         -         -         -  32.000MB  11.307GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.13583s  2.7632ms                    -               -         -         -         -  32.000MB  11.309GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.13859s  2.7619ms                    -               -         -         -         -  32.000MB  11.315GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.14136s  2.7623ms                    -               -         -         -         -  32.000MB  11.313GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.14412s  2.7642ms                    -               -         -         -         -  32.000MB  11.305GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.27095s  2.7474ms                    -               -         -         -         -  32.000MB  11.375GB/s      Pinned      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.27376s  2.6313ms                    -               -         -         -         -  32.000MB  11.876GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.27640s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.27904s  2.6420ms                    -               -         -         -         -  32.000MB  11.828GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.28169s  2.6303ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.28432s  2.6430ms                    -               -         -         -         -  32.000MB  11.824GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.28697s  2.6314ms                    -               -         -         -         -  32.000MB  11.876GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.28961s  2.6308ms                    -               -         -         -         -  32.000MB  11.879GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.29225s  2.6569ms                    -               -         -         -         -  32.000MB  11.762GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.29491s  2.6316ms                    -               -         -         -         -  32.000MB  11.875GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.29755s  2.6305ms                    -               -         -         -         -  32.000MB  11.880GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.30019s  2.6347ms                    -               -         -         -         -  32.000MB  11.861GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.30283s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.30546s  2.6312ms                    -               -         -         -         -  32.000MB  11.877GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.30810s  2.6313ms                    -               -         -         -         -  32.000MB  11.876GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.31074s  2.6471ms                    -               -         -         -         -  32.000MB  11.805GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.31339s  2.6434ms                    -               -         -         -         -  32.000MB  11.822GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.31604s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.31868s  2.6423ms                    -               -         -         -         -  32.000MB  11.827GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.32133s  2.6401ms                    -               -         -         -         -  32.000MB  11.837GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.32397s  2.6433ms                    -               -         -         -         -  32.000MB  11.822GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.32662s  2.6356ms                    -               -         -         -         -  32.000MB  11.857GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.32926s  2.6708ms                    -               -         -         -         -  32.000MB  11.701GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.33194s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.33458s  2.6313ms                    -               -         -         -         -  32.000MB  11.876GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.33722s  2.6362ms                    -               -         -         -         -  32.000MB  11.854GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.33986s  2.6405ms                    -               -         -         -         -  32.000MB  11.835GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.34250s  2.6391ms                    -               -         -         -         -  32.000MB  11.841GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.34515s  2.6323ms                    -               -         -         -         -  32.000MB  11.872GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.34779s  2.6430ms                    -               -         -         -         -  32.000MB  11.824GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.35044s  2.6447ms                    -               -         -         -         -  32.000MB  11.816GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.35309s  2.6308ms                    -               -         -         -         -  32.000MB  11.879GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.35573s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.35836s  2.6301ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.36100s  2.6314ms                    -               -         -         -         -  32.000MB  11.876GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.36364s  2.6312ms                    -               -         -         -         -  32.000MB  11.877GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.36627s  2.6307ms                    -               -         -         -         -  32.000MB  11.879GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.36891s  2.6332ms                    -               -         -         -         -  32.000MB  11.868GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.37155s  2.6341ms                    -               -         -         -         -  32.000MB  11.864GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.37419s  2.6328ms                    -               -         -         -         -  32.000MB  11.870GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.37683s  2.6315ms                    -               -         -         -         -  32.000MB  11.875GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.37947s  2.6432ms                    -               -         -         -         -  32.000MB  11.823GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.38212s  2.6432ms                    -               -         -         -         -  32.000MB  11.823GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.38477s  2.6345ms                    -               -         -         -         -  32.000MB  11.862GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.38741s  2.6430ms                    -               -         -         -         -  32.000MB  11.824GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.39006s  2.6383ms                    -               -         -         -         -  32.000MB  11.845GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.39270s  2.6322ms                    -               -         -         -         -  32.000MB  11.872GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.39534s  2.6301ms                    -               -         -         -         -  32.000MB  11.882GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.39798s  2.6304ms                    -               -         -         -         -  32.000MB  11.880GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.40061s  2.6565ms                    -               -         -         -         -  32.000MB  11.764GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.40328s  2.6322ms                    -               -         -         -         -  32.000MB  11.872GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.40591s  2.6310ms                    -               -         -         -         -  32.000MB  11.877GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.40855s  2.6419ms                    -               -         -         -         -  32.000MB  11.829GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.41120s  2.6722ms                    -               -         -         -         -  32.000MB  11.695GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.41388s  2.6593ms                    -               -         -         -         -  32.000MB  11.751GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.41654s  2.6306ms                    -               -         -         -         -  32.000MB  11.880GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.41918s  2.6436ms                    -               -         -         -         -  32.000MB  11.821GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.42183s  2.6301ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.42447s  2.6308ms                    -               -         -         -         -  32.000MB  11.878GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.42710s  2.6338ms                    -               -         -         -         -  32.000MB  11.865GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.42974s  2.7005ms                    -               -         -         -         -  32.000MB  11.572GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.43245s  2.6275ms                    -               -         -         -         -  32.000MB  11.893GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.43509s  2.6275ms                    -               -         -         -         -  32.000MB  11.893GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.43772s  2.6261ms                    -               -         -         -         -  32.000MB  11.900GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.44035s  2.6434ms                    -               -         -         -         -  32.000MB  11.822GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.44300s  2.6259ms                    -               -         -         -         -  32.000MB  11.901GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.44563s  2.6811ms                    -               -         -         -         -  32.000MB  11.656GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.44832s  2.6368ms                    -               -         -         -         -  32.000MB  11.852GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.45096s  2.6431ms                    -               -         -         -         -  32.000MB  11.823GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.45361s  2.6301ms                    -               -         -         -         -  32.000MB  11.882GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.45625s  2.6380ms                    -               -         -         -         -  32.000MB  11.846GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.45889s  2.6301ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.46153s  2.6316ms                    -               -         -         -         -  32.000MB  11.875GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.46417s  2.6316ms                    -               -         -         -         -  32.000MB  11.875GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.46681s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.46944s  2.6301ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.47208s  2.6365ms                    -               -         -         -         -  32.000MB  11.853GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.47472s  2.6429ms                    -               -         -         -         -  32.000MB  11.824GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.47737s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.48001s  2.6363ms                    -               -         -         -         -  32.000MB  11.854GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.48265s  2.6432ms                    -               -         -         -         -  32.000MB  11.823GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.48530s  2.6396ms                    -               -         -         -         -  32.000MB  11.839GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.48794s  2.6365ms                    -               -         -         -         -  32.000MB  11.853GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.49059s  2.6301ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.49322s  2.6313ms                    -               -         -         -         -  32.000MB  11.876GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.49586s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.49850s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.50113s  2.6551ms                    -               -         -         -         -  32.000MB  11.770GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.50380s  2.6377ms                    -               -         -         -         -  32.000MB  11.847GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.50644s  2.6430ms                    -               -         -         -         -  32.000MB  11.824GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.50909s  2.6430ms                    -               -         -         -         -  32.000MB  11.824GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.51174s  2.6417ms                    -               -         -         -         -  32.000MB  11.830GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.51439s  2.6518ms                    -               -         -         -         -  32.000MB  11.785GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.51704s  2.6430ms                    -               -         -         -         -  32.000MB  11.824GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.51969s  2.6313ms                    -               -         -         -         -  32.000MB  11.876GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.52233s  2.6309ms                    -               -         -         -         -  32.000MB  11.878GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.52497s  2.6338ms                    -               -         -         -         -  32.000MB  11.865GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.52761s  2.6308ms                    -               -         -         -         -  32.000MB  11.878GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.53025s  2.6983ms                    -               -         -         -         -  32.000MB  11.581GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.53295s  2.6302ms                    -               -         -         -         -  32.000MB  11.881GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.53559s  2.6301ms                    -               -         -         -         -  32.000MB  11.882GB/s      Device      Pinned  Tesla V100-PCIE         1         7  [CUDA memcpy DtoH]
1.61703s  8.7750ms                    -               -         -         -         -  32.000MB  3.5613GB/s    Pageable      Device  Tesla V100-PCIE         1         7  [CUDA memcpy HtoD]
1.62588s  93.571us                    -               -         -         -         -  32.000MB  333.97GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62597s  88.290us                    -               -         -         -         -  32.000MB  353.95GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62606s  87.778us                    -               -         -         -         -  32.000MB  356.01GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62615s  88.258us                    -               -         -         -         -  32.000MB  354.08GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62624s  88.450us                    -               -         -         -         -  32.000MB  353.31GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62633s  88.547us                    -               -         -         -         -  32.000MB  352.92GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62642s  88.898us                    -               -         -         -         -  32.000MB  351.53GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62651s  88.226us                    -               -         -         -         -  32.000MB  354.20GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62660s  88.162us                    -               -         -         -         -  32.000MB  354.46GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62669s  88.322us                    -               -         -         -         -  32.000MB  353.82GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62678s  88.035us                    -               -         -         -         -  32.000MB  354.97GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62686s  87.810us                    -               -         -         -         -  32.000MB  355.88GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62695s  87.874us                    -               -         -         -         -  32.000MB  355.62GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62704s  88.994us                    -               -         -         -         -  32.000MB  351.15GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62713s  87.778us                    -               -         -         -         -  32.000MB  356.01GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62722s  88.003us                    -               -         -         -         -  32.000MB  355.10GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62731s  88.290us                    -               -         -         -         -  32.000MB  353.95GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62740s  88.098us                    -               -         -         -         -  32.000MB  354.72GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62749s  88.290us                    -               -         -         -         -  32.000MB  353.95GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62758s  88.483us                    -               -         -         -         -  32.000MB  353.18GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62767s  88.322us                    -               -         -         -         -  32.000MB  353.82GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62776s  88.450us                    -               -         -         -         -  32.000MB  353.31GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62785s  88.834us                    -               -         -         -         -  32.000MB  351.78GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62794s  88.514us                    -               -         -         -         -  32.000MB  353.05GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62802s  88.547us                    -               -         -         -         -  32.000MB  352.92GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62811s  88.226us                    -               -         -         -         -  32.000MB  354.20GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62820s  88.194us                    -               -         -         -         -  32.000MB  354.33GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62829s  88.898us                    -               -         -         -         -  32.000MB  351.53GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62838s  88.034us                    -               -         -         -         -  32.000MB  354.98GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62847s  88.323us                    -               -         -         -         -  32.000MB  353.81GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62856s  88.546us                    -               -         -         -         -  32.000MB  352.92GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62865s  87.970us                    -               -         -         -         -  32.000MB  355.23GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62874s  88.002us                    -               -         -         -         -  32.000MB  355.11GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62883s  88.258us                    -               -         -         -         -  32.000MB  354.08GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62892s  88.098us                    -               -         -         -         -  32.000MB  354.72GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62901s  88.130us                    -               -         -         -         -  32.000MB  354.59GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62909s  88.194us                    -               -         -         -         -  32.000MB  354.33GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62918s  88.578us                    -               -         -         -         -  32.000MB  352.80GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62927s  88.675us                    -               -         -         -         -  32.000MB  352.41GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62936s  88.034us                    -               -         -         -         -  32.000MB  354.98GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62945s  88.290us                    -               -         -         -         -  32.000MB  353.95GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62954s  88.482us                    -               -         -         -         -  32.000MB  353.18GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62963s  88.130us                    -               -         -         -         -  32.000MB  354.59GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62972s  88.131us                    -               -         -         -         -  32.000MB  354.59GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62981s  88.194us                    -               -         -         -         -  32.000MB  354.33GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62990s  88.514us                    -               -         -         -         -  32.000MB  353.05GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.62999s  87.842us                    -               -         -         -         -  32.000MB  355.75GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63008s  87.746us                    -               -         -         -         -  32.000MB  356.14GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63016s  87.843us                    -               -         -         -         -  32.000MB  355.75GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63025s  88.066us                    -               -         -         -         -  32.000MB  354.85GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63034s  88.450us                    -               -         -         -         -  32.000MB  353.31GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63043s  88.610us                    -               -         -         -         -  32.000MB  352.67GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63052s  88.546us                    -               -         -         -         -  32.000MB  352.92GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63061s  88.163us                    -               -         -         -         -  32.000MB  354.46GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63070s  89.058us                    -               -         -         -         -  32.000MB  350.89GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63079s  88.898us                    -               -         -         -         -  32.000MB  351.53GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63088s  87.874us                    -               -         -         -         -  32.000MB  355.62GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63097s  88.515us                    -               -         -         -         -  32.000MB  353.05GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63106s  88.386us                    -               -         -         -         -  32.000MB  353.56GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63115s  88.610us                    -               -         -         -         -  32.000MB  352.67GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63124s  88.098us                    -               -         -         -         -  32.000MB  354.72GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63133s  88.226us                    -               -         -         -         -  32.000MB  354.20GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63141s  88.707us                    -               -         -         -         -  32.000MB  352.28GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63150s  88.546us                    -               -         -         -         -  32.000MB  352.92GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63159s  88.194us                    -               -         -         -         -  32.000MB  354.33GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63168s  88.322us                    -               -         -         -         -  32.000MB  353.82GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63177s  88.162us                    -               -         -         -         -  32.000MB  354.46GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63186s  88.227us                    -               -         -         -         -  32.000MB  354.20GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63195s  88.226us                    -               -         -         -         -  32.000MB  354.20GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63204s  88.738us                    -               -         -         -         -  32.000MB  352.16GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63213s  87.842us                    -               -         -         -         -  32.000MB  355.75GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63222s  87.715us                    -               -         -         -         -  32.000MB  356.27GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63231s  88.418us                    -               -         -         -         -  32.000MB  353.43GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63240s  88.194us                    -               -         -         -         -  32.000MB  354.33GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63248s  88.546us                    -               -         -         -         -  32.000MB  352.92GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63257s  88.066us                    -               -         -         -         -  32.000MB  354.85GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63266s  88.355us                    -               -         -         -         -  32.000MB  353.69GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63275s  88.162us                    -               -         -         -         -  32.000MB  354.46GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63284s  88.482us                    -               -         -         -         -  32.000MB  353.18GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63293s  88.610us                    -               -         -         -         -  32.000MB  352.67GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63302s  88.130us                    -               -         -         -         -  32.000MB  354.59GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63311s  88.867us                    -               -         -         -         -  32.000MB  351.65GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63320s  88.386us                    -               -         -         -         -  32.000MB  353.56GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63329s  88.258us                    -               -         -         -         -  32.000MB  354.08GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63338s  89.058us                    -               -         -         -         -  32.000MB  350.89GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63347s  88.483us                    -               -         -         -         -  32.000MB  353.18GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63356s  88.546us                    -               -         -         -         -  32.000MB  352.92GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63365s  87.970us                    -               -         -         -         -  32.000MB  355.23GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63374s  88.162us                    -               -         -         -         -  32.000MB  354.46GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63383s  88.034us                    -               -         -         -         -  32.000MB  354.98GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63391s  87.875us                    -               -         -         -         -  32.000MB  355.62GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63400s  88.258us                    -               -         -         -         -  32.000MB  354.08GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63409s  87.682us                    -               -         -         -         -  32.000MB  356.40GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63418s  88.226us                    -               -         -         -         -  32.000MB  354.20GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63427s  87.938us                    -               -         -         -         -  32.000MB  355.36GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63436s  89.091us                    -               -         -         -         -  32.000MB  350.76GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63445s  88.802us                    -               -         -         -         -  32.000MB  351.91GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63454s  87.970us                    -               -         -         -         -  32.000MB  355.23GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63463s  88.802us                    -               -         -         -         -  32.000MB  351.91GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]
1.63472s  88.258us                    -               -         -         -         -  32.000MB  354.08GB/s      Device      Device  Tesla V100-PCIE         1         7  [CUDA memcpy DtoD]

Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
SrcMemType: The type of source memory accessed by memory operation/copy
DstMemType: The type of destination memory accessed by memory operation/copy
$
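
If you want to tally those SrcMemType/DstMemType columns instead of reading them by eye, here is a minimal sketch in Python. It assumes the row layout shown in the CUDA 9.2 trace above (five "-" columns, then Size, Throughput, SrcMemType, DstMemType) and only recognizes the three memory type values that appear in this trace; other nvprof versions may print other values or a different layout. The names `summarize_memcpy_types`, `MEMCPY_ROW` and `KNOWN_TYPES` are made up for this example and are not part of nvprof.

```python
import re
from collections import Counter

# Memory type values seen in the trace above. This set is an assumption:
# other nvprof versions or workloads may print additional values.
KNOWN_TYPES = {"Pageable", "Pinned", "Device"}

# Assumed memcpy row layout (taken from the CUDA 9.2 output above):
# Start Duration - - - - - Size Throughput SrcMemType DstMemType Device ... Name
MEMCPY_ROW = re.compile(
    r"^\s*\S+\s+\S+\s+(?:-\s+){5}"
    r"(?P<size>\S+)\s+(?P<throughput>\S+)\s+"
    r"(?P<src>\S+)\s+(?P<dst>\S+)\s+"
    r".*\[CUDA memcpy (?P<kind>\w+)\]\s*$"
)


def summarize_memcpy_types(trace_text):
    """Count memcpy rows per (direction, source type, destination type)."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = MEMCPY_ROW.match(line)
        if match is None:
            continue  # header, legend, kernel row, or program output
        src, dst = match["src"], match["dst"]
        # Rows from older nvprof output have no memory type columns, so the
        # device name lands in these positions. Skip them rather than guess.
        if src not in KNOWN_TYPES or dst not in KNOWN_TYPES:
            continue
        counts[(match["kind"], src, dst)] += 1
    return counts


# A few rows copied from the trace above, with the column padding collapsed.
sample = """\
1.53559s  2.6301ms  -  -  -  -  -  32.000MB  11.882GB/s  Device  Pinned  Tesla V100-PCIE  1  7  [CUDA memcpy DtoH]
1.61703s  8.7750ms  -  -  -  -  -  32.000MB  3.5613GB/s  Pageable  Device  Tesla V100-PCIE  1  7  [CUDA memcpy HtoD]
1.62588s  93.571us  -  -  -  -  -  32.000MB  333.97GB/s  Device  Device  Tesla V100-PCIE  1  7  [CUDA memcpy DtoD]
1.62597s  88.290us  -  -  -  -  -  32.000MB  353.95GB/s  Device  Device  Tesla V100-PCIE  1  7  [CUDA memcpy DtoD]
"""

if __name__ == "__main__":
    for (kind, src, dst), n in sorted(summarize_memcpy_types(sample).items()):
        print(f"{kind}: {src} -> {dst} x{n}")
```

To run it on a real trace you would need to capture the profiler output first; nvprof prints its results to stderr, so either redirect that (2> trace.txt) or use its --log-file option, then pass the file contents to the function. Rows from a trace without the memory type columns, like your CUDA 8.0 output, are skipped rather than misread.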

Note that according to the CUDA 9.2 programming guide, the COMPUTE_PROFILE environment variable that you set on your command line has no effect on nvprof behavior:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars

Thanks a lot :)
I was using CUDA 8.0 (although the documentation says it should work there).
On CUDA 9.1 I get the same output as yours.

Eyal