Hi,
I'm trying to get the memtransferhostmemtype property out of the profiler (mainly to check whether the memory I use is pinned or not), but I can't get it to show up.
Any ideas?
The best I've managed (on Linux) is this:
COMPUTE_PROFILE=1 nvprof --print-gpu-trace ./bandwidthTest
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 94651.4
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==6682== Profiling application: ./bandwidthTest
==6682== Profiling result:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput Device Context Stream Name
350.27ms 2.5616ms - - - - - 32.000MB 12.200GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
352.84ms 2.5595ms - - - - - 32.000MB 12.210GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
355.40ms 2.5590ms - - - - - 32.000MB 12.212GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
357.96ms 2.5604ms - - - - - 32.000MB 12.205GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
360.52ms 2.5602ms - - - - - 32.000MB 12.206GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
363.08ms 2.5605ms - - - - - 32.000MB 12.205GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
365.64ms 2.5582ms - - - - - 32.000MB 12.215GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
368.20ms 2.5624ms - - - - - 32.000MB 12.196GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
370.77ms 2.5615ms - - - - - 32.000MB 12.200GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
373.33ms 2.5629ms - - - - - 32.000MB 12.193GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
375.89ms 2.5605ms - - - - - 32.000MB 12.205GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
378.45ms 2.5602ms - - - - - 32.000MB 12.206GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
381.02ms 2.5605ms - - - - - 32.000MB 12.205GB/s GeForce GTX 105 1 7 [CUDA memcpy HtoD]
On CUDA 9.2 I get completely different output from yours, and it seems to include the information you are looking for:
$ nvprof --print-gpu-trace /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
==18223== NVPROF is profiling process 18223, command: /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest
Device 0: Tesla V100-PCIE-32GB
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 11837.4
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12389.9
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 728273.6
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==18223== Profiling application: /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest
==18223== Profiling result:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput SrcMemType DstMemType Device Context Stream Name
870.11ms 2.7645ms - - - - - 32.000MB 11.304GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
872.88ms 2.7621ms - - - - - 32.000MB 11.314GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
875.64ms 2.7626ms - - - - - 32.000MB 11.312GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
878.41ms 2.7639ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
881.18ms 2.7643ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
883.94ms 2.7639ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
886.71ms 2.7643ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
889.48ms 2.7617ms - - - - - 32.000MB 11.316GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
892.24ms 2.7648ms - - - - - 32.000MB 11.303GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
895.01ms 2.7629ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
897.78ms 2.7637ms - - - - - 32.000MB 11.307GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
900.54ms 2.7633ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
903.31ms 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
906.07ms 2.7638ms - - - - - 32.000MB 11.307GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
908.84ms 2.7633ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
911.61ms 2.7636ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
914.37ms 2.7627ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
917.14ms 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
919.91ms 2.7627ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
922.67ms 2.7643ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
925.44ms 2.7623ms - - - - - 32.000MB 11.313GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
928.21ms 2.7628ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
930.97ms 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
933.74ms 2.7639ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
936.50ms 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
939.27ms 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
942.04ms 2.7639ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
944.80ms 2.7623ms - - - - - 32.000MB 11.313GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
947.57ms 2.7634ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
950.34ms 2.7637ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
953.10ms 2.7637ms - - - - - 32.000MB 11.307GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
955.87ms 2.7643ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
958.64ms 2.7634ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
961.40ms 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
964.17ms 2.7623ms - - - - - 32.000MB 11.313GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
966.94ms 2.7642ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
969.70ms 2.7642ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
972.47ms 2.7628ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
975.24ms 2.7627ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
978.00ms 2.7632ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
980.77ms 2.7635ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
983.53ms 2.7637ms - - - - - 32.000MB 11.307GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
986.30ms 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
989.07ms 2.7633ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
991.83ms 2.7627ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
994.60ms 2.7621ms - - - - - 32.000MB 11.314GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
997.36ms 2.7627ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.00013s 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.00290s 2.7636ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.00566s 2.7622ms - - - - - 32.000MB 11.314GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.00843s 2.7803ms - - - - - 32.000MB 11.240GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.01121s 2.7810ms - - - - - 32.000MB 11.237GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.01400s 2.7641ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.01676s 2.7784ms - - - - - 32.000MB 11.247GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.01954s 2.7793ms - - - - - 32.000MB 11.244GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.02233s 2.7633ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.02509s 2.7638ms - - - - - 32.000MB 11.307GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.02786s 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.03063s 2.7797ms - - - - - 32.000MB 11.242GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.03341s 2.7628ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.03618s 2.7629ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.03894s 2.7824ms - - - - - 32.000MB 11.231GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.04173s 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.04449s 2.7633ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.04726s 2.7635ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.05003s 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.05279s 2.7631ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.05556s 2.7636ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.05833s 2.7619ms - - - - - 32.000MB 11.315GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.06109s 2.7622ms - - - - - 32.000MB 11.313GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.06386s 2.7642ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.06662s 2.7632ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.06939s 2.7633ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.07216s 2.7636ms - - - - - 32.000MB 11.308GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.07492s 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.07769s 2.7627ms - - - - - 32.000MB 11.312GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.08045s 2.7639ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.08322s 2.7638ms - - - - - 32.000MB 11.307GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.08599s 2.7640ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.08875s 2.7695ms - - - - - 32.000MB 11.284GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.09153s 2.7704ms - - - - - 32.000MB 11.280GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.09430s 2.7696ms - - - - - 32.000MB 11.283GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.09707s 2.7719ms - - - - - 32.000MB 11.274GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.09985s 2.7726ms - - - - - 32.000MB 11.271GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.10262s 2.7710ms - - - - - 32.000MB 11.277GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.10540s 2.7649ms - - - - - 32.000MB 11.302GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.10817s 2.7639ms - - - - - 32.000MB 11.306GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.11093s 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.11370s 2.7634ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.11647s 2.7624ms - - - - - 32.000MB 11.313GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.11923s 2.7627ms - - - - - 32.000MB 11.312GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.12200s 2.7625ms - - - - - 32.000MB 11.312GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.12476s 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.12753s 2.7628ms - - - - - 32.000MB 11.311GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.13029s 2.7630ms - - - - - 32.000MB 11.310GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.13306s 2.7639ms - - - - - 32.000MB 11.307GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.13583s 2.7632ms - - - - - 32.000MB 11.309GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.13859s 2.7619ms - - - - - 32.000MB 11.315GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.14136s 2.7623ms - - - - - 32.000MB 11.313GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.14412s 2.7642ms - - - - - 32.000MB 11.305GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.27095s 2.7474ms - - - - - 32.000MB 11.375GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.27376s 2.6313ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.27640s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.27904s 2.6420ms - - - - - 32.000MB 11.828GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.28169s 2.6303ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.28432s 2.6430ms - - - - - 32.000MB 11.824GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.28697s 2.6314ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.28961s 2.6308ms - - - - - 32.000MB 11.879GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.29225s 2.6569ms - - - - - 32.000MB 11.762GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.29491s 2.6316ms - - - - - 32.000MB 11.875GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.29755s 2.6305ms - - - - - 32.000MB 11.880GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.30019s 2.6347ms - - - - - 32.000MB 11.861GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.30283s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.30546s 2.6312ms - - - - - 32.000MB 11.877GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.30810s 2.6313ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.31074s 2.6471ms - - - - - 32.000MB 11.805GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.31339s 2.6434ms - - - - - 32.000MB 11.822GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.31604s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.31868s 2.6423ms - - - - - 32.000MB 11.827GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.32133s 2.6401ms - - - - - 32.000MB 11.837GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.32397s 2.6433ms - - - - - 32.000MB 11.822GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.32662s 2.6356ms - - - - - 32.000MB 11.857GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.32926s 2.6708ms - - - - - 32.000MB 11.701GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.33194s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.33458s 2.6313ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.33722s 2.6362ms - - - - - 32.000MB 11.854GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.33986s 2.6405ms - - - - - 32.000MB 11.835GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.34250s 2.6391ms - - - - - 32.000MB 11.841GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.34515s 2.6323ms - - - - - 32.000MB 11.872GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.34779s 2.6430ms - - - - - 32.000MB 11.824GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.35044s 2.6447ms - - - - - 32.000MB 11.816GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.35309s 2.6308ms - - - - - 32.000MB 11.879GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.35573s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.35836s 2.6301ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.36100s 2.6314ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.36364s 2.6312ms - - - - - 32.000MB 11.877GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.36627s 2.6307ms - - - - - 32.000MB 11.879GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.36891s 2.6332ms - - - - - 32.000MB 11.868GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.37155s 2.6341ms - - - - - 32.000MB 11.864GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.37419s 2.6328ms - - - - - 32.000MB 11.870GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.37683s 2.6315ms - - - - - 32.000MB 11.875GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.37947s 2.6432ms - - - - - 32.000MB 11.823GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.38212s 2.6432ms - - - - - 32.000MB 11.823GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.38477s 2.6345ms - - - - - 32.000MB 11.862GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.38741s 2.6430ms - - - - - 32.000MB 11.824GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.39006s 2.6383ms - - - - - 32.000MB 11.845GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.39270s 2.6322ms - - - - - 32.000MB 11.872GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.39534s 2.6301ms - - - - - 32.000MB 11.882GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.39798s 2.6304ms - - - - - 32.000MB 11.880GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.40061s 2.6565ms - - - - - 32.000MB 11.764GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.40328s 2.6322ms - - - - - 32.000MB 11.872GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.40591s 2.6310ms - - - - - 32.000MB 11.877GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.40855s 2.6419ms - - - - - 32.000MB 11.829GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.41120s 2.6722ms - - - - - 32.000MB 11.695GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.41388s 2.6593ms - - - - - 32.000MB 11.751GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.41654s 2.6306ms - - - - - 32.000MB 11.880GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.41918s 2.6436ms - - - - - 32.000MB 11.821GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.42183s 2.6301ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.42447s 2.6308ms - - - - - 32.000MB 11.878GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.42710s 2.6338ms - - - - - 32.000MB 11.865GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.42974s 2.7005ms - - - - - 32.000MB 11.572GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.43245s 2.6275ms - - - - - 32.000MB 11.893GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.43509s 2.6275ms - - - - - 32.000MB 11.893GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.43772s 2.6261ms - - - - - 32.000MB 11.900GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.44035s 2.6434ms - - - - - 32.000MB 11.822GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.44300s 2.6259ms - - - - - 32.000MB 11.901GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.44563s 2.6811ms - - - - - 32.000MB 11.656GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.44832s 2.6368ms - - - - - 32.000MB 11.852GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.45096s 2.6431ms - - - - - 32.000MB 11.823GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.45361s 2.6301ms - - - - - 32.000MB 11.882GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.45625s 2.6380ms - - - - - 32.000MB 11.846GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.45889s 2.6301ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.46153s 2.6316ms - - - - - 32.000MB 11.875GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.46417s 2.6316ms - - - - - 32.000MB 11.875GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.46681s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.46944s 2.6301ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.47208s 2.6365ms - - - - - 32.000MB 11.853GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.47472s 2.6429ms - - - - - 32.000MB 11.824GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.47737s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.48001s 2.6363ms - - - - - 32.000MB 11.854GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.48265s 2.6432ms - - - - - 32.000MB 11.823GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.48530s 2.6396ms - - - - - 32.000MB 11.839GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.48794s 2.6365ms - - - - - 32.000MB 11.853GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.49059s 2.6301ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.49322s 2.6313ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.49586s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.49850s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.50113s 2.6551ms - - - - - 32.000MB 11.770GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.50380s 2.6377ms - - - - - 32.000MB 11.847GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.50644s 2.6430ms - - - - - 32.000MB 11.824GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.50909s 2.6430ms - - - - - 32.000MB 11.824GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.51174s 2.6417ms - - - - - 32.000MB 11.830GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.51439s 2.6518ms - - - - - 32.000MB 11.785GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.51704s 2.6430ms - - - - - 32.000MB 11.824GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.51969s 2.6313ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.52233s 2.6309ms - - - - - 32.000MB 11.878GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.52497s 2.6338ms - - - - - 32.000MB 11.865GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.52761s 2.6308ms - - - - - 32.000MB 11.878GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.53025s 2.6983ms - - - - - 32.000MB 11.581GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.53295s 2.6302ms - - - - - 32.000MB 11.881GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.53559s 2.6301ms - - - - - 32.000MB 11.882GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.61703s 8.7750ms - - - - - 32.000MB 3.5613GB/s Pageable Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.62588s 93.571us - - - - - 32.000MB 333.97GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62597s 88.290us - - - - - 32.000MB 353.95GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62606s 87.778us - - - - - 32.000MB 356.01GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62615s 88.258us - - - - - 32.000MB 354.08GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62624s 88.450us - - - - - 32.000MB 353.31GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62633s 88.547us - - - - - 32.000MB 352.92GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62642s 88.898us - - - - - 32.000MB 351.53GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62651s 88.226us - - - - - 32.000MB 354.20GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62660s 88.162us - - - - - 32.000MB 354.46GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62669s 88.322us - - - - - 32.000MB 353.82GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62678s 88.035us - - - - - 32.000MB 354.97GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62686s 87.810us - - - - - 32.000MB 355.88GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62695s 87.874us - - - - - 32.000MB 355.62GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62704s 88.994us - - - - - 32.000MB 351.15GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62713s 87.778us - - - - - 32.000MB 356.01GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62722s 88.003us - - - - - 32.000MB 355.10GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62731s 88.290us - - - - - 32.000MB 353.95GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62740s 88.098us - - - - - 32.000MB 354.72GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62749s 88.290us - - - - - 32.000MB 353.95GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62758s 88.483us - - - - - 32.000MB 353.18GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62767s 88.322us - - - - - 32.000MB 353.82GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62776s 88.450us - - - - - 32.000MB 353.31GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62785s 88.834us - - - - - 32.000MB 351.78GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62794s 88.514us - - - - - 32.000MB 353.05GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62802s 88.547us - - - - - 32.000MB 352.92GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62811s 88.226us - - - - - 32.000MB 354.20GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62820s 88.194us - - - - - 32.000MB 354.33GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62829s 88.898us - - - - - 32.000MB 351.53GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62838s 88.034us - - - - - 32.000MB 354.98GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62847s 88.323us - - - - - 32.000MB 353.81GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62856s 88.546us - - - - - 32.000MB 352.92GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62865s 87.970us - - - - - 32.000MB 355.23GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62874s 88.002us - - - - - 32.000MB 355.11GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62883s 88.258us - - - - - 32.000MB 354.08GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62892s 88.098us - - - - - 32.000MB 354.72GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62901s 88.130us - - - - - 32.000MB 354.59GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62909s 88.194us - - - - - 32.000MB 354.33GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62918s 88.578us - - - - - 32.000MB 352.80GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62927s 88.675us - - - - - 32.000MB 352.41GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62936s 88.034us - - - - - 32.000MB 354.98GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62945s 88.290us - - - - - 32.000MB 353.95GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62954s 88.482us - - - - - 32.000MB 353.18GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62963s 88.130us - - - - - 32.000MB 354.59GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62972s 88.131us - - - - - 32.000MB 354.59GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62981s 88.194us - - - - - 32.000MB 354.33GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62990s 88.514us - - - - - 32.000MB 353.05GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.62999s 87.842us - - - - - 32.000MB 355.75GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63008s 87.746us - - - - - 32.000MB 356.14GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63016s 87.843us - - - - - 32.000MB 355.75GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63025s 88.066us - - - - - 32.000MB 354.85GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63034s 88.450us - - - - - 32.000MB 353.31GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63043s 88.610us - - - - - 32.000MB 352.67GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63052s 88.546us - - - - - 32.000MB 352.92GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63061s 88.163us - - - - - 32.000MB 354.46GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63070s 89.058us - - - - - 32.000MB 350.89GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63079s 88.898us - - - - - 32.000MB 351.53GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63088s 87.874us - - - - - 32.000MB 355.62GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63097s 88.515us - - - - - 32.000MB 353.05GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63106s 88.386us - - - - - 32.000MB 353.56GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63115s 88.610us - - - - - 32.000MB 352.67GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63124s 88.098us - - - - - 32.000MB 354.72GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63133s 88.226us - - - - - 32.000MB 354.20GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63141s 88.707us - - - - - 32.000MB 352.28GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63150s 88.546us - - - - - 32.000MB 352.92GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63159s 88.194us - - - - - 32.000MB 354.33GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63168s 88.322us - - - - - 32.000MB 353.82GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63177s 88.162us - - - - - 32.000MB 354.46GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63186s 88.227us - - - - - 32.000MB 354.20GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63195s 88.226us - - - - - 32.000MB 354.20GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63204s 88.738us - - - - - 32.000MB 352.16GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63213s 87.842us - - - - - 32.000MB 355.75GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63222s 87.715us - - - - - 32.000MB 356.27GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63231s 88.418us - - - - - 32.000MB 353.43GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63240s 88.194us - - - - - 32.000MB 354.33GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63248s 88.546us - - - - - 32.000MB 352.92GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63257s 88.066us - - - - - 32.000MB 354.85GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63266s 88.355us - - - - - 32.000MB 353.69GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63275s 88.162us - - - - - 32.000MB 354.46GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63284s 88.482us - - - - - 32.000MB 353.18GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63293s 88.610us - - - - - 32.000MB 352.67GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63302s 88.130us - - - - - 32.000MB 354.59GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63311s 88.867us - - - - - 32.000MB 351.65GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63320s 88.386us - - - - - 32.000MB 353.56GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63329s 88.258us - - - - - 32.000MB 354.08GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63338s 89.058us - - - - - 32.000MB 350.89GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63347s 88.483us - - - - - 32.000MB 353.18GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63356s 88.546us - - - - - 32.000MB 352.92GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63365s 87.970us - - - - - 32.000MB 355.23GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63374s 88.162us - - - - - 32.000MB 354.46GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63383s 88.034us - - - - - 32.000MB 354.98GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63391s 87.875us - - - - - 32.000MB 355.62GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63400s 88.258us - - - - - 32.000MB 354.08GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63409s 87.682us - - - - - 32.000MB 356.40GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63418s 88.226us - - - - - 32.000MB 354.20GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63427s 87.938us - - - - - 32.000MB 355.36GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63436s 89.091us - - - - - 32.000MB 350.76GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63445s 88.802us - - - - - 32.000MB 351.91GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63454s 87.970us - - - - - 32.000MB 355.23GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63463s 88.802us - - - - - 32.000MB 351.91GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
1.63472s 88.258us - - - - - 32.000MB 354.08GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
SrcMemType: The type of source memory accessed by memory operation/copy
DstMemType: The type of destination memory accessed by memory operation/copy
$
Note that, according to the CUDA 9.2 programming guide, the COMPUTE_PROFILE environment variable has no effect on nvprof behavior:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
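If you would rather not scan a long trace by eye, something like the following Python sketch could tally the SrcMemType/DstMemType pairs per copy direction. It is only a rough illustration: the function name summarize_memcpy_types is made up, it assumes memcpy rows are laid out exactly as in the trace above (Start, Duration, five "-" placeholders, Size, Throughput, then the two memory types), and the sample rows are copied from that trace. When the header has no SrcMemType/DstMemType columns, as in the CUDA 8.0 output, it returns None instead of guessing.

```python
from collections import Counter

MEMCPY_TAG = "[CUDA memcpy "


def summarize_memcpy_types(trace_text):
    """Count (copy kind, SrcMemType, DstMemType) over the memcpy rows of a GPU trace.

    Returns None when the trace header has no SrcMemType/DstMemType columns.
    """
    lines = trace_text.splitlines()
    # Without these header columns there is no memory-type information to read.
    if not any("SrcMemType" in l and "DstMemType" in l for l in lines):
        return None

    counts = Counter()
    for line in lines:
        line = line.strip()
        if MEMCPY_TAG not in line or not line.endswith("]"):
            continue  # not a memcpy row (header, kernel row, program output, ...)
        fields = line.split()
        # Assumed layout: fields[0..8] are Start, Duration, five "-", Size,
        # Throughput; fields[9] and fields[10] are the two memory types.
        if len(fields) < 11:
            continue
        kind = line[line.index(MEMCPY_TAG) + len(MEMCPY_TAG):-1]  # e.g. "HtoD"
        counts[(kind, fields[9], fields[10])] += 1
    return counts


# Sample rows copied from the trace above.
sample = """\
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput SrcMemType DstMemType Device Context Stream Name
870.11ms 2.7645ms - - - - - 32.000MB 11.304GB/s Pinned Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.27376s 2.6313ms - - - - - 32.000MB 11.876GB/s Device Pinned Tesla V100-PCIE 1 7 [CUDA memcpy DtoH]
1.61703s 8.7750ms - - - - - 32.000MB 3.5613GB/s Pageable Device Tesla V100-PCIE 1 7 [CUDA memcpy HtoD]
1.62588s 93.571us - - - - - 32.000MB 333.97GB/s Device Device Tesla V100-PCIE 1 7 [CUDA memcpy DtoD]
"""

if __name__ == "__main__":
    summary = summarize_memcpy_types(sample)
    for (kind, src, dst), n in sorted(summary.items()):
        print(f"{kind}: {src} -> {dst} (x{n})")
```

Any HtoD row with a Pageable source, or DtoH row with a Pageable destination, points at a host buffer that is not pinned, like the single Pageable row in the trace above.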
Thanks a lot :)
I was using CUDA 8.0 (although the documentation says it should work there).
On CUDA 9.1 I get the same output as yours.
Eyal