```
Replaying kernel "void sumconstant(int, double*, double)" (done)
ms_mean=2634.656738
==281308== Profiling application: ./sumconstant
==281308== Profiling result:
==281308== Metric result:
Invocations  Metric Name                            Metric Description                                                     Min         Max         Avg
Device "Tesla V100-PCIE-16GB (0)"
    Kernel: printkernel(void)
          1  inst_per_warp                          Instructions per warp                                           5.4950e+03  5.4950e+03  5.4950e+03
          1  branch_efficiency                      Branch Efficiency                                                  100.00%     100.00%     100.00%
          1  warp_execution_efficiency              Warp Execution Efficiency                                            3.12%       3.12%       3.12%
          1  warp_nonpred_execution_efficiency      Warp Non-Predicated Execution Efficiency                             2.27%       2.27%       2.27%
          1  inst_replay_overhead                   Instruction Replay Overhead                                       0.001855    0.001855    0.001855
          1  shared_load_transactions_per_request   Shared Memory Load Transactions Per Request                       0.000000    0.000000    0.000000
          1  shared_store_transactions_per_request  Shared Memory Store Transactions Per Request                      0.000000    0.000000    0.000000
          1  local_load_transactions_per_request    Local Memory Load Transactions Per Request                        1.000000    1.000000    1.000000
          1  local_store_transactions_per_request   Local Memory Store Transactions Per Request                       1.000000    1.000000    1.000000
          1  gld_transactions_per_request           Global Load Transactions Per Request                              1.000000    1.000000    1.000000
          1  gst_transactions_per_request           Global Store Transactions Per Request                             1.000000    1.000000    1.000000
          1  shared_store_transactions              Shared Store Transactions                                                0           0           0
          1  shared_load_transactions               Shared Load Transactions                                                 0           0           0
          1  local_load_transactions                Local Load Transactions                                                506         506         506
          1  local_store_transactions               Local Store Transactions                                                87          87          87
          1  gld_transactions                       Global Load Transactions                                                17          17          17
          1  gst_transactions                       Global Store Transactions                                              218         218         218
          1  sysmem_read_transactions               System Memory Read Transactions                                          1           1           1
          1  sysmem_write_transactions              System Memory Write Transactions                                       223         223         223
          1  l2_read_transactions                   L2 Read Transactions                                                    39          39          39
          1  l2_write_transactions                  L2 Write Transactions                                                  326         326         326
          1  dram_read_transactions                 Device Memory Read Transactions                                          0           0           0
          1  dram_write_transactions                Device Memory Write Transactions                                         4           4           4
          1  global_hit_rate                        Global Hit Rate in unified l1/tex                                   90.21%      90.21%      90.21%
          1  local_hit_rate                         Local Hit Rate                                                      88.87%      88.87%      88.87%
          1  gld_requested_throughput               Requested Global Load Throughput                                0.00000B/s  0.00000B/s  0.00000B/s
          1  gst_requested_throughput               Requested Global Store Throughput                               0.00000B/s  0.00000B/s  0.00000B/s
          1  gld_throughput                         Global Load Throughput                                          12.312MB/s  12.312MB/s  12.312MB/s
          1  gst_throughput                         Global Store Throughput                                         157.89MB/s  157.89MB/s  157.89MB/s
          1  local_memory_overhead                  Local Memory Overhead                                               92.88%      92.88%      92.88%
          1  tex_cache_hit_rate                     Unified Cache Hit Rate                                              60.62%      60.62%      60.62%
          1  l2_tex_read_hit_rate                   L2 Hit Rate (Texture Reads)                                         94.44%      94.44%      94.44%
          1  l2_tex_write_hit_rate                  L2 Hit Rate (Texture Writes)                                        29.84%      29.84%      29.84%
          1  dram_read_throughput                   Device Memory Read Throughput                                   0.00000B/s  0.00000B/s  0.00000B/s
          1  dram_write_throughput                  Device Memory Write Throughput                                  2.8971MB/s  2.8971MB/s  2.8970MB/s
          1  tex_cache_throughput                   Unified cache to Multiprocessor throughput                      1.5193GB/s  1.5193GB/s  1.5193GB/s
          1  l2_tex_read_throughput                 L2 Throughput (Texture Reads)                                   13.037MB/s  13.037MB/s  13.037MB/s
          1  l2_tex_write_throughput                L2 Throughput (Texture Writes)                                  220.90MB/s  220.90MB/s  220.90MB/s
          1  l2_read_throughput                     L2 Throughput (Reads)                                           28.246MB/s  28.246MB/s  28.246MB/s
          1  l2_write_throughput                    L2 Throughput (Writes)                                          236.11MB/s  236.11MB/s  236.11MB/s
          1  sysmem_read_throughput                 System Memory Read Throughput                                   741.65KB/s  741.65KB/s  741.62KB/s
          1  sysmem_write_throughput                System Memory Write Throughput                                  161.51MB/s  161.51MB/s  161.51MB/s
          1  local_load_throughput                  Local Memory Load Throughput                                    366.48MB/s  366.48MB/s  366.48MB/s
          1  local_store_throughput                 Local Memory Store Throughput                                   63.011MB/s  63.011MB/s  63.011MB/s
          1  shared_load_throughput                 Shared Memory Load Throughput                                   0.00000B/s  0.00000B/s  0.00000B/s
          1  shared_store_throughput                Shared Memory Store Throughput                                  0.00000B/s  0.00000B/s  0.00000B/s
          1  gld_efficiency                         Global Memory Load Efficiency                                        0.00%       0.00%       0.00%
          1  gst_efficiency                         Global Memory Store Efficiency                                       0.00%       0.00%       0.00%
          1  tex_cache_transactions                 Unified cache to Multiprocessor transactions                           537         537         537
          1  flop_count_dp                          Floating Point Operations(Double Precision)                              0           0           0
          1  flop_count_dp_add                      Floating Point Operations(Double Precision Add)                          0           0           0
          1  flop_count_dp_fma                      Floating Point Operations(Double Precision FMA)                          0           0           0
          1  flop_count_dp_mul                      Floating Point Operations(Double Precision Mul)                          0           0           0
          1  flop_count_sp                          Floating Point Operations(Single Precision)                              0           0           0
          1  flop_count_sp_add                      Floating Point Operations(Single Precision Add)                          0           0           0
          1  flop_count_sp_fma                      Floating Point Operations(Single Precision FMA)                          0           0           0
          1  flop_count_sp_mul                      Floating Point Operation(Single Precision Mul)                           0           0           0
          1  flop_count_sp_special                  Floating Point Operations(Single Precision Special)                      0           0           0
          1  inst_executed                          Instructions Executed                                                 5391        5391        5391
          1  inst_issued                            Instructions Issued                                                   5401        5401        5401
          1  dram_utilization                       Device Memory Utilization                                          Low (1)     Low (1)     Low (1)
          1  sysmem_utilization                     System Memory Utilization                                          Low (1)     Low (1)     Low (1)
          1  stall_inst_fetch                       Issue Stall Reasons (Instructions Fetch)                             6.46%       6.46%       6.46%
          1  stall_exec_dependency                  Issue Stall Reasons (Execution Dependency)                          28.83%      28.83%      28.83%
          1  stall_memory_dependency                Issue Stall Reasons (Data Request)                                  21.43%      21.43%      21.43%
          1  stall_texture                          Issue Stall Reasons (Texture)                                        0.00%       0.00%       0.00%
          1  stall_sync                             Issue Stall Reasons (Synchronization)                               42.47%      42.47%      42.47%
          1  stall_other                            Issue Stall Reasons (Other)                                          0.07%       0.07%       0.07%
          1  stall_constant_memory_dependency       Issue Stall Reasons (Immediate constant)                             0.73%       0.73%       0.73%
          1  stall_pipe_busy                        Issue Stall Reasons (Pipe Busy)                                      0.00%       0.00%       0.00%
          1  shared_efficiency                      Shared Memory Efficiency                                             0.00%       0.00%       0.00%
          1  inst_fp_32                             FP Instructions(Single)                                                  0           0           0
          1  inst_fp_64                             FP Instructions(Double)                                                  0           0           0
          1  inst_integer                           Integer Instructions                                                     4           4           4
          1  inst_bit_convert                       Bit-Convert Instructions                                                 0           0           0
          1  inst_control                           Control-Flow Instructions                                                2           2           2
          1  inst_compute_ld_st                     Load/Store Instructions                                                  1           1           1
          1  inst_misc                              Misc Instructions                                                        5           5           5
          1  inst_inter_thread_communication        Inter-Thread Instructions                                                0           0           0
          1  issue_slots                            Issue Slots                                                           5401        5401        5401
          1  cf_issued                              Issued Control-Flow Instructions                                       400         400         400
          1  cf_executed                            Executed Control-Flow Instructions                                     400         400         400
          1  ldst_issued                            Issued Load/Store Instructions                                        1053        1053        1053
          1  ldst_executed                          Executed Load/Store Instructions                                      1053        1053        1053
          1  atomic_transactions                    Atomic Transactions                                                      5           5           5
          1  atomic_transactions_per_request        Atomic Transactions Per Request                                   1.000000    1.000000    1.000000
          1  l2_atomic_throughput                   L2 Throughput (Atomic requests)                                 3.6213MB/s  3.6213MB/s  3.6213MB/s
          1  l2_atomic_transactions                 L2 Transactions (Atomic requests)                                       10          10          10
          1  l2_tex_read_transactions               L2 Transactions (Texture Reads)                                         18          18          18
          1  stall_memory_throttle                  Issue Stall Reasons (Memory Throttle)                                0.01%       0.01%       0.01%
          1  stall_not_selected                     Issue Stall Reasons (Not Selected)                                   0.00%       0.00%       0.00%
          1  l2_tex_write_transactions              L2 Transactions (Texture Writes)                                       305         305         305
          1  flop_count_hp                          Floating Point Operations(Half Precision)                                0           0           0
          1  flop_count_hp_add                      Floating Point Operations(Half Precision Add)                            0           0           0
          1  flop_count_hp_mul                      Floating Point Operation(Half Precision Mul)                             0           0           0
          1  flop_count_hp_fma                      Floating Point Operations(Half Precision FMA)                            0           0           0
          1  inst_fp_16                             HP Instructions(Half)                                                    0           0           0
          1  ipc                                    Executed IPC                                                      0.101520    0.101520    0.101520
          1  issued_ipc                             Issued IPC                                                        0.102739    0.102739    0.102739
          1  issue_slot_utilization                 Issue Slot Utilization                                               2.57%       2.57%       2.57%
          1  sm_efficiency                          Multiprocessor Activity                                              1.22%       1.22%       1.22%
          1  achieved_occupancy                     Achieved Occupancy                                                0.015625    0.015625    0.015625
          1  eligible_warps_per_cycle               Eligible Warps Per Active Cycle                                   0.109403    0.109403    0.109403
          1  shared_utilization                     Shared Memory Utilization                                         Idle (0)    Idle (0)    Idle (0)
          1  l2_utilization                         L2 Cache Utilization                                               Low (1)     Low (1)     Low (1)
          1  tex_utilization                        Unified Cache Utilization                                          Low (1)     Low (1)     Low (1)
          1  ldst_fu_utilization                    Load/Store Function Unit Utilization                               Low (1)     Low (1)     Low (1)
          1  cf_fu_utilization                      Control-Flow Function Unit Utilization                             Low (1)     Low (1)     Low (1)
          1  tex_fu_utilization                     Texture Function Unit Utilization                                  Low (1)     Low (1)     Low (1)
          1  special_fu_utilization                 Special Function Unit Utilization                                  Low (1)     Low (1)     Low (1)
          1  half_precision_fu_utilization          Half-Precision Function Unit Utilization                          Idle (0)    Idle (0)    Idle (0)
          1  single_precision_fu_utilization        Single-Precision Function Unit Utilization                         Low (1)     Low (1)     Low (1)
          1  double_precision_fu_utilization        Double-Precision Function Unit Utilization                        Idle (0)    Idle (0)    Idle (0)
          1  flop_hp_efficiency                     FLOP Efficiency(Peak Half)                                           0.00%       0.00%       0.00%
          1  flop_sp_efficiency                     FLOP Efficiency(Peak Single)                                         0.00%       0.00%       0.00%
          1  flop_dp_efficiency                     FLOP Efficiency(Peak Double)                                         0.00%       0.00%       0.00%
          1  sysmem_read_utilization                System Memory Read Utilization                                     Low (1)     Low (1)     Low (1)
          1  sysmem_write_utilization               System Memory Write Utilization                                    Low (1)     Low (1)     Low (1)
          1  stall_sleeping                         Issue Stall Reasons (Sleeping)                                       0.00%       0.00%       0.00%
          1  pcie_total_data_transmitted            PCIe Total Data Transmitted                                           7680        7680        7680
          1  pcie_total_data_received               PCIe Total Data Received                                               512         512         512
          1  inst_executed_global_loads             Warp level instructions for global loads                                17          17          17
          1  inst_executed_local_loads              Warp level instructions for local loads                                506         506         506
          1  inst_executed_shared_loads             Warp level instructions for shared loads                                 0           0           0
          1  inst_executed_surface_loads            Warp level instructions for surface loads                                0           0           0
          1  inst_executed_global_stores            Warp level instructions for global stores                              218         218         218
          1  inst_executed_local_stores             Warp level instructions for local stores                                87          87          87
          1  inst_executed_shared_stores            Warp level instructions for shared stores                                0           0           0
          1  inst_executed_surface_stores           Warp level instructions for surface stores                               0           0           0
          1  inst_executed_global_atomics           Warp level instructions for global atom and atom cas                     5           5           5
          1  inst_executed_global_reductions        Warp level instructions for global reductions                            0           0           0
          1  inst_executed_surface_atomics          Warp level instructions for surface atom and atom cas                    0           0           0
          1  inst_executed_surface_reductions       Warp level instructions for surface reductions                           0           0           0
          1  inst_executed_shared_atomics           Warp level shared instructions for atom and atom CAS                     0           0           0
          1  inst_executed_tex_ops                  Warp level instructions for texture                                      0           0           0
          1  dram_read_bytes                        Total bytes read from DRAM to L2 cache                                   0           0           0
          1  dram_write_bytes                       Total bytes written from L2 cache to DRAM                              128         128         128
          1  global_load_requests                   Total number of global load requests from Multiprocessor                17          17          17
          1  local_load_requests                    Total number of local load requests from Multiprocessor                506         506         506
          1  surface_load_requests                  Total number of surface load requests from Multiprocessor                0           0           0
          1  global_store_requests                  Total number of global store requests from Multiprocessor              218         218         218
          1  local_store_requests                   Total number of local store requests from Multiprocessor                87          87          87
          1  surface_store_requests                 Total number of surface store requests from Multiprocessor               0           0           0
          1  global_atomic_requests                 Total number of global atomic requests from Multiprocessor               5           5           5
          1  global_reduction_requests              Total number of global reduction requests from Multiprocessor            0           0           0
          1  surface_atomic_requests                Total number of surface atomic requests from Multiprocessor              0           0           0
          1  surface_reduction_requests             Total number of surface reduction requests from Multiprocessor           0           0           0
          1  l2_global_load_bytes                   Bytes read from L2 for misses in L1 for global loads                   544         544         544
          1  l2_local_load_bytes                    Bytes read from L2 for misses in L1 for local loads                     32          32          32
          1  l2_surface_load_bytes                  Bytes read from L2 for misses in L1 for surface loads                    0           0           0
          1  l2_global_atomic_store_bytes           Bytes written to L2 from L1 for global atomics                         160         160         160
          1  l2_local_global_store_bytes            Bytes written to L2 from L1 for local and global stores.              9760        9760        9760
          1  l2_surface_store_bytes                 Bytes read from L2 for misses in L1 for surface stores                   0           0           0
          1  sysmem_read_bytes                      System Memory Read Bytes                                                32          32          32
          1  sysmem_write_bytes                     System Memory Write Bytes                                             7136        7136        7136
          1  l2_tex_hit_rate                        L2 Cache Hit Rate                                                   33.44%      33.44%      33.44%
          1  texture_load_requests                  Total number of texture Load requests from Multiprocessor                0           0           0
          1  tensor_precision_fu_utilization        Tensor-Precision Function Unit Utilization                        Idle (0)    Idle (0)    Idle (0)
    Kernel: void sumconstant(int, double*, double)
       1000  inst_per_warp                          Instructions per warp                                           1.2494e+06  1.2494e+06  1.2494e+06
       1000  branch_efficiency                      Branch Efficiency                                                  100.00%     100.00%     100.00%
       1000  warp_execution_efficiency              Warp Execution Efficiency                                          100.00%     100.00%     100.00%
       1000  warp_nonpred_execution_efficiency      Warp Non-Predicated Execution Efficiency                            96.82%      96.82%      96.82%
       1000  inst_replay_overhead                   Instruction Replay Overhead                                       0.000079    0.000127    0.000100
       1000  shared_load_transactions_per_request   Shared Memory Load Transactions Per Request                       0.000000    0.000000    0.000000
       1000  shared_store_transactions_per_request  Shared Memory Store Transactions Per Request                      0.000000    0.000000    0.000000
       1000  local_load_transactions_per_request    Local Memory Load Transactions Per Request                        0.000000    0.000000    0.000000
       1000  local_store_transactions_per_request   Local Memory Store Transactions Per Request                       0.000000    0.000000    0.000000
       1000  gld_transactions_per_request           Global Load Transactions Per Request                              8.000000    8.000000    8.000000
       1000  gst_transactions_per_request           Global Store Transactions Per Request                             8.000000    8.000000    8.000000
       1000  shared_store_transactions              Shared Store Transactions                                                0           0           0
       1000  shared_load_transactions               Shared Load Transactions                                                 0           0           0
       1000  local_load_transactions                Local Load Transactions                                                  0           0           0
       1000  local_store_transactions               Local Store Transactions                                                 0           0           0
       1000  gld_transactions                       Global Load Transactions                                           1310720     1310720     1310720
       1000  gst_transactions                       Global Store Transactions                                          1310720     1310720     1310720
       1000  sysmem_read_transactions               System Memory Read Transactions                                          0           0           0
       1000  sysmem_write_transactions              System Memory Write Transactions                                         5           5           5
       1000  l2_read_transactions                   L2 Read Transactions                                               1310736     1340408     1311284
       1000  l2_write_transactions                  L2 Write Transactions                                              1310750     1343961     1316265
       1000  dram_read_transactions                 Device Memory Read Transactions                                    1310728     1322146     1311175
       1000  dram_write_transactions                Device Memory Write Transactions                                   1212495     1326178     1309138
       1000  global_hit_rate                        Global Hit Rate in unified l1/tex                                   49.83%      50.00%      50.00%
       1000  local_hit_rate                         Local Hit Rate                                                       0.00%       0.00%       0.00%
       1000  gld_requested_throughput               Requested Global Load Throughput                                4.5146GB/s  5.1470GB/s  4.7587GB/s
       1000  gst_requested_throughput               Requested Global Store Throughput                               4.5146GB/s  5.1470GB/s  4.7587GB/s
       1000  gld_throughput                         Global Load Throughput                                          4.5146GB/s  5.1470GB/s  4.7587GB/s
       1000  gst_throughput                         Global Store Throughput                                         4.5146GB/s  5.1470GB/s  4.7587GB/s
       1000  local_memory_overhead                  Local Memory Overhead                                               49.94%      50.00%      50.00%
       1000  tex_cache_hit_rate                     Unified Cache Hit Rate                                               0.00%       0.00%       0.00%
       1000  l2_tex_read_hit_rate                   L2 Hit Rate (Texture Reads)                                          0.00%       0.00%       0.00%
       1000  l2_tex_write_hit_rate                  L2 Hit Rate (Texture Writes)                                       100.00%     100.00%     100.00%
       1000  dram_read_throughput                   Device Memory Read Throughput                                   4.5151GB/s  5.1756GB/s  4.7604GB/s
       1000  dram_write_throughput                  Device Memory Write Throughput                                  4.2459GB/s  5.1933GB/s  4.7530GB/s
       1000  tex_cache_throughput                   Unified cache to Multiprocessor throughput                      4.5147GB/s  5.1471GB/s  4.7589GB/s
       1000  l2_tex_read_throughput                 L2 Throughput (Texture Reads)                                   4.5146GB/s  5.1470GB/s  4.7587GB/s
       1000  l2_tex_write_throughput                L2 Throughput (Texture Writes)                                  4.5146GB/s  5.1470GB/s  4.7587GB/s
       1000  l2_read_throughput                     L2 Throughput (Reads)                                           4.5182GB/s  5.1701GB/s  4.7608GB/s
       1000  l2_write_throughput                    L2 Throughput (Writes)                                          4.5156GB/s  5.2401GB/s  4.7789GB/s
       1000  sysmem_read_throughput                 System Memory Read Throughput                                   0.00000B/s  0.00000B/s  0.00000B/s
       1000  sysmem_write_throughput                System Memory Write Throughput                                  18.058KB/s  20.588KB/s  19.034KB/s
       1000  local_load_throughput                  Local Memory Load Throughput                                    0.00000B/s  0.00000B/s  0.00000B/s
       1000  local_store_throughput                 Local Memory Store Throughput                                   0.00000B/s  0.00000B/s  0.00000B/s
       1000  shared_load_throughput                 Shared Memory Load Throughput                                   0.00000B/s  0.00000B/s  0.00000B/s
       1000  shared_store_throughput                Shared Memory Store Throughput                                  0.00000B/s  0.00000B/s  0.00000B/s
       1000  gld_efficiency                         Global Memory Load Efficiency                                      100.00%     100.00%     100.00%
       1000  gst_efficiency                         Global Memory Store Efficiency                                     100.00%     100.00%     100.00%
       1000  tex_cache_transactions                 Unified cache to Multiprocessor transactions                        327688      327688      327688
       1000  flop_count_dp                          Floating Point Operations(Double Precision)                        5242880     5242880     5242880
       1000  flop_count_dp_add                      Floating Point Operations(Double Precision Add)                    5242880     5242880     5242880
       1000  flop_count_dp_fma                      Floating Point Operations(Double Precision FMA)                          0           0           0
       1000  flop_count_dp_mul                      Floating Point Operations(Double Precision Mul)                          0           0           0
       1000  flop_count_sp                          Floating Point Operations(Single Precision)                              0           0           0
       1000  flop_count_sp_add                      Floating Point Operations(Single Precision Add)                          0           0           0
       1000  flop_count_sp_fma                      Floating Point Operations(Single Precision FMA)                          0           0           0
       1000  flop_count_sp_mul                      Floating Point Operation(Single Precision Mul)                           0           0           0
       1000  flop_count_sp_special                  Floating Point Operations(Single Precision Special)                      0           0           0
       1000  inst_executed                          Instructions Executed                                              1474624     9995152     5717846
       1000  inst_issued                            Instructions Issued                                                1474741     1474811     1474771
       1000  dram_utilization                       Device Memory Utilization                                          Low (1)     Low (1)     Low (1)
       1000  sysmem_utilization                     System Memory Utilization                                          Low (1)     Low (1)     Low (1)
       1000  stall_inst_fetch                       Issue Stall Reasons (Instructions Fetch)                             1.00%       1.08%       1.05%
       1000  stall_exec_dependency                  Issue Stall Reasons (Execution Dependency)                           6.77%       7.26%       7.09%
       1000  stall_memory_dependency                Issue Stall Reasons (Data Request)                                  91.58%      92.16%      91.79%
       1000  stall_texture                          Issue Stall Reasons (Texture)                                        0.00%       0.00%       0.00%
       1000  stall_sync                             Issue Stall Reasons (Synchronization)                                0.00%       0.00%       0.00%
       1000  stall_other                            Issue Stall Reasons (Other)                                          0.02%       0.02%       0.02%
       1000  stall_constant_memory_dependency       Issue Stall Reasons (Immediate constant)                             0.01%       0.02%       0.01%
       1000  stall_pipe_busy                        Issue Stall Reasons (Pipe Busy)                                      0.01%       0.01%       0.01%
       1000  shared_efficiency                      Shared Memory Efficiency                                             0.00%       0.00%       0.00%
       1000  inst_fp_32                             FP Instructions(Single)                                                  0           0           0
       1000  inst_fp_64                             FP Instructions(Double)                                            5242880     5242880     5242880
       1000  inst_integer                           Integer Instructions                                              15729152    15729152    15729152
       1000  inst_bit_convert                       Bit-Convert Instructions                                                 0           0           0
       1000  inst_control                           Control-Flow Instructions                                          5242880     5242880     5242880
       1000  inst_compute_ld_st                     Load/Store Instructions                                           10485760    10485760    10485760
       1000  inst_misc                              Misc Instructions                                                 10486528    10486528    10486528
       1000  inst_inter_thread_communication        Inter-Thread Instructions                                                0           0           0
       1000  issue_slots                            Issue Slots                                                        1474741     1474811     1474771
       1000  cf_issued                              Issued Control-Flow Instructions                                    163864      163864      163864
       1000  cf_executed                            Executed Control-Flow Instructions                                  163864      163864      163864
       1000  ldst_issued                            Issued Load/Store Instructions                                      327696      327696      327696
       1000  ldst_executed                          Executed Load/Store Instructions                                    327696      327696      327696
       1000  atomic_transactions                    Atomic Transactions                                                      0           0           0
       1000  atomic_transactions_per_request        Atomic Transactions Per Request                                   0.000000    0.000000    0.000000
       1000  l2_atomic_throughput                   L2 Throughput (Atomic requests)                                 0.00000B/s  0.00000B/s  0.00000B/s
       1000  l2_atomic_transactions                 L2 Transactions (Atomic requests)                                        0           0           0
       1000  l2_tex_read_transactions               L2 Transactions (Texture Reads)                                    1310720     1310720     1310720
       1000  stall_memory_throttle                  Issue Stall Reasons (Memory Throttle)                                0.00%       0.00%       0.00%
       1000  stall_not_selected                     Issue Stall Reasons (Not Selected)                                   0.03%       0.04%       0.03%
       1000  l2_tex_write_transactions              L2 Transactions (Texture Writes)                                   1310720     1310720     1310720
       1000  flop_count_hp                          Floating Point Operations(Half Precision)                                0           0           0
       1000  flop_count_hp_add                      Floating Point Operations(Half Precision Add)                            0           0           0
       1000  flop_count_hp_mul                      Floating Point Operation(Half Precision Mul)                             0           0           0
       1000  flop_count_hp_fma                      Floating Point Operations(Half Precision FMA)                            0           0           0
       1000  inst_fp_16                             HP Instructions(Half)                                                    0           0           0
       1000  ipc                                    Executed IPC                                                      0.139554    0.509024    0.324166
       1000  issued_ipc                             Issued IPC                                                        0.139389    0.150289    0.145903
       1000  issue_slot_utilization                 Issue Slot Utilization                                               3.48%       3.76%       3.65%
       1000  sm_efficiency                          Multiprocessor Activity                                              1.25%       1.26%       1.25%
       1000  achieved_occupancy                     Achieved Occupancy                                                0.124891    0.124990    0.124959
       1000  eligible_warps_per_cycle               Eligible Warps Per Active Cycle                                   0.141409    0.152912    0.148409
       1000  shared_utilization                     Shared Memory Utilization                                         Idle (0)    Idle (0)    Idle (0)
       1000  l2_utilization                         L2 Cache Utilization                                               Low (1)     Low (1)     Low (1)
       1000  tex_utilization                        Unified Cache Utilization                                          Low (1)     Low (1)     Low (1)
       1000  ldst_fu_utilization                    Load/Store Function Unit Utilization                               Low (1)     Low (1)     Low (1)
       1000  cf_fu_utilization                      Control-Flow Function Unit Utilization                             Low (1)     Low (1)     Low (1)
       1000  tex_fu_utilization                     Texture Function Unit Utilization                                 Idle (0)    Idle (0)    Idle (0)
       1000  special_fu_utilization                 Special Function Unit Utilization                                 Idle (0)    Idle (0)    Idle (0)
       1000  half_precision_fu_utilization          Half-Precision Function Unit Utilization                          Idle (0)    Idle (0)    Idle (0)
       1000  single_precision_fu_utilization        Single-Precision Function Unit Utilization                         Low (1)     Low (1)     Low (1)
       1000  double_precision_fu_utilization        Double-Precision Function Unit Utilization                         Low (1)     Low (1)     Low (1)
       1000  flop_hp_efficiency                     FLOP Efficiency(Peak Half)                                           0.00%       0.00%       0.00%
       1000  flop_sp_efficiency                     FLOP Efficiency(Peak Single)                                         0.00%       0.00%       0.00%
       1000  flop_dp_efficiency                     FLOP Efficiency(Peak Double)                                         0.01%       0.01%       0.01%
       1000  sysmem_read_utilization                System Memory Read Utilization                                    Idle (0)    Idle (0)    Idle (0)
       1000  sysmem_write_utilization               System Memory Write Utilization                                    Low (1)     Low (1)     Low (1)
       1000  stall_sleeping                         Issue Stall Reasons (Sleeping)                                       0.00%       0.00%       0.00%
       1000  pcie_total_data_transmitted            PCIe Total Data Transmitted                                              0      240640       17672
       1000  pcie_total_data_received               PCIe Total Data Received                                                 0      221184       16090
       1000  inst_executed_global_loads             Warp level instructions for global loads                            163840      163840      163840
       1000  inst_executed_local_loads              Warp level instructions for local loads                                  0           0           0
       1000  inst_executed_shared_loads             Warp level instructions for shared loads                                 0           0           0
       1000  inst_executed_surface_loads            Warp level instructions for surface loads                                0           0           0
       1000  inst_executed_global_stores            Warp level instructions for global stores                           163840      163840      163840
       1000  inst_executed_local_stores             Warp level instructions for local stores                                 0           0           0
       1000  inst_executed_shared_stores            Warp level instructions for shared stores                                0           0           0
       1000  inst_executed_surface_stores           Warp level instructions for surface stores                               0           0           0
       1000  inst_executed_global_atomics           Warp level instructions for global atom and atom cas                     0           0           0
       1000  inst_executed_global_reductions        Warp level instructions for global reductions                            0           0           0
       1000  inst_executed_surface_atomics          Warp level instructions for surface atom and atom cas                    0           0           0
       1000  inst_executed_surface_reductions       Warp level instructions for surface reductions                           0           0           0
       1000  inst_executed_shared_atomics           Warp level shared instructions for atom and atom CAS                     0           0           0
       1000  inst_executed_tex_ops                  Warp level instructions for texture                                      0           0           0
       1000  dram_read_bytes                        Total bytes read from DRAM to L2 cache                            41943296    42308672    41957601
       1000  dram_write_bytes                       Total bytes written from L2 cache to DRAM                         38799840    42437696    41892427
       1000  global_load_requests                   Total number of global load requests from Multiprocessor        163840      163840      163840
       1000  local_load_requests                    Total number of local load requests from Multiprocessor                  0           0           0
       1000  surface_load_requests                  Total number of surface load requests from Multiprocessor                0           0           0
       1000  global_store_requests                  Total number of global store requests from Multiprocessor       163840      163840      163840
       1000  local_store_requests                   Total number of local store requests from Multiprocessor                 0           0           0
       1000  surface_store_requests                 Total number of surface store requests from Multiprocessor               0           0           0
       1000  global_atomic_requests                 Total number of global atomic requests from Multiprocessor               0           0           0
       1000  global_reduction_requests              Total number of global reduction requests from Multiprocessor            0           0           0
       1000  surface_atomic_requests                Total number of surface atomic requests from Multiprocessor              0           0           0
       1000  surface_reduction_requests             Total number of surface reduction requests from Multiprocessor           0           0           0
       1000  l2_global_load_bytes                   Bytes read from L2 for misses in L1 for global loads              41943040    41943040    41943040
       1000  l2_local_load_bytes                    Bytes read from L2 for misses in L1 for local loads                      0           0           0
       1000  l2_surface_load_bytes                  Bytes read from L2 for misses in L1 for surface loads                    0           0           0
       1000  l2_global_atomic_store_bytes           Bytes written to L2 from L1 for global atomics                           0           0           0
       1000  l2_local_global_store_bytes            Bytes written to L2 from L1 for local and global stores.          41943040    41943040    41943040
       1000  l2_surface_store_bytes                 Bytes read from L2 for misses in L1 for surface stores                   0           0           0
       1000  sysmem_read_bytes                      System Memory Read Bytes                                                 0           0           0
       1000  sysmem_write_bytes                     System Memory Write Bytes                                              160         160         160
       1000  l2_tex_hit_rate                        L2 Cache Hit Rate                                                   50.00%      50.00%      50.00%
       1000  texture_load_requests                  Total number of texture Load requests from Multiprocessor                0           0           0
       1000  tensor_precision_fu_utilization        Tensor-Precision Function Unit Utilization                        Idle (0)    Idle (0)    Idle (0)
```
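The sumconstant counters are consistent with each other, and a few lines of arithmetic make that explicit. The Python sketch below copies several Avg values from the sumconstant rows and derives the others from them. The hardware constants are assumptions, not values nvprof reports: 80 SMs and 64 resident warps per SM on a V100, and 32-byte memory sectors. The sketch also assumes each thread loads and stores one `double` per element. Under those assumptions, the numbers fit a single active SM running 8 warps (for example, one 256-thread block) that walks 5,242,880 doubles, about 20,480 per thread. Treat that launch shape as an inference, not a measurement.

```python
# Cross-check of per-invocation nvprof counters for sumconstant(int, double*, double).
# Counter values are copied from the Avg column of the metric table. The hardware
# constants and the "one double loaded and stored per element" model are
# assumptions, not values reported by nvprof.

WARP_SIZE = 32         # threads per warp
SECTOR_BYTES = 32      # size of one L1/L2 memory transaction (sector)
DOUBLE_BYTES = 8       # sizeof(double)
SM_COUNT = 80          # SMs on a Tesla V100 (assumed from the device name)
MAX_WARPS_PER_SM = 64  # resident-warp limit per SM on compute capability 7.0

# Counters copied from the table (sumconstant, Avg column).
inst_executed_global_loads = 163840
gld_transactions = 1310720
l2_global_load_bytes = 41943040
flop_count_dp_add = 5242880
achieved_occupancy = 0.124959
sm_efficiency = 1.25 / 100

# Each warp-level load instruction serves 32 threads, assumed one double each.
elements = inst_executed_global_loads * WARP_SIZE
bytes_loaded = elements * DOUBLE_BYTES

# A fully coalesced warp request covers 32 * 8 B = 256 B, i.e. 8 sectors.
sectors_per_request = WARP_SIZE * DOUBLE_BYTES // SECTOR_BYTES
measured_per_request = gld_transactions // inst_executed_global_loads

# Warps resident on the active SM, and how many of the SMs were busy.
resident_warps = round(achieved_occupancy * MAX_WARPS_PER_SM)
active_sms = round(sm_efficiency * SM_COUNT)

# If those resident warps are the whole grid, each thread handles this many elements.
elements_per_thread = elements // (resident_warps * WARP_SIZE)

if __name__ == "__main__":
    print("elements:", elements)
    print("bytes loaded match l2_global_load_bytes:", bytes_loaded == l2_global_load_bytes)
    print("sectors per request (model, measured):", sectors_per_request, measured_per_request)
    print("one DP add per element:", flop_count_dp_add == elements)
    print("resident warps:", resident_warps, "active SMs:", active_sms)
    print("elements per thread:", elements_per_thread)
```

Running it reports 5242880 elements, 8 sectors per request from both the coalescing model and the measured counters, 8 resident warps on 1 active SM, and 20480 elements per thread. The single-block reading rests mainly on two rows: sm_efficiency (1.25%, which is 1/80 of the SMs) and achieved_occupancy (0.125 of 64 warps).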