How to Implement Performance Metrics in CUDA C/C++

anon95180265 March 11, 2020, 9:08pm 21

Yes, the whole kernel is timed. But since it's a bandwidth-bound kernel, we are effectively measuring bandwidth. We could calculate the compute throughput of the kernel, but it will be low relative to the peak compute throughput of the GPU (since bandwidth is the bottleneck in this case).
