That was a misunderstanding on my part. I must not work on Sunday evenings :)
Sorry for the inconvenience.
I don’t think that web article makes any claims that the code presented should run faster on the GPU. I don’t see any CPU performance measurements.
To discuss performance-related questions or comparisons, it’s usually important to provide:
- the operating system you are using
- how the code was compiled (what was the compile command line or project configuration e.g. release vs. debug)
- which CUDA version you are using
Nevertheless, some explanation can be offered:
The original article uses Unified Memory in a pre-Pascal regime (read the unified memory section of the programming guide: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-unified-memory-programming-hd ).
This means that the data transfer is performed en masse at kernel launch time, and there are no GPU page faults to slow the kernel down. In your case, you are (I would guess) running on Linux in a Pascal-type unified memory regime, which uses page faulting for data transfer. This is slowing your kernel down dramatically.
You can mitigate these effects with prefetching, as discussed here:
This activity should bring your measured kernel performance more closely in line with the original web article. (The original article reported a time of about 0.5s for the kernel with <<<1,1>>> configuration as you indicate in your post.)
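As a rough sketch of what that prefetching looks like (assuming the usual managed-memory add kernel from that style of article; the names and sizes here are hypothetical, not the exact code from your post):

```cuda
#include <cuda_runtime.h>

// Simple grid-stride-free kernel, launched <<<1,1>>> as in the article.
__global__ void add(int n, float *x, float *y) {
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main() {
  int N = 1 << 20;
  float *x, *y;
  cudaMallocManaged(&x, N * sizeof(float));
  cudaMallocManaged(&y, N * sizeof(float));
  for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

  // Prefetch the managed allocations to the GPU before the kernel launch,
  // so the kernel does not incur demand page faults on a Pascal+/Linux setup.
  int device = 0;
  cudaGetDevice(&device);
  cudaMemPrefetchAsync(x, N * sizeof(float), device, 0);
  cudaMemPrefetchAsync(y, N * sizeof(float), device, 0);

  add<<<1, 1>>>(N, x, y);
  cudaDeviceSynchronize();

  cudaFree(x);
  cudaFree(y);
  return 0;
}
```

With the prefetch in place, the timed kernel should no longer include page-fault servicing, which is what makes the measurement comparable to the pre-Pascal behavior in the article.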
After that, if you proceed with the suggestion in the comment below, you should be able to follow along with the subsequent code modifications and comparisons.
Read the introduction through to the end - so far you are only halfway through.
Hi Robert_Crovella & tera
Thanks for your replies.
@tera : I got it.
Thanks a lot