GPU is slower than CPU

Hello, I am currently developing a GPU application, but it runs slower on my GPU than on my CPU. What could be the problem? These are the specs of my computer and project environment:

- Windows 10 32-bit
- Intel i5 2430M
- NVIDIA GeForce 540M
- CUDA Toolkit 6.5

I do have a lot of cudaMalloc and cudaMemcpy calls, but they’re not the problem (I’ve measured their time using CUDA events).
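For reference, event timing is most useful when it brackets each phase separately, the kernel as well as the copies. The following is only a minimal sketch under assumptions: `scaleKernel`, `PhaseTimes`, and `timePhases` are hypothetical names, and the trivial kernel stands in for the real ray tracing kernel, which is not shown in this thread.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real ray tracing kernel.
__global__ void scaleKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

struct PhaseTimes { float h2dMs, kernelMs, d2hMs; };

// Times upload, kernel, and download separately on the default stream.
PhaseTimes timePhases(std::vector<float> &host) {
    const int n = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(float);
    float *dev = 0;
    cudaMalloc(&dev, bytes);

    cudaEvent_t ev[4];
    for (int k = 0; k < 4; ++k) cudaEventCreate(&ev[k]);

    cudaEventRecord(ev[0]);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(ev[1]);
    scaleKernel<<<(n + 255) / 256, 256>>>(dev, n);
    cudaEventRecord(ev[2]);
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(ev[3]);
    cudaEventSynchronize(ev[3]);  // earlier events on the same stream are done too

    PhaseTimes t = {0.0f, 0.0f, 0.0f};
    cudaEventElapsedTime(&t.h2dMs, ev[0], ev[1]);
    cudaEventElapsedTime(&t.kernelMs, ev[1], ev[2]);
    cudaEventElapsedTime(&t.d2hMs, ev[2], ev[3]);

    for (int k = 0; k < 4; ++k) cudaEventDestroy(ev[k]);
    cudaFree(dev);
    return t;
}

int main() {
    std::vector<float> host(1 << 20, 1.0f);
    PhaseTimes t = timePhases(host);
    printf("H2D %.3f ms, kernel %.3f ms, D2H %.3f ms\n",
           t.h2dMs, t.kernelMs, t.d2hMs);
    return 0;
}
```

If the kernel phase dominates, the copies really are not the issue and the kernel itself is the place to look. Error checking is omitted here for brevity; real code should check the return value of every CUDA call.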

Any help will be highly appreciated.
Thanks a lot!

It’s actually not that hard for a GPU to be a lot slower than a CPU.

Whether a GPU beats a CPU depends largely on things like the size of the data you’re working on and how computationally intense the code is. Small data with few calculations is a poor fit for a GPU, for example. CPUs aren’t as slow as we’d like to think, so stuff like this does happen.

Hello, MutantJohn. Thanks for your reply.
I’m working on a BVH-based CUDA ray tracing project. I use 200 for both the number of blocks and the number of threads. The calculations include checking intersections and shadows using the BVH and calculating colors (reflections and refractions too). I think that is quite a lot of computation…
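One thing worth checking in a setup like this is the launch configuration. If "200 threads" means 200 threads per block, each block holds six full warps plus a seventh with only 8 of its 32 lanes active (200 = 6 × 32 + 8), and the whole launch has 200 × 200 = 40,000 threads, fewer than the pixel count of most images. A common alternative is one thread per pixel with a block size that is a multiple of 32. The sketch below assumes a 640×480 image; `shadePixels` and `ceilDiv` are hypothetical names and the kernel body is a placeholder, not a ray tracer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Round-up integer division, used to cover the whole image with blocks.
__host__ __device__ inline int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Hypothetical per-pixel kernel: a real tracer would trace a ray here.
__global__ void shadePixels(unsigned char *image, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard threads past the image edge
    image[y * width + x] = (unsigned char)((x ^ y) & 0xFF);  // placeholder value
}

int main() {
    const int width = 640, height = 480;  // assumed resolution
    unsigned char *dImage = 0;
    cudaMalloc(&dImage, width * height);

    dim3 block(16, 16);  // 256 threads per block = 8 full warps
    dim3 grid(ceilDiv(width, block.x), ceilDiv(height, block.y));
    shadePixels<<<grid, block>>>(dImage, width, height);

    cudaError_t err = cudaDeviceSynchronize();
    printf("grid %u x %u, status: %s\n", grid.x, grid.y, cudaGetErrorString(err));

    cudaFree(dImage);
    return 0;
}
```

The 16×16 block shape is only a reasonable starting point; the best shape depends on the kernel and the GPU, so it is worth trying a few and measuring.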

Any other opinions, please? The difference is 20 seconds for a teapot OBJ model (with versus without CUDA).

First things first: make sure you’re compiling with the proper optimization flags. Don’t use -G (device debug, which turns off device code optimization) or -O0 or anything like that. Balls to the wall, -O2 or -O3.

Next up is to profile your code itself. It’s near impossible to look at source code and go, “Oh hey, that’s a bottleneck!”

Okay, it is possible to do that! Some things are so obvious! But not all things are. So you’ll need to use nvprof or nvvp (the NVIDIA Visual Profiler) to find the slowest kernel invocations, and then you can figure out why those kernel invocations are slow.

This thread might help you understand why a GPU is slower than a CPU in some cases:
https://devtalk.nvidia.com/default/topic/953975/sequential-code-is-faster-than-parallel-how-is-it-possible-/

Have you profiled your application yet? I strongly recommend you do that, it should point you in the right direction.

In GPU ray tracing, the bottleneck is typically not computation, but thread divergence and memory access. If you haven’t yet, you should also read this publication: "Understanding the Efficiency of Ray Traversal on GPUs" (Aila and Laine, NVIDIA Research).
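To make "thread divergence" concrete, here is a toy sketch, not taken from the publication or from the original poster's code. In the first kernel, neighbouring threads of a warp take different branches, so the warp has to run both paths one after the other. In the second, all 32 threads of a warp agree. `pathA`, `pathB`, `timeKernel`, and both kernel names are hypothetical, and the two artificial loops merely stand in for different traversal or shading paths.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two artificial "expensive" code paths (assumed stand-ins for real work).
__device__ float pathA(float v) {
    for (int k = 0; k < 512; ++k) v = v * 1.0001f + 0.5f;
    return v;
}
__device__ float pathB(float v) {
    for (int k = 0; k < 512; ++k) v = sqrtf(v * v + 1.0f);
    return v;
}

// Neighbouring lanes disagree, so each warp executes both paths serially.
__global__ void divergentBranch(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0) data[i] = pathA(data[i]);
    else                      data[i] = pathB(data[i]);
}

// All lanes of a warp agree, so each warp executes only one path.
__global__ void warpUniformBranch(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((threadIdx.x / warpSize) % 2 == 0) data[i] = pathA(data[i]);
    else                                   data[i] = pathB(data[i]);
}

// Runs one of the two kernels and returns its elapsed time in milliseconds.
float timeKernel(bool divergent, int n) {
    float *dev = 0;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemset(dev, 0, n * sizeof(float));
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    int blocks = (n + 255) / 256;

    cudaEventRecord(start);
    if (divergent) divergentBranch<<<blocks, 256>>>(dev, n);
    else           warpUniformBranch<<<blocks, 256>>>(dev, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    return ms;
}

int main() {
    const int n = 1 << 22;
    timeKernel(true, n);  // warm-up launch, result discarded
    printf("divergent:    %.3f ms\n", timeKernel(true, n));
    printf("warp-uniform: %.3f ms\n", timeKernel(false, n));
    return 0;
}
```

How large the gap is depends on the GPU and on what the compiler does with such a small example, so treat the printed numbers as something to measure rather than assume. In a ray tracer the divergence comes from rays in the same warp visiting different BVH nodes, which is the effect the publication analyses.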

Hello, thanks for all the replies.
The problem is solved. Thank you :)

What was the problem?
