You should be able to write GPU code (i.e. kernels) that runs at comparable speed in CUDA C, CUDA C++, or CUDA Fortran. CUDA Fortran is a translation system: it ultimately produces a CUDA C/C++ implementation of the GPU code that can be called from Fortran, but is compiled with the CUDA C/C++ compiler. Python performance will depend on the specific CUDA/Python implementation. PyCUDA is essentially a wrapper environment that allows C/C++ kernels or CUDA libraries to be called from Python. In that case, an equivalent kernel should have the same performance whether it is called from PyCUDA or from CUDA C/C++ (or from CUDA Fortran, if the kernel that CUDA Fortran generates is equivalent to how you would write the same function in CUDA C/C++).
The net of it is that all of the systems I describe ultimately use the CUDA C/C++ compiler and toolchain, at least for the CUDA kernel portions of the code, so you should not expect any inherent performance differences on that account.
For the portions of your code that run on the host CPU (even in a combined CPU/GPU implementation), there will be language differences, and I'm not addressing those here.