Accelerating Fluid Dynamics Simulations with CUDA: Seeking Guidance for Novice Programmers

Hello experts,

I’m a doctoral student working on fluid dynamics simulations with the Smoothed Particle Hydrodynamics (SPH) method. I’ve noticed the remarkable advances in CUDA-accelerated computing in recent years, and I’m interested in learning how to port my existing codebase to CUDA to take advantage of that acceleration. At the moment, however, my CUDA programming skills are at a novice level. My C++ proficiency is already decent and sufficient for implementing the functionality I need.

I would like to ask experienced CUDA developers which learning resources would help me reach this goal in the shortest time possible. Could you please recommend learning paths, study materials, or instructional videos to help me advance in CUDA programming?

Thank you in advance for your guidance and assistance.

(Thus far, I have reviewed some Chinese materials and videos on CUDA and have a basic understanding of CUDA execution and memory models.)

That depends very much on your personally preferred learning style, so you might want to add some specifics on that for best recommendations. For example, I am very much a hands-on learner, an experimenter at heart. Frontal presentations including instructional videos typically provide minimal value to me. I set myself a modest specific goal, write a program to accomplish it, then check back with documentation when I hit the inevitable snags. Rinse and repeat, with the goals becoming more ambitious over time.

From observing others whose primary job is not software engineering, it seems that one of the most promising routes to start with parallel programming on GPUs is to start at a high level of abstraction, with environments like Python with numpy, OpenMP directives, or the thrust library for those with some C++ background. I have seen domain specialists (e.g. an astrophysicist) get up and running in this fashion in as little as two weeks, while already reaping performance benefits compared to their CPU-only code. This then provides motivation to dive in more deeply and maybe progress to bare-bones CUDA C++ and all the CUDA-specific optimization aspects. Not all do, by the way. Some conclude that the results from the high-level approach are good enough for their needs.
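To make that concrete, here is a rough SAXPY-style sketch using Thrust (the sizes and values are placeholders, and it is untested here). Note that there is no explicit kernel, thread indexing, or memory management to write; nvcc compiles it directly:

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// y = a*x + y (SAXPY); Thrust generates and launches the GPU kernel for us
struct saxpy_functor {
    float a;
    saxpy_functor(float a_) : a(a_) {}
    __host__ __device__ float operator()(float x, float y) const { return a * x + y; }
};

int main() {
    const int n = 1 << 20;
    thrust::device_vector<float> x(n, 1.0f);   // allocated in GPU memory
    thrust::device_vector<float> y(n, 3.0f);

    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy_functor(2.0f));

    float y0 = y[0];                           // single-element copy back to the host
    return (y0 == 5.0f) ? 0 : 1;
}
```

The same handful of building blocks (transform, reduce, sort over device_vectors) already covers a fair amount of particle-style bookkeeping before one ever writes a kernel by hand.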

Adding to njuffa’s answer, you could try writing your code using C++ standard parallelism with a parallel execution policy. The nvc++ compiler from NVIDIA’s HPC SDK provides a CUDA backend for it, which could show you the potential performance improvements without writing any CUDA-specific code. More information can be found here, for example: C++ Standard Parallelism | NVIDIA On-Demand.
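As a rough sketch (not drawn from your SPH code; names and sizes are made up), a SAXPY-style update written as a standard parallel algorithm is plain C++. Built with something like nvc++ -stdpar=gpu, the parallel algorithm is offloaded to the GPU:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 3.0f);

    // Standard C++ parallel algorithm; with nvc++ -stdpar=gpu this transform
    // is offloaded to the GPU, without any CUDA-specific code in the source.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });

    return (y[0] == 5.0f) ? 0 : 1;
}
```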

Actually, I’m more inclined towards high-quality video tutorials, preferably accompanied by the corresponding slide decks; learning feels more manageable to me that way, and by studying top-notch tutorials I believe I can gain a deeper understanding of CUDA. I’m also looking for some beginner-friendly PDF resources on CUDA, so I can deepen my understanding by working through the relatively simple kernel examples they provide!

Thank you very much for your advice. So, if I understand correctly, I only need to write regular C++ code, and the corresponding functions or directives automatically invoke the CUDA machinery, so there is no need to write CUDA-specific code myself, right? That approach is indeed very clever, and I will keep exploring it. For now, though, I still want to take this opportunity to learn some CUDA programming itself, while also exploring how CUDA could be applied to my existing research directions.

There is a comprehensive 13-part CUDA course here.

Although the site relates to a long-past live presentation that required registration, the content (video and slides) still seems to be available for each topic.

The following link may not be the fastest way, but it may help you get started.

The videos were once incorporated in a Udacity course, and the coding problems could be worked, submitted and automatically tested/graded online.

Also, if you install the latest CUDA toolkit, you can study, compile and run several example programs.
Formerly included in the toolkit itself, the examples are now posted on GitHub.
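For a taste of what those examples look like, here is a stripped-down sketch in the spirit of the vectorAdd sample (the actual sample adds proper error checking on every CUDA call):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one element; the classic first CUDA kernel
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));   // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                    // wait for the GPU before reading results

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```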

See:

And:

Thanks very much!!!

Can it really still be found? I opened it and there is a table of contents, but I can’t see the actual content. Am I doing something wrong, or am I just not looking in the right place?

Yes. At the bottom of the page, under each lesson, there is a tab, “Presentation”.
There you will find links to the video and slides.

That’s great! Thanks, haha, I found it with your help!

Hi Donkim,
to give you good advice, it would help to better understand your background, aims, and application.

Which algorithmic methods does SPH use? Can you express it as an iterative method or as matrix operations? Will you write CUDA code directly or use higher-level libraries?

How do you learn best? By understanding the basics and building on them with more and more advanced topics? Or by getting higher-level applications running and then digging down deeper and deeper?

What is your background in programming, computer science, maths, numerical computing, technical computing, and algorithms?
Do you understand concepts such as data types, including floating point? Do you understand memory caches and their levels? Do you have a grasp of parallel programming and synchronization? Do you understand the scopes of variables and the difference between registers, shared memory, and global memory?
Would you (with documentation) be able to understand PTX and SASS assembly in a basic fashion? What other programming languages have you been working with?
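(As a purely hypothetical illustration of what is meant by those scopes, a kernel touches all three memory spaces roughly like this; the names are made up for this post:)

```cpp
// in/out point to global memory; "tile" is shared memory; "x" normally lives in a register
__global__ void scopesDemo(const float *in, float *out, int n) {
    __shared__ float tile[256];                      // shared: visible to all threads of one block
                                                     // (assumes a block size of 256 threads)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (i < n) ? in[i] : 0.0f;                // per-thread local value, kept in a register

    tile[threadIdx.x] = x;
    __syncthreads();                                 // synchronization within the block

    if (i < n)
        out[i] = tile[threadIdx.x] * 2.0f;           // write back to global memory
}
```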

How fast do you want to gain this experience? Days, weeks, months? Do you only want to solve your current problem, or do you want broader experience? Do you just want to get it running on CUDA at all (so that it is faster than on the CPU), or do you want to optimize the program as far as possible (because you have very long test runs or have to rent high-performance servers)?
