Finite Difference Methods in CUDA C++, Part 2

Originally published at:

In the previous CUDA C++ post we dove into 3D finite difference computations in CUDA C/C++, demonstrating how to implement the x-derivative part of the computation. In this post, let’s continue by exploring how we can write efficient kernels for the y and z derivatives. As with the previous post, code for the examples in this post…

Hi Dr. Mark Harris, I wanted to share my benchmarks for 3-D finite difference derivatives with you and the rest of the NVIDIA CUDA dev community. I had a few questions I wanted to throw out: beyond this 64^3 grid, how do this implementation and the concept of pencils extend to arbitrarily large grids? Naively, if you wanted a "big" grid for 3-D finite difference derivatives, e.g. 2560^3, does the "pencil" extend to size 2560 (entries)? Can we go even larger? More generally, are there any implementations out there of 3-D Navier-Stokes equation solvers using these finite differences with CUDA C/C++?

I asked this in part 1, but it may pertain here as well: arbitrarily large grids. Naively, I changed the grid size from the original 64^3 up to 92^3 (i.e. mx=my=mz=92), and for anything above 92 I get "Segmentation fault (core dumped)". I was simply curious what was happening; is it a limitation of the GPU hardware? If so, which parameter? I'm on an NVIDIA GeForce GTX 980 Ti.

It could be quite useful to approximate real-valued measures via a finite sum of improper polynomial integrals. Note that an improper integral can be split into a sum of improper integrals, which is trivial to parallelize and less computationally costly than, e.g., the rectangle method. I strongly suspect that a simple RMSE cost function and a LUT of precomputed polynomial integrals would do the job on a GTX 1060 and a Core 2 Duo.