2D wave equation instability

I have a kernel evaluating the 2D wave equation as a height map over a sample grid. Each timestep I am evaluating the height of each sample explicitly by summing the heights of its neighbors and storing the results in an acceleration array, then updating position and velocity arrays. Each array is double-buffered and passed to Cuda using cudaGLMapBufferObject so I can draw the heightmap and avoid transfers over the bus.

Unfortunately, within a few hundred timesteps the simulation blows up: I get spurious waves with wavelengths on the order of my sample distance. When one sample is low, its neighbors are all high and the waves keep growing in magnitude. I tried RK4 integration but it doesn’t make any difference. How can I make my implementation stable?

Are you taking the CFL limit into account when determining the integration time step size?

Debug it. Use Parallel Nsight.

How will a debugger find what is probably a mathematical problem?