CUDA + MPI = Unexplained Issues... Random Crashes, Erroneous Output?!?

I’m running a particular program in CUDA, and I use MPI to create multiple processes of the same program on the different CPU cores, all of which access the same GPU. Each program does the following (see the sketch after the list):

  1. Initialize MPI and CUDA
  2. Get process rank and total number of processes
  3. Perform cudaMalloc
  4. For N iterations:
    4.1 cudaMemcpy (host to device)
    4.2 Execute a kernel on a portion of the data set
    4.3 cudaMemcpy (device to host)
    4.4 End loop
  5. Perform CPU compute
  6. Verify results
  7. Perform barrier sync
  8. Use MPI I/O calls to write timestamps of various events from a string buffer to file
  9. Free memory and clean up

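Roughly, in code, the structure is the following (a simplified sketch, not the actual program: the kernel, buffer sizes, and log file name are just placeholders, and I've shown error checks only to mark where failures could be caught):

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <string.h>

    #define N_ITER 10          /* placeholder iteration count      */
    #define CHUNK  512         /* 512 floats = 2 KB per copy       */

    __global__ void myKernel(float *d, int n)   /* placeholder kernel */
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = d[i] * 2.0f;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        float h_buf[CHUNK], *d_buf = NULL;
        char  log[4096];

        MPI_Init(&argc, &argv);                       /* 1. init MPI    */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);         /* 2. rank / size */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < CHUNK; ++i) h_buf[i] = (float)i;

        cudaMalloc((void **)&d_buf, CHUNK * sizeof(float));   /* 3. */

        for (int it = 0; it < N_ITER; ++it) {         /* 4. main loop   */
            cudaMemcpy(d_buf, h_buf, CHUNK * sizeof(float),
                       cudaMemcpyHostToDevice);               /* 4.1 */
            myKernel<<<(CHUNK + 255) / 256, 256>>>(d_buf, CHUNK); /* 4.2 */
            cudaThreadSynchronize();  /* cudaDeviceSynchronize() on newer toolkits */
            cudaError_t err = cudaGetLastError();     /* catch kernel errors */
            if (err != cudaSuccess)
                fprintf(stderr, "rank %d: %s\n", rank, cudaGetErrorString(err));
            cudaMemcpy(h_buf, d_buf, CHUNK * sizeof(float),
                       cudaMemcpyDeviceToHost);               /* 4.3 */
        }

        /* 5. CPU compute and 6. result verification go here */

        MPI_Barrier(MPI_COMM_WORLD);                  /* 7. barrier sync */

        snprintf(log, sizeof(log), "rank %d finished\n", rank);
        MPI_File fh;                                  /* 8. MPI I/O logging */
        MPI_File_open(MPI_COMM_WORLD, "timestamps.log",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(log),
                          log, (int)strlen(log), MPI_CHAR, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        cudaFree(d_buf);                              /* 9. cleanup */
        MPI_Finalize();
        return 0;
    }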
I execute this program on varying inputs and iteration counts using a bash script that invokes mpirun some 200 or 500 times. The problems I face are:

  1. Occasional errors in the output (non-reproducible), occurring at seemingly random iterations. Sometimes the output is garbage, sometimes it is all zeros.

  2. Sometimes the process freezes before crashing.

  3. On even rarer occasions, the system crashes completely, forcing me to do a cold reboot. While running the program in text mode, I got the following message just before the crash:

NVRM: Xid (0060:00) 13, 0004 00000000 000050c0 00000368 00000000 00000080

Each program execution works on a small amount of data (a cudaMemcpy in the program copies no more than 2 KB of data to the GPU).

I’m using an NVIDIA Quadro FX 5600 in a dual quad-core Xeon workstation running RHEL4, with the CUDA 1.1 SDK and Toolkit and the default OpenMPI package (1.1.1).

The NVIDIA driver I’m using is v169.04.

Does anyone have any idea what’s happening?

What kind of errors do you get?

This, in general, does not work very well. When it does work, you will only get 1-10% of the capability of the GPU. Other times, the driver may even bring the whole machine down.

Is there any particular reason you want to have all CPU cores using the GPU in different contexts simultaneously?

Sorry people, I didn’t really complete the post. I left it midway to double-check the program and didn’t realize the incomplete post had been submitted. Please take a look at the start of the thread, as I’ve completed the post now.

http://forums.nvidia.com/index.php?act=ST&f=64&t=71498

Sorry Again!

Can anyone throw some more light on this topic? Any help would be much appreciated.

Upon rereading, I realize what I was saying in the first post isn’t very clear. :)

Currently, having the GPU accessed by many CUDA jobs simultaneously is not well supported. The driver does its best to accommodate all of the requests, but it isn’t designed for this: having everything happen at once pulls the driver in many different directions as it tries to timeslice between the CUDA contexts. The result is poor overall performance, especially per process, and sometimes general system instability.

CUDA works best when a single thread owns the GPU. If you are trying to develop or debug a distributed-memory CUDA program, it might make sense to write a wrapper which, instead of performing calls on the GPU directly, transfers the data to a separate process standing by outside the MPI job that handles all of the GPU work.
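To illustrate the idea in a simplified form (keeping the GPU-owning process inside the MPI job as rank 0, rather than in a separate daemon as suggested above), something like the following keeps every CUDA call in a single process; the kernel name and chunk size here are made up:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    #define CHUNK 512                      /* elements each rank contributes */

    __global__ void gpuKernel(float *d, int n)     /* placeholder kernel */
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] += 1.0f;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        float mine[CHUNK];                 /* each rank's local chunk */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < CHUNK; ++i) mine[i] = (float)rank;

        /* Rank 0 is the only process that ever creates a CUDA context. */
        float *all = NULL, *d_all = NULL;
        if (rank == 0) {
            all = (float *)malloc(size * CHUNK * sizeof(float));
            cudaMalloc((void **)&d_all, size * CHUNK * sizeof(float));
        }

        /* Everyone ships its data to the GPU-owning rank ...            */
        MPI_Gather(mine, CHUNK, MPI_FLOAT, all, CHUNK, MPI_FLOAT,
                   0, MPI_COMM_WORLD);

        if (rank == 0) {                   /* ... which does all GPU work */
            int n = size * CHUNK;
            cudaMemcpy(d_all, all, n * sizeof(float), cudaMemcpyHostToDevice);
            gpuKernel<<<(n + 255) / 256, 256>>>(d_all, n);
            cudaMemcpy(all, d_all, n * sizeof(float), cudaMemcpyDeviceToHost);
        }

        /* ... and hands the results back out to the other ranks.        */
        MPI_Scatter(all, CHUNK, MPI_FLOAT, mine, CHUNK, MPI_FLOAT,
                    0, MPI_COMM_WORLD);

        if (rank == 0) { cudaFree(d_all); free(all); }
        MPI_Finalize();
        return 0;
    }

The same pattern extends to a standalone GPU daemon: the MPI ranks would send their chunks over a socket or shared memory to that process instead of to rank 0.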

I learned this through experience about a year ago, so please ask more specific questions if you have any.