My NAMD CUDA experience thus far: GTX 260 (192 SP)

Luke_SLI · June 1, 2010, 12:04am

Hello all,

I am really hoping to sell my boss on the idea of CUDA. I have a ridiculously stubborn research adviser who would rather spend thousands on racks of old dual-socket Barcelona 2U servers than take a chance on CUDA (even outside of our cluster :shrug:).

So I decided to venture out a bit on my own. I was fortunate enough to attend a ‘Many Cores’ seminar at the Ohio Supercomputer Center last year, and my CUDA programming is slowly developing, but the only place my boss would care to see improvement is in our NAMD biosystem jobs.

My desktop rig is not that hot for crunching numbers. I would prefer some more cores (maybe I’ll toss in a Q9650 :shrug:).

E8600 @ 3.96 GHz (a dual-core Wolfdale with 6 MB of L2)

GTX 260s (192 SP cards) @ stock 576 MHz core

CentOS 5.5

So I installed NAMD 2.7 as well as the NAMD 2.7 CUDA build, and I am running two separate systems.

System 1: A water box (50 x 50 x 50) in which I am growing a micelle from perfluorooctanoate. The system is neutralized with sodium. Please ignore the flip-flopping caused by the periodic boundaries; the box is too small for this one, but it was a nice test NPT simulation. (This was my undergrad senior project; I just graduated :) )

Here I am only showing the surfactant residues.

It’s fun to watch them ‘bud’ from the water voids in the .dcd files.

System 2: A lipid bilayer that I am playing with. This system has many more atoms, so I was hoping to see a bigger CUDA benefit to show the boss.

Parameters:

Results:

My wall times were cut in HALF for my micelle system. Considering that all of the bonded interactions still run only on my dual core (albeit @ 3.9 GHz), I was very happy to see this. I am still waiting for my CPU-only run of system 2. Extrapolating from past data, it looks like it may be about a 130% speedup.
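To put "cut in HALF" and "130%" on the same footing when pitching this, here is a minimal Python sketch that compares a CPU-only run and a CUDA run by wall time. It assumes each NAMD log ends with a line of the form `WallClock: <seconds>  CPUTime: ...  Memory: ...`; check your own logs before trusting the regex. The helper names are hypothetical, and the timings in the usage part are made up for illustration.

```python
import re

# Assumption: the NAMD log ends with a line like
#   "WallClock: 1234.56  CPUTime: 1200.00  Memory: 250.1 MB"
WALLCLOCK_RE = re.compile(r"WallClock:\s+([0-9.]+)")


def wallclock_seconds(log_text):
    """Return the last WallClock value (seconds) found in a NAMD log, or None."""
    matches = WALLCLOCK_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU-only to CUDA wall time; 2.0 means the run took half as long."""
    return cpu_seconds / gpu_seconds


def percent_faster(cpu_seconds, gpu_seconds):
    """Throughput gain in percent; halving the wall time is +100%."""
    return (speedup(cpu_seconds, gpu_seconds) - 1.0) * 100.0


# Hypothetical timings, for illustration only.
cpu_log = "WallClock: 3600.0  CPUTime: 3590.2  Memory: 210.4 MB"
gpu_log = "WallClock: 1800.0  CPUTime: 1795.0  Memory: 260.9 MB"
cpu_s, gpu_s = wallclock_seconds(cpu_log), wallclock_seconds(gpu_log)
print(f"{speedup(cpu_s, gpu_s):.1f}x, {percent_faster(cpu_s, gpu_s):.0f}% faster")
```

By this convention, halving the wall time is a 2.0x speedup (100% faster), while a 130% speedup would be a 2.3x ratio, so it is worth saying which convention you mean when reporting numbers.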


Conclusions/Remarks: I am really trying to sell my boss on the idea of putting GTX 470s in a couple of the cluster machines, or in some of the desktops, for submitting these types of jobs to. I was hoping to get some ammunition from you guys. Is there anything obvious that I am missing?

I noticed that only one of my two GTX 260s is being used for the non-bonded calculations, and it barely heats up at all. Is there any way I can offload more work onto the GPU?

PS. I submit with the following command lines:

I cannot seem to find any options that would allow me to tweak how my card is being used here. I have grown to hate the NAMD manuals…

Thank you for your time,

Luke

seibert · June 1, 2010, 3:30am

I’d be happy to run a NAMD benchmark on my GTX 470 at work if you can give me instructions on what to type. (Use small words: I’m a particle physicist, so I don’t know what molecules are. :) )

Edit: Also, does NAMD make efficient use of multiple GPUs? The system with the GTX 470 also has three GTX 295 cards in it (7 GPUs!), so if you want a massively multi-GPU measurement, I can do that too.

tmurray · June 1, 2010, 3:55am

NAMD certainly makes use of multiple GPUs.

lmount · June 1, 2010, 7:15am

I am also benchmarking NAMD 2.7b CUDA with MPICH2 against the CPU-only NAMD 2.7b MPICH2 build. So far I see a 3x speedup. When I have more results I will post them here.

The command I am using is

./charmrun +p4 ./namd2 +idlepoll +devices 0,0,0,0 run.txt

on an Intel® Core™2 Quad CPU @ 2.40GHz with a Tesla C1060.

Quoting from notes.txt:

Each namd2 process can use only one GPU. Therefore you will need to run at least one process for each GPU you want to use. Multiple processes can share a single GPU, usually with an increase in performance. NAMD will automatically distribute processes equally among the GPUs on a node. Specific GPU device IDs can be requested via the +devices argument on the namd2 command line, for example:

./charmrun ++local +p4 ./namd2 +idlepoll +devices 0,2

Devices are selected cyclically from those available, so in the above example processes 0 and 2 will share device 0 and processes 1 and 3 will share device 2. One could also specify +devices 0,0,2,2 to cause device 0 to be shared by processes 0 and 1, etc. GPUs with two or fewer multiprocessors are ignored unless specifically requested with +devices.
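The cyclic rule in that quote can be modeled in a few lines of Python. This is only a sketch of the documented behavior, assuming process ranks 0..N-1 on a single node; it is not NAMD's actual code, and the function names are hypothetical.

```python
def assign_devices(num_processes, devices):
    """Return the GPU ID each namd2 process rank would use, cycling through the +devices list."""
    if not devices:
        raise ValueError("need at least one device ID")
    return [devices[rank % len(devices)] for rank in range(num_processes)]


def ranks_per_device(num_processes, devices):
    """Group process ranks by the GPU they end up sharing."""
    groups = {}
    for rank, device in enumerate(assign_devices(num_processes, devices)):
        groups.setdefault(device, []).append(rank)
    return groups


print(ranks_per_device(4, [0, 2]))        # {0: [0, 2], 2: [1, 3]}
print(ranks_per_device(4, [0, 0, 2, 2]))  # {0: [0, 1], 2: [2, 3]}
```

Under the same rule, `+devices 0,0,0,0` with `+p4` (as in the command above) simply puts all four processes on device 0.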

Luke_SLI · June 1, 2010, 1:54pm

Should I not have SLI enabled? NAMD tells me it only sees one device. :shrug:

AWESOME! Thank you… that makes much more sense.

I would really like to see the speedup from a GTX 470! How many CPUs are you using to be able to utilize all of those GPUs?

I guess I am confused about how I could also assign GPU #2 to processes in my case. With my dual core, I only have two NAMD processes. I guess the best I could do is the following?

Thank you for all the responses guys! I have always loved the NVIDIA forums. I think this would be a neat place to post some NAMD CUDA renderings and speedup times.

-Luke

seibert · June 1, 2010, 2:45pm

What CUDA driver are you using? SLI used to hide devices from CUDA, but I thought that was fixed a year ago. (SLI provides no benefit to CUDA, so unless you need it for OpenGL, you should turn it off.)

It is a quad-core 2.66 GHz Intel Core i7 processor with hyperthreading turned on, giving 8 “virtual” cores. I find that in this current generation, hyperthreading enabled with 8 processes running gives me 50% more throughput on my jobs than 4 processes. (Edit: that’s for CPU-only jobs. Most of the GPU jobs this computer runs are pretty light on the CPU side, so the number of CPU cores is less critical.)

seibert · June 1, 2010, 2:56pm

OK, the 64-bit CUDA version seems to be working on my GPU node here, but I have no simulation configuration files to run. Can you point me at the files corresponding to the speed tests you posted above?

lmount · June 1, 2010, 3:00pm

If I am not mistaken, you can have more than 2 processes on a dual-core machine. For example, you can run 4 processes at 50% each instead of 2 at 100% per core.

__
Lampros
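For a box like Luke's (2 cores, 2 GPUs), here is a small Python sketch that assembles the command line from the flags already quoted in this thread (`++local`, `+pN`, `+idlepoll`, `+devices`). The helper name is hypothetical, `run.txt` is just the config filename from lmount's command, and whether oversubscribing a dual core actually helps is something to measure rather than assume.

```python
import shlex


def namd_command(num_processes, devices, config="run.txt", local=True):
    """Build a charmrun/namd2 argument list; per notes.txt, NAMD hands out device IDs cyclically."""
    args = ["./charmrun"]
    if local:
        args.append("++local")  # run on the local machine only, as in the notes.txt example
    args += [
        f"+p{num_processes}",
        "./namd2",
        "+idlepoll",
        "+devices", ",".join(str(d) for d in devices),
        config,
    ]
    return args


# One process per core, one GPU each:
print(shlex.join(namd_command(2, [0, 1])))
# Four processes on two cores ("50% each"), two processes per GPU:
print(shlex.join(namd_command(4, [0, 1])))
```

Returning an argument list rather than a single string keeps it usable with `subprocess.run(...)` directly; `shlex.join` (Python 3.8+) is only there to print something you can paste into a shell.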

Luke_SLI · June 1, 2010, 3:30pm

Unfortunately I cannot. I’m running a TraPPE-UA force field that I made for my surfactant, and I do not think I would be allowed to release it. :(

For CPU-only runs on some i7 920s, I was seeing a ~12% speedup from HTT.

seibert · June 1, 2010, 5:47pm

OK, do you have a suggested generic configuration file to benchmark?

Luke_SLI · June 1, 2010, 6:04pm

I’m going to throw together a simple liquid simulation. Can I email the files to you?

tachyon_john · June 10, 2010, 2:46am

When you guys run NAMD with CUDA, make sure the outputEnergies config parameter is a large number, as any timestep that outputs energies currently falls back to the host. If you do it too often, you will slow down the GPU (or rather the GPU will be idle for more timesteps than it ought to…)

Cheers,
John
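Since every energy-output step falls back to the host, roughly 1/outputEnergies of the timesteps leave the GPU idle. Below is a minimal Python sketch that pulls outputEnergies out of a config file and reports that fraction. The parser is deliberately simplified (one `keyword value` per line, `#` comments, an optional `=`, case-insensitive keywords, no Tcl constructs), and the function names are hypothetical.

```python
def read_namd_keywords(config_text):
    """Parse simple 'keyword value' lines from a NAMD config into a lowercase-keyed dict."""
    params = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.replace("=", " ", 1).split(None, 1)  # tolerate 'keyword = value'
        if len(parts) == 2:
            params[parts[0].lower()] = parts[1].strip()
    return params


def host_fallback_fraction(config_text):
    """Approximate fraction of timesteps that output energies, or None if outputEnergies is unset."""
    value = read_namd_keywords(config_text).get("outputenergies")
    if value is None:
        return None
    return 1.0 / int(value)


cfg = """
outputEnergies   10000   # every 10k steps
numsteps         500000
"""
frac = host_fallback_fraction(cfg)
print(f"{frac:.4%} of steps fall back to the host")
```

With outputEnergies 100 the same estimate is 1% of all steps, which is why a large value matters on the GPU build.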

Luke_SLI · June 12, 2010, 10:05pm

Hi tachyon John,

I output energies and pressures every 10k to 20k steps for my NPT simulations. I don’t really need to keep an eye on them at that point.

Speaking of Tachyon, is there a way to perform the Tachyon ray tracing/rendering for VMD on the GPU instead of my CPU? Some of our renderings take as much as an hour :shrug: (using VMD 1.8.7 with the following command)

-Luke

tachyon_john · June 18, 2010, 3:56am

I haven’t had a chance to start working on adapting Tachyon for CUDA/OptiX, but it’s on my TODO list, believe me…

Cheers,

John Stone
