Startup Script for Runlevel 3 Headless boot problems

Hello All,

I’m having a curious problem on a Linux workstation that boots into runlevel 3 (headless). I’m using one of the RHEL startup scripts posted here in the forums, and it does a wonderful job of making CUDA accessible at boot time without the need to start X. The problem is that we don’t get the speed benefit of pinned memory until we do start X. For example, I get the following output from bandwidthTest after a clean boot (VNC’d into the server):

gpu-server1:~/sdk/bin/linux/release> ./bandwidthTest --memory=pinned
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1644.6

Quick Mode
Device to Host Bandwidth for Pinned memory
testDeviceToHostTransfer, elapsedTimeInMs = 191.796005
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1668.4

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 28712.4

&&&& Test PASSED

Press ENTER to exit…

…And the following results after simply logging in at the console, starting X, and then exiting X:


gpu-server1:~/sdk/bin/linux/release> ./bandwidthTest --memory=pinned
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3180.2

Quick Mode
Device to Host Bandwidth for Pinned memory
testDeviceToHostTransfer, elapsedTimeInMs = 102.746002
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3114.5

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 70827.8

&&&& Test PASSED

Press ENTER to exit…

Clearly, there is something else that needs to be initialized that the startup scripts posted so far have not addressed (no dig at those scripts; they do work great!). My guess is that it involves initializing the DMA engine, since DMA is what actually gives pinned memory its speed advantage.
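For anyone following along: the pinned numbers in bandwidthTest come from copies out of page-locked host memory allocated with cudaMallocHost, which is what makes the buffers directly DMA-able. A minimal sketch of that kind of timed pinned copy is below; the buffer size matches the 32 MB transfer above, but the variable names are mine, not the SDK sample's, and error checking is omitted for brevity:

```cuda
// Sketch: time a pinned host-to-device copy with CUDA events.
// Compile with nvcc; requires a CUDA-capable GPU.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 33554432;  // 32 MB, as in the bandwidthTest output
    float *h_pinned, *d_buf;
    cudaMallocHost((void**)&h_pinned, bytes);  // page-locked: eligible for DMA
    cudaMalloc((void**)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host to Device (pinned): %.1f MB/s\n",
           (bytes / (1024.0 * 1024.0)) / (ms / 1000.0));

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```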

Any suggestions on how to manually initialize the DMA controller without starting X would be appreciated.
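For context, the startup scripts in question are variants of the one in the NVIDIA driver release notes, which only loads the kernel module and creates the /dev/nvidia* device nodes, roughly like this (run as root at boot, e.g. from rc.local; assumes the standard major number 195):

```shell
#!/bin/bash
# Sketch of the usual headless CUDA init script (adapted from the one
# in the NVIDIA driver release notes).
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the NVIDIA controllers found and create a device node for each.
  N3D=$(/sbin/lspci | grep -i NVIDIA | grep -c "3D controller")
  NVGA=$(/sbin/lspci | grep -i NVIDIA | grep -c "VGA compatible controller")
  N=$((N3D + NVGA - 1))
  for i in $(seq 0 $N); do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi
```

So whatever extra initialization X performs is evidently not covered by module loading and device-node creation alone.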

Thanks!

I noticed this behavior as well on my 8800 GTX under RHEL5 with CUDA 1.1 and the 169.09 driver. I recently upgraded the system to the CUDA 2.0 beta driver, 174.55, but I haven’t rebooted since loading the new kernel module, so I can’t yet say whether I still need to start X to get full device-to-device bandwidth. (Presumably the controller is still initialized from when I powered on a week ago and briefly ran X.)

Which NVIDIA driver are you using? I’ll get access to the computer tomorrow and will be able to tell you if 174.55 fixes this.

Hi Seibert,

I’m using the 169.09 driver.

Thanks!

The Device-to-Device bandwidth problem appears to be fixed in the beta 174.55 driver.

I started the system from a cold boot and ran only the CUDA startup script posted in the forum, without starting X. The DtoD bandwidth was the standard ~71,000 MB/s value I see during normal operation.

Did you try the H->D or D->H pinned transfers?

Hi,

Sorry I can’t help with the main issue, but I’d like to point out a tool that has been very useful to me: NoMachine NX. It’s a remote desktop client, only MUCH faster than VNC. It’s smarter about caching and works at near-local speed even on transatlantic connections; I was using it to great effect across continents the other day, on a crappy hotel Internet link. It’s free, too, for up to two server connections per machine.

Download:

http://www.nomachine.com/

BTW, CUDA works correctly through a NoMachine connection.