Why does CUDA cuInit() affect Named Pipe latency under Red Hat Linux?

I have been programming a TESLA S1070 using the CUDA SDK for Red Hat Linux.

CONFIGURATION DETAILS

  1. Operating System: RED HAT ENTERPRISE LINUX SERVER 5.3
  2. CUDA toolkit version: 2.1
  3. SDK version: 2.1
  4. Compiler for CPU host code: GCC 4.1.2
  5. System description:
    • 1 x INTEL XEON QUAD CORE E5405 2.0 GHz
    • 2 GB DDR2 SDRAM 667 MHz / PC2-5300
    • HP PROLIANT DL160 G5
    • CHIPSET: INTEL 5400
    • INTEGRATED VIDEO CARD 32 MB SHARED
  6. TESLA S1070.

The application logic works fine, but there seems to be some weird interaction between the Driver API and the OS. We have written two processes, and the inter-process communication mechanism (both run on the same server) is based on two Named Pipes (FIFOs).

Under normal conditions, a 200-byte message written to a Named Pipe by Process #1 (which has two threads, NEITHER of which uses the Driver API) may take up to 400 microseconds to be read by Process #2 (which has ONE thread that manages the two Named Pipes and interfaces with the Tesla S1070 via the Driver API).
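
For reference, the measurement on the receiving side looks roughly like the sketch below. This is not our exact code: the FIFO path, the message layout and the use of gettimeofday() are illustrative. The idea is that Process #1 stamps the send time into the first bytes of the 200-byte message and Process #2 computes the delta as soon as read() returns.

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define MSG_SIZE 200

    int main(void)
    {
        char buf[MSG_SIZE];
        struct timeval sent, recvd;

        /* hypothetical FIFO name; Process #1 opens the same FIFO O_WRONLY */
        int fd = open("/tmp/fifo_up", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        if (read(fd, buf, MSG_SIZE) != MSG_SIZE) { perror("read"); return 1; }
        gettimeofday(&recvd, NULL);

        /* Process #1 put its own gettimeofday() result at the start of the message */
        memcpy(&sent, buf, sizeof(sent));
        long us = (recvd.tv_sec - sent.tv_sec) * 1000000L
                + (recvd.tv_usec - sent.tv_usec);
        printf("one-way latency: %ld us\n", us);

        close(fd);
        return 0;
    }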

If all the calls to the Driver API except cuInit() are removed, the latency of messages sent up and down the Named Pipes stays exactly the same as before.

Once the cuInit() call is also removed, the latency drops to 5 microseconds!
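
A stripped-down version of Process #2 that shows the effect looks roughly like this (the FIFO path and the read loop are illustrative, not our real code); the only difference between the slow and the fast run is whether the cuInit(0) call is compiled in:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #ifdef USE_CUINIT
    #include <cuda.h>           /* Driver API header from the 2.1 toolkit */
    #endif

    #define MSG_SIZE 200

    int main(void)
    {
        char buf[MSG_SIZE];

    #ifdef USE_CUINIT
        /* The only remaining Driver API call: no context, no memory, no kernels. */
        if (cuInit(0) != CUDA_SUCCESS) {
            fprintf(stderr, "cuInit failed\n");
            return 1;
        }
    #endif

        /* same hypothetical FIFO as above; just service the pipe */
        int fd = open("/tmp/fifo_up", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        while (read(fd, buf, MSG_SIZE) == MSG_SIZE) {
            /* handle the 200-byte message ... */
        }

        close(fd);
        return 0;
    }

Building it with gcc repro.c -o repro gives the fast (~5 microsecond) behavior on our box; building with gcc -DUSE_CUINIT repro.c -lcuda -o repro gives the slow (~400 microsecond) behavior, even though cuInit() is the only CUDA call in the whole process.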

Is there any reason for this behavior? And, more importantly, how can it be avoided?

Any help will be greatly appreciated.