Why does CUDA cuInit() affect named pipe latency under Red Hat Linux?

I have been programming a Tesla S1070 using the CUDA SDK for Red Hat Linux.

The application logic seems to work fine, but there appears to be a strange interaction between the Driver API and the OS. I have written two processes that run on the same server, and the interprocess communication between them is based on two named pipes (FIFOs).

Under normal conditions, a 200-byte message written to a named pipe by process #1 (which does NOT use the Driver API) may take up to 400 microseconds to be read by process #2 (which hosts the Driver API and interacts with the Tesla S1070).

If I remove every call to the Driver API except cuInit(), the latency of messages sent up and down the named pipe stays the same.

Once the cuInit() call is removed as well, latency drops to 5 microseconds!
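For anyone who wants to try reproducing the comparison, the following is a minimal sketch of the measurement, not the actual application code. It assumes a single FIFO, a forked child standing in for process #1 (the writer, no CUDA), and the parent standing in for process #2 (the reader). The names fifo_max_latency_us, now_ns and stop_child, the USE_CUDA compile-time switch, and the 1 ms pause between messages are all hypothetical choices made for illustration. Each message carries the writer's CLOCK_MONOTONIC timestamp, which the reader compares against its own clock on the same machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#ifdef USE_CUDA            /* hypothetical switch: build with -DUSE_CUDA -lcuda */
#include <cuda.h>
#endif

#define MSG_SIZE 200       /* message size quoted in the post */

/* Monotonic clock in nanoseconds; comparable across processes on one host. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Kill and reap the writer, then remove the FIFO (error path only). */
static void stop_child(pid_t pid, const char *path) {
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    unlink(path);
}

/* Sends `count` timestamped messages through a FIFO at `path` and returns
 * the worst observed one-way latency in microseconds, or -1.0 on error. */
double fifo_max_latency_us(const char *path, int count) {
    assert(count > 0);
    unlink(path);                              /* drop any stale FIFO */
    if (mkfifo(path, 0600) != 0) return -1.0;

    pid_t pid = fork();
    if (pid < 0) { unlink(path); return -1.0; }

    if (pid == 0) {                            /* writer: never touches CUDA */
        int wfd = open(path, O_WRONLY);
        if (wfd < 0) _exit(1);
        char out[MSG_SIZE];
        memset(out, 0, sizeof out);
        struct timespec gap = {0, 1000000};    /* assumed 1 ms pacing */
        for (int i = 0; i < count; i++) {
            int64_t t = now_ns();
            memcpy(out, &t, sizeof t);         /* timestamp goes in the payload */
            if (write(wfd, out, sizeof out) != (ssize_t)sizeof out) _exit(1);
            nanosleep(&gap, NULL);
        }
        close(wfd);
        _exit(0);
    }

#ifdef USE_CUDA
    /* reader only: initialise the Driver API after fork(), nothing else */
    if (cuInit(0) != CUDA_SUCCESS) { stop_child(pid, path); return -1.0; }
#endif
    int rfd = open(path, O_RDONLY);
    if (rfd < 0) { stop_child(pid, path); return -1.0; }

    double worst = -1.0;
    int received = 0;
    char in[MSG_SIZE];
    while (received < count) {
        size_t got = 0;
        while (got < sizeof in) {              /* tolerate short reads */
            ssize_t n = read(rfd, in + got, sizeof in - got);
            if (n <= 0) break;                 /* EOF or error */
            got += (size_t)n;
        }
        if (got < sizeof in) break;
        int64_t sent;
        memcpy(&sent, in, sizeof sent);
        double us = (double)(now_ns() - sent) / 1000.0;
        if (us > worst) worst = us;
        received++;
    }
    close(rfd);
    waitpid(pid, NULL, 0);
    unlink(path);
    return received == count ? worst : -1.0;
}
```

Calling fifo_max_latency_us("/tmp/latency_fifo", 1000) from a small main() and printing the result, once built normally and once built with -DUSE_CUDA -lcuda, should show whether cuInit() alone changes the figure. On older glibc versions clock_gettime() may also require linking with -lrt.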

Is there a reason for this behavior? And, more importantly, how can it be avoided?

Any help will be greatly appreciated.