CUDA Issues on Xen dom0

Hi there,
I succeeded in installing the NVIDIA driver (185.18.36) on a Gentoo Linux (2.6.31) dom0 (Xen 3.4.1). I then installed the CUDA toolkit (2.0) and the CUDA SDK (2.02), but I am experiencing problems running CUDA programs. Some SDK applications work, but most of them don’t. For instance, the output of the deviceQuery program is:

There is 1 device supporting CUDA

Device 0: “GeForce GTX 280”
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 1073414144 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes

Test PASSED

Press ENTER to exit…

I also ran bandwidthTest and it worked. The remaining applications, however, do not. As an example, running MonteCarlo I get:


Using device 0: GeForce GTX 280
Generating input data…
Allocating memory…
Generating normally distributed samples…
Running GPU Monte Carlo…
Options : 256
Simulation paths: 262144
Time (ms.) : 11.275000
GPU options per sec.: 22705.100546
GPU Monte Carlo vs. Black-Scholes statistics
L1 norm : 1.000000E+00
Average reserve: 0.000000
TEST FAILED
CPU Monte Carlo vs. Black-Scholes statistics…

L1 norm: 2.970427E-06
Average reserve: 0.000000
CPU vs. GPU Monte Carlo statistics…
L1 norm: 1.000000E+00
Shutting down…

Press ENTER to exit…

Looking at the code, it seems that the data transfer functions (such as cudaMemcpy) don’t work. To check this, I wrote a small program. It performs the following simple steps:

  1. Initializes a vector in CPU memory and copies it to GPU memory (both buffers were allocated beforehand, using malloc and cudaMalloc respectively)
  2. Copies the GPU vector back to another CPU vector (different from the CPU vector of step 1)

The CPU vector from step 2 does not change. No error is reported, but the code does not work. I don’t know which cudaMemcpy call (CPU → GPU or GPU → CPU) fails.
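
A round-trip test of this kind can be sketched roughly as follows. This is a minimal illustration, not the exact program described above: the names CHECK, N, h_src, h_dst and d_vec and the fill patterns are all assumptions made for the example. It checks the return code of every CUDA runtime call and adds a device-to-host-only step, using cudaMemset to put a known pattern on the GPU, to help narrow down which copy direction misbehaves.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s failed at line %d: %s\n", #call,     \
                    __LINE__, cudaGetErrorString(err_));             \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main(void)
{
    const int N = 1024;                    // assumed vector length
    const size_t bytes = N * sizeof(int);

    int *h_src = (int *)malloc(bytes);     // CPU vector of step 1
    int *h_dst = (int *)malloc(bytes);     // separate CPU vector of step 2
    int *d_vec = NULL;                     // GPU vector
    if (h_src == NULL || h_dst == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    CHECK(cudaMalloc((void **)&d_vec, bytes));

    // Test A: GPU -> CPU only. The GPU buffer is filled with 0x5A bytes by
    // cudaMemset and the CPU buffer with 0xFF bytes, so an untouched
    // destination is easy to spot.
    memset(h_dst, 0xFF, bytes);
    CHECK(cudaMemset(d_vec, 0x5A, bytes));
    CHECK(cudaMemcpy(h_dst, d_vec, bytes, cudaMemcpyDeviceToHost));
    int d2h_ok = 1;
    for (int i = 0; i < N; ++i) {
        if (h_dst[i] != 0x5A5A5A5A) { d2h_ok = 0; break; }
    }

    // Test B: full round trip CPU -> GPU -> CPU, as in steps 1 and 2.
    for (int i = 0; i < N; ++i) {
        h_src[i] = i;
    }
    memset(h_dst, 0xFF, bytes);
    CHECK(cudaMemcpy(d_vec, h_src, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(h_dst, d_vec, bytes, cudaMemcpyDeviceToHost));
    int roundtrip_ok = (memcmp(h_src, h_dst, bytes) == 0);

    printf("GPU -> CPU only: %s\n", d2h_ok ? "OK" : "MISMATCH");
    printf("Round trip:      %s\n", roundtrip_ok ? "OK" : "MISMATCH");

    CHECK(cudaFree(d_vec));
    free(h_src);
    free(h_dst);
    return (d2h_ok && roundtrip_ok) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

If test A passes and test B fails, the CPU → GPU copy is the likely suspect. If test A already fails, this sketch cannot tell whether cudaMemset or the GPU → CPU copy is at fault.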

Has anyone experienced something like this? What could be the issue with using CUDA on dom0? Do I perhaps need a particular dom0 configuration?
Finally, is CUDA supported by NVIDIA on Xen kernels? I read some threads about NVIDIA support for Xen, and I succeeded in installing the driver and running an X server under Xen. However, I still have not understood whether CUDA is supported.
Any suggestion is appreciated.

Thank you in advance.
Giovanni