Hi there,
I successfully installed the NVIDIA driver (185.18.36) on a Gentoo Linux (2.6.31) dom0 (Xen 3.4.1). I then installed the CUDA toolkit (2.0) and the CUDA SDK (2.02), but I am having problems running CUDA programs. Some SDK applications work, but most of them don’t. For instance, the output of the deviceQuery program is:
There is 1 device supporting CUDA
Device 0: “GeForce GTX 280”
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 1073414144 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes
Test PASSED
Press ENTER to exit…
I also ran bandwidthTest and it worked. The remaining applications, however, do not. As an example, running MonteCarlo I get:
Using device 0: GeForce GTX 280
Generating input data…
Allocating memory…
Generating normally distributed samples…
Running GPU Monte Carlo…
Options : 256
Simulation paths: 262144
Time (ms.) : 11.275000
GPU options per sec.: 22705.100546
GPU Monte Carlo vs. Black-Scholes statistics
L1 norm : 1.000000E+00
Average reserve: 0.000000
TEST FAILED
CPU Monte Carlo vs. Black-Scholes statistics…
L1 norm: 2.970427E-06
Average reserve: 0.000000
CPU vs. GPU Monte Carlo statistics…
L1 norm: 1.000000E+00
Shutting down…
Press ENTER to exit…
Looking at the code, it seems that the memory-transfer functions (such as cudaMemcpy) don’t work. To check this, I wrote a small program that performs the following steps:
- Initializes a vector in CPU memory and copies it to GPU memory (the two buffers were previously allocated with malloc and cudaMalloc)
- Copies the GPU vector back to a second CPU vector (different from the CPU vector used in step 1)
The second CPU vector is unchanged after step 2. No error was reported, but the data never arrives. I don’t know which cudaMemcpy call (CPU → GPU or GPU → CPU) fails.
Has anyone experienced something like this? What could be the problem with using CUDA on dom0? Do I need a particular dom0 configuration?
Finally, is CUDA supported by NVIDIA on Xen kernels? I have read some threads about NVIDIA support for Xen, and I succeeded in installing the driver and running an X server under Xen. However, I still do not understand whether CUDA is supported.
Any suggestions are appreciated.
Thank you in advance.
Giovanni