CUDA, Total Global Memory: 0.000 Gbytes. Why?

Hi,

I got off to a rough start with CUDA and the PGI Fortran compiler. I fully intend to stick with it, but I can't get past the problem that cufinfo tells me my card has no global memory, i.e.:

Device Number: 0
Device Name: Device Emulation (CPU)
Total Global Memory: 0.000 Gbytes      <---- this line
sharedMemPerBlock: 16384 bytes
regsPerBlock: 8192
warpSize: 1                            <---- is this correct by the way? shouldn't it be 32?
maxThreadsPerBlock: 512
maxThreadsDim: 512 x 512 x 64
maxGridSize: 65535 x 65535 x 1
ClockRate: 1.350 GHz
Total Const Memory: 65536 bytes
Compute Capability Revision: 9999.9999
TextureAlignment: 256 bytes
deviceOverlap: F
multiProcessorCount: 16
integrated: T
canMapHostMemory: T

The above was run on a MacBook Pro equipped with a GeForce 8600M GT. As a result, the matmult example returns an error when allocating device memory for the matrix.
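For what it's worth, the "Device Emulation (CPU)" name and the 9999.9999 compute capability are the sentinel values the CUDA runtime reports when it cannot find a usable GPU. A minimal check along these lines makes that explicit (a sketch, assuming the PGI cudafor module; not a definitive diagnostic):

```fortran
! Sketch: detect the emulation-device fallback via the compute
! capability sentinel (major == 9999) that the runtime reports
! when no usable CUDA GPU is found.
program emucheck
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat

  istat = cudaGetDeviceProperties(prop, 0)
  if (prop%major == 9999) then
     print *, 'Emulation device: the runtime found no usable CUDA GPU'
  else
     print *, 'Real device, compute capability ', prop%major, '.', prop%minor
  end if
end program emucheck
```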

Many thanks for your help!

Sohrab

Following up on my post above, here is what deviceQuery from the CUDA SDK reports:

 CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "GeForce 8600M GT"
  CUDA Driver Version:                           3.0
  CUDA Runtime Version:                          3.0
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 134021120 bytes
  Number of multiprocessors:                     4
  Number of cores:                               32
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    0.94 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       No
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 53331, CUDA Runtime Version = 3.0, NumDevs = 1, Device = GeForce 8600M GT


PASSED

However, even though deviceQuery reports that my GPU does have memory, and I managed to run some toy codes written in C, I still cannot allocate any variables on the device from Fortran.
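For reference, even a minimal device allocation of this form fails (an illustrative sketch of the kind of code involved, not my actual program):

```fortran
! Minimal CUDA Fortran device-allocation test (illustrative sketch,
! assumes the PGI cudafor module).
program allocdev
  use cudafor
  implicit none
  real, device, allocatable :: a_d(:)   ! array resident in GPU global memory
  integer :: istat

  allocate(a_d(1024), stat=istat)       ! this is the step that fails
  if (istat /= 0) then
     print *, 'device allocation failed, stat = ', istat
  else
     print *, 'device allocation succeeded'
     deallocate(a_d)
  end if
end program allocdev
```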

Ideas anyone?

You should post your code so that we can see how you are allocating data in device memory.

Tuan

Hi Sohrab Kehtari,

I was able to recreate the issue here on a MacBook Pro. It appears that the CUDA 2.3 libraries that we ship with the compilers are incompatible with NVIDIA's CUDA 3.0 MacOS driver. To fix this, either rename or remove the "/opt/pgi/osx86/2010/cuda/2.3" directory, then compile with "-ta=nvidia,cuda3.0" when using the PGI Accelerator model or "-Mcuda=cuda3.0" when using CUDA Fortran.

Note that the incompatibility seems to occur only with devices of compute capability 1.1.

Hope this helps,
Mat

Many thanks Mat, this was very helpful and did fix the problem.

Best regards,
Sohrab

Mat, this raises the question: when will PGI begin shipping CUDA 3.0, or 3.1?

Malcolm

Hi Malcolm,

We started shipping CUDA 3.0 with the 10.4 release. Future versions of CUDA will be added after they are officially released by NVIDIA (i.e., not Beta) and once we have validated them.

- Mat