[beginner] cudaGetDeviceCount

Hi Folks,

For a coursework project I decided to work on GPGPU computing with CUDA.

Today I took my very first steps here and ran into a problem.

At least I think it is a problem.

The simple challenge was to write a first program that can read some data from my graphics card, such as name, cores, etc. So far so good. Here’s my code:

[codebox]#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int device = 0;
    int gpuDeviceCount = 0;
    struct cudaDeviceProp properties;

    cudaError_t cudaResultCode = cudaGetDeviceCount(&gpuDeviceCount);
    if (cudaResultCode == cudaSuccess)
    {
        cudaGetDeviceProperties(&properties, device);
        printf("%d GPU CUDA devices(s)(%d)\n", gpuDeviceCount, properties.major);
        printf("\t Product Name: %s\n", properties.name);
        printf("\t TotalGlobalMem: %d MB\n", properties.totalGlobalMem/(1024^2));
        printf("\t GPU Count: %d\n", properties.multiProcessorCount);
        printf("\t Kernels found: %d\n", properties.concurrentKernels);
    }
    return 0; /* success */
}
[/codebox]

I ran this on my workstation (GeForce 8100/nForce 720a, all on board) and it prints:

1 GPU CUDA devices(s)(1)

Product Name: GeForce 8100 / nForce 720a

TotalGlobalMem: 244833 MB

GPU Count: 1

Kernels found: 0

Then I ran it on my HTPC, which is based on an Intel Atom N330 with NVIDIA Ion.

CoreAVC with CUDA support runs very well there, but my code prints:

0 GPU CUDA devices(s)(1638084)

Product Name:

TotalGlobalMem: 0 MB

GPU Count: 2003237514

Kernels found: 4235560

Ok, true, the numbers are totally bulls… here, sure.

My Questions:

  1. Why 0 GPUs in the latter scenario, i.e. why does it not detect anything?

  2. How can I query the number of CUDA cores?

  3. I know my code is fragile, it was only a short test, but apart from that, is there something badly wrong with it?

For my coursework:

  1. Does anyone know a good source for key figures on core counts in graphics cards over the years? I found some GFLOPS figures on Wikipedia.

Cheers,

ruphus

  1. I think that’s because the Nvidia Ion isn’t CUDA compatible.

  2. You already are. That card only has 1 Multiprocessor (MP), it seems. Multiply by 8 to get the number of Streaming Processors (SP).

  3. Looks alright. But perhaps change

if (cudaResultCode == cudaSuccess)

to

if (cudaResultCode == cudaSuccess && gpuDeviceCount > 0)

and then try it on the laptop again.
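One more thing worth checking in the posted code, beyond the guard suggested above: in C, ^ is bitwise XOR, not exponentiation, so 1024^2 evaluates to 1026 rather than 1048576. That would explain the implausible 244833 MB reported on the GeForce 8100; if that is what happened, the real figure is probably around 239 MB. totalGlobalMem is also a size_t, so %d is not the right printf conversion for it. A minimal sketch of the conversion in plain C follows; the helper names bytes_to_mib and print_global_mem are made up for illustration and are not part of CUDA.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert a byte count to whole mebibytes. */
size_t bytes_to_mib(size_t bytes)
{
    /* 1024 * 1024 = 1048576. Writing 1024 ^ 2 would be XOR, i.e. 1026. */
    return bytes / ((size_t)1024 * 1024);
}

/* Hypothetical helper: print a memory size the way the original printf
   intended. %zu is the C99 printf conversion for size_t. */
void print_global_mem(size_t total_global_mem)
{
    printf("\t TotalGlobalMem: %zu MB\n", bytes_to_mib(total_global_mem));
}
```

In the posted program the call would be print_global_mem(properties.totalGlobalMem), placed inside the branch that has already confirmed gpuDeviceCount > 0.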

Hm, according to the NVIDIA pages it supports 16 high-speed CUDA cores (see “First Generation ION”):

http://www.nvidia.com/object/io_72770.html

… Harness the power of 16 high-speed CUDA cores with NVIDIA® PureVideo™ HD technology for high-definition Blu-ray playback. …

Any Ideas?

Cheers

Driver too old for the version of toolkit you are running?

Get the driver version with cudaDriverGetVersion and the runtime version with cudaRuntimeGetVersion. Check that the driver version is not 0 and that the driver version is >= runtime version.

Yeah, that would be my guess too. You need a driver version that is at least as new as the toolkit version used to build the app, and the runtime library must match the toolkit version used in the build.
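The driver/runtime check described in the two replies above can be sketched in plain C. This is only an illustration under stated assumptions: the two integers are assumed to come from cudaDriverGetVersion(&driver) and cudaRuntimeGetVersion(&runtime), which report versions encoded as 1000 * major + 10 * minor, and the helper names driver_supports_runtime and print_cuda_version are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the installed driver is new enough for
   the runtime the app was built against, 0 otherwise. Both arguments use
   the encoding 1000 * major + 10 * minor. */
int driver_supports_runtime(int driver_version, int runtime_version)
{
    if (driver_version == 0)   /* 0 means no CUDA driver was found */
        return 0;
    return driver_version >= runtime_version;
}

/* Hypothetical helper: print an encoded version as major.minor. */
void print_cuda_version(const char *label, int version)
{
    printf("%s: %d.%d\n", label, version / 1000, (version % 1000) / 10);
}
```

If driver_supports_runtime returns 0, a device count of 0 is the expected symptom, and updating the driver (or building against an older toolkit) is the thing to try first.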

And just to confirm, CUDA runs just fine on all versions of Ion, both the MCP79a-based Ion/Ion LE and the newer Ion 2 (which is really a discrete GPU anyway).

Ah, great stuff.
I thought I had updated my HTPC (the one with Ion) to the most recent NVIDIA driver, but I will check again this evening.

Last question: is there a general rule, like zeus13i mentioned, that 1 GPU == 8 SPs (CUDA cores)?
Or how can I calculate the usable cores?

Cheers

For compute capability less than 2.0, the formula is 8 * multiProcessorCount; for compute capability 2.0, the formula is 32 * multiProcessorCount. The compute capability is returned in major and minor from the same cudaGetDeviceProperties() call you are using to get the MP count.
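As a rough sketch of that rule in plain C: the helper names cores_per_mp and total_cuda_cores are made up for illustration, and the table only covers the two cases given in this thread. Other compute capabilities may use different per-MP figures, so the sketch returns 0 for them rather than guessing.

```c
/* Hypothetical helper: cores per multiprocessor (MP), using only the
   figures given in this thread. Returns 0 when no figure is given here. */
int cores_per_mp(int major, int minor)
{
    if (major == 1)                 /* compute capability below 2.0 */
        return 8;
    if (major == 2 && minor == 0)   /* compute capability 2.0 */
        return 32;
    return 0;                       /* not covered by this thread */
}

/* Hypothetical helper: total core count for a device, or 0 if unknown. */
int total_cuda_cores(int major, int minor, int mp_count)
{
    return cores_per_mp(major, minor) * mp_count;
}
```

With the struct from the original program, the call would be total_cuda_cores(properties.major, properties.minor, properties.multiProcessorCount). A 1.x device reporting 2 MPs gives 16, which would be consistent with the 16 cores NVIDIA quotes for Ion, assuming it reports 2 MPs.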

Not 16?

I thought Fermi has 16 scalar processors which run 2 warps in parallel (8 SPs each)?

The dual warps bit is right, but my understanding is that the hardware has 32 cores per MP - right now available in 14 MP (448 core) or 15 MP (480 core) variants. The original architecture white paper described Fermi as having 16 MP (512 cores), but that has apparently proved beyond the design/process capabilities…