CUDA doesn't seem to work at all (repost from /index.php?showtopic=183202, wrong section)

Hi everyone. I'm encountering a very strange problem: my 9800GT doesn't seem to calculate anything at all.
I've tried all the hello-world programs I could find on the internet; here's one of them.

This program creates an array of 100 values on the host (0…99 in the code below), sends it to the device, squares each value, copies the result back to the host, and prints it.

[indent]
#include "stdafx.h"

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

// Kernel that executes on the device: squares each element in place
__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
    float *a_h, *a_d;                  // Pointers to host & device arrays
    const int N = 100;                 // Number of elements in arrays
    size_t size = N * sizeof(float);
    a_h = (float *)malloc(size);       // Allocate array on host
    cudaMalloc((void **) &a_d, size);  // Allocate array on device
    // Initialize host array and copy it to CUDA device
    for (int i = 0; i < N; i++) a_h[i] = (float)i;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    // Do calculation on device:
    int block_size = 4;
    int n_blocks = N / block_size + (N % block_size == 0 ? 0 : 1);
    square_array <<< n_blocks, block_size >>> (a_d, N);
    // Retrieve result from device and store it in host array
    cudaMemcpy(a_h, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);
    // Print results
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);
    // Cleanup
    free(a_h); cudaFree(a_d);
}[/indent]

So the expected output is:
1 1.000
2 4.000
3 9.000
4 16.000

I swear it worked perfectly back in 2009 (Vista 32, deviceemu).

Now I get this output:
1 1.000
2 2.000
3 3.000
4 4.000

So my card doesn't seem to do anything. What could the problem be?
Configuration:
Win7 x64
Visual Studio 2010 32-bit
CUDA Toolkit 3.2 64-bit

Compilation settings: CUDA 3.2 toolkit, 32-bit target platform. With deviceemu or without, it doesn't matter; the results are the same.

I also tried it on my VMware XP (32-bit) with Visual Studio 2008. The result is the same.

Please help me. I barely got the program to compile; now I need it to work.
Sample solution project and .cu file are attached.

Thanks, Ilya.
cuda2.rar (2.79 KB)

I've figured it out: the CUDA driver version and the SDK version were not compatible.
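
For anyone who hits the same symptom, a mismatch like this can be detected from code. Below is a minimal sketch, assuming the problem is a driver that is older than the toolkit's runtime (the exact versions involved aren't stated in this thread). `versions_compatible` is a hypothetical helper name; `cudaDriverGetVersion` and `cudaRuntimeGetVersion` are runtime API calls.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: the installed driver has to support at least the CUDA
// version of the runtime the program was built against.
// A driver version of 0 means no CUDA driver was found.
static bool versions_compatible(int driver_version, int runtime_version)
{
    return driver_version != 0 && driver_version >= runtime_version;
}

int main(void)
{
    int driver_version = 0, runtime_version = 0;

    // Versions are encoded as 1000 * major + 10 * minor (e.g. 3020 for 3.2)
    cudaError_t err = cudaDriverGetVersion(&driver_version);
    if (err != cudaSuccess) {
        printf("cudaDriverGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaRuntimeGetVersion(&runtime_version);
    if (err != cudaSuccess) {
        printf("cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driver_version / 1000, (driver_version % 1000) / 10,
           runtime_version / 1000, (runtime_version % 1000) / 10);

    if (!versions_compatible(driver_version, runtime_version)) {
        printf("Driver is missing or older than the runtime; update the driver.\n");
        return 1;
    }
    return 0;
}
```

If the check fails, installing a driver at least as new as the toolkit is the usual way out.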

The program works OK, so it's probably your compiler settings. Maybe you can add the compiler output.

It should be something like:

[codebox]1>------ Build started: Project: yetanothertest2, Configuration: Debug Win32 ------
1>Compiling with CUDA Build Rule...
1>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -G0 -gencode=arch=compute_13,code="sm_13,compute_13" --machine 32 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /W3 /nologo /Od /Zi /MTd " -I"./" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.0-3.1/C/common/inc" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.0-3.1/shared/inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -maxrregcount=32 --compile -o "Win32\Debug/main.cu.obj" main.cu
1>main.cu
1>tmpxft_00001738_00000000-3_main.cudafe1.gpu
1>tmpxft_00001738_00000000-8_main.cudafe2.gpu
1>main.cu
1>tmpxft_00001738_00000000-3_main.cudafe1.cpp
1>tmpxft_00001738_00000000-14_main.ii
1>Linking...
1>LINK : C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.0-3.1/C/bin/Win32/Debug\yetanothertest2.exe not found or not built by the last incremental link; performing full link
1>Embedding manifest...
1>Build log was saved at "file://c:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.0-3.1\C\src\yetanothertest2\Win32\Debug\BuildLog.htm"
1>yetanothertest2 - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
[/codebox]

deviceemu is no longer supported AFAIK, probably since 3.0.

I can't check your VS2010 .sln and .vcproj; I'm still on VS2008.
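
A clean build log won't show a runtime failure, so it may also help to check what each CUDA call returns. The program as posted ignores every return code; if any call fails, the host array would simply print back unchanged, which matches the output reported above. Here is a sketch of the same program with checks added. The `CHECK` macro and the `blocks_needed` helper are my own names, not part of CUDA.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical macro: print the CUDA error string and exit on failure.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: number of blocks needed to cover n elements.
static int blocks_needed(int n, int block_size)
{
    return (n + block_size - 1) / block_size;
}

__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

int main(void)
{
    const int N = 100;
    size_t size = N * sizeof(float);
    float *a_h = (float *)malloc(size);
    float *a_d = NULL;
    if (a_h == NULL) return EXIT_FAILURE;
    for (int i = 0; i < N; i++) a_h[i] = (float)i;

    CHECK(cudaMalloc((void **)&a_d, size));
    CHECK(cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice));

    int block_size = 4;
    square_array<<<blocks_needed(N, block_size), block_size>>>(a_d, N);
    CHECK(cudaGetLastError());       // errors from the launch itself
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK(cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost));
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);

    CHECK(cudaFree(a_d));
    free(a_h);
    return 0;
}
```

Note that `cudaDeviceSynchronize` only exists from CUDA 4.0 on; with the 3.2 toolkit used in this thread, the equivalent call is `cudaThreadSynchronize`.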
