bugBot
September 4, 2008, 7:46pm
I recently ported some of my code from Linux CUDA 1.1 to Mac CUDA 2.0, and I have a problem when running it.
I do not get any compile-time errors, but I get wrong output at run time because of a problem with cudaMemcpy. To simplify the problem, consider the code snippet below (note that I am not invoking a kernel at all):
float *a, *b;
// Host matrix of (N+1) x (N+1) floats, all set to 1.0
a = (float *) malloc(sizeof(float) * (N+1) * (N+1));
for (int ii = 0; ii <= N; ii++) {
    for (int jj = 0; jj <= N; jj++) {
        a[ii + jj*(N+1)] = 1.0;
    }
}
// Device buffer b receives a copy of a
cudaMalloc((void**)&b, sizeof(float) * (N+1) * (N+1));
cudaMemcpy(b, a, sizeof(float) * (N+1) * (N+1), cudaMemcpyHostToDevice);
float *c;
// Second host matrix, pre-filled with 2.0; copying b back should overwrite it with 1.0
c = (float *) malloc(sizeof(float) * (N+1) * (N+1));
for (int ii = 0; ii <= N; ii++) {
    for (int jj = 0; jj <= N; jj++) {
        c[ii + jj*(N+1)] = 2.0;
    }
}
cudaMemcpy(c, b, sizeof(float) * (N+1) * (N+1), cudaMemcpyDeviceToHost);
int i, j;
printf(" after is...\n ");
for (i = 0; i <= N; i++) {
    for (j = 0; j <= N; j++) {
        printf(" %f ", c[(i*(N+1)) + j]);
    }
    printf("\n");
}
The expected output is a matrix with all entries equal to 1.0, but instead I get the original matrix with all entries equal to 2.0. Can someone help me with this?
Thanks.
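Instead of eyeballing the printed matrix, the copied-back buffer can be checked on the host. Below is a minimal sketch in plain C; `count_mismatches` is a hypothetical helper, not part of the CUDA API. Separately, `cudaMalloc` and `cudaMemcpy` both return a `cudaError_t`, and checking that value (for example printing it with `cudaGetErrorString`) would likely report a failed copy directly instead of leaving `c` silently unchanged.

```c
#include <stddef.h>

// Hypothetical helper (not part of CUDA): counts how many entries of an
// n x n host buffer differ from `expected`. Exact comparison is fine here
// because 1.0 and 2.0 are exactly representable and cudaMemcpy copies bytes.
size_t count_mismatches(const float *buf, int n, float expected)
{
    size_t total = (size_t)n * (size_t)n;
    size_t bad = 0;
    for (size_t k = 0; k < total; k++) {
        if (buf[k] != expected)
            bad++;
    }
    return bad;
}
```

After the device-to-host copy, `count_mismatches(c, N+1, 1.0f)` should return 0. In the failure described above every entry is still 2.0, so with N = 16 it would return 17 * 17 = 289.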
bugBot
September 4, 2008, 7:52pm
By the way, it works fine in emulation mode and I get the correct output (a matrix with all entries equal to 1.0).
Please provide a complete test app which reproduces the problem.
bugBot
September 4, 2008, 8:02pm
// includes, system
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
// includes, project
#include <cutil.h>
////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int
main( int argc, char** argv)
{
    int N = 16;
    float *a, *b;
    // Host source matrix: (N+1) x (N+1) floats, all set to 1.0
    a = (float *) malloc(sizeof(float) * (N+1) * (N+1));
    for (int ii = 0; ii <= N; ii++) {
        for (int jj = 0; jj <= N; jj++) {
            a[ii + jj*(N+1)] = 1.0;
        }
    }
    // Device buffer b receives a copy of a
    cudaMalloc((void**)&b, sizeof(float) * (N+1) * (N+1));
    cudaMemcpy(b, a, sizeof(float) * (N+1) * (N+1), cudaMemcpyHostToDevice);
    float *c;
    // Host destination matrix, pre-filled with 2.0; the copy back should overwrite it with 1.0
    c = (float *) malloc(sizeof(float) * (N+1) * (N+1));
    for (int ii = 0; ii <= N; ii++) {
        for (int jj = 0; jj <= N; jj++) {
            c[ii + jj*(N+1)] = 2.0;
        }
    }
    cudaMemcpy(c, b, sizeof(float) * (N+1) * (N+1), cudaMemcpyDeviceToHost);
#ifdef DEBUG
    int i, j;
    printf(" after is...\n ");
    for (i = 0; i <= N; i++) {
        for (j = 0; j <= N; j++) {
            printf(" %f ", c[(i*(N+1)) + j]);
        }
        printf("\n");
    }
#endif
    // Release host and device memory
    free(a);
    free(c);
    cudaFree(b);
    return 0;
}
This is it. Let me know if you need anything else.
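One detail in the test app that is unrelated to the failure: the fill loops write `a[ii + jj*(N+1)]`, which is column-major if `ii` is read as the row index, while the DEBUG print reads `c[(i*(N+1)) + j]`, which is row-major. With a constant matrix both give the same output, but with non-uniform values the printout would appear transposed. The two index formulas, written as plain C helpers with hypothetical names:

```c
// Hypothetical helpers for an ld x ld matrix stored in one flat array.

// Column-major: consecutive rows of a column are adjacent in memory.
// Matches the fill loops: a[ii + jj*(N+1)] with ii as the row.
int idx_col_major(int row, int col, int ld)
{
    return row + col * ld;
}

// Row-major: consecutive columns of a row are adjacent in memory.
// Matches the print loop: c[(i*(N+1)) + j] with i as the row.
int idx_row_major(int row, int col, int ld)
{
    return row * ld + col;
}
```

With ld = N + 1 = 17, element (1, 0) sits at offset 1 under the fill's layout but at offset 17 under the print's layout. In general `idx_col_major(r, c, ld) == idx_row_major(c, r, ld)`, which is exactly a transpose.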
bugBot
September 4, 2008, 9:17pm
Is anyone able to reproduce this problem?
Thanks.
tmurray
September 4, 2008, 10:22pm
All 1.0s for me here on a MacBook Pro. What hardware are you using?
bugBot
September 5, 2008, 11:04pm
I'm using an NVIDIA GeForce 8800 GS with Mac OS X 10.5.2 and CUDA 2.0.
mfatica
September 5, 2008, 11:39pm
Your code worked OK on my Mac Pro with an 8800 GT.
What is the output of deviceQuery?
bugBot
September 6, 2008, 12:19am
There is no device supporting CUDA.
Device 0: "Device Emulation (CPU)"
Major revision number: 9999
Minor revision number: 9999
Total amount of global memory: 4294967295 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 1
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.35 GHz
Concurrent copy and execution: No
Test PASSED
Press ENTER to exit…
Ouch… Why does this happen? System Profiler shows an 8800 GS.
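The deviceQuery output above shows what went wrong: no CUDA-capable GPU was found, and the only device listed is the emulation device, which reports revision 9999.9999. A program can guard against running on that fallback by inspecting the `major` and `minor` fields of the `cudaDeviceProp` struct filled in by `cudaGetDeviceProperties`. The sketch below keeps only the decision logic in plain C so it does not depend on the CUDA headers; the function names and the array-based interface are assumptions for illustration.

```c
#include <stdbool.h>

// Revision 9999.9999 is what the emulation device reports in the
// deviceQuery output above.
bool is_emulation_device(int major, int minor)
{
    return major == 9999 && minor == 9999;
}

// Hypothetical helper: given the major/minor revisions of n devices
// (in a real program, read via cudaGetDeviceProperties for each index
// below cudaGetDeviceCount), return the index of the first real GPU,
// or -1 if there is none.
int first_real_device(const int *major, const int *minor, int n)
{
    for (int i = 0; i < n; i++) {
        if (!is_emulation_device(major[i], minor[i]))
            return i;
    }
    return -1;
}
```

On the machine above, the only device reported is the emulation one, so this would return -1 and the program could print an error and exit instead of producing a silently wrong result.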
mfatica
September 6, 2008, 12:21am
Try reinstalling the toolkit.
It should ask you to reboot (if it does not, check in the custom settings that the CUDA kernel module is being loaded).
bugBot
September 6, 2008, 12:52am
That worked! Thanks a lot mfatica for your reply. :)