skchoe
1
Hi, I have a problem running some simple code.
The code is really simple: it uses only 1 block and 1 thread, and the kernel should fill an array with the values 1 and 2.
I cannot get the values on the host. I want to check that my memory allocation and argument passing are right.
Any help?
kernel part:
extern "C"
__global__ void cpyTest(int* answer)
{
answer[0] = 1;
answer[1] = 2;
}
Part of host code:
CUdeviceptr d_0;
CU_SAFE_CALL(cuMemAlloc( &d_0, sizeof(int) * 2));
// Calling kernel
int sf = sizeof(int);
CU_SAFE_CALL(cuFuncSetBlockShape(cpyTest, 1, 1, 1));
CU_SAFE_CALL(cuParamSeti(cpyTest, 0, d_0));
CU_SAFE_CALL(cuParamSetSize(cpyTest, sf*2));
CU_SAFE_CALL(cuLaunchGrid(cpyTest, 1, 1));
int *h_0 = (int*)malloc(sf * 2);
CU_SAFE_CALL(cuMemcpyDtoH(h_0, d_0, sf * 2));
printf("answer = %d, %d\n", h_0[0], h_0[1]);
After running, printf wrote:
answer = -1213066928, 134817120
It seems that cuMemcpyDtoH doesn't copy from device to host. Any ideas are appreciated.
S.
skchoe
2
A simpler version using the runtime API also gives a strange result.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda.h>
#include <cutil.h>
#include <math_functions.h>
extern "C"
__global__ void
cpyTest(int* answer)
{
answer[0] = 1;
answer[1] = 2;
}
////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int
main(int argc, char** argv)
{
int sf = sizeof(int);
int* d_0;
CUDA_SAFE_CALL(cudaMalloc( (void**) &d_0, sf * 2));
dim3 threads(1, 1);
dim3 grids(1, 1);
// Calling kernel
cpyTest<<<grids, threads>>>(d_0);
int *h_0 = (int*)malloc(sf);
CU_SAFE_CALL(cudaMemcpy(h_0, d_0, sf, cudaMemcpyDeviceToHost));
printf("answer = %d, %d\n", h_0[0], h_0[1]);
free(h_0);
CUT_EXIT(argc, argv);
}
The answer should be 1, 2, but it prints:
answer = 1, 828337523
Press ENTER to exit…
Thanks,
S.
cudaMemcpy(h_0, d_0, sf, cudaMemcpyDeviceToHost)
should be
cudaMemcpy(h_0, d_0, sf*2, cudaMemcpyDeviceToHost)
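To see why only the first value came back in the runtime API version, here is a minimal host-only sketch in plain C. It is not CUDA code: memcpy stands in for cudaMemcpy, and the names copy_back, fake_device and SENTINEL are made up for illustration. The point is only that the size argument is a byte count, so passing sf copies one int and leaves the second element untouched. Note also that the posted program allocates h_0 with malloc(sf), so reading h_0[1] is out of bounds there; the malloc would need sf * 2 as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 2            /* number of ints the kernel writes */
#define SENTINEL (-1)  /* marks host elements the copy never touched */

/* Stand-in for the device buffer after the kernel has run (assumption:
   the kernel itself works and has written 1 and 2). */
static const int fake_device[N] = {1, 2};

/* Hypothetical helper: pre-fills host[0..N-1] with SENTINEL, then copies
   `bytes` bytes from the stand-in buffer. `bytes` plays the role of the
   size argument of cudaMemcpy. `host` must have room for N ints. */
void copy_back(int *host, size_t bytes)
{
    /* Guard against copying more than the buffers hold. */
    assert(bytes <= N * sizeof(int));

    for (size_t i = 0; i < N; ++i)
        host[i] = SENTINEL;

    memcpy(host, fake_device, bytes);
}
```

With a two-int host buffer, copy_back(h, sizeof(int)) leaves h[1] at the sentinel, while copy_back(h, N * sizeof(int)) gives 1 and 2. In the real program there is no sentinel, so the untouched element just holds whatever happened to be in memory, which matches the 828337523 in the output above.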