trying to copy array to Device -> Device -> Host

Hi All

I’m studying this beginner's tutorial and tried one of the first examples:

#include <assert.h>
#include <stdio.h>

int main(void)
{
  float *a_h, *b_h; // host data
  float *a_d, *b_d; // device data
  int N = 14, nBytes, i;
  nBytes = N*sizeof(float);

  a_h = (float *)malloc(nBytes);
  b_h = (float *)malloc(nBytes);

  cudaMalloc((void **) &a_d, nBytes);
  cudaMalloc((void **) &b_d, nBytes);

  for (i=0; i<N; i++) a_h[i] = 100.f + i;

  cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);
  cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);
  cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);

  for (i=0; i< N; i++) {
    printf("%f == %f\n", a_h[i], b_h[i]);
    //assert( a_h[i] == b_h[i] );
  }

  free(a_h); free(b_h); cudaFree(a_d); cudaFree(b_d);

  return 0;
}

On my MacBook Pro I can compile and run it; however, it prints:

100.000000 == 0.000000
101.000000 == 0.000000
102.000000 == 0.000000
103.000000 == 0.000000
104.000000 == 0.000000
105.000000 == 0.000000
106.000000 == 0.000000
107.000000 == 0.000000
108.000000 == 0.000000
109.000000 == 0.000000
110.000000 == 0.000000
111.000000 == 0.000000
112.000000 == 0.000000
113.000000 == 0.000000

Any suggestions as to what might be wrong with this code?

Thnx

LuCa
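The posted program ignores the return value of every CUDA runtime call, so a failing cudaMalloc or cudaMemcpy would go unnoticed and b_h would simply keep whatever it held before. A minimal sketch of the same copy chain with error checking follows. The CHECK macro is a hypothetical helper, not part of the tutorial; cudaGetErrorString and the cudaError_t return values are part of the CUDA runtime API. Whether any call actually reports an error depends on the installation, so this narrows the problem down rather than guaranteeing a diagnosis.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption, not from the tutorial): stop with a
   readable message if a CUDA runtime call does not return cudaSuccess. */
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                    __LINE__, #call, cudaGetErrorString(err_));      \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main(void)
{
    const int N = 14;
    const size_t nBytes = N * sizeof(float);
    float *a_h = (float *)malloc(nBytes);   /* host source      */
    float *b_h = (float *)malloc(nBytes);   /* host destination */
    float *a_d = NULL, *b_d = NULL;         /* device buffers   */
    int i;

    if (a_h == NULL || b_h == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }

    for (i = 0; i < N; i++) a_h[i] = 100.f + i;

    CHECK(cudaMalloc((void **)&a_d, nBytes));
    CHECK(cudaMalloc((void **)&b_d, nBytes));

    /* Host -> Device -> Device -> Host, each step checked. */
    CHECK(cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice));
    CHECK(cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost));

    for (i = 0; i < N; i++)
        printf("%f == %f\n", a_h[i], b_h[i]);

    CHECK(cudaFree(a_d));
    CHECK(cudaFree(b_d));
    free(a_h);
    free(b_h);
    return 0;
}
```

Assuming the file is saved as checked_copy.cu (a made-up name), it can be built with nvcc -o checked_copy checked_copy.cu. If the right-hand column still prints zeros and no error is reported, the calls themselves claim success and the device or driver setup is the next thing to look at.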

It works OK on my MacBook Pro. Are you sure that CUDA is installed properly?

What is the output of the deviceQuery SDK example?

#include <cuda_runtime.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  float *a_h, *b_h; // host data
  float *a_d, *b_d; // device data
  int N = 14, nBytes, i;
  nBytes = N*sizeof(float);

  a_h = (float *)malloc(nBytes);
  b_h = (float *)malloc(nBytes);

  cudaMalloc((void **) &a_d, nBytes);
  cudaMalloc((void **) &b_d, nBytes);

  for (i=0; i<N; i++) a_h[i] = 100.f + i;

  cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);
  cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);
  cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);

  for (i=0; i< N; i++) {
    printf("%f == %f\n", a_h[i], b_h[i]);
    //assert( a_h[i] == b_h[i] );
  }

  free(a_h); free(b_h); cudaFree(a_d); cudaFree(b_d);

  return 0;
}

nvcc -o sample_copy sample_copy.c
./sample_copy

100.000000 == 100.000000
101.000000 == 101.000000
102.000000 == 102.000000
103.000000 == 103.000000
104.000000 == 104.000000
105.000000 == 105.000000
106.000000 == 106.000000
107.000000 == 107.000000
108.000000 == 108.000000
109.000000 == 109.000000
110.000000 == 110.000000
111.000000 == 111.000000
112.000000 == 112.000000
113.000000 == 113.000000

CUDA Device #0
Major revision number: 9999
Minor revision number: 9999
Name: Device Emulation (CPU)
Total global memory: 4294967295
Total shared memory per block: 16384
Total registers per block: 8192
Warp size: 1
Maximum memory pitch: 262144
Maximum threads per block: 512
Maximum dimension 0 of block: 512
Maximum dimension 1 of block: 512
Maximum dimension 2 of block: 64
Maximum dimension 0 of grid: 65535
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 1
Clock rate: 1350000
Total constant memory: 65536
Texture alignment: 256
Concurrent copy and execution: No
Number of multiprocessors: 16
Kernel execution timeout: No

CUDA is not installed properly on your system:
Name: Device Emulation (CPU)

Reinstall the toolkit and make sure that the CUDA Kext is selected during installation (under the Customize option).
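For a quick check without the full deviceQuery sample, the same information can be read with cudaGetDeviceCount and cudaGetDeviceProperties. The sketch below is only an illustration: it assumes that the toolkit version used in this thread reports the emulation placeholder with revision 9999.9999, as in the deviceQuery output above, and other toolkit versions may behave differently (for example by reporting zero devices or returning an error).

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0, dev;
    cudaError_t err = cudaGetDeviceCount(&count);

    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        printf("No CUDA devices reported.\n");
        return 1;
    }

    for (dev = 0; dev < count; dev++) {
        struct cudaDeviceProp prop;

        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                    dev, cudaGetErrorString(err));
            return 1;
        }

        printf("Device #%d: %s (revision %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);

        /* Assumption taken from the output in this thread: 9999.9999
           marks the CPU emulation placeholder, not a real GPU. */
        if (prop.major == 9999 && prop.minor == 9999)
            printf("  -> emulation device only; the driver/kext is "
                   "probably not loaded\n");
    }
    return 0;
}
```

On a working installation the name printed should be the GPU itself (GeForce 8600M GT on the machine discussed here) rather than "Device Emulation (CPU)".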

Too bad :(

If I download the toolkit via http://www.nvidia.com/object/cuda_get.html, there are no questions asked during installation. Is there an alternative installation procedure?

Should the test code not also work in emulation mode?

BTW, is my graphics card good enough?

GeForce 8600M GT:

Chipset Model: GeForce 8600M GT
Type: Display
Bus: PCIe
PCIe Lane Width: x16
VRAM (Total): 512 MB
Vendor: NVIDIA (0x10de)
Device ID: 0x0407
Revision ID: 0x00a1
ROM Revision: 3212
Displays:
Color LCD:
Resolution: 1680 x 1050
Depth: 32-bit Color
Core Image: Hardware Accelerated
Main Display: Yes
Mirror: Off
Online: Yes
Quartz Extreme: Supported
Built-In: Yes
Display Connector:
Status: No display connected

thnx a lot!!
LuCa

I did the installation again and finally noticed the ‘customization’ button :) and enabled Kext!

Now it works!!!

Thnx a lot

LuCa

When you install, there is a Customize button in the lower left corner of the installation type screen.
Click it and, in the next screen ("Custom installation"), make sure that the CUDAKext option is checked.