Below is a simple test program.
It gives the expected output.
However, the addresses printed for Ka/Kb, which are passed into the kernel, are not the same as the addresses of a/b inside the kernel.
I find this confusing. I first thought this was the reason my real program segfaults, but then I noticed the same thing happens here.
Could anyone please elaborate on what is happening?
Compiled and run with: nvcc -g -G Kernel.cu -o Kernel.bin && cuda-gdb ./Kernel.bin
Commands to cuda-gdb:
break TestKernel
run
print a
print b
Relevant part of my output from cuda-gdb:
(cuda-gdb) break TestKernel
(cuda-gdb) run
Ka: 0x1000500 Kb: 0x1000600
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
Breakpoint 1, TestKernel () at Kernel.cu:5
5 if(threadIdx.x > num) return;
Current language: auto; currently c++
(cuda-gdb) print a
$1 = (float * const @global) 0x14e94e0
(cuda-gdb) print b
$2 = (float * const @global) 0x14ea2f0
[codebox]#include <iostream>
using namespace std;

__global__ void TestKernel(float *a, float *b, const int num) {
    if(threadIdx.x > num) return;
    a[threadIdx.x] = b[threadIdx.x] * b[threadIdx.x];
}

int main() {
    const int size = 10;
    float a[size];
    float b[size];
    float *Ka = NULL;
    float *Kb = NULL;

    // Fill the host arrays
    for(int i = 0; i < size; i++) {
        a[i] = 0;
        b[i] = i * 2;
    }

    // Allocate device memory and print the device pointers as seen by the host
    cudaMalloc((void**)&Ka, size * sizeof(float));
    cudaMalloc((void**)&Kb, size * sizeof(float));
    cout << "Ka: " << Ka << "\t" << "Kb: " << Kb << endl;

    // Copy the input to the device, run the kernel, copy the result back
    cudaMemcpy(Kb, b, sizeof(float) * size, cudaMemcpyHostToDevice);
    TestKernel<<<1, size>>>(Ka, Kb, size);
    cudaMemcpy(a, Ka, sizeof(float) * size, cudaMemcpyDeviceToHost);

    for(int i = 0; i < size; i++) {
        cout << a[i] << "\t\t" << b[i] << endl;
    }
}[/codebox]