Why aren't the addresses in the kernel the same as those I pass in?

Below is a simple test program.

It gives the expected output.

However, the addresses printed for Ka and Kb, which are passed into the kernel, are not the same as the values of a and b inside the kernel.

I find this confusing. At first I thought this was the reason my real program segfaults, but I noticed the same thing happens in this program, which works.

Could anyone please elaborate on what is happening?

Compiled and run with:

```
nvcc -g -G Kernel.cu -o Kernel.bin && cuda-gdb ./Kernel.bin
```

Commands entered in cuda-gdb:

```
break TestKernel
run
print a
print b
```

Relevant part of my output from cuda-gdb:

```
(cuda-gdb) break TestKernel
(cuda-gdb) run
Ka: 0x1000500 Kb: 0x1000600
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
Breakpoint 1, TestKernel () at Kernel.cu:5
5 if(threadIdx.x > num) return;
Current language: auto; currently c++
(cuda-gdb) print a
$1 = (float * const @global) 0x14e94e0
(cuda-gdb) print b
$2 = (float * const @global) 0x14ea2f0
```


```cuda
#include <iostream>
using namespace std;

__global__ void TestKernel(float *a, float *b, const int num) {
    if(threadIdx.x > num) return;  // harmless with <<<1, size>>>, but >= is the correct bound check
    a[threadIdx.x] = b[threadIdx.x] * b[threadIdx.x];
}

int main() {
    const int size = 10;
    float a[size];
    float b[size];
    float *Ka = NULL;
    float *Kb = NULL;

    // Host input: a is zeroed, b holds 0, 2, 4, ...
    for(int i = 0; i < size; i++) {
        a[i] = 0;
        b[i] = i * 2;
    }

    // Device buffers; these are the addresses printed as Ka/Kb
    cudaMalloc((void**)&Ka, size * sizeof(float));
    cudaMalloc((void**)&Kb, size * sizeof(float));
    cout << "Ka: " << Ka << "\t" << "Kb: " << Kb << endl;

    cudaMemcpy(Kb, b, sizeof(float) * size, cudaMemcpyHostToDevice);
    TestKernel<<<1, size>>>(Ka, Kb, size);
    cudaMemcpy(a, Ka, sizeof(float) * size, cudaMemcpyDeviceToHost);

    for(int i = 0; i < size; i++) {
        cout << a[i] << "\t\t" << b[i] << endl;
    }

    cudaFree(Ka);
    cudaFree(Kb);
    return 0;
}
```
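A debugger-independent way to compare the two sets of addresses is to let the kernel report the pointer values it actually received. The following is a minimal sketch under stated assumptions, not part of the original program: `RecordPointers` and `kernelReceivesSamePointers` are hypothetical names, a 64-bit build is assumed so a pointer fits in `unsigned long long`, and device-side `printf` requires compute capability 2.0 or newer. It also checks `cudaGetLastError()` and `cudaDeviceSynchronize()`, since the program above ignores every CUDA return code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical diagnostic kernel: thread 0 stores the raw pointer values it
// received into `seen` and prints them with device-side printf.
__global__ void RecordPointers(float *a, float *b, unsigned long long *seen) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        seen[0] = reinterpret_cast<unsigned long long>(a);  // assumes 64-bit pointers
        seen[1] = reinterpret_cast<unsigned long long>(b);
        printf("in kernel: a=%p b=%p\n", (void *)a, (void *)b);
    }
}

// Hypothetical helper: returns true if the kernel saw exactly the pointers
// that cudaMalloc returned on the host, false otherwise or on any CUDA error.
bool kernelReceivesSamePointers() {
    const int size = 10;
    float *Ka = NULL;
    float *Kb = NULL;
    unsigned long long *dSeen = NULL;
    unsigned long long hSeen[2] = {0, 0};
    bool same = false;

    cudaError_t err = cudaMalloc((void **)&Ka, size * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&Kb, size * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&dSeen, sizeof(hSeen));

    if (err == cudaSuccess) {
        printf("on host:   a=%p b=%p\n", (void *)Ka, (void *)Kb);
        RecordPointers<<<1, 1>>>(Ka, Kb, dSeen);
        err = cudaGetLastError();                               // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors; also flushes device printf
        if (err == cudaSuccess)
            err = cudaMemcpy(hSeen, dSeen, sizeof(hSeen), cudaMemcpyDeviceToHost);
    }

    if (err == cudaSuccess) {
        same = hSeen[0] == reinterpret_cast<unsigned long long>(Ka) &&
               hSeen[1] == reinterpret_cast<unsigned long long>(Kb);
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    // cudaFree on a null pointer is a no-op, so this is safe after a failed allocation.
    cudaFree(Ka);
    cudaFree(Kb);
    cudaFree(dSeen);
    return same;
}
```

Call it from `main` on a machine with a CUDA device. If it returns `true`, the kernel is receiving the same pointers that were passed in, and the difference seen at the breakpoint lies in what the debugger reports rather than in the program.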