Passing a pointer to constant memory

Hello all,

I posted a similar question on this forum earlier but mistakenly put it in the wrong group, so I apologize to anyone who has already read it.

I’m trying to write a CUDA kernel that can read from different arrays in constant memory. The array to operate on is specified by the host code. The purpose is to implement a filter that can use multiple window functions. In my application, I will have to switch between 4 different windows very often, so I’m trying to avoid passing the windows across the PCI bus every time I run the filter.

My approach is to pass a pointer to the kernel for the particular array that is required. I have written test code below that doesn’t print the expected values.

[codebox]#include <stdio.h>
#include <cutil.h>

//Allocate constant memory, constantA - constantD
//shortened for readability
#define SIZE 10
__device__ __constant__ float constantA[SIZE];

//Return pointers to the constant memory
float* getA(){
  return constantA;
}

//kernel copies data from constant memory to global memory
__global__ void getConstant(float* d_ptr, const float* constant){
  d_ptr[threadIdx.x] = constant[threadIdx.x];
}

int main(){
  //Allocate memory
  float* h_data = new float[SIZE];
  float* h_constantA = new float[SIZE];
  float* d_data;
  CUDA_SAFE_CALL( cudaMalloc( (void **)&d_data, SIZE*sizeof(float)) );

  //Initialize memory
  for(int i = 0; i < SIZE; i++)
    h_constantA[i] = i;

  //Copy memory to constant space on the GPU
  CUDA_SAFE_CALL( cudaMemcpyToSymbol(reinterpret_cast<const char*> (getA()), h_constantA, SIZE*sizeof(float)) );

  dim3 block(1);
  dim3 thread(SIZE);
  getConstant<<<block, thread>>>(d_data, getA());

  CUDA_SAFE_CALL( cudaMemcpy(h_data, d_data, SIZE*sizeof(float), cudaMemcpyDeviceToHost) );

  for (int i = 0; i < SIZE; i++)
    printf("%f ",h_data[i]);
}
[/codebox]

If I change the kernel to the code below, it works fine, but it doesn’t allow me to select which window to use.

[codebox]__global__ void getConstant(float* d_ptr, const float* constant){
  d_ptr[threadIdx.x] = constantA[threadIdx.x];
}
[/codebox]

Does anyone have any ideas on how to make this work?



I would use a template with the window as a template parameter. It gets compiled into 4 separate kernels, but you keep a single piece of source code.
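
Roughly what I have in mind is below. Treat it as a sketch, not tested against your setup: the names constantB to constantD, the integer template parameter W, and the helpers uploadWindows() and runWindow() are all made up for illustration, and it uses the plain runtime API instead of cutil. The window is picked at compile time inside the kernel, so no constant-memory pointer has to be passed in from the host.

```cuda
#include <cuda_runtime.h>

#define SIZE 10

// Four windows in constant memory. Only constantA appears in the original
// post; constantB..constantD are assumed names for the other three.
__constant__ float constantA[SIZE];
__constant__ float constantB[SIZE];
__constant__ float constantC[SIZE];
__constant__ float constantD[SIZE];

// W is a compile-time constant, so each instantiation refers to exactly
// one constant array and the selection costs nothing at run time.
template <int W>
__global__ void getConstant(float* d_ptr) {
    const float* window = (W == 0) ? constantA
                        : (W == 1) ? constantB
                        : (W == 2) ? constantC
                                   : constantD;
    d_ptr[threadIdx.x] = window[threadIdx.x];
}

// Upload all four windows once. The values (100*w + i) are dummy test data.
cudaError_t uploadWindows() {
    float h[4][SIZE];
    for (int w = 0; w < 4; ++w)
        for (int i = 0; i < SIZE; ++i)
            h[w][i] = 100.0f * w + i;

    cudaError_t err = cudaMemcpyToSymbol(constantA, h[0], sizeof(h[0]));
    if (err == cudaSuccess) err = cudaMemcpyToSymbol(constantB, h[1], sizeof(h[1]));
    if (err == cudaSuccess) err = cudaMemcpyToSymbol(constantC, h[2], sizeof(h[2]));
    if (err == cudaSuccess) err = cudaMemcpyToSymbol(constantD, h[3], sizeof(h[3]));
    return err;
}

// Map the run-time choice onto one of the four instantiations, then copy
// the SIZE results back into h_out. Any value outside 0..2 selects window D.
cudaError_t runWindow(int which, float* h_out) {
    float* d_data = 0;
    cudaError_t err = cudaMalloc((void**)&d_data, SIZE * sizeof(float));
    if (err != cudaSuccess) return err;

    switch (which) {
        case 0:  getConstant<0><<<1, SIZE>>>(d_data); break;
        case 1:  getConstant<1><<<1, SIZE>>>(d_data); break;
        case 2:  getConstant<2><<<1, SIZE>>>(d_data); break;
        default: getConstant<3><<<1, SIZE>>>(d_data); break;
    }

    err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_data, SIZE * sizeof(float),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err;
}
```

You would call uploadWindows() once at startup and then runWindow(which, h_out) each time the filter runs. Only the switch on the host depends on the run-time choice, and the windows stay resident in constant memory between launches.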