Memory Issues: Having Trouble with Card Memory

So I’m relatively new at programming CUDA but am having some trouble with card memory. Basically, what I want to do is allocate memory on the card and then use this variable in a function. It all works perfectly in emulation mode, but the variable isn’t what I set it to when I run it on the card. Here are some code snippets:

__global__ void indfitness(float *fitness)
{
int i = threadIdx.x;

fitness[i]= 5;
}

void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, mxArray *prhs[])
{
float *fitness;
float *fitnessout_float;

//Allocate float array in device memory for use on GPU
cudaMalloc((void**)&fitness, sizeof(float)*popSize);

// Kernel invocation
dim3 dimBlock(popSize);
indfitness<<<1, dimBlock>>>(fitness);

//Copy back to host
cudaMemcpy((void*)fitnessout_float, (void*)fitness, sizeof(float)*popSize, cudaMemcpyDeviceToHost);
}

Is there anything wrong with this?

Thanks!

Note: when I compile for the device, I get the warnings “Advisory: Cannot tell what pointer points to, assuming global memory space”, but I don’t understand why it’s giving these to me.
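(For anyone hitting the same advisory: on the sm_1x toolchains of that era, the compiler has to decide at compile time whether each pointer inside device code refers to global or shared memory, and it emits that advisory when it can’t. A minimal sketch of the kind of code that tends to trigger it; `scale` and `buf` are made-up names for illustration, not anything from the thread:)

```cuda
// Hypothetical example: a __device__ helper that takes a bare pointer.
// The compiler can't tell which memory space 'p' lives in, so it warns
// and assumes global memory.
__device__ void scale(float *p, float s)
{
	*p *= s;   // triggers "Cannot tell what pointer points to" on sm_1x
}

__global__ void kern(float *out)
{
	__shared__ float buf[32];
	buf[threadIdx.x] = out[threadIdx.x];
	scale(&buf[threadIdx.x], 2.0f);   // called with a shared-memory pointer
	scale(&out[threadIdx.x], 2.0f);   // called with a global-memory pointer
	out[threadIdx.x] = buf[threadIdx.x];
}
```

The advisory is usually harmless for pointers that really do point to global memory, but it can produce wrong results if the pointer actually points to shared memory.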

Thanks!

Bump! I’ve resorted to developing in emulation mode until I figure out why this is happening, because there are no problems if I don’t run on the card…

It seems to be correct, assuming that

  • popSize is an int and has been initialized to some positive value

  • memory for fitnessout_float has been alloc’d

The following code works for me when run on the GPU. It is a minor variation of yours.

Hope it helps,

[codebox]#include <iostream>
using namespace std;

__global__ void indfitness(float *fitness)
{
	int i = threadIdx.x;

	fitness[i] = 5;
}

int main() {
	float *fitness;
	float *fitnessout_float;
	int popSize = 10;

	//Allocate float array in device memory for use on GPU
	cudaMalloc((void**)&fitness, sizeof(float)*popSize);

	// Kernel invocation
	dim3 dimBlock(popSize);
	indfitness<<<1, dimBlock>>>(fitness);

	fitnessout_float = (float *)malloc(sizeof(float)*popSize);

	//Copy back to host
	cudaMemcpy((void*)fitnessout_float, (void*)fitness, sizeof(float)*popSize, cudaMemcpyDeviceToHost);

	for(int j = 0; j < popSize; j++)
		cout << fitnessout_float[j] << " ";
}
[/codebox]

Thanks for the reply, but I’m still having trouble. Just as a small test, I wrote a small program (for use in MATLAB) that I’ve attached here. It’s not working correctly at all; I don’t think it’s even running the kernel, because if I run in debug mode and put a print inside the kernel, it doesn’t print, even though it does print if I put a print before or after the kernel. Why would it not even be running the kernel now?

Note that I’m also running on 64-bit Windows, but I’ve replaced nvmexouts.bat, etc. with compatible versions for 64-bit, so it compiles and runs but just isn’t working correctly… Let me know if this does work for someone with 32-bit.
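(A note for readers: when a kernel silently does nothing, the launch itself may be failing, e.g. because of an unsupported block size or a binary built for the wrong architecture. One way to see this is to check CUDA’s error state right after the invocation. A sketch against the code below, using the pre-4.0 runtime calls `cudaGetLastError`, `cudaGetErrorString`, and `cudaThreadSynchronize`; the placement after the `indfitness<<<...>>>` line is an assumption about where you’d drop it in:)

```cuda
// Sketch: report launch and execution errors right after invoking the kernel.
indfitness<<<1, dimBlock>>>(fitness);

cudaError_t err = cudaGetLastError();          // error from the launch itself
if (err != cudaSuccess)
	mexPrintf("kernel launch failed: %s\n", cudaGetErrorString(err));

err = cudaThreadSynchronize();                 // wait for the kernel to finish
if (err != cudaSuccess)
	mexPrintf("kernel execution failed: %s\n", cudaGetErrorString(err));
```

If the launch is failing, the first message will tell you why instead of leaving the output buffer untouched.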

Thanks,

#include "mex.h"
#include "cutil.h"

void matrix_double2float( double *input_double, float *output_float, int N)
{
	int i;
	for (i = 0; i < N; i++) {
		output_float[i] = (float) input_double[i];
	}
}

void matrix_float2double( float *input_float, double *output_double, int N)
{
	int i;
	for (i = 0; i < N; i++) {
		output_double[i] = (double) input_float[i];
	}
}

__global__ void indfitness(float *fitness)
{
	int ind = threadIdx.x;

	fitness[ind] = (float)ind;
}

void mexFunction( int nlhs, mxArray *plhs[],
				  int nrhs, mxArray *prhs[])
{
	float *fitness;
	float *fitnessout_float;
	double *fitnessout;
	int size = 10;
	int dims[] = {size};

	plhs[0] = mxCreateNumericArray(1, dims, mxDOUBLE_CLASS, mxREAL);

	//Get a pointer to the data space in our newly allocated memory
	fitnessout = mxGetPr(plhs[0]);

	//Allocate float array in device memory for use on GPU
	cudaMalloc((void**)&fitness, sizeof(float)*size);
	//CUDA_SAFE_CALL(cudaMemset(fitness, 0.0, sizeof(float)*size));

	fitnessout_float = (float*)mxMalloc(sizeof(float)*size);

	// Kernel invocation
	dim3 dimBlock(size);
	indfitness<<<1, dimBlock>>>(fitness);

	//copy back
	cudaMemcpy((void*)fitnessout_float, (void*)fitness, sizeof(float)*size, cudaMemcpyDeviceToHost);

	mexPrintf("[%f]\n", fitnessout_float[1]);

	//copy into return array
	matrix_float2double(fitnessout_float, fitnessout, size);
}

“printf” would NOT even compile inside a kernel unless you are in emulation mode.

i.e.

If you are compiling for the device, you cannot have "printf"s inside your kernel. It won't even compile.
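(For reference, the usual workaround from that era was to guard kernel-side printfs with the `__DEVICE_EMULATION__` macro, which nvcc defines when building with `-deviceemu`, so that one source file compiles for both targets. A sketch, reusing the `indfitness` kernel from above:)

```cuda
__global__ void indfitness(float *fitness)
{
	int ind = threadIdx.x;
	fitness[ind] = (float)ind;
#ifdef __DEVICE_EMULATION__
	// Only compiled in emulation mode (nvcc -deviceemu), where printf is legal
	// inside kernels; stripped out entirely when compiling for the device.
	printf("thread %d wrote %f\n", ind, fitness[ind]);
#endif
}
```

That way the debug prints work in emulation without breaking the device build.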