cudaMalloced memory cannot be used in other functions: memory management

muzhikas · May 24, 2010, 3:39pm

From my main.cpp I call initializeVision() in another C++ file, which in turn calls initializeStereoCuda().

initializeStereoCuda() allocates all the variables needed for later computation. Here is the code:

[codebox]__device__ float3 *LuvImageLeft2;
__device__ float3 *LuvImageRight2;
__device__ uchar3 *LeftImage2;
__device__ uchar3 *RightImage2;

// The image width and height.
int g_w;
int g_h;

size_t Luv_pitch;
SOM_SA *SOM_MAP;
StereoMapper *mapper;
Rectifier *rectifier;
//LineDetector *lineDetector;
size_t RGB_pitch;
int SOM_TRAINED;[/codebox]

[codebox]void initializeStereoCuda(unsigned int w, unsigned int h)
{
    g_w = w;
    g_h = h;

    CUDA_SAFE_CALL(cudaMallocPitch((void**)&g_disparityLeft,&g_floatDispPitch,w*sizeof(float),h));
    CUDA_SAFE_CALL(cudaMallocPitch((void**)&g_disparityLeft2,&g_floatDispPitch,w*sizeof(float),h));
    CUDA_SAFE_CALL(cudaMallocPitch((void**)&g_minSSD, &g_floatDispPitch,w*sizeof(int),h));
    g_floatDispPitch /= sizeof(float);

    CUDA_SAFE_CALL(cudaMallocPitch((void**)&LuvImageLeft2,&Luv_pitch,g_w*sizeof(float3),g_h));
    CUDA_SAFE_CALL(cudaMallocPitch((void**)&LuvImageRight2,&Luv_pitch,g_w*sizeof(float3),g_h));
    CUDA_SAFE_CALL(cudaMallocPitch((void**)&LeftImage2,&RGB_pitch,g_w*sizeof(uchar3),g_h));
    CUDA_SAFE_CALL(cudaMallocPitch((void**)&RightImage2,&RGB_pitch,g_w*sizeof(uchar3),g_h));

    Luv_pitch = Luv_pitch/sizeof(float3);
    RGB_pitch = RGB_pitch/sizeof(uchar3);

    SOM_MAP = new SOM_SA(w,h);
    rectifier = new Rectifier(w,h);
    mapper = new StereoMapper(w,h);
    initializeLineDetector(w,h);
    SOM_TRAINED=0;

    cudaChannelFormatDesc U8Tex = cudaCreateChannelDesc<unsigned char>();
    cudaMallocArray(&g_leftTex_array, &U8Tex, g_w, g_h);
    cudaMallocArray(&g_rightTex_array, &U8Tex, g_w, g_h);

    print_GPU_mem();
}[/codebox]

initializeVision() then starts a thread that keeps calling stereoProcess(), which lives in cuda.cpp together with initializeStereoCuda().

stereoProcess():

[codebox]dim3 grid(1,1,1);
dim3 threads(16,16,1);
grid.x = divUp(g_w,threads.x);
grid.y = divUp(g_h,threads.y);

cudaError ret;

//if i don’t reallocate LuvImageLeft2 and LeftImage2 then cudaMemcpy will throw unspecified launch failure
// ret = cudaMallocPitch((void**)&LuvImageLeft2,&RGB_pitch,g_w*sizeof(float3),g_h);
// printf("Error malloc luv: %d\n", ret);
// ret = cudaMallocPitch((void**)&LeftImage2,&RGB_pitch,g_w*sizeof(uchar3),g_h);
// printf("Error malloc leftimage: %d\n", ret);

printf("image: %d\n", p_hostLeft[0]);

//RGB_pitch = RGB_pitch/sizeof(uchar3);
//Luv_pitch = Luv_pitch/sizeof(float3);

ret = cudaMemset(LeftImage2,0,g_w*sizeof(uchar3)*g_h);
printf("Error cudaMemset: %d\n", ret);

ret = cudaMemcpy(LeftImage2,p_hostLeft,g_w*sizeof(uchar3)*g_h,cudaMemcpyHostToDevice);
printf("Error cudaMemcpy: %d\n", ret);

BGR_to_RGB<<<grid,threads>>>(LeftImage2,RGB_pitch,g_w,g_h);
//CUT_CHECK_ERROR("sasd");
cudaThreadSynchronize();

convertRGB_to_MLUV<<<grid,threads>>>(LeftImage2,LuvImageLeft2,RGB_pitch,Luv_pitch,g_w,g_h);
cudaThreadSynchronize();

unsigned char* temp_seg = (unsigned char*) malloc(g_w*sizeof(unsigned char)*g_h);
segmentLines(LuvImageLeft2,temp_seg);
detectLines(temp_seg,seg_image);

ret = cudaFree(LeftImage2);
printf("Error: %d\n", ret);
ret = cudaFree(LuvImageLeft2);
printf("Error: %d\n", ret);

//ret = cudaFree(LeftImage2);
//printf("Error: %d\n", ret);
//ret = cudaFree(LuvImageLeft2);
//printf("Error: %d\n", ret);

CUT_CHECK_ERROR("asd");[/codebox]

I don’t understand why I need to reallocate that memory. The same thing happens in the segmentLines() function, which also uses memory allocated by initializeLineDetector().

Also, I noticed that the address of LeftImage2 changes between when it is initialized and when it is used again in stereoProcess().

PS. I had my CUDA functions in a class but had the same problem.

Thanks!

tmurray · May 24, 2010, 3:52pm

Because all CUDA functions operate inside of a context, which is bound per thread in the runtime API.

muzhikas · May 24, 2010, 4:24pm

Which means?? :)

Also, I figured out that if I remove __device__ from my variables, I don’t need to reallocate them. Adding __device__ changed the address between when it was allocated and when it was used, which is weird. So now I have another error: when I try to cudaMemcpy, I get 11, invalid argument.

[codebox]ret = cudaMemcpy(LeftImage2,p_hostLeft,g_w*sizeof(uchar3)*g_h,cudaMemcpyHostToDevice);
printf("Error cudaMemcpy: %d\n", ret);[/codebox]

Any ideas?

muzhikas · May 24, 2010, 4:41pm

If I do reallocate LeftImage2 then the error goes away…

avidday · May 24, 2010, 5:09pm

It means you should probably (re)read Chapter 3 of the programming guide. All CUDA runtime API resource allocations and operations must be used only inside the thread that created them. You can’t allocate resources in one host thread with the runtime API and then expect them to be valid in another thread, because the context and all the resources are tied to the thread where the allocation was done.
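
To make the per-thread binding concrete, here is a minimal host-only C++ model of the behaviour described above. It makes no CUDA calls: FakeContext, fake_alloc() and fake_is_valid() are hypothetical stand-ins for a per-thread context, an allocation and the validity check a copy would perform. It is a sketch of the idea, not of how the runtime is implemented.

```cpp
#include <cassert>
#include <thread>
#include <unordered_set>

// Hypothetical stand-in for a per-thread context: each host thread
// keeps its own table of the allocations it made.
struct FakeContext {
    std::unordered_set<int> live;  // handles allocated in this context
    int next = 1;                  // next handle value to hand out
};

// One context per host thread, mirroring "bound per thread".
thread_local FakeContext g_ctx;

// Hypothetical analogue of an allocation call: the handle is recorded
// only in the calling thread's context.
int fake_alloc() {
    int handle = g_ctx.next++;
    g_ctx.live.insert(handle);
    return handle;
}

// Hypothetical analogue of the pointer check done by a copy or a free:
// it looks only at the calling thread's context.
bool fake_is_valid(int handle) {
    return g_ctx.live.count(handle) != 0;
}

// Allocate on the calling thread, then try to use the handle on a new one.
bool valid_in_other_thread() {
    int handle = fake_alloc();
    bool ok = true;
    std::thread t([&] { ok = fake_is_valid(handle); });
    t.join();
    return ok;
}

// Allocate and use the handle on the same (new) thread.
bool valid_in_same_thread() {
    bool ok = false;
    std::thread t([&] {
        int handle = fake_alloc();
        ok = fake_is_valid(handle);
    });
    t.join();
    return ok;
}
```

In this model valid_in_other_thread() returns false and valid_in_same_thread() returns true, which matches the pattern in the question: memory allocated by initializeStereoCuda() on the main thread is rejected inside the thread running stereoProcess(), while memory reallocated inside that thread works.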

muzhikas · May 24, 2010, 5:22pm

Thanks a lot, it was exactly that. As soon as I called stereoProcess() outside of the thread it worked like a charm. I will refactor my code to allocate and run in the same thread. Thanks again.
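
One possible shape for that refactor, as a minimal sketch: a single worker thread performs the initialization, every per-frame call and the cleanup, while other threads only hand frames over through a queue. VisionWorker and its members are hypothetical names, frames are plain ints here, and the comments mark where initializeStereoCuda(), stereoProcess() and the cudaFree() calls would go; no CUDA calls are made in the sketch itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical worker: everything that would touch the GPU runs on one thread.
class VisionWorker {
public:
    VisionWorker() : worker_([this] { run(); }) {}
    ~VisionWorker() { stop(); }

    // Callable from any thread: hand a frame to the worker.
    void submit(int frame) {
        { std::lock_guard<std::mutex> lock(m_); frames_.push(frame); }
        cv_.notify_one();
    }

    // Ask the worker to drain the queue and exit, then wait for it.
    void stop() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    // Both accessors are meant to be read only after stop() has returned.
    int processed() const { return processed_; }
    const std::vector<std::thread::id>& call_threads() const { return call_threads_; }

private:
    void run() {
        // initializeStereoCuda(w, h) would go here, on the worker thread.
        call_threads_.push_back(std::this_thread::get_id());
        for (;;) {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return done_ || !frames_.empty(); });
            if (frames_.empty()) break;  // stop requested and nothing left
            frames_.pop();
            lock.unlock();
            // stereoProcess() would go here, on the same thread as the init.
            call_threads_.push_back(std::this_thread::get_id());
            ++processed_;
        }
        // The cudaFree() calls would go here, still on the worker thread.
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> frames_;
    bool done_ = false;
    int processed_ = 0;
    std::vector<std::thread::id> call_threads_;  // thread of each init/process step
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

The main thread would construct a VisionWorker, call submit() for each captured frame and call stop() at shutdown. Because allocation, processing and freeing all happen inside run(), they share whatever context is bound to that one thread.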
