Weird malloc problem

BROSE · August 7, 2009, 3:43pm

Hi guys,

I'm having an issue where the order in which I cudaMalloc my variables affects whether I can access them. For example, with code like this:

[codebox]
cudaMalloc((void**)&kxihlG, nElkxihl*sizeof(float));
cudaMalloc((void**)&kxihrG, nElkxihr*sizeof(float));
cudaMemcpy(kxihlG, kxihl, nElkxihl*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(kxihrG, kxihr, nElkxihr*sizeof(float), cudaMemcpyHostToDevice);
// ... lots of variables ...
cudaMalloc((void**)&TARGETVAR, nElTARGETVAR*sizeof(float));
cudaMemcpy(TARGETVARG, TARGETVAR, nElTARGETVAR*sizeof(float), cudaMemcpyHostToDevice);
test<<<1,1>>>(TARGETVARG);
Error = cudaThreadSynchronize();
fprintf(stderr, "@TEST1 Error = %d \n", Error);
[/codebox]

If I run it like this, cudaThreadSynchronize returns error code 4, but if I move the cudaMalloc for TARGETVAR to the top of the list, the test<<<>>> kernel runs successfully:

THIS WORKS

[codebox]
cudaMalloc((void**)&TARGETVAR, nElTARGETVAR*sizeof(float));
cudaMalloc((void**)&kxihlG, nElkxihl*sizeof(float));
cudaMalloc((void**)&kxihrG, nElkxihr*sizeof(float));
cudaMemcpy(kxihlG, kxihl, nElkxihl*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(kxihrG, kxihr, nElkxihr*sizeof(float), cudaMemcpyHostToDevice);
// ... lots of variables ...
cudaMemcpy(TARGETVARG, TARGETVAR, nElTARGETVAR*sizeof(float), cudaMemcpyHostToDevice);
test<<<1,1>>>(TARGETVARG);
Error = cudaThreadSynchronize();
fprintf(stderr, "@TEST1 Error = %d \n", Error);
[/codebox]

Can anyone explain this? I don't think the machine is out of memory. Do I have to add some delay? Why does the order matter, as long as the malloc for a given variable comes before its memcpy?

How can this be fixed? The same problem is occurring elsewhere with other variables. Thank you for your time!

Quoc_Vinh · August 8, 2009, 3:24am

Do you think something is going wrong with TARGETVAR?

Why don't you check the error returned by that statement?

[codebox]
Error = cudaMalloc((void**)&TARGETVAR, nElTARGETVAR*sizeof(float));
printf("CUDA Error: %s\n", cudaGetErrorString(Error));
[/codebox]
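Building on that suggestion, here is a minimal sketch that checks every runtime call rather than only the final synchronize, so the first failing call is reported by name with a readable message from cudaGetErrorString instead of a bare number like 4. The helper `cuda_ok`, the `CUDA_OK` macro, `run_checked`, and the body of `test` are hypothetical stand-ins, since the thread does not show the real kernel. The sketch uses `cudaDeviceSynchronize`, the current replacement for the deprecated `cudaThreadSynchronize`, and calls `cudaGetLastError` right after the launch so launch errors are reported separately from errors raised while the kernel runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failed CUDA runtime call with its location and a readable message.
// Returns true on success so callers can bail out early.
static bool cuda_ok(cudaError_t err, const char* what, const char* file, int line)
{
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s:%d: %s failed: %s (code %d)\n",
                 file, line, what, cudaGetErrorString(err), static_cast<int>(err));
    return false;
}

// Hypothetical convenience macro: records the call text and where it was made.
#define CUDA_OK(call) cuda_ok((call), #call, __FILE__, __LINE__)

// Hypothetical stand-in for the poster's test kernel (its real body is not shown).
__global__ void test(float* p, size_t n)
{
    if (n > 0) p[0] += 1.0f;
}

// Allocate, copy, launch and synchronize, checking every step so the first
// failing call is reported instead of a later, unrelated-looking error code.
static bool run_checked(const float* host, size_t n)
{
    float* dev = nullptr;
    if (!CUDA_OK(cudaMalloc((void**)&dev, n * sizeof(float)))) return false;

    bool ok = CUDA_OK(cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice));
    if (ok) {
        test<<<1, 1>>>(dev, n);
        ok = CUDA_OK(cudaGetLastError())        // errors from the launch itself
          && CUDA_OK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    }
    CUDA_OK(cudaFree(dev));
    return ok;
}
```

Wrapping each cudaMalloc and cudaMemcpy in the original post the same way would show whether the failure starts at an allocation, at a copy, or in the kernel itself.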

SPWorley · August 8, 2009, 2:44pm

If nElTARGETVAR is especially large, this could be the classic problem of address-space fragmentation: it’s harder to allocate one large contiguous chunk than several smaller ones (which fit into the “cracks” better). This isn’t GPU-specific; it happens all the time on the CPU too.
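To check whether the device really is short on memory (the original poster doubted it), a sketch like the following queries free and total device memory with `cudaMemGetInfo` before attempting a single allocation. The function name `try_alloc_with_report` is hypothetical, and `float` elements are assumed as in the question. If the allocation fails even though the request is smaller than the reported free bytes, that would be consistent with the fragmentation explanation above, though it would not prove it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report free/total device memory, then try one allocation.
// Returns the first error encountered (from cudaMemGetInfo or cudaMalloc).
static cudaError_t try_alloc_with_report(float** dev, size_t count)
{
    size_t free_b = 0, total_b = 0;
    cudaError_t err = cudaMemGetInfo(&free_b, &total_b);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    const size_t want = count * sizeof(float);
    std::fprintf(stderr, "requesting %zu bytes; %zu of %zu bytes free\n",
                 want, free_b, total_b);

    err = cudaMalloc((void**)dev, want);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        // The request fit within the free total, so a plausible cause is that
        // no single free block is large enough.
        if (want <= free_b)
            std::fprintf(stderr, "request was smaller than total free memory\n");
    }
    return err;
}
```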

Classic answer: always allocate from large to small. Do this as a reflex in all your coding on GPU and CPU.
When that fails, the next strategy is usually to get fancier with your memory allocator (reusing old allocations, etc.) and/or to change your algorithm so it needs fewer large contiguous blocks.
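A minimal sketch of the large-to-small rule, assuming all sizes are known before any allocation: collect the requests, sort them by size in descending order with `qsort`, then call `cudaMalloc` in that order. `AllocRequest`, `sort_large_to_small`, and `alloc_large_to_small` are hypothetical names, and the element type is assumed to be `float` as in the original post.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical request record: where to store the device pointer and its size.
struct AllocRequest {
    float** dst;    // address of the device-pointer variable, e.g. &kxihlG
    size_t  count;  // number of floats, e.g. nElkxihl
};

// qsort comparator that puts larger requests first.
static int by_count_desc(const void* a, const void* b)
{
    const size_t ca = static_cast<const AllocRequest*>(a)->count;
    const size_t cb = static_cast<const AllocRequest*>(b)->count;
    return (ca < cb) - (ca > cb);
}

// Reorder the caller's array in place, largest request first.
static void sort_large_to_small(AllocRequest* reqs, size_t n)
{
    std::qsort(reqs, n, sizeof(AllocRequest), by_count_desc);
}

// Allocate every request in large-to-small order. Stops at the first failure
// and returns its error; buffers allocated before that are left for the caller to free.
static cudaError_t alloc_large_to_small(AllocRequest* reqs, size_t n)
{
    sort_large_to_small(reqs, n);
    for (size_t i = 0; i < n; ++i) {
        cudaError_t err = cudaMalloc((void**)reqs[i].dst, reqs[i].count * sizeof(float));
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaMalloc of %zu floats failed: %s\n",
                         reqs[i].count, cudaGetErrorString(err));
            return err;
        }
    }
    return cudaSuccess;
}
```

For the code in the question, the array would hold entries such as `{&kxihlG, nElkxihl}` and `{&kxihrG, nElkxihr}` plus one for the large target buffer, with the cudaMemcpy calls following once every allocation has succeeded. Sorting only changes the order; it cannot help if the combined sizes exceed device memory.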
