Dynamically freeing memory

Hi,

Hopefully someone can help me with this. I am trying to allocate and deallocate device arrays in my code. In some cases it works fine, but not always. Here is an excerpt:

Call timing_start(time_bucket_start)
Call Bounds(x_d,Bound_d,nTotal,nCell_d,hsml_d)
Call BuildCellIndex<<<grid,tBlock>>>(x_d,hsml_d,nTotal,CellIndex_d,Bound_d,nCell_d)	
maxIndex = nCell_d(1)*nCell_d(2)*nCell_d(3)
Allocate(CellList_d(maxIndex,NumCLRows),NodesInCell_d(maxIndex))
NodesInCell_d = 0	
Call BuildCellList<<<grid,tBlock>>>(nTotal,CellIndex_d,CellList_d,maxIndex,NodesInCell_d)	
Call GPUtimeStart(startEvent1)								
Call BucketSearch<<<grid,tBlock>>>&
(x_d,hsml_d,CellList_d,nCell_d,CellIndex_d,NodesInCell_d,Neib_d,NeibCount_d,w_d,dwdx_d,nTotal,maxIndex)	
Call GPUtime(startEvent1,stopEvent1,t_bucket) 
If (Allocated(NodesInCell_d)) Deallocate(NodesInCell_d)
If (Allocated(CellList_d)) Deallocate(CellList_d)	
Call timing_end(time_bucket_start,time_bucket)

The general idea is to lay a cell grid out and bin particles into the cells. I want to be able to accommodate growing and shrinking domains by dynamically allocating and deallocating.
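
To make the idea concrete, here is a stripped-down CPU sketch of the binning step. It is only an illustration: the names (bin_demo, cellSize, lo, hi) and the particle coordinates are made up and are not from the actual program, which does this work in GPU kernels.

```fortran
! CPU-only sketch of binning particles into a uniform cell grid.
! All names and data here are hypothetical, chosen for illustration.
program bin_demo
  implicit none
  integer, parameter :: nTotal = 5
  real, parameter :: cellSize = 0.5
  real :: x(3,nTotal), lo(3), hi(3)
  integer :: nCell(3), ic(3), maxIndex, i, idx
  integer, allocatable :: cellIndex(:), nodesInCell(:)

  ! Particle coordinates, one column per particle
  x = reshape([0.1,  0.1,  0.1,   &
               0.9,  0.1,  0.1,   &
               0.1,  0.9,  0.9,   &
               0.7,  0.7,  0.7,   &
               0.95, 0.95, 0.95], [3, nTotal])

  ! Domain bounds and number of cells per axis
  lo = minval(x, dim=2)
  hi = maxval(x, dim=2)
  nCell = int((hi - lo) / cellSize) + 1
  maxIndex = nCell(1) * nCell(2) * nCell(3)

  ! Sized from the current domain, so they can grow or shrink each step
  allocate(cellIndex(nTotal), nodesInCell(maxIndex))
  nodesInCell = 0

  do i = 1, nTotal
    ! 1-based cell coordinates, clamped to the grid
    ic = int((x(:,i) - lo) / cellSize) + 1
    ic = min(max(ic, 1), nCell)
    ! Flatten (ic1, ic2, ic3) into a single index in 1..maxIndex
    idx = ic(1) + (ic(2) - 1) * nCell(1) + (ic(3) - 1) * nCell(1) * nCell(2)
    ! Redundant after the clamp, but shows the kind of guard that
    ! catches an index outside the allocated range
    if (idx < 1 .or. idx > maxIndex) error stop 'cell index out of range'
    cellIndex(i) = idx
    nodesInCell(idx) = nodesInCell(idx) + 1
  end do

  ! Every particle must have landed in exactly one cell
  if (sum(nodesInCell) /= nTotal) error stop 'particle count mismatch'

  print *, 'cells per axis:    ', nCell
  print *, 'particles per cell:', nodesInCell

  deallocate(cellIndex, nodesInCell)
end program bin_demo
```

The arrays are sized from maxIndex on every pass, which is why they are allocated and deallocated inside the time loop.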

I have been successfully running the code for a few simple test problems. But I just found a case that caused the program to exit with an error message:

0: Deallocate: Unspecified launch failure

I am using PVF 13.10.

Thanks for any help,

Kirk

I should mention that this block of code is part of a larger program and sits inside the time-stepping loop.

The particular error shows up after ~2800 time steps.

Do

Call timing_start(time_bucket_start)
Call Bounds(x_d,Bound_d,nTotal,nCell_d,hsml_d)
Call BuildCellIndex<<<grid,tBlock>>>(x_d,hsml_d,nTotal,CellIndex_d,Bound_d,nCell_d)
maxIndex = nCell_d(1)*nCell_d(2)*nCell_d(3)
Allocate(CellList_d(maxIndex,NumCLRows),NodesInCell_d(maxIndex))
NodesInCell_d = 0	
Call BuildCellList<<<grid,tBlock>>>(nTotal,CellIndex_d,CellList_d,maxIndex,NodesInCell_d)	
Call GPUtimeStart(startEvent1)								
Call BucketSearch<<<grid,tBlock>>>&
(x_d,hsml_d,CellList_d,nCell_d,CellIndex_d,NodesInCell_d,Neib_d,NeibCount_d,w_d,dwdx_d,nTotal,maxIndex)	
Call GPUtime(startEvent1,stopEvent1,t_bucket) 
If (Allocated(NodesInCell_d)) Deallocate(NodesInCell_d)
If (Allocated(CellList_d)) Deallocate(CellList_d)	
Call timing_end(time_bucket_start,time_bucket)

... More code in loop 

If (time .ge. t_end) Exit

time = time + dtMin
itimestep = itimestep + 1

End Do

Thank you for any help,

Kirk

Hi Kirk,

It sounds like things are getting out of sync or there’s an issue with the driver allocating and deallocating that many times.

Can you try adding a call cudaDeviceSynchronize after BucketSearch? I’m not sure if your GPUtime call syncs the kernel and host before deallocating.


Call BucketSearch<<<grid,tBlock>>>& 
 (x_d,hsml_d,CellList_d,nCell_d,CellIndex_d,NodesInCell_d,Neib_d,NeibCount_d,w_d,dwdx_d,nTotal,maxIndex)    
 call cudaDeviceSynchronize()
 Call GPUtime(startEvent1,stopEvent1,t_bucket)

If that doesn’t work, I’ll try writing a test case that replicates the issue.

- Mat

I tried with

Call cudaDeviceSynchronize()

but that resulted in a compile error.

I then tried

istat = cudaDeviceSynchronize()

That compiled and ran, but it crashed again at time cycle ~2800.

Kirk

Oops, sorry. My bad about the syntax.

Would you mind sending a reproducing example to PGI Customer Service (trs@pgroup.com) and asking them to forward it to me? If it's just a matter of calling allocate and deallocate 2800 times, then I can recreate that, but in case it's something else, I'd like to start with your known failing case.
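
In the meantime, it may help to check the status after each kernel. Kernel launches are asynchronous, so an error raised inside a kernel can be reported by a later runtime call, which might be why the message points at Deallocate. Below is a rough sketch, assuming the cudafor module; checkCuda is a made-up helper name, not something from your code or from the compiler.

```fortran
! Sketch only: checkCuda is a hypothetical helper for locating the
! kernel that fails. It assumes CUDA Fortran and the cudafor module.
module cuda_check
  use cudafor
  implicit none
contains
  subroutine checkCuda(label)
    character(len=*), intent(in) :: label
    integer :: istat

    ! Errors detected at launch, e.g. an invalid launch configuration
    istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
      print *, trim(label), ' (launch): ', trim(cudaGetErrorString(istat))
      stop 1
    end if

    ! Wait for the kernel to finish; errors raised while it ran,
    ! such as an out-of-bounds access, are returned here
    istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
      print *, trim(label), ' (run): ', trim(cudaGetErrorString(istat))
      stop 1
    end if
  end subroutine checkCuda
end module cuda_check

! Intended use, placed directly after each kernel launch:
!   Call BuildCellList<<<grid,tBlock>>>(...)
!   Call checkCuda('BuildCellList')
!   Call BucketSearch<<<grid,tBlock>>>(...)
!   Call checkCuda('BucketSearch')
```

The extra synchronization slows things down, so this is meant for debugging runs only. If one of the labels shows up in the output, that kernel is the place to look rather than the Deallocate.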

Thanks,
Mat

I think that I found the problem, and I don't think it was related to the dynamic allocation after all.

I will send an example along if I run into the problem again. But for now, it seems to be OK.

Thank you,

Kirk