My kernel function allocates a large local array for each thread. Before calling this kernel I can allocate a 2.5 GB array in global memory, but after calling it I seem to have lost at least 128 MB of global memory (I can no longer allocate 2.5 GB).
These local arrays are private to each thread, so I expect them to be released once the kernel finishes. As far as I can tell, this kernel should not leave any global memory consumed after it is done. Can anyone try it and see why that is?
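When checking this, it can help to estimate how much device memory a per-thread local array could tie up. The sketch below assumes the commonly cited rule of thumb that the runtime backs local memory for every thread that could be resident on the device at once: per-thread local bytes × number of SMs × maximum resident threads per SM. The function name `estimate_local_reservation` is hypothetical, and the device figures in the example are made up for illustration. On a real device the SM count and threads-per-SM come from `cudaDeviceProp::multiProcessorCount` and `cudaDeviceProp::maxThreadsPerMultiProcessor`, and the per-thread local size from `cudaFuncAttributes::localSizeBytes` (via `cudaFuncGetAttributes`).

```python
def estimate_local_reservation(bytes_per_thread: int,
                               sm_count: int,
                               max_threads_per_sm: int) -> int:
    """Rough estimate (assumption, not an exact runtime formula) of the device
    memory reserved to back local memory: every thread that could be resident
    at once gets its own copy of the per-thread local allocation."""
    return bytes_per_thread * sm_count * max_threads_per_sm


MiB = 1024 * 1024

# Hypothetical device: 16 SMs, up to 2048 resident threads per SM,
# and a kernel using a 4 KiB local array per thread.
reserved = estimate_local_reservation(4096, 16, 2048)
print(reserved / MiB)  # 128.0
```

If the estimate for the real kernel and device is in the same range as the missing memory, that points at a local-memory reservation that outlives the kernel launch rather than at a leak in the kernel itself.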