This is my first post here, so I would be grateful for any help.
My problem is as follows:
I have a small piece of production code that was written in pandas and then ported to the GPU with the cuDF library.
The idea of the code is to load a data frame (read from a file of around 200MB), apply some operations to it (add rows from one small data frame and remove the rows that appear in another small data frame), and return the same data frame with those changes applied. This is wrapped in a simple for loop, and the logic of the changes is the same in every iteration.
Here is some pseudo code describing the whole concept:
```python
import gc

import cudf
import pandas as pd
import torch

# At the very beginning: read the dataset (a file of around 200MB) into my_data
my_data = cudf.read_csv(...)

# Simple loop: 10 example iterations
for i in range(1, 11):
    # Read a very small amount of data (KBs) from an external DB and convert it
    # to cuDF, twice: rows from the first frame are added to my_data, then every
    # row of my_data whose index appears in the second frame is removed.
    first_data = pd.read_sql_query(...)
    first_data_cudf = cudf.from_pandas(first_data)
    my_data_after_concat = cudf.concat([my_data, first_data_cudf])

    second_data = pd.read_sql_query(...)
    second_data_cudf = cudf.from_pandas(second_data)
    my_data = my_data_after_concat.loc[
        ~my_data_after_concat.index.isin(second_data_cudf.index)
    ]

    # Remove unused variables, run the garbage collector and clear the cache
    del my_data_after_concat
    del first_data_cudf
    del second_data_cudf
    gc.collect()
    torch.cuda.empty_cache()  # releases memory cached by PyTorch's own allocator
```
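For comparison, the add/remove step of a single iteration can be sketched on the CPU in plain pandas, using the same `concat`, `index.isin` and boolean `.loc` calls as the cuDF version above. This is a minimal sketch assuming rows are identified by their index labels; `apply_update` is a hypothetical helper name, not part of the original code.

```python
import pandas as pd


def apply_update(my_data: pd.DataFrame,
                 rows_to_add: pd.DataFrame,
                 rows_to_remove: pd.DataFrame) -> pd.DataFrame:
    """Append rows_to_add, then drop every row whose index label is in rows_to_remove."""
    # concat builds a new frame holding both inputs (index labels are kept as-is)
    combined = pd.concat([my_data, rows_to_add])
    # Boolean mask: True for rows whose index label is NOT in rows_to_remove
    keep = ~combined.index.isin(rows_to_remove.index)
    # .loc with a boolean mask returns a new, filtered frame
    return combined.loc[keep]


if __name__ == "__main__":
    base = pd.DataFrame({"value": [10, 20, 30]}, index=[1, 2, 3])
    added = pd.DataFrame({"value": [40]}, index=[4])
    removed = pd.DataFrame({"value": [0]}, index=[2])
    print(apply_update(base, added, removed))
```

Wrapping the step in a function means the intermediate `combined` frame is no longer referenced once the function returns, so no explicit `del` is needed for it.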
Now, let’s focus on my problem with this algorithm.
I have 6GB of VRAM on my NVIDIA GPU (5.7GB is free at all times; the rest is used by the system), and I want to run this algorithm in a simple for loop. The first iteration goes well, but in the second one I get this error:
MemoryError: std::bad_alloc: CUDA error at: /home/softwarehouse/miniconda3/envs/rapids-0.17/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory
When I watch nvidia-smi, I see memory usage grow rapidly until the code reaches the 6GB limit and crashes. So my first thought was that 6GB of VRAM is too little for this code and that I should buy a GPU with more VRAM.
But is this normal behavior? I read a 200MB file (around 5.5GB is still free at that point), apply some operations to it together with two additional (very, very small) data frames, and then free every variable I can before the next iteration. Intuitively, a file of this size and these operations shouldn't need this much memory. What are your opinions? Can you help me with this problem? Has anyone run into the same issue?
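One way to check that intuition is to compare the file's on-disk size with the frame's in-memory size, since a parsed CSV does not necessarily occupy the same number of bytes as the text file. It also helps to note that within one iteration `my_data`, `my_data_after_concat` and the filtered result briefly coexist, so peak usage is at least about three times the frame's size rather than one. Below is a minimal pandas sketch; `footprint_mb` is a hypothetical helper name, and while cuDF DataFrames also provide a `memory_usage` method, its options may differ in the version used here (0.17).

```python
import pandas as pd


def footprint_mb(df: pd.DataFrame) -> float:
    """Approximate in-memory size of df in megabytes (1 MB = 10**6 bytes), index included.

    deep=True also counts the Python objects held in object (e.g. string)
    columns instead of only the 8-byte pointers to them.
    """
    return df.memory_usage(index=True, deep=True).sum() / 1e6


if __name__ == "__main__":
    # Two numeric 8-byte columns of one million rows each: about 16 MB
    df = pd.DataFrame({"id": range(1_000_000), "flag": [1.0] * 1_000_000})
    print(f"{footprint_mb(df):.1f} MB")
```

Calling such a helper on the loaded frame, and again after each iteration, would show whether the frame itself is growing or whether the growth comes from elsewhere.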
Best regards,