The following query() on a cuDF DataFrame reaches only 2-3% GPU utilization, so I would like to run several such queries in parallel.
Is there a way to run 10-20 such queries on the same DataFrame in parallel using cuDF?
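import cudf
import cupy as cp
import numpy as np

NUM_ELEMENTS = 100000
# Build a GPU DataFrame with three columns of uniform random values in [0, 1)
df = cudf.DataFrame()
df['value1'] = cp.random.sample(NUM_ELEMENTS)
df['value2'] = cp.random.sample(NUM_ELEMENTS)
df['value3'] = cp.random.sample(NUM_ELEMENTS)
# Random thresholds, referenced inside the query string with the @ prefix
c1 = np.random.random()
c2 = np.random.random()
c3 = np.random.random()
res = df.query('((value1 < @c1) & (value2 > @c2) & (value3 < @c3))')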
I had to look up cuDF; I hadn’t heard of it before. As best I can tell, this product is neither provided nor supported by NVIDIA.
The authors / the vendor of cuDF should be able to answer your question. Have you tried their support channel (I am assuming some sort of forum or mailing list)?
[Later:] This open-source project appears to be hosted on GitHub (rapidsai/cudf, "cuDF - GPU DataFrame Library"). While I see hundreds of issues filed, I don’t see a general support venue linked, but I only took a cursory look.
cuDF is part of RAPIDS.
Yes, it's possible to run multiple cuDF operations at the same time. One method that RAPIDS employs to do this is Dask.
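As a rough illustration of the Dask suggestion, here is a minimal sketch that wraps each query in dask.delayed and submits all of them through Dask's threaded scheduler. Several things in it are my own assumptions and not from this thread: the helper name run_query, the choice of 16 queries, passing the thresholds through local_dict instead of relying on caller-frame lookup, and the pandas/NumPy fallback (only there so the sketch can be tried on a machine without a GPU). It shows how to issue the queries concurrently; whether they actually overlap on the GPU, and whether utilization improves, is something you would need to measure.

```python
import dask
import numpy as np

try:
    import cudf as xdf      # GPU path
    import cupy as xp
except ImportError:         # assumption: CPU fallback, for trying the sketch without a GPU
    import pandas as xdf
    xp = np

NUM_ELEMENTS = 100000
N_QUERIES = 16              # hypothetical count, in the 10-20 range asked about

# One shared DataFrame that every query reads from
df = xdf.DataFrame({
    "value1": xp.random.random_sample(NUM_ELEMENTS),
    "value2": xp.random.random_sample(NUM_ELEMENTS),
    "value3": xp.random.random_sample(NUM_ELEMENTS),
})


def run_query(frame, c1, c2, c3):
    """Run one filter; thresholds are passed explicitly via local_dict."""
    return frame.query(
        "(value1 < @c1) & (value2 > @c2) & (value3 < @c3)",
        local_dict={"c1": c1, "c2": c2, "c3": c3},
    )


# A different set of random thresholds for each query
rng = np.random.default_rng(0)
thresholds = [tuple(rng.random(3).tolist()) for _ in range(N_QUERIES)]

# Build the lazy tasks, then execute them together on a thread pool
tasks = [dask.delayed(run_query)(df, c1, c2, c3) for c1, c2, c3 in thresholds]
results = dask.compute(*tasks, scheduler="threads")

for (c1, c2, c3), res in zip(thresholds, results):
    print(f"c1={c1:.3f} c2={c2:.3f} c3={c3:.3f} -> {len(res)} rows")
```

For data that does not fit on one GPU, or for spreading work across several GPUs, the RAPIDS route is Dask-cuDF on top of a Dask cluster; the sketch above stays on a single device and a single process.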