NVIDIA Rapids apply_rows

Hi - I am trying to adapt the apply_rows example from the RAPIDS cuDF documentation (quoted under "Examples" below).
My question is: how would I specify the type for an object whose type is:

type(kde)
<class 'sklearn.neighbors.kde.KernelDensity'>

so that it works with the following function call? (The types in the call need to be changed so that they can represent the KernelDensity object and be recognizable to Numba.)

df.apply_rows(kernel,
              incols=['in1', 'in2', 'in3'],
              outcols=dict(out1=np.float64, out2=np.float64),
              kwargs=dict(kwarg1=3, kwarg2=4))

My error message looks like this:
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f4eaea55a90>) with argument(s) of type(s): (array(float64, 1d, A), array(int64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), float64, float64)

  • parameterized
    In definition 0:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    Untyped global name 'KernelDensity': cannot determine Numba type of <class 'type'>

This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f4eaea55a90>)
[2] During: typing of call at (11)

File "", line 11:
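
The "Untyped global name 'KernelDensity'" line suggests that Numba could not type the scikit-learn class referenced inside the kernel body, so changing the dtypes in outcols would likely not help. One possible workaround is to write the density arithmetic directly in the kernel using only scalar math. The following is a minimal host-side sketch in plain Python and NumPy, assuming a Gaussian kernel. The names gaussian_kde_kernel, samples, and bandwidth are hypothetical, and whether an array such as samples can be passed through the kwargs of apply_rows is an assumption that would need to be checked against your cuDF version.

```python
import math
import numpy as np

def gaussian_kde_kernel(in1, out1, samples, bandwidth):
    # Same loop shape as an apply_rows kernel: one output value per input row.
    # Only scalar math is used, no scikit-learn objects.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    for i, x in enumerate(in1):
        total = 0.0
        for s in samples:
            u = (x - s) / bandwidth
            total += math.exp(-0.5 * u * u)
        out1[i] = norm * total  # density, not log-density

# Host-side check with NumPy arrays (no GPU involved).
samples = np.array([0.0, 1.0, 2.0])   # hypothetical training points
queries = np.array([0.0, 0.5])        # hypothetical rows to evaluate
density = np.zeros(len(queries), dtype=np.float64)
gaussian_kde_kernel(queries, density, samples, 0.5)
```

Note that KernelDensity.score_samples returns log densities, so apply math.log to the values above before comparing against it.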

Examples

The user function should loop over the columns and set the output for each row. Loop execution order is arbitrary, so each iteration of the loop MUST be independent of each other.

When func is invoked, the array args corresponding to the input/output are strided so as to improve GPU parallelism. The loop in the function resembles serial code, but executes concurrently in multiple threads.

import cudf
import numpy as np
df = cudf.DataFrame()
nelem = 3
df['in1'] = np.arange(nelem)
df['in2'] = np.arange(nelem)
df['in3'] = np.arange(nelem)

Define input columns for the kernel:

in1 = df['in1']
in2 = df['in2']
in3 = df['in3']
def kernel(in1, in2, in3, out1, out2, kwarg1, kwarg2):
    for i, (x, y, z) in enumerate(zip(in1, in2, in3)):
        out1[i] = kwarg2 * x - kwarg1 * y
        out2[i] = y - kwarg1 * z

Call .apply_rows with the name of the input columns, the name and dtype of the output columns, and, optionally, a dict of extra arguments.

df.apply_rows(kernel,
              incols=['in1', 'in2', 'in3'],
              outcols=dict(out1=np.float64, out2=np.float64),
              kwargs=dict(kwarg1=3, kwarg2=4))
   in1  in2  in3  out1  out2
0    0    0    0   0.0   0.0
1    1    1    1   1.0  -2.0
2    2    2    2   2.0  -4.0
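
Because the kernel is an ordinary Python function, its per-row arithmetic can be checked on the host with NumPy arrays before it is handed to apply_rows. This is only a sketch of the loop semantics: it runs serially on the CPU and does not exercise the Numba compilation or the striding across GPU threads described above.

```python
import numpy as np

def kernel(in1, in2, in3, out1, out2, kwarg1, kwarg2):
    # Same body as the documented example: one independent result per row.
    for i, (x, y, z) in enumerate(zip(in1, in2, in3)):
        out1[i] = kwarg2 * x - kwarg1 * y
        out2[i] = y - kwarg1 * z

nelem = 3
in1 = np.arange(nelem)
in2 = np.arange(nelem)
in3 = np.arange(nelem)

# Output buffers use the dtypes requested through outcols.
out1 = np.empty(nelem, dtype=np.float64)
out2 = np.empty(nelem, dtype=np.float64)

kernel(in1, in2, in3, out1, out2, kwarg1=3, kwarg2=4)
print(out1.tolist())  # [0.0, 1.0, 2.0]
print(out2.tolist())  # [0.0, -2.0, -4.0]
```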

Were you able to resolve it? I keep getting this error when I call compute on dask cudf. I get no error when running apply_rows itself.

You should ask in the RAPIDS-GoAi Slack.


Thank you so much. Needed this desperately.
