CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE in Python

It may be that the code is not being compiled for the correct GPU type (compute capability).

If the code starts working when you change N to 32000:

N = 32000

that would tend to confirm this theory.
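To make the reasoning concrete, here is a minimal sketch of why a smaller N might help. It rests on assumptions I can't verify from the question: that the number of blocks in the launch grows with N, and that the code ends up targeting a compute capability below 3.0. On those devices the x-dimension of the grid is limited to 65535 blocks, while compute capability 3.0 and later allow 2^31 - 1. The helper names, the thread count per block, and the value 100000 are hypothetical and only there for illustration.

```python
# Sketch only: pure-Python check of a 1D launch against the grid x-dimension limit.
# Assumption: the failing launch uses a number of blocks that grows with N.

MAX_GRID_X_PRE_CC30 = 65535        # grid x-dimension limit below compute capability 3.0
MAX_GRID_X_CC30_PLUS = 2**31 - 1   # grid x-dimension limit for compute capability 3.0+


def max_grid_dim_x(compute_capability):
    """Return the grid x-dimension limit for a (major, minor) compute capability."""
    major, _minor = compute_capability
    return MAX_GRID_X_CC30_PLUS if major >= 3 else MAX_GRID_X_PRE_CC30


def blocks_needed(n, threads_per_block):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def launch_fits(n, threads_per_block, compute_capability):
    """True if a 1D launch covering n elements stays within the grid limit."""
    return blocks_needed(n, threads_per_block) <= max_grid_dim_x(compute_capability)


# Hypothetical case: one thread per block, so the grid size equals N.
print(launch_fits(32000, 1, (2, 0)))    # within the 65535 limit
print(launch_fits(100000, 1, (2, 0)))   # exceeds the 65535 limit
print(launch_fits(100000, 1, (3, 5)))   # fine on compute capability 3.0+
```

If your launch configuration looks like the second case, the invalid-value error from cuLaunchKernel would be consistent with the theory above.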

What GPU do you have? (check_cuda should report this.)