cdp_simple_quicksort made the CUDA context consume 50 MB more... why? And what's the best way to sort in CUDA?

Compiling in the cdp_simple_quicksort function from here made the CUDA context consume 50 MB more than a build without cdp_simple_quicksort. Why is it as much as 50 MB?
https://github.com/NVIDIA/cuda-samples/blob/master/Samples/3_CUDA_Features/cdpSimpleQuicksort/cdpSimpleQuicksort.cu

And what's the best way to sort in CUDA?

Thanks!

Can anyone help?

Many things can affect the size of the CUDA context. It may well have to do with the new kernel code being loaded, and CDP (CUDA Dynamic Parallelism) almost certainly has some overhead.
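CDP kernels rely on the device runtime, which sets aside memory according to a few per-context limits (the CUDA Programming Guide covers these under configuring the device runtime). A minimal sketch, assuming a CUDA-capable device and nvcc, that only reads the current values so they can be compared between builds with and without the CDP kernel; the helper name `query_limit` is hypothetical, and whether these reservations explain any of the 50 MB is an assumption this code cannot confirm.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the current value of a device limit,
// or 0 if the query fails (e.g. the limit is unsupported on this device).
size_t query_limit(cudaLimit limit) {
    size_t value = 0;
    if (cudaDeviceGetLimit(&value, limit) != cudaSuccess) {
        cudaGetLastError();  // reset the runtime's last-error state
        return 0;
    }
    return value;
}

int main() {
    int count = 0;
    // Require at least one CUDA device before querying anything.
    assert(cudaGetDeviceCount(&count) == cudaSuccess && count > 0);
    cudaFree(0);  // forces creation of the primary context on device 0

    std::printf("cudaLimitStackSize                    : %zu bytes/thread\n",
                query_limit(cudaLimitStackSize));
    std::printf("cudaLimitMallocHeapSize               : %zu bytes\n",
                query_limit(cudaLimitMallocHeapSize));
    std::printf("cudaLimitDevRuntimePendingLaunchCount : %zu launches\n",
                query_limit(cudaLimitDevRuntimePendingLaunchCount));
    return 0;
}
```

If the pending-launch buffer does turn out to matter, `cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, n)` can lower it before the first launch, at the risk of launch failures when more than `n` child launches are outstanding.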

Since there is no specification for context size, and NVIDIA provides no tools to inspect what the context contains, an exact answer to your questions about context size cannot be given.
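The contents of the context can't be inspected, but its overall footprint can be estimated from outside. A rough sketch, assuming a single CUDA device and that no other process is using it, which reports `total - free` from `cudaMemGetInfo()` right after context creation; the helper name `used_after_context_init` is hypothetical. Building it once with the CDP kernel compiled in (e.g. with `-rdc=true` and `-lcudadevrt`) and once without gives a coarse before/after comparison, not a breakdown.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bytes of device memory in use right after context
// creation, computed as total - free. This is coarse: it also counts memory
// held by any other process on the same GPU.
size_t used_after_context_init() {
    cudaFree(0);  // forces creation of the primary context
    size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) return 0;
    return total_bytes - free_bytes;
}

int main() {
    int count = 0;
    // Require at least one CUDA device.
    assert(cudaGetDeviceCount(&count) == cudaSuccess && count > 0);
    size_t used = used_after_context_init();
    std::printf("device memory in use after context creation: %.1f MB\n",
                used / (1024.0 * 1024.0));
    return 0;
}
```

The number can also shift depending on whether lazy module loading (the `CUDA_MODULE_LOADING` environment variable) is in effect, since that changes when kernel code is actually loaded.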

With no other context, my suggestion would be to use CUB or Thrust. Sorting is a difficult enough problem that I would advise most folks not to "roll your own" but instead use a library implementation written by experts.
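For comparison with the CDP sample, a minimal sketch of both library routes, assuming nvcc with the Thrust and CUB headers that ship with the CUDA Toolkit and plain `int` keys. Error checking is omitted for brevity, and the helper names `sort_with_thrust` and `sort_with_cub` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cub/cub.cuh>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical helper: sort a host vector on the GPU with Thrust.
std::vector<int> sort_with_thrust(const std::vector<int>& input) {
    thrust::device_vector<int> d(input.begin(), input.end());  // host -> device
    thrust::sort(d.begin(), d.end());
    std::vector<int> out(input.size());
    thrust::copy(d.begin(), d.end(), out.begin());              // device -> host
    return out;
}

// Hypothetical helper: sort a host vector with CUB's device-wide radix sort.
std::vector<int> sort_with_cub(const std::vector<int>& input) {
    int n = static_cast<int>(input.size());
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // First call with a null temp pointer only computes the scratch size.
    void* d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);
    cudaMalloc(&d_temp, temp_bytes);
    // Second call performs the sort on the default stream.
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_out);
    cudaFree(d_in);
    return out;
}

int main() {
    std::vector<int> data = {5, -3, 9, 0, 2, 7, -1};
    std::vector<int> a = sort_with_thrust(data);
    std::vector<int> b = sort_with_cub(data);
    assert(a == b);  // both routes should produce the same ordering
    for (int x : a) std::printf("%d ", x);
    std::printf("\n");
    return 0;
}
```

Neither route uses dynamic parallelism, so no `-rdc=true` build is needed. `thrust::sort` is the simpler interface; CUB makes the temporary-storage step explicit, which can help when the same scratch buffer is reused across many sorts.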

