Understanding nsys profile results with thrust

I am trying to optimize my code using nsys profile, and some “events” are hard to understand. In particular, one event appears many times and accounts for most of the load:

void thrust::cuda_cub::core::_kernel_agent<
    thrust::cuda_cub::__parallel_for::ParallelForAgent<
        thrust::cuda_cub::__transform::unary_transform_f<
            thrust::permutation_iterator<
                thrust::detail::normal_iterator<thrust::device_ptr<double>>,
                thrust::detail::normal_iterator<thrust::device_ptr<int>>
                >,
            thrust::detail::normal_iterator<thrust::device_ptr<double>>,
            thrust::cuda_cub::__transform::no_stencil_tag,
            thrust::cuda_cub::identity,
            thrust::cuda_cub::__transform::always_true_predicate
            >,
        long
        >,
    thrust::cuda_cub::__transform::unary_transform_f<
        thrust::permutation_iterator<
            thrust::detail::normal_iterator<thrust::device_ptr<double>>,
            thrust::detail::normal_iterator<thrust::device_ptr<int>>
            >,
        thrust::detail::normal_iterator<thrust::device_ptr<double>>,
        thrust::cuda_cub::__transform::no_stencil_tag,
        thrust::cuda_cub::identity,
        thrust::cuda_cub::__transform::always_true_predicate
        >,
    long
    >(T2, T3)

It seems to correspond to a unary transform with no predicate and an identity operation. So it appears to be doing nothing!

Is there any way to know where this was called from? Or what it is doing exactly?

I guess by “event” you mean a kernel launch.

One possible way to determine where a kernel launch comes from in your code is to use nsys nvprof --print-gpu-trace …

If that is still confusing, put something you can easily recognize as a unique “marker” in your source code, to identify which line of thrust code gives rise to a particular kernel launch. A marker could be a small dummy kernel launch, with a kernel name that is unique and easy to recognize.
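A minimal sketch of the dummy-kernel marker idea, assuming the code is compiled with nvcc as a .cu file. The kernel names MARKER_before_suspect_call and MARKER_after_suspect_call and the function run_with_markers are hypothetical, and thrust::gather merely stands in for whatever thrust call is under suspicion. Any kernel that appears in the GPU trace between the two markers was launched by the code between them:

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/gather.h>

// Hypothetical marker kernels: they do no useful work. Their only purpose is
// to show up in the nsys GPU trace under an easily recognizable, unique name.
__global__ void MARKER_before_suspect_call() {}
__global__ void MARKER_after_suspect_call() {}

// Hypothetical stand-in for "your code". Whatever thrust kernels appear in
// the trace between the two markers come from the calls placed in between.
// Assumes dst.size() >= map.size().
void run_with_markers(const thrust::device_vector<int>& map,
                      const thrust::device_vector<double>& src,
                      thrust::device_vector<double>& dst)
{
    MARKER_before_suspect_call<<<1, 1>>>();

    // The suspect thrust call (placeholder): dst[i] = src[map[i]].
    thrust::gather(map.begin(), map.end(), src.begin(), dst.begin());

    MARKER_after_suspect_call<<<1, 1>>>();
    cudaDeviceSynchronize();
}
```

As long as the markers and the thrust calls are issued to the same stream (the default stream in this sketch), they should appear in launch order in the trace.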

A similar but more flexible method would be to use the nsys GUI. The graphical timeline alone might be enough, or you can combine it with a wide range of markers (such as NVTX ranges, the dummy kernel, or really any dummy CUDA API call that you can easily recognize).
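NVTX ranges are often more convenient than dummy kernels because they carry a readable label. A sketch, assuming a CUDA toolkit that ships the header-only NVTX v3 headers (nvtx3/nvToolsExt.h); the function labeled_pipeline and the range names are hypothetical. When the application is profiled with NVTX tracing enabled (for example nsys profile --trace=cuda,nvtx), each pushed range shows up on the timeline and encloses the launches made inside it:

```cuda
#include <nvtx3/nvToolsExt.h>   // header-only NVTX v3
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/reduce.h>

// Hypothetical pipeline. Each NVTX push/pop pair creates a labeled range on
// the nsys timeline, so kernels launched inside it can be attributed to the
// enclosing block of source code.
double labeled_pipeline(const thrust::device_vector<int>& map,
                        const thrust::device_vector<double>& src)
{
    thrust::device_vector<double> dst(map.size());

    nvtxRangePushA("gather step");
    thrust::gather(map.begin(), map.end(), src.begin(), dst.begin());
    nvtxRangePop();

    nvtxRangePushA("reduce step");
    double sum = thrust::reduce(dst.begin(), dst.end(), 0.0);
    nvtxRangePop();

    return sum;
}
```

Kernel launches are asynchronous, so the ranges bracket the CPU-side launch calls; nsys correlates each launch with the kernel it produces, which is what lets you tie a kernel back to its range.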

Rather than trying to deduce the thrust intent from this kind of kernel prototype, I would start by identifying the thrust call that gave rise to the kernel. That is a far easier way to connect the dots and deduce intent, IMO.

Also, thrust is open source.


Thank you for the tips. There were actually other kernel launches that I could recognize and use as “markers”. It turns out that this particular kernel (see the initial post) comes from a call to thrust::gather. That makes sense: gather can be seen as a kind of permutation, with no predicate and no unary operation.
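To see why gather produces that kernel signature, the sketch below computes dst[i] = src[map[i]] two ways: with thrust::gather, and as a plain copy through a thrust::permutation_iterator. The second form mirrors the template arguments of the kernel (a permutation_iterator over double values indexed by int, an identity operation, no stencil and an always-true predicate). The helper gather_matches_permuted_copy is hypothetical. Judging from the kernel name, Thrust's CUDA backend appears to implement gather as such an identity transform, so the kernel is not doing nothing: it is the data movement itself.

```cuda
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/copy.h>
#include <thrust/equal.h>
#include <thrust/iterator/permutation_iterator.h>

// Hypothetical helper: computes dst[i] = src[map[i]] two ways and reports
// whether the results agree.
bool gather_matches_permuted_copy(const thrust::device_vector<int>& map,
                                  const thrust::device_vector<double>& src)
{
    thrust::device_vector<double> a(map.size());
    thrust::device_vector<double> b(map.size());

    // (1) The call that was found in the code.
    thrust::gather(map.begin(), map.end(), src.begin(), a.begin());

    // (2) A copy whose input is a permutation_iterator: reading element i
    //     yields src[map[i]], and the copy applies no other operation.
    auto first = thrust::make_permutation_iterator(src.begin(), map.begin());
    auto last  = thrust::make_permutation_iterator(src.begin(), map.end());
    thrust::copy(first, last, b.begin());

    return thrust::equal(a.begin(), a.end(), b.begin());
}
```

For example, with map = {2, 0, 1} and src = {10, 20, 30}, both paths yield {30, 10, 20}.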
