I am trying to optimize my code using nsys profile, and some of the reported “events” are hard to understand. In particular, this one appears many times and accounts for most of the time:
void thrust::cuda_cub::core::_kernel_agent<
thrust::cuda_cub::__parallel_for::ParallelForAgent<
thrust::cuda_cub::__transform::unary_transform_f<
thrust::permutation_iterator<
thrust::detail::normal_iterator<thrust::device_ptr<double>>,
thrust::detail::normal_iterator<thrust::device_ptr<int>>
>,
thrust::detail::normal_iterator<thrust::device_ptr<double>>,
thrust::cuda_cub::__transform::no_stencil_tag,
thrust::cuda_cub::identity,
thrust::cuda_cub::__transform::always_true_predicate
>,
long
>,
thrust::cuda_cub::__transform::unary_transform_f<
thrust::permutation_iterator<
thrust::detail::normal_iterator<thrust::device_ptr<double>>,
thrust::detail::normal_iterator<thrust::device_ptr<int>>
>,
thrust::detail::normal_iterator<thrust::device_ptr<double>>,
thrust::cuda_cub::__transform::no_stencil_tag,
thrust::cuda_cub::identity,
thrust::cuda_cub::__transform::always_true_predicate
>,
long
>(T2, T3)
It seems to correspond to a unary transform with no stencil, a predicate that is always true, and the identity as its operation. So it looks like it is doing nothing!
Is there any way to know where this was called from? Or what it is doing exactly?
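Looking at the template arguments more closely, the input is a permutation_iterator over a double pointer and an int pointer, so the kernel may not be a pure no-op. My tentative reading is that it copies values through an index array, which is what I would expect from something like a thrust::copy or thrust::gather through a thrust::permutation_iterator (that is an assumption, I have not confirmed it). Below is a minimal host-side sketch of what I think each launch computes. The names gather_copy, values and indices are hypothetical and do not come from my code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side sketch of what the profiled kernel appears to compute.
// Assumption: the real work happens on the GPU inside Thrust; this loop
// only mirrors the semantics suggested by the kernel's template arguments.
std::vector<double> gather_copy(const std::vector<double>& values,
                                const std::vector<int>& indices)
{
    std::vector<double> out(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        // permutation_iterator: reading element i dereferences values[indices[i]].
        assert(indices[i] >= 0 &&
               static_cast<std::size_t>(indices[i]) < values.size());
        // identity functor + always-true predicate: store the value unchanged.
        out[i] = values[indices[i]];
    }
    return out;
}
```

If this reading is right, the "identity" only means the values are not modified on the way. The indexed reads from memory would still be real work, which might explain why the kernel shows up so prominently.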