Yes, all other things being equal, it is more efficient to call one kernel than to call several.

For the most part, each call to a thrust algorithm results in at least one kernel call.

The canonical advice is to fuse operations. This can't be done ad infinitum or in every case, but fusion, primarily through the use of thrust fancy iterators, lets you combine multiple operations into a single thrust algorithm call, so that those operations are carried out in a single thrust kernel call.

For example, suppose I have a reduction where I want to sum the squares of every element of an array.

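I could realize this naively as a thrust::transform (to square each element) followed by a thrust::reduce (to sum the previously squared elements). That means two algorithm calls, so typically at least two kernel calls, plus a temporary array that holds the squared values between them.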

Instead, I can pass a transform iterator to thrust::reduce; the iterator squares each element as it is read during the reduction, so the whole operation needs only one algorithm call and no temporary array.

Since a transform followed by a reduction is so common, this *particular* use case is simplified for the thrust programmer by a combined algorithm, thrust::transform_reduce.

There are whole presentations on fusing operations in thrust. Take a look at the presentations linked from the thrust GitHub site.

This presentation:

http://its.unc.edu/files/2014/11/UNC05_An-Introduction-to-the-Thrust-Parallel-Algorithms-Library.pdf

begins to discuss thrust “best practices”, including fusion, around slide 23.