My question is how to efficiently merge two vectors by alternating fixed-size batches of elements from each, using CUDA Thrust.
Toy example:
For two device vectors, vecA=[1, 2, 3, 4, 5, 6, 7, 8, 9] and vecB=[1, 1, 1, 2, 2, 2, 3, 3, 3].
Given batch size len=3, combine vecA and vecB into a new vector vecC.
vecC=[{1, 2, 3}, {1, 1, 1}, {4, 5, 6}, {2, 2, 2}, {7, 8, 9}, {3, 3, 3}].
{} is only for explanation; vecC is a 1-dimensional device vector.
I’m new to CUDA programming. Could you provide code so that I can test the performance?
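For reference, here is a minimal sketch of one way this could be done, not necessarily the fastest. It assumes vecA and vecB hold int, have the same size, and that this size is a multiple of len. The names interleave_batches and merge_batches are hypothetical helpers made up for this example. The idea is to compute, for every output index of vecC, which source vector and which source index it comes from, and fill vecC in a single pass with thrust::tabulate.

```cuda
#include <thrust/device_vector.h>
#include <thrust/tabulate.h>

// Hypothetical functor: maps an output index of vecC to its source element.
// Assumes both inputs have equal size and that size is a multiple of len.
struct interleave_batches {
    const int* a;   // raw device pointer to vecA
    const int* b;   // raw device pointer to vecB
    int len;        // batch size

    __host__ __device__ int operator()(int i) const {
        int chunk  = i / len;                   // which batch of vecC
        int offset = i % len;                   // position inside that batch
        int src    = (chunk / 2) * len + offset; // index in the source vector
        // Even batches come from vecA, odd batches from vecB.
        return (chunk % 2 == 0) ? a[src] : b[src];
    }
};

// Hypothetical helper: builds vecC from vecA and vecB with batch size len.
thrust::device_vector<int> merge_batches(const thrust::device_vector<int>& vecA,
                                         const thrust::device_vector<int>& vecB,
                                         int len) {
    thrust::device_vector<int> vecC(vecA.size() + vecB.size());
    interleave_batches op{thrust::raw_pointer_cast(vecA.data()),
                          thrust::raw_pointer_cast(vecB.data()),
                          len};
    // tabulate writes op(i) into vecC[i] for every index i.
    thrust::tabulate(vecC.begin(), vecC.end(), op);
    return vecC;
}
```

Calling merge_batches(vecA, vecB, 3) on the toy vectors should reproduce the vecC shown above. Each output element is written exactly once and each input element is read exactly once, so the cost should be dominated by memory bandwidth. For timing, it may be worth wrapping the call in cudaEvent timers and running it on much larger vectors than the toy example.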