Memcpy thrust vector to cuda mapped opengl vertex buffer?

Greetings All,

I am still learning the ropes with CUDA and with parallel thinking in general, so I apologize if this is common knowledge. Currently, I have a huge OpenGL vertex buffer that is filled by CUDA kernels (marching cubes, to be precise). All is great except that I have millions of vertices, so I am trying to simplify them while the source array stays on the GPU. My question: is it possible to copy a thrust vector into a mapped vertex buffer, and if so, how?

My snippet of the process:

//vertexArray created with GL_FLOAT, 3, cudaGraphicsMapFlagsWriteDiscard, GL_STREAM_DRAW
        pangolin::CudaScopedMappedPtr mappedVertexArray(*vertexArray);

        // ... generate mesh ....

        std::cout << "Original vertex count: " << totalVertexCount << std::endl;

        // pointer to hold cuda mapped vertex array
        float3* devPtr;
        size_t size;

        // set the pointer
        cudaGraphicsResourceGetMappedPointer( (void**)&devPtr, &size, mappedVertexArray.res);

        // wrap the mapped device pointer for thrust (no host copy is made), then view it as tuples
        thrust::device_ptr<float3> thrustPtr = thrust::device_pointer_cast(devPtr);
        thrust::device_ptr<thrust::tuple<float,float,float>> d_mcVertexTupleArr((thrust::tuple<float,float,float>*)thrust::raw_pointer_cast(thrustPtr));

        // create a working copy of the vertices from the mapped buffer
        thrust::device_vector<thrust::tuple<float,float,float>> d_verticesToCompact(d_mcVertexTupleArr, d_mcVertexTupleArr + totalVertexCount);

        // sort
        thrust::sort(d_verticesToCompact.begin(), d_verticesToCompact.end());

        //delete duplicates
        d_verticesToCompact.erase(thrust::unique(d_verticesToCompact.begin(), d_verticesToCompact.end()), d_verticesToCompact.end());

        //get indices
        thrust::lower_bound(d_verticesToCompact.begin(), d_verticesToCompact.end(), d_mcVertexTupleArr, d_mcVertexTupleArr + totalVertexCount, d_mcIndexArr->begin());

        thrust::device_ptr<core::float3> compactedVertices((core::float3*)thrust::raw_pointer_cast(;

        //how to copy to vertex buffer?
        //d_mcVertexArr->assign(compactedVertices, compactedVertices + d_verticesToCompact.size());

        totalVertexCount = d_verticesToCompact.size();

    std::cout << "Final vertex count: " <<  totalVertexCount << std::endl;
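
For anyone who wants to sanity-check the sort / unique / lower_bound sequence above on the CPU first, here is a minimal host-side sketch. It swaps in the std:: algorithms for the thrust:: ones, so it is only an analogue and not the GPU code itself. The names Vertex, WeldResult and weld are hypothetical and do not come from my project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical host-side vertex type. std::tuple provides the lexicographic
// operator< and operator== that sort, unique and lower_bound rely on.
using Vertex = std::tuple<float, float, float>;

struct WeldResult {
    std::vector<Vertex> unique_vertices;  // sorted, duplicates removed
    std::vector<std::size_t> indices;     // one index per input vertex
};

// Host analogue of the sort / unique / lower_bound steps in the snippet above.
WeldResult weld(const std::vector<Vertex>& input) {
    WeldResult result;
    result.unique_vertices = input;  // working copy, the input stays untouched

    // Sort so that equal vertices become adjacent.
    std::sort(result.unique_vertices.begin(), result.unique_vertices.end());

    // Drop adjacent duplicates.
    result.unique_vertices.erase(
        std::unique(result.unique_vertices.begin(), result.unique_vertices.end()),
        result.unique_vertices.end());

    // For every original vertex, find its position in the compacted list.
    result.indices.reserve(input.size());
    for (const Vertex& v : input) {
        auto it = std::lower_bound(result.unique_vertices.begin(),
                                   result.unique_vertices.end(), v);
        // Every input vertex must be present in the compacted list.
        assert(it != result.unique_vertices.end() && *it == v);
        result.indices.push_back(
            static_cast<std::size_t>(it - result.unique_vertices.begin()));
    }
    return result;
}
```

Feeding it a small list with a known duplicate and checking that unique_vertices[indices[i]] equals the i-th input vertex gives a reference result to compare the GPU output against.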

You already have devPtr; that is the destination pointer.
Get the raw pointer from the thrust pointer or vector that holds the source data. I think it is this puppy:

float3 *rawPtr = thrust::raw_pointer_cast(;


cudaMemcpy(devPtr, rawPtr, d_verticesToCompact.size()*sizeof(float3), cudaMemcpyDeviceToDevice);

You should also be able to use thrust::copy (or, as here, thrust::copy_n):

thrust::copy_n(d_verticesToCompact.begin(), d_verticesToCompact.size(), thrustPtr);

Thanks for the response, txbob; thrust::copy_n is surely handy. The first case makes sense to me, but I am curious about the usage of copy: can thrust understand the translation between a list of tuple<float, float, float> and a float*? The vertex buffer has no interleaving, so it is just a big array of floats.

I typed this stuff into the browser, so my code is not tested. If you need to do some data swizzling during the copy, neither approach will do that. You should organize the data in your thrust buffer in exactly the order you want in the destination buffer.

If you do that, the cudaMemcpy should certainly work.
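
To illustrate what "the same order" means, here is a minimal host-side sketch, also untested against your setup. It assumes a hypothetical Vec3 struct as a stand-in for float3, and uses std::memcpy in place of the cudaMemcpy device-to-device call. It says nothing about how thrust::tuple<float,float,float> is laid out; that would need to be checked separately.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CUDA's float3: three floats, x then y then z.
struct Vec3 {
    float x, y, z;
};

// A raw byte copy is only valid if the struct is exactly three packed floats.
static_assert(sizeof(Vec3) == 3 * sizeof(float),
              "Vec3 must be three tightly packed floats");
static_assert(std::is_trivially_copyable<Vec3>::value,
              "Vec3 must be safe to copy byte-for-byte");

// Copies `count` vertices into a flat x,y,z,x,y,z,... float buffer.
// `dst` must have room for at least 3 * count floats.
void copy_to_flat(const Vec3* src, std::size_t count, float* dst) {
    std::memcpy(dst, src, count * sizeof(Vec3));
}

// Convenience wrapper that allocates the flat destination buffer.
std::vector<float> flatten(const std::vector<Vec3>& vertices) {
    std::vector<float> flat(vertices.size() * 3);
    if (!vertices.empty()) {
        copy_to_flat(vertices.data(), vertices.size(), flat.data());
    }
    return flat;
}
```

The static_asserts are the point of the sketch: if the source element type is not exactly three packed floats in x, y, z order, a plain byte copy will not produce the layout the vertex buffer expects.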

I had not paid close attention to the usage of thrust::tuple<float,float,float> instead of float3. I’m not sure whether that would compile; I would need to play with it. You appear to be short-circuiting thrust by doing pointer casting. I’m not sure why, what you’re up to with that, or whether it would work. If your pointer casting between float3 and tuple vectors/device pointers works (i.e. gives the right result), then the cudaMemcpy method should work, and you could probably work some similar ugliness to make the copy_n method work.

I was assuming these examples would move you in the right direction. Perhaps not.

I generally don’t advertise my responses as tested code unless OP provides a MCVE, which you have not, or unless it’s obvious from my example that it represents a fully worked test case, which my answer does not.