Pinned memory and std::vector

mikeheck · September 11, 2009, 11:10pm

Pinned memory is wonderful. For large data sets it reduces the data transfer from being the only significant bottleneck to merely a big bottleneck. :-)

That’s all well and good for a “lab” demonstration/benchmark where we can assume the data is optimally arranged for GPU computing. What happens when we want to integrate this with production code and the application’s data is stored in, for example, a std::vector?

The gap between application memory and pinned memory is a general problem. Obviously we can allocate some pinned memory, copy the data into it, and transfer to the GPU from the pinned buffer. Conveniently we can get a float* pointer to a vector’s contiguous storage, so at least the host-side copy is efficient in this case. But the copy is still “lost time”, and we may not have spare memory to hold a second copy of the data.

A general solution to the general problem is to modify the application to (optionally) use pinned memory for data that we’ll need to move to the GPU. Whether that makes sense is a whole discussion in itself. The question here is: is it even feasible if the application uses std::vector?

It looks like (in theory) STL should allow this by implementing a custom “allocator” class, but it also looks complex and tricky to get right. Has anyone tried it?

Thanks,
Mike

JaredHoberock · September 11, 2009, 11:43pm

Yes, there’s a pinned memory allocator in Thrust. Try using host_vector with pinned_allocator. I haven’t tried the allocator with std::vector, but I imagine it will work.

You can see this thread for details.
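
For readers wondering what such an allocator might look like with a plain std::vector, here is a minimal sketch. It is not Thrust's implementation. It assumes a C++11 or later compiler (older standard libraries need more allocator boilerplate such as rebind), and the names PinnedAllocator, pinned_vector, make_ramp and the build flag PINNED_USE_CUDA are all made up for illustration. With the flag defined, storage comes from cudaMallocHost and is released with cudaFreeHost; without it, the sketch falls back to malloc so it still compiles on a machine without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

#ifdef PINNED_USE_CUDA          // hypothetical build flag, not a CUDA macro
#include <cuda_runtime.h>
#endif

// Minimal C++11 allocator. With PINNED_USE_CUDA defined it hands out
// page-locked (pinned) host memory; otherwise it falls back to malloc.
template <typename T>
struct PinnedAllocator {
    using value_type = T;

    PinnedAllocator() noexcept = default;
    template <typename U>
    PinnedAllocator(const PinnedAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        std::size_t bytes = n * sizeof(T);
        if (bytes == 0) bytes = 1;  // always return a valid, freeable pointer
        void* p = nullptr;
#ifdef PINNED_USE_CUDA
        if (cudaMallocHost(&p, bytes) != cudaSuccess) throw std::bad_alloc();
#else
        p = std::malloc(bytes);
        if (p == nullptr) throw std::bad_alloc();
#endif
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
#ifdef PINNED_USE_CUDA
        cudaFreeHost(p);
#else
        std::free(p);
#endif
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const PinnedAllocator<T>&, const PinnedAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const PinnedAllocator<T>&, const PinnedAllocator<U>&) noexcept {
    return false;
}

// Drop-in alias: same interface as std::vector<T>, different storage.
template <typename T>
using pinned_vector = std::vector<T, PinnedAllocator<T>>;

// Small helper for illustration: fills a vector with 0, 1, ..., n-1.
inline pinned_vector<float> make_ramp(std::size_t n) {
    pinned_vector<float> v(n);
    assert(n == 0 || v.data() != nullptr);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<float>(i);
    return v;
}
```

In a CUDA build, v.data() from a pinned_vector<float> could then be passed straight to cudaMemcpy or cudaMemcpyAsync with no staging copy. Two caveats: pinned_vector<T> is a different type from std::vector<T>, so existing interfaces that take std::vector<T>& would need to change, and every reallocation on growth pins a fresh block, so calling reserve() up front is probably worthwhile.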

mikeheck · September 11, 2009, 11:50pm

Sweet! Thrust sounds awesome. It’s already on my list of things to investigate after optimizing the current CUDA kernel.

-Mike
