cudaMallocHost and alignment question

Maddy_Scientist · January 20, 2009, 1:48pm

Hi,

I’ve noticed that a significant bottleneck in my application is the packing and unpacking, on the CPU side, of the data to be sent and received. The bottleneck is not the transfer over the PCIe bus but the reformatting that takes place when copying between a CPU array and a GPU-formatted array held in pinned memory (from where it is then copied to / from the GPU device memory).

I intend to accelerate my packing / unpacking routines by rewriting them using SSE intrinsics (float4 manipulation seems especially well suited to this). However, for this to work I require that the pinned memory is 16-byte aligned. So my question is: is memory allocated using cudaMallocHost 16-byte aligned? If not, how can I ensure that it is?

Cheers.

AlfredDube · January 20, 2009, 5:09pm

I believe that page-locked memory will be aligned to 4 kB page boundaries (that’s what I’ve seen so far), so you should be safe if your code requires alignment to 16 bytes.
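
Since the 4 kB alignment above is an observation rather than a stated guarantee, it may be worth checking the pointer at runtime before using aligned SSE loads and stores. Below is a minimal sketch in plain C. The helper names `is_aligned` and `align_up` are made up for illustration and are not part of CUDA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of `alignment` bytes.
   `alignment` must be a non-zero power of two (e.g. 16 for SSE). */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical fallback: rounds ptr up to the next `alignment` boundary.
   The caller must over-allocate by at least (alignment - 1) bytes and
   keep the original pointer for freeing. */
static void *align_up(void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t mask = (uintptr_t)(alignment - 1);
    return (void *)((p + mask) & ~mask);
}
```

In use, one would call `is_aligned(h_buf, 16)` on the pointer returned by `cudaMallocHost((void **)&h_buf, bytes)` and only take the SSE path when it returns 1. If the check ever failed, allocating 15 extra bytes and passing the result through `align_up` would give a 16-byte-aligned working pointer, while the original pointer is still the one to hand to `cudaFreeHost`.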
