Is cuda::std::aligned_storage aligned at the given alignment?

Hi experts, I ran into a crash in a kernel I wrote. After some debugging I found that a cuda::std::aligned_storage<24, 8>, which I expected to be aligned to 8 bytes, actually has an alignment of 1.
So I wrote the simple kernel below to verify my assumption:

#include <cuda/std/atomic>
#include <stdio.h>
#include <type_traits>

struct S2 {
  struct alignas(8) S {
    unsigned char v[8];
  };
};

__global__ void kernel() {
  __shared__ int a;
  __shared__ std::aligned_storage<24, 8> buf;
  printf("%lu\n", alignof(buf));
  __shared__ S2 ss;
  printf("%lu\n", alignof(ss));
}

int main() {
  kernel<<<1, 1>>>();
  cudaDeviceSynchronize();  // wait for the kernel so the device printf output is flushed
  return 0;
}
The output I get is 1, 1. So what’s the issue here?

When posting code on these forums, please format the code properly. You can edit your post by clicking the pencil icon beneath it, then select the code, then click the </> button at the top of the edit window, then save your changes.

Your question in fact has multiple layers. I’ll try to address them one by one.

1. Nested class
The name of a nested class is in the scope of the enclosing class. This is the only thing important about the nested class. It does not change the memory layout or alignment requirement of the enclosing class. For example:

struct X1 {
    int p;
    struct A {
        double q;
    };
};

sizeof(X1) is 4 and alignof(X1) is 4.

However, this is different:

struct X2 {
    int p;
    struct A {
        double q;
    } a; // --> Note this part
};

Now that A a is a data member of X2, sizeof(X2) is 16 and alignof(X2) is 8.
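
If you want to check this yourself, here is a minimal compile-time sketch of the two cases. It assumes a typical 64-bit ABI (4-byte int, 8-byte double with 8-byte alignment); on other targets the numbers may differ.

```cpp
// Assumption: typical 64-bit ABI (4-byte int, 8-byte double aligned to 8).

struct X1 {
    int p;
    struct A { double q; };    // nested type only, no data member of type A
};

struct X2 {
    int p;
    struct A { double q; } a;  // nested type AND a data member of that type
};

// The nested type alone does not affect the enclosing class.
static_assert(sizeof(X1) == 4 && alignof(X1) == 4,
              "X1 contains only the int");

// The member a pulls in the size and alignment of double (plus padding after p).
static_assert(sizeof(X2) == 16 && alignof(X2) == 8,
              "X2 contains an int, padding, and a double");
```

Both assertions are evaluated at compile time, so a successful build is the whole check.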

2. Empty class size
The size of any object or member subobject is required to be at least 1 even if the type is an empty class type. This means:

struct X3 {};

Both sizeof(X3) and alignof(X3) are 1. This is what happened with your second printf: S2 is an empty class (S is only a nested class, not a data member), so its alignment requirement is 1.
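
A small sketch of that second case, using the S2 from the question. S2WithMember is a hypothetical variant I made up to show the contrast, and the value 1 for the empty class is what mainstream compilers produce.

```cpp
// S2 as in the question: S is only declared inside S2, never instantiated.
struct S2 {
    struct alignas(8) S {
        unsigned char v[8];
    };
};

// Hypothetical variant (not in the original post) that stores an S.
struct S2WithMember {
    S2::S s;
};

// S2 has no data members, so it is an empty class.
static_assert(sizeof(S2) == 1 && alignof(S2) == 1,
              "empty class: size and alignment are 1 on mainstream compilers");

// The nested type itself does carry the requested alignment...
static_assert(alignof(S2::S) == 8, "alignas(8) applies to S");

// ...and so does any class that actually contains an S.
static_assert(alignof(S2WithMember) == 8, "the member propagates its alignment");
```

So declaring `__shared__ S2::S ss;` (or giving S2 a member of type S) would be one way to get the 8-byte alignment the kernel expects.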

3. std::aligned_storage
I’m not an expert in libcu++, but from the standpoint of C++, the way you use std::aligned_storage for an aligned buffer is incorrect. It should be:

std::aligned_storage<Len,Align>::type buf; // C++11
std::aligned_storage_t<Len,Align> buf; // since C++14

A sample implementation of std::aligned_storage is given here. The aligned_storage struct itself has only a nested type (named type) and no data members, so it is an empty class whose size and alignment requirement are both 1. This is what happened with your first printf.
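
To make the difference concrete, here is a simplified sketch modeled on such a sample implementation. sketch_aligned_storage is a hypothetical name, and this is not the actual libcu++ or standard library code, only the same shape: an outer struct whose only content is a nested type.

```cpp
#include <cstddef>

// Hypothetical stand-in for std::aligned_storage, for illustration only.
template <std::size_t Len, std::size_t Align>
struct sketch_aligned_storage {
    // The aligned bytes live in the nested type, not in the outer struct.
    struct type {
        alignas(Align) unsigned char data[Len];
    };
};

// The outer struct is an empty class: this is what the kernel in the question measured.
static_assert(alignof(sketch_aligned_storage<24, 8>) == 1,
              "outer wrapper has no data members");

// The nested type is the actual buffer with the requested size and alignment.
static_assert(alignof(sketch_aligned_storage<24, 8>::type) == 8,
              "::type carries the alignment");
static_assert(sizeof(sketch_aligned_storage<24, 8>::type) == 24,
              "::type carries the storage");
```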

Nit: std::aligned_storage is deprecated in C++23. I guess libcu++ will take this into account and eventually deprecate it too. Consider simply implementing your own aligned buffer.
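
One possible shape for such a buffer is sketched below. AlignedBuffer, Payload and construct_in are hypothetical names I picked for illustration, and Payload assumes 8-byte doubles; treat it as a starting point rather than a drop-in replacement.

```cpp
#include <cstddef>
#include <new>

// Hypothetical aligned byte buffer: the alignment is on the data member,
// so the struct itself has the requested size and alignment.
template <std::size_t Len, std::size_t Align>
struct AlignedBuffer {
    alignas(Align) unsigned char data[Len];
};

// Hypothetical payload type to store in the buffer (assumes 8-byte double).
struct Payload {
    double a, b, c;
};

static_assert(alignof(AlignedBuffer<24, 8>) == 8, "buffer is 8-byte aligned");
static_assert(sizeof(AlignedBuffer<24, 8>) == 24, "buffer holds 24 bytes");
static_assert(sizeof(Payload) <= sizeof(AlignedBuffer<24, 8>) &&
              alignof(Payload) <= alignof(AlignedBuffer<24, 8>),
              "Payload fits into the buffer");

// Construct a Payload inside the raw storage with placement new.
inline Payload* construct_in(AlignedBuffer<24, 8>& buf) {
    return new (buf.data) Payload{1.0, 2.0, 3.0};
}
```

Unlike with aligned_storage, there is no separate ::type to forget here: the variable you declare is the buffer.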

4. alignof operator
To conform to the C++ standard, you should pass a type rather than an object to alignof(T); applying alignof to an object is a compiler extension. Consider using one of the following instead:

alignof(decltype(ss)); // C++
__alignof(ss); // CUDA built-in
__alignof(S2); // CUDA built-in
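
A minimal sketch of the standard-conforming form. S and s_obj are example names, not taken from the code in the question.

```cpp
// Example type and object, for illustration only.
struct alignas(8) S {
    unsigned char v[8];
};

S s_obj;

// Standard C++: alignof takes a type...
static_assert(alignof(S) == 8, "query by type name");

// ...so for an object, go through decltype to get its type first.
static_assert(alignof(decltype(s_obj)) == 8, "query by object via decltype");
```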

Your responses have been incredibly helpful and informative. Thank you for sharing your knowledge with me.

One last thing: Since the question is about using libcu++, the code should be modified to the following (although this would yield the same PTX as the one using the standard library).

#include <cuda/std/type_traits>

__shared__ cuda::std::aligned_storage_t<24, 8> buf;