'misaligned address" exception when rendering some glTF2 models

I get this exception after trying to render some but not all glTF models. In my project, I get the exception when syncing the RaygenRecord, which doesn’t make any sense since the RaygenRecord doesn’t change from model to model.

29.992512 CRTICAL [12364 ActiveOptix7.cpp->ActiveOptix7::renderNextFrame:165] CUDA call (cudaMemcpy( reinterpret_cast<void*>(sbt.raygenRecord), &raygenRecord, sizeof(RaygenRecord), cudaMemcpyHostToDevice ) ) failed with error: ‘misaligned address’

My RaygenRecord is just the camera vectors

struct RayGenData
{
	float3 cam_eye;
	float3 camera_u, camera_v, camera_w;
};

using RaygenRecord = SbtRecord<RayGenData>;

Some of the models that fail for me do work in the OptixSamples->optixMeshViewer but here’s one that also causes a 'misaligned address" exception in optixMeshViewer.

Can someone please tell what’s happening here and how I track down the cause of these exceptions when they occur?

That misaligned address case is probably a known issue.
The OptiX 7 examples do not handle some glTF vertex attribute combinations correctly.

OpenGL doesn’t have the same vector alignment restrictions like CUDA and, for example, you cannot map an interleaved vertex array with a per vertex structure like { float3 vertex; float2 texcoord;} directly to CUDA because the float2 will be misaligned, same for float4.

I would expect that your failing models are doing exactly that.
The vertex attribute data would need to be remapped to a properly aligned CUDA structure to fix this.

From the description it’s not possible to tell if the other misaligned address is related to that.
I always manually align all my device structure members to their required CUDA alignment inside a structure and if I use a structure inside an array, pad its size manually to the maximum required alignment as well.

Please always provide this minimal required system configuration information when asking about OptiX issues:
OS version (Windows: issue winver in a cmd prompt), installed GPU(s), display driver version, OptiX version (major.minor.micro), CUDA compiler version used to generate the input PTX, host compiler version.

Thanks for the reply!

Here’s my info

Windows10 Home
Version 1809(OS Build 17763.805)

Geforce RTX 2070 Super
Driver version 436.48

OPTIX_VERSION 70000

CUDA version 10.1

Microsoft Visual Studio Community 2019
Version 16.3.3


OpenGL doesn’t have the same vector alignment restrictions like CUDA and, for example, you cannot map an interleaved vertex array with a per vertex structure like { float3 vertex; float2 texcoord;} directly to CUDA because the float2 will be misaligned, same for float4.

In the optixMeshViewer sample, glTF provides a Buffer of raw interleaved mesh data that the optixMeshViewer saves to device and then provides access to the device data via BufferView objects. That seems like a very efficient way to do things and it works well for about 90% of the models I’ve tried I’ve stepped through some of the gltf meshes that fail and can’t see where the error is.

From the description it’s not possible to tell if the other misaligned address is related to that.
In my project, the 'mis-aligned address" error seems to be triggered when I update my camera matrices on the device. This makes no sense since this bit of code is independent of any model I’ve loaded. I found one possible explanation in another thread that suggests the the the misaligned address error is coming from a previous kernel call. But when I remove the code that syncs the host camera data to the device then the ‘mis-aligned address’ error goes away

https://devtalk.nvidia.com/default/topic/1042740/how-to-solve-error-of-misaligned-address-/

I still can’t get to the bottom of this and it’s kinda of show stopper my project

This is the simplest model I can find that fails both in my project and in the optixMeshViewer sample

There’s 13 meshes in the scene but just reading the 1st one will trigger the misaligned address exception

The Cuda Programming Guide says

Global memory instructions support reading or writing words of size equal to 1, 2, 4, 8, or 16 bytes. Any access (via a variable or a pointer) to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e., its address is a multiple of that size).

Here’s the code I use to create the buffer views for the indices and vertex attributes

The debug output shows that the buffer view addresses of the indices, vertices, normals, and textcoords are all evenly divisible by the size of the data type

7.349754 DEBUG [1120 OptixMesh.cpp->OptixMesh::init:108] Indices start: 47358689280::0
7.350071 DEBUG [1120 OptixMesh.cpp->OptixMesh::init:119] Positions start: 47358697500::0
7.352428 DEBUG [1120 OptixMesh.cpp->OptixMesh::init:130] Normals start: 47358703728::0
7.352767 DEBUG [1120 OptixMesh.cpp->OptixMesh::init:145] UV start: 47358709956::0
7.353199 DEBUG [1120 OptixAccel.cpp->OptixAccel::rebuildSceneAccel:84] REBUILDING
7.472592 CRTICAL [1120 ActiveOptix7.cpp->ActiveOptix7::renderNextFrame:165] CUDA call (cudaMemcpy((void*)t, devicePtr, count * sizeof(T), cudaMemcpyDeviceToHost) ) failed with error: ‘misaligned address’ (D:\ActiveWorks\Code\ActiveBerserko\framework\optix7_core\excludeFromBuild\OptixComponents.h:179)

Is there any way to get any more information about where the problem lies? Will Nsight help in this case?

I got some helpful info using the cuda-memcheck tool. Here’s the output from running the optixMeshViewer loading the bench plane gltf file.

My project doesn’t give the me the misaligned address in _closesthit__radiance. But it does report Invalid range on access by cudaMemcopy source. The optixMeshViewer sample reports this too. Can someone tell me how to find out what’s causing this error?

========= CUDA-MEMCHECK
========= Host API memory access error at host access to 0xb03e08238 of size 124 bytes
========= Invalid range on access by cudaMemcopy source.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\system32\nvcuda.dll (cuMemcpyDtoDAsync_v2 + 0x17b) [0x1bd1eb]
========= Host Frame:C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_827405c7c65146ab\nvrtum64.dll (rtcGetExportTable + 0x8e909) [0xad769]
========= Host Frame:C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_827405c7c65146ab\nvrtum64.dll (rtcGetExportTable + 0x8fc09) [0xaea69]
========= Host Frame:C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_827405c7c65146ab\nvrtum64.dll (rtcGetExportTable + 0x878b0) [0xa6710]
========= Host Frame:C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_827405c7c65146ab\nvoptix.dll (optixQueryFunctionTable + 0x1aa7eb) [0x2480fa]
========= Host Frame:C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_827405c7c65146ab\nvoptix.dll (optixQueryFunctionTable + 0x1ab81d) [0x24912c]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (optixLaunch + 0x6f) [0x76dff]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (launchSubframe + 0x216) [0x64116]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (main + 0x76c) [0x6518c]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (invoke_main + 0x39) [0x77df9]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (__scrt_common_main_seh + 0x12e) [0x77c9e]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (__scrt_common_main + 0xe) [0x77b5e]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (mainCRTStartup + 0x9) [0x77e89]
========= Host Frame:C:\Windows\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x17974]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6a271]

========= Misaligned Shared or Local Address
========= at 0x00000990 in OPTIX/generated/generated:280:__closesthit__radiance_0x619ce22fc25c8f1d_ss_0
========= by thread (0,0,0) in block (55,35,0)

========= Program hit cudaErrorMisalignedAddress (error 716) due to “misaligned address” on CUDA API call to cudaDeviceSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\system32\nvcuda.dll (cuProfilerStop + 0x115952) [0x2e33c2]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (cudaDeviceSynchronize + 0xf8) [0x9168]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (launchSubframe + 0x333) [0x64233]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (main + 0x76c) [0x6518c]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (invoke_main + 0x39) [0x77df9]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (__scrt_common_main_seh + 0x12e) [0x77c9e]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (__scrt_common_main + 0xe) [0x77b5e]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (mainCRTStartup + 0x9) [0x77e89]
========= Host Frame:C:\Windows\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x17974]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6a271]

========= Program hit cudaErrorMisalignedAddress (error 716) due to “misaligned address” on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\system32\nvcuda.dll (cuProfilerStop + 0x115952) [0x2e33c2]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (cudaGetLastError + 0xf1) [0xba81]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (launchSubframe + 0x338) [0x64238]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (main + 0x76c) [0x6518c]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (invoke_main + 0x39) [0x77df9]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (__scrt_common_main_seh + 0x12e) [0x77c9e]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (__scrt_common_main + 0xe) [0x77b5e]
========= Host Frame:C:\ProgramData\NVIDIA Corporation\OptiX SDK 7.0.0\SDK\build\bin\Debug\optixMeshViewer.exe (mainCRTStartup + 0x9) [0x77e89]
========= Host Frame:C:\Windows\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x17974]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6a271]

========= ERROR SUMMARY: 4 errors

But it does report Invalid range on access by cudaMemcopy source. The optixMeshViewer sample reports this too. Can someone tell me how to find out what’s causing this error?

https://devtalk.nvidia.com/default/topic/1056448/optix/cuda-memcheck-errors-from-sdk-samples-ignore-/

Hi Bird33, have you tried catching a call stack in Nsight VSE?

The Nsight tools all know a bit more about OptiX than cuda-memcheck does.

So your gltf file does crash an unmodified optixMeshViewer? Can you share the minimal .gltf file with us?


David.

Hi David,

I haven’t learned how to use Nsight yet. I guess it’s time to dig in. :) Is the video mentioned here still in the works?
https://devtalk.nvidia.com/default/topic/1061831/optix/optix-debugging-and-profiling-tools/

The simplest, cleanest model that I’ve found that causes the 'misaligned address" exception is here

It does crash in an unmodified version of optixMeshViewer. The file has 13 meshes but to simplify a bit you can change line 775 in Scene.cpp so that only the first of the 13 meshes is parsed, since it will throw the exception.

//  for(size_t i=0; i<m_meshes.size(); ++i)
for (size_t i = 0; i < 1; ++i)

Thanks for the help!

Yes we’re still planning to release a video at some point, but it wouldn’t necessarily help in this case. I recommend just diving in, install Nsight VSE and fire up the debugger. I hope that you can run with Nsight VSE, catch the exception happening, and get a CUDA call stack, without really having to learn much about Nsight. If you turn on debug symbols & lineinfo, and you don’t get a call stack right away, then don’t spend too much time fighting with it. I will try it on my end with the model you linked.


David.

Oh @Bird33, I can repro this issue, and this is expected behavior. I didn’t read through Detlef’s response carefully enough, but he answered this completely.

The issue is that not all of GTLF’s binary layouts are automatically CUDA compatible. The interleaved vertex data formats don’t meet all of CUDA’s requirements.

In order to fix this problem, you’ll need to find or write a tool to convert the GLTF model so that the internal structures are aligned internally, it’s not enough for the base pointer to meet alignment requirements. This means unpacking the vertex data and repacking it so that it’s CUDA compatible. The GLTF loader we use in the OptiX SDK examples doesn’t do that, it just opens the file and sends the data directly to the GPU. So unfortunately, the OptiX7 optixMeshViewer will not work with all GLTF models.

I hope that helps clarify. You don’t need to worry about finding a call stack. Your options are to either exclude models that have this problem, or see if you can find or write a program to re-pack GLTF models with valid CUDA alignments.


David.

Hi David,

Thanks for taking the time to check out the model and clarify what’s happening in the optixMeshViewer sample.

In my own project, I have already written a tool that unpacks the possibly interleaved vertex data and repacks it into a raw binary buffer so that indices and vertex data are separate contiguous blocks that are aligned to be CUDA compatible. It’s this buffer that I pass to the GPU, not the one provided by tinyglTF that the optixMeshViewer sample uses.

A few posts back, I showed debugging output that indicates that the starting addresses of the vertex attributes on the GPU are evenly divisible by the size of the data type, which AFAIK is what CUDA requires. I guess I’m probably missing something there.

Thanks for the help!

Hello.

It is not as simple as being aligned to data type boundaries. OptiX has helper macros in optix_7_types.h to help you align correctly. There is also some discussion of this in the optix programming guide (where it does state that accel build buffers must be 128byte aligned). Here are the macros for reference:

/// Alignment requirement for device pointers in OptixShaderBindingTable.
#define OPTIX_SBT_RECORD_ALIGNMENT 16ull

/// Alignment requirement for output and temporay buffers for acceleration structures.
#define OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT 128ull

/// Alignment requirement for OptixBuildInputInstanceArray::instances.
#define OPTIX_INSTANCE_BYTE_ALIGNMENT 16ull

/// Alignment requirement for OptixBuildInputCustomPrimitiveArray::aabbBuffers and OptixBuildInputInstanceArray::aabbs.
#define OPTIX_AABB_BUFFER_BYTE_ALIGNMENT 8ull

/// Alignment requirement for OptixBuildInputTriangleArray::preTransform
#define OPTIX_GEOMETRY_TRANSFORM_BYTE_ALIGNMENT 16ull

/// Alignment requirement for OptixStaticTransform, OptixMatrixMotionTransform, OptixSRTMotionTransform.
#define OPTIX_TRANSFORM_BYTE_ALIGNMENT 64ull

I also happened to crash this sample with an gltf model downloaded from sketchfab.
https://sketchfab.com/3d-models/map-gta5-f622784b2fa9453fb20821afb74a9cb6

I’ve fixed the crash by replacing float2 with

struct myfloat2
{
float x, y;
};

Hope it helps.