Now I’m trying to use the clusters API, specifically to build CLASes from triangles.
However, I get one of the following errors when calling optixClusterAccelBuild:
[ 2][VALIDATION_ERROR]: [CLUSTER_OP_VERTEX_COUNT_OUT_OF_BOUNDS] Cluster vertex count in cluster-operation argument exceeds the per-cluster maximum found in the cluster-operation description.
argument index: 0
vertex count: 166
maximum vertex count: 82
current operation:
type: build
object type: CLAS
object address: 0x13034e0a0
or
[ 2][VALIDATION_ERROR]: [CLUSTER_OP_VERTEX_INDEX_OUT_OF_BOUNDS] Vertex index in cluster exceeds the maximum found in the cluster-operation description.
argument index: 0
primitive index: 0
vertex index: 15656
vertex count: 82
current operation:
type: build
object type: CLAS
object address: 0x13022e0a00
Both come from the same program with the same input, but on different GPUs (an RTX 4080 and a 2080 Super, respectively).
The second one is a bit easier to understand: the index buffer pointed to by argument 0 does indeed contain the vertex index 15656.
Do vertex indices in each cluster need to be less than OptixClusterAccelBuildInputTrianglesArgs::vertexCount?
(In other words, does the vertex buffer need to be cluster-local?)
I had assumed that the value range of vertex indices could be arbitrarily large, since OptixClusterAccelBuildInputTrianglesArgs has the field indexFormat. Is that not correct?
In addition to this, I noticed that the optixClusterUnstructuredMesh sample sets OptixClusterAccelBuildInputTrianglesArgs::indexBufferStrideInBytes to 4. Is 4 correct, and not 12 (4 bytes × 3 verts per tri)?
Yes! Vertices in the cluster API are cluster-local, so you’ll need a (logically) separate vertex buffer and index buffer for each cluster, and the indices will need to be relative to just that cluster. This is a good question; it looks like we didn’t explain this clearly anywhere in the programming guide, the SDK samples, or the header files. I’m making a note to add some explanation on indexing to our documentation.
So the bigger question here is how to cluster-ify large meshes, since you need to transcode your vertex and index buffers. You can do it yourself if you want, but we provide some examples and tools for it, and FYI the OptiX SDK sample’s data file (“duck_clustered.gltf”) was created using some of the Vulkan-based clustering tools that we provide, specifically vk_animated_clusters. There’s an associated blog post that describes some of the tools we offer. There is also a lower level C++ library called nv_cluster_builder for clustering triangle meshes, if you want to write code to produce clusters at run time rather than pre-baking clustered geometry.
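To make the cluster-local requirement concrete, here is a minimal CPU-side sketch of doing the transcoding yourself, assuming a plain global xyz float array and a uint32_t triangle index list. The names (Cluster, buildClusters, maxTrisPerCluster) are mine and not from the SDK or the tools mentioned above, and the naive “every N consecutive triangles” split is just for illustration:

#include <cstdint>
#include <unordered_map>
#include <vector>

// Split a globally indexed triangle list into clusters of at most
// maxTrisPerCluster triangles, remapping each cluster's indices so they are
// local to that cluster's own vertex buffer (hypothetical helper, not SDK code).
struct Cluster
{
    std::vector<float>    vertices;  // xyz triplets, cluster-local
    std::vector<uint32_t> indices;   // 3 per triangle, each < vertices.size() / 3
};

std::vector<Cluster> buildClusters( const std::vector<float>&    globalVertices,  // xyz triplets
                                    const std::vector<uint32_t>& globalIndices,   // 3 per triangle
                                    size_t                       maxTrisPerCluster )
{
    std::vector<Cluster> clusters;
    for( size_t tri = 0; tri < globalIndices.size() / 3; )
    {
        Cluster cluster;
        std::unordered_map<uint32_t, uint32_t> globalToLocal;
        for( size_t t = 0; t < maxTrisPerCluster && tri < globalIndices.size() / 3; ++t, ++tri )
        {
            for( int v = 0; v < 3; ++v )
            {
                uint32_t g  = globalIndices[tri * 3 + v];
                auto     it = globalToLocal.find( g );
                if( it == globalToLocal.end() )
                {
                    // First use of this vertex in this cluster: copy it and assign a local index.
                    uint32_t local = static_cast<uint32_t>( cluster.vertices.size() / 3 );
                    cluster.vertices.insert( cluster.vertices.end(),
                                             globalVertices.begin() + g * 3,
                                             globalVertices.begin() + g * 3 + 3 );
                    it = globalToLocal.emplace( g, local ).first;
                }
                cluster.indices.push_back( it->second );  // cluster-local index
            }
        }
        clusters.push_back( std::move( cluster ) );
    }
    return clusters;
}

A scheme this simple only bounds the per-cluster vertex count at 3 × maxTrisPerCluster, so you would still have to pick the triangle count small enough to stay under the API's per-cluster limits; the clustering tools and the nv_cluster_builder library mentioned above do a much better job of producing compact, spatially coherent clusters.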
Anyway, in the OptiX API, the cluster indices currently cannot be arbitrarily large, but there are other reasons for the indexFormat enum to allow 16 and 32 bit values. This allows data compatibility with the non-cluster BVH builder, and perhaps data compatibility with binary mesh formats such as GLTF. It also provides for data access with 2 or 4 byte alignment, for better performance. Last but not least, we may want to increase the index limits in future versions.
The stride parameter of 4 in optixClusterUnstructuredMesh is correct. Note that with clusters, the indexFormat and indexStrideInBytes params are referring to a single index, rather than a triplet of indices belonging to a triangle. The struct OptixClusterAccelBuildInputTrianglesArgs is a bit different in that respect than (for example) OptixBuildInputTriangleArray.
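To illustrate with a small sketch of my own (fetchIndex is just a made-up helper, not something in the SDK): with the 32-bit format and tightly packed data, each individual index is strideInBytes apart, so corner v of triangle t sits at byte offset (t * 3 + v) * stride, and the buffer holds 3 × triangleCount entries.

#include <cstdint>
#include <cstring>

// With a 32-bit index format and tightly packed data, a stride of 4 bytes is the
// distance between *individual* indices, not between whole triangles.
uint32_t fetchIndex( const uint8_t* indexBuffer, uint32_t strideInBytes, uint32_t tri, uint32_t corner )
{
    uint32_t index;
    std::memcpy( &index, indexBuffer + ( tri * 3 + corner ) * strideInBytes, sizeof( index ) );
    return index;
}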
Yeah, it is a reasonable question. The 9 bits allocated to vertexCount are the piece that most strongly dictates what the maximum value of a cluster’s vertex index can be: currently 511 (0x1ff). Note that while this is the maximum value that can be stored in vertexCount, the maximum value that can currently be used might still be lower.
When using an array of 9-bit values, they would naturally require 2 bytes each and so get padded to 16 bits by default. Using 8 bits per index is an option if you know in advance that your data will never exceed 255 (0xff). So the option to use 32 bits is perhaps the main one that’s confusing or potentially misleading in terms of the maximum allowable value. Allowing these small values to be represented in 32 bits mainly means that if you’re using any other API or file format that might read or write these cluster sub-meshes using index buffers of 32-bit values, then you don’t need to copy and re-pack the data into a separate buffer. Instead you can exchange data between the OptiX cluster build and something else via the very same buffer. It is a tradeoff where some bits are wasted in the buffer in return for some development or runtime convenience, and possibly even a little perf in some cases when you can avoid unnecessary memory traffic.
“When using an array of 9-bit values, they would naturally require 2 bytes each and so get padded to 16 bits by default.”
The necessity of a 16-bit index is understandable.
However, shouldn’t a case like this be handled by using an indexFormat of 8-bit or 16-bit plus indexStrideInBytes = sizeof(uint32_t)?
I don’t understand the necessity of the 32-bit index enum, since the maximum representable value of 511 already fits in 16 bits.
I don’t have a real problem with the clusters API for now; I’m just curious about the choices in the OptiX API specification.
It’s a good question, and I appreciate you asking for clarification. What I’m trying to say is that the option to use 32 bits in the storage buffer is not meant to be a specification or statement about how large the index values can be. I believe the option is mainly there for developer convenience when connecting OptiX to other software external to OptiX; that might be other SDKs, or it might be existing code you wrote and don’t want to have to update or refactor. For example, suppose I use vk_animated_clusters to build GLTF files with cluster-compatible sub-meshes in them, I get index buffers out of GLTF that use 32-bit indices, and I want to pass them directly to OptiX without having to modify or transcode them; I can do that by using the 32-bit format when building clusters. GLTF doesn’t know anything about OptiX, and GLTF meshes might use 32 bits for indices because GLTF supports large meshes and doesn’t know that the user is going to create only meshes with no more than 256 triangles. (That’s just a contrived example; I’m not super familiar with what options GLTF offers. Imagine any API or SDK that spits out 32-bit indices.)
It might work to use the 8- or 16-bit formats with a stride of 32 bits, though that depends on assuming the data is laid out in little-endian format in memory. While that’s true on NVIDIA GPUs for now (as far as I know), there’s still less to worry about, and it’s less confusing and error-prone, if you can match the type of the data coming from somewhere else exactly with the type of the data being passed to OptiX, right?
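Just to illustrate that little-endian point with a toy example of my own (not anything from the SDK): when every 32-bit index value is below 256, reading only the first byte of each 4-byte slot recovers the same values, which is effectively what an 8-bit format with a 4-byte stride would read, and it only works because the low byte comes first in memory.

#include <cassert>
#include <cstdint>

int main()
{
    // Small cluster-local indices stored as 32-bit values.
    const uint32_t indices32[] = { 0, 1, 2, 81, 200 };
    const uint8_t* bytes       = reinterpret_cast<const uint8_t*>( indices32 );
    for( int i = 0; i < 5; ++i )
        assert( bytes[i * 4] == indices32[i] );  // holds on little-endian layouts only
    return 0;
}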