NV_bindless_multi_draw_indirect extension

Hello!

My question is about the OpenGL extension NV_bindless_multi_draw_indirect.

Here are some details:

  1. I am working on Windows 10, 64-bit. CPU: Xeon E5-2620 v3, 2.4 GHz. GPU: NVIDIA GeForce GTX 1060 6 GB, driver version 441.20.
  2. I have some data stored in RAM. The data is split into “tiles”. One tile consists of arrays of vertices (3 doubles per vertex), indices (unsigned int), UVs (2 floats per UV coordinate), normals (3 floats per normal) and textures (10 mip-map levels in DDS format).
    So, for the whole data set I have 4 GPU buffers, plus one more buffer for the multi-draw commands. Every buffer is created and used with the appropriate target. Moreover, every buffer is persistent and made resident on the GPU.
    To keep it simple: at the loading stage, all data is transferred from RAM to the GPU. At the same time, I create one indirect command per tile, with the appropriate offsets, GPU addresses and so on. So 1 tile = 1 indirect draw command.
    After the loading stage, none of these buffers are modified.
    At the render stage, only the vertex attribute parameters are set for the appropriate buffers, and glMultiDrawElementsIndirectBindlessNV is called.
    As a result, I see the expected picture and the expected FPS.

But.
The GPU load is 70-97% when only 4 tiles (commands) are drawn (5220 triangles)! Of course, I get ~2k FPS, but such a GPU load seems very strange to me.

As far as I can see, few topics on this forum get answered, but maybe someone has had the same problem and knows the reason.

Here are some code snippets showing what I am doing.

Each frame:

bool _isPrimRestEnabled = Gl.glIsEnabled(Gl.GL_PRIMITIVE_RESTART);
if (!_isPrimRestEnabled) Gl.glEnable(Gl.GL_PRIMITIVE_RESTART);
Gl.glPrimitiveRestartIndexNV(int.MaxValue);

/* Activate texture array */
Gl.glActiveTexture(Gl.GL_TEXTURE0 + IN_TextureDataLoc);
Gl.glEnable(Gl.GL_TEXTURE_2D);
Gl.glBindTexture(Gl.GL_TEXTURE_2D_ARRAY, _textureArray);

Gl.glEnableClientState(Gl.GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
Gl.glEnableClientState(Gl.GL_ELEMENT_ARRAY_UNIFIED_NV);
Gl.glEnableClientState(Gl.GL_DRAW_INDIRECT_UNIFIED_NV);

_gpuBufferVertices.Use(true);
_gpuBufferIndices.Use(true);
_gpuBufferUV.Use(true);

_gpuBufferIndirectNV.Use(true);
_gpuBufferIndirectNV.Use(Gl.GL_DRAW_INDIRECT_ADDRESS_NV);
Gl.glVertexAttribDivisor(IN_DrawIDDataLoc, 1);

Gl.glMultiDrawElementsIndirectBindlessNV(
    Gl.GL_TRIANGLE_STRIP,
    Gl.GL_UNSIGNED_INT,
    IntPtr.Zero,
    _drawCommandsCount,
    0,
    3
    );

_gpuBufferIndirectNV.UnUse(true);
_gpuBufferIndirectNV.UnUse(Gl.GL_DRAW_INDIRECT_ADDRESS_NV);

_gpuBufferUV.UnUse(true);
_gpuBufferIndices.UnUse(true);
_gpuBufferVertices.UnUse(true);

Gl.glDisableClientState(Gl.GL_DRAW_INDIRECT_UNIFIED_NV);
Gl.glDisableClientState(Gl.GL_ELEMENT_ARRAY_UNIFIED_NV);
Gl.glDisableClientState(Gl.GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);

Gl.glBindTexture(Gl.GL_TEXTURE_2D_ARRAY, 0);
Gl.glActiveTexture(Gl.GL_TEXTURE0);

The function

_gpuBuffer______.Use(bool flag)

sets the vertex attribute parameters as follows:

Gl.glEnableVertexAttribArray(_bufferDataFormat.Index);
if (_bufferDataFormat.Type == Gl.GL_DOUBLE)
    Gl.glVertexAttribLPointer(_bufferDataFormat.Index, _bufferDataFormat.Size, Gl.GL_DOUBLE, _bufferDataFormat.Stride, offset);
else if (_bufferDataFormat.Type == Gl.GL_UNSIGNED_INT)
    Gl.glVertexAttribIPointer(_bufferDataFormat.Index, _bufferDataFormat.Size, Gl.GL_UNSIGNED_INT, _bufferDataFormat.Stride, offset);
else
    Gl.glVertexAttribPointer(_bufferDataFormat.Index, _bufferDataFormat.Size, _bufferDataFormat.Type, _bufferDataFormat.IsNorm, _bufferDataFormat.Stride, offset);

* Of course, no VertexAttrib bindings are made for the index buffer.
** The first Use call on the indirect buffer sets the VertexAttrib binding for TileID (the baseInstance parameter in the DrawArraysIndirectBindlessCommandNV struct). The second Use call on the indirect buffer binds a range of the indirect buffer via the function:

glBufferAddressRangeNV
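
For reference, here is a minimal sketch of how one per-tile command could be laid out in C#, assuming 3 bindless vertex buffers per command (matching the last argument of the draw call above). The field order follows the C structs DrawElementsIndirectCommand, BindlessPtrNV and DrawElementsIndirectBindlessCommandNV from the extension spec, which is the variant that pairs with glMultiDrawElementsIndirectBindlessNV. The C# type and member names (TileCommand, TileCommandLayout, StrideBytes) are illustrative assumptions, not the actual code of this project.

```csharp
using System.Runtime.InteropServices;

// Mirrors the C struct DrawElementsIndirectCommand (5 x 4 bytes = 20 bytes).
[StructLayout(LayoutKind.Sequential)]
struct DrawElementsIndirectCommand
{
    public uint Count;          // number of indices to draw
    public uint InstanceCount;
    public uint FirstIndex;
    public int  BaseVertex;
    public uint BaseInstance;   // used here to carry the tile ID
}

// Mirrors the C struct BindlessPtrNV (4 + 4 + 8 + 8 = 24 bytes).
[StructLayout(LayoutKind.Sequential)]
struct BindlessPtrNV
{
    public uint  Index;     // vertex attribute index (unused for the index buffer)
    public uint  Reserved;
    public ulong Address;   // GPU address of the buffer range
    public ulong Length;    // length of the range in bytes
}

// Hypothetical fixed-size version of DrawElementsIndirectBindlessCommandNV
// for exactly 3 vertex buffers (the C struct ends in a variable-length array).
[StructLayout(LayoutKind.Sequential)]
struct TileCommand
{
    public DrawElementsIndirectCommand Cmd;
    public uint Reserved;
    public BindlessPtrNV IndexBuffer;
    public BindlessPtrNV VertexBuffer0;
    public BindlessPtrNV VertexBuffer1;
    public BindlessPtrNV VertexBuffer2;
}

static class TileCommandLayout
{
    // Size in bytes of one tightly packed command: 20 + 4 + 24 + 3 * 24 = 120.
    public static int StrideBytes()
    {
        return Marshal.SizeOf<TileCommand>();
    }
}
```

With this layout, TileCommandLayout.StrideBytes() gives the distance between consecutive commands in the indirect buffer. As I read the spec, passing 0 as the stride argument means the commands are treated as tightly packed, so the two should be equivalent here.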

Kind regards!