commandBufferLength is always 0

Hi,

I encountered "Command Buffer Full" in an Nsight Systems log. It happens only on H100, not on A100, and I wanted to find out why.
I found that CUPTI has a CUpti_ActivityOverheadCommandBufferFullData data structure, and I was hoping to get some information out of it.

I modified the CUDA sample activity_trace_async/activity_trace_async.cu, adding code that launches a large number of kernels, and successfully triggered COMMAND_BUFFER_FULL on H100.

But all I got was something like:

OVERHEAD COMMAND_BUFFER_FULL [ 1730095755645170274, 1730095755645707410 ] duration 537136, THREAD, id 2027675648, correlation id 42169
CUpti_ActivityOverheadCommandBufferFullData : commandBufferLength 0 channelID 10 channelType 0
OVERHEAD COMMAND_BUFFER_FULL [ 1730095755645713971, 1730095755646239164 ] duration 525193, THREAD, id 2027675648, correlation id 42170
CUpti_ActivityOverheadCommandBufferFullData : commandBufferLength 0 channelID 10 channelType 0

Is this the correct behavior? Does it mean this GPU effectively has no command buffer (commandBufferLength 0)?
Or does it mean the remaining buffer length is exhausted, which seems meaningless if the record is only emitted when the command buffer is full?

On the other hand, when I ran the same thing on A100, I didn’t get any COMMAND_BUFFER_FULL. Is there a way to find out the command buffer length without triggering a command buffer full event?

I used CUDA 12.6.2, by the way:

wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda_12.6.2_560.35.03_linux.run

The commandBufferLength returned in the COMMAND_BUFFER_FULL overhead activity record will always be 0 as it denotes the remaining size of the command buffer (which will always be 0 in the event of it being blocked). So, this field is not useful, and we will consider dropping it in a later release.

There is currently no way to query the command buffer length. However, the size of the command buffer can be scaled by a factor of 0.25x, 0.5x, 2x, or 4x using the CUDA_SCALE_LAUNCH_QUEUES environment variable, as described in the CUDA environment variables documentation. By changing this variable, you can observe whether the number of COMMAND_BUFFER_FULL overhead records changes. If the number of COMMAND_BUFFER_FULL records decreases when the command buffer size is increased, the application spends less time blocked on a full command buffer, which can improve performance.
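
The comparison described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the modified sample is built as ./activity_trace_async (a hypothetical path) and prints its overhead records to stdout in the same format as the log lines quoted in the question. The helper names summarize and sweep are made up for illustration and are not part of CUPTI.

```python
import os
import re
import subprocess

# Matches lines in the format quoted in the question, for example:
# OVERHEAD COMMAND_BUFFER_FULL [ <start>, <end> ] duration <n>, THREAD, ...
RECORD = re.compile(r"OVERHEAD COMMAND_BUFFER_FULL \[ (\d+), (\d+) \] duration (\d+)")


def summarize(log_text):
    """Return (record count, summed duration) for COMMAND_BUFFER_FULL records."""
    durations = [int(m.group(3)) for m in RECORD.finditer(log_text)]
    return len(durations), sum(durations)


def sweep(binary="./activity_trace_async",
          scales=(None, "0.25x", "0.5x", "2x", "4x")):
    """Run the binary once per scale factor and summarize each run.

    Assumption: `binary` is the modified sample and writes its trace to stdout.
    A scale of None means the variable is left unset (default queue size).
    """
    results = {}
    for scale in scales:
        env = dict(os.environ)
        env.pop("CUDA_SCALE_LAUNCH_QUEUES", None)
        if scale is not None:
            env["CUDA_SCALE_LAUNCH_QUEUES"] = scale
        out = subprocess.run([binary], env=env,
                             capture_output=True, text=True).stdout
        results[scale or "default"] = summarize(out)
    return results


if __name__ == "__main__":
    for scale, (count, total) in sweep().items():
        print(f"{scale:>8}: {count} records, total duration {total}")
```

If the record count and summed duration drop at 2x or 4x, that would suggest the workload is limited by launch queue depth. The durations are reported in whatever unit the sample prints.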
