OpenGL Compute Shader SSBO Write Performance Issue

krackaan · October 30, 2017, 9:02pm

I’m finding that whenever I use compute shaders that operate on data structures larger than a vec4, I run into huge performance issues. The shader below, which re-orders a buffer after it has been sorted, runs around 800 times slower than the same re-ordering applied to vec4s. I measured this with OpenGL timer queries.

I wonder whether anybody else is using compute shaders with larger data structures without running into this. Otherwise, either I’m doing something fundamentally wrong, this behaviour is to be expected, or there is a driver issue. I’m using the latest Game Ready driver (388.00) on a GTX 1060 6GB.

I’ve tried packing the struct with only vec4s, without success: a slightly larger struct simply executes slightly slower (1.9 to 2.0 ms). Writing the output buffer without re-ordering, or writing the struct members individually, makes no difference to execution time. The only thing that does is the size of the struct being written.

Here’s an example of a very slow shader.

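#version 430

// Precision qualifiers are accepted but have no effect in desktop GLSL.
precision mediump float;

// std430: each vec3 is 16-byte aligned, so the uint after it fills the 4th slot.
// Total size: 352 bytes (22 vec4s).
struct ConvexHull {
  vec3  position;
  uint  enabled;
  vec3  half_ex;
  uint  hash;
  vec4  verts_0[8];
  vec4  planes_n[6];
  vec4  planes_d[6];
};

layout(local_size_x = 128) in;

// "in" and "out" are reserved words in GLSL, so the arrays need other names.
layout(binding = 0, std430) readonly buffer In {
  ConvexHull hulls_in[];
};

layout(binding = 1, std430) writeonly buffer Out {
  ConvexHull hulls_out[];
};

// The .y component is used as the source index of each sorted element.
layout(binding = 2, std430) readonly buffer SortData {
  uvec4 sort_buf[];
};

void main() {
  uint index = gl_GlobalInvocationID.x;
  // The last workgroup may extend past the end of the buffer.
  if (index >= uint(hulls_out.length())) return;
  // Gather: copy the element selected by the sort key into its sorted slot.
  hulls_out[index] = hulls_in[sort_buf[index].y];
}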
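For reference, a host-side C mirror of the std430 layout makes the size of ConvexHull concrete. This is a hypothetical sketch, not code from the post: under std430 each vec3 is 16-byte aligned, so the uint that follows it fills the fourth slot, and vec4 arrays have a 16-byte stride. The names ConvexHullStd430, vec4_t and hull_groups are made up for illustration, and hull_groups assumes the dispatch is sized for local_size_x = 128.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side mirror of ConvexHull as laid out under std430.
   C's natural alignment happens to reproduce the std430 offsets here because
   every vec3 is immediately followed by a 4-byte uint. */
typedef struct { float x, y, z, w; } vec4_t;

typedef struct {
    float    position[3]; /* offset   0 */
    uint32_t enabled;     /* offset  12: fills the vec3's 4th slot */
    float    half_ex[3];  /* offset  16 */
    uint32_t hash;        /* offset  28 */
    vec4_t   verts_0[8];  /* offset  32 */
    vec4_t   planes_n[6]; /* offset 160 */
    vec4_t   planes_d[6]; /* offset 256 */
} ConvexHullStd430;       /* 352 bytes = 22 vec4s */

_Static_assert(offsetof(ConvexHullStd430, half_ex) == 16, "vec3 must start on a 16-byte boundary");
_Static_assert(offsetof(ConvexHullStd430, verts_0) == 32, "vec4 array offset");
_Static_assert(sizeof(ConvexHullStd430) == 22 * sizeof(vec4_t), "352-byte element");

/* Hypothetical helper: workgroups needed to cover n elements when the
   shader declares local_size_x = 128 (rounds up, so the last group may be
   only partially filled). */
static uint32_t hull_groups(uint32_t n) {
    return (n + 127u) / 128u;
}
```

With this layout each invocation copies 352 bytes (22 vec4s), and a dispatch over n hulls needs hull_groups(n) workgroups. Because the count is rounded up, invocations past the end of the array should not write.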

krackaan · November 1, 2017, 10:01am

The more I look at this, the more I think there could be a driver issue. If I don’t use double buffering I get a 100× speedup, and for every vec4 I remove from the struct being written, the speed doubles. The size of the buffer or of the struct makes no difference; only the number of bytes written each time does.
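A back-of-the-envelope model of write volume helps put these numbers in context. This sketch assumes, as a deliberate simplification, that the copy is limited purely by the number of bytes written; pass_bytes and saving_per_vec4 are hypothetical helpers, not part of any API.

```c
#include <stdint.h>

#define VEC4_BYTES 16u

/* Hypothetical helper: bytes written by one re-order pass over `count`
   elements, each `elem_vec4s` vec4s wide (one element per invocation).
   Assumption: write volume is the only cost being modelled. */
static uint64_t pass_bytes(uint64_t count, uint32_t elem_vec4s) {
    return count * (uint64_t)elem_vec4s * VEC4_BYTES;
}

/* Hypothetical helper: fraction of the write volume removed by dropping one
   vec4 from an element that is `elem_vec4s` vec4s wide. */
static double saving_per_vec4(uint32_t elem_vec4s) {
    return 1.0 / (double)elem_vec4s;
}
```

Under that assumption, a ConvexHull pass (22 vec4s per element) writes 22 times as many bytes as a vec4 pass, and dropping one vec4 removes only 1/22 (about 4.5%) of the write volume. Neither figure comes close to an 800× slowdown or a doubling per vec4, which is consistent with something other than raw write bandwidth being involved.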
