OpenGL Compute Shader unusually slow


I’m writing a pathtracer using an OpenGL compute shader which writes color values to a texture, and then I draw that texture on a fullscreen quad using a different shader program. My loop consists of the following:

  1. Clearing the color buffer
  2. glUseProgram for the compute shader program, glDispatchCompute followed by glMemoryBarrier
  3. glUseProgram for the fullscreen quad shader program, glDrawArrays
    That’s all.

After some simplistic profiling, 99% of the GPU time is spent in the compute shader, which is kind of expected, with a frame time of ~64ms on average. I tested the same program on an AMD laptop with a significantly lower spec (see below) and got ~30ms per frame. This was completely unexpected and I can’t really put my finger on why this is the case.

More details: I use a single texture as the drawing target. The compute shader accesses it by binding it to an image unit (image2D in the shader), and in the fullscreen quad shader program I just use a sampler2D in the fragment shader and output its value. I have one uvec2 uniform, which shouldn’t be a problem, and I have 5 SSBOs which each contain a dynamic array of custom GLSL structs. I will add a snippet to show the struct alignment:

// SSBO helper structs

struct Triangle
{
	vec4 v0v1;   // v0.x, v0.y, v0.z, v1.x
	vec4 v1v2;   // v1.y, v1.z, v2.x, v2.y
	vec4 v2norm; // v2.z, n.x, n.y, n.z
	vec4 e1e2;   // e1.x, e1.y, e1.z, e2.x
	vec4 e2matX; // e2.y, e2.z, mat_index, empty
};

struct Sphere
{
	vec4 sphere_data; // o.x, o.y, o.z, radius
	uvec4 mat_index;
};

struct Material
{
	vec4 type_diffuse;  // mat_type, diff.x, diff.y, diff.z
	vec4 specular_spec; // spec.x, spec.y, spec.z, n_spec
	vec4 Le;            // Le.x, Le.y, Le.z, empty
};

struct AABB
{
	vec4 data1; // bmin.xyz, bmax.x
	vec4 data2; // bmax.yz
};

struct BVHNode
{
	AABB node_AABB;
	uvec4 data; // left/first_tri, num_tris
};

// SSBOs

layout(std430, binding = 1) readonly buffer SpheresSSBO
{
	Sphere spheres[];
} spheres_ssbo;

layout(std430, binding = 2) readonly buffer ModelTrisSSBO
{
	Triangle triangles[];
} model_tris_ssbo;

layout(std430, binding = 3) readonly buffer ModelLightTrisSSBO
{
	uint light_tri_indices[];
};

layout(std430, binding = 4) readonly buffer MaterialsSSBO
{
	Material materials[];
} materials_ssbo;

layout(std430, binding = 5) readonly buffer BVHSSBO
{
	BVHNode bvh_nodes[];
} bvh_ssbo;
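
On the host side, the struct layout above can be mirrored with plain C++ types and checked at compile time. This is only a sketch, not the poster's actual code: `Vec4`, `UVec4` and `ssbo_bytes` are hypothetical names, and it assumes every struct member is a 16-byte vec4/uvec4 as in the snippet, in which case std430 adds no padding between members or array elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte vector types standing in for GLSL vec4 / uvec4.
struct alignas(16) Vec4  { float x, y, z, w; };
struct alignas(16) UVec4 { std::uint32_t x, y, z, w; };

// Host-side mirrors of the GLSL structs; member names follow the shader.
struct Triangle { Vec4 v0v1, v1v2, v2norm, e1e2, e2matX; };
struct Sphere   { Vec4 sphere_data; UVec4 mat_index; };
struct Material { Vec4 type_diffuse, specular_spec, Le; };
struct AABB     { Vec4 data1, data2; };
struct BVHNode  { AABB node_AABB; UVec4 data; };

// With only 16-byte members, the std430 array stride is the plain sum of the members.
static_assert(sizeof(Triangle) == 80, "Triangle must be 5 x vec4");
static_assert(sizeof(Sphere)   == 32, "Sphere must be vec4 + uvec4");
static_assert(sizeof(Material) == 48, "Material must be 3 x vec4");
static_assert(sizeof(AABB)     == 32, "AABB must be 2 x vec4");
static_assert(sizeof(BVHNode)  == 48, "BVHNode must be AABB + uvec4");

// Byte size of a buffer holding `count` elements of T (hypothetical helper).
template <typename T>
inline std::size_t ssbo_bytes(std::size_t count)
{
    assert(count > 0 && "an empty buffer is probably a mistake");
    return count * sizeof(T);
}
```

If one of the `static_assert`s fails after a struct is edited, the CPU and GPU views of the buffer no longer agree, which is worth ruling out before looking at performance. The `uint light_tri_indices[]` array needs no mirror struct, since std430 packs scalar arrays with a 4-byte stride.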

I could optimize some of these to be UBOs and therefore not be in global memory (like an SSBO might be), but not all. The rest of the compute shader is normal raytracing math. I use imageLoad at the beginning of main() to average out frames, and imageStore to store the value to the image2D. There is little to no difference if I omit the imageLoad.
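
Whether a buffer can become a UBO depends mostly on its size. As a rough sketch: the OpenGL specification guarantees at least 16384 bytes for GL_MAX_UNIFORM_BLOCK_SIZE, so the element counts below are a worst case; the real limit should be queried at runtime with glGetIntegerv. The helper name `max_elements_in_ubo` is hypothetical, and the element sizes assume the all-vec4 structs shown earlier. Note also that arrays in uniform blocks need a fixed size in GLSL, unlike the unsized SSBO arrays.

```cpp
#include <cassert>
#include <cstddef>

// Minimum GL_MAX_UNIFORM_BLOCK_SIZE guaranteed by the OpenGL spec, in bytes.
// Real drivers usually report more; query the actual value at runtime.
constexpr std::size_t kMinGuaranteedUboBytes = 16384;

// Element sizes, assuming the all-vec4 struct layouts from the shader snippet.
constexpr std::size_t kMaterialBytes = 48; // 3 x vec4
constexpr std::size_t kSphereBytes   = 32; // vec4 + uvec4

// How many elements of a given size fit into a uniform block of `ubo_bytes`.
inline std::size_t max_elements_in_ubo(std::size_t element_bytes,
                                       std::size_t ubo_bytes = kMinGuaranteedUboBytes)
{
    assert(element_bytes > 0);
    return ubo_bytes / element_bytes;
}
```

Under these assumptions, a material or sphere list with a few hundred entries would fit in the guaranteed minimum, while triangle and BVH arrays for a real model most likely would not.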

What could be the cause of such slowdown? It’s 2x slower than on my AMD laptop, which is pretty low end.

OS: Windows 10 64bit
GPU: GTX 1050 (2GB)
Driver Version: 516.59
OpenGL: 4.6 Core

Hello,

would you mind sharing the specs of your “low end” AMD laptop as well for comparison?

And ideally share the app for people to more easily reproduce the behavior just like you experience it?

Just based on the above shader code it is really difficult to say anything about performance, whether it is expected or whether there might be some (not so obvious) optimization issues. Lots of things like mem alignment or how the shader compiler behaves can influence this. Even the way you generate your rays might have an impact.

Did you look at NSight to try some more in-depth profiling?

Sorry if I can’t be of more help right away.


Just before I saw your reply I found the issue. I had skipped over a part of my shader that I thought was irrelevant, where (during the initial port from C++ to GLSL) I had declared an array of 200 uints. This was somehow causing the slowdown, probably by being dumped to global memory? By reducing the number of elements to 10-30 I got an instant speedup, and now the shader takes ~14ms. Register counts didn’t change.
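
To get a feel for why the array size could matter, here is a back-of-the-envelope sketch of the per-invocation footprint. It assumes the array really is placed in per-thread scratch memory, as suspected above, and it uses a hypothetical workgroup size of 64 invocations, since the thread does not state the actual local size. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes of private storage one shader invocation needs for `count` uints.
inline std::size_t scratch_bytes_per_invocation(std::size_t count)
{
    return count * sizeof(std::uint32_t);
}

// Bytes needed by a whole workgroup if every invocation gets its own copy.
// `invocations` is an assumed workgroup size, not a value from the thread.
inline std::size_t scratch_bytes_per_group(std::size_t count, std::size_t invocations)
{
    assert(invocations > 0);
    return scratch_bytes_per_invocation(count) * invocations;
}
```

With these assumptions, 200 elements come to 800 bytes per invocation and 51200 bytes per 64-wide workgroup, against 120 and 7680 bytes for 30 elements. A smaller footprint is easier to keep cached, which would be consistent with the speedup even though the register count stayed the same.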
