Avoiding race conditions in operations on bit-sized-voxel structures

Clarence · November 10, 2018, 10:29am

Dear all,

I’m currently working on voxel project in university. My work is based on a highly modified version of the gvdb-library. One of the changes is, that I want to use only one Bit per voxel instead of char/float etc…

An important part is removing the voxel at places where they collide with the position of points. This is easy if each voxel is of size one byte or larger: the position of the points is read in a kernel and zero is written to the location where each of the intersecting voxel is stored.
However, if one voxel is only the size of a bit - one dimension eg. In x direction extends by 8. If we now launch kernel with one thread per point and if two points lie within one char(8voxel), two threads would each set a voxel/bit to zero and write it to global memory. We’d now have a race condition and the whole world collapses.

So the question is: What is the best way to do this?
Before I present my current thoughts, here are a few facts: The number of points that are interacting with the voxelstructure is in the order of 10^4 to 10^6. The points do often cover a surface and are separated by approximately one voxel size apart from each other.

My current approach would be to sort the points via cub::radixsort in x,y and z and bin all points that have the same value with respect to x/8 to a byte-long bitmask, such that I can deal with all Voxel within a byte at once.

I’d be really glad, if you guys have suggestions, ideas or feedback. If you need any more information, just ask away, I just wanted to keep it short and on point for now.

All the best from munich,
Glenn

M_B · November 12, 2018, 7:25pm

Have you thought about using a global array of type uint32_t together with atomicOr? (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicor).

In this case your x direction would extend by 32, not 8. Each thread would fill a register with 1 << (your particle x coordinate & 31) and atomic OR that into your global memory array at [z][y][x >> 5]. Because you say the density is around one point per voxel, you shouldn’t have much contention.

If you wanted to clear bits rather than set them to 1, you could instead use atomicAnd with ~(1 << (your particle x coordinate & 31)).

Clarence · November 13, 2018, 8:03am

Hi M__B,

thanks for your answer! Yes, I thought about atomic operations, but my thinking was that it would increase the run time by a whole lot - that was just intuition based not facts, though.
When thinking about it, it still seems faster than sorting, encoding and binning.

The reason that you proposed 32 bit is because there is no 8/16-bit version available, right?
One problem that comes to my mind is that I need to know the bits that changed/ the voxel that were deleted.
But as far as I remember the atomic operation only returns the new value, right?

All the best,
Glenn

Clarence · November 13, 2018, 8:18am

Oh wait, I’m using a Surface bound to a 3DcudaArray and surfaceWrite to access the memory. I guess atomic operations are not possible then?