Reducing register usage of a quaternion class

Hello, I am currently implementing a particle filter system for upper body tracking in a 3D teleconferencing environment. The particle evaluation should be processed on the GPU.

In my setup I compute a 3D volumetric reconstruction from several camera images on a GPU, and this reconstruction is used as input for the particle filter system.

You can find some example videos on my homepage at

http://www.mi.fh-wiesbaden.de/~cjohn/research.html

The videos just show reconstructions of skin-colored regions. Currently, however, I do this reconstruction on foreground and skin-colored objects, thus reconstructing the whole person.

I have a kinematic model of the upper body of a person, with a quadric attached to each bone. Each particle of the particle filter describes a possible kinematic configuration of the model. The task at hand is to find the body configuration with the highest likelihood, which is the configuration giving the highest overlap between the body model and the occupied reconstructed volume. (It's slightly more involved, but this is the general idea.)
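To make the overlap idea concrete, here is a minimal sketch, not the actual implementation from this project. It assumes the reconstruction is a dense occupancy grid (one byte per voxel) and that a quadric is given as a 4x4 matrix Q in voxel coordinates, with a point p = (x, y, z, 1) counted as inside when p^T Q p <= 0. The names quadricValue, countOverlap and overlapHost are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Evaluates p^T Q p for p = (x, y, z, 1); Q is a row-major 4x4 matrix.
__host__ __device__ inline float quadricValue(const float* Q, float x, float y, float z)
{
    const float p[4] = { x, y, z, 1.0f };
    float v = 0.0f;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            v += p[r] * Q[4 * r + c] * p[c];
    return v;
}

// One thread per voxel: count voxels that are occupied AND inside the quadric.
__global__ void countOverlap(const unsigned char* occupancy,
                             int dimX, int dimY, int dimZ,
                             const float* Q, unsigned int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int n = dimX * dimY * dimZ;
    if (i >= n) return;

    // Recover voxel coordinates from the linear index (x varies fastest).
    int x = i % dimX;
    int y = (i / dimX) % dimY;
    int z = i / (dimX * dimY);

    if (occupancy[i] && quadricValue(Q, (float)x, (float)y, (float)z) <= 0.0f)
        atomicAdd(count, 1u);
}

// Hypothetical host helper: copies the inputs to the device and returns the overlap count.
unsigned int overlapHost(const unsigned char* occupancy,
                         int dimX, int dimY, int dimZ, const float* Q)
{
    int n = dimX * dimY * dimZ;
    unsigned char* dOcc = 0;
    float* dQ = 0;
    unsigned int* dCount = 0;
    unsigned int count = 0;

    cudaMalloc((void**)&dOcc, n);
    cudaMalloc((void**)&dQ, 16 * sizeof(float));
    cudaMalloc((void**)&dCount, sizeof(unsigned int));
    cudaMemcpy(dOcc, occupancy, n, cudaMemcpyHostToDevice);
    cudaMemcpy(dQ, Q, 16 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(unsigned int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    countOverlap<<<blocks, threads>>>(dOcc, dimX, dimY, dimZ, dQ, dCount);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&count, dCount, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dOcc);
    cudaFree(dQ);
    cudaFree(dCount);
    return count;
}
```

In a particle filter, a count like this would be computed per particle over all quadrics of the posed model and then turned into a weight. Counting with atomicAdd keeps the sketch short; a shared-memory reduction per block would likely be faster.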

Right, no bank conflicts. Is it possible to perform several of these reductions in parallel? This would be more efficient. Possibly this would also be somewhat better: (for 256-thread blocks)

for(unsigned offset = BLOCK_DIM_X>>2; offset > 0; offset >>= 2)
{
	// ensure last summing cycle has been finished for all threads in block
	__syncthreads();
	if(threadIdx.x < offset)
		LocalBlock[threadIdx.x] += LocalBlock[threadIdx.x + offset] + LocalBlock[threadIdx.x + offset*2] + LocalBlock[threadIdx.x + offset*3];
}
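For context, here is a self-contained sketch of how the loop above could sit inside a kernel. The kernel name blockSum4, the host helper blockSumHost and the input layout (one block sums BLOCK_DIM_X consecutive floats) are assumptions for illustration. Note that stepping the offset by a factor of 4 only covers every element when BLOCK_DIM_X is a power of 4, which holds for 256. Error checking is omitted.

```cuda
#include <cuda_runtime.h>

#define BLOCK_DIM_X 256  // assumed block size; must be a power of 4 for this loop

// Each block sums BLOCK_DIM_X consecutive floats of `in` into out[blockIdx.x].
__global__ void blockSum4(const float* in, float* out)
{
    __shared__ float LocalBlock[BLOCK_DIM_X];
    LocalBlock[threadIdx.x] = in[blockIdx.x * BLOCK_DIM_X + threadIdx.x];

    for (unsigned offset = BLOCK_DIM_X >> 2; offset > 0; offset >>= 2)
    {
        // ensure the initial load or the last summing cycle has finished
        // for all threads in the block
        __syncthreads();

        if (threadIdx.x < offset)
            LocalBlock[threadIdx.x] += LocalBlock[threadIdx.x + offset]
                                     + LocalBlock[threadIdx.x + offset * 2]
                                     + LocalBlock[threadIdx.x + offset * 3];
    }

    // Thread 0 wrote LocalBlock[0] itself in the last cycle, so no extra sync is needed.
    if (threadIdx.x == 0)
        out[blockIdx.x] = LocalBlock[0];
}

// Hypothetical host helper: hostIn holds numBlocks * BLOCK_DIM_X floats,
// hostOut receives one partial sum per block.
void blockSumHost(const float* hostIn, float* hostOut, int numBlocks)
{
    size_t inBytes = (size_t)numBlocks * BLOCK_DIM_X * sizeof(float);
    size_t outBytes = (size_t)numBlocks * sizeof(float);
    float* dIn = 0;
    float* dOut = 0;

    cudaMalloc((void**)&dIn, inBytes);
    cudaMalloc((void**)&dOut, outBytes);
    cudaMemcpy(dIn, hostIn, inBytes, cudaMemcpyHostToDevice);

    blockSum4<<<numBlocks, BLOCK_DIM_X>>>(dIn, dOut);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hostOut, dOut, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}
```

Compared with the usual halving loop, this version needs four synchronized cycles instead of eight for 256 threads, at the cost of three shared-memory reads per active thread in each cycle.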

This is fascinating. I haven’t studied math as much as I should. Is there a good book you can recommend to learn about such things? (Quaternion math and its applications) Something intuitive and not too dry.

Hello, for my task at hand this is the most parallel reduction possible, as I only have 256 registers with data to sum up. For quaternion math, a good starting point is

http://www.euclideanspace.com/maths/index.htm

Very interesting topic. I’ve seen a few videos of it before, and have always been impressed by the stability of the tracking.

I can see that you are using axis-aligned bounding boxes as bounding volumes - wouldn’t arbitrarily oriented bounding boxes provide you with even more information? Or are you required to keep the degrees of freedom down to a reasonable level due to time constraints?

I think the bounding boxes are just used as a fast way of generating constraints for the particles (if x < a || x > b is very easy to code ;)).

The particle cloud probably contains much more information.

That is a wonderful site. Thanks!

(If anyone has any more, please post. Especially books.)

Hello, the hand tracking video just shows some first tests to evaluate image processing algorithms that needed to be aware of foreground objects, which in that case were hands. Currently I am developing an upper body tracking system which has 14 DOF. I will post some videos once it's working (could take a while).

Actually the particle cloud represents a probability distribution of the current state of the hand and head volumes. What you see in the videos are 3 interacting particle filters, each with 200 particles (300 for the head); only the modes of the probability distribution are drawn (red boxes). For this, each particle consists of its pose and recent history (Brownian motion and first- and second-order autoregressive models).
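As an illustration of why a particle carries recent history, here is one common way to write a second-order autoregressive prediction step. It is a sketch under assumptions, not the code used in the videos: the struct Particle1D tracks a single degree of freedom, the coefficients a1, a2 and the noise scale sigma are free parameters, and the noise samples are assumed to be precomputed unit-variance Gaussian values passed in as an array.

```cuda
#include <cuda_runtime.h>

// Hypothetical particle for a single degree of freedom.
struct Particle1D
{
    float x;      // current state
    float xPrev;  // state one step earlier (the "recent history")
};

// Second-order autoregressive step: x_t = a1 * x_{t-1} + a2 * x_{t-2} + sigma * w.
// a1 = 1, a2 = 0 gives a random walk (Brownian motion);
// a1 = 2, a2 = -1 gives a constant-velocity prediction.
__host__ __device__ inline Particle1D propagateAR2(Particle1D p, float a1, float a2,
                                                   float sigma, float w)
{
    Particle1D q;
    q.x = a1 * p.x + a2 * p.xPrev + sigma * w;
    q.xPrev = p.x;  // shift the history by one step
    return q;
}

// One thread per particle; noise[i] is the sample used for particle i.
__global__ void propagateParticles(Particle1D* particles, const float* noise, int n,
                                   float a1, float a2, float sigma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        particles[i] = propagateAR2(particles[i], a1, a2, sigma, noise[i]);
}
```

With 200 particles this could be launched as propagateParticles<<<1, 256>>>(dParticles, dNoise, 200, a1, a2, sigma), where dParticles and dNoise are device pointers. A full pose would repeat the same update for every degree of freedom.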