Inverse, multiply, subtract on 1M 4x4 matrices

“Hello World”

I would like to port my program from the CPU to the GPU.

Imagine you have 4 blocks of 500K 4x4 matrices each in memory, named W, A, B and B0.

For each of the 500K entries, my program computes M = (1/W)(B-B0)(1/A), where 1/W and 1/A are matrix inverses,
and the result M is a block of 500K 4x4 matrices.

Are there any libraries for matrices?
How can I do that on a GPU? One core per calculation?

Thanks for any reply

Based on my own experience with batch processing of small matrices, I would recommend having each thread handle one matrix, or a few. The per-thread program is then essentially simple scalar code for the various matrix operations. For an example of how to do this for the matrix inverse, you may want to download the “batched solver” code from the registered developer website. The code is under a BSD license, so you could simply use it as a building block in your processing pipeline.
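To make the "simple scalar code per matrix" idea concrete, here is a rough sketch in plain C++ of what one work item could look like for M = (1/W)(B-B0)(1/A). It is not the "batched solver" code mentioned above. The names inv4, mul4, compute_one and compute_batch are made up for illustration, and the row-major float layout with 16 consecutive values per matrix is an assumption. The inverse uses Gauss-Jordan elimination with partial pivoting, which is just one reasonable choice for 4x4.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

constexpr int N = 4;  // matrix dimension; each matrix is 16 row-major floats (assumed layout)

// out = a * b for two row-major 4x4 matrices (hypothetical helper).
inline void mul4(const float* a, const float* b, float* out) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float s = 0.0f;
            for (int k = 0; k < N; ++k) s += a[i * N + k] * b[k * N + j];
            out[i * N + j] = s;
        }
}

// Gauss-Jordan inverse with partial pivoting (hypothetical helper).
// Returns false only when a pivot is exactly zero; no conditioning check is done.
inline bool inv4(const float* a, float* inv) {
    float m[N][2 * N];  // augmented matrix [a | I]
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            m[i][j] = a[i * N + j];
            m[i][N + j] = (i == j) ? 1.0f : 0.0f;
        }
    for (int c = 0; c < N; ++c) {
        int p = c;  // pick the row with the largest entry in column c
        for (int r = c + 1; r < N; ++r)
            if (std::fabs(m[r][c]) > std::fabs(m[p][c])) p = r;
        if (m[p][c] == 0.0f) return false;
        if (p != c)
            for (int j = 0; j < 2 * N; ++j) std::swap(m[p][j], m[c][j]);
        const float d = 1.0f / m[c][c];
        for (int j = 0; j < 2 * N; ++j) m[c][j] *= d;
        for (int r = 0; r < N; ++r) {  // clear column c in every other row
            if (r == c) continue;
            const float f = m[r][c];
            for (int j = 0; j < 2 * N; ++j) m[r][j] -= f * m[c][j];
        }
    }
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) inv[i * N + j] = m[i][N + j];
    return true;
}

// One work item: M = inv(W) * (B - B0) * inv(A). Returns false if W or A is singular.
inline bool compute_one(const float* W, const float* A, const float* B,
                        const float* B0, float* M) {
    float wi[N * N], ai[N * N], diff[N * N], tmp[N * N];
    if (!inv4(W, wi) || !inv4(A, ai)) return false;
    for (int i = 0; i < N * N; ++i) diff[i] = B[i] - B0[i];
    mul4(wi, diff, tmp);
    mul4(tmp, ai, M);
    return true;
}

// CPU stand-in for the thread grid: loop index i plays the role of the thread index.
// Returns how many entries were skipped because W or A was singular.
std::size_t compute_batch(const std::vector<float>& W, const std::vector<float>& A,
                          const std::vector<float>& B, const std::vector<float>& B0,
                          std::vector<float>& M, std::size_t count) {
    const std::size_t need = count * N * N;
    assert(W.size() >= need && A.size() >= need && B.size() >= need &&
           B0.size() >= need && M.size() >= need);
    std::size_t failed = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t o = i * N * N;
        if (!compute_one(&W[o], &A[o], &B[o], &B0[o], &M[o])) ++failed;
    }
    return failed;
}
```

In a CUDA version of this sketch, the three helpers would be marked `__device__`, and the loop in compute_batch would become a kernel where each thread computes its own i from `blockIdx.x * blockDim.x + threadIdx.x` and returns early if i >= count. With 16 consecutive floats per matrix, neighbouring threads read addresses 64 bytes apart, so it may be worth trying an interleaved layout as well.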