I am actually working on a project and I would like to have your opinion
The aim of this project is to solve a problem using GPGPU programming (so CUDA)

So problem is quite easy, I have to put in a truck a lot of container, with different weight and size. But the Truck is large enough to accept all containers so it is not a bin packing problem. Also, the truck is divided in N area in which you can put exactly one container.
So, each container has an order of â€œpreference placeâ€ and I have to optimize this preference. Also you have some other constraint; some container must be neighbor, one container can take one and half place etcâ€¦

The problem can be model by a Mixed Integer problem, with an objective function.
Actually, I was thinking to put for each container N binary variables and put 1 at the variable i if the container is in the area I, 0 otherwise.

The problem is that, I will have a spare matrix (density under 5%). So, from what I’ve read so far, it wonâ€™t be very useful to use GPGPU

I also think to put for each container a variable which can have the value 1 to N. Then the matrix will be denser.

My major problem is that I donâ€™t know what is the best way to tackle the problem :
Using branch & Band Solver and during the relaxation, use a LP solver programming in GPGPU.
Or use Constraint Solver in GPGPU
Or, use metaheuristic solver in GPGPU.

So, I would like to have your opinion about my problem

This sounds like a fun problem. I don’t know if I have seen any related work on adapting problem-specific optimization
heuristics to GPUs, so you may be in uncharted territory.

If you want to use something off-the-shelf, you’ll want to cast your problem in terms of a well-known algorithm like a neural network, a
support vector machine, or a genetic algorithm. There have been some successful efforts to port these to GPUs and you could probably
find a paper to use as a guide.

Otherwise, feel free to try to adapt an algorithm that you already have. The only insight that I can offer for problems like this is to
try to break the algorithm into enumeration and evaluation stages, where enumeration prunes a large space of possible solutions into many
independent sub-problems, and evaluation determines an efficient solution for each subproblem.

There are several sparse matrix libraries for CUDA. I don’t know if this would apply directly, but you could try CUSP: http://code.google.com/p/cusp-library/