Torque and NVIDIA GPUs: help needed to configure Torque to work with an NVIDIA GPU


Here is the problem:

I must configure a computer equipped with a CUDA-capable graphics card (an 8800 GTX at the moment) to work as a cluster node with Torque.

Torque 2.1.8 is currently working well, using the 4 cores of the Core 2 Quad and the 2 GB of RAM, but it does not use the graphics card.

I think I need to configure the server to use a special resource, with something like this:

[codebox]File torque/server_priv/nodes:

hostname np=4 gpu=1 thread=128[/codebox]

And configure the MOM (the Torque component installed on each node) to use the GPU.

Any ideas ?


Then your job configuring Torque is basically done. Presuming your cluster is heterogeneous (i.e. not all nodes have graphics cards), you will need to create a resource for the node or nodes with GPUs. Jobs that need the GPUs must then specify that resource, so that the scheduler knows to allocate nodes with the correct hardware to your job.
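A minimal sketch of how that might look (the node names and the `gpu` property name here are hypothetical; a Torque of this vintage treats `gpu` as an arbitrary node property, not a managed resource):

```shell
# server_priv/nodes: tag the GPU-equipped node with a custom property.
# "gpu" is just an arbitrary label the scheduler matches against.
#   node01 np=4 gpu
#   node02 np=4

# A job that needs the GPU then requests a node with that property:
qsub -l nodes=1:gpu job_script.sh
```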

But that is incidental, because nothing you do to the cluster scheduling software is going to automagically make your application use the GPU. You need an application with CUDA support and the appropriate NVIDIA runtime libraries installed on the node. You then need to configure your job scripts to set the appropriate runtime environment variables, so that your CUDA application can find everything it needs when the MOM (or whatever launch daemon you use) forks your job on the node.
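As a sketch of the second part (the install prefix and application name are assumptions; adjust them for your setup), a job script might set the environment like this:

```shell
#!/bin/sh
#PBS -l nodes=1:gpu

# Point the dynamic linker and PATH at the CUDA runtime
# (/usr/local/cuda is the common default prefix; yours may differ).
export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH

# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR
./my_cuda_app   # hypothetical CUDA-enabled binary
```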

I suspect the latter requirement is a far bigger problem for you than the former.

Yes, the problem is that I need to make any application use the GPU, even non-CUDA ones >.<

If needed, I can write a program with CUDA to do it, but I would like to avoid that :whistling:

It would make the GPU execute the program instead of the CPU, allocate the memory on the graphics card, and so on. I have not thought it through yet.

It would take some time to do… :unsure:

Is there another solution?

The GPU is a different architecture from the CPU. You can’t just run “regular” code on the GPU — you have to write for it specifically.

I think you have a fundamental misunderstanding of what CUDA is and what CUDA-capable video cards can do. What you are proposing is impossible. The GPU is a completely different architecture from the host CPU. It cannot run host code, and it cannot be scheduled by the host process scheduler.

Back to the drawing board, sorry…

Ok, thanks :thumbup:
With such a mistake, even the drawing board would laugh at me >.<

So, if I run CUDA programs with qsub, I suppose that CUDA will manage the resources of the card (threads and memory allocation) used by those programs.
If my supposition is right, where can I get some documentation on it?

I think this goes to the heart of your misunderstanding. You don’t run CUDA programs as such; you run one host program containing CUDA kernels per GPU at a given time. This means your cluster scheduling software must only fork one process per node per free GPU. An outline of how to do that was contained in my original reply to you.
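One way to enforce that (a sketch only; the `gpu` property and the processor count of 4 are assumptions carried over from the earlier posts) is to have GPU jobs request every processor slot on the node, so the scheduler cannot place a second job there:

```shell
# Hypothetical: request one GPU-tagged node and all 4 of its slots
# (matching np=4 in the nodes file), giving the job exclusive use
# of that node and therefore of its single GPU.
qsub -l nodes=1:gpu:ppn=4 job_script.sh
```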

CUDA documentation can be found here: