Is it possible to run multiple kernels in parallel in CUDA? I am running a matrix multiplication kernel over a chain of n matrices, where n can be large, e.g. 100000:
M[sub]n[/sub] x M[sub]n-1[/sub] x … x M[sub]3[/sub] x M[sub]2[/sub] x M[sub]1[/sub].
If I launch this kernel sequentially, once per multiplication, the program performs much worse than a CPU-based solution.
I want to run the matrix multiplication kernel in parallel on different pairs of adjacent matrices, e.g.
M[sub]2[/sub] x M[sub]1[/sub], M[sub]4[/sub] x M[sub]3[/sub], M[sub]6[/sub] x M[sub]5[/sub], …
Alternatively, I want to split the multiplications into blocks, so that each of the mp multiprocessors executes n/mp of the matrix multiplications in parallel.
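To make the idea concrete, here is a rough, unoptimized sketch of what I have in mind: a single kernel launch in which each thread block multiplies one pair of adjacent matrices, so the pairs are spread across the multiprocessors without needing several kernels to run at once. The names `pairwiseMatMul` and `multiplyPairs`, the fixed size `N = 16`, and the storage layout (all matrices row-major, stored back to back in one array) are my own assumptions for illustration. Error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define N 16  // assumed matrix dimension (N x N); 16 x 16 = 256 threads per block

// One thread block per pair. Block p computes out[p] = in[2p+1] * in[2p],
// i.e. M2 x M1, M4 x M3, ... in 1-based names, which keeps the chain order.
__global__ void pairwiseMatMul(const float* in, float* out, int numPairs)
{
    int p = blockIdx.x;
    if (p >= numPairs) return;

    const float* right = in + (2 * p) * N * N;      // right-hand factor
    const float* left  = in + (2 * p + 1) * N * N;  // left-hand factor

    int row = threadIdx.y;
    int col = threadIdx.x;

    // Each thread computes one element of the product.
    float acc = 0.0f;
    for (int k = 0; k < N; ++k)
        acc += left[row * N + k] * right[k * N + col];

    out[p * N * N + row * N + col] = acc;
}

// Hypothetical host helper: copies the matrices to the device, multiplies
// all adjacent pairs in one launch, and returns the numPairs products.
// An odd trailing matrix is ignored in this sketch.
std::vector<float> multiplyPairs(const std::vector<float>& in, int numMatrices)
{
    int numPairs = numMatrices / 2;
    if (numPairs == 0) return std::vector<float>();

    size_t inBytes  = (size_t)2 * numPairs * N * N * sizeof(float);
    size_t outBytes = (size_t)numPairs * N * N * sizeof(float);

    float* dIn  = 0;
    float* dOut = 0;
    cudaMalloc((void**)&dIn, inBytes);
    cudaMalloc((void**)&dOut, outBytes);
    cudaMemcpy(dIn, &in[0], inBytes, cudaMemcpyHostToDevice);

    dim3 block(N, N);                       // one thread per output element
    pairwiseMatMul<<<numPairs, block>>>(dIn, dOut, numPairs);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> out((size_t)numPairs * N * N);
    cudaMemcpy(&out[0], dOut, outBytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Calling `multiplyPairs` repeatedly on its own output would halve the number of matrices each round, so the whole chain would take about log2(n) launches instead of n-1. With n = 100000 the first round needs 50000 blocks, which fits in a one-dimensional grid.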
Does CUDA provide such functionality? I am using CUDA 2.3 on a GeForce 8800 GT.