Can we run well-optimized CPU code on multiple CUDA cores simultaneously?

I have the complete C code of a machine learning model that is well optimized for the CPU. Now I want to make it multichannel. Suppose my GPU contains 3054 cores: can I run my process on each core, and thereby run 3054 processes in parallel with different inputs?

Is there any method/framework to do this directly with C code, or do we need to convert it to CUDA code first and then run multiple channels at a time?

Our goal is to maximize the number of channels we can run simultaneously on the GPU.

No, that is not generally how CUDA GPUs work: a CUDA core by itself cannot support a complete, independent thread of execution the way a CPU core can. CUDA cores are execution units that the GPU schedules in groups, so the right model is many threads running the same kernel on different data, not thousands of independent processes pinned to individual cores. If you wish to port your code to run on GPUs, a good starting point is the CUDA training material that is readily and widely available, to learn how to program in CUDA.
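To make the distinction concrete, here is a minimal sketch of the data-parallel model CUDA actually uses: one kernel launch covers all "channels", with each GPU thread handling one input. The names `model_step` and `run_channels` are hypothetical stand-ins, not part of your model's code; the real per-channel work would replace `model_step`.

```cuda
#include <cstdio>

// Hypothetical stand-in for one channel's computation.
__device__ float model_step(float x) { return 2.0f * x + 1.0f; }

// One kernel launch processes every channel: each thread takes one input.
__global__ void run_channels(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        out[i] = model_step(in[i]);
}

int main() {
    const int n = 4096;  // number of independent inputs ("channels")
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));  // unified memory, visible
    cudaMallocManaged(&out, n * sizeof(float));  // to both CPU and GPU
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    int threads = 256;                        // threads per block
    int blocks  = (n + threads - 1) / threads;
    run_channels<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();                  // wait for the GPU to finish

    printf("out[10] = %f\n", out[10]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Note that all 4096 threads execute the same kernel code; the hardware maps them onto the available cores in groups (warps). You do not assign work to cores yourself, and a single process per core is not an available model.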

For this particular case (machine learning), the usual suggestion would not be to start with lower-level CUDA programming, but instead to take advantage of one of the many programming frameworks that already run machine learning algorithms efficiently on GPUs.
