Is multi-stream computation on a GPU comparable to running on multiple CPUs?

I have a large number of independent functions (say, millions) that need to be executed. I found that a GPU can launch many kernel functions in different streams, which looks like many small CPUs running side by side. However, I doubt whether this picture is accurate. If it is, how many streams can really execute in parallel at the same time (2? 32? thousands?), and how much speedup can I expect from multi-stream GPU computation compared with the CPU?
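The picture described above can be sketched roughly as follows. This is only an illustration under assumptions: `task_a`, `task_b`, and `run_two_streams` are hypothetical names standing in for two of the independent functions, and the kernels do trivial work. Launching in separate streams only allows the kernels to overlap; whether they actually run at the same time depends on the device and on how many resources each kernel uses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for two of the independent functions.
__global__ void task_a(float *out) { *out = 1.0f; }
__global__ void task_b(float *out) { *out = 2.0f; }

// Launches each task in its own stream and copies both results back.
cudaError_t run_two_streams(float h_out[2]) {
    cudaStream_t streams[2];
    float *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, 2 * sizeof(float));
    if (err != cudaSuccess) return err;
    for (int i = 0; i < 2; ++i) cudaStreamCreate(&streams[i]);

    // Kernels in different streams MAY overlap; the runtime does not guarantee it.
    task_a<<<1, 1, 0, streams[0]>>>(d_out + 0);
    task_b<<<1, 1, 0, streams[1]>>>(d_out + 1);

    err = cudaDeviceSynchronize();  // wait for both streams to finish
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < 2; ++i) cudaStreamDestroy(streams[i]);
    cudaFree(d_out);
    return err;
}

int main() {
    float h_out[2] = {0.0f, 0.0f};
    cudaError_t err = run_two_streams(h_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("task_a -> %.1f, task_b -> %.1f\n", h_out[0], h_out[1]);
    return 0;
}
```

With millions of functions, this pattern would mean one launch per function spread over some pool of streams, which is exactly the case the question asks about.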

I am curious: how do you manage millions of independent functions?

Surely there must be some overlap in functionality, with merely the data differing?

How many functions truly have different functionality?
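To illustrate why the amount of overlap matters: if most of the functions share the same code and differ only in their data, they can be covered by a single kernel launch with one thread per work item, rather than one kernel per function. The sketch below assumes just that. The names `apply_op`, `run_all`, and `run_batched`, and the two toy operations, are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical small set of truly different functionalities, selected by an op code.
__device__ float apply_op(int op, float x) {
    switch (op) {
        case 0:  return x + 1.0f;
        case 1:  return x * 2.0f;
        default: return x;  // unknown op: pass the value through
    }
}

// One thread per work item; the data (and op code) differ, the kernel is shared.
// Threads of one warp that take different switch branches are serialized.
__global__ void run_all(const int *ops, const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = apply_op(ops[i], in[i]);
}

// Host wrapper: copies inputs to the device, runs one launch, copies results back.
// Assumes ops and in have the same length.
cudaError_t run_batched(const std::vector<int> &ops,
                        const std::vector<float> &in,
                        std::vector<float> &out) {
    int n = static_cast<int>(in.size());
    out.resize(n);
    if (n == 0) return cudaSuccess;  // nothing to launch

    int *d_ops = nullptr;
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_ops, n * sizeof(int));
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_ops, ops.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all items
    run_all<<<blocks, threads>>>(d_ops, d_in, d_out, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)  // this blocking copy also waits for the kernel
        err = cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_ops);
    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}

int main() {
    const int n = 1 << 20;  // about a million work items in a single launch
    std::vector<int> ops(n);
    std::vector<float> in(n), out;
    for (int i = 0; i < n; ++i) { ops[i] = i % 2; in[i] = static_cast<float>(i); }

    cudaError_t err = run_batched(ops, in, out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("out[2] = %.1f, out[3] = %.1f\n", out[2], out[3]);
    return 0;
}
```

If the functions really are all different, this approach does not apply directly, which is why the number of distinct functionalities is the key question.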