CUDA Stream

Is the stream concept per GPU? I.e., a stream can’t be set up across multiple GPUs, and each GPU should have its own separate stream?

Yes, stream creation has an inherent device association:
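A stream belongs to whichever device is current (via `cudaSetDevice`) when `cudaStreamCreate` is called, so multi-GPU code normally keeps one stream per device. The sketch below assumes a machine with zero or more CUDA GPUs. The `fill` kernel, the `run_on_all_devices` helper and the `CHECK` macro are hypothetical names used only for illustration. The sketch switches to each device before creating its stream and before launching work into it. Per the CUDA Programming Guide, a kernel launch fails if it is issued to a stream that is not associated with the current device. A memory copy issued to such a stream still succeeds.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: on a CUDA error, print it and return -1 from the
// enclosing function (a sketch; resources are not cleaned up on this path).
#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                   cudaGetErrorString(err_));                        \
      return -1;                                                     \
    }                                                                \
  } while (0)

// Hypothetical kernel for illustration: writes `value` into every element.
__global__ void fill(int* data, int n, int value) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = value;
}

// Creates one stream per GPU, launches a kernel into each device's own
// stream, and returns how many devices produced the expected result
// (0 if no GPU is available, -1 on a CUDA error). Requires n > 0.
int run_on_all_devices(int n) {
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;  // no usable GPU

  std::vector<cudaStream_t> streams(count);
  std::vector<int*> buffers(count, nullptr);

  for (int dev = 0; dev < count; ++dev) {
    CHECK(cudaSetDevice(dev));               // make `dev` the current device
    CHECK(cudaStreamCreate(&streams[dev]));  // this stream now belongs to `dev`
    CHECK(cudaMalloc(&buffers[dev], n * sizeof(int)));
    // Launch while `dev` is still current, into the stream created for it.
    fill<<<(n + 255) / 256, 256, 0, streams[dev]>>>(buffers[dev], n, dev);
    CHECK(cudaGetLastError());
  }

  int done = 0;
  for (int dev = 0; dev < count; ++dev) {
    CHECK(cudaSetDevice(dev));                   // switch back before using its stream
    CHECK(cudaStreamSynchronize(streams[dev]));  // wait for this device's work
    int first = -1;
    CHECK(cudaMemcpy(&first, buffers[dev], sizeof(int), cudaMemcpyDeviceToHost));
    if (first == dev) ++done;
    CHECK(cudaFree(buffers[dev]));
    CHECK(cudaStreamDestroy(streams[dev]));
  }
  return done;
}

int main() {
  int done = run_on_all_devices(1024);
  std::printf("devices processed: %d\n", done);
  return done < 0 ? 1 : 0;
}
```

The loop structure reflects the per-device association. All devices get their work queued first, so the GPUs can run concurrently, and then each device is made current again before its stream is synchronized and destroyed.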