Concurrency limitations and thresholds for CUDA operations?

Hi guys,

I have been looking into the concurrency of operations at the CUDA level. I found that when I try to run two convolution operations on the GPU, they are not executed concurrently, even when the input matrices are very small. Can anyone give me some hints on what the limitation or threshold might be?