Per-thread constant memory?

Alright, so I have a program that launches several host threads, each of which launches CUDA kernels on a different stream. These kernels each use constant memory, but the constant data needs to be on a per-thread basis: a kernel launched from thread #1 should see constant #1, a kernel launched from thread #2 should see constant #2, and so on.

From what I can tell, that is not happening for me right now. Each thread creates its own context using cuCtxCreate and then goes about its business, and each thread's kernels run on that thread's own stream.
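
To make the setup concrete, here is a minimal sketch of the pattern described, not the actual program. The names `c_value`, `readConstant`, and `worker`, the thread count of 4, and the mix of driver-API context creation with runtime-API launches are all assumptions made for illustration. It also assumes the classic three-argument `cuCtxCreate` signature (newer toolkits changed it) and linking against the driver library (e.g. `nvcc sketch.cu -lcuda`). Error checking is mostly omitted for brevity.

```cuda
// Hypothetical reproducer for the setup described above (illustrative only).
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>
#include <vector>

// The constant that each host thread would like its own copy of.
__constant__ int c_value;

// Copies whatever value the kernel sees in constant memory to global memory.
__global__ void readConstant(int *out) { *out = c_value; }

void worker(int id, CUdevice dev) {
    // Each host thread creates its own context, which becomes current on it.
    // Assumption: three-argument cuCtxCreate (flags = 0).
    CUcontext ctx;
    if (cuCtxCreate(&ctx, 0, dev) != CUDA_SUCCESS) return;

    // Each host thread also uses its own stream.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));

    // Write this thread's value into the constant, launch the kernel on the
    // same stream, then read back what the kernel actually saw.
    int h_in = id;
    int h_out = -1;
    cudaMemcpyToSymbolAsync(c_value, &h_in, sizeof(int), 0,
                            cudaMemcpyHostToDevice, stream);
    readConstant<<<1, 1, 0, stream>>>(d_out);
    cudaMemcpyAsync(&h_out, d_out, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    // If constants are truly per thread, id and h_out should match.
    printf("thread %d read constant %d\n", id, h_out);

    cudaFree(d_out);
    cudaStreamDestroy(stream);
    cuCtxDestroy(ctx);
}

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    std::vector<std::thread> threads;
    for (int i = 1; i <= 4; ++i) threads.emplace_back(worker, i, dev);
    for (auto &t : threads) t.join();
    return 0;
}
```

Comparing the printed pairs shows whether each thread's kernel picked up that thread's value or one written by another thread.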

Is there something I am missing here?