cuBLAS and 2 GPUs

Is it possible to run cuBLAS functions on two separate GPUs from two host threads, one thread per GPU? Based on the simpleMultiGPU example from the SDK, I set the device in each thread and then call cublasInit and cublasShutdown, but this locks one of the threads until it terminates. The same thing happens without the explicit cublasInit/cublasShutdown calls: calling cublasAlloc, cublasSetVector, etc. also locks. Does anyone know whether this is a threading issue or a GPU issue?