CUDA 2.3 + OpenMP + templates problem: parallel section not parallelized in templated functions

With nvcc 2.3 and Visual Studio 9, OpenMP pragma directives do not seem to be taken into account when they are placed inside templated functions like this:

#include <cstdio>
#include <omp.h>

template<class T>
inline void testomp(){
    omp_set_num_threads(2);
#pragma omp parallel
    {
        printf("Thread %d / %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
}

No parallelization happens in this case (only one thread runs), while it works perfectly if I remove the template.

The same code works correctly when compiled directly with the Visual Studio compiler, and I am also fairly sure it works correctly under Linux, with nvcc on top of gcc.

Has anybody already run into the same problem?

We noticed that before; I believe nvcc 3.0 does the right thing.