Large Overhead on clCreateContext

I’m running OpenCL on a Tesla M2050 in a machine with CUDA 5.0.35 installed. Calling clCreateContext takes a very long time, sometimes more than 1 second. Has anyone seen this before, or does anyone have an idea what might be causing it? I expect some initialization overhead, but 1 second seems excessive.
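
For anyone who wants to reproduce the measurement, here is a minimal sketch of one way to time a single call with a monotonic clock, assuming a POSIX system where clock_gettime is available. The names time_call and create_context_stub are made up for illustration; the stub only stands in for the real clCreateContext call so that the snippet compiles without the OpenCL headers.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Signature of the call being timed; arg carries whatever state it needs. */
typedef void (*timed_fn)(void *arg);

/* Returns the elapsed wall-clock time, in seconds, of one call to fn(arg). */
double time_call(timed_fn fn, void *arg)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical stand-in for the real work. In an actual test this body
 * would call clCreateContext and store the resulting context via arg.
 * Here it only counts how often it was invoked. */
void create_context_stub(void *arg)
{
    int *calls = arg;
    *calls += 1;
}
```

Calling time_call(create_context_stub, &calls) once gives the cost of the first call. Timing a second call in the same process and comparing the two numbers may help show whether the delay is a one-time initialization cost or is paid on every call.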