CUDA vs OpenCL - Marching cubes isosurfaces OpenGL interop

I am interested in CUDA/OpenCL interop with OpenGL and have been looking at some of the sample apps that come with the SDK (ver. 4.0). When I run the Marching Cubes isosurfaces sample in wireframe mode with animation enabled, the OpenCL implementation runs roughly 4-5 times faster than the CUDA implementation (CUDA is about 62.5 fps and OpenCL is ~400 fps). Is there a reason for this discrepancy? I expected CUDA to be better optimized and to run slightly faster, not 4-5 times slower. By the way, I am using an NVIDIA Quadro FX 4600 video card on a dual-processor Intel Xeon machine.
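
To put those numbers in per-frame terms, here is a minimal sketch in Python that converts the two frame rates quoted above into milliseconds per frame. The helper name `frame_time_ms` is only for illustration and is not part of the SDK, and the OpenCL figure is approximate, so treat the result as a rough estimate.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate (frames per second) to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates reported in the question (the OpenCL value is approximate).
cuda_ms = frame_time_ms(62.5)
opencl_ms = frame_time_ms(400.0)

print(f"CUDA:   {cuda_ms:.1f} ms per frame")
print(f"OpenCL: {opencl_ms:.1f} ms per frame")
```

In other words, the CUDA sample takes about 16 ms per frame, while the OpenCL sample takes about 2.5 ms.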