Why is the OpenCV GPU module performing faster than VisionWorks?

I have tried several functions of the OpenCV GPU module and compared them with the equivalent VisionWorks immediate-mode code. Surprisingly, in all cases the OpenCV GPU module is significantly faster than VisionWorks.

For example, a 4-level Gaussian pyramid implemented manually using OpenCV's gpu::pyrDown:

    gpu::pyrDown(d_m, l1);
    gpu::pyrDown(l1, l2);
    gpu::pyrDown(l2, l3);
    gpu::pyrDown(l3, l4);

takes 2.5 ms on average over 100 iterations, whereas the VisionWorks equivalent

    if(vxuHalfScaleGaussian(context,image,half_image,3) != VX_SUCCESS)
    {
      cout <<"ERROR :"<<"failed to perform scaling"<<endl;
    }

    if(vxuHalfScaleGaussian(context,half_image,half_image_2,3) != VX_SUCCESS)
    {
      cout <<"ERROR :"<<"failed to perform scaling"<<endl;
    }

    if(vxuHalfScaleGaussian(context,half_image_2,half_image_3,3) != VX_SUCCESS)
    {
      cout <<"ERROR :"<<"failed to perform scaling"<<endl;
    }

    if(vxuHalfScaleGaussian(context,half_image_3,half_image_4,3) != VX_SUCCESS)
    {
      cout <<"ERROR :"<<"failed to perform scaling"<<endl;
    }

takes 11.1 ms for a single execution and 96 ms on average over 100 iterations.
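
For reference, the averages above could be collected with a small timing helper like the one below. This is only a sketch of one way to measure, not the exact code I ran: averageMs and its warmup parameter are hypothetical names, not part of OpenCV or VisionWorks. It runs the work a few times untimed first, so one-time setup costs do not end up in the average.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV or VisionWorks).
// Calls `work` `warmup` times untimed, then `iterations` times timed,
// and returns the mean wall-clock time per timed call in milliseconds.
double averageMs(const std::function<void()>& work, int iterations, int warmup = 5)
{
    // Untimed runs, so first-call initialization is not counted.
    for (int i = 0; i < warmup; ++i)
        work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();

    // Convert the elapsed time to fractional milliseconds and average it.
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

The four gpu::pyrDown calls (or the four vxuHalfScaleGaussian calls) would go inside the callable, for example averageMs([&]{ /* pyramid calls here */ }, 100). This measures host wall-clock time only, so if any of the calls returns before the GPU work has finished, the callable would also need to wait for completion before returning.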

If this is generally true, what does VisionWorks offer?

I am running the “cuda-repo-l4t-r21.3-6-5-local_6.5-50” version of L4T.