Caffe2 vs. CUDA performance

Hi,

Has anyone compared the performance of Caffe2 against a hand-tuned C/C++ CUDA production engine (i.e. a program built from C/C++ with the CUDA runtime/driver API)?

It would be great if you could share the results.

Thanks