Porting TK1-based Convnet Implementation to FPGA

I have to use Nvidia’s TK1 and TX1 devkits to implement some convnets. According to Nvidia, neither kit supports OpenCL drivers; both work only with CUDA.

In the second stage, the same nets need to be ported to an FPGA because of its better power consumption, since the project is aimed at developing embedded solutions. (Please don’t tell me how GPUs are better than FPGAs, as this needs to be done in any case :)) For this stage, OpenCL will have to be used, since CUDA does not support FPGA or DSP kits.

Is there any alternative to learning both OpenCL and CUDA in this scenario? I cannot switch to a different devkit, and the FPGA implementation is also mandatory.