High latency with cuDNN-enabled PyTorch on a single-image HDR model

Hi,
I am using a Jetson AGX Xavier. I have installed PyTorch version 1.14.0a0+44dac51c.nv23.02 with CUDA enabled, and torch.cuda.is_available() returns True.

I am running this code: GitHub - dmarnerides/hdr-expandnet: Training and inference code for ExpandNet. By default, the input tensor is moved to CUDA in expand.py with the line below, and the model is moved to CUDA as well:
t_input = t_input.cuda()
So I believe the model runs on the GPU. I wanted to know if there is anything else that needs to be enabled to make PyTorch utilize cuDNN.
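For reference, here is a small sketch of how I understand the cuDNN flags can be inspected and toggled in PyTorch (these are the standard torch.backends.cudnn attributes; nothing model-specific is assumed):

```python
import torch

# cuDNN is used automatically whenever this flag is True and the
# tensors/model live on a CUDA device; no extra setup is required.
print("cuDNN enabled:", torch.backends.cudnn.enabled)

# Returns the cuDNN version number, or None if PyTorch was built without it.
print("cuDNN version:", torch.backends.cudnn.version())

# For fixed input sizes (like a single-resolution HDR model), benchmark mode
# lets cuDNN pick the fastest convolution algorithm after a warm-up run.
torch.backends.cudnn.benchmark = True
```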

I ask because I get only 4 fps with this model (single-image HDR, where the output of the model is also an image). Inference takes about 250 ms; in the code, the call net.predict(t_input, opt.patch_size).cpu() accounts for those 250 ms. Initially I believed that moving the output tensor to the CPU (the .cpu() call) took the majority of the time, but after adding torch.cuda.synchronize() I realized the inference itself was still running asynchronously in the background, and .cpu() was simply waiting for it to finish.
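This is the timing helper I used to confirm that (a hypothetical sketch; `timed` and the warm-up/iteration counts are my own choices, not part of the ExpandNet code). Synchronizing before and after the timed region ensures queued CUDA kernels are actually counted:

```python
import time
import torch

def timed(fn, *args, warmup=3, iters=10):
    """Average wall-clock time of fn(*args), with CUDA sync if available."""
    # Warm-up runs so one-time costs (allocator, cuDNN algorithm search)
    # do not pollute the measurement.
    for _ in range(warmup):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # flush pending async CUDA work
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the timed kernels to finish
    return (time.perf_counter() - start) / iters
```

Without the synchronize calls, the timer stops while kernels are still queued, which is why the cost appeared to land on .cpu().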

I have two questions:
1. Is cuDNN enabled?
2. Is there any other way to accelerate this model on the NVIDIA GPU?
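One thing I have been considering for the second question is FP16 inference, which Jetson GPUs support natively. A minimal sketch, using a stand-in nn.Conv2d rather than the actual ExpandNet model (which I have not modified yet):

```python
import torch
import torch.nn as nn

# Stand-in for the real network; ExpandNet itself is convolutional,
# so the same .half() conversion should apply to it.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1).eval()
x = torch.randn(1, 3, 64, 64)  # dummy image batch

if torch.cuda.is_available():
    # Cast both weights and input to FP16 and move them to the GPU.
    model = model.half().cuda()
    x = x.half().cuda()

with torch.no_grad():  # no autograd bookkeeping during inference
    y = model(x)

print(y.shape, y.dtype)
```

I do not know yet whether FP16 costs visible quality on HDR output, so any advice on that (or on exporting to TensorRT instead) would be appreciated.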