I am seeing a slowdown on the first iteration of CNN training or inference. Calling cudnnCnnInferVersionCheck() beforehand fixes it, but the call increases memory usage by approximately 1.5 GB. How can I release that memory when my program finishes?