TensorRT 3 FP32 Inaccuracy

I am trying to run a modified version of an InceptionV3 TensorFlow model with TensorRT. My understanding is that in FP32 mode, kernel fusion and the other optimizations should have no effect on the semantics of the network. Is this true? In my case, TensorRT gives about 10% worse accuracy than running the same model under TensorFlow.
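In exact arithmetic, fusing or reordering operations does not change the result, but FP32 addition is not associative, so an engine that accumulates in a different order than TensorFlow can differ in the low-order bits. As a rough illustration (plain Python that simulates FP32 by rounding through `struct`; the helper names `f32` and `sum_f32` are just for this sketch), summing the same values forward and backward stays within a tiny relative error of the exact sum. Differences of that size are unlikely on their own to explain a 10% accuracy gap.

```python
import math
import struct

def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Sum left to right, rounding to FP32 after every addition (simulated FP32 accumulator)."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc

# Same inputs, two accumulation orders, compared against an exactly rounded sum.
values = [f32(1.0 / (i + 1)) for i in range(10000)]
exact = math.fsum(values)
forward = sum_f32(values)
backward = sum_f32(reversed(values))
print(abs(forward - exact) / exact, abs(backward - exact) / exact)
```

Both relative errors are bounded by roughly n times the FP32 unit roundoff, i.e. well under 0.1% here.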

Also, when registering the inputs, nvuffparser::UffInputOrder seems to have no effect! Both kNCHW and kNHWC give the same accuracy, even though they should produce completely different results.
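To see why the two orders should disagree, here is a small sketch of how one flat input buffer maps to tensor coordinates under NHWC versus NCHW. The helper names are hypothetical and are not part of the UFF parser API; the point is only that the same bytes land at different (c, h, w) positions.

```python
def nhwc_offset(n, h, w, c, H, W, C):
    """Flat offset of element (n, h, w, c) in an NHWC buffer."""
    return ((n * H + h) * W + w) * C + c

def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in an NCHW buffer."""
    return ((n * C + c) * H + h) * W + w

def nhwc_to_nchw(buf, N, H, W, C):
    """Return a copy of an NHWC flat buffer rearranged into NCHW order."""
    out = [0] * len(buf)
    for n in range(N):
        for h in range(H):
            for w in range(W):
                for c in range(C):
                    out[nchw_offset(n, c, h, w, C, H, W)] = buf[nhwc_offset(n, h, w, c, H, W, C)]
    return out

# A 1x2x2x3 NHWC "image" whose values are simply their own offsets.
nhwc = list(range(1 * 2 * 2 * 3))
nchw = nhwc_to_nchw(nhwc, N=1, H=2, W=2, C=3)
print(nchw)  # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```

Since the two layouts put different values at the same coordinates, feeding one buffer under kNCHW and then kNHWC would normally change the output. Transposing the input on the host and comparing results is one way to check whether the flag is actually being honored.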

I traced the divergence in behavior down to a mul op inside a batch norm structure. It seems to be a bug in TensorRT. I have a 128x6x6 CHW tensor feeding into a mul by a length-128 vector, and it is set to use the kCHANNEL scale mode, so everything looks okay. The values just come out plain wrong: specifically, all elements at x,x,4 and x,x,5 look like they were multiplied by garbage values.
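A fault like this can be pinned down independently of TensorRT by computing the expected per-channel product on the host and listing exactly which coordinates disagree with the engine's output. The sketch below assumes a flat CHW buffer and a scale vector of length C (one factor per channel, as described for kCHANNEL mode above); the function names and tolerances are hypothetical choices for illustration.

```python
def channel_scale_chw(x, scale, C, H, W):
    """Reference: out[c, h, w] = x[c, h, w] * scale[c] for a flat CHW buffer."""
    assert len(x) == C * H * W and len(scale) == C
    out = [0.0] * len(x)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                i = (c * H + h) * W + w
                out[i] = x[i] * scale[c]
    return out

def find_mismatches(expected, actual, C, H, W, rtol=1e-5, atol=1e-6):
    """List (c, h, w) where |expected - actual| > atol + rtol * |expected|."""
    bad = []
    for c in range(C):
        for h in range(H):
            for w in range(W):
                i = (c * H + h) * W + w
                if abs(expected[i] - actual[i]) > atol + rtol * abs(expected[i]):
                    bad.append((c, h, w))
    return bad

# Tiny demo: corrupt every w == 2 element and check that exactly those are reported.
C, H, W = 2, 2, 3
x = [float(i) for i in range(C * H * W)]
ref = channel_scale_chw(x, [2.0, 0.5], C, H, W)
bad_out = list(ref)
for i in range(2, len(bad_out), W):
    bad_out[i] += 100.0
print(find_mismatches(ref, bad_out, C, H, W))  # [(0, 0, 2), (0, 1, 2), (1, 0, 2), (1, 1, 2)]
```

With the real 128x6x6 tensor, mismatches clustering at w = 4 and w = 5 would presumably correspond to the x,x,4 and x,x,5 pattern described above.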

Where is the link to file bugs against NVIDIA?

Seems the registered developer website got re-organized; it took me a couple of minutes to find where the bug submissions go now.

Log in with your registered developer credentials at https://developer.nvidia.com/, click on your user name and select “My account” from the pull-down menu, then “My Bugs” from the menu on the left, then click the “Submit a New Bug” button in the upper right-hand corner.

I’m not sure this is open access but here is the bug thread:
https://developer.nvidia.com/nvidia_bug/2029375

Because bug reports often contain a variety of confidential information, bug reports filed with NVIDIA are visible only to the filer and relevant NVIDIA personnel.