Bug in fp8x2 and fp8x4 conversions

Suggestions:

  1. Retest on the latest CUDA version (currently 12.8.1).
  2. If the behavior is unchanged, file a bug report.
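
For the retest, a small self-checking program makes it easy to compare results across toolkit versions and to attach to a bug report. The sketch below is only an assumption about what such a check might look like, since the exact failing case is not stated here: it compares the packed `__nv_cvt_float2_to_fp8x2` conversion against two scalar `__nv_cvt_float_to_fp8` conversions in E4M3 with saturation. The helper name `count_fp8x2_mismatches`, the input values, and the choice of E4M3 are all hypothetical and should be replaced with the inputs that actually trigger the problem.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp8.h>

// Hypothetical check: the packed x2 conversion should match two scalar
// conversions, with .x in the low byte and .y in the high byte.
__global__ void compare_fp8x2(const float2* in, int n, int* mismatches) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    __nv_fp8x2_storage_t packed =
        __nv_cvt_float2_to_fp8x2(in[i], __NV_SATFINITE, __NV_E4M3);

    __nv_fp8_storage_t lo = __nv_cvt_float_to_fp8(in[i].x, __NV_SATFINITE, __NV_E4M3);
    __nv_fp8_storage_t hi = __nv_cvt_float_to_fp8(in[i].y, __NV_SATFINITE, __NV_E4M3);
    __nv_fp8x2_storage_t expected = (__nv_fp8x2_storage_t)((hi << 8) | lo);

    if (packed != expected) atomicAdd(mismatches, 1);
}

// Returns the number of mismatching pairs, or -1 on a CUDA error.
int count_fp8x2_mismatches(int n) {
    float2* in = nullptr;
    int* mismatches = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(float2)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&mismatches, sizeof(int)) != cudaSuccess) {
        cudaFree(in);
        return -1;
    }

    // Assumed test inputs: finite values that cross the E4M3 range on both
    // sides, so saturation and small magnitudes are both exercised.
    for (int i = 0; i < n; ++i) {
        in[i] = make_float2(-500.0f + 0.37f * i, 1e-3f * i);
    }
    *mismatches = 0;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    compare_fp8x2<<<blocks, threads>>>(in, n, mismatches);

    int result = -1;
    if (cudaDeviceSynchronize() == cudaSuccess) result = *mismatches;

    cudaFree(in);
    cudaFree(mismatches);
    return result;
}

int main() {
    int version = 0;
    cudaRuntimeGetVersion(&version);  // encoded as 1000 * major + 10 * minor
    std::printf("CUDA runtime %d.%d\n", version / 1000, (version % 1000) / 10);
    std::printf("fp8x2 mismatches: %d\n", count_fp8x2_mismatches(4096));
    return 0;
}
```

Building and running the same file with both the old and the new toolkit, and including the printed runtime version in the report, shows whether the behavior changed. The same comparison can be extended to the fp8x4 types.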