--dtype controls the precision used for computation and activations. bfloat16 is appropriate for Int4-Autoround because that scheme is actually W4A16 (4-bit weights, and the A16 is 16-bit activations). Setting it does not change anything for that quant type - but it would for FP8 or NVFP4, because those are most often W8A8 or W4A4 (the activations are not 16-bit to begin with).
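To make the WxAy naming concrete, here is a minimal Python sketch that splits such a label into weight and activation bit widths. The helper names parse_scheme and activations_are_16_bit are hypothetical and only for illustration; they are not part of any serving framework.

```python
import re


def parse_scheme(label: str) -> tuple[int, int]:
    """Split a label such as 'W4A16' into (weight_bits, activation_bits)."""
    match = re.fullmatch(r"W(\d+)A(\d+)", label.strip().upper())
    if match is None:
        raise ValueError(f"not a WxAy label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def activations_are_16_bit(label: str) -> bool:
    """True when the scheme keeps 16-bit activations (hypothetical helper)."""
    _, activation_bits = parse_scheme(label)
    return activation_bits == 16


if __name__ == "__main__":
    for label in ("W4A16", "W8A8", "W4A4"):
        weight_bits, activation_bits = parse_scheme(label)
        print(label, weight_bits, activation_bits, activations_are_16_bit(label))
```

Only W4A16 reports 16-bit activations here, which is the case where a 16-bit --dtype such as bfloat16 already matches what the quantized model computes in.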
The KV cache is separate. --kv-cache-dtype fp8, I think, resolves to one of the underlying fp8_e4m3 or fp8_e5m2 datatypes (in vLLM I believe it maps to fp8_e4m3). The former is earlier, more compatible, and generally thought to be inferior to e5m2 (it has one more mantissa bit but one fewer exponent bit, so less dynamic range) - however, some implementations or combinations are incompatible with e5m2. A good example is the Gemma4-31B-it model with MTP that I posted about here: Gemma4 draft models are now available - #8 by joshua.dale.warner. I tried explicit fp8_e5m2 and it crashed, so that combination is incompatible. It works great with e4m3.
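For reference, the difference between the two FP8 layouts can be worked out from their bit fields. The sketch below is plain Python, not tied to any serving framework, and assumes the common definitions: e5m2 reserves its top exponent code for inf/NaN like IEEE formats, while e4m3 (the "fn" variant used for ML) only reserves the single all-ones pattern for NaN. The Fp8Format class is hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fp8Format:
    name: str
    exp_bits: int
    man_bits: int
    ieee_specials: bool  # True: top exponent code is reserved for inf/NaN

    @property
    def bias(self) -> int:
        return 2 ** (self.exp_bits - 1) - 1

    def max_finite(self) -> float:
        top_exp_code = 2 ** self.exp_bits - 1
        if self.ieee_specials:
            # Largest finite value: next exponent down, full mantissa.
            exponent = top_exp_code - 1 - self.bias
            significand = 2 - 2 ** -self.man_bits
        else:
            # Top exponent is usable, but the all-ones mantissa is NaN.
            exponent = top_exp_code - self.bias
            significand = 2 - 2 ** -(self.man_bits - 1)
        return significand * 2.0 ** exponent

    def relative_step(self) -> float:
        """Spacing between neighbouring values relative to 1.0."""
        return 2.0 ** -self.man_bits

    def min_subnormal(self) -> float:
        return 2.0 ** (1 - self.bias - self.man_bits)


E4M3 = Fp8Format("fp8_e4m3", exp_bits=4, man_bits=3, ieee_specials=False)
E5M2 = Fp8Format("fp8_e5m2", exp_bits=5, man_bits=2, ieee_specials=True)

if __name__ == "__main__":
    for fmt in (E4M3, E5M2):
        print(fmt.name, fmt.max_finite(), fmt.relative_step(), fmt.min_subnormal())
```

Under these assumptions e4m3 tops out at 448 with a relative step of 0.125, while e5m2 reaches 57344 with a coarser step of 0.25, so the choice is a trade between range and precision rather than a strict ordering.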