What NVIDIA card offers the highest CUDA FP64 throughput at this moment?

Which of the affordable cards (up to $3000) perform reasonably well on double-precision (FP64) calculations in a graphics workstation? The information we could find on these specs was outdated.

I would assume the SXM2 modules of the Tesla V100 currently have the highest performance in this area.

7.8 TFLOPS according to this:

I can only find some “affordable” refurbished parts on eBay, still with 16 GB of HBM2 memory. New units will cost an arm and a leg. Also, the SXM2 form factor requires specialized servers with adequate power supplies.
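
For reference, the 7.8 TFLOPS figure can be reproduced with a back-of-the-envelope calculation. This is only a rough sketch: the function name is made up, and the inputs (80 SMs, 32 FP64 units per SM, 1530 MHz boost clock for the V100 SXM2) are not from this thread but from NVIDIA's published specs as I remember them, so check them against the datasheet. It assumes every FP64 unit retires one fused multiply-add (2 FLOPs) per cycle at boost clock, which real kernels rarely sustain.

```python
def peak_fp64_tflops(sm_count, fp64_units_per_sm, boost_clock_mhz):
    """Theoretical peak FP64 throughput in TFLOPS (hypothetical helper).

    Assumes one fused multiply-add per FP64 unit per cycle,
    counted as 2 floating-point operations.
    """
    flops_per_cycle = sm_count * fp64_units_per_sm * 2
    return flops_per_cycle * boost_clock_mhz * 1e6 / 1e12


# Assumed V100 SXM2 figures (not from this thread): 80 SMs,
# 32 FP64 units per SM, 1530 MHz boost clock.
v100_sxm2 = peak_fp64_tflops(sm_count=80, fp64_units_per_sm=32, boost_clock_mhz=1530)
print(f"V100 SXM2 theoretical peak: {v100_sxm2:.2f} TFLOPS FP64")
```

The same formula is a quick way to compare other cards: many consumer GPUs have far fewer FP64 units per SM, so their FP64 rate is a small fraction of their FP32 rate.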