General questions about multicast objects

Multicast object management APIs were added in CUDA 12. I have some general questions:

  1. Which devices support multicast objects? I cannot find any specification of the required compute capability in the Driver API guide.
  2. Compared to a shareable handle, which maps a memory segment onto multiple devices, multicasting seems to cost more memory bandwidth, since each write operation is applied on every device in the multicast team. Is there any benefit to it over a shareable handle?