Can threads from different warps access shared memory at the same time?

CUDA SMs comprise 4 partitions, each with its own independent arithmetic units (cores).
Each warp running on an SM is assigned to a specific partition.
Each partition can schedule an instruction every clock cycle.

Shared memory is accessed through the LSU (Load/Store Unit), a single resource shared by all 4 partitions.
Typical throughput over the last few architecture generations is mostly 1 cycle (up to 2 cycles) per access per SM. The SM partitions do not each have their own port to shared memory, so accesses from warps in different partitions cannot all be served in the same cycle.
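
If you want to check this on your own GPU, a rough microbenchmark along the following lines may help. It is only a sketch, not a calibrated measurement: the kernel name `smem_probe`, the buffer size and the iteration count are arbitrary choices of mine, and the timed loop also contains address arithmetic, so the printed number is an upper bound on the cost of a shared memory access. It launches a single block with 1 to 32 warps, lets every warp do bank-conflict-free shared memory loads, and reports the elapsed SM cycles divided by the total number of warp-wide loads.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWords = 1024;  // shared buffer size in 32-bit words (power of two, assumed)
constexpr int kIters = 4096;  // loads per thread (arbitrary)

__global__ void smem_probe(long long *t_begin, long long *t_end, int *sink)
{
    // volatile forces a real shared memory load on every iteration
    __shared__ volatile int buf[kWords];
    for (int i = threadIdx.x; i < kWords; i += blockDim.x) buf[i] = i;
    __syncthreads();

    // The 32 lanes of a warp read 32 consecutive words: no bank conflicts
    int idx = threadIdx.x & (kWords - 1);
    int acc = 0;
    long long t0 = clock64();
    for (int i = 0; i < kIters; ++i) {
        acc += buf[idx];
        idx = (idx + 32) & (kWords - 1);
    }
    long long t1 = clock64();

    // Lane 0 of each warp records that warp's start and end time
    if ((threadIdx.x & 31) == 0) {
        t_begin[threadIdx.x / 32] = t0;
        t_end[threadIdx.x / 32] = t1;
    }
    sink[threadIdx.x] = acc;  // keep the loads from being optimized away
}

int main()
{
    long long *t_begin, *t_end;
    int *sink;
    cudaMallocManaged(&t_begin, 32 * sizeof(long long));
    cudaMallocManaged(&t_end, 32 * sizeof(long long));
    cudaMallocManaged(&sink, 1024 * sizeof(int));

    for (int warps = 1; warps <= 32; warps *= 2) {
        // One block only, so all warps share one SM and one cycle counter
        smem_probe<<<1, warps * 32>>>(t_begin, t_end, sink);
        if (cudaDeviceSynchronize() != cudaSuccess) {
            printf("CUDA error\n");
            return 1;
        }
        long long first = *std::min_element(t_begin, t_begin + warps);
        long long last = *std::max_element(t_end, t_end + warps);
        double cycles = double(last - first);
        printf("%2d warps: %.2f cycles per warp-wide load\n",
               warps, cycles / (double(warps) * kIters));
    }

    cudaFree(t_begin);
    cudaFree(t_end);
    cudaFree(sink);
    return 0;
}
```

If the LSU path really is one resource per SM, I would expect the printed value to level off as warps are added instead of continuing to drop once all 4 partitions are busy. Where exactly it levels off depends on the architecture and on how the compiler schedules the loop, so treat the numbers as indicative only.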

See also: How does the LSU (Load/Store Unit) execute Load/Store instructions in the Ampere architecture? - #8 by Greg