Can threads from different warps access shared memory at the same time?

Question:
I understand that all threads of one warp can access shared memory in a single access as long as each thread accesses a different bank.
Is this also the case when threads from different warps access shared memory? That is, is it fine as long as all threads access different banks, even if the threads are from different warps? (I assume only one thread of each warp is accessing the memory and that the threads are synced.)
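
As a reference for the within-warp rule described here, a minimal sketch follows. It assumes 32 banks of 4-byte words and that lane i of the warp accesses word i * stride; the names `conflictDegree` and `stridedRead` are made up for illustration and are not from any CUDA API.

```cuda
#include <cuda_runtime.h>

constexpr int kBanks = 32;               // assumption: 32 banks, each 4 bytes wide
constexpr int kWarpSize = 32;
constexpr int kWords = kWarpSize * 33;   // large enough for strides 0..33

// Worst-case number of distinct 4-byte words that one warp touches in a single
// bank when lane i accesses word i * stride. A result of 1 means conflict-free.
// Lanes reading the same word are counted once (broadcast, not a conflict).
__host__ __device__ int conflictDegree(int stride) {
    int worst = 0;
    for (int bank = 0; bank < kBanks; ++bank) {
        int words = 0;
        for (int lane = 0; lane < kWarpSize; ++lane) {
            int word = lane * stride;
            if (word % kBanks != bank) continue;
            // Count this word only if no lower lane already touched it.
            bool seen = false;
            for (int prev = 0; prev < lane; ++prev) {
                if (prev * stride == word) seen = true;
            }
            if (!seen) ++words;
        }
        if (words > worst) worst = words;
    }
    return worst;
}

// Illustrative kernel: launch with one warp (32 threads) and 0 <= stride <= 33.
// The result is the same for every stride; only the number of bank conflicts
// (and therefore the access cost) differs.
__global__ void stridedRead(int stride, int *out) {
    __shared__ int buf[kWords];
    for (int i = threadIdx.x; i < kWords; i += blockDim.x) {
        buf[i] = i;
    }
    __syncthreads();
    out[threadIdx.x] = buf[threadIdx.x * stride];
}
```

Under these assumptions, `conflictDegree(1)` and `conflictDegree(33)` give 1 (conflict-free), while `conflictDegree(32)` gives 32 (every lane lands in the same bank).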

I realize that 32 warps will not execute at the same time, but I assume 2 or 4 warps may. Will their memory accesses be combined into one access in that case?

No. In CUDA, memory accesses from different warps are never combined into a single request, transaction, or wavefront.

And can they access shared memory at the same time? Or do they have to time-share an access port?

What do you mean by “at the same time”? In the same clock cycle? No, two requests from two different warps cannot both be satisfied in the same clock cycle, AFAIK. It would take at least 2 clock cycles: one to service the request from the first warp and one to service the request from the second warp.
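
The scenario from the question (one thread per warp, each hitting a different bank, with the threads synced) can be sketched in CUDA roughly as below. This is only an illustration: the kernel name `onePerWarp`, the host helper `runOnePerWarp`, and the choice of 4 warps are assumptions, not something from this thread. The code is functionally fine, but each warp's store is still its own request to shared memory; nothing here merges them, and the sketch does not measure timing.

```cuda
#include <cuda_runtime.h>

constexpr int kWarps = 4;       // hypothetical: 4 warps in the block
constexpr int kWarpSize = 32;

// Lane 0 of each warp writes one 4-byte word. The words are consecutive, so
// they fall in distinct banks, yet each store is issued by a different warp.
__global__ void onePerWarp(int *out) {
    __shared__ int slots[kWarps];
    int warp = threadIdx.x / kWarpSize;
    int lane = threadIdx.x % kWarpSize;
    if (lane == 0) {
        slots[warp] = 100 + warp;   // one shared-memory request per warp
    }
    __syncthreads();                // make every warp's store visible
    if (threadIdx.x < kWarps) {
        // Threads 0..3 belong to warp 0, so this read is a single request.
        out[threadIdx.x] = slots[threadIdx.x];
    }
}

// Host helper (hypothetical): runs the kernel once and returns the sum of the
// values read back, or -1 if a CUDA call fails.
int runOnePerWarp() {
    int *d_out = nullptr;
    int h_out[kWarps] = {0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    onePerWarp<<<1, kWarps * kWarpSize>>>(d_out);
    // cudaMemcpy waits for the kernel on the default stream before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;
    int sum = 0;
    for (int i = 0; i < kWarps; ++i) sum += h_out[i];
    return sum;
}
```

Calling `runOnePerWarp()` from a host `main` should return 406 (100 + 101 + 102 + 103) on a working device, which confirms correctness only; seeing the per-warp requests themselves would need a profiler.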

There are also aspects like latency which I have not discussed here.

CUDA SMs comprise 4 partitions. Each partition has its own independent arithmetic units (cores).
Warps running on an SM are each assigned to a specific partition.
Each partition can schedule an instruction every clock cycle.

Shared memory is accessed through the LSU. It is a single resource shared by all 4 partitions.
Typical throughput over the last few architecture generations is mostly 1 cycle (up to 2 cycles) per access per SM. There is no time-sharing (there are not multiple ports, one for each SM partition).

See also: How does the LSU (Load/Store Unit) execute Load/Store instructions in the Ampere architecture? - #8 by Greg