GPUDirect Storage: cuFileWrite concurrently with kernel execution

cuFileWriteAsync() (which works with CUDA streams) is not yet implemented in CUDA 11.5, so I am using the synchronous cuFileWrite().

During my evaluation, cuFileWrite(), called from a separate host thread, stays blocked while a (persistent) kernel is executing. Is this the expected behavior?
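A minimal sketch of this kind of setup, assuming GDS (libcufile) is installed and that the placeholder path `/mnt/gds/probe.bin` sits on a GDS-capable filesystem. `persistent_kernel` and `run_experiment` are hypothetical names and the 2-second timeout is arbitrary. The kernel spins on a flag in mapped host memory, a second host thread issues `cuFileWrite()`, and the main thread checks whether the write returns before the kernel is released.

```cuda
// Hypothetical repro sketch: does a synchronous cuFileWrite() issued from a second
// host thread complete while a persistent kernel is still running?
// Assumes libcufile is installed and `path` is on a GDS-capable filesystem.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for O_DIRECT
#endif
#include <cuda_runtime.h>
#include <cufile.h>
#include <fcntl.h>
#include <unistd.h>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstring>
#include <thread>

// Persistent kernel: spins until the host sets *stop to 1 through mapped memory.
__global__ void persistent_kernel(volatile int* stop) {
    while (*stop == 0) {
    }
}

// Returns 0 on success. *write_ms receives the wall-clock time of cuFileWrite();
// *overlapped is true if the write returned before the kernel was told to stop.
int run_experiment(const char* path, size_t bytes, double* write_ms, bool* overlapped) {
    // Allocate everything up front so no allocation call runs while the kernel spins.
    int* h_stop = nullptr;
    int* d_stop = nullptr;
    void* d_buf = nullptr;
    if (cudaHostAlloc((void**)&h_stop, sizeof(int), cudaHostAllocMapped) != cudaSuccess) return 1;
    *h_stop = 0;
    cudaHostGetDevicePointer((void**)&d_stop, h_stop, 0);
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return 1;
    cudaMemset(d_buf, 0xAB, bytes);
    cudaDeviceSynchronize();

    if (cuFileDriverOpen().err != CU_FILE_SUCCESS) return 2;
    int fd = open(path, O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) return 3;
    CUfileDescr_t descr;
    std::memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    if (cuFileHandleRegister(&fh, &descr).err != CU_FILE_SUCCESS) return 4;
    cuFileBufRegister(d_buf, bytes, 0);  // optional registration of the device buffer

    // Launch the persistent kernel on a non-blocking stream.
    cudaStream_t s;
    cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
    persistent_kernel<<<1, 1, 0, s>>>(d_stop);

    std::atomic<bool> done{false};
    ssize_t written = -1;
    std::thread writer([&] {
        auto t0 = std::chrono::steady_clock::now();
        written = cuFileWrite(fh, d_buf, bytes, /*file_offset=*/0, /*devPtr_offset=*/0);
        auto t1 = std::chrono::steady_clock::now();
        *write_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        done = true;
    });

    // Give the write up to 2 s to finish while the kernel is still spinning.
    auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(2);
    while (!done && std::chrono::steady_clock::now() < deadline)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    *overlapped = done.load();
    std::printf("cuFileWrite %s while the kernel was running\n",
                *overlapped ? "completed" : "did NOT complete");

    *(volatile int*)h_stop = 1;  // release the persistent kernel
    cudaStreamSynchronize(s);
    writer.join();  // joins the writer whether or not it overlapped

    cuFileBufDeregister(d_buf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    cudaStreamDestroy(s);
    cudaFree(d_buf);
    cudaFreeHost(h_stop);
    return written == (ssize_t)bytes ? 0 : 5;
}

int main() {
    double ms = 0.0;
    bool overlapped = false;
    // Hypothetical path: must be on a filesystem supported by GDS.
    int rc = run_experiment("/mnt/gds/probe.bin", 1 << 20, &ms, &overlapped);
    std::printf("rc=%d, cuFileWrite took %.3f ms\n", rc, ms);
    return rc;
}
```

This could be built with something like `nvcc repro.cu -o repro -lcufile`. If it prints "did NOT complete", the write was held up until the kernel was released, which matches the behavior described above.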

From the cuFile API reference guide:
"This is a synchronous call and will block until the IO is complete."

Does this mean that cuFileRead()/cuFileWrite() are also synchronous with respect to other kernels running on the GPU?