Difference between cudaMemcpy and cudaMemcpyAsync in a thread context

Hi everyone! I wrote code that reads several images in parallel, one thread per image, and copies each image's data to the GPU with cudaMemcpyAsync, roughly as follows:

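        // One thread per band; each thread issues its copy on its own stream.
        // num_bands is a placeholder name for the number of bands.
        for (int i = 0; i < num_bands; ++i) {
            band_threads.emplace_back([&, i]() {
                device_bands[i] = read_device_band();
                HANDLE_ERROR(cudaMemcpyAsync(device_bands[i], host_bands[i], band_bytes,
                                             cudaMemcpyHostToDevice, streams[i]));
            });
        }

        // std::thread starts running on construction, so the threads only need to be joined.
        for (auto& thread : band_threads) {
            thread.join();
        }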

Is there any benefit to using cudaMemcpyAsync here, given that each copy is already issued from its own thread? Or is it effectively the same as using cudaMemcpy?