Are there benefits to copying values from SMEM to registers before doing a reduction?


I am looking at the CUB source code, specifically block_scan_raking.cuh.
In the following Upsweep function, it looks like each thread copies its segment of values from shared memory into registers before doing the reduction. Is there a benefit to this, as opposed to reading each value from shared memory and accumulating it directly into a register?

/// Performs upsweep raking reduction, returning the aggregate
template <typename ScanOp>
__device__ __forceinline__ T Upsweep(
    ScanOp scan_op)
{
    T *smem_raking_ptr = BlockRakingLayout::RakingPtr(temp_storage.raking_grid, linear_tid);

    // Read data into registers
    CopySegment(cached_segment, smem_raking_ptr, Int2Type<0>());

    T raking_partial = cached_segment[0];

    return GuardedReduce(cached_segment, scan_op, raking_partial, Int2Type<1>());
}
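
To make the comparison concrete, here is a minimal sketch of the two variants I am asking about. It is not CUB code: the kernel, the helper functions, and the constants (kBlockThreads, kSegmentLength) are hypothetical names of my own, the element type is fixed to int, and the operator is a plain sum. It only assumes that each thread rakes a contiguous segment of shared memory.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sizes chosen for illustration only (not CUB's values).
constexpr int kBlockThreads  = 32;  // threads in the single block
constexpr int kSegmentLength = 4;   // items raked by each thread
constexpr int kNumItems      = kBlockThreads * kSegmentLength;

// Variant A: accumulate straight from shared memory into one register.
__device__ __forceinline__ int ReduceFromSmem(const int *smem_ptr)
{
    int partial = smem_ptr[0];
    #pragma unroll
    for (int i = 1; i < kSegmentLength; ++i)
        partial += smem_ptr[i];
    return partial;
}

// Variant B: copy the whole segment into a local array first (which the
// compiler may keep in registers once the loops are unrolled), then reduce.
__device__ __forceinline__ int ReduceFromRegisters(const int *smem_ptr)
{
    int cached[kSegmentLength];
    #pragma unroll
    for (int i = 0; i < kSegmentLength; ++i)
        cached[i] = smem_ptr[i];

    int partial = cached[0];
    #pragma unroll
    for (int i = 1; i < kSegmentLength; ++i)
        partial += cached[i];
    return partial;
}

__global__ void CompareKernel(const int *in, int *out_smem, int *out_regs)
{
    __shared__ int smem[kNumItems];

    // Stage the input in shared memory, then make it visible to all threads.
    for (int i = threadIdx.x; i < kNumItems; i += blockDim.x)
        smem[i] = in[i];
    __syncthreads();

    const int *segment = smem + threadIdx.x * kSegmentLength;
    out_smem[threadIdx.x] = ReduceFromSmem(segment);
    out_regs[threadIdx.x] = ReduceFromRegisters(segment);
}

// Host helper: returns true if both variants match a CPU reference sum.
bool RunComparison()
{
    std::vector<int> h_in(kNumItems);
    for (int i = 0; i < kNumItems; ++i)
        h_in[i] = i;

    int *d_in = nullptr, *d_smem = nullptr, *d_regs = nullptr;
    cudaMalloc(&d_in, kNumItems * sizeof(int));
    cudaMalloc(&d_smem, kBlockThreads * sizeof(int));
    cudaMalloc(&d_regs, kBlockThreads * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), kNumItems * sizeof(int), cudaMemcpyHostToDevice);

    CompareKernel<<<1, kBlockThreads>>>(d_in, d_smem, d_regs);

    std::vector<int> h_smem(kBlockThreads), h_regs(kBlockThreads);
    cudaError_t err_a = cudaMemcpy(h_smem.data(), d_smem, kBlockThreads * sizeof(int),
                                   cudaMemcpyDeviceToHost);
    cudaError_t err_b = cudaMemcpy(h_regs.data(), d_regs, kBlockThreads * sizeof(int),
                                   cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_smem);
    cudaFree(d_regs);
    if (err_a != cudaSuccess || err_b != cudaSuccess)
        return false;

    bool ok = true;
    for (int t = 0; t < kBlockThreads; ++t) {
        int expected = 0;
        for (int i = 0; i < kSegmentLength; ++i)
            expected += h_in[t * kSegmentLength + i];
        ok = ok && (h_smem[t] == expected) && (h_regs[t] == expected);
    }
    return ok;
}
```

Calling RunComparison() from a main function and compiling with nvcc only confirms that both variants produce the same sums. It says nothing about performance, which is the part I am unsure about: with full unrolling the compiler may well generate the same instructions for both.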