Parallel Computing Question 2

In this problem, we will compare the performance of a vector processor with a hybrid system that
contains a scalar processor and a GPU-based coprocessor. In the hybrid system, the host
processor has superior scalar performance to the GPU, so in this case all scalar code is executed on the
host processor while all vector code is executed on the GPU.
We will refer to the first system as the
vector computer and the second system as the hybrid computer. Assume that your target application
contains a vector kernel with an arithmetic intensity of 0.5 FLOPs per DRAM byte accessed; however,
the application also has a scalar component that must be performed before and after the kernel in
order to prepare the input vectors and output vectors, respectively. For a sample dataset, the scalar portion
of the code requires 400 ms of execution time on both the vector processor and the host processor in the
hybrid system. The kernel reads input vectors consisting of 200 MB of data and has output data consisting
of 100 MB of data. The vector processor has a peak memory bandwidth of 30 GB/sec and the GPU has a
memory bandwidth of 150 GB/sec. The hybrid system has an additional overhead that requires all
input vectors to be transferred between the host memory and GPU local memory before and after the
kernel is invoked. The hybrid system has a direct memory access (DMA) bandwidth of 10 GB/sec and an
average latency of 10 ms. Assume that both the vector processor and GPU are performance bound by
memory bandwidth. Compute the execution time required by both computers for this application.
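
One way to set up the calculation is sketched below. It is an illustration under stated assumptions, not
an official solution: 1 GB is taken as 1000 MB, the 400 ms scalar time is taken as the total for the
work before and after the kernel, the hybrid system is assumed to make two DMA transfers (inputs to the
GPU, outputs back to the host) that each pay the 10 ms latency once, and no phases overlap. Because both
machines are memory-bandwidth bound, the arithmetic intensity does not enter the time estimate. The
function and parameter names are made up for this sketch.

```python
def transfer_ms(megabytes, gb_per_sec):
    """Time in ms to move `megabytes` at `gb_per_sec`, assuming 1 GB = 1000 MB."""
    return megabytes / gb_per_sec  # MB / (GB/s) equals milliseconds under that assumption


def vector_time_ms(scalar_ms=400.0, in_mb=200.0, out_mb=100.0, mem_gbps=30.0):
    """Scalar phase plus a memory-bound kernel that touches all input and output data."""
    kernel_ms = transfer_ms(in_mb + out_mb, mem_gbps)
    return scalar_ms + kernel_ms


def hybrid_time_ms(scalar_ms=400.0, in_mb=200.0, out_mb=100.0,
                   gpu_gbps=150.0, dma_gbps=10.0, dma_latency_ms=10.0):
    """Scalar phase on the host, kernel on the GPU, plus DMA copies in and out."""
    kernel_ms = transfer_ms(in_mb + out_mb, gpu_gbps)
    # Assumption: one DMA transfer for the inputs and one for the outputs,
    # each paying the average latency once.
    copy_in_ms = dma_latency_ms + transfer_ms(in_mb, dma_gbps)
    copy_out_ms = dma_latency_ms + transfer_ms(out_mb, dma_gbps)
    return scalar_ms + kernel_ms + copy_in_ms + copy_out_ms


if __name__ == "__main__":
    print(f"vector computer: {vector_time_ms():.1f} ms")
    print(f"hybrid computer: {hybrid_time_ms():.1f} ms")
```

Under these assumptions the vector computer takes 400 + 10 = 410 ms and the hybrid computer takes
400 + 2 + 30 + 20 = 452 ms, so the DMA overhead outweighs the GPU's faster kernel for this dataset.
A different reading of the transfer requirement (for example, a single latency charge, or copying all
300 MB in both directions) would change the hybrid total.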

