Which memory type should I use for ping-pong image processing: global or surface?

Ping-pong image processing means using the previous stage's output as the next stage's input. The code looks like this:

ipKernel1<<<GridSize,BlockSize>>>(d_ptr_i,d_ptr_o);
std::swap(d_ptr_i,d_ptr_o);
ipKernel2<<<GridSize,BlockSize>>>(d_ptr_i,d_ptr_o);
std::swap(d_ptr_i,d_ptr_o);
ipKernel3<<<GridSize,BlockSize>>>(d_ptr_i,d_ptr_o);
std::swap(d_ptr_i,d_ptr_o);
ipKernel4<<<GridSize,BlockSize>>>(d_ptr_i,d_ptr_o);
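
For reference, here is a minimal, self-contained sketch of the same pattern using plain global memory. The kernel `addOne` and the helper `runPingPong` are hypothetical stand-ins for ipKernel1..4 and the surrounding host code, not part of my real pipeline. It assumes a non-empty image stored as a flat float array, and error checking is left out for brevity. The input pointer is marked `const __restrict__`, which on compute capability 3.5 and later may let the compiler route those loads through the read-only data cache.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Hypothetical stage: adds 1 to every pixel. Stands in for ipKernel1..4.
__global__ void addOne(const float* __restrict__ in,
                       float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Runs `stages` passes over `host`, swapping two global-memory buffers
// after each pass. Assumes host is non-empty; no error checking.
std::vector<float> runPingPong(std::vector<float> host, int stages)
{
    const int n = static_cast<int>(host.size());
    const size_t bytes = n * sizeof(float);

    float* d_ptr_i = nullptr;
    float* d_ptr_o = nullptr;
    cudaMalloc(&d_ptr_i, bytes);
    cudaMalloc(&d_ptr_o, bytes);
    cudaMemcpy(d_ptr_i, host.data(), bytes, cudaMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    for (int s = 0; s < stages; ++s) {
        addOne<<<grid, block>>>(d_ptr_i, d_ptr_o, n);
        // Swap after every pass (including the last), so the newest
        // result is always reachable through d_ptr_i.
        std::swap(d_ptr_i, d_ptr_o);
    }

    // Blocking copy on the default stream; it waits for the kernels above.
    cudaMemcpy(host.data(), d_ptr_i, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_ptr_i);
    cudaFree(d_ptr_o);
    return host;
}
```

Calling `runPingPong(std::vector<float>(1000, 0.0f), 4)` runs four passes over a 1000-element image. Unlike the snippet above, this sketch also swaps after the final pass, so the result is read from d_ptr_i rather than d_ptr_o.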

Because texture memory cannot be written from a kernel, d_ptr_i and d_ptr_o must be either global memory or surface memory. Which one should I use? Also, in which cases does the texture cache improve the performance of an image-processing kernel?