Why does the cuFFT kernel launch twice for each call?

In the simpleCUFFT project in the SDK there are two forward FFTs and one inverse FFT, so I would expect the cufftExecC2C kernel to be launched three times. The profiler log, however, shows six FFT kernel launches, and there are extra memcpyHtoD entries as well.

CUDA_PROFILE_LOG_VERSION 1.5

CUDA_DEVICE 0 GeForce GTX 295

TIMESTAMPFACTOR 116ec87d1573cc00

method,gputime,cputime,occupancy,gld_incoherent,gld_coherent,gst_incoherent,gst_coherent
method=[ memcpyHtoD ] gputime=[ 4.192 ] cputime=[ 2.033 ]
method=[ memcpyHtoD ] gputime=[ 4.256 ] cputime=[ 1.223 ]
method=[ memcpyHtoD ] gputime=[ 4.160 ] cputime=[ 1.808 ]
method=[ _Z23SP_c2c_mradix_r2_kernelifPvS_i11tfStride_stii ] gputime=[ 12.608 ] cputime=[ 35.464 ] occupancy=[ 0.031 ] gld_incoherent=[ 0 ] gld_coherent=[ 6 ] gst_incoherent=[ 0 ] gst_coherent=[ 32 ]
method=[ memcpyHtoD ] gputime=[ 4.064 ] cputime=[ 1.712 ]
method=[ _Z23SP_c2c_mradix_r7_kernelifPvS_i11tfStride_stii ] gputime=[ 17.952 ] cputime=[ 30.234 ] occupancy=[ 0.031 ] gld_incoherent=[ 0 ] gld_coherent=[ 0 ] gst_incoherent=[ 0 ] gst_coherent=[ 0 ]
method=[ memcpyHtoD ] gputime=[ 4.000 ] cputime=[ 1.501 ]
method=[ _Z23SP_c2c_mradix_r2_kernelifPvS_i11tfStride_stii ] gputime=[ 15.936 ] cputime=[ 27.320 ] occupancy=[ 0.031 ] gld_incoherent=[ 0 ] gld_coherent=[ 0 ] gst_incoherent=[ 0 ] gst_coherent=[ 0 ]
method=[ memcpyHtoD ] gputime=[ 4.000 ] cputime=[ 1.462 ]
method=[ _Z23SP_c2c_mradix_r7_kernelifPvS_i11tfStride_stii ] gputime=[ 18.176 ] cputime=[ 31.007 ] occupancy=[ 0.031 ] gld_incoherent=[ 0 ] gld_coherent=[ 0 ] gst_incoherent=[ 0 ] gst_coherent=[ 0 ]
method=[ _Z27ComplexPointwiseMulAndScaleP6float2PKS_if ] gputime=[ 5.888 ] cputime=[ 19.245 ] occupancy=[ 1.000 ] gld_incoherent=[ 0 ] gld_coherent=[ 0 ] gst_incoherent=[ 0 ] gst_coherent=[ 0 ]
method=[ memcpyHtoD ] gputime=[ 4.000 ] cputime=[ 1.350 ]
method=[ _Z23SP_c2c_mradix_r2_kernelifPvS_i11tfStride_stii ] gputime=[ 15.936 ] cputime=[ 28.976 ] occupancy=[ 0.031 ] gld_incoherent=[ 0 ] gld_coherent=[ 0 ] gst_incoherent=[ 0 ] gst_coherent=[ 0 ]
method=[ memcpyHtoD ] gputime=[ 4.000 ] cputime=[ 1.359 ]
method=[ _Z23SP_c2c_mradix_r7_kernelifPvS_i11tfStride_stii ] gputime=[ 18.720 ] cputime=[ 31.763 ] occupancy=[ 0.031 ] gld_incoherent=[ 0 ] gld_coherent=[ 0 ] gst_incoherent=[ 0 ] gst_coherent=[ 0 ]
method=[ memcpyDtoH ] gputime=[ 4.416 ] cputime=[ 25.302 ]
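
To double-check the counts, here is a small Python sketch that tallies the records in the log by method name. The function name `count_launches` and the `SAMPLE_LOG` string are my own, not part of the SDK or the profiler; the sample keeps only the method and gputime fields from the records above, and the regex assumes the `method=[ name ]` layout shown in this log.

```python
import re
from collections import Counter

# Matches the leading method=[ ... ] field of a profiler record.
# Assumption: records look like the ones in the log above.
METHOD_RE = re.compile(r"method=\[\s*(\S+)\s*\]")

R2 = "_Z23SP_c2c_mradix_r2_kernelifPvS_i11tfStride_stii"
R7 = "_Z23SP_c2c_mradix_r7_kernelifPvS_i11tfStride_stii"
MUL = "_Z27ComplexPointwiseMulAndScaleP6float2PKS_if"


def count_launches(log_text):
    """Return a Counter mapping each method name to its number of records."""
    counts = Counter()
    for line in log_text.splitlines():
        match = METHOD_RE.match(line.strip())
        if match:  # header and blank lines do not match and are skipped
            counts[match.group(1)] += 1
    return counts


# Records from the log above, in order, trimmed to method and gputime.
SAMPLE_LOG = "\n".join([
    "CUDA_PROFILE_LOG_VERSION 1.5",
    "method,gputime,cputime,occupancy",
    "method=[ memcpyHtoD ] gputime=[ 4.192 ]",
    "method=[ memcpyHtoD ] gputime=[ 4.256 ]",
    "method=[ memcpyHtoD ] gputime=[ 4.160 ]",
    f"method=[ {R2} ] gputime=[ 12.608 ]",
    "method=[ memcpyHtoD ] gputime=[ 4.064 ]",
    f"method=[ {R7} ] gputime=[ 17.952 ]",
    "method=[ memcpyHtoD ] gputime=[ 4.000 ]",
    f"method=[ {R2} ] gputime=[ 15.936 ]",
    "method=[ memcpyHtoD ] gputime=[ 4.000 ]",
    f"method=[ {R7} ] gputime=[ 18.176 ]",
    f"method=[ {MUL} ] gputime=[ 5.888 ]",
    "method=[ memcpyHtoD ] gputime=[ 4.000 ]",
    f"method=[ {R2} ] gputime=[ 15.936 ]",
    "method=[ memcpyHtoD ] gputime=[ 4.000 ]",
    f"method=[ {R7} ] gputime=[ 18.720 ]",
    "method=[ memcpyDtoH ] gputime=[ 4.416 ]",
])

if __name__ == "__main__":
    for method, n in count_launches(SAMPLE_LOG).most_common():
        print(f"{n:2d}  {method}")
```

On this log the tally is 8 memcpyHtoD, 3 launches of the `r2` kernel, 3 launches of the `r7` kernel, 1 ComplexPointwiseMulAndScale and 1 memcpyDtoH. So each of the three transforms shows up as one `r2` launch followed by one `r7` launch, and each FFT kernel launch is preceded by a memcpyHtoD.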

Can anyone tell me why?
Thanks in advance!

Has anyone else profiled simpleCUFFT and seen the same thing?