Dear all,
This is my first time using CUDA.
I get a host-to-device transfer rate of about 5.5 GB/s using pinned memory.
The computation in my application is very fast, so the transfer accounts for more than 95% of the total time.
In this situation streams cannot help much more, so I want to reduce the transfer time itself.
Here is my environment :
===Hardware===
Video Card : GTX 1080
CPU : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.6GHz
Ram : 96 GB (DDR3)
MotherBoard : X9SRA
===Software===
OS : Win7 x64
IDE : Visual Studio 2010 SP1
CUDA : 8.0
Driver : version 384.76
I get about 3.5 GB/s (pageable) and 6 GB/s (pinned) in the bandwidthTest sample code, and almost the same numbers
in CUDA-Z.
I have confirmed that the card is installed in a PCIe 3.0 x16 slot,
and it is the only video card on the motherboard.
I have also read some discussions that mention DDR4 memory,
so I ran the same test on a system with DDR4 memory, but it made no difference.
Here are my questions:
What is a reasonable data transfer rate in my case?
I was expecting more than 10 GB/s. Is that unrealistic?
I have read about a "16MB issue" in other discussions, but I do not understand it.
Could someone provide a detailed explanation?
Did I miss an important setting, for example in the BIOS?
Here is the pseudocode for my test:
int SzImg = 2000 * 2048;  // bytes per image (one BYTE per pixel)
// host: pinned (page-locked), write-combined buffer
BYTE *pHostBuffer_PageLocked = NULL;
unsigned int tag = cudaHostAllocWriteCombined;
cudaHostAlloc((void **)&pHostBuffer_PageLocked, SzImg * sizeof(BYTE), tag);
// device
BYTE *pDeviceBuffer = NULL;
cudaMalloc((void **)&pDeviceBuffer, SzImg * sizeof(BYTE));
// run: time a single synchronous host-to-device copy
::QueryPerformanceCounter(&llStart_);
cudaMemcpy(pDeviceBuffer, pHostBuffer_PageLocked, SzImg * sizeof(BYTE), cudaMemcpyHostToDevice);
::QueryPerformanceCounter(&llEnd_);
// end: costs about 0.71 ms
Edit (2017-07-27 11:58): corrected the logged time.
Any help would be appreciated!
Best Regards,
David