This is my first time using CUDA.
I get a host-to-device transfer rate of about 5.5 GB/s using pinned memory.
The computation in my work is very fast; the transfers account for more than 95% of the total time.
In this situation streams cannot help much more, so I want to reduce the transfer time itself.
Here is my environment :
Video Card : GTX 1080
CPU : Intel® Xeon® CPU E5-1620 0 @ 3.6GHz
Ram : 96 GB (DDR3)
MotherBoard : X9SRA
OS : Win7 x64
IDE : Visual Studio 2010 SP1
CUDA : 8.0
Driver : version 384.76
I got about 3.5 GB/s (pageable) and 6 GB/s (pinned) with the bandwidthTest sample code, and almost the same performance in my own test.
I have confirmed that my card is installed in a PCIe 3.0 x16 slot,
and it is the only video card on the motherboard.
I have also read some discussions that mention DDR4 memory,
so I ran the same test on a system with DDR4 memory, but it didn't make a difference.
Here are my questions:
What is a reasonable data transfer rate in my case?
I was expecting more than 10 GB/s. Is that wrong?
I have read about a "16MB issue" in some discussions, but I don't understand it.
Could someone provide a detailed explanation?
Did I miss some important setting, for example in the BIOS?
Here is the pseudocode of my test:
int SzImg = 2000 * 2048;  // image size in bytes (about 4 MB)
// Pinned (page-locked) host buffer, allocated as write-combined
BYTE *pHostBuffer_PageLocked = NULL;
unsigned int tag = cudaHostAllocWriteCombined;
cudaHostAlloc((void **)&pHostBuffer_PageLocked, SzImg * sizeof(BYTE), tag);
// Device buffer of the same size
BYTE *pDeviceBuffer = NULL;
cudaMalloc((void **)&pDeviceBuffer, SzImg * sizeof(BYTE));
// Host-to-device copy
cudaMemcpy(pDeviceBuffer, pHostBuffer_PageLocked, SzImg * sizeof(BYTE), cudaMemcpyHostToDevice);
// end: takes about 0.71 ms
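A transfer like the one above could be timed with CUDA events roughly as follows. This is a minimal sketch, not the exact test code: the CHECK macro, the buffer names, and the repeat count of 100 are assumptions added for illustration, and unsigned char stands in for BYTE.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and leave main() if a CUDA call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            return 1;                                                  \
        }                                                              \
    } while (0)

int main() {
    const size_t bytes = 2000 * 2048;  // same image size as in the test above
    const int reps = 100;              // assumed repeat count, to average out noise

    unsigned char *hostBuf = NULL;
    unsigned char *devBuf = NULL;
    CHECK(cudaHostAlloc((void **)&hostBuf, bytes, cudaHostAllocWriteCombined));
    CHECK(cudaMalloc((void **)&devBuf, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up copy so one-time initialization is not part of the measurement.
    CHECK(cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice));

    // Time 'reps' back-to-back host-to-device copies on the default stream.
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpyAsync(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice, 0));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));  // elapsed time in milliseconds

    // Effective bandwidth in GB/s (1 GB = 1e9 bytes).
    double gbPerSec = (double)bytes * reps / (ms * 1e-3) / 1e9;
    std::printf("avg copy time: %.3f ms, bandwidth: %.2f GB/s\n", ms / reps, gbPerSec);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(devBuf));
    CHECK(cudaFreeHost(hostBuf));
    return 0;
}
```

For comparison, 4,096,000 bytes in 0.71 ms works out to about 5.8 GB/s, which is in line with the pinned-memory rate reported above.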
# Edited 2017-07-27 11:58: corrected a wrong logged time
Any help would be appreciated!