I should say first that I'm a bit of a newbie when it comes to hardware, so I hope I won't sound too stupid here.
Anyway, I have a library that can run either on a GPU or on a CPU (if wanted, or if no GPU device can be found). I am working on image processing, and my library depends on GDAL for reading/writing data from/to the hard disk.
I have placed some timers in my code and the results are kind of surprising. Compiling and executing on the CPU, I can see that data processing accounts for about 80% of the execution time. Reading/writing data with GDAL looks fine.
However, when I run my program on the GPU, not only do I see a huge speed-up in the data processing, but I also notice weird behavior from GDAL. The time needed to read and write data from/to the hard disk has increased dramatically (and is not stable from one run to another), reducing my overall gain so much that it doesn't seem so interesting after all (I'm not even using asynchronous calls, so that can't be the reason; plus I've added several cudaThreadSynchronize() calls just in case ^^)!
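For what it's worth, here is roughly how I time a GPU stage (a simplified sketch; my_kernel is just a placeholder for my real processing, and the synchronize call is only there so the host-side timer does not stop before the kernel has actually finished):

// Simplified timing sketch; my_kernel stands in for the real processing.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

__global__ void my_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // dummy work
}

void timed_gpu_stage(float *d_data, int n)
{
    auto t0 = std::chrono::steady_clock::now();

    my_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();   // wait for the kernel before reading the clock

    auto t1 = std::chrono::steady_clock::now();
    std::printf("GPU stage: %.2f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count());
}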
I have tried it on another computer that I was able to use for a couple of hours and did not see anything like this happen. That is why I presume whatever is causing this strange behavior is not in the code itself.
Has anyone ever experienced something similar?
Do you have any idea what could be causing the problem? (I would have tried to compare the hardware of the two computers, but I did not have time to do so.) If so, please remember that I am a newbie when it comes to hardware, and that includes all the Linux commands that could give me the needed information ^^
I could imagine this being an artifact of readahead done by the kernel. If time is spent processing data while you read sequentially through a file, the kernel will put the idle time of the hard disk to good use by reading the data you are likely to request soon ahead of time. So by the time you ask for the data, it will already be in memory, or at least on its way there.
The same applies to writing, if you write enough data so that it cannot be buffered entirely in RAM, or if you fsync() the data to disk before stopping the timer.
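As a quick illustration (just a sketch with a plain POSIX file descriptor, not whatever GDAL does internally), this is the difference between timing the write() alone and timing it together with the flush to disk:

// Without the fsync(), the timer mostly measures a copy into the kernel's
// page cache rather than the actual disk write.
#include <fcntl.h>
#include <unistd.h>
#include <ctime>

double time_write(int fd, const void *buf, size_t nbytes)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    (void)write(fd, buf, nbytes);   // usually returns almost immediately
    fsync(fd);                      // force the data out to the disk

    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}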
I am not sure I follow you… I am pretty sure everything is done sequentially. It basically comes down to reading a large block of data from the hard disk with GDAL, sending it to the GPU, processing it, sending it back, and then writing it back to the hard disk (an fprintf() done by GDAL)… then doing it all over again until the whole image has been processed.
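To be concrete, the main loop looks roughly like this (a simplified sketch, assuming a single-band GDT_Float32 image processed one strip of rows at a time and the usual GDALRasterIO calls, which is not necessarily exactly what my wrapper does; gpu_process() stands in for my real kernels, and my timers wrap the read, processing, and write stages separately, omitted here):

#include <gdal.h>
#include <cuda_runtime.h>
#include <vector>

extern void gpu_process(float *d_buf, int n);   // placeholder for the real kernels

void run(const char *in_path, const char *out_path, int rows_per_strip)
{
    GDALAllRegister();
    GDALDatasetH in  = GDALOpen(in_path, GA_ReadOnly);
    int xsize = GDALGetRasterXSize(in), ysize = GDALGetRasterYSize(in);

    GDALDriverH drv  = GDALGetDriverByName("GTiff");
    GDALDatasetH out = GDALCreate(drv, out_path, xsize, ysize, 1, GDT_Float32, nullptr);

    GDALRasterBandH in_band  = GDALGetRasterBand(in, 1);
    GDALRasterBandH out_band = GDALGetRasterBand(out, 1);

    std::vector<float> host(static_cast<size_t>(xsize) * rows_per_strip);
    float *dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(float));

    for (int y = 0; y < ysize; y += rows_per_strip) {
        int rows = (y + rows_per_strip <= ysize) ? rows_per_strip : (ysize - y);
        int n    = xsize * rows;

        // read one strip from disk (timed as "read" in my measurements)
        GDALRasterIO(in_band, GF_Read, 0, y, xsize, rows,
                     host.data(), xsize, rows, GDT_Float32, 0, 0);

        // host -> device, process, device -> host
        cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        gpu_process(dev, n);
        cudaDeviceSynchronize();
        cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);

        // write the strip back to disk (timed as "write")
        GDALRasterIO(out_band, GF_Write, 0, y, xsize, rows,
                     host.data(), xsize, rows, GDT_Float32, 0, 0);
    }

    cudaFree(dev);
    GDALClose(out);
    GDALClose(in);
}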
Actually, I am afraid my problem does not belong here, because I launched it again running on the CPU and saw the same weird behavior. From one run to another, the time spent reading or writing can be multiplied or divided by a factor of 7, even though my process is the only one running. These times are really not stable and are far too long sometimes. I'm thinking the hard disk might be too fragmented, so it might be hard to find enough contiguous space to write the data, which could explain why writing to the hard disk takes so long. But as for the reading part, I can't see why it sometimes takes so long.
Anyway, I should have noticed this before, but the CPU run is very, very long! I did not have enough results to see that the behavior is just as weird when running on the CPU only. That's why I think I obviously shouldn't have posted this on an NVIDIA forum, I'm sorry for that!
However, I'm still hoping to find an explanation! No time is spent processing data while reading from the hard disk. As for the RAM: there is no swapping, and having done the maths, the process cannot run out of RAM. If you're suggesting I could use asynchronous calls to hide this part of the computation, then yes, I presume it could be done, but it would not explain why the reading/writing time is so unstable (a huge difference) from one run to another, which makes it incredibly long sometimes.
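One check I still want to try, to see whether the swing is simply about whether the file already sits in the kernel's page cache (an assumption on my part, and Linux-specific), is to time a plain read of the input file once after dropping it from the cache and once without:

// Sketch (Linux-only assumption): stream through the whole file once with the
// page cache dropped first, and once right after, and compare the two times.
#include <fcntl.h>
#include <unistd.h>
#include <ctime>
#include <vector>

double time_full_read(const char *path, bool drop_cache)
{
    int fd = open(path, O_RDONLY);
    if (drop_cache)
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);   // evict cached pages

    std::vector<char> buf(1 << 20);                     // 1 MiB chunks
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (read(fd, buf.data(), buf.size()) > 0)
        ;                                               // just stream through
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

If the cold run is consistently slow and the warm run is consistently fast, the variation I'm seeing is probably the cache state rather than fragmentation.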