I am disappointed that no one on this forum knows PTX.
I have posted some questions and no one answered…
For example:
Is there really no one on the whole forum who understands the CUDA driver API well?

Your conclusion is incorrect.

To increase the likelihood of your question being answered, try to make it clearer what you are asking. E.g., when you write of a 10× difference in speed, please say which setups you are comparing.

Also keep in mind that no one here has an obligation to answer. So if there is a question that doesn’t seem to make much sense on a quick inspection, it is not guaranteed that someone will dig deeper.

I’m not saying that anybody has an obligation to answer me.
I’m just saying that nobody here knows PTX, because nobody helps.

Oh, people here do know PTX. I’ve been helped and inline PTX has been posted as well. If no one helped you it may be because they simply weren’t online or weren’t interested in helping you.

Now, to answer your real question, random memory accesses in CUDA are incredibly slow regardless of anything else. If you know PTX, I’m assuming you can figure out what memory coalescing is, how non-coalesced writes can be slow, and what load efficiency is.
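To make the coalescing point concrete, here is a rough host-side sketch in plain C++ (no GPU needed) that counts how many memory segments one warp touches for a given access pattern. The 32-thread warp, the 128-byte segment size, and the function names are assumptions I picked for illustration; real transaction granularity depends on the GPU architecture and cache configuration, so treat it as a simplified model and not as how the hardware actually schedules loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model parameters, not queried from any real device.
constexpr std::size_t kWarpSize = 32;       // threads per warp
constexpr std::size_t kSegmentBytes = 128;  // assumed memory transaction size

// Count the distinct segments touched when each thread of one warp
// accesses one aligned element at the given element index.
std::size_t segments_touched(const std::vector<std::uint64_t>& element_index,
                             std::size_t element_bytes = 4,
                             std::size_t segment_bytes = kSegmentBytes) {
    assert(element_bytes > 0 && segment_bytes > 0);
    std::set<std::uint64_t> segments;
    for (std::uint64_t idx : element_index) {
        // Byte address of the element, mapped to its segment number.
        segments.insert(idx * element_bytes / segment_bytes);
    }
    return segments.size();
}

// Thread t accesses element base + t: consecutive addresses.
std::vector<std::uint64_t> coalesced_pattern(std::uint64_t base) {
    std::vector<std::uint64_t> v(kWarpSize);
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        v[t] = base + t;
    }
    return v;
}

// Thread t accesses element base + t * stride: a deterministic
// stand-in for scattered (random-like) accesses.
std::vector<std::uint64_t> strided_pattern(std::uint64_t base,
                                           std::uint64_t stride) {
    std::vector<std::uint64_t> v(kWarpSize);
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        v[t] = base + t * stride;
    }
    return v;
}
```

Under these assumptions, `segments_touched(coalesced_pattern(0))` is 1, while `segments_touched(strided_pattern(0, 32))` is 32: the warp moves 32 times as many segments to get the same 128 bytes of useful data. That ratio is roughly what "load efficiency" measures.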


But to me this whole thread is a symptom of a larger problem: there aren’t enough people posting here! We need GPGPU to become more popular! More people need to be using CUDA. OpenCL is garbo-poopoo. CUDA is statically-compiled C++. It’s beautiful, it’s awesome. It needs to be used more. There are 8 GB GPUs now. I’ve had to switch to using the long long int type in my code solely because GPUs actually have enough memory now.

This code WORKS perfectly on Windows 7 and Linux! The problem is on Windows 10 only.