An Easy Introduction to CUDA C and C++

That explains everything... Thank you very much!

I am a complete beginner in CUDA and have some knowledge of C++. How can I get started with CUDA?

I just started CUDA programming today. After the basics, I was playing around with this program (and a similar one I wrote). I ran the program on larger inputs. It worked perfectly well up to 2^23 elements, but with 2^24 it got all the answers wrong (I changed it to count the number of wrong answers). I noticed that (2^24+255)/256 goes out of range for an unsigned 16-bit integer. I changed the number of threads to 512 and it gave correct answers. I tried 2^25 and 1024, still correct. But 2^26 and 2048 gave all wrong answers. I tried 4096, still all wrong. So are there some kind of limits on the number of blocks and threads? Or does it have something to do with memory? I am running on an 8 GB GTX 1070.

Highly recommend you jump to this new introductory post I just published last week: https://devblogs.nvidia.com...

Yes, there are limits: as you guessed, the maximum grid size is 65535 in each dimension. You can query these limits using the device properties API. If you write your kernels using grid-stride loops as in the example above, you can overcome this issue by changing only the launch so that it clamps the grid size to at most 65535, e.g. min(65535, (N+255)/256).

Maximum block size is 1024 threads, I believe.
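
To make the arithmetic in this exchange concrete, here is a minimal host-side sketch in plain C++ (no GPU needed) that checks the launch configurations from the question against the two limits quoted above. The constants and the helper names (blocksNeeded, launchFits, clampedBlocks) are assumptions for illustration, not part of the CUDA API. The 65535 and 1024 figures are taken from the replies; devices of compute capability 3.0 and later report a much larger limit for the x dimension, so in real code these values should come from the device properties rather than being hard-coded.

```cpp
#include <cassert>
#include <cstdint>

// Assumed limits, copied from the replies above. Real code should query
// them from the device instead of hard-coding them.
const std::int64_t kMaxGridDimX = 65535;
const std::int64_t kMaxThreadsPerBlock = 1024;

// Blocks needed to give each of n elements its own thread (rounded up).
std::int64_t blocksNeeded(std::int64_t n, std::int64_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// True if a one-thread-per-element launch stays within both assumed limits.
bool launchFits(std::int64_t n, std::int64_t threadsPerBlock) {
    return threadsPerBlock <= kMaxThreadsPerBlock &&
           blocksNeeded(n, threadsPerBlock) <= kMaxGridDimX;
}

// Grid size clamped to the assumed maximum, for use with a kernel that
// walks the data in a grid-stride loop: min(65535, (N+255)/256) for 256 threads.
std::int64_t clampedBlocks(std::int64_t n, std::int64_t threadsPerBlock) {
    const std::int64_t blocks = blocksNeeded(n, threadsPerBlock);
    return blocks < kMaxGridDimX ? blocks : kMaxGridDimX;
}

// Worked values for the configurations in the question:
//   N = 2^23,  256 threads -> 32768 blocks, fits
//   N = 2^24,  256 threads -> 65536 blocks, one more than 65535
//   N = 2^24,  512 threads -> 32768 blocks, fits
//   N = 2^25, 1024 threads -> 32768 blocks, fits
//   N = 2^26, 2048 threads -> block size itself is over 1024
```

With the clamp, clampedBlocks(1 << 24, 256) gives 65535 blocks, and a grid-stride kernel then processes the remaining elements on later iterations of its loop. The 2048 and 4096 thread attempts fall foul of the block size limit, not the grid size limit.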

Thanks, Mark. I will surely check the new post. Block size is indeed 1024 threads for modern GPUs, I checked.

Hello! Nice tutorial. I want to build a web page that uses CUDA to speed up calculations and maybe simulations. How should I go about it? JavaScript + HTML + C + CUDA? I have no idea whether it is even possible. Please give me some references.

You need to keep in mind that CUDA is meant to run on a GPU, which a server is unlikely to have.
But if your server has a GPU, you could make a basic web page with an input form, collect the submitted data, and then run your CUDA program on the server using PHP's exec() function.

As far as I understand your question, this won't be possible as you describe it. That follows from how websites work.

JavaScript/HTML gets rendered on the client side (on the computer that opens the website). Since you can't know the client's architecture or whether she even uses an NVIDIA GPU, it is impossible to supply the correct binaries for each and every possible system.

Server-side, however, is easy (speaking of PHP/ASPX or web services/API calls). Just develop your CUDA code in C/C++, provide a web API that your JavaScript can call (for example https://github.com/Microsof... ), and run it on your server.

Any computer can be a server, mate.