Maybe it is time for someone to write a book about CUDA in mathematics and physics.

I have been teaching physics and mathematics in CS and/or engineering for the last 15+ years.

At the start, I found learning CUDA confusing. A lot of material has accumulated over the years, and Nvidia has changed its mind (in a good way) along the way, moving from pure C CUDA to Python CUDA. And, as is well known, Python already has a very big community. Moreover, at the start, WSL itself was confusing: I had to register for the Windows Insider program and deal with the old driver problems. But then I found that learning CUDA is very chaotic, with different approaches, each one claiming to be the best. Doing it in pure C is a step backward: no garbage collection, no dynamic arrays, and all the old problems that were “solved” by Python.

Now, I understand that Nvidia is investing a lot of money in AI. Not bad: tensors and linear algebra are extremely useful everywhere.

I found that nvc++ is a big winner for WSL. To be honest, I would like to see Cython. How can I mix nvc++ and Cython together in an effective way? How can I pass nvc++ decorators to accelerate Cython?

Numba is doing well at writing CUDA kernels in Python, which is very good. Writing big CUDA kernels in C is not effective, and debugging is problematic.

It was impressive to see how easy it is to work with big arrays (bigger than the RAM) with Dask. I am expecting more from cuDF. It would be nice to have a heterogeneous array defined over the CPU and the GPU, accelerated separately on each and then fused together using sum/reduce.
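To make the heterogeneous-array idea concrete, here is a minimal sketch of what I mean. It is not an existing Dask or cuDF feature; `split_sum` and `gpu_fraction` are names I made up for illustration. It assumes NumPy is installed, uses CuPy only if a working GPU is found, and otherwise falls back to NumPy for both parts.

```python
import numpy as np

# Use CuPy for the "GPU" part only if it imports and a device is visible;
# otherwise fall back to NumPy so the sketch still runs on a CPU-only machine.
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
    gpu = cp
except Exception:
    gpu = np


def split_sum(values, gpu_fraction=0.5):
    """Sum a 1-D NumPy array: one slice reduced on the CPU, the rest on the GPU.

    `gpu_fraction` (hypothetical tuning knob) is the share of elements
    sent to the GPU backend.
    """
    cut = int(len(values) * (1.0 - gpu_fraction))
    cpu_part = values[:cut]                # stays in host memory
    gpu_part = gpu.asarray(values[cut:])   # copied to the device if CuPy is active
    cpu_partial = float(cpu_part.sum())    # partial reduce on the CPU
    gpu_partial = float(gpu_part.sum())    # partial reduce on the GPU (or NumPy fallback)
    return cpu_partial + gpu_partial       # fuse the two partial results


data = np.arange(1_000_000, dtype=np.float64)
total = split_sum(data, gpu_fraction=0.25)
```

The two partial reductions run one after the other here; a real implementation would overlap them and pick the split from the relative speed of the two devices.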

I usually find that your mini-courses, 4 lectures with slides (PDF) and notebooks on GitHub, are the best way.

It is extremely useful for all of us to hear from Nvidia about what is better. After all, you are spending 100% of your time doing CUDA, so you have solved more problems.

But scientific applications are still lagging behind. It would be nice to see Cython or even Sage with CUDA. The parallel decorator in Sage is useful and quite effective. I have used it for some gigantic g=7 Riemann surface (RS) calculations. RS is the big sister of sin/cos, going from simply periodic to multiply periodic functions; the calculations are very extensive and extremely needed. You cannot solve the simple pendulum with sin/cos; you need doubly periodic functions, not to mention the spinning top and other 3- and higher-dimensional applications, as in solitons.
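As a small illustration of the pendulum remark: the exact period at finite amplitude involves the complete elliptic integral K(k) with k = sin(theta0/2), namely T = 4 sqrt(L/g) K(k), and K can be computed from the arithmetic-geometric mean. The sketch below uses only the Python standard library; the function names and the default g = 9.81 m/s^2 are my own choices for the example.

```python
import math


def agm(a, b, max_iter=60):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(max_iter):
        if abs(a - b) <= 1e-15 * abs(a):
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a


def complete_elliptic_k(k):
    """Complete elliptic integral of the first kind, K(k), for 0 <= k < 1."""
    if not 0.0 <= k < 1.0:
        raise ValueError("modulus k must satisfy 0 <= k < 1")
    return math.pi / (2.0 * agm(1.0, math.sqrt(1.0 - k * k)))


def pendulum_period(length, theta0, g=9.81):
    """Exact period of a simple pendulum released at rest from angle theta0 (radians)."""
    k = math.sin(theta0 / 2.0)
    return 4.0 * math.sqrt(length / g) * complete_elliptic_k(k)


def small_angle_period(length, g=9.81):
    """Period from the sin/cos (linearized) solution, valid only for small swings."""
    return 2.0 * math.pi * math.sqrt(length / g)


# Ratio of exact to small-angle period for a 90 degree release: about 1.18.
ratio_90 = pendulum_period(1.0, math.pi / 2.0) / small_angle_period(1.0)
```

Already at a 90 degree release the sin/cos answer is off by roughly 18%, and this is only the genus-1 case; the higher-genus calculations are far heavier and are where GPU acceleration would really help.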

Now, I am spending my spare time with Dask, CuPy, Numba, and cuDF.

Thank you for all your hard work, and keep going. We always need to hear your point of view and your perspective for a better future of CUDA.