Parallel machine learning algorithms in CUDA

Are there any resources that describe and explain machine learning algorithms implemented in CUDA C, intended for studying how such implementations work?

I’m aware of cuDNN, but it’s highly optimized code that wasn’t written for readability. I’m looking for a learning resource rather than a library to use as a black box.

I think what you are looking for is a good (scientific) book on machine/deep learning. Read that first. I searched Google and found this link from the Technion in Israel, which also has a recommended reading list:

