Tips for optimising my neural net kernel

I moved the thread here.