Hi,
I am writing a 3D elementwise matrix multiply-add kernel. Are there any optimization tips I should be aware of beyond the basic implementation?
if (idx < nelements) C[idx] += A[idx] * B[idx];
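For reference, here is one way a complete version of that one-liner might look. It is a minimal sketch that assumes float data stored contiguously, so the 3D volume can be indexed as a flat array of nelements values. The names mulAddKernel and mulAdd and the block size of 256 are my own placeholders, not from any library.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// C[i] += A[i] * B[i] for every element of a contiguous 3D volume.
// The operation is purely elementwise, so the 3D shape is irrelevant:
// the volume is treated as a flat array of nelements values.
__global__ void mulAddKernel(const float* __restrict__ A,
                             const float* __restrict__ B,
                             float* __restrict__ C,
                             size_t nelements)
{
    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (idx < nelements) C[idx] += A[idx] * B[idx];
}

// Hypothetical host helper: copies the host arrays to the device, launches
// the kernel, and copies C back. Returns the last error the runtime recorded.
cudaError_t mulAdd(const float* hA, const float* hB, float* hC, size_t n)
{
    if (n == 0) return cudaSuccess;  // a zero-block launch would be an error
    assert(hA && hB && hC);

    const size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;  // assumed block size
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    mulAddKernel<<<blocks, threads>>>(dA, dB, dC, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

The host helper round-trips through host memory only to keep the example self-contained; in a real pipeline A, B and C would normally stay resident on the GPU between kernels.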
Thanks,
zhmukc