3D data - any CUDA benchmark routines?

Is there any code one could learn from for benchmarking CUDA routines on 3D data, such as:

  • 3D matrix operations (add, mean, etc.)
  • masking / thresholding (is branch divergence an issue?)
  • convolution of 3D data with 2D kernel

Should one try to use textures, or code the kernels directly?

Thank you in advance for your time and help.