Zig Zag Scanning in CUDA

Hello everyone,

I am working on an implementation of 8x8 block zigzag scanning for a JPEG image compression system. Please advise whether it is feasible to write this in CUDA, and whether anyone has ideas about how to write a kernel for it.
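
To make the question concrete: the zigzag scan is a fixed permutation of the 64 coefficients in each block, so one possible approach is a lookup table in constant memory with one thread per output coefficient. The following is only a rough sketch under assumptions, not a tuned implementation. It assumes the quantized coefficients are stored as `short`, that each 8x8 block is laid out contiguously in row-major order (64 values per block), and that the names `hZigZag`, `dZigZag`, `zigzagScanKernel`, and `zigzagScan` are hypothetical ones chosen for illustration.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Standard JPEG zigzag order: entry k is the row-major index (0..63)
// of the coefficient that ends up at position k of the scanned output.
static const unsigned char hZigZag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

// Device copy of the table; constant memory suits a small read-only lookup.
__constant__ unsigned char dZigZag[64];

// One thread per output coefficient.
// Assumption: block b occupies in[b*64 .. b*64+63] in row-major order.
__global__ void zigzagScanKernel(const short* in, short* out, int numCoeffs)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numCoeffs) return;

    int base = idx & ~63;  // first coefficient of this 8x8 block
    int k    = idx & 63;   // position within the zigzag sequence
    out[idx] = in[base + dZigZag[k]];
}

// Hypothetical host wrapper: copies data to the device, runs the scan on
// numBlocks 8x8 blocks, and copies the result back to hostOut.
cudaError_t zigzagScan(const short* hostIn, short* hostOut, int numBlocks)
{
    assert(numBlocks > 0);
    const int    numCoeffs = numBlocks * 64;
    const size_t bytes     = numCoeffs * sizeof(short);

    short* dIn  = nullptr;
    short* dOut = nullptr;

    cudaError_t err = cudaMemcpyToSymbol(dZigZag, hZigZag, sizeof(hZigZag));
    if (err == cudaSuccess) err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int threads = 256;  // four 8x8 blocks per thread block
        const int grid    = (numCoeffs + threads - 1) / threads;
        zigzagScanKernel<<<grid, threads>>>(dIn, dOut, numCoeffs);
        err = cudaGetLastError();
    }

    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);   // cudaFree accepts a null pointer
    cudaFree(dOut);
    return err;
}
```

Calling `zigzagScan(in, out, n)` on `n` blocks of 64 coefficients should fill `out` with the same blocks in zigzag order. Since the scan only moves data, a standalone kernel like this would likely be bound by memory transfers, so it may make more sense to fold the table lookup into the quantization kernel's store than to launch it separately.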

Thanks in advance