Let’s assume you are developing some big and clumsy kernel in CUDA. To bring order to the messy code, you decide to draw some diagrams. For a single-threaded algorithm, flow charts suffice. There are additional constructs for multi-threaded algorithms, but (unless I am mistaken) these work well only with a few threads. CUDA programs, however, have thousands of threads, usually all doing the same work.
So my question is: is there a well-defined, proven-useful (and not too big) standard for drawing such algorithms? What would you recommend? Links would be appreciated. :)