Can I use the MIMD (multiple instruction, multiple data) execution model in CUDA, as on a CPU?
(to handle branch divergence effectively)
I’m porting my code with OpenACC.
While working on it, I found that my code causes a lot of branch divergence, which puts threads into the 'inactive' state very often (over 80% of the total running time).
But I can’t rewrite my whole code, so I want to handle the branch divergence effectively instead.
I heard the CUDA execution model is SIMT, with warps of 32 threads. In this architecture, when threads of a warp diverge at a branch, each taken path blocks the threads on the other path until that path’s instructions complete.
So I think I have to run the GPU with a MIMD execution model, like multithreaded computing on a CPU. (And I’m trying to find out how.)
I found some documents, but they seem to contradict each other.
From the ‘CUDA Programming Guide’:
A multiprocessor is designed to execute hundreds of threads concurrently. To manage such a large amount of threads, it employs a unique architecture called SIMT (Single-Instruction, Multiple-Thread) that is described in SIMT Architecture. The instructions are pipelined to leverage instruction-level parallelism within a single thread, as well as thread-level parallelism extensively through simultaneous hardware multithreading as detailed in Hardware Multithreading. Unlike CPU cores they are issued in order however and there is no branch prediction and no speculative execution.
But from ‘GPU Gems 2’ (a book):
The latest GPUs, such as the NVIDIA GeForce 6 Series, have similar branch instructions, though their performance characteristics are slightly different. Older GPUs do not have native branching of this form, so other strategies are necessary to emulate these operations.
The two most common control mechanisms in parallel architectures are single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD). All processors in a SIMD-parallel architecture execute the same instruction at the same time; in a MIMD-parallel architecture, different processors may simultaneously execute different instructions. There are three current methods used by GPUs to implement branching: MIMD branching, SIMD branching, and condition codes.
MIMD branching is the ideal case, in which different processors can take different data-dependent branches without penalty, much like a CPU. The NVIDIA GeForce 6 Series supports MIMD branching in its vertex processors.
Which one is true?
Can I use the MIMD execution model on an NVIDIA GPU with CUDA?
If so, how can I use it?
Thanks in advance for all your replies. :)
– I’M WORKING ON –
Ubuntu 14.04 LTS
NVIDIA GeForce GTX 960
compiler : PGI 15.7 (for OpenACC)