CUDA application for optimizing Bitslice DES

How can I use CUDA (GPU) support to parallelize a bitslice implementation of the DES encryption algorithm? What would be a possible logic for implementing it with CUDA? Please help me.

Use the search function of the forum. There are already threads about this topic, like this one:

The GitHub account mentioned there has moved here; it contains a bitslice DES implementation in CUDA.
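
As for the possible logic: the usual idea behind bitslicing is that each 32-bit word holds the same bit position of 32 independent DES blocks, so one thread processes 32 blocks at once, bit permutations reduce to choosing which word to read, and key mixing is one XOR per bit position. Below is a minimal sketch of only the first step of the round function (expansion E followed by the round-key XOR), checked against a scalar reference on the host. It is not taken from the repository mentioned above. The names `e_src`, `expand_xor_bitslice`, and `run_demo`, the slice-major memory layout, and the convention that position 0 stands for DES bit 1 are all assumptions made for this illustration. The S-boxes, which a real bitslice DES would implement as gate networks of AND/OR/XOR/NOT, are left out.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// DES expansion E written as a formula: 0-based source position for output
// position i. Position 0 stands for DES bit 1 (an indexing choice of this sketch).
__host__ __device__ constexpr int e_src(int i) {
    return (4 * (i / 6) + (i % 6) + 31) % 32;
}

// Scalar reference for one block: E(r) XOR k, with position p stored in bit p.
static uint64_t expand_xor_scalar(uint32_t r, uint64_t k) {
    uint64_t out = 0;
    for (int i = 0; i < 48; ++i)
        out |= static_cast<uint64_t>((r >> e_src(i)) & 1u) << i;
    return out ^ k;
}

// Each thread owns one group of 32 blocks. Slice p of group g is stored at
// index p * ngroups + g, so neighbouring threads read neighbouring words.
__global__ void expand_xor_bitslice(const uint32_t* r, const uint32_t* k,
                                    uint32_t* out, int ngroups) {
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g >= ngroups) return;
    uint32_t rs[32];
#pragma unroll
    for (int p = 0; p < 32; ++p) rs[p] = r[p * ngroups + g];
#pragma unroll
    for (int i = 0; i < 48; ++i)  // the permutation is only an index choice
        out[i * ngroups + g] = rs[e_src(i)] ^ k[i * ngroups + g];
}

// Hypothetical test driver: returns the number of mismatching bits,
// or -1 if a CUDA call fails.
int run_demo() {
    const int ngroups = 64, n = ngroups * 32;
    std::vector<uint32_t> r(n);
    std::vector<uint64_t> key(n);
    uint64_t s = 12345;  // simple LCG, only to get varied test data
    auto next = [&s]() {
        s = s * 6364136223846793005ULL + 1442695040888963407ULL;
        return s >> 16;
    };
    for (int b = 0; b < n; ++b) {
        r[b] = static_cast<uint32_t>(next());
        key[b] = next() & ((1ULL << 48) - 1);
    }

    // Transpose: bit `lane` of slice word p holds position p of block `lane`.
    std::vector<uint32_t> rsl(32 * ngroups, 0), ksl(48 * ngroups, 0), osl(48 * ngroups, 0);
    for (int b = 0; b < n; ++b) {
        int g = b / 32, lane = b % 32;
        for (int p = 0; p < 32; ++p)
            rsl[p * ngroups + g] |= ((r[b] >> p) & 1u) << lane;
        for (int i = 0; i < 48; ++i)
            ksl[i * ngroups + g] |= static_cast<uint32_t>((key[b] >> i) & 1u) << lane;
    }

    uint32_t *dr = nullptr, *dk = nullptr, *dout = nullptr;
    size_t rbytes = rsl.size() * sizeof(uint32_t);
    size_t kbytes = ksl.size() * sizeof(uint32_t);
    if (cudaMalloc(&dr, rbytes) != cudaSuccess || cudaMalloc(&dk, kbytes) != cudaSuccess ||
        cudaMalloc(&dout, kbytes) != cudaSuccess) return -1;
    cudaMemcpy(dr, rsl.data(), rbytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dk, ksl.data(), kbytes, cudaMemcpyHostToDevice);
    expand_xor_bitslice<<<(ngroups + 63) / 64, 64>>>(dr, dk, dout, ngroups);
    // cudaMemcpy waits for the kernel and reports any error it raised.
    cudaError_t err = cudaMemcpy(osl.data(), dout, kbytes, cudaMemcpyDeviceToHost);
    cudaFree(dr); cudaFree(dk); cudaFree(dout);
    if (err != cudaSuccess) return -1;

    // Compare every lane of every output slice with the scalar reference.
    int mismatches = 0;
    for (int b = 0; b < n; ++b) {
        int g = b / 32, lane = b % 32;
        uint64_t want = expand_xor_scalar(r[b], key[b]);
        for (int i = 0; i < 48; ++i)
            mismatches += ((osl[i * ngroups + g] >> lane) & 1u) != ((want >> i) & 1u);
    }
    return mismatches;
}

int main() {
    int bad = run_demo();
    std::printf("mismatching bits: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

This should build with something like `nvcc -o bitslice_demo bitslice_demo.cu`. The slice-major layout is chosen so that loads from adjacent threads fall on adjacent addresses. A full implementation would keep all 64 state slices in registers across the 16 rounds and would probably spend most of its effort on the S-box gate networks and on register pressure, which this sketch does not address.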