CUDA application for optimizing Bitslice DES

How can CUDA (GPU) support be used to parallelize a bitslice implementation of the DES encryption algorithm? What would the logic of such a CUDA implementation look like? Please help me.

Use the forum's search function. There are already threads on this topic, such as this one:

https://devtalk.nvidia.com/default/topic/860120/cuda-programming-and-performance/bitslice-des-optimization/1

The GitHub account mentioned there, https://github.com/DeepLearningJohnDoe, has moved here:
https://github.com/meriken/merikens-tripcode-engine-v3. It contains a bitslice DES implementation in CUDA.
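
As for the general logic: in a bitslice implementation, DES's bit permutations become free (they are just a renaming of variables at compile time), the S-boxes become branch-free networks of AND/OR/XOR/NOT gates, and each 32-bit word carries one bit position from 32 independent encryptions. On a GPU, each thread then processes its own 32 instances in lockstep, and the grid scales that to millions. Below is a minimal sketch of that data layout, assuming a hypothetical kernel name and layout of my own choosing; sbox_stub is a made-up placeholder, NOT a real DES S-box (real implementations, like the one in the repository above, use optimized gate networks of roughly 40-60 operations per S-box).

#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

#define SLICES_PER_WORD 32  // one uint32_t carries one bit from 32 DES instances

// Toy stand-in for a bitsliced S-box: a pure boolean function mapping
// 6 input words to 4 output words. Placeholder logic only; it does not
// compute any actual DES S-box.
__device__ void sbox_stub(uint32_t a1, uint32_t a2, uint32_t a3,
                          uint32_t a4, uint32_t a5, uint32_t a6,
                          uint32_t &o1, uint32_t &o2,
                          uint32_t &o3, uint32_t &o4)
{
    // Every &, |, ^, ~ below acts on all 32 packed instances at once.
    o1 = (a1 & a2) ^ (a3 | ~a6);
    o2 = (a4 ^ a5) & (a1 | a6);
    o3 = (a2 | a5) ^ (a3 & a4);
    o4 = ~(a1 ^ a5) & (a2 | a3);
}

// One thread evaluates one S-box application for its own 32 instances.
// `in` holds 6 words per thread (R-half bits already XORed with round-key
// bits, in bitslice layout); `out` receives 4 words per thread.
__global__ void sbox_layer(const uint32_t *in, uint32_t *out, int nthreads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nthreads) return;

    const uint32_t *a = in  + (size_t)tid * 6;
    uint32_t       *o = out + (size_t)tid * 4;
    sbox_stub(a[0], a[1], a[2], a[3], a[4], a[5],
              o[0], o[1], o[2], o[3]);
}

int main()
{
    const int nthreads = 1 << 16;
    const size_t nin  = (size_t)nthreads * 6;
    const size_t nout = (size_t)nthreads * 4;

    uint32_t *d_in, *d_out;
    cudaMalloc(&d_in,  nin  * sizeof(uint32_t));
    cudaMalloc(&d_out, nout * sizeof(uint32_t));
    cudaMemset(d_in, 0xA5, nin * sizeof(uint32_t));  // arbitrary test pattern

    sbox_layer<<<(nthreads + 255) / 256, 256>>>(d_in, d_out, nthreads);
    cudaDeviceSynchronize();

    // Each launched thread just handled one S-box for 32 instances,
    // i.e. about 2M bitsliced instances in this toy configuration.
    printf("processed %d x %d = %d bitsliced instances\n",
           nthreads, SLICES_PER_WORD, nthreads * SLICES_PER_WORD);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Compiles with plain nvcc (e.g. nvcc -O3 demo.cu). A full implementation chains eight such S-box evaluations per round for 16 rounds; the usual performance concern is keeping the 56 key words and 64 data words resident in registers, since register pressure (not arithmetic) tends to limit occupancy for bitslice DES kernels.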