CUDA encrypted filesystem

I would like to use the parallelism of GPU computing to transparently encrypt filesystem data on the fly.
The architecture should be like the one shown in the attachment.
The data should obviously be encrypted with symmetric encryption (e.g. AES).

Does anyone know of similar projects? I wasn’t able to find anything similar.
Or would an approach like this waste too much time copying data to and from the device, so that using the GPU brings no advantage?

Any suggestion will be appreciated. ;)
CUEFS.pdf (77.6 KB)

If you are going to try it, the Linux FUSE API is probably the way to do it (there is also an OS X port). At least that way you can keep the implementation in user space.

Using FUSE, I probably can’t share buffers between user and kernel space, so I would pay for an extra memory-to-memory copy (to move the data read from disk from kernel space to user space).

This could have a heavy impact on performance.

That may be true, but I don’t think anyone has ever executed CUDA code from kernel space. NVIDIA certainly does not support it.

Not sure I like this idea. If your CUDA context gets corrupted (e.g. due to driver issues), the user would immediately lose access to the encrypted partition. A fallback to the CPU would be a must-have, IMHO ;)

It won’t pay off, for at least two reasons:

First, the problem with disk encryption is not throughput but latency, and I don’t believe GPUs can provide better latency. OTOH, current CPUs are powerful enough to encrypt/decrypt data much faster than current HDDs can read/write it.

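Second, not all encryption modes can be run efficiently in parallel. For example, encryption in CBC mode cannot be parallelized, because each plaintext block is XORed with the previous ciphertext block before being encrypted (decryption can be parallelized, since all ciphertext blocks are already available).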
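To make the CBC point concrete, here is a minimal Python sketch. It uses a toy, insecure 64-bit "block cipher" (toy_encrypt/toy_decrypt are made up purely for illustration) in place of AES. Encrypting block i needs ciphertext block i-1, so the encryption loop is inherently sequential; decrypting block i needs only C[i] and C[i-1], which are already on disk, so every block can be handed to a separate worker (a thread pool stands in for GPU threads here).

```python
from concurrent.futures import ThreadPoolExecutor

MASK = (1 << 64) - 1  # the toy cipher treats 64-bit integers as "blocks"


def rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK


def rotr(x, r):
    return ((x >> r) | (x << (64 - r))) & MASK


def toy_encrypt(block, key):
    # Hypothetical stand-in for AES: invertible but NOT secure.
    return (rotl(block ^ key, 13) + key) & MASK


def toy_decrypt(block, key):
    # Exact inverse of toy_encrypt.
    return rotr((block - key) & MASK, 13) ^ key


def cbc_encrypt(plain, key, iv):
    # C[i] = E(P[i] XOR C[i-1]): each step must wait for the previous one.
    out, prev = [], iv
    for p in plain:
        prev = toy_encrypt(p ^ prev, key)
        out.append(prev)
    return out


def cbc_decrypt_block(cipher, i, key, iv):
    # P[i] = D(C[i]) XOR C[i-1]: depends only on ciphertext that already exists.
    prev = iv if i == 0 else cipher[i - 1]
    return toy_decrypt(cipher[i], key) ^ prev


def cbc_decrypt_parallel(cipher, key, iv, workers=4):
    # Blocks are independent, so they can be decrypted concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: cbc_decrypt_block(cipher, i, key, iv),
                             range(len(cipher))))
```

Python threads won’t actually speed this pure-Python loop up (the GIL serializes it); the sketch only shows the data dependencies that decide what a GPU could run in parallel.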

I think you only run into latency problems when working with small files. What happens if you work with huge files?

Are you sure we are I/O-limited? Try copying a 100 MB file and then encrypting the same file with, for example, AES-256; you’ll see very different times.

Moreover, with a more powerful device we could apply several encryption algorithms to the same data.

There are also block-cipher modes (e.g. CTR) that allow both encryption and decryption to run in parallel.
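CTR mode is one such method: each keystream block is the cipher applied to (nonce, counter), so any block can be encrypted or decrypted on its own, in any order, which is also what a filesystem needs for random access. Below is a minimal Python sketch, again with a hypothetical toy 64-bit cipher standing in for AES (not secure; the counter is assumed to fit in 32 bits). Real disk-encryption tools more commonly use XTS mode, which is also parallelizable per block; CTR is used here only because it is the simplest to show.

```python
MASK = (1 << 64) - 1


def rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK


def toy_encrypt(block, key):
    # Hypothetical stand-in for AES on 64-bit blocks; NOT secure.
    return (rotl(block ^ key, 13) + key) & MASK


def keystream_block(key, nonce, counter):
    # Depends only on (nonce, counter), never on other data blocks.
    return toy_encrypt(((nonce << 32) | counter) & MASK, key)


def ctr_block(data_block, key, nonce, index):
    # Encryption and decryption are the same XOR with the keystream.
    return data_block ^ keystream_block(key, nonce, index)


def ctr_crypt(blocks, key, nonce, order=None):
    # Blocks may be processed in any order (or all at once on a GPU).
    out = [0] * len(blocks)
    for i in (range(len(blocks)) if order is None else order):
        out[i] = ctr_block(blocks[i], key, nonce, i)
    return out
```

For example, ctr_block(cipher[17], key, nonce, 17) decrypts block 17 without touching blocks 0 to 16.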

Any chance of deadlock? The display waiting for the FS and the FS waiting for the display… Although it sounds remote, a similar mutual dependence might get in the way, so just be aware. Good luck.

Latency is introduced not at the file level but at the I/O-operation level. When you request to read (and decrypt) some data, the first block (usually 512 bytes) is read and decrypted. Decryption can be overlapped with reading the next block, but the time to serve the first block to the client is increased by the time needed to decrypt it. If you read a sufficiently large chunk of data, this is not an issue. In reality, though, such continuous reading is rare: most files are small or fragmented, meaning you need more than one read to get their content, and you face the same latency penalty on each read.
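A rough way to see this per-request penalty is a toy timeline model, sketched in Python below. The single-reader/single-decryptor pipeline and the time values are assumptions for illustration, not measurements: blocks are read back to back, and each block is decrypted as soon as it has been read and the previous decryption has finished.

```python
def read_timeline(n_blocks, t_read, t_decrypt):
    """Return (time_to_first_block, total_time) for one read request.

    Toy model: one reader issues block reads back to back, and one decryptor
    starts on a block once it has been read and the previous block is done.
    """
    if n_blocks <= 0:
        return 0.0, 0.0
    read_done = decrypt_done = 0.0
    first = None
    for _ in range(n_blocks):
        read_done += t_read
        decrypt_done = max(read_done, decrypt_done) + t_decrypt
        if first is None:
            first = decrypt_done
    return first, decrypt_done


def many_small_reads(n_requests, blocks_per_request, t_read, t_decrypt):
    # Each separate request pays the first-block decryption latency again.
    _, per_request = read_timeline(blocks_per_request, t_read, t_decrypt)
    return n_requests * per_request
```

With t_read = 1.0 and t_decrypt = 0.5 (arbitrary units), a single 100-block read finishes at 100.5 instead of 100.0, while 100 separate 1-block reads take 150.0: the same amount of data, but the decryption latency is paid on every request.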

Try measuring the host ↔ CUDA device bandwidth when transferring 512-byte chunks (typical for storage devices); I suspect it will be far from optimal.
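Before measuring, a simple cost model shows why small chunks hurt: if every host ↔ device transfer pays a fixed overhead before streaming at the link’s peak rate, the effective bandwidth is size / (overhead + size / peak). The Python sketch below uses a 10 µs overhead and a 5 GB/s peak, which are hypothetical round numbers, not measured values; real figures should come from an actual benchmark (e.g. the bandwidthTest program in the CUDA samples).

```python
def effective_bandwidth(chunk_bytes, overhead_s, peak_bytes_per_s):
    """Achieved bytes/s when each transfer costs a fixed overhead plus streaming time."""
    return chunk_bytes / (overhead_s + chunk_bytes / peak_bytes_per_s)


def half_peak_chunk(overhead_s, peak_bytes_per_s):
    # Chunk size at which exactly half of the peak bandwidth is reached.
    return overhead_s * peak_bytes_per_s


if __name__ == "__main__":
    OVERHEAD = 10e-6  # hypothetical per-transfer cost (10 microseconds)
    PEAK = 5e9        # hypothetical link peak (5 GB/s)
    for size in (512, 4096, 65536, 4 << 20):
        bw = effective_bandwidth(size, OVERHEAD, PEAK)
        print(f"{size:>8} B: {bw / 1e6:8.1f} MB/s ({100 * bw / PEAK:.1f}% of peak)")
```

Under these assumed numbers a 512-byte transfer reaches only about 1% of peak, and chunks of around 50 KB are needed just to reach half of it.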

One solution might be a combined CPU+GPU design, where the CPU serves small I/O requests (which are the majority) and the GPU serves only large reads/writes…
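A minimal Python sketch of such a split, which also keeps the CPU fallback suggested earlier in the thread in case the GPU path fails. cpu_crypt and gpu_crypt are hypothetical callables (for instance a CPU AES routine and a CUDA-backed one), the 64 KiB threshold is an arbitrary placeholder that would have to be tuned by measurement, and GPU failures are assumed to surface as RuntimeError.

```python
from typing import Callable, Optional

Crypt = Callable[[bytes], bytes]

# Hypothetical cut-over point; the right value depends on measured latencies.
GPU_THRESHOLD = 64 * 1024


def crypt_request(data: bytes, cpu_crypt: Crypt,
                  gpu_crypt: Optional[Crypt] = None,
                  threshold: int = GPU_THRESHOLD) -> bytes:
    """Serve small requests on the CPU and large ones on the GPU, if usable."""
    if gpu_crypt is not None and len(data) >= threshold:
        try:
            return gpu_crypt(data)
        except RuntimeError:
            # Assumed failure mode (e.g. a lost CUDA context): fall back so the
            # encrypted volume stays accessible.
            pass
    return cpu_crypt(data)
```

Routing by request size keeps the common small requests off the PCIe round trip entirely, so the GPU only sees transfers large enough to amortize its overhead.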

This only shows that the particular implementation is poor; just google for AES benchmarks and you’ll see that 50+ MB/s per core is not a problem today.

BTW, I’ve been using whole-disk encryption for years and haven’t noticed any significant performance penalties.

I’ll just remind you that AES-256 is approved for TOP SECRET data. Cascading ciphers may seem like a good idea, but it does not add practical security (it adds overhead and only protects against the case where the outer cipher is completely broken, which is very unlikely…).

Filesystem encryption using CUDA is certainly possible, but I’m just trying to tell you that it will very likely not be better (i.e. faster) than existing CPU-based solutions, due to the problems mentioned above (also note that you can’t use CUDA from kernel space, which adds another layer of overhead).

IMO CUDA is perfect for things like the initial encryption of an HDD, but it is not very good for typical day-to-day filesystem work.