Visual Cryptography with shared memory

Hi guys:

I’m wondering if you guys can give me some ideas on how to use shared memory to speed up Visual Cryptography. The current kernel code for encryption is:

  int ih = blockIdx.y * blockDim.y + threadIdx.y; // index for height of original image in device data array
  int iw = blockIdx.x * blockDim.x + threadIdx.x; // index for width of original image in device data array

  //level share1[2][2];
  //level share2[2][2];

  unsigned char share1[2][2];
  unsigned char share2[2][2];

  // Random number generation block
  float random;
  int track1, track2;
  int count0 = 0;
  int count1 = 0;
  unsigned int seed = (unsigned int)clock64();
  curandState s;
  curand_init(seed, iw, 0, &s);
  random = curand_uniform(&s);

  for (track1 = 0; track1 < 2; track1++) {
	for (track2 = 0; track2 < 2; track2++) {
		if (count1 == 2) {
			share1[track1][track2] = 0;
			count0 = count0 + 1;
		}
		else if (count0 == 2) {
			share1[track1][track2] = 1;
			count1 = count1 + 1;
		}
		else {
			if (random <= 0.5) {
				share1[track1][track2] = 0;
				count0 = count0 + 1;
			}
			else {
				share1[track1][track2] = 1;
				count1 = count1 + 1;
			}
		}
	    random = curand_uniform(&s);
	}
  }
  // Random number generator block end


  
  if (iCodecPath == ENCODE) {
    //Setup loop scan over entire original image array	
    //Scan loop begin
	
	if ((ih < iHeight) && (iw < iWidth)){
		if (pImage_d[ih * iWidth + iw] == BLACK) {
			for (track1 = 0; track1 < 2; track1++) {
				for (track2 = 0; track2 < 2; track2++) {
	
					if (share1[track1][track2] == 1) {
						share2[track1][track2] = 0;
					}
	
					else {
						share2[track1][track2] = 1;
					}
	
					pShare1_d[2*(ih*2 + track1)*iWidth + (iw*2 + track2)] = share1[track1][track2];
					pShare2_d[2*(ih*2 + track1)*iWidth + (iw*2 + track2)] = share2[track1][track2];
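					// no __syncthreads() here: each thread writes only its own 2x2 tile, and a barrier inside a divergent branch is undefined behavior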
				}
			}
		}
	
		else {		
			for (track1 = 0; track1 < 2; track1++) {
				for (track2 = 0; track2 < 2; track2++) {
					share2[track1][track2] = share1[track1][track2];
					pShare1_d[2*(ih*2 + track1)*iWidth + (iw*2 + track2)] = share1[track1][track2];
					pShare2_d[2*(ih*2 + track1)*iWidth + (iw*2 + track2)] = share2[track1][track2];
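					// no __syncthreads() here: each thread writes only its own 2x2 tile, and a barrier inside a divergent branch is undefined behavior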
				}
			}
		}
	}
 }

share1 and share2 are two small 2x2 encryption blocks. pShare1_d and pShare2_d are the two encrypted images produced by the encryption. So far the GPU version is about 10 times slower than the CPU version! I’m wondering whether I can make use of shared memory to get faster results. Any ideas?

Thank you guys!

Do the encryption blocks remain the same across the image - are the same encryption blocks used by all threads to encrypt the image?

What grid/block dimensions do you use?
How big is an image?

Sorry, I forgot to provide this information. Yes, all the threads and blocks use the same encryption approach. I set the block size to 16x16, and the grid size is share1.width / blockSizeX.

The image size is 2048x4096.
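For reference, the launch configuration described above could be sketched as follows. This is only an illustration: `divUp` and `gridFor` are hypothetical helper names, and it assumes the image is 2048 wide and 4096 high with one thread per original pixel. Since the kernel indexes pixels of the original image, the sketch sizes the grid from the original dimensions rather than from share1.width.

```cuda
#include <cuda_runtime.h>

// Round-up integer division, so partial tiles at the image edge still get a block.
inline unsigned int divUp(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

// Hypothetical helper: 2D grid covering an iWidth x iHeight image,
// one thread per pixel of the ORIGINAL image (not of the 2x larger shares).
inline dim3 gridFor(int iWidth, int iHeight, dim3 block)
{
    return dim3(divUp(iWidth, block.x), divUp(iHeight, block.y));
}
```

With `dim3 block(16, 16)` and `gridFor(2048, 4096, block)`, every pixel gets exactly one thread, and the bounds check in the kernel handles image sizes that are not multiples of 16.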

I think the section:
// Random number generation block

is probably the only part you can push into shared memory and share across threads. Then, if I understand the above correctly,
I do not see how the second part (section B: the act of actually using the encryption key to encrypt the image) can in any way utilize shared memory to speed it up, as its calculation is rather thread-local.
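To illustrate why the encryption step is thread-local, here is a small sketch of the output indexing used by the kernel in the question. `shareIndex` is a hypothetical helper name; the arithmetic is the same as in the posted code, assuming the share images are 2*iWidth wide.

```cuda
#include <cuda_runtime.h>

// Flat index into a share image (2*iWidth wide) for sub-pixel (r, c)
// of the original pixel (ih, iw). Equivalent to
// 2*(ih*2 + r)*iWidth + (iw*2 + c) in the posted kernel.
__host__ __device__ inline int shareIndex(int ih, int iw, int r, int c, int iWidth)
{
    return (ih * 2 + r) * (2 * iWidth) + (iw * 2 + c);
}
```

Each original pixel maps to its own 2x2 tile of the output, and each thread reads a single input pixel, so no thread reuses data loaded by another thread. That is why staging this step through shared memory would not be expected to help.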

On pushing the “// Random number generation block” into shared memory: the obstacle is that you are using multiple blocks, given your grid dimensions. Shared memory is private to each block, so this would require synchronization across blocks; otherwise the encryption key would differ from block to block.
You can either use dynamic parallelism to accomplish this, or use the host to either calculate the encryption key completely and forward it via global memory, or calculate at least its inputs, such that all blocks receive the same inputs and thus calculate the same encryption key.
A single thread within each block would then calculate the encryption key and push it into shared memory.
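A minimal sketch of the host-computed variant, in case it helps. It assumes a single 2x2 key is shared by all pixels, as suggested above, and that black pixels are stored as 0. The names `makeBalancedKey`, `encodeWithSharedKey`, `encodeOnDevice` and `pKey_d` are made up for illustration, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <random>
#include <vector>

#define BLACK 0  // assumption: black pixels are stored as 0

// Host side: random 2x2 key with exactly two 0s and two 1s.
void makeBalancedKey(unsigned int seed, unsigned char key[4])
{
    unsigned char k[4] = {0, 0, 1, 1};
    std::mt19937 rng(seed);
    std::shuffle(k, k + 4, rng);
    std::copy(k, k + 4, key);
}

__global__ void encodeWithSharedKey(const unsigned char *pImage_d,
                                    const unsigned char *pKey_d,
                                    unsigned char *pShare1_d,
                                    unsigned char *pShare2_d,
                                    int iWidth, int iHeight)
{
    __shared__ unsigned char key[4];

    // A single thread per block copies the key from global to shared memory.
    if (threadIdx.x == 0 && threadIdx.y == 0) {
        for (int i = 0; i < 4; i++) key[i] = pKey_d[i];
    }
    __syncthreads();  // reached by all threads, before any early return

    int ih = blockIdx.y * blockDim.y + threadIdx.y;
    int iw = blockIdx.x * blockDim.x + threadIdx.x;
    if (ih >= iHeight || iw >= iWidth) return;

    bool black = (pImage_d[ih * iWidth + iw] == BLACK);
    for (int r = 0; r < 2; r++) {
        for (int c = 0; c < 2; c++) {
            unsigned char s1 = key[r * 2 + c];
            int out = (ih * 2 + r) * (2 * iWidth) + (iw * 2 + c);
            pShare1_d[out] = s1;
            pShare2_d[out] = black ? (unsigned char)(1 - s1) : s1;  // complement on black
        }
    }
}

// Host wrapper: copies inputs to the device, launches the kernel, copies shares back.
void encodeOnDevice(const std::vector<unsigned char> &image, int iWidth, int iHeight,
                    const unsigned char key[4],
                    std::vector<unsigned char> &share1,
                    std::vector<unsigned char> &share2)
{
    size_t inBytes = (size_t)iWidth * iHeight;
    size_t outBytes = inBytes * 4;  // each pixel expands to a 2x2 tile
    unsigned char *pImage_d, *pKey_d, *pShare1_d, *pShare2_d;
    cudaMalloc(&pImage_d, inBytes);
    cudaMalloc(&pKey_d, 4);
    cudaMalloc(&pShare1_d, outBytes);
    cudaMalloc(&pShare2_d, outBytes);
    cudaMemcpy(pImage_d, image.data(), inBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(pKey_d, key, 4, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((iWidth + block.x - 1) / block.x, (iHeight + block.y - 1) / block.y);
    encodeWithSharedKey<<<grid, block>>>(pImage_d, pKey_d, pShare1_d, pShare2_d,
                                         iWidth, iHeight);

    share1.resize(outBytes);
    share2.resize(outBytes);
    cudaMemcpy(share1.data(), pShare1_d, outBytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(share2.data(), pShare2_d, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(pImage_d); cudaFree(pKey_d); cudaFree(pShare1_d); cudaFree(pShare2_d);
}
```

Because the key arrives as a kernel input, every block loads the same four bytes, so no synchronization between blocks is needed, and the per-thread curand_init call disappears from the kernel. The only barrier is the one after the shared-memory load.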

Thanks man, I will try it.