I have a simple question about something that confuses me. I am writing a program that takes a sequence of characters as input (the length may be 100,000) and builds an 8x8 matrix for each character. Building the matrices on the host requires 64 x 4 x 100,000 bytes = 25.6 MB of memory (64 elements of 4 bytes each per character), and all of that data has to be transferred from host to device. If I instead transfer only the character sequence to the GPU, build the matrices there, and perform the calculations on them, I need much less memory space. However, I am unsure how to store the matrices created inside the kernel, because I cannot dynamically allocate memory for a matrix there. Any suggestions?
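
To make the second approach concrete, here is a minimal sketch of what I have in mind, not a finished implementation. It assumes one thread per character and relies on the matrix size being known at compile time, so the 8x8 matrix can live in a fixed-size local array inside the kernel without any dynamic allocation. The names `fill_entry`, `per_char_kernel`, and `run_per_char` are hypothetical, and both the fill rule and the "calculation" (a trace) are placeholders for the real ones.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define DIM 8  // matrix is DIM x DIM, fixed at compile time

// Hypothetical fill rule, for illustration only: entry (r, c) = ch + r * c.
__device__ float fill_entry(unsigned char ch, int r, int c) {
    return (float)ch + (float)(r * c);
}

// One thread per character: build the 8x8 matrix in a fixed-size local
// array (no dynamic allocation), then reduce it to one value per character.
// The reduction here is the trace, a placeholder for the real calculation.
__global__ void per_char_kernel(const unsigned char *chars, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float m[DIM * DIM];
    for (int r = 0; r < DIM; ++r)
        for (int c = 0; c < DIM; ++c)
            m[r * DIM + c] = fill_entry(chars[i], r, c);

    float trace = 0.0f;
    for (int d = 0; d < DIM; ++d)
        trace += m[d * DIM + d];
    out[i] = trace;
}

// Host helper: only the n characters go host-to-device, and only n floats
// come back. Returns 0 on success, 1 on any CUDA error.
int run_per_char(const unsigned char *h_chars, float *h_out, int n) {
    unsigned char *d_chars = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_chars, n) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_chars);
        return 1;
    }
    cudaMemcpy(d_chars, h_chars, n, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    per_char_kernel<<<blocks, threads>>>(d_chars, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_chars);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    const char *text = "ACGT";  // sample input, made up for this sketch
    int n = (int)strlen(text);
    float result[4];
    if (run_per_char((const unsigned char *)text, result, n) != 0) {
        printf("CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i)
        printf("%c -> %.1f\n", text[i], result[i]);
    return 0;
}
```

If the full matrices have to persist after the kernel finishes, the alternative would be to allocate one buffer of n x 64 floats on the device with `cudaMalloc` before the launch and let each thread write its matrix at offset `i * 64`. That still uses about 25.6 MB of device memory, but nothing of that size is ever copied from the host.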