Performance of small code project

Hello, I just got my first real project working :). However, I'm not sure how efficient it is. Could someone point me towards some topics I can study to improve my code? My code is at https://github.com/stefanberg96/Magic-squares . Thanks in advance.

When I run your code I get kernel errors.

GPU memory usage: used = 303.937500, free = 11886.000000 MB, total = 12189.937500 MB, malloc limit = 488.312500 MB
magic square {1,3,8,5,7,0,6,2,4}
magic square {0,8,4,5,1,6,7,3,2}
magic square {0,7,5,8,3,1,4,2,6}
magic square {1,6,5,8,4,0,3,2,7}
magic square {0,8,4,7,3,2,5,1,6}
magic square {1,5,6,3,7,2,8,0,4}
magic square {0,5,7,8,1,3,4,6,2}
magic square {1,8,3,6,4,2,5,0,7}
magic square {2,3,7,6,1,5,4,8,0}
magic square {2,6,4,3,1,8,7,5,0}
magic square {2,6,4,7,5,0,3,1,8}
magic square {2,7,3,6,5,1,4,0,8}
magic square {3,1,8,7,5,0,2,6,4}
magic square {3,2,7,8,4,0,1,6,5}
magic square {4,2,6,0,7,5,8,3,1}
magic square {4,0,8,2,7,3,6,5,1}
magic square {4,0,8,6,5,1,2,7,3}
magic square {4,2,6,8,3,1,0,7,5}
magic square {3,8,1,2,4,6,7,0,5}
magic square {3,7,2,1,5,6,8,0,4}
magic square {4,6,2,8,1,3,0,5,7}
magic square {4,8,0,2,3,7,6,1,5}
magic square {4,6,2,0,5,7,8,1,3}
magic square {4,8,0,6,1,5,2,3,7}
magic square {5,6,1,0,4,8,7,2,3}
magic square {6,1,5,2,3,7,4,8,0}
magic square {5,7,0,1,3,8,6,2,4}
magic square {5,1,6,7,3,2,0,8,4}
magic square {6,2,4,1,3,8,5,7,0}
magic square {5,0,7,6,4,2,1,8,3}
magic square {6,5,1,2,7,3,4,0,8}
magic square {6,2,4,5,7,0,1,3,8}
magic square {7,5,0,3,1,8,2,6,4}
magic square {7,0,5,2,4,6,3,8,1}
magic square {7,2,3,0,4,8,5,6,1}
magic square {7,3,2,5,1,6,0,8,4}
magic square {8,3,1,0,7,5,4,2,6}
magic square {8,0,4,1,5,6,3,7,2}
magic square {8,0,4,3,7,2,1,5,6}
magic square {8,1,3,0,5,7,4,6,2}
magic square {1,3,8,5,7,0,6,2,4}
magic square {0,8,4,5,1,6,7,3,2}
magic square {0,7,5,8,3,1,4,2,6}
magic square {1,6,5,8,4,0,3,2,7}
magic square {0,8,4,7,3,2,5,1,6}
magic square {1,5,6,3,7,2,8,0,4}
magic square {0,5,7,8,1,3,4,6,2}
magic square {1,8,3,6,4,2,5,0,7}
magic square {2,3,7,6,1,5,4,8,0}
magic square {2,6,4,3,1,8,7,5,0}
magic square {2,6,4,7,5,0,3,1,8}
magic square {2,7,3,6,5,1,4,0,8}
magic square {3,1,8,7,5,0,2,6,4}
magic square {3,2,7,8,4,0,1,6,5}
magic square {4,2,6,0,7,5,8,3,1}
magic square {4,0,8,2,7,3,6,5,1}
magic square {4,0,8,6,5,1,2,7,3}
magic square {4,2,6,8,3,1,0,7,5}
magic square {3,8,1,2,4,6,7,0,5}
magic square {3,7,2,1,5,6,8,0,4}
magic square {4,6,2,8,1,3,0,5,7}
magic square {4,8,0,2,3,7,6,1,5}
magic square {4,6,2,0,5,7,8,1,3}
magic square {4,8,0,6,1,5,2,3,7}
magic square {5,6,1,0,4,8,7,2,3}
magic square {6,1,5,2,3,7,4,8,0}
magic square {5,7,0,1,3,8,6,2,4}
magic square {5,1,6,7,3,2,0,8,4}
magic square {6,2,4,1,3,8,5,7,0}
magic square {5,0,7,6,4,2,1,8,3}
magic square {6,5,1,2,7,3,4,0,8}
magic square {6,2,4,5,7,0,1,3,8}
magic square {7,5,0,3,1,8,2,6,4}
magic square {7,0,5,2,4,6,3,8,1}
magic square {7,2,3,0,4,8,5,6,1}
magic square {7,3,2,5,1,6,0,8,4}
magic square {8,3,1,0,7,5,4,2,6}
magic square {8,0,4,1,5,6,3,7,2}
magic square {8,0,4,3,7,2,1,5,6}
magic square {8,1,3,0,5,7,4,6,2}
cudaDeviceSynchronize returned error code 77 after launching addKernel!

If I run it with cuda-memcheck, it reports an invalid global read error at this line:

https://github.com/stefanberg96/Magic-squares/blob/master/ParkerSquare2/kernel.cu#L79
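
In general, an invalid global read means a thread dereferenced an address that lies outside any device allocation, very often because its computed index ran past the end of an array. As a rough sketch of the usual defence (this is not taken from the repository; `CUDA_CHECK`, `scaleKernel`, and the sizes are made-up names for illustration), you guard the index inside the kernel and check every runtime call on the host:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report file/line and abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: one thread per element. The guard keeps threads
// whose index is past the end of the array from touching global memory.
__global__ void scaleKernel(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2 * in[i];
    }
}

int main()
{
    const int n = 1000;        // deliberately not a multiple of the block size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // rounds up, so some threads get i >= n

    int *in = nullptr;
    int *out = nullptr;
    CUDA_CHECK(cudaMalloc(&in, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&out, n * sizeof(int)));
    CUDA_CHECK(cudaMemset(in, 0, n * sizeof(int)));

    scaleKernel<<<blocks, threads>>>(in, out, n);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(in));
    CUDA_CHECK(cudaFree(out));
    return 0;
}
```

If the read at the reported line uses an index derived from the thread or block ID, adding a guard like this and re-running under cuda-memcheck should show whether an out-of-range index was the cause.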