Hello,
I am very new to CUDA programming, although I have some experience with parallel programming, such as searching for 56-bit DES keys and similar simple things.
I have some good books and a reasonable understanding of the bare basics: host and device memory, the usual beginner programs like vector addition, cudaMalloc, cudaFree, cudaMemcpy with cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost, __syncthreads(), and cudaDeviceSynchronize().
I’m now trying to wrap my head around the kernel<<<X, Y>>>() launch syntax. To start with, I want to look up the compute capability of a GTX 1060 and use safe values. Later I want to query every card dynamically and compute on the fly what I can pass to kernel<<<X, Y>>>(), e.g.
int blockSize = 256;
int numBlocks = (N + blockSize - 1) / blockSize;
add<<<numBlocks, blockSize>>>(N, x, y);
But first I will keep it simple with kernelfunc<<<1, 256>>>() or something like that, or just <<<1, 1>>> in the early stages of testing, in case I run into race conditions from threads reading and writing the same memory locations.
Back to the main topic: for practice I like to create cryptocurrency brainwallets. The examples in my books tend to be boring and obscure to me, so I get stuck and don’t learn anything. They usually center on graphics and mathematical topics, which I don’t understand or find interesting. That’s not the books’ fault, of course; I just need topics that interest me so I can move on.
The first task for me is to find a SHA-256 kernel and a RIPEMD-160 kernel that I can test with, verifying that the hash outputs are correct. (Maybe I could translate some C code into kernels myself, but I don’t know where to find the specs of these hash functions.)
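The numBlocks arithmetic above can be checked on the host alone, without a GPU. This is a minimal sketch; blocks_for and idle_threads are made-up helper names, not part of CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest block count with blocks * blockSize >= n.
constexpr std::size_t blocks_for(std::size_t n, std::size_t blockSize) {
    return (n + blockSize - 1) / blockSize;  // integer ceiling of n / blockSize
}

// Hypothetical helper: launched threads that have no element to work on.
constexpr std::size_t idle_threads(std::size_t n, std::size_t blockSize) {
    return blocks_for(n, blockSize) * blockSize - n;
}

// Checked at compile time: 1000 elements with 256 threads per block.
static_assert(blocks_for(1000, 256) == 4, "4 blocks cover 1000 elements");
static_assert(idle_threads(1000, 256) == 24, "the last block has 24 spare threads");
```

Because the last block can contain spare threads, a kernel launched this way usually guards its work with a check such as if (i < N) before touching memory.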
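For the verification step, one option is a plain host-side reference to compare kernel output against. The sketch below follows the SHA-256 description in NIST FIPS 180-4. The name sha256_ref is made up for illustration, the code is unoptimized, and it is host-only C++ rather than a CUDA kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Round constants from FIPS 180-4.
static const uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2};

static inline uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

// Hypothetical reference function: SHA-256 of a byte string, computed on the host.
std::array<uint8_t, 32> sha256_ref(const std::string& msg) {
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as 64-bit big-endian.
    std::vector<uint8_t> m(msg.begin(), msg.end());
    const uint64_t bit_len = static_cast<uint64_t>(msg.size()) * 8;
    m.push_back(0x80);
    while (m.size() % 64 != 56) m.push_back(0x00);
    for (int i = 7; i >= 0; --i) m.push_back(static_cast<uint8_t>(bit_len >> (8 * i)));

    for (std::size_t off = 0; off < m.size(); off += 64) {
        uint32_t w[64];
        for (int i = 0; i < 16; ++i)  // load 16 big-endian words
            w[i] = (uint32_t(m[off + 4 * i]) << 24) | (uint32_t(m[off + 4 * i + 1]) << 16) |
                   (uint32_t(m[off + 4 * i + 2]) << 8) | uint32_t(m[off + 4 * i + 3]);
        for (int i = 16; i < 64; ++i) {  // message schedule
            const uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
            const uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16] + s0 + w[i - 7] + s1;
        }
        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f = h[5], g = h[6], hh = h[7];
        for (int i = 0; i < 64; ++i) {  // compression rounds
            const uint32_t t1 = hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) +
                                ((e & f) ^ (~e & g)) + K[i] + w[i];
            const uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) +
                                ((a & b) ^ (a & c) ^ (b & c));
            hh = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
    }
    std::array<uint8_t, 32> out{};
    for (int i = 0; i < 8; ++i)  // store the state as big-endian bytes
        for (int j = 0; j < 4; ++j) out[4 * i + j] = static_cast<uint8_t>(h[i] >> (24 - 8 * j));
    return out;
}
```

A device kernel could then be checked by hashing the same passphrase on both sides and comparing the 32 bytes, starting with a standard test vector such as "abc".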
I had in mind something like this, as easy to use as the OpenSSL functions (or as close as possible!):
//
// Maybe later a std::vector<std::string> passphrases
//
// unsigned char tmp_hash[32];
//
// Pass in size of vector and try to unroll that loop
// for (size_t i = 0; i < vec_size; i++)
//
// Do the SHA256 on each
// SHA256(vec[i], tmp_hash)
// memcpy(out_hash_vec[i], tmp_hash, 32);   // destination first, then source
//
// Stash them back in an array of unsigned char* with 32 bytes for the hash
//
#include <iostream>
#include <string>
#include <vector>
#include <cuda_runtime.h>
// SHA256_CTX, SHA256_Init/Update/Final and SHA256_DIGEST_LENGTH are assumed to come from a
// device-side SHA-256 implementation with OpenSSL-like signatures (OpenSSL itself is host-only).
__global__ void brainwallet_keys(const unsigned char *in_passphrase, const size_t length, unsigned char *out_hash)
{
    SHA256_CTX ctx;  // an actual context object, not an uninitialized pointer
    SHA256_Init(&ctx);
    SHA256_Update(&ctx, in_passphrase, length);
    SHA256_Final(out_hash, &ctx);
}
int main()
{
    std::cout << "Enter brainwallet passphrase> ";
    std::string s;
    std::getline(std::cin, s);
    const size_t pass_size = s.length();
    // password for host to give to device
    // (std::vector, because a variable-length array is not standard C++)
    std::vector<unsigned char> host_password(s.begin(), s.end());
    // host hash device gives to us
    unsigned char host_hash[SHA256_DIGEST_LENGTH];
    // password for device, device memory for hash output
    unsigned char *dev_password, *dev_hash;
    // alloc memory for device password, copy it from host to device memory
    cudaMalloc( (void**)&dev_password, pass_size );
    cudaMemcpy( dev_password, host_password.data(), pass_size, cudaMemcpyHostToDevice );
    // alloc memory for the output hash
    // no copy across because it's output only?
    cudaMalloc( (void**)&dev_hash, SHA256_DIGEST_LENGTH );
    brainwallet_keys<<<1, 1>>>(dev_password, pass_size, dev_hash);
    // do I need this here?
    cudaDeviceSynchronize();
    cudaMemcpy(host_hash, dev_hash, SHA256_DIGEST_LENGTH, cudaMemcpyDeviceToHost);
    // Deallocate memory, host hash safely stashed away
    cudaFree(dev_password);
    cudaFree(dev_hash);
    cudaDeviceReset();
    // Print out hash in hex from host_hash array
    ...
    return 0;
}
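The hex printing step is left out of the code above. One way it might look is sketched here in host-only C++; to_hex is a made-up helper name, and the result could be sent to std::cout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: lowercase hex string for a byte buffer such as host_hash.
std::string to_hex(const unsigned char* data, std::size_t len) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        out.push_back(digits[data[i] >> 4]);    // high nibble
        out.push_back(digits[data[i] & 0x0f]);  // low nibble
    }
    return out;
}
```

With the variables from main, a call such as to_hex(host_hash, SHA256_DIGEST_LENGTH) would give the 64-character digest string.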
Sorry this is so ugly and hackish. I’m trying to illustrate what I want to achieve before I move on to more advanced topics and finish the brainwallets: printing the uncompressed and compressed pubkey variants, their 1Btc addresses, and the WIF keys. I will get to that later. First I need the hash routines, and I’d like to know whether I have basically got this right: allocating, copying back and forth, syncing, freeing memory, and resetting the device.
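For the later batch version hinted at in the comments (a std::vector<std::string> of passphrases), one possible host-side layout is a single flat byte buffer plus an offsets array, so that two copies move everything to the device. This is only a sketch: PackedPassphrases and pack are made-up names, and the layout is one option among several.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: all passphrases back to back, plus where each one starts.
struct PackedPassphrases {
    std::vector<unsigned char> bytes;    // concatenated passphrase bytes
    std::vector<std::size_t>   offsets;  // offsets[i] = start of passphrase i; last entry = total size
};

// Flatten the passphrases so that passphrase i occupies
// bytes[offsets[i]] .. bytes[offsets[i + 1] - 1].
PackedPassphrases pack(const std::vector<std::string>& passphrases) {
    PackedPassphrases p;
    p.offsets.reserve(passphrases.size() + 1);
    for (const std::string& s : passphrases) {
        p.offsets.push_back(p.bytes.size());
        p.bytes.insert(p.bytes.end(), s.begin(), s.end());
    }
    p.offsets.push_back(p.bytes.size());  // sentinel: length of i is offsets[i + 1] - offsets[i]
    return p;
}
```

Each buffer could then go to the device with one cudaMalloc and one cudaMemcpy, and thread i would hash the slice that its two offsets describe.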
Thank you for any help!