nvCOMP: nvcompBatchedZstdDecompressAsync fails if cudaDeviceReset() is called any time after its first run

This bug is easy to reproduce. If you call nvcompBatchedZstdDecompressAsync once (successfully) and then call it again at any point after calling cudaDeviceReset(), it starts returning invalid-argument errors (error code 10). This happens even if every pointer passed to the function is allocated only after the call to cudaDeviceReset().

Presumably nvcompBatchedZstdDecompressAsync performs some kind of internal allocation that it keeps in a static variable invisible to the user, and that allocation is invalidated by the call to cudaDeviceReset().
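To make that guess concrete, here is a toy model in plain C++ of the failure mode I suspect. It makes no CUDA or nvCOMP calls, and FakeDevice, fake_decompress and run_once are made-up names for illustration only; I have no visibility into what the library actually does internally.

```cpp
#include <unordered_set>

// Hypothetical stand-in for a device context: it only tracks which
// allocation handles are currently live. Not real CUDA behavior.
struct FakeDevice {
    std::unordered_set<int> live;
    int next_handle = 1;

    int alloc() {
        int h = next_handle++;
        live.insert(h);
        return h;
    }

    // Models cudaDeviceReset(): every handle handed out so far becomes invalid.
    void reset() { live.clear(); }

    bool valid(int h) const { return live.count(h) != 0; }
};

static FakeDevice g_device;

// Models a library call that lazily allocates internal scratch space once
// and caches the handle in a function-local static.
// Returns 0 on success and 10 to mimic the "invalid value" error code.
int fake_decompress(int user_buffer) {
    static int cached_scratch = 0;  // survives any later device reset
    if (cached_scratch == 0) {
        cached_scratch = g_device.alloc();
    }
    if (!g_device.valid(user_buffer)) return 10;
    if (!g_device.valid(cached_scratch)) return 10;  // stale internal state
    return 0;
}

// Mirrors the structure of the repro: reset, allocate fresh user buffers, call.
int run_once() {
    g_device.reset();
    int buf = g_device.alloc();  // allocated only after the reset
    return fake_decompress(buf);
}
```

Calling run_once() three times in a row returns 0, then 10, then 10: the user buffer is always fresh, but the cached internal handle from the first call is dead after the second reset. If the real library works anything like this, nothing on the caller's side can fix it short of the library re-initializing its internal state.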

To reproduce, take the zstd_cpu_compression.cu example and place run_example in a loop, with a call to cudaDeviceReset() immediately before it. The first iteration succeeds; the second does not:

for (int q = 0; q < 10; q++)
{
    cudaDeviceReset();
    run_example(data, compression_level, warmup_iteration_count, total_iteration_count); // fails when q == 1
}

It is also worth noting that setting the logging environment variables outlined in the docs has no effect and produces no output.
