In my process, I set the environment variable CUDA_LAUNCH_BLOCKING to 1 before calling cuInit(), and afterwards I set CUDA_LAUNCH_BLOCKING=0.
Will asynchronous kernel launches be enabled or not? From the timeline, it still looks like asynchronous kernel launches are disabled.

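These environment variables are sampled at CUDA initialization time, and changes to them after that point have no effect. In your case the value 1 was in effect when CUDA initialized, so kernel launches stay synchronous for the lifetime of the process, which matches what you see in the timeline. The same is true of CUDA_VISIBLE_DEVICES, for example.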
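Since the value is only read at initialization, one way to switch between blocking and non-blocking launches is to start a fresh process with the variable already set. Below is a minimal sketch of that idea in Python. The helper name `run_with_launch_blocking` is hypothetical, and the child command here is only a stand-in that prints the variable; you would replace it with your own CUDA program.

```python
import os
import subprocess
import sys


def run_with_launch_blocking(cmd, blocking):
    """Run cmd in a new process with CUDA_LAUNCH_BLOCKING set beforehand.

    Hypothetical helper: the variable is placed in the child's environment
    before the child starts, so it is already present when CUDA initializes
    in that process.
    """
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env["CUDA_LAUNCH_BLOCKING"] = "1" if blocking else "0"
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child: prints the value it sees. Replace with your CUDA binary.
    child = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    for blocking in (True, False):
        result = run_with_launch_blocking(child, blocking)
        print("blocking =", blocking, "-> child sees", result.stdout.strip())
```

Changing the variable inside a process that has already initialized CUDA would not have this effect, which is why the sketch sets it in the environment of a new process instead.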

