I can reproduce the observation, but at the moment I am unable to explain it. If you would like a workaround, try this:
*signal = 1;
OneTimeKernel<<<1, 1, 0, stream>>>(-1); // ADD THIS LINE
PersistentKernel<<<1, 1, 0, stream_background>>>(signal);
int n = 100;
for (int i = 0; i < n; i++) {
    OneTimeKernel<<<1, 1, 0, stream>>>(i);
}
I understand that this workaround may not be acceptable for you, and I still cannot explain the observation. You may wish to file a bug.
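For anyone who wants to try the workaround in isolation, here is a minimal sketch that wraps the launch order above in a complete function. Everything outside the original snippet is an assumption on my part: the kernel bodies, the `g_completed` counter, the `RunWorkaround` helper, allocating `signal` as mapped pinned host memory, creating both streams with `cudaStreamCreate`, and the roughly 5 second polling limit. The host polls with a timeout instead of blocking, so it should not hang even if the one-time kernels end up waiting behind the persistent kernel.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <cuda_runtime.h>

#define CHECK(call)                                                      \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s failed: %s\n", #call,                     \
                   cudaGetErrorString(err_));                            \
      std::exit(EXIT_FAILURE);                                           \
    }                                                                    \
  } while (0)

// Hypothetical counter so the host can see how many one-time kernels ran.
__device__ int g_completed = 0;

// Assumed body: the original OneTimeKernel is not shown in this thread.
__global__ void OneTimeKernel(int i) {
  (void)i;
  atomicAdd(&g_completed, 1);
}

// Assumed body: spin until the host clears the flag.
__global__ void PersistentKernel(volatile int *signal) {
  while (*signal != 0) {
  }
}

// Runs the workaround launch order and returns the number of one-time
// kernels that had completed once all device work finished.
int RunWorkaround(int n) {
  cudaStream_t stream, stream_background;
  CHECK(cudaStreamCreate(&stream));
  CHECK(cudaStreamCreate(&stream_background));

  // Assumption: signal lives in mapped pinned host memory.
  int *signal = nullptr;
  int *signal_dev = nullptr;
  CHECK(cudaHostAlloc((void **)&signal, sizeof(int), cudaHostAllocMapped));
  CHECK(cudaHostGetDevicePointer((void **)&signal_dev, signal, 0));

  int zero = 0;
  CHECK(cudaMemcpyToSymbol(g_completed, &zero, sizeof(int)));

  *(volatile int *)signal = 1;
  OneTimeKernel<<<1, 1, 0, stream>>>(-1);  // the added launch
  PersistentKernel<<<1, 1, 0, stream_background>>>(signal_dev);
  for (int i = 0; i < n; i++) {
    OneTimeKernel<<<1, 1, 0, stream>>>(i);
  }
  CHECK(cudaGetLastError());

  // Poll (about 5 s at most) to see whether the one-time stream drains
  // while the persistent kernel is still spinning.
  bool drained = false;
  for (int tries = 0; tries < 5000 && !drained; ++tries) {
    cudaError_t q = cudaStreamQuery(stream);
    if (q == cudaSuccess) {
      drained = true;
    } else if (q != cudaErrorNotReady) {
      CHECK(q);
    } else {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  // Release the persistent kernel regardless of the outcome, then wait.
  *(volatile int *)signal = 0;
  CHECK(cudaDeviceSynchronize());

  int completed = 0;
  CHECK(cudaMemcpyFromSymbol(&completed, g_completed, sizeof(int)));
  std::printf("one-time stream drained before release: %s\n",
              drained ? "yes" : "no");

  CHECK(cudaFreeHost(signal));
  CHECK(cudaStreamDestroy(stream));
  CHECK(cudaStreamDestroy(stream_background));
  return completed;
}
```

Call `RunWorkaround(100)` from your own `main` and compile with `nvcc`. Commenting out the launch marked "the added launch" and comparing the printed "drained" line between the two builds may help when putting together a bug report, though I can't promise this sketch triggers the same behavior as your code.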