cuDNN fails with CUDNN_STATUS_EXECUTION_FAILED only on the first request

I have a service running in a Docker image built with CUDA 9.0, cuDNN 7.4.1, and MXNet 1.3.0. It was originally meant to be deployed on a 1080 GPU.

Recently we started testing it on a 2070, and something strange happens.

After the service starts, the first request fails with:

2019/04/12 17:48:43 rpc error: code = Unknown desc = [09:48:43] src/operator/nn/./cudnn/cudnn_activation-inl.h:129: Check failed: e == CUDNN_STATUS_SUCCESS (8 vs. 0) cuDNN: CUDNN_STATUS_EXECUTION_FAILED

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7fc8573afcab]
[bt] (1) /usr/local/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7fc8573b0818]
[bt] (2) /usr/local/lib/libmxnet.so(mxnet::op::CuDNNActivationOp<float>::Forward(mxnet::OpContext const&, mxnet::TBlob const&, mxnet::OpReqType const&, mxnet::TBlob const&)+0x605) [0x7fc85bfd90b5]
[bt] (3) /usr/local/lib/libmxnet.so(void mxnet::op::ActivationCompute<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x444) [0x7fc85bfd2444]
[bt] (4) /usr/local/lib/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0x59) [0x7fc859c51719]
[bt] (5) /usr/local/lib/libmxnet.so(+0x3548dd8) [0x7fc859bfcdd8]
[bt] (6) /usr/local/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8e5) [0x7fc85a2ca325]
[bt] (7) /usr/local/lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0xeb) [0x7fc85a2e0c9b]
[bt] (8) /usr/local/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x4e) [0x7fc85a2e0f0e]
[bt] (9) /usr/local/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7fc85a2c992a

However, if I repeat the same request, it succeeds.

This behavior is consistently reproducible.

Can anyone explain to me what is going on?