Check failed: cudaMalloc() *** Check failure stack trace: *** Aborted (core dumped)

OS: Ubuntu 16.04.4
OS type: 64-bit
GPU: GeForce GTX 750/PCIe/SSE2
NVIDIA driver version: 384.130
CUDA version: 8.0
cuDNN version: 7.0.5
TensorRT version: 3.0.4

When I try to use sample_mnist as the basis for my own code, I get the same error every time, even after changing the cudaMalloc parameters several times.

The error is:

Check failed: cudaMalloc(&buffers[inputIndex], 3*1 * INPUT_H * INPUT_W * sizeof(float))
*** Check failure stack trace: ***
Aborted (core dumped)

sampleSPHERE.cpp (7.09 KB)
prototxt.txt (14.1 KB)

I have run into this problem as well. Have you solved it? If so, it would be great if you could share how.
Thanks a lot.


Can you please clarify what you meant by “get the same error all the time whenever I change the cudaMalloc parameter several time”?

Platform: Jetson Xavier with JetPack 4.1.1
I mean that I have also hit a cudaMalloc failure when running inference, so I would like to know which factors can cause cudaMalloc to fail, and whether there are any restrictions on calling this function.
Thanks for your reply.
The errors:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0125 09:38:37.156844 10949 TensorRtCaffeModel.cpp:157] Check failed: cudaMalloc(&buffers[inputIndex], inputSize) 
*** Check failure stack trace: ***
Aborted (core dumped)

The code:

void doInference(IExecutionContext& context, float* input, float* output0, int* output1, int batchSize)
{
   const ICudaEngine& engine = context.getEngine();

   // input and output buffer pointers that we pass to the engine
   assert(engine.getNbBindings() == 3);
   void* buffers[3];

   int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
   DimsCHW inputDims = static_cast<DimsCHW&&>(engine.getBindingDimensions(inputIndex));
   size_t inputSize = batchSize * inputDims.c() * inputDims.h() * inputDims.w() * sizeof(float);
   // outputIndex0/outputIndex1 and outputSize0/outputSize1 are computed in code not shown in this post

   // allocate GPU buffers and a stream, inputSize = 3*1024*1024*4
   CHECK(cudaMalloc(&buffers[inputIndex], inputSize));
   CHECK(cudaMalloc(&buffers[outputIndex0], outputSize0));
   CHECK(cudaMalloc(&buffers[outputIndex1], outputSize1));
   // ... (remainder of doInference not shown in this post)
}
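As a sanity check on the size: the comment puts inputSize at 3*1024*1024*4 bytes, which is 12,582,912 bytes (12 MiB). That is a small request for either a GTX 750 or a Xavier, so simply running out of memory seems an unlikely explanation for this particular call. One thing to be aware of in that expression, though: batchSize is an int, and the DimsCHW accessors return int in the TensorRT headers I know of, so the first four factors are multiplied as int and only widened when sizeof(float) joins in. For a large enough batch that product overflows before the conversion. A minimal sketch that does the whole computation in std::size_t (bufferBytes is a hypothetical helper name, not a TensorRT function):

```cpp
#include <cstddef>

// Hypothetical helper: byte size of a batch of CHW float tensors.
// Every factor is widened to std::size_t before multiplying, unlike
//   batchSize * inputDims.c() * inputDims.h() * inputDims.w() * sizeof(float)
// where the first four factors are multiplied as int and can overflow
// before sizeof(float) converts the result to size_t.
std::size_t bufferBytes(int batch, int c, int h, int w) {
    return static_cast<std::size_t>(batch) *
           static_cast<std::size_t>(c) *
           static_cast<std::size_t>(h) *
           static_cast<std::size_t>(w) *
           sizeof(float);
}
```

For the dimensions in the comment, bufferBytes(1, 3, 1024, 1024) returns 12582912.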


Hello, please refer to the cudaMalloc documentation for the API's limitations and restrictions.

I see “Check failed”, but what error did cudaMalloc() actually return? The core dump may be a consequence of your continuing to use the faulted handle.