CNMeM limitations when using cuDNN

Hi there!

I am benchmarking the GPU-based execution of Theano (0.8.2) and Keras (1.0.5) on Windows 10. I am comparing the training time of a CNN on the MNIST dataset in three scenarios: CPU, GPU without cuDNN, and GPU with cuDNN. I am using CUDA v7.5 and cuDNN v5.0.

The first two scenarios work well, but I have run into trouble with the environment variable configuration for the GPU+cuDNN trial. The source of my error is the CNMeM configuration. If I run the same Python code with CNMeM values from 0 (disabled) up to 0.75, I have no problems. Moreover, the training time drops by more than 50% compared to the plain GPU execution.
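
For context, the CNMeM value can be passed to Theano through the THEANO_FLAGS environment variable. Below is a minimal sketch, assuming the Theano 0.8 `lib.cnmem` flag together with the old `device=gpu` backend. `build_theano_flags` is a hypothetical helper (not part of Theano), and the `device` and `floatX` values are illustrative defaults rather than my exact configuration:

```python
import os


def build_theano_flags(cnmem, device="gpu", floatx="float32"):
    """Hypothetical helper: build a THEANO_FLAGS string for a given CNMeM value.

    cnmem = 0 disables CNMeM; a value between 0 and 1 is the fraction of
    GPU memory that Theano reserves up front for its memory pool.
    """
    if cnmem < 0:
        raise ValueError("cnmem must be >= 0")
    return "device={},floatX={},lib.cnmem={}".format(device, floatx, cnmem)


# The variable has to be set before the first `import theano`,
# because Theano reads its flags once at import time.
os.environ["THEANO_FLAGS"] = build_theano_flags(0.75)
```

Changing the argument to 0.8 reproduces the setting that fails for me; everything else stays the same.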

But I cannot go to 0.8 or beyond. With 0.8, I receive the following compilation error:

Traceback (most recent call last):
  File "C:\eclipse\workspace\01_MachineLearning_Sparring_project\src\01_DeepLearningCrashCoruses\", line 79, in <module>
    verbose=0, validation_data=(X_test, Y_test))
  File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\", line 413, in fit
  File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\engine\", line 1026, in fit
  File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\engine\", line 695, in _make_test_function
  File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\backend\", line 541, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\backend\", line 527, in __init__
  File "C:\Anaconda2\lib\site-packages\theano\compile\", line 320, in function
  File "C:\Anaconda2\lib\site-packages\theano\compile\", line 479, in pfunc
  File "C:\Anaconda2\lib\site-packages\theano\compile\", line 1777, in orig_function
  File "C:\Anaconda2\lib\site-packages\theano\compile\", line 1641, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 690, in make_thunk
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 1003, in make_all
  File "C:\Anaconda2\lib\site-packages\theano\sandbox\cuda\__init__.py", line 256, in make_thunk
    compute_map, no_recycling)
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 970, in make_thunk
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 879, in make_c_thunk
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 1200, in make_thunk
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 1143, in compile
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 1595, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 1101, in module_from_key
    module = self._get_from_key(key)
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 1000, in _get_from_key
    return self._get_module(name)
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 674, in _get_module
    self.module_from_name[name] = dlimport(name)
  File "C:\Anaconda2\lib\site-packages\theano\gof\", line 299, in dlimport
    rval = __import__(module_name, {}, {}, [module_name])
RuntimeError: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'could not create cuDNN handle: CUDNN_STATUS_INTERNAL_ERROR')

Could somebody shed a little light on this? Is it a limitation of my GPU, or a cuDNN internal error?

Thanks a lot in advance.