CNMeM limitations when using cuDNN

Hi there!

I am benchmarking GPU-based execution of Theano (0.8.2) and Keras (1.0.5) on Windows 10. I am comparing the training time of a CNN on the MNIST dataset in three scenarios: CPU, GPU without cuDNN, and GPU with cuDNN. I am using CUDA v7.5 and cuDNN v5.0.

The first two scenarios work fine, but I have run into trouble with the environment variable configuration for the GPU+cuDNN trial. The source of the error seems to be the CNMeM setting. If I run the same Python code with CNMeM values from 0 (disabled) up to 0.75, I have no problems. Moreover, training time drops by more than 50% compared to the plain GPU execution.
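For reference, here is a minimal sketch of one way the CNMeM fraction can be passed to Theano 0.8.x through the `THEANO_FLAGS` environment variable. The helper name `build_theano_flags` is hypothetical, and the sketch assumes the old `device=gpu` backend with `floatX=float32`. It only handles the fractional form of `lib.cnmem` (values between 0 and 1).

```python
import os


def build_theano_flags(cnmem_fraction, device="gpu", floatx="float32"):
    """Build a THEANO_FLAGS string (hypothetical helper, not part of Theano).

    cnmem_fraction: share of GPU memory that CNMeM preallocates;
    0 disables CNMeM, values up to 1 are treated as a fraction.
    """
    if not 0.0 <= cnmem_fraction <= 1.0:
        raise ValueError("this sketch only handles fractions in [0, 1]")
    return "device={},floatX={},lib.cnmem={}".format(
        device, floatx, cnmem_fraction)


# The variable has to be set before Theano is imported, because Theano
# reads its configuration once, at import time.
os.environ["THEANO_FLAGS"] = build_theano_flags(0.75)
# import theano  # would pick up the flags set above
```

Setting the same string as a system environment variable has the same effect; doing it from Python just makes each benchmark run self-describing.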

But I cannot go to 0.8 or beyond. With 0.8, I get the following compilation error:

Traceback (most recent call last):
File "C:\eclipse\workspace\01_MachineLearning_Sparring_project\src\01_DeepLearningCrashCoruses\02_Keras_test.py", line 79, in <module>
verbose=0, validation_data=(X_test, Y_test))
File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\models.py", line 413, in fit
sample_weight=sample_weight)
File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\engine\training.py", line 1026, in fit
self._make_test_function()
File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\engine\training.py", line 695, in _make_test_function
**self.function_kwargs)
File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\backend\theano_backend.py", line 541, in function
return Function(inputs, outputs, updates=updates, **kwargs)
File "C:\Anaconda2\lib\site-packages\keras-1.0.5-py2.7.egg\keras\backend\theano_backend.py", line 527, in __init__
**kwargs)
File "C:\Anaconda2\lib\site-packages\theano\compile\function.py", line 320, in function
output_keys=output_keys)
File "C:\Anaconda2\lib\site-packages\theano\compile\pfunc.py", line 479, in pfunc
output_keys=output_keys)
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 1777, in orig_function
defaults)
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 1641, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "C:\Anaconda2\lib\site-packages\theano\gof\link.py", line 690, in make_thunk
storage_map=storage_map)[:3]
File "C:\Anaconda2\lib\site-packages\theano\gof\vm.py", line 1003, in make_all
no_recycling))
File "C:\Anaconda2\lib\site-packages\theano\sandbox\cuda\__init__.py", line 256, in make_thunk
compute_map, no_recycling)
File "C:\Anaconda2\lib\site-packages\theano\gof\op.py", line 970, in make_thunk
no_recycling)
File "C:\Anaconda2\lib\site-packages\theano\gof\op.py", line 879, in make_c_thunk
output_storage=node_output_storage)
File "C:\Anaconda2\lib\site-packages\theano\gof\cc.py", line 1200, in make_thunk
keep_lock=keep_lock)
File "C:\Anaconda2\lib\site-packages\theano\gof\cc.py", line 1143, in compile
keep_lock=keep_lock)
File "C:\Anaconda2\lib\site-packages\theano\gof\cc.py", line 1595, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "C:\Anaconda2\lib\site-packages\theano\gof\cmodule.py", line 1101, in module_from_key
module = self._get_from_key(key)
File "C:\Anaconda2\lib\site-packages\theano\gof\cmodule.py", line 1000, in _get_from_key
return self._get_module(name)
File "C:\Anaconda2\lib\site-packages\theano\gof\cmodule.py", line 674, in _get_module
self.module_from_name[name] = dlimport(name)
File "C:\Anaconda2\lib\site-packages\theano\gof\cmodule.py", line 299, in dlimport
rval = __import__(module_name, {}, {}, [module_name])
RuntimeError: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'could not create cuDNN handle: CUDNN_STATUS_INTERNAL_ERROR')
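To narrow down where between 0.75 and 0.8 the failure starts, the same script can be run once per CNMeM value in a fresh interpreter. The sketch below is only an illustration: `run_with_cnmem` and `sweep` are hypothetical helpers, and it assumes the benchmark script exits with a non-zero status when Theano raises.

```python
import os
import subprocess
import sys


def run_with_cnmem(script_path, fraction):
    """Run script_path in a new Python process with lib.cnmem=fraction.

    Returns the exit code of the child process (0 means it finished
    without an uncaught exception).
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["THEANO_FLAGS"] = (
        "device=gpu,floatX=float32,lib.cnmem={}".format(fraction))
    return subprocess.call([sys.executable, script_path], env=env)


def sweep(script_path, fractions):
    """Return a list of (fraction, exit_code) pairs, one per run."""
    return [(f, run_with_cnmem(script_path, f)) for f in fractions]
```

A call such as `sweep("02_Keras_test.py", [0.75, 0.76, 0.78, 0.8])` would then show the first value at which the run fails. A separate process per value is used because Theano reads its flags only once, at import time.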

Could somebody shed some light on this? Is it a limitation of my GPU, or a cuDNN internal error?

Thanks a lot in advance,

Borja