Lazy or stenciled convolution

Hi, is there a way with cuDNN (or any other CUDA-capable convolution library) to perform some form of “lazy” or “stenciled” convolution?

What I mean is that I need the output of a convolution operation (or a series of convolution operations), but only some of its values. For example, for a convolution whose output is a 1D vector of 1000 cells, I’d only need cells 0, 3, 45, 432, 532 and 999.
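To make the request concrete, here is a minimal reference sketch in plain Python of the result I’m after. It is only an illustration, not a cuDNN API: the function names `conv1d_full` and `conv1d_at`, the input signal and the 5-tap kernel are all made up for the example, and it assumes an unpadded, stride-1 convolution in the cross-correlation form.

```python
def conv1d_full(signal, kernel):
    """Reference: compute every output cell of an unpadded, stride-1 1D convolution."""
    n_out = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + k] * kernel[k] for k in range(len(kernel)))
        for i in range(n_out)
    ]


def conv1d_at(signal, kernel, indices):
    """Compute only the requested output cells, skipping all the others."""
    return [
        sum(signal[i + k] * kernel[k] for k in range(len(kernel)))
        for i in indices
    ]


# Hypothetical data: 1004 inputs and a 5-tap kernel give 1000 output cells.
signal = [float(x % 7) for x in range(1004)]
kernel = [0.5, -1.0, 2.0, 0.25, 1.0]
wanted = [0, 3, 45, 432, 532, 999]

full = conv1d_full(signal, kernel)           # 1000 cells, most of them discarded
sparse = conv1d_at(signal, kernel, wanted)   # only the 6 cells I actually need

# Both approaches agree on the cells of interest.
assert sparse == [full[i] for i in wanted]
```

The second function does 6 dot products instead of 1000; what I’m hoping for is a library routine with that behavior that is as well tuned as the dense case.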

I can write the formula, and I know how to obtain the values with a hand-written CUDA kernel, but I’m looking for something heavily optimized and state-of-the-art, as the rest of cuDNN is. In other words, something that would (hopefully) be much faster than performing the full convolution and discarding the “unwanted” results.

I may not be using the right vocabulary, as I’m not sure what this kind of operation is called.

Thanks in advance for your help.