The negative number comes from the way this value is calculated: (cache_requests - cache_misses) / cache_requests.
So the result goes negative whenever cache_misses is larger than cache_requests. Strange.
My only guess is that in my case (I fetch uint2) and in yours, each element fetched through the texture cache is wider than 32 bits. As a result, one request (e.g. tex1Dfetch) is split internally into several low-level requests, and each of those may be counted as a separate miss.
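To make the arithmetic concrete, here is a small host-side sketch of the formula and of my guess. It is only a model: the function names, the counter values, and the assumption that a wide fetch is split into 32-bit pieces are all hypothetical and not taken from any profiler documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hit rate as quoted above: (cache_requests - cache_misses) / cache_requests.
// Signed arithmetic is used on purpose so that misses > requests gives a
// negative value instead of wrapping around.
double hit_rate(std::int64_t cache_requests, std::int64_t cache_misses) {
    assert(cache_requests > 0);  // avoid division by zero
    return static_cast<double>(cache_requests - cache_misses) /
           static_cast<double>(cache_requests);
}

// ASSUMPTION (hypothetical model of the guess, not confirmed behaviour):
// a fetch wider than 32 bits is served by several 32-bit low-level requests,
// and each of them can miss on its own.
std::int64_t low_level_requests(std::int64_t fetches, int bits_per_fetch) {
    const int parts = (bits_per_fetch + 31) / 32;  // round up to 32-bit pieces
    return fetches * parts;
}
```

With these made-up numbers, 1000 counted requests and 250 misses give 0.75, while 1000 counted requests and 1500 misses give -0.5. Under the guess, 1000 uint2 fetches (64 bits each) could produce up to 2000 low-level misses, which the formula would report as -1.0.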
How can this be solved? That is the question.
Any comments from an NVIDIA employee would be much appreciated.