Device memory utilization is low

With the visual profiler, I ran the simple matrixMul example with two metrics: Device memory utilization and System memory utilization. I ran the example with two large matrices where nvidia-smi reported about 2200M out of 4000M memory usage.

After the run, now I see that:
Device memory utilization = Low (2)
System memory utilization = Low (1)

I guess something is wrong here? Maybe utilization means something different in this context. I expected to see a Device memory utilization of about Medium (5). Isn’t that right?

These are measures of bandwidth (or transactions, if you prefer), not of allocation sizes.

For this metric, high device memory utilization means that your application generates a high level of traffic to device memory over the duration of its execution.
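As a rough sketch of that idea: the level depends on bytes moved per unit time relative to the peak, and the allocation size never enters the calculation. The linear 0 to 10 scaling, the function name `utilization_level`, and all numbers below are assumptions for illustration only; this is not the profiler's actual formula.

```python
def utilization_level(bytes_moved: float, seconds: float,
                      peak_bytes_per_sec: float) -> int:
    """Map achieved bandwidth to a 0-10 level.

    Assumption: a simple linear scaling against peak bandwidth,
    rounded and clamped. The real profiler may bin differently.
    """
    if seconds <= 0 or peak_bytes_per_sec <= 0:
        raise ValueError("seconds and peak_bytes_per_sec must be positive")
    achieved = bytes_moved / seconds              # achieved bandwidth, bytes/s
    level = round(10 * achieved / peak_bytes_per_sec)
    return max(0, min(10, level))                 # clamp to the 0-10 scale


# Hypothetical values, not measured on any real device.
PEAK = 100e9                  # assumed peak device memory bandwidth, bytes/s
ALLOCATED = 2200 * 2**20      # allocation size; deliberately unused below

# Same allocation, different traffic, different level.
light = utilization_level(20e9, 1.0, PEAK)   # 20 GB moved in 1 s
heavy = utilization_level(80e9, 1.0, PEAK)   # 80 GB moved in 1 s
print(light, heavy)
```

So a kernel that allocates more than half of device memory can still report Low if it touches that memory at a small fraction of peak bandwidth.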

If we accept that memory utilization is a measure of bandwidth, then what can we say about L2 cache utilization? It is described as:

The utilization level of the L2 cache relative to the peak utilization on a scale of 0 to 10

So, I expect a measure of occupancy. Is that correct?

The device is an M2000, which has compute capability 5.x. According to [1], there is a dram_utilization metric, but it is not available among the metrics in the Visual Profiler of CUDA Toolkit 9.1. So I assumed it corresponds to Device memory utilization. Is that right?

[1] Profiler :: CUDA Toolkit Documentation

No, L2 cache utilization is also a bandwidth measure. I don’t know what occupancy would mean in the context of an L2 cache.