Page size issues when measuring global memory access and L1/L2 cache access latencies of Jetson TK1

Hi all,

I am trying to run a GPU micro-benchmark to measure the latency of global memory accesses and L1/L2 cache accesses on the Jetson TK1. However, the benchmark's pointer-chasing code takes the page size as one of its parameters.

Does anyone know why the benchmark includes the page size when measuring GPU memory access latency? And what is the default page size on the Jetson TK1?


The benchmark's web page explains the page size as follows:

Page Size

The translation page size used by the GPU has been found to depend on both the hardware and the CUDA driver version used by the system (Analyse de l’architecture GPU Tesla). Our published experiments observed 4 KB pages, but we have observed 64 KB pages after our CUDA driver was updated to version 190.18. Microbenchmarks that depend on the page size have been modified to scale strides/array sizes with page size, set using const int page_size = 4; (in KB) near the top of the relevant source files. You will likely have to change this value for your system. Our microbenchmarks were designed for 4 KB pages and was briefly tested using 64 KB pages, but weird things may happen when page sizes change, like running out of memory.
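To make the "scale strides/array sizes with page size" part concrete, here is a rough host-side sketch in C of how a pointer-chasing array can be built with a stride derived from a page size given in KB. This is not the benchmark's actual code: the helper names build_chain, chase and page_stride_elems are made up for illustration, and the real benchmark runs the chase inside a CUDA kernel. My understanding (an assumption, not something the page states) is that a one-page stride makes every dependent load land on a different page, which is why the value matters for the measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 32-bit elements that span one page,
   for a page size given in KB (like `const int page_size = 4;`). */
static size_t page_stride_elems(int page_size_kb)
{
    return (size_t)page_size_kb * 1024 / sizeof(uint32_t);
}

/* Hypothetical helper: fill chain[0..n-1] so that chain[i] holds the index
   of the element `stride_elems` further on, wrapping around at n. */
static void build_chain(uint32_t *chain, size_t n, size_t stride_elems)
{
    assert(n > 0);
    for (size_t i = 0; i < n; ++i)
        chain[i] = (uint32_t)((i + stride_elems) % n);
}

/* Follow the chain for `steps` dependent loads and return the final index.
   Each load needs the result of the previous one, so the loads cannot
   overlap; timing this loop and dividing by `steps` gives a per-access
   latency estimate. */
static uint32_t chase(const uint32_t *chain, uint32_t start, size_t steps)
{
    uint32_t j = start;
    for (size_t k = 0; k < steps; ++k)
        j = chain[j];
    return j;
}
```

With page_size = 4, page_stride_elems(4) gives 1024 elements, so consecutive accesses in the chain are 4 KB apart. Changing the value to 64 would spread the same number of accesses over a 16 times larger array, which is probably what the page means by "running out of memory".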

On the Jetson, you can get the page size with the following command:
getconf PAGESIZE

It should print 4096, i.e. 4 KB pages, on the Jetson TK1.
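If you would rather read the value from inside the benchmark's host code than from the shell, a minimal sketch using the POSIX sysconf call is below. The function name host_page_size_bytes is just an illustrative choice. Note that this reports the page size of the Linux kernel on the CPU side, the same value getconf PAGESIZE prints; treating it as the GPU's translation page size is an assumption you may want to verify with the benchmark itself.

```c
#include <unistd.h>

/* Hypothetical helper: return the host (CPU/kernel) page size in bytes,
   or -1 if sysconf cannot report it. */
static long host_page_size_bytes(void)
{
    long bytes = sysconf(_SC_PAGESIZE);
    return bytes > 0 ? bytes : -1;
}
```

Dividing the result by 1024 gives the number to put into const int page_size in the benchmark sources.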