High frequency external clock and data GPIO sampling

Hello,

I wish to set up a 4-bit parallel bus sampler using my Jetson TK1. The bus has an external clock line, so I have a total of five wires to connect to the GPIO header: CLK and D1-D4.

The CLK line runs at 50 MHz. I realize this might be pushing the limits of the Jetson's GPIO capabilities through either the J3A1 or J3A2 headers, but I'm here to confirm whether it's possible or not.

I've tried setting up a simple program to see just how quickly I can sample even one of the GPIOs. My current method gives roughly 60 kHz at best, most likely due to the repeated open() calls. I'm not sure how else to go about it, though, because there seems to be a file access conflict with the underlying GPIO driver, which sets the integer (0 or 1) in /sys/class/gpio/gpio166/value. If I don't close and re-open the file, the value I read never changes, even though I am toggling the line voltage between 0 V and 1.8 V.

The code is generally:

#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/time.h&gt;
#include &lt;iostream&gt;
using std::cout;

struct timeval t1, t2;
double elapsedTime;
int i = 0, fid = 0;
char val;
gettimeofday(&t1, NULL);
for (i = 0; i < 1000; i++) {
   fid = open("/sys/class/gpio/gpio166/value", O_RDONLY);
   read(fid, &val, 1);
   //cout << val; // uncomment for debug to confirm value changes when toggling line voltage
   close(fid);
}
gettimeofday(&t2, NULL);
elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0;
elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0;
cout << elapsedTime << " ms.\n";

Might there be a faster method of accessing this file's value? An equivalent Bash script samples even more slowly than this C++ code. Additionally, what is the best method for enforcing a fixed GPIO sampling frequency to reduce sampling jitter, as opposed to letting it run "free" in a while loop as above?

As an alternative to using the GPIO interface, I've considered using the 4-lane CSI-MIPI interface, as it certainly supports these sampling rates (and faster). I am just unsure how I would go about writing a driver for this, as there doesn't seem to be a straightforward way of accessing the interface pins as easily as with the GPIO interface. Is it conceivable that a C++ program could be written to process this 4-bit bus using the CSI-MIPI hardware on the Jetson TK1?

Thank you.

I don't know about any of the questions you're asking, but I can tell you immediately that a while loop which opens and closes the file on every iteration guarantees bad performance. You wouldn't even need to access the file for it to perform badly; the work needed to set up each open() is an enormous drain on performance. Try putting the first open() before the first gettimeofday(), and don't close() until after the second gettimeofday(). I would expect the read() to take far less time this way. Then you can start measuring the real GPIO performance without all of the security checks and other setup associated with open().
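One thing worth checking about the stale-value symptom (this is an educated guess based on how sysfs attribute files normally behave, not something I've verified on a TK1): after the first read the descriptor sits at end-of-file, so you need to lseek() back to offset 0 before each read() to get a fresh value. A minimal sketch, with the path and iteration count as placeholders:

```cpp
// Sketch only: open the sysfs value file once, then rewind with lseek()
// before each read() so the kernel regenerates the attribute contents.
#include <fcntl.h>
#include <unistd.h>

// Reads the first byte of the file n times through a single descriptor.
// Returns the number of successful reads, or -1 if open() fails;
// the last byte read is stored in *last when last is non-null.
static int sample_value_file(const char *path, int n, char *last)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int ok = 0;
    for (int i = 0; i < n; i++) {
        lseek(fd, 0, SEEK_SET);   // rewind: without this the fd stays at EOF
        char val;
        if (read(fd, &val, 1) == 1) {
            ok++;
            if (last)
                *last = val;
        }
    }
    close(fd);
    return ok;
}
```

This keeps the single open()/close() pair outside the timed loop, so the loop body is just one lseek() and one read() per sample.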

I've tried this, but as I mentioned, there seems to be an access issue. In any case, I'm led to believe that a C++ implementation through the standard GPIO interface probably won't get me the rates and timing/sampling fidelity required.

Where would I start if I'm interested in getting high-fidelity hardware timing without OS-related delays and interruptions, specifically on the Jetson TK1? Does this necessarily preclude the use of an OS like Ubuntu and require a custom-built kernel? I have to imagine that an assembly language implementation would be quickest, but there really isn't much literature on how to get started doing something like this on the Jetson. The same goes for accessing the CSI-MIPI 4-lane interfaces (supposedly broken out on the J3A headers) and the purported external clock input. Anything helps at this point. Some of the clock rates within the TK1 greatly exceed 50 MHz, so I'm not inclined to say this is impossible.

If your application is in user space, you could just increase its priority. Maximizing the hardware for performance would also help. About maximizing hardware:
http://elinux.org/Jetson/Performance

For raising a user space priority, you'd need sudo. See "man nice" and "man renice".
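If you'd rather do it from inside the program than via renice, a minimal sketch using setpriority() (lowering priority needs no privileges; a negative nice value, i.e. higher priority, still requires root or CAP_SYS_NICE):

```cpp
// Sketch: set this process's nice value from inside the program instead of
// using renice. Negative values (higher priority) normally require root.
#include <sys/resource.h>

// Returns true if the nice value was changed successfully.
static bool set_nice(int value)
{
    return setpriority(PRIO_PROCESS, 0, value) == 0;
}
```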

You have essentially two places where things might get in the way. One is file I/O in general: if you open and close a file each time through a loop, performance will never be good. Any file-style control interface requires the file to remain open until operations are complete. The other is that the GPIO driver is triggered by a hardware interrupt in combination with the scheduler. nice and renice help with scheduling, but there will be other processes competing for CPU0, which is where all hardware IRQs are handled. Linux is not a "hard" realtime O/S: you will get good average performance, and nearly realtime performance from some drivers, but you can't get guaranteed realtime.

C++ won't have any effect on this per se. One thing to consider, though, which many people writing C++ do not realize: if the C++ streams are tied to the C stdin/stdout/stderr streams (e.g., std::cout and C's stdout synchronizing their buffers), that synchronization is a big performance hit. You don't have to actually use C I/O with your C++ I/O; the tie can occur if you merely link against C I/O libraries. You may need steps to specifically compile avoiding C standard I/O via compiler options. Running ldd on your executable can tell you whether any C libraries are being linked. This is where many people mistakenly think it is C++ I/O which is slow; C++ I/O should be just as fast so long as it is not burdened by keeping an alternate I/O stream in sync.
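The stdio synchronization can also be turned off at runtime; a minimal sketch:

```cpp
// Sketch: decouple C++ streams from C stdio and untie cin from cout.
#include <iostream>

// Returns the previous synchronization state (true by default).
static bool unsync_streams()
{
    std::cin.tie(nullptr);                    // stop flushing cout before cin reads
    return std::ios::sync_with_stdio(false);  // drop per-operation buffer sync
}
```

Call this once at the top of main(), before any I/O; after this, mixing C stdio and C++ streams on the same file is no longer safe.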

Note that a hardware IRQ is required to service hardware interrupts, and on this architecture all hardware IRQs must be handled by CPU0. This cannot be changed. Waiting for the IRQ to be serviced is one of the less predictable parts of performance, because it depends on how the scheduler deals not only with your driver but also with the other drivers in competition. You can get some gains by using nice or renice to something like "-1" (which has higher priority than the default nice level of "0"), but you can run into issues if you increase priority beyond -1 or -2.