fiq-engine 1.2



fiq-engine

This package shows how to use the FIQ (“fast irq”) mechanism offered by the ARM processor to run a custom task with minimal jitter and delay in activation time.

The timing source for the user task is any interrupt source for the target platform. This package includes support for the AT91SAM926x internal timer and the PXA270 internal timer. You can change the code to use another interrupt source by changing the cpu-specific code.

Kernel code included has been tested on linux-2.6.26 and linux-2.6.27 (although the PXA has only been tested on 2.6.20). No other operating system kernels are supported as this code is Linux-specific, but other versions may be added if needed.

The included S50fiq script is an example of what I use in my embedded systems to run this package. The script includes the following commands, which you can copy and paste to your ARM shell:

        insmod ./bprintk.ko
        insmod ./sysctl-stamp.ko
        insmod ./fiq-misc.ko
        mknod /dev/fiqmisc c 10 68
        insmod ./fiq-engine.ko fiq=1
        insmod ./fiq-task.ko period=100



1 Background and Design

The fast irq, or FIQ, behaves somewhat like a non-maskable interrupt; it is part of the ARM CPU core and is thus available in all processors built around that architecture. More correctly, the core defines two different interrupts, the IRQ and the FIQ, which are independently maskable; however, as long as nobody masks the FIQ (and nobody does in the Linux kernel) it acts like a non-maskable interrupt, while still allowing protection of critical sections if the programmer needs it.

Most interrupt controllers (actually, all of them as far as I know) allow routing each interrupt source to either the normal or the fast IRQ. It is therefore possible to use this feature with whatever event source you want: a UART device, a CPU timer, an external signal.

Since the FIQ is not masked during normal system activity, the FIQ handler is not delayed by other interrupts, nor is it disabled during critical sections of the kernel, as spinlocks and the like only disable normal interrupts. This environment allows low latency and low jitter, but imposes serious limitations on what FIQ code can do. Basically, the FIQ handler can't share data with other code, nor can it call kernel functions (as they might access kernel data), since the FIQ handler may run while normal interrupts are disabled, that is, in the middle of a kernel critical section. Moreover, page faults must not happen in the FIQ handler, as the kernel is not ready to handle them. In practice, the situation is not unlike what happens when a real-time kernel (such as RTAI) coexists with the Linux kernel.

This package offers a few support modules and the glue code to fire a user-defined procedure based on a FIQ interrupt source. Support code includes a printk-like function, a time-stamping mechanism to keep track of your delays and jitters, and a shared-memory area for communication. The main code deals with the hairy details of installing the vector within the Linux interrupt management and register saving/restoring, calling a user-defined function when the FIQ triggers.

Before you load the modules you need to apply a kernel patch to export access to the FIQ vector.



2 The Kernel Patch

To load the modules you must patch the kernel, in order to export access to the FIQ vector (as described below). The patch is applied using “patch -p1” from within the main kernel sources. It is provided for 2.6.26 through 2.6.29, but the 2.6.26 patch works back to linux-2.6.20, with only some offset in line numbers. This is an example use:

        cd linux-2.6.27
        patch -p1 < /path/to/fiq-engine-1.2/patches/linux-2.6.27-fiq.patch
        make uImage

If you have git, the 2.6.28 and 2.6.29 patches have been created with git, so you can “git am” them to your branch.

The patches modify the following files:

arch/arm/kernel/entry-armv.S
A vector is provided for the fast interrupt. This is only a few lines of code, as the real handler is implemented in the external module through the fiq_userptr symbol.
arch/arm/kernel/armksyms.c
The file exports fiq_userptr.
mm/vmalloc.c
Whenever the virtual memory for the kernel is modified, the change is propagated to all existing virtual memory maps. Otherwise we could have a fatal page fault while in FIQ context.



3 Compiling and loading

To compile the code you should just run make, with the LINUX variable pointing to your kernel sources, patched as described. As usual, the CROSS_COMPILE and ARCH variables should be set as well (unless you compile natively). All variables can be set in the environment or on the command line of make. Example:

        export LINUX=/usr/src/linux-at91
        make ARCH=arm CROSS_COMPILE=/opt/eldk-4.2/usr/bin/arm-linux-

No make install is provided: you can load the modules where they have been compiled or move them by hand to another place of your choice.

Please note that the convention for platform-specific header files changed over time, from “#include <asm/arch/...>” to “#include <mach/...>”. In this package, at91 uses the new convention while pxa uses the old one. You may need to fix it yourself.



4 Kernel Modules

The package is made up of a few kernel modules, which are explained one by one below.



4.1 bprintk.ko

The module exports the bprintk function. You don't need to load it unless you use that function (most likely you'll use it at least during debugging).

The function is a printing function just like printk, but unlike printk it doesn't send anything to the console or to the kernel buffer. Instead, it uses its own buffer to host the strings until a safe time. A kernel timer is used to send such text to the real printk function.
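
The deferred-printing idea can be illustrated in user space. The following is only a minimal sketch, not the code of bprintk.ko: the names demo_bprintk and demo_flush and the buffer size are hypothetical. The writer only formats into a private buffer; a later flush (the job done by the kernel timer in the real module) hands the accumulated text to the real output function.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEMO_BUFSIZE 1024            /* hypothetical buffer size */

static char demo_buf[DEMO_BUFSIZE];  /* private buffer, never the console */
static size_t demo_used;             /* bytes currently stored */

/* Format into the private buffer and return the number of bytes stored. */
int demo_bprintk(const char *fmt, ...)
{
    va_list args;
    size_t room = DEMO_BUFSIZE - demo_used;  /* always at least 1 */
    int n;

    va_start(args, fmt);
    n = vsnprintf(demo_buf + demo_used, room, fmt, args);
    va_end(args);

    if (n < 0)
        return n;
    if ((size_t)n >= room)           /* output truncated: keep what fits */
        n = (int)(room - 1);
    demo_used += (size_t)n;
    return n;
}

/*
 * Called later, at a safe time: copy the pending text to out
 * (outsize must be at least 1) and empty the buffer.
 */
size_t demo_flush(char *out, size_t outsize)
{
    size_t n = demo_used < outsize - 1 ? demo_used : outsize - 1;

    memcpy(out, demo_buf, n);
    out[n] = '\0';
    demo_used = 0;
    return n;
}
```

A version that is really safe for FIQ use must also cope with the FIQ handler writing while the flush is in progress, which this sketch ignores.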

If you need to use a printing function from within a FIQ handler, you can't call printk directly, as explained earlier, so this module offers a workaround to help debugging. Note, however, that if your system blocks in the FIQ handler, no bprintk can help, as the accumulated data won't have a way to reach the console.

Please note that calling printk from FIQ isn't guaranteed to fail: it might even work a number of times before failing or before triggering bugs that may be very difficult to track.

All you need to do with bprintk.ko is load it, or have it loaded automatically by modprobe if you arrange for it.



4.2 sysctl-stamp.ko

This module offers a timestamping API, for use in your own modules. Such functions are safe to call from a FIQ context. The timing source being used is up to the caller; the module only offers the interface to user space through /proc/sys/dev.

Use of this module is exemplified in fiq-task, where you can see the details in action. Moreover, the header file is well commented.

The module exports the following functions:

     struct scstamp_table *scstamp_register(char *name,
                                        struct scstamp_item *items, int nitems,
                                        int num, int den, int avgweight);
     void scstamp_one(struct scstamp_table *table, int offset, unsigned long count);
     void scstamp_unregister(struct scstamp_table *table);

The first function registers a new directory in /proc/sys/dev, under the name name. The items and nitems arguments define a list of files, each used to stamp a different event. The function must be called from Linux context (not FIQ context).

For each file in the array, timestamps are reported as count values passed to scstamp_one. The num, den and avgweight arguments are used to convert such values into human-readable stamps (for example, microseconds). Each file in the array can declare its own conversion values in the data structure passed to scstamp_register, whereas the values passed as explicit arguments are used as defaults for the items that don't specify them. Conversion is performed as an integer multiplication by num followed by an integer division by den.

In each file four integer numbers can be read at any time by a user space program: the minimum, the average, the maximum and the last stamp reported. All numbers are converted as explained, and the average reported is a running average using avgweight as scale factor.
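
The conversion and the four reported numbers can be sketched in user-space C. This is not the module's code: the demo_ names are hypothetical, and the running-average formula (new = (old * (avgweight - 1) + value) / avgweight) is an assumption, since the exact formula is not stated here. Only the multiply-by-num, divide-by-den conversion comes from the description above.

```c
#include <limits.h>

/* Hypothetical per-file state: the four numbers plus conversion values. */
struct demo_stamp {
    unsigned long min, avg, max, last;
    int num, den, avgweight;         /* avgweight is assumed to be >= 1 */
    int seen;                        /* number of stamps recorded so far */
};

void demo_stamp_init(struct demo_stamp *s, int num, int den, int avgweight)
{
    s->min = ULONG_MAX;
    s->avg = s->max = s->last = 0;
    s->num = num;
    s->den = den;
    s->avgweight = avgweight;
    s->seen = 0;
}

/* Record one raw count: convert it, then update min, max, last and average. */
void demo_stamp_one(struct demo_stamp *s, unsigned long count)
{
    /* Integer multiplication by num followed by integer division by den. */
    unsigned long v = count * (unsigned long)s->num / (unsigned long)s->den;
    unsigned long w = (unsigned long)s->avgweight;

    s->last = v;
    if (v < s->min)
        s->min = v;
    if (v > s->max)
        s->max = v;
    if (s->seen++ == 0)
        s->avg = v;                  /* first sample seeds the average */
    else
        s->avg = (s->avg * (w - 1) + v) / w;   /* assumed formula */
}
```

For example, with num=1 and den=3 (a 3MHz counter reported in microseconds), a raw count of 30 is stored as 10.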

The scstamp_one function can be called from any context and adds a new timestamp (count) to the file (“item”) in position offset of the list registered with scstamp_register. The count is converted as explained and contributes to the average according to the avgweight in use.

When you are done with timestamping (for example, when your own module is unloaded), you can remove the files from /proc/sys/dev by calling scstamp_unregister from Linux context.

As suggested, fiq-task.c shows how to use the functions in practice.



4.3 fiq-misc.ko

This module registers a misc device driver that exports a shared-memory area to user space. Such memory area is meant to be accessed by both FIQ code and normal Linux code.

The module receives two optional parameters:

The device is created as a miscdevice with 68 as minor number (actually, FIQMISC_MINOR in fiq-misc.h). If you are not running udev you should create the device yourself:

        mknod /dev/fiq-misc c 10 68

On the device node, the following file operations are supported:

mmap
The mmap family of system calls is the only method to access the shared-memory area. No limit is enforced beyond the generic file-permission checks.
read
Using read, a process can retrieve the number of times the FIQ task has been run, as a binary integer number.
poll
The device is always reported as readable, as read will never block.
write
When a process writes to the device, a cache flush is forced.
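
As an illustration of the read operation, the sketch below reads the run counter from an already-open file descriptor. It assumes the “binary integer number” is a native C int, which the text above does not specify; the function name demo_read_runs is hypothetical.

```c
#include <unistd.h>

/*
 * Read the run counter from an open descriptor.
 * Returns 0 on success, -1 on error or short read.
 * Assumption: the counter has the size of a native int.
 */
int demo_read_runs(int fd, int *runs)
{
    ssize_t n = read(fd, runs, sizeof(*runs));

    return n == (ssize_t)sizeof(*runs) ? 0 : -1;
}
```

A program would typically open the device node created above with open() in read-only mode and call the function whenever it wants a fresh value, since read never blocks.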



4.4 fiq-engine.ko

This module is the core of the FIQ implementation. It is made up of three object files: an assembly file with the register-saving part, the generic C source and a CPU-specific part.

The module receives two parameters:

This module exports two symbols to other modules:

        int fiq_register(void (*handler)(void), int irq);
        int fiq_unregister(void (*handler)(void), int irq);

The handler being registered is invoked whenever the interrupt event occurs. According to the fiq= argument the event is a fast interrupt or a normal interrupt.

The CPU-specific engine is currently implemented for the AT91SAM92 family of processors (9260, 9261, 9263) and for the PXA270 CPU. In order to port to a different CPU implementation you need to offer the following low-level functions (either inline or real functions) using existing code as reference:

int __fiq_register(void (*handler)(void), int irq);
void __fiq_unregister(void (*handler)(void), int irq);
These are the low-level relatives of the main module entry points. They should program the interrupt source and configure it as a FIQ or a normal interrupt according to the integer variable fiq_use_fiq.
void __fiq_ack(void);
The function should acknowledge the interrupt event to host hardware. It may be empty or not, according to what hardware requires.
void __fiq_sched_next_irq(int usec);
This is called by the FIQ task to program the next interrupt event, at a time usec microseconds after the previous event (which is actually being serviced when this function is called). According to how the timer hardware works, you might act like AT91 code (where the counter resets to 0 at interrupt time) or like PXA code (where the counter keeps running and you need to set a compare value).
unsigned long __GETSTAMP(void);
This macro should read a timestamp source and report the number of counts elapsed since the last interrupt. The macro can just read a register (it does so for AT91), make a subtraction (if the counter counts down, like Samsung S3C processors do) or use the previous match value (if the counter keeps running up, e.g. PXA).
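
The difference between the timer styles described above comes down to a little unsigned arithmetic, sketched here in user-space C. None of these functions belong to the package: the demo_ names are hypothetical and a 32-bit counter is assumed. For a free-running counter, both the next compare value and the elapsed count rely on wraparound modulo 2^32.

```c
#include <stdint.h>

/* Counts corresponding to usec microseconds at rate_hz counts per second. */
uint32_t demo_usec_to_counts(uint32_t usec, uint32_t rate_hz)
{
    return (uint32_t)((uint64_t)usec * rate_hz / 1000000u);
}

/*
 * Free-running up-counter (the PXA style): the next compare value is
 * the previous match plus the interval; the sum wraps modulo 2^32.
 */
uint32_t demo_next_match(uint32_t prev_match, uint32_t usec, uint32_t rate_hz)
{
    return prev_match + demo_usec_to_counts(usec, rate_hz);
}

/* Free-running up-counter: counts elapsed since the previous match. */
uint32_t demo_getstamp_running(uint32_t counter_now, uint32_t prev_match)
{
    return counter_now - prev_match;   /* correct across wraparound */
}

/* Down-counter reloaded with `reload` at interrupt time: a subtraction. */
uint32_t demo_getstamp_down(uint32_t counter_now, uint32_t reload)
{
    return reload - counter_now;
}
```

When the counter resets to 0 at interrupt time (the AT91 style), the stamp is simply the current counter value and no helper is needed.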



4.5 fiq-task.ko

This example is released to the public domain. You are expected to write your own FIQ task based on this one.

The example, as released, fires a periodic task with a period of 250 microseconds (on either AT91SAM92 or PXA) and toggles a GPIO bit at each run; you thus get a 2kHz square wave on the pin. Note that while this example task is periodic, the FIQ module is designed to allow aperiodic tasks, as each invocation specifies how far in the future the next interrupt should fire.
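
The 2kHz figure follows from the fact that one full cycle of the square wave takes two toggles. A small helper (hypothetical, not part of the package) makes the arithmetic explicit:

```c
/*
 * Frequency in Hz of the square wave obtained by toggling a pin every
 * period_us microseconds: a full cycle needs two toggles.
 */
unsigned long demo_square_wave_hz(unsigned long period_us)
{
    return 1000000UL / (2UL * period_us);
}
```

With the default period of 250 microseconds the result is 2000Hz; with period=100, as used in the S50fiq script, it is 5000Hz.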

The integer module parameter period= can be used to specify the period, so you can change the default value of 250 usec. I used it successfully down to 20 usec at 200MHz or 10 usec at 400MHz.

The module parameter bit= specifies which GPIO pin to move. The default value is defined in fiq-at91sam92.h and fiq-pxa.h with the macro FIQ_BITNR. For the numeric definition of the bits on at91 see the relevant header files; on the PXA the integer directly represents the GPIO number.

The example, in addition, registers four sysctl files that report the entry and exit times of the task, expressed both as raw timer counts and as microseconds.

The following hardware resources are used:

AT91SAM9263
The code uses timer-counter 0 (TC0) programmed at 3.1 MHz (99.3 MHz / 32) and toggles PB20. The code has been used on the 9263EK board and is tailored to its frequencies. Since this CPU has a single interrupt source for its three timers, use of FIQ reporting for one of them prevents using the interrupt on the other two timers. The actual time between interrupts is off by about 3% since the code approximates the counter rate as 3MHz.
AT91SAM9260
AT91SAM9261
The code uses timer-counter 0 (TC0) programmed at 3.1 MHz (99.3 MHz / 32) and toggles PB20 (9260) or PA20 (9261) (unless pin= is specified). The code has been used on the 9260EK/9261EK boards and is tailored to their frequencies. The code can use TC1 or TC2 if so configured, since the CPU has three different interrupt sources for the three timers. As above, time lapses are calculated assuming a 3MHz counter speed.
PXA270
The code uses OSMR1, with the 3.25MHz OSCR counter. PXA255 has not been tested but it should work equally well (but there the timer counts at 3.6864MHz so the time will be wrong by 13%). The code has been used on a custom board running 2.6.20; it toggles GPIO11 (unless pin= is specified) and uses OSMR1 as a timer source. The code can work with OSMR2 or OSMR3 (according to fiq-pxa.h), while OSMR0 is used by Linux. Other timers found in PXA270 and later are not supported.
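
The timing errors quoted above can be checked with a one-line calculation. The helper below is hypothetical, not part of the package; it reports by how many percent the real counter rate exceeds the rate assumed by the code.

```c
/*
 * Relative difference, in percent, between the rate at which the
 * counter really runs and the rate assumed when computing intervals.
 * A positive result means the counter is faster than assumed, so the
 * programmed interval expires earlier than requested.
 */
double demo_rate_error_percent(double actual_hz, double assumed_hz)
{
    return (actual_hz - assumed_hz) / assumed_hz * 100.0;
}
```

Calling it with 99.3e6 / 32 and 3e6 gives a little more than 3% for the AT91 case; calling it with 3.6864e6 and 3.25e6 gives a little more than 13% for the PXA255 case.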

The pin being toggled can be observed with a scope. Moreover, the timestamps of task activation (and task completion) can be observed in /proc/sys:

        # ls /proc/sys/dev/task/
        entry--us  entry-raw  exit--us   exit-raw

As described in section sysctl-stamp.ko, the files report minimum, average, maximum and last reported values. By writing to one of the files, the user can reset all counts.
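
A user-space program can parse one of these files with a single sscanf call. The sketch below is an illustration only: the structure and function names are hypothetical, and it assumes the four numbers are separated by white space on one line, in the order minimum, average, maximum, last.

```c
#include <stdio.h>

/* The four values reported by each stamp file. */
struct demo_stats {
    long min, avg, max, last;
};

/* Parse one line holding the four numbers; returns 0 on success, -1 otherwise. */
int demo_parse_stats(const char *line, struct demo_stats *st)
{
    int n = sscanf(line, "%ld %ld %ld %ld",
                   &st->min, &st->avg, &st->max, &st->last);

    return n == 4 ? 0 : -1;
}
```

The line would normally come from fgets() on a file such as /proc/sys/dev/task/entry--us.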

The following example (which has been re-indented for readability) shows how to read the stats, reset them, and read them again. As a reminder, the entry files report the delay measured in the activation time of the task, as both raw counts and microseconds, while the exit files show the stamp when the task is done. On the PXA processor, however, the exit time is not the time elapsed since the last event but rather the time remaining before the next event. This was measured on a 9260 running 2.6.26 (“exit” time is elapsed time):

        # cd /proc/sys/dev/task
        # grep . *
        entry--us:   1   1   4   2
        exit--us:    2   2   9   5
        entry-raw:   2   2  13   6
        exit-raw:    6   6  29  15
        # echo 0 > entry-raw; grep . *
        entry--us:   1   1   4   1
        exit--us:    3   3   9   5
        entry-raw:   2   3  12   7
        exit-raw:    8   9  28  17

These data points show how both task-activation time and task-completion time can vary by a factor of 3 to 4. This can be tracked down to a cache effect: if the task is not in cache when the FIQ triggers, all of the code takes more time than it takes when it is already in cache. Activation time remains within a few microseconds of the timer trigger in any case, even with a high interrupt load, but please note that the FIQ is acknowledged before the time stamp is taken, and this eats up most of the measured time (see section fiq-empty.ko, later on).

As a comparison, this is what happens when the task uses the normal interrupt (first data point is taken on idle machine, second data point is taken under ping flood “ping -s 1024 -f”).

        entry--us:   3   4  36   5
        exit--us:    5   5  40   9
        entry-raw:  10  10 113  14
        exit-raw:   15  16 123  20
     
        entry--us:   2   3  89   5
        exit--us:    4   4  92   9
        entry-raw:   7   8 275  32
        exit-raw:   11  12 285  23

In this case, the delay in entering the task is on average a few microseconds more than in the FIQ case, but the jitter is definitely worse, with a measured delay of up to 36 microseconds in an unloaded system. This activation jitter is the result of interference between different interrupt sources. Such interference doesn't happen with the fast irq as a FIQ handler fires even while another interrupt is being serviced.



4.6 fiq-empty.ko

The empty module is a new addition in version 1.1 of this package. Together with busy.ko it can be used to benchmark FIQ execution.

This module registers a periodic task just like fiq-task, and it takes the same arguments: period= (default 250) and pin= (default defined in header file as documented in section fiq-task.ko). But unlike fiq-task, the pin is raised as soon as the task is entered and lowered just before the task is exited. This allows benchmarking the time needed to acknowledge the interrupt and schedule the next one, using a scope. On the PXA I measured 6 microseconds (I hoped it was much less).



4.7 busy.ko

The module (added in version 1.1 of the package) is a simple Linux module (not a FIQ-related one) that continuously toggles a pin, without ever releasing the CPU. Since neither interrupts nor FIQs are disabled, normal device activity goes on but no process is scheduled. The module returns -EIO so you won't need to rmmod it to make another run.

The module takes two parameters: bit=, to select the GPIO bit to toggle, and duration= (default: 5) to choose how many seconds it should run.

By using a scope with two probes, you can look at both this GPIO bit and the one toggled by fiq-task or fiq-empty and see the overall overhead of the FIQ event, since the bit will not be switching while the interrupt is being served. With fiq-task you'll see a quiet period of a few microseconds before and after the bit change (for sysctl stamping, acknowledging the event, scheduling the next one). With fiq-empty you'll see almost no quiet period before its own bit is raised and after it is lowered. This is because the CPU executes very few instructions between the time it serves the FIQ event and the time your own module gets control.

I have a few photographs, feel free to ask if you can't wire your own scope and would like to better understand the idea (I might include them in a later release, I've no time to clean them up right now).



5 Acknowledgements

The FIQ has been used in a few projects of mine, and code evolved while being supported by my clients. Most of them currently prefer to remain anonymous although they all agreed generic code would go public over time.

Initial work and the Samsung code have been sponsored by Dataprocess Europe. I haven't yet cleaned up the Samsung code for publication.

Cleanup and documentation have been sponsored by cori.it.

The PXA port has been sponsored by BTicino SpA.