\chapter{The ARM processor}


\section{Overview of the ARM}

The ARM CPU is a 32-bit RISC processor originally designed
by Acorn Computers Limited.  Acorn formed Advanced RISC
Machines Ltd to design future versions of the processor.
ARM Ltd now licenses the designs to a large number of
semiconductor, consumer electronics and other companies
worldwide.  Details of the ARM710a processor, which is
typical of the processors currently available, may be found
in \cite{arm}.

The ARM has a largely orthogonal instruction set with a
load/store architecture.  It has sixteen 32-bit registers, 15
of which are general purpose, and a Program Status Register
(which contains arithmetic result flags, processor mode and
interrupt status) accessible at any time.  Some of these
registers are shadowed in other processor modes.  The processor
has an unprivileged user mode (\arm{usr}) and several privileged
modes (\arm{svc}, \arm{irq}, \arm{fiq}, \arm{abt}, \arm{und}), the
last two of which are not available on ARM CPUs before the ARM6.
Their absence presents some problems for virtual memory systems,
as we shall see later.

All instructions are conditionally executed, not just branch
instructions.  This greatly reduces the number of branch
instructions required, which makes pipeline refills
much less common.
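
To make the idea of conditional execution concrete, the following
is a minimal sketch in C of the test that is applied to every
instruction.  It assumes the standard ARM encoding, in which the
top four bits of an instruction select a condition that is checked
against the N, Z, C and V result flags; that encoding comes from
the ARM documentation rather than from this chapter, and the names
\texttt{condition\_passes} and \texttt{FLAG\_N} to \texttt{FLAG\_V}
are purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative packing of the four result flags into one value. */
#define FLAG_N 0x8u  /* negative */
#define FLAG_Z 0x4u  /* zero     */
#define FLAG_C 0x2u  /* carry    */
#define FLAG_V 0x1u  /* overflow */

/* Return true if `instruction` would be executed given the flags in
 * `nzcv`.  The condition is held in bits 31..28 of the instruction. */
bool condition_passes(uint32_t instruction, unsigned nzcv)
{
    unsigned cond = instruction >> 28;
    bool n = (nzcv & FLAG_N) != 0;
    bool z = (nzcv & FLAG_Z) != 0;
    bool c = (nzcv & FLAG_C) != 0;
    bool v = (nzcv & FLAG_V) != 0;

    switch (cond) {
    case 0x0: return z;             /* EQ: equal                    */
    case 0x1: return !z;            /* NE: not equal                */
    case 0x2: return c;             /* CS: carry set                */
    case 0x3: return !c;            /* CC: carry clear              */
    case 0x4: return n;             /* MI: negative                 */
    case 0x5: return !n;            /* PL: positive or zero         */
    case 0x6: return v;             /* VS: overflow set             */
    case 0x7: return !v;            /* VC: overflow clear           */
    case 0x8: return c && !z;       /* HI: unsigned higher          */
    case 0x9: return !c || z;       /* LS: unsigned lower or same   */
    case 0xA: return n == v;        /* GE: signed greater or equal  */
    case 0xB: return n != v;        /* LT: signed less than         */
    case 0xC: return !z && n == v;  /* GT: signed greater than      */
    case 0xD: return z || n != v;   /* LE: signed less or equal     */
    case 0xE: return true;          /* AL: always                   */
    default:  return false;         /* NV: never                    */
    }
}
```

An instruction whose condition fails simply has no effect, so a
short conditional sequence needs no branch around it and the
pipeline is not refilled.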

Although 15 of the 16 registers are general purpose, an ARM
Procedure Call Standard (APCS) is defined in order to ease the
job of the compiler.  It assigns additional meaning to
specific registers, as shown in Figure~\ref{apcs}.

\begin{figure}
\begin{tabular}{lll}
Number & APCS  & Description        \\
r15    & pc    & program counter    \\
r14    & lr    & link register      \\
r13    & sp    & stack pointer      \\
r12    & ip    & scratch register   \\
r11    & fp    & frame pointer      \\
r4-r10 & v1-v7 & variable registers \\
r0-r3  & a1-a4 & argument registers
\end{tabular}
\caption{The APCS register bindings}\label{apcs}
\end{figure}

The return address (the address of the instruction following
the call) is automatically copied to the link
register by the ARM when it performs a function call (Branch
with Link).  The caller of the function places the first 4
arguments into a1-a4 and any further arguments go on the
stack.  The called function may corrupt all of a1-a4, but
must preserve v1-v7, fp and sp.  ip may be corrupted by the
procedure call sequence before entry to the procedure proper,
so it may not be used as an argument register or to hold data
across the call.
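
The argument passing rule can be sketched as a small C function.
This is only an illustration: it assumes that every argument
occupies exactly one 32-bit word and that stacked arguments are
found at ascending offsets from sp on entry to the function.  The
names \texttt{arg\_location} and \texttt{apcs\_arg\_location} are
hypothetical.

```c
#include <stddef.h>

/* Where the caller places one argument (illustrative type). */
struct arg_location {
    int    in_register;   /* non-zero if passed in a1-a4              */
    int    reg;           /* register number 0..3 (a1..a4), else -1   */
    size_t stack_offset;  /* byte offset from sp on entry, if stacked */
};

/* Locate argument number `index` (counting from 0), assuming that
 * each argument is a single 32-bit word. */
struct arg_location apcs_arg_location(unsigned index)
{
    struct arg_location loc = {0, -1, 0};

    if (index < 4) {
        /* The first four arguments travel in a1-a4 (r0-r3). */
        loc.in_register = 1;
        loc.reg = (int)index;
    } else {
        /* Further arguments go on the stack, one word each. */
        loc.stack_offset = (size_t)(index - 4) * 4;
    }
    return loc;
}
```

Under these assumptions the fifth argument of a function is found
at offset 0 from sp on entry and the seventh at offset 8.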

\section{Software Interrupts}

The ARM has a number of ways to enter privileged modes in
order to perform operations which are not permitted to tasks
running in \arm{usr} mode.  One of these ways is via a
Software Interrupt instruction, or \arm{swi} for short, which
causes the processor to enter \arm{svc} mode and jump to
address 0x08.  The instruction at this address would normally
be a branch into the part of the kernel that will take
appropriate action.  8 bits of
the 32-bit instruction are used for the condition code and to
indicate that this is a SWI.  This leaves a 24-bit field in
the instruction which is not interpreted by the processor,
and which the kernel can use to decide what action to take.  \arm{svc}
mode has its own private \arm{R13\_svc} and \arm{R14\_svc}.
The address of the instruction following the \arm{swi}
instruction is placed in \arm{R14\_svc} by the processor.
\arm{R13\_svc} should have been previously set up to point to
the kernel stack; this is generally part of the bootstrap code.
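
As a sketch of how a kernel might recover the 24-bit field, the
following C fragment decodes a SWI instruction.  It assumes the
standard encoding in which bits 27 to 24 of a SWI are all ones,
and it relies on the fact that \arm{R14\_svc} holds the address of
the instruction after the \arm{swi}, so that the \arm{swi} itself
lies one word earlier.  The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `instruction` is a SWI: bits 27..24 are all ones.  The
 * top four bits hold the condition code and are ignored here. */
bool is_swi(uint32_t instruction)
{
    return (instruction & 0x0F000000u) == 0x0F000000u;
}

/* Extract the 24-bit field that the processor does not interpret. */
uint32_t swi_number(uint32_t instruction)
{
    return instruction & 0x00FFFFFFu;
}

/* Given the return address saved in R14_svc, fetch the SWI that
 * caused entry to the kernel and return its 24-bit field. */
uint32_t swi_from_return_address(const uint32_t *return_address)
{
    assert(return_address != NULL);
    /* The SWI is the instruction immediately before the return address. */
    return swi_number(return_address[-1]);
}
```

A real handler would typically use the resulting number to index a
table of system call routines.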

\section{Hardware Interrupts}

There are two types of hardware interrupt on the ARM, \arm{irq}
and \arm{fiq} (an abbreviation of Fast Interrupt).  The ARM has
two interrupt lines entering it, one for each type of interrupt.
An \arm{irq} cannot interrupt a \arm{fiq}.  When an interrupt is
signalled on one of these lines, the ARM switches into the
corresponding privileged mode and will typically enter the
kernel, indirected via 0x18 for \arm{irq} and 0x1C for \arm{fiq}.
These modes also have their own private registers: \arm{R13\_irq}
and \arm{R14\_irq} for \arm{irq} mode, and \arm{R8\_fiq} to
\arm{R14\_fiq} for \arm{fiq} mode.  Again, the processor sets up
the private R14 so that the handler can return to the interrupted
code once the interrupt has been handled.

Devices are multiplexed onto these two lines by the IO
controller (IOC in old machines and IOMD in newer machines).
In order to find out which device triggered the interrupt,
it is necessary to read the status registers from the IOC (which
is memory mapped).  The I/O map is illustrated in Figure~\ref{iomap}.

\begin{figure}
\begin{tabular}{|l|l|l|}			\hline
Address & Read		& Write		\\	\hline
0x00	& Control	& Control       \\
0x04	& Keyboard	& Keyboard	\\
0x08	&		&		\\
0x0C	&		&		\\
0x10	& IRQ A status	&		\\
0x14	& IRQ A request	& Clear IRQ	\\
0x18	& IRQ A mask	& IRQ A mask	\\
0x1C	&		&		\\
0x20	& IRQ B status	&		\\
0x24	& IRQ B request	&		\\
0x28	& IRQ B mask	& IRQ B mask	\\
0x2C	&		&		\\
0x30	& FIQ status	&		\\
0x34	& FIQ request	&		\\
0x38	& FIQ mask	& FIQ mask	\\
0x3C	&		&		\\	\hline
\end{tabular}
\caption{Memory map of interrupt sources}\label{iomap}
\end{figure}
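
The following C sketch shows how an \arm{irq} handler might use
the request registers of Figure~\ref{iomap} to find the source of
an interrupt.  It rests on several assumptions which are not
stated above: that each register is 8 bits wide, that a request
register reports only those sources which are both active and
enabled in the corresponding mask, and that the lowest numbered
bit is serviced first.  The base address of the IOC is supplied
by the caller, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of the request registers, taken from the I/O map. */
enum {
    IOC_IRQ_A_REQUEST = 0x14,
    IOC_IRQ_B_REQUEST = 0x24
};

/* Read the register at byte offset `offset`, assuming that the
 * registers are 8 bits wide and spaced one word apart. */
static unsigned ioc_read(const volatile uint32_t *ioc, unsigned offset)
{
    return (unsigned)(ioc[offset / 4] & 0xFFu);
}

/* Return the number of the lowest set bit in `bits`, or -1 if none. */
static int lowest_set_bit(unsigned bits)
{
    for (int bit = 0; bit < 8; bit++) {
        if (bits & (1u << bit))
            return bit;
    }
    return -1;
}

/* Return 0..7 for a pending source in IRQ A, 8..15 for a pending
 * source in IRQ B, or -1 if nothing is requesting an interrupt. */
int ioc_pending_irq(const volatile uint32_t *ioc)
{
    assert(ioc != NULL);

    int bit = lowest_set_bit(ioc_read(ioc, IOC_IRQ_A_REQUEST));
    if (bit >= 0)
        return bit;

    bit = lowest_set_bit(ioc_read(ioc, IOC_IRQ_B_REQUEST));
    return bit >= 0 ? 8 + bit : -1;
}
```

The mapping from bit numbers to devices depends on the machine
and is not shown here.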

\section{Aborts}

There is an abort line entering the ARM processor which can be
pulled high by an external memory manager when the ARM attempts
an illegal access to memory.  The ARM has two abort traps,
depending on what it was attempting to do when it received
the abort.  If it was attempting to fetch data from an illegal
address, it enters \arm{abt} mode and jumps to 0x10.  If it
attempts to execute an instruction which is marked as having been
fetched from an illegal address\footnote{The abort is not taken
immediately, since it should not occur if the aborting
instruction enters the pipeline but is not subsequently executed.}
then it enters \arm{abt} mode and jumps to 0x0C.  In either
case, it preserves the return address in \arm{R14\_abt}.

When the ARM attempts to execute an instruction which it does
not understand, it enters \arm{und} mode, stores the address of
the instruction following the undefined one in \arm{R14\_und} and
jumps to address 0x04.  This is normally used to implement a
software floating point emulator in machines with no floating
point hardware.  Unfortunately, there is no freely available
floating point emulator for the ARM, and this is something that
would need to be implemented.
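
The entry points described so far in this chapter can be collected
into a single table.  The C sketch below records, for each kind of
exception mentioned in the text, the address that the processor
jumps to and the mode that it enters on an ARM6 or later.  Only
the entries discussed above are included, and the type and
function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Privileged modes that the exceptions below can enter. */
enum cpu_mode { MODE_SVC, MODE_IRQ, MODE_FIQ, MODE_ABT, MODE_UND };

/* The kinds of exception described in this chapter. */
enum exception {
    EXC_UNDEFINED,
    EXC_SWI,
    EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT,
    EXC_IRQ,
    EXC_FIQ,
    EXC_COUNT
};

struct vector {
    uint32_t      address;  /* address the processor jumps to */
    enum cpu_mode mode;     /* mode entered (ARM6 and later)  */
};

static const struct vector vectors[EXC_COUNT] = {
    [EXC_UNDEFINED]      = {0x04, MODE_UND},
    [EXC_SWI]            = {0x08, MODE_SVC},
    [EXC_PREFETCH_ABORT] = {0x0C, MODE_ABT},
    [EXC_DATA_ABORT]     = {0x10, MODE_ABT},
    [EXC_IRQ]            = {0x18, MODE_IRQ},
    [EXC_FIQ]            = {0x1C, MODE_FIQ},
};

/* Look up the vector for an exception; NULL if `e` is out of range. */
const struct vector *vector_for(enum exception e)
{
    if ((unsigned)e >= EXC_COUNT)
        return NULL;
    return &vectors[e];
}
```

Each vector is one word long, so all but the last normally hold a
branch to the real handler.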

Versions of the ARM before the ARM6 did not have \arm{abt} or
\arm{und} modes.  In the ARM2 and ARM3, when illegal memory
accesses or undefined instructions occur, the ARM switches
into \arm{svc} mode instead.  This makes it very difficult to
implement a virtual memory system: if the processor is already
in \arm{svc} mode and it accesses memory which is not currently
paged in, then \arm{R14\_svc}, which would normally contain the
return address from the system call, is overwritten with
the address of the aborting instruction.  To get around this, it
is necessary to copy the return address into a different
register before attempting to access any memory, possibly
including the \arm{svc} stack.  Acorn's RISCiX (a derivative of
4.3BSD Unix) works in this manner.  Acorn's RISC OS does not
bother, and simply does not implement virtual memory.  Under
RISC OS, it is also not normally permitted to issue floating
point instructions while in \arm{svc} mode since this will
also overwrite \arm{R14\_svc}.

\section{Memory Management}

ARM processors have been attached to several
different memory management systems.  Acorn originally designed
the MEMC to go with the ARM2, and this was retained for the ARM3.
The ARM6 core is available with an MMU in the ARM610 chip, and
without an MMU in the ARM60 chip.  The MMU in the ARM610, ARM710
and StrongARM can be considered to be roughly equivalent.  The
MEMC chip is primitive by today's standards and it would be
extremely difficult to implement a sophisticated memory management
system with the MEMC.  I will consider only the intersection of
the feature sets of the MMUs contained in the ARM610, ARM710 and
StrongARM since this produces a design which is compatible with
all current production processors.

The MMU contains a Translation Look-aside Buffer (TLB), access
control logic and translation table walking logic.  The MMU
translates virtual addresses generated by the ARM into physical
addresses which are output onto the address lines.  Before the
MMU is activated, it is necessary to prepare a Translation Table,
which consists of 16k of Descriptors.  Descriptors allow for either
single-indirection (Sections) or double-indirection (Pages).
A Section Descriptor contains a pointer to 1MB of memory, whereas
a Page Descriptor contains a pointer to a Page Table, each entry
of which points to 4k of memory.  The advantage
of using Sections is that they are quicker to translate and
only take 1 entry in the processor's TLB for an entire megabyte.

When translating an address, the MMU uses the top 12 bits of
the virtual address to index the Translation Table.  If it finds
a Section Descriptor, it replaces the top 12 bits with the
reference that it finds in the table.  If it finds a Page
Descriptor, it uses the next 8 bits of the virtual address to
index the Page Table that the Page Descriptor points to.  The
Page Table entry contains the top 20 bits of the new physical
address; the bottom 12 bits of the virtual address are used
unchanged.
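
The translation can be sketched in C as follows.  This is a
simplified model rather than the MMU's real data layout: the
hypothetical \texttt{l1\_desc} structure stands in for a 32-bit
Descriptor, access control is ignored, and an entry which is
neither a Section nor a Page is treated as a fault.  With 12 index
bits the Translation Table has 4096 entries, which at 4 bytes each
accounts for the 16k mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a first-level Descriptor. */
enum desc_kind { DESC_FAULT, DESC_SECTION, DESC_PAGE };

struct l1_desc {
    enum desc_kind  kind;
    uint32_t        section_base;  /* top 12 bits used (Sections)  */
    const uint32_t *page_table;    /* 256 entries (Pages)          */
};

/* Translate virtual address `va` using a 4096-entry Translation
 * Table.  Returns 0 and stores the physical address in `*pa` on
 * success, or -1 if the address is not mapped. */
int translate(const struct l1_desc *table, uint32_t va, uint32_t *pa)
{
    assert(table != NULL && pa != NULL);

    /* The top 12 bits index the Translation Table. */
    const struct l1_desc *d = &table[va >> 20];

    switch (d->kind) {
    case DESC_SECTION:
        /* Replace the top 12 bits; keep the 20-bit offset. */
        *pa = (d->section_base & 0xFFF00000u) | (va & 0x000FFFFFu);
        return 0;
    case DESC_PAGE: {
        /* The next 8 bits index the Page Table. */
        uint32_t entry = d->page_table[(va >> 12) & 0xFFu];
        /* The entry supplies the top 20 bits; keep the 12-bit offset. */
        *pa = (entry & 0xFFFFF000u) | (va & 0x00000FFFu);
        return 0;
    }
    default:
        return -1;
    }
}
```

For example, if entry 0x123 of the Translation Table is a Section
with base 0x45600000, then virtual address 0x123ABCDE translates
to physical address 0x456ABCDE.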

