<!-- Forthmacs Formatter generated HTML V.2 output -->
<html>
<head>
<title>ARM Assembler</title>
</head>
<body>
<h1>ARM Assembler</h1>
<hr>
<p>
<p>
<h2>Using the assembler</h2>
<p>
Coding in ARM assembler is very straightforward.  If you have used an ARM 
assembler before, you will already know the instructions.  Otherwise I recommend 
<p>
<em>Acorn Risc Machine Family Data Manual, Prentice Hall, ISBN 0-13-781618-9</em> 
<p>
for further information.  It tells you everything you need to know about the ARM CPU 
and covers the whole instruction set plus lots of hardware details.  In 
fact it was the only source of information I had at hand when starting this 
RISC OS Forthmacs port.  
<p>
This documentation is far from complete, but it covers most aspects.  If you 
are writing code, just have a look at some kernel sources and see how it is done.  
Whenever you are not sure about the generated code, assemble it and disassemble it with 
<p><pre>  code demo ...... c;</pre><p>
<p><pre>  see demo</pre><p>
and the code is right in front of you.  
<p>
There is also a separate chapter, "Assembler Tutorial".  
<p>
Like most Forth assemblers, this assembler is really just a vocabulary which 
contains the words for assembling ARM code.  It is "activated" by adding the 
assembler vocabulary to the search order.  There are also some common ways to 
control assembly, such as <code>code</code>, <code>label</code> and <code>;code</code>, which do more than just put the assembler vocabulary in the 
search order.  Like Forth in general, it uses an <em>operands first, opcode last</em> 
syntax.  
<p>
Let's now look at some kernel source and compare the Forth assembler syntax 
with the original Acorn syntax (as displayed by the 
disassembler utility).  
<p>
<p><pre>
code count      (s adr -- adr1 cnt ) 
        r0      top     mov
        top     r0 byte )+ ldr
        r0      sp      push c;
see count
code count 
 (   a148 )  mov     r0,r10
 (   a14c )  ldrb    r10,[r0],#1
 (   a150 )  str     r0,[r13,#-4]!
 (   a154 )  ldr     pc,[r8],#4
</pre><p>
<p>
<p>
<h2>General syntax</h2>
<p>
All instructions follow the general syntax: 
<p>
<p><pre>
        ARM:        opcode  r-dest r-n operand
        Forth:      r-dest r-n operand modifiers  condition opcode
</pre><p>
<p>
<p>
The brackets and commas in the original assembler source are replaced by spaces, <em>addressing mode indicators</em> 
and macros.  For example, the post-indexed address [r0],#4 (mode ia, increment after) 
is written r0 )+ , indicating a post-increment by 1 or 4 according to 
byte or word access.  
<p>
push is a macro for 
<p><pre>  -( str</pre><p>
<p>
c; at the end assembles a next instruction, written in Acorn syntax as 
<p><pre>  ldr     pc,[r8],#4</pre><p>
and in Forth syntax as 
<p><pre>  pc  ip )+  ldr</pre><p>
and then ends assembly.  
<p>
The operands (registers or numbers) must appear in the correct order, followed 
by the modifiers.  
<p>
<p>
<h2>Conditions</h2>
<p>
Every instruction can be executed conditionally on ARM CPUs.  All condition codes 
are implemented; preferably write them just before the opcode 
itself.  You don't have to write the <strong>al</strong> condition, because it is 
the default.  
<p>
Note: According to the ARM standards, <strong>nv</strong> is not 
implemented and should never be used, because it is reserved for future instruction set 
extensions.  
<p>
Condition codes available: <strong>eq ne cs cc mi pl vs vc hi ls ge lt gt le al</strong> 
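<p>
On the ARM the condition lives in the top four bits (31 to 28) of every instruction word.  The Python sketch below models that field as an illustration; the numeric values are the ARM architecture's, while the helper names are hypothetical and not Forthmacs words.  It also shows that each condition and its opposite differ only in the lowest bit of the field.

```python
# ARM condition field values (bits 31..28 of every instruction word).
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"]


def cond_bits(name: str) -> int:
    """Return the 4-bit condition code for a condition name."""
    return CONDITIONS.index(name)


def with_condition(instruction: int, name: str) -> int:
    """Replace the condition field of a 32-bit instruction word."""
    return (instruction & 0x0FFFFFFF) | (cond_bits(name) << 28)


def invert(name: str) -> str:
    """Opposite condition: eq/ne, cs/cc, ... differ only in bit 28."""
    if name in ("al", "nv"):
        raise ValueError("al has no usable opposite (nv is reserved)")
    return CONDITIONS[cond_bits(name) ^ 1]


# mov r0,r10 assembles as 0xE1A0000A with the default al condition.
print(hex(with_condition(0xE1A0000A, "ne")))   # 0x11a0000a
print(invert("ge"))                            # lt
```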
<p>
<p>
<h2>Shifts</h2>
<p>
There are numerous shift operators available: 
<p>
<strong>asl #asl lsl #lsl lsr #lsr asr #asr ror #ror rrx</strong> 
<p>
A shift operator prefixed with # takes its shift count from a number; without the #, 
the count is taken from a register.  (asl is the same shift as lsl; rrx takes no count.)  
<p>
This assembler is clever enough to work out rotated immediates by itself, so you 
don't have to worry about lines like 
<p><pre>  top th f0 #  td 24 #lsl mov</pre><p>
just write 
<p><pre>  top th f0000000 # mov</pre><p>
instead.  
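<p>
The rule behind this is that an ARM data-processing immediate is an 8-bit constant rotated right by an even number of bits (twice a 4-bit field).  A minimal Python sketch of the search an assembler can perform follows; the function names are hypothetical, and the search order (smallest rotation first) is an assumption, so the encoding it finds may differ from the one Forthmacs emits while describing the same value.

```python
def rotate_right(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value: int):
    """Find (imm8, rot) with value == rotate_right(imm8, 2 * rot), or None."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Rotating right by 32 - 2*rot undoes a right rotation by 2*rot.
        imm8 = rotate_right(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return imm8, rot
    return None


print(encode_immediate(0xF0000000))   # (15, 2): &F rotated right by 4
print(encode_immediate(0x12345678))   # None: needs more than one instruction
```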
<p>
<p>
<h2>Register usage</h2>
<p>
Registers <strong>r0 - r6</strong> are available for use within code 
definitions.  Don't try to use them for permanent storage, because they are used 
by many code words with no attempt to preserve the previous contents.  
<p>
<p><pre>  r9      user area pointer       up
  r10     top-of-stack register   top
  r11     return stack pointer    rp
  r12     instruction pointer     ip
  r13     stack pointer           sp
  r14     link register           lk
  r15     pc + status + flags     pc</pre><p>
Note: In future CPU versions the internal structure of the <code><A href="_smal_BI#50">pc</A></code> 
register will be different, so it is better to think of <code><A href="_smal_BI#50">pc</A></code> 
and the status register as two separate registers.  The hardware-error handlers and the <code><A href="_smal_AT#43">.registers</A></code> 
command already know about this.  
<p>
<p>
<h2>Structured programming</h2>
<p>
Instead of labels, this assembler supports structured programming with the 
usual Forth-like control structures.  The structures do not have to fit on one line, 
and they may be nested to any level.  The range of the branches assembled by 
these structures is not restricted.  
<p>
Implemented structures are: 
<p><pre>  set the flags                   \ produce the condition
  condition if ...                \ if the condition is met, do this
            else ...              \ otherwise do this
            then

  begin ...
        set the flags             \ produce the condition
  condition     while ...         \ do this while the condition is met
        ( you may set the flags )
  ( condition ) repeat            \ repeat is normally unconditional,
                                  \ but you may also test for another
                                  \ condition

  begin ...
        set the flags             \ produce the condition
  condition until                 \ leave the loop when condition is met

  begin ... again                 \ endless loop, left by other means</pre><p>
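<p>
How such structures can work without labels is sketched below in Python.  This is a hypothetical model, not Forthmacs source: it assumes that <code>if</code> assembles a branch on the inverse condition with a placeholder offset and remembers its address, and that <code>then</code> patches that branch once the target is known.  The branch encoding (a signed 24-bit word offset counted from pc, which reads as the branch address plus 8) is the ARM one; the class and method names are made up.

```python
# Hypothetical model of resolving "if ... then" and "begin ... again".
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"]


class Assembler:
    def __init__(self, origin: int = 0x8000):
        self.origin = origin
        self.words = []      # assembled 32-bit instruction words
        self.control = []    # (address, condition) of pending branches

    def here(self) -> int:
        return self.origin + 4 * len(self.words)

    def emit(self, word: int) -> None:
        self.words.append(word & 0xFFFFFFFF)

    @staticmethod
    def branch(cond: int, source: int, target: int) -> int:
        # B instruction: condition, 101, L=0, signed 24-bit word offset
        # measured from pc, which reads as source + 8.
        offset = ((target - (source + 8)) >> 2) & 0x00FFFFFF
        return (cond << 28) | (0b101 << 25) | offset

    def if_(self, cond_name: str) -> None:
        inverse = CONDITIONS.index(cond_name) ^ 1   # skip body if NOT met
        self.control.append((self.here(), inverse))
        self.emit(0)                                # placeholder word

    def then(self) -> None:
        source, cond = self.control.pop()
        index = (source - self.origin) // 4
        self.words[index] = self.branch(cond, source, self.here())

    def begin(self) -> None:
        self.control.append((self.here(), None))

    def again(self) -> None:
        target, _ = self.control.pop()
        self.emit(self.branch(14, self.here(), target))   # al: always


asm = Assembler()
asm.if_("eq")           # eq if
asm.emit(0xE1A0000A)    #    mov r0,r10
asm.then()              # then
print(hex(asm.words[0]))    # 0x1a000000 (bne to the instruction after then)

loop = Assembler()
loop.begin()
loop.again()
print(hex(loop.words[0]))   # 0xeafffffe (branch to itself)
```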
<p>
<p>
<h2>Porting</h2>
<p>
The ARM assembler can also be used by other Forth systems.  All hardware-specific 
parts are written portably and can easily be changed in case of problems.  
So a 68k Forthmacs can metacompile ARM code with this assembler without any 
change.  In fact, the very first metacompilation of this RISC OS Forthmacs took 
place on an Atari ST with 1 MB RAM and a 720 KB disk.  
<p>
<p>
<h2>Byte-sex</h2>
<p>
This assembler can produce code in either byte order (byte sex), which allows portable 
assembler code for all ARM CPUs.  The words <strong>little-endian</strong> and <strong>big-endian</strong> 
switch between the two.  
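<p>
The switch changes only the order in which the bytes of each 32-bit word are stored; the instruction encoding itself stays the same.  A Python sketch using the standard <code>struct</code> module, applied to the next instruction from the count example (ldr pc,[r8],#4, which encodes as &amp;E498F004); the helper name <code>emit</code> is hypothetical.

```python
import struct


def emit(word: int, big_endian: bool) -> bytes:
    """Serialise one 32-bit instruction word in the chosen byte order."""
    return struct.pack(">I" if big_endian else "<I", word & 0xFFFFFFFF)


NEXT = 0xE498F004   # ldr pc,[r8],#4
print(emit(NEXT, big_endian=False).hex())   # 04f098e4
print(emit(NEXT, big_endian=True).hex())    # e498f004
```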
<p>
<p>
<h2>ARM2/3/6</h2>
<p>
The assembler takes care of some CPU-dependent restrictions: <strong>arm2</strong> 
disallows the more advanced instructions (such as <strong>swp</strong>), and <strong>arm3</strong> allows them.  
<p>
<p>
<h2>Forth Virtual Machine Considerations</h2>
<p>
The Forth parameter stack is implemented with r13, but the name <code><A href="_smal_BJ#51">sp</A></code> 
should be used instead of r13 in case the virtual machine implementation 
changes.  
<p>
The return stack is implemented with r11, and the name <code><A href="_smal_BD#1B">rp</A></code> 
should be used to refer to it.  
<p>
The base address of the user area (the user pointer) is r9, but it should be 
referred to as <code><A href="_smal_BD#31B">up</A></code>.  The User variable at offset 
124 (for instance) may be accessed with the 
<p><pre>  up td 124 d)</pre><p>
addressing mode.  There is a macro <code><A href="_smal_BF#1D">'user</A></code> 
which will assemble this addressing mode for you.  
<p>
The interpreter pointer <code><A href="_smal_BC#1A">ip</A></code> is r12.  The 
interpreter is post-incrementing, so while a code definition is being executed, <code><A href="_smal_BC#1A">ip</A></code> 
points to the token after the one being executed.  A "token" is the number that 
is compiled into the dictionary for each Forth word in a definition.  For 
RISC OS Forthmacs, a token is a 32-bit absolute address.  
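<p>
The effect of the post-incrementing next (ldr pc,[ip],#4) can be modelled in a few lines of Python.  This is a toy sketch: memory is a dictionary, the addresses and primitive names are made up, and real code words run ARM code instead of being looked up by name.  It shows that each word starts running with ip already pointing at the following token.

```python
# Toy model of the post-incrementing address interpreter ("next").
# Memory is a dict of 32-bit words and each token is an absolute address.
# Addresses and primitive names are made up for illustration.

def run(memory: dict, ip: int, primitives: dict, max_steps: int = 100) -> list:
    """Follow tokens from ip; return (token, ip after fetch) for each step."""
    trace = []
    for _ in range(max_steps):
        token = memory[ip]          # ldr pc,[ip],#4: fetch the token...
        ip += 4                     # ...and post-increment ip
        trace.append((token, ip))   # ip already points at the next token
        if primitives[token] == "exit":
            break
    return trace


memory = {0x1000: 0xA000, 0x1004: 0xA100, 0x1008: 0xA200}
primitives = {0xA000: "dup", 0xA100: "count", 0xA200: "exit"}
for token, ip in run(memory, 0x1000, primitives):
    print(primitives[token], "runs with ip =", hex(ip))
```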
<p>
<p>
<h2>Assembler Glossary </h2>
<p>

<hr><h3><A name="17">pc</A> ( -- n )</h3>
<br>
portable name for the <code><A href="_smal_BI#50">pc</A></code> register 

<hr><h3><A name="18">sp</A> ( -- n )</h3>
<br>
portable name for the stack pointer 

<hr><h3><A name="19">up</A> ( -- n )</h3>
<br>
portable name for the user pointer 

<hr><h3><A name="1A">ip</A> ( -- n )</h3>
<br>
portable name for the instruction pointer 

<hr><h3><A name="1B">rp</A> ( -- n )</h3>
<br>
portable name for the return stack pointer 

<hr><h3><A name="1C">top</A> ( -- n )</h3>
<br>
portable name for the top of stack register 

<hr><h3><A name="1D">'user</A> ( --  )</h3> <kbd>name</kbd> 
<br>
Executed in the form: 
<p><pre>       top   'user &lt;name&gt;   ldr</pre><p>
&lt;name&gt; is the name of a User variable.  Assembles the appropriate 
addressing mode for accessing that User variable.  
<p>
In RISC OS Forthmacs, the addressing mode for User variables is 
<p><pre>               up #n d)</pre><p>
where #n is the offset of that variable within the User area.  

<hr><h3><A name="1E">;code</A> ( --  )</h3>
 Extra: C,I
<br>
<h5>C: ( --  )</h5><br>
<br>
Used in the form: 
<p><pre>               : &lt;name&gt;  ... create ... ;code ... c; (or end-code)</pre><p>
Stops compilation, terminates the defining word &lt;name&gt;, executes <code><A href="_smal_BT#14B">arm-assembler</A></code>, 
and does <code><A href="_smal_AS#1C2">do-entercode</A></code>.  
<p>
When &lt;name&gt; is later executed in the form: 
<p><pre>               &lt;name&gt; &lt;new-name&gt;</pre><p>
to define the word &lt;new-name&gt;, the later execution of &lt;new-name&gt; 
will cause the machine code sequence following the <code><A href="_smal_BE#10C">;code</A></code> 
to be executed.  
This is analogous to <code><A href="_smal_BC#1CA">does&gt;</A></code>, except 
that the behavior of the defined words &lt;new-name&gt; is specified in 
assembly language instead of high-level Forth.  
<p>
<code><A href="_smal_BE#10C">;code</A></code> calls <code><A href="_smal_AS#1C2">do-entercode</A></code>, 
which is implementation specific.  In earlier versions it assembled the following 
instructions, which push the old top of stack and load the parameter field address 
(the body of the defined word) from the link register into <code><A href="_smal_BE#1C">top</A></code>, so that the assembler code started with the body on top of the stack: 
<p><pre>       top     sp      push</pre><p>
<p><pre>       top     lk      th fc000003 # bic</pre><p>
From version 3.1/2.62 on this is no longer done; you have to write those 
instructions yourself.  

See: <code><A href="_smal_AC#182">code</A></code> <code><A href="_smal_BC#1CA">does&gt;</A></code> 
<p>

<hr><h3><A name="1F">adr</A> ( rx addr --  )</h3>
<br>
Assembler macro with the following effect: 
<p>
addr is moved to register rx.  Within short distances this is done with a single 
pc-relative instruction (see <code><A href="_smal_AI#38">pcr</A></code>); otherwise more instructions are needed.  
<p>
Note: The address will be relocated correctly! 

<hr><h3><A name="20">aligning?</A> ( -- addr  )</h3>
<br>
A variable holding a flag; true means the assembler does the aligning on its own.  
Implemented for CPU-independent metacompiling.  

<hr><h3><A name="21">alu-instructions</A> ( r-dest r-op1 op2{r-op2|imm} --  )</h3>
<br>
Available instructions with this syntax: 
<p>
<strong>and eor sub rsb add adc sbc rsc tst teq cmp cmn orr bic </strong> 
<p>
These instructions all have two data inputs to the ALU: the register r-op1 and 
the operand op2, which can be another register or an 8-bit immediate.  
<p>
The register r-op2 can be shifted in any of the ways listed under Shifts, by 
either a 5-bit integer or another register.  An immediate operand can be 
rotated right by 2*(4-bit integer).  
<p>
If you give "large" literals as arguments, the assembler will generate the 
correct rotation itself.  
<p>
The <strong>#</strong> modifier declares an immediate operand, as in <code>top r0 3 # add</code>.  
<p>
The <strong>s</strong> modifier makes the instruction set the flags according to the result; the 
instruction will be <strong>adds</strong> instead of <strong>add</strong>.  
<p>
<strong>mov</strong> and <strong>mvn</strong> are somewhat different: the 
operand r-op1 isn't needed.  Both can also handle "big" immediates themselves, so 
<p><pre>       top th 12345678 # mov</pre><p>
is not a problem; <strong>mov</strong> assembles all the instructions needed.  
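<p>
How an assembler can expand such a mov is sketched below in Python.  It uses one simple strategy, assumed here for illustration and not necessarily what Forthmacs emits: cut the constant into 8-bit fields starting at even bit positions (each is then a valid rotated immediate), load the first with mov and add the others with orr.  Real assemblers may find shorter sequences, for example with mvn.  The helper names are hypothetical, and the output uses Acorn syntax with &amp; for hexadecimal.

```python
def split_constant(value: int) -> list:
    """Split a 32-bit constant into chunks that are each a valid ARM immediate.

    Each chunk is an 8-bit field starting at an even bit position, so it can
    be encoded as an 8-bit value rotated right by an even amount.
    """
    value &= 0xFFFFFFFF
    chunks = []
    position = 0
    while position < 32:
        if (value >> position) & 0x3:          # a set bit in this 2-bit step
            chunk = value & (0xFF << position) & 0xFFFFFFFF
            chunks.append(chunk)
            value &= ~chunk
            position += 8
        else:
            position += 2
    return chunks or [0]


def mov_sequence(register: str, value: int) -> list:
    """Hypothetical expansion: one mov followed by orr for each extra chunk."""
    first, *rest = split_constant(value)
    lines = [f"mov {register},#&{first:X}"]
    lines += [f"orr {register},{register},#&{chunk:X}" for chunk in rest]
    return lines


for line in mov_sequence("r10", 0x12345678):
    print(line)
```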
<p>
<strong>cmp</strong> and <strong>cmn</strong> can both handle negative immediate 
operands; they work out which form of the operand can be encoded.  
<p>

<hr><h3><A name="22">asm-allot</A> ( n --  )</h3>
 Extra: deferred
<br>
Allocates n bytes in the dictionary.  The address of the next available 
dictionary location is adjusted accordingly.  
<p>
The default is <code><A href="_smal_BM#144">allot</A></code>; implemented for (CPU-independent) 
metacompiling.  

<hr><h3><A name="23">arm-assembler</A> ( --  )</h3>
<br>
Execution replaces the first vocabulary in the search order with the <code><A href="_smal_BT#14B">arm-assembler</A></code> 
vocabulary, making all the assembler words accessible.  

<hr><h3><A name="24">big-endian</A> ( -- )</h3>
<br>
Switches the assembler to big-endian target code.  

<hr><h3><A name="25">branch</A> ( addr --  )</h3>
<br>
Assembles, at the current location, a branch instruction to addr.  Can be modified by <code><A href="_smal_BT#2B">dolink</A></code> 
and all condition codes.  

<hr><h3><A name="26">byte</A> ( -- )</h3>
<br>
Modifier for the assembler: memory accesses are byte-wide instead of word-wide.  

<hr><h3><A name="27">code</A> ( --  )</h3> <kbd>name</kbd> 
 Extra: M
<br>
A defining word executed in the form: 
<p><pre>               code &lt;name&gt; ... end-code or c;</pre><p>
Creates a dictionary entry for &lt;name&gt; to be defined by a following 
sequence of assembly language words.  Words thus defined are called code 
definitions or primitives.  Executes <code><A href="_smal_BT#14B">arm-assembler</A></code> 
and sets the opcode defaults.  
<p>
This is the most common way to begin assembly.  
<p>

See: <code><A href="_smal_BT#1DB">end-code</A></code> <code><A href="_smal_BD#16B">c;</A></code> 

<hr><h3><A name="28">c;</A> ( --  )</h3>
<br>
Terminates the current code definition and allows its name to be found in the 
dictionary.  
<p>
Sets the <code><A href="_smal_AQ#190">context</A></code> vocabulary to be same 
as the <code><A href="_smal_BG#19E">current</A></code> vocabulary (which removes 
the <code><A href="_smal_BT#14B">arm-assembler</A></code> vocabulary from the 
search order, unless you have explicitly done something funny to the search 
order while assembling the code).  
<p>
Executes <code><A href="_smal_AG#36">next</A></code> to assemble the "next" 
routine at the end of the code word being defined.  The "next" routine 
causes the Forth interpreter to continue execution with the next word.  
<p>
<p>
This is the most common way to end assembly; it calls <code><A href="_smal_BT#1DB">end-code</A></code>.  

<hr><h3><A name="29">conditions</A> ( --  )</h3>
<br>
Every instruction is executed only if its condition is met.  The 
assembler's default is <strong>al</strong> (always), but these are also 
available: 
<p>
<strong>eq ne cs cc mi pl vs vc hi ls ge lt gt le al</strong> 

<hr><h3><A name="2A">decr</A> ( reg n# --  )</h3>
<br>
Macro, n# will be subtracted from reg.  

<hr><h3><A name="2B">dolink</A> ( --  )</h3>
<br>
Modifier for the <code><A href="_smal_AT#163">branch</A></code> instruction: the 
return address is saved in the link register (on 26-bit CPUs together with the status bits; see get-link).  

<hr><h3><A name="2C">end-code</A> ( --  )</h3>
<br>
Terminates a code definition and allows the &lt;name&gt; of the corresponding 
code definition to be found in the dictionary.  
<p>
The <code><A href="_smal_AQ#190">context</A></code> vocabulary is set to the 
same as the <code><A href="_smal_BG#19E">current</A></code> vocabulary (which 
removes the <code><A href="_smal_BT#14B">arm-assembler</A></code> vocabulary 
from the search order, unless you have explicitly done something funny to the 
search order while assembling the code).  
<p>
The <code><A href="_smal_AG#36">next</A></code> routine is not automatically 
added to the end of the code definition.  Usually you want <code><A href="_smal_AG#36">next</A></code> 
to be at the end of the definition, but sometimes the last thing in the 
definition is a branch to somewhere else, so the <code><A href="_smal_AG#36">next</A></code> 
at the end is not needed.  
<p>

See: <code><A href="_smal_BD#16B">c;</A></code> 

<hr><h3><A name="2D">entercode</A> ( --  )</h3>
<br>
Starts assembling after stack checking, setting the assembler defaults and 
switching to <code><A href="_smal_BT#14B">arm-assembler</A></code>.  

See: <code><A href="_smal_AS#1C2">do-entercode</A></code> <code><A href="_smal_BE#10C">;code</A></code> 

<hr><h3><A name="2E">get-link</A> ( reg --  )</h3>
<br>
Assembler macro, equivalent to: 
<p><pre>  reg lk th fc000003 # bic</pre><p>
This is useful to get the address following a branch instruction.  
<p><pre>  xxxxx dolink branch  ---+
    A) data ...           |
                          |
                          |
    B) top get-link  &lt;----+</pre><p>
So after branching to B), <code><A href="_smal_BE#1C">top</A></code> will be set 
to A).  
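<p>
The mask fc000003 strips the status bits that 26-bit ARM CPUs (such as the ARM2 and ARM3) keep in r15 and therefore also in the value saved to the link register.  The Python sketch below shows that bit layout; the helper names are hypothetical and only model the effect of the bic.

```python
# On 26-bit ARMs (ARM2/ARM3), r15 and the saved link value combine the
# program counter with the PSR bits:
#   bits 31..28  N Z C V flags
#   bits 27..26  I F interrupt disable bits
#   bits 25..2   word-aligned program counter
#   bits  1..0   processor mode (0 USR, 1 FIQ, 2 IRQ, 3 SVC)
PSR_MASK = 0xFC000003


def get_link(lk: int) -> int:
    """Model of 'reg lk th fc000003 # bic': keep only the address bits."""
    return lk & ~PSR_MASK & 0xFFFFFFFF


def flags(lk: int) -> dict:
    """Decode the status bits that the bic throws away."""
    return {
        "N": bool(lk & (1 << 31)), "Z": bool(lk & (1 << 30)),
        "C": bool(lk & (1 << 29)), "V": bool(lk & (1 << 28)),
        "mode": lk & 0x3,
    }


lk = 0x60008004 | 0x3     # Z and C set, SVC mode, return address 0x8004
print(hex(get_link(lk)))  # 0x8004
print(flags(lk))
```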

<hr><h3><A name="2F">incr</A> ( reg n# --  )</h3>
<br>
Macro, n# will be added to reg.  

<hr><h3><A name="30">label</A> ( --  )</h3> <kbd>name</kbd> 
 Extra: F83
<br>
A defining word used in the form: 
<p><pre>  label &lt;name&gt; ... end-code</pre><p>
<p><pre>  label &lt;name&gt; ... c;</pre><p>
Creates a dictionary entry for &lt;name&gt; consisting of a following sequence 
of assembly language words.  When &lt;name&gt; is later executed, the address of 
the first word of the assembly language sequence is left on the stack.  
<p>

See: <code><A href="_smal_BT#1DB">end-code</A></code> 

<hr><h3><A name="31">ldm</A> ( rx1 rx2 .. rxn  n#  r-adr --  )</h3>
<br>
Load multiple registers from the address pointed to by r-adr; an addressing 
mode must be specified.  
<p>
The register list is given as the register names (don't name a register twice) 
followed by the number of registers.  
<p><pre>   r0 r1 r2 r3 4   sp ia! ldm</pre><p>
This loads registers r0-r3 from the stack and sets the stack pointer to the next 
stack entry.  
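<p>
In the instruction word the register list is a 16-bit mask, one bit per register, so naming a register twice cannot describe anything extra.  The Python sketch below builds that mask and the resulting ldm word for the example; the function names are hypothetical, rejecting duplicates is this sketch's own choice, and the bit positions are those of the ARM block data transfer format.

```python
def register_list(*registers: int) -> int:
    """Build the 16-bit register-list field of an ldm/stm instruction.

    Bit n is set when rn takes part.  The CPU transfers the registers in
    ascending order, lowest register at the lowest address, whatever order
    they are named in.  This sketch rejects a register named twice.
    """
    mask = 0
    for reg in registers:
        if not 0 <= reg <= 15:
            raise ValueError(f"no such register: r{reg}")
        if mask & (1 << reg):
            raise ValueError(f"r{reg} named twice")
        mask |= 1 << reg
    return mask


def ldm(rn: int, mask: int, increment: bool = True, before: bool = False,
        writeback: bool = False, cond: int = 0xE) -> int:
    """Encode LDM: cond, 100, P (before), U (increment), S=0, W, L=1, Rn, list."""
    return ((cond << 28) | (0b100 << 25) | (int(before) << 24)
            | (int(increment) << 23) | (int(writeback) << 21) | (1 << 20)
            | (rn << 16) | mask)


# r0 r1 r2 r3 4  sp ia! ldm  with sp = r13
print(hex(ldm(13, register_list(0, 1, 2, 3), writeback=True)))  # 0xe8bd000f
```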
<p>

See: <code><A href="_smal_AC#32">ldr</A></code> <code><A href="_smal_AL#3B">stm</A></code> 

<hr><h3><A name="32">ldr</A> ( r-data r-adr operand2 --  )</h3>
<br>
r-data is loaded from memory.  The access is word-wide (32 bits) by default, but the modifier <code><A href="_smal_BO#26">byte</A></code> 
selects a byte-wide access.  
<p>
The address is calculated using r-adr and the operand2.  It can be another 
register (the shift specified as usual by a 5-bit literal and a shift type) or a 
12-bit immediate offset.  
<p>
operand2 can be added to or subtracted from r-adr according to the addressing 
mode, which is given by two letters.  The first tells whether (i)ncreasing or 
(d)ecreasing should be used, the second whether the increase or decrease takes place 
(b)efore or (a)fter the memory access.  A "!" at the end means that write-back 
takes place.  So these modes are possible: 
<p><pre>       da  ia  db  ib    \ decrease/increase after/before</pre><p>
<p><pre>       da! ia! db! ib!   \ as above plus write-back</pre><p>
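<p>
The address arithmetic of these modes for a single-register ldr or str can be modelled as below.  This Python sketch uses a hypothetical helper name and follows the ARM rule that a post-indexed (after) transfer always writes the updated address back to the base register, which is consistent with the )+ macro being defined with plain ia.  What this assembler does with "!" on an after mode for ldr or str is not modelled.

```python
def transfer_address(base: int, offset: int, mode: str):
    """Return (address accessed, new base) for a single ldr/str.

    mode is one of da ia db ib, optionally followed by '!'.
    'i'/'d' add or subtract the offset; 'b'/'a' apply it before or after
    the access.  Post-indexed (after) transfers always update the base.
    """
    writeback = mode.endswith("!")
    direction, timing = mode[0], mode[1]
    moved = base + offset if direction == "i" else base - offset
    moved &= 0xFFFFFFFF
    if timing == "b":
        return moved, (moved if writeback else base)
    return base, moved


print([hex(a) for a in transfer_address(0x2000, 4, "db!")])  # push: ['0x1ffc', '0x1ffc']
print([hex(a) for a in transfer_address(0x2000, 4, "ia")])   # pop:  ['0x2000', '0x2004']
```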
<p>
Some macros make life a bit easier.  They are somewhat 68k-like, and any 
<code><A href="_smal_BO#26">byte</A></code> modifier must come before them, because some 
of them calculate the increment (1 or 4) themselves.  
<p>
<p><pre>  : )      0 #   ib ;
  : )+     @increment  ia ;
  : )-     @increment  da ;
  : -(     @increment  db! ;
  : +(     @increment  ib! ;

  : d)     dup abs # offset?  swap 0&lt;  if db  else ib  then ;
  : d)!    dup abs # offset?  swap 0&lt;  if db! else ib! then ;
  : push   -( str ;
  : pop    )+ ldr ;</pre><p>
Examples: 
<p><pre>    top  r6 byte )+ ldr</pre><p>
<p><pre>    top  up 8 d)    ldr</pre><p>
<p>

See: <code><A href="_smal_AM#3C">str</A></code> 
<p>

<hr><h3><A name="33">little-endian</A> ( -- )</h3>
<br>
Switches the assembler to little-endian target code.  

<hr><h3><A name="34">mla</A> ( r-dest r-op1 r-op2 r-add -- )</h3>
<br>
Assembles a multiply-and-accumulate instruction: r-dest becomes r-add + (r-op1*r-op2).  

<hr><h3><A name="35">mul</A> ( r-dest r-op1 r-op2 -- )</h3>
<br>
Assembles a multiply instruction: r-dest becomes r-op1*r-op2.  
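<p>
Both results are kept to 32 bits, so larger products wrap around; those low 32 bits are the same whether the operands are regarded as signed or unsigned.  A small Python model of the arithmetic only (the function names are hypothetical and say nothing about register restrictions or encoding):

```python
MASK32 = 0xFFFFFFFF


def mul(r_op1: int, r_op2: int) -> int:
    """Result of mul: the low 32 bits of r-op1 * r-op2."""
    return (r_op1 * r_op2) & MASK32


def mla(r_op1: int, r_op2: int, r_add: int) -> int:
    """Result of mla: r-add + r-op1 * r-op2, again truncated to 32 bits."""
    return (r_add + r_op1 * r_op2) & MASK32


print(hex(mul(0x10000, 0x10000)))   # 0x0: the product overflows 32 bits
print(mla(6, 7, 100))               # 142
```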

<hr><h3><A name="36">next</A> ( --  )</h3>
<br>
Assembler macro which assembles the <code><A href="_smal_AG#36">next</A></code> 
routine, which is the Forth address interpreter.  
<p>
In RISC OS Forthmacs this is a single instruction: 
<p><pre>       pc  ip  )+  ldr</pre><p>

<hr><h3><A name="37">nop</A> ( --  )</h3>
<br>
Assembler macro, equivalent to 
<p><pre>  r0 r0 mov</pre><p>

<hr><h3><A name="38">pcr</A> ( addr -- pc offset  )</h3>
<br>
Assembler macro; expects an address on the stack and calculates its 
offset from <code><A href="_smal_BI#50">pc</A></code>.  The addressing mode is 
also set.  
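<p>
The offset has to account for the ARM pipeline: an instruction that reads pc sees its own address plus 8.  The Python sketch below, with a hypothetical helper name, computes that offset and checks whether a single add or sub can reach the target, which is the "short distance" case described under adr.  It is an illustration of the arithmetic, not the pcr implementation.

```python
def pc_relative(here: int, target: int):
    """Return ("add" or "sub", magnitude) for rx = pc +/- magnitude, or None.

    An instruction at address 'here' reads pc as here + 8, so the offset
    is target - (here + 8).  One add or sub is enough only if the offset's
    magnitude is a valid ARM immediate (8 bits rotated right by an even
    amount); otherwise more instructions are needed.
    """
    offset = target - (here + 8)
    opcode = "add" if offset >= 0 else "sub"
    magnitude = abs(offset)
    if magnitude > 0xFFFFFFFF:
        return None
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a right rotation by rot.
        undone = ((magnitude << rot) | (magnitude >> (32 - rot))) & 0xFFFFFFFF
        if undone <= 0xFF:
            return opcode, magnitude
    return None


print(pc_relative(0x8000, 0x8100))   # ('add', 248)
print(pc_relative(0x8100, 0x8000))   # ('sub', 264)
```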

<hr><h3><A name="39">return</A> ( --  )</h3>
<br>
macro for 
<p><pre>  pc  lk  mov</pre><p>

<hr><h3><A name="3A">s</A> ( --  )</h3>
<br>
Modifier: the instruction will set the flags according to the result.  This is the default 
for tst, teq, tstp, teqp, cmp, cmn, cmpp and cmnp.  

<hr><h3><A name="3B">stm</A> ( rx1 rx2 .. rxn  n#  r-adr --  )</h3>
<br>
Store multiple registers to the address pointed to by r-adr; an addressing mode 
must be specified.  
<p>

See: <code><A href="_smal_AB#31">ldm</A></code> for more details.  

<hr><h3><A name="3C">str</A> ( r-data r-adr operand2 --  )</h3>
<br>
r-data is stored to memory.  The access is word-wide (32 bits) by default, but the modifier <code><A href="_smal_BO#26">byte</A></code> 
selects a byte-wide access.  
<p>

See: <code><A href="_smal_AC#32">ldr</A></code> 

<hr><h3><A name="3D">swi</A> ( swi# --  )</h3>
<br>
Assembles a swi instruction with the number swi#.  

<hr><h3><A name="3E">swix</A> ( swi# --  )</h3>
<br>
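Assembles the X form of SWI swi#, that is, the SWI number with bit 17 (&amp;20000) set, so that RISC OS returns errors with the V flag set instead of raising them.  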
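<p>
A SWI instruction is just the condition, the bits 1111 and a 24-bit SWI number; the X form differs only in bit 17 of that number.  The Python sketch below shows the resulting words for OS_WriteC (SWI &amp;00).  The helper names are hypothetical; the encoding is that of the ARM and RISC OS, not taken from the Forthmacs source.

```python
X_BIT = 0x20000   # RISC OS: bit 17 of the SWI number selects the "X" form


def swi(number: int, cond: int = 0xE) -> int:
    """ARM SWI instruction: condition, 1111, then a 24-bit SWI number."""
    return (cond << 28) | (0xF << 24) | (number & 0x00FFFFFF)


def swix(number: int, cond: int = 0xE) -> int:
    """The same SWI with the X bit set, so errors come back with V set."""
    return swi(number | X_BIT, cond)


OS_WRITEC = 0x00
print(hex(swi(OS_WRITEC)))    # 0xef000000
print(hex(swix(OS_WRITEC)))   # 0xef020000
```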

<hr><h3><A name="3F">swp</A> ( r-dest r-base r-source --  )</h3>
<br>
Assembles a swp instruction; this is only allowed after <strong>arm3</strong> has enabled ARM3 code.  

<hr><h3><A name="40">t</A> ( --  )</h3>
<br>
Modifier: forces the -T (translate) pin during the memory access.  

<hr><h3><A name="41">^</A> ( --  )</h3>
<br>
Modifier: forces access to the user-mode registers.  
</body>
</html>
