
ARM Assembler
*************


Using the assembler
===================

Coding in ARM assembler is very straightforward.  If you have used an ARM 
Assembler before, you will already know the instructions.  Otherwise I 
recommend 

'Acorn Risc Machine Family Data Manual, Prentice Hall, ISBN 0-13-781618-9' 

for further information.  It tells you everything you should know about the 
ARM CPU and covers the whole instruction set plus lots of hardware details.  
In fact, it was my only source of information when I started this 
RISC OS Forthmacs port.  

This documentation is far from complete, but it covers most aspects.  If you 
are writing code, have a look at some kernel sources to see how it is done.  
Whenever you are not sure about the code produced, have a look at it with 
    code demo ...... c;
    see demo
and you have the code right in front of you.  

See also the chapter "Assembler Tutorial".  

Like most Forth assemblers, this assembler is really just a vocabulary which 
contains the words for assembling ARM code.  It is "activated" by adding the 
assembler vocabulary to the search order.  There are also some common ways to 
control assembly which do more than just put the assembler vocabulary in the 
search order.  It also uses an 'operands first - opcode last' syntax, as Forth 
generally does.  

Let's now have a look at some kernel source and see what the code looks like 
in the Forth assembler syntax and in the original Acorn syntax ( displayed by 
the disassembler utility ).  

    code count      (s adr -- adr1 cnt ) 
            r0      top     mov
            top     r0 byte )+ ldr
            r0      sp      push c;
    see count
    code count 
     (   a148 )  mov     r0,r10
     (   a14c )  ldrb    r10,[r0],#1
     (   a150 )  str     r0,[r13,#-4]!
     (   a154 )  ldr     pc,[r8],#4


General syntax
==============

All instructions follow the general syntax: 

        ARM:        opcode  r-dest r-n operand
        Forth:      r-dest r-nsrc operand modifiers  condition op-code


The brackets and commas in the original assembler source are replaced by 
spaces, 'addressing mode indicators' and macros.  

The post-indexed form [r0],#4 ( addressing mode ia ) becomes )+ , indicating a 
postincrement by 1 or 4 according to byte/word access.  

push is a macro meaning 
    -( str.

c; at the end assembles a next instruction ( Acorn syntax, then Forth syntax ) 
    ldr     pc,[r8],#4
    pc  ip )+  ldr
and quits assembling.  

The operands ( registers or numbers ) must appear in the correct order 
followed by modifiers.  


Conditions
==========

All instructions can be conditionally executed on ARM CPUs.  All condition 
codes are implemented; they should preferably be written just before the 
opcode itself.  You don't have to write the AL condition, as it is the 
default.  

Note: Following ARM recommendations, NV is not implemented.  It should never 
be used, because it is reserved for future instruction set extensions.  

Condition codes available : EQ NE CS CC MI PL VS VC HI LS GE LT GT LE AL 
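
To show what a condition word changes in the assembled code, here is a small Python sketch that patches the condition field (bits 28 to 31) of an instruction word.  The field values are the standard ARM encodings; the helper name with_condition is made up for this illustration and is not a Forthmacs word.  

```python
# Standard ARM condition field values (bits 28..31 of every instruction).
CONDITIONS = {
    "EQ": 0x0, "NE": 0x1, "CS": 0x2, "CC": 0x3,
    "MI": 0x4, "PL": 0x5, "VS": 0x6, "VC": 0x7,
    "HI": 0x8, "LS": 0x9, "GE": 0xA, "LT": 0xB,
    "GT": 0xC, "LE": 0xD, "AL": 0xE,
}


def with_condition(word, cond="AL"):
    """Return the 32-bit instruction word with its condition field replaced."""
    return (word & 0x0FFFFFFF) | (CONDITIONS[cond.upper()] << 28)


MOV_R0_R10 = 0xE1A0000A  # mov r0,r10 with the default AL condition
print(hex(with_condition(MOV_R0_R10, "EQ")))  # the same mov, executed on EQ only
```

Only the top four bits differ between the conditional and the unconditional form, which is why the condition can be given as a separate word in front of any opcode.  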


Shifts
======

There are numerous shift operators available: 

ASL #ASL LSL #LSL LSR #LSR ASR #ASR ROR #ROR RRX 

Shift operators with a leading # take the shift count from a number, the 
others take it from a register.  

This assembler is clever enough to find out shifted immediates itself, so you 
don't have to worry about lines like 
    top th f0 #  td 24 #lsl mov
just write 
    top th f0000000 # mov
instead.  
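
The search for a shifted immediate can be sketched in Python as follows.  ARM data processing instructions hold an immediate as an 8-bit value rotated right by twice a 4-bit count; the function names are hypothetical, and the sketch only models the encoding rule, not the way this assembler is actually written.  

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (imm8, rot) with value == ror32(imm8, 2 * rot), or None."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Rotating left by 2*rot undoes the CPU's rotate right by 2*rot.
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 < 0x100:
            return imm8, rot
    return None


print(encode_immediate(0xF0000000))  # the constant from the example above
print(encode_immediate(0x12345678))  # does not fit into one instruction
```

A result of None means that the constant cannot be encoded in a single instruction.  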


Register usage
==============

Registers R0 - R6 are available for use within code definitions.  Don't try to 
use them for permanent storage, because they are used by many code words with 
no attempt to preserve the previous contents.  

    r9      user area pointer       up
    r10     top-of-stack register   top
    r11     return stack pointer    rp
    r12     instruction pointer     ip
    r13     stack pointer           sp
    r14     link register           lk
    r15     pc + status + flags     pc
Note: In future CPU versions, the internal structure of the pc register will 
be different, so it is better to think of pc and the status register as two 
separate registers.  The hardware-errors and the .registers instruction 
already know about this.  
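
For the combined r15 of the 26-bit ARM CPUs, the split into address, flags and mode can be sketched in Python like this.  The bit layout is the standard one for these CPUs ( flags in bits 26 to 31, address in bits 2 to 25, mode in bits 0 and 1 ); the function name split_r15 is only chosen for this example.  

```python
PC_MASK = 0x03FFFFFC    # bits 2..25: word aligned program counter
FLAG_MASK = 0xFC000000  # bits 26..31: N Z C V I F
MODE_MASK = 0x00000003  # bits 0..1: processor mode


def split_r15(r15):
    """Return (pc, flags, mode) taken from a combined 26-bit style r15."""
    return r15 & PC_MASK, (r15 & FLAG_MASK) >> 26, r15 & MODE_MASK


pc, flags, mode = split_r15(0x6000A153)
print(hex(pc), bin(flags), mode)
```

Clearing the bits fc000003 with bic, as some macros in this assembler do, leaves exactly the pc part.  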


Structured programming
======================

This assembler supports structured programming not with labels but with the 
usual Forth-like control structures.  The structures do not have to fit on one 
line, and they may be nested to any level.  The range of the branches assembled 
by these structures is not restricted.  

Implemented structures are: 
    set the flags                   \ produce the condition
    condition if ...                \ if condition is met do this
              else ...              \ otherwise this
              then
    
    
    
    begin ....
          set the flags             \ produce the condition
    condition     while ...         \ do this when condition met
          ( you may set the flags )
    ( condition ) repeat            \ the repeat is normally always done
                                    \ but you may also test for another
                                    \ condition.
    
    
    begin ...
          set the flags             \ produce the condition
    condition until                 \ leave the loop when condition is met
    
    
    
    begin ... again                 \ loop until whatever may happen
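
Structures like these have to end up as ( conditional ) branch instructions.  The following Python sketch shows how an ARM branch word is built from the address of the branch and its target; it assumes the standard B/BL encoding, the name encode_branch is hypothetical, and nothing is claimed about how the structure words are implemented internally.  

```python
def encode_branch(here, target, cond=0xE, link=False):
    """Encode B ( or BL ) located at address here, jumping to target."""
    # The pc reads as here + 8 because of the pipeline; offsets count words.
    offset = ((target - (here + 8)) >> 2) & 0x00FFFFFF
    word = (cond << 28) | 0x0A000000 | offset
    if link:
        word |= 0x01000000  # BL: also save the return address in lk
    return word


print(hex(encode_branch(0x8000, 0x8010)))          # forward, as for if ... then
print(hex(encode_branch(0x8010, 0x8000)))          # backward, as for begin ... again
print(hex(encode_branch(0x8000, 0x8010, cond=0)))  # the same forward branch on EQ
```

A forward branch, as needed for if, can only be completed when the matching then is reached, because the target address is not known before.  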


Porting
=======

The ARM assembler can also be used by other Forth systems.  All hardware 
specific parts are written portably and can be changed very easily in case of 
problems.  So a 68k-Forthmacs can metacompile ARM code with this assembler 
without any change.  In fact, the very first metacompilation of this 
RISC OS Forthmacs took place on an ATARI-ST with 1MB RAM and a 720k disk.  


Byte-sex
========

This assembler can produce code for both byte-sexes, which allows portable 
assembler code for all ARM CPUs.  LITTLE-ENDIAN and BIG-ENDIAN do the switch.  
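
What the switch means for the bytes written to memory can be seen in this short Python sketch.  It only uses the standard struct module; the function name instruction_bytes is made up for the example.  

```python
import struct


def instruction_bytes(word, big_endian=False):
    """Return the four bytes of a 32-bit instruction word in target order."""
    return struct.pack(">I" if big_endian else "<I", word)


MOV_R0_R10 = 0xE1A0000A  # mov r0,r10
print(instruction_bytes(MOV_R0_R10).hex())                   # little-endian
print(instruction_bytes(MOV_R0_R10, big_endian=True).hex())  # big-endian
```

The instruction word is the same in both cases, only the order of its bytes in memory differs.  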


ARM2/3/6
========

The assembler takes care of some CPU dependent restrictions: ARM2 disallows 
the more advanced instructions, ARM3 allows them.  


Forth Virtual Machine Considerations
====================================

The Forth parameter stack is implemented with r13, but the name sp should be 
used instead of r13, in case the virtual machine implementation should change.  

The return stack is implemented with r11, and the name rp should be used to 
refer to it.  

The base address of the user area ( the user pointer) is r9 but should be 
referred to as up. User variable number 124 (for instance) may be accessed 
with the 
    up td 124 d)
addressing mode.  There is a macro 'user which will assemble this addressing 
mode for you.  

The interpreter pointer ip is r12.  The interpreter is post-incrementing, so 
when a code definition is being executed, ip points to the token after the one 
being executed.  A "token" is the number that is compiled into the dictionary 
for each Forth word in a definition.  For RISC OS Forthmacs, a token is a 
32-bit absolute address.  
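
The post-incrementing interpreter can be modelled in a few lines of Python.  This is only a toy model: the addresses, the three primitives and the stop marker 0 are invented for the example, and real tokens point to machine code, not to Python functions.  

```python
def lit5(stack):
    stack.append(5)


def dup(stack):
    stack.append(stack[-1])


def plus(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


# Hypothetical code addresses of three primitives.
CODE = {0x9000: lit5, 0x9010: dup, 0x9020: plus}
# A compiled definition: one 32-bit token per word, 0 ends the model run.
THREAD = {0xA000: 0x9000, 0xA004: 0x9010, 0xA008: 0x9020, 0xA00C: 0}


def run(thread, ip):
    """Fetch a token, post-increment ip, execute; like ldr pc,[ip],#4."""
    stack = []
    while True:
        token = thread[ip]
        ip += 4            # ip now points to the token after this one
        if token == 0:
            return stack
        CODE[token](stack)


print(run(THREAD, 0xA000))
```

While a primitive runs, ip already points to the following token, just as described above.  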


Assembler Glossary 
===================


____ pc             ( -- n )                                
portable name for the pc register 
____

____ sp             ( -- n )                                
portable name for the stack pointer 
____

____ up             ( -- n )                                
portable name for the user pointer 
____

____ ip             ( -- n )                                
portable name for the instruction pointer 
____

____ rp             ( -- n )                                
portable name for the return stack pointer 
____

____ top            ( -- n )                                
portable name for the top of stack register 
____

____ 'user          ( --  )        'name'                   
Executed in the form: 
         top   'user <name>   ldr
<name> is the name of a User variable.  Assembles the appropriate addressing 
mode for accessing that User variable.  

In RISC OS Forthmacs, the addressing mode for User variables is 
                 up #n d)
where #n is the offset of that variable within the User area.  
____

____ ;code          ( --  )                  C,I            semi-colon-code
                    ( --  )
Used in the form: 
                 : <name>  ... create ... ;code ... c; (or end-code)
Stops compilation, terminates the defining word <name>, executes 
arm-assembler, and does do-entercode. 

When <name> is later executed in the form: 
                 <name> <new-name>
to define the word <new-name>, the later execution of <new-name> will cause 
the machine code sequence following the ;code to be executed.  
____
This is analogous to does>, except that the behavior of the defined words 
<new-name> is specified in assembly language instead of high-level Forth.  

;code calls do-entercode.  This is implementation specific and was used to 
assemble the instructions that start the assembler code with the 
parameter-field-address ( the body ) of the defined word on top of the stack: 
         top     sp      push
         top     lk      th fc000003 # bic
From version 3.1/2.62 on this is no longer the case; you have to write those 
instructions yourself.  

See: code does> 


____ adr            ( rx addr --  )                         
Assembler macro with the following effect: 

addr is moved to register rx.  Within short distances this is achieved by a 
pcr instruction, otherwise it's more complicated.  

Note: The address will be relocated correctly! 
____

____ aligning?      ( -- addr  )                            
variable holding flag, true means assembler does aligning on its own.  
Implemented for CPU independent metacompiling.  
____

____ alu-instructions  ( r-dest r-op1 op2{r-op2|imm} --  )   
Available instructions with this syntax: 

AND EOR SUB RSB ADD ADC SBC RSC TST TEQ CMP CMN ORR BIC  

These instructions all have two data-inputs to the alu, the register r-op1 and 
the operand op2.  This can be another register or an 8-bit immediate.  

The register r-op2 can be "shifted" in any way given by a shift specifier; the 
shift count is either a 5-bit integer or the contents of another register.  
The immediate operand can be rotated right by 2*(4-bit-integer).  

If you give "large" literals as arguments, the assembler will generate the 
correct shifts itself.  

The # modifier declares an immediate operand as in: 
         top r0 3 # add

The S modifier will set the flags according to the result, the instruction 
will be ADDS instead of ADD .  
____
MOV and MVN are somewhat different: the operand r-op1 isn't needed.  Also, 
both can handle "big" immediates themselves, 
         top th 12345678 # mov
won't be a problem, MOV assembles all instructions needed.  
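
How a "big" constant can be cut into pieces that each fit one instruction is sketched below in Python.  This is just one possible decomposition ( for example one MOV followed by several ORRs ) and an assumption on my part; the instruction sequence really assembled by MOV may look different.  The function name split_constant is hypothetical.  

```python
def split_constant(value):
    """Split a 32-bit constant into parts that each fit an 8-bit field
    starting at an even bit position, so each part is a valid immediate."""
    value &= 0xFFFFFFFF
    parts = []
    while value:
        lowest = (value & -value).bit_length() - 1  # index of lowest set bit
        shift = lowest & ~1                          # round down to even
        part = value & (0xFF << shift) & 0xFFFFFFFF
        parts.append(part)
        value &= ~part
    return parts


print([hex(part) for part in split_constant(0x12345678)])
```

The parts never overlap, so combining them with ORR ( or adding them ) gives the original constant again.  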

CMP and CMN can both handle negative immediate operands; they try to find out 
which operand can be encoded.  


____ asm-allot      ( n --  )                deferred       
Allocates n bytes in the dictionary.  The address of the next available 
dictionary location is adjusted accordingly.  

The default is allot; implemented for ( CPU independent ) metacompiling.  
____

____ arm-assembler  ( --  )                                 
Execution replaces the first vocabulary in the search order with the 
arm-assembler vocabulary, making all the assembler words accessible.  
____

____ big-endian     ( -- )                                  
Switches assembler to big-endian target code 
____

____ branch         ( addr --  )                            
Assembles a branch instruction to addr at the current location.  Can be 
modified by dolink and all condition codes.  
____

____ byte           ( -- )                                  
modifier for the assembler, memory accesses mean byte wide access 
____

____ code           ( --  )        'name'    M              
A defining word executed in the form: 
                 code <name> ... end-code or c;
Creates a dictionary entry for <name> to be defined by a following sequence of 
assembly language words.  Words thus defined are called code definitions or 
primitives.  Executes arm-assembler and sets the opcode defaults .  

This is the most common way to begin assembly.  


See: end-code c; 
____

____ c;             ( --  )                                 c-semi-colon
Terminates the current code definition and allows its name to be found in the 
dictionary.  

Sets the context vocabulary to be same as the current vocabulary (which 
removes the arm-assembler vocabulary from the search order, unless you have 
explicitly done something funny to the search order while assembling the 
code).  

Executes next to assemble the "next" routine at the end of the code word 
being defined.  The "next" routine causes the Forth interpreter to continue 
execution with the next word.  


This is the most common way to end assembly; it calls end-code. 
____

____ conditions     ( --  )                                 
All instructions are executed only if the given condition is met.  The 
assembler's default is AL (always), but these are also available: 

EQ NE CS CC MI PL VS VC HI LS GE LT GT LE AL 
____

____ decr           ( reg n# --  )                          
Macro, n# will be subtracted from reg.  
____

____ dolink         ( --  )                                 
modifier for branch instruction, the current pc will be saved to the link 
register.  
____

____ end-code       ( --  )                                 
Terminates a code definition and allows the <name> of the corresponding code 
definition to be found in the dictionary.  

The context vocabulary is set to the same as the current vocabulary (which 
removes the arm-assembler vocabulary from the search order, unless you have 
explicitly done something funny to the search order while assembling the 
code).  

The next routine is not automatically added to the end of the code definition.  
Usually you want next to be at the end of the definition, but sometimes the 
last thing in the definition is a branch to somewhere else, so the next at the 
end is not needed.  


See: c; 
____

____ entercode      ( --  )                                 
Starts assembling after stack checking, setting the assembler defaults and 
switching to arm-assembler. 

See: do-entercode ;code 
____

____ get-link       ( reg --  )                             
Assembler macro, equivalent to: 
    reg lk th fc000003 # bic
This is useful to get the address after a branch instruction.  
    xxxxx dolink branch  ---+
      A) data ...           |
                            |
                            |
      B) top get-link  <----+
So after branching to B), top will be set to A) 
____

____ incr           ( reg n# --  )                          
Macro, n# will be added to reg.  
____

____ label          ( --  )        'name'    F83            
A defining word used in the form: 
    label <name> ... end-code
    label <name> ... c;
Creates a dictionary entry for <name> consisting of a following sequence of 
assembly language words.  When <name> is later executed, the address of the 
first word of the assembly language sequence is left on the stack.  label 
executes arm-assembler. 


See: end-code 
____

____ ldm            ( rx1 rx2 .. rxn  n#  r-adr --  )       
Load multiple registers from the address pointed to by r-adr; an addressing 
mode must be specified.  

The register list is given by all register names (don't name a register twice) 
and the number of registers.  
     r0 r1 r2 r3 4   sp ia! ldm
This loads registers r0-r3 from the stack and sets the stack pointer to the 
next stack entry.  
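
The effect of this example can be modelled in Python.  The sketch assumes the standard ARM behaviour ( one bit per register in the instruction, the lowest register is always transferred from the lowest address ); the function names and the memory contents are invented for the illustration.  

```python
def register_mask(registers):
    """Build the 16-bit register list field; bit n stands for register rn."""
    mask = 0
    for reg in registers:
        if mask & (1 << reg):
            raise ValueError("register r%d named twice" % reg)
        mask |= 1 << reg
    return mask


def ldm_ia_writeback(memory, regs, base, registers):
    """Model of 'ia! ldm': load upwards from regs[base], then write back."""
    address = regs[base]
    for reg in sorted(registers):      # lowest register, lowest address
        regs[reg] = memory[address]
        address += 4
    regs[base] = address


memory = {0x1000: 11, 0x1004: 22, 0x1008: 33, 0x100C: 44}
regs = {13: 0x1000}                    # r13 is sp
ldm_ia_writeback(memory, regs, 13, [0, 1, 2, 3])
print(hex(register_mask([0, 1, 2, 3])), regs)
```

Because the list is only a bit mask, the order in which the registers are named makes no difference to the result.  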


See: ldr stm 
____

____ ldr            ( r-data r-adr operand2 --  )           
r-data is read from memory.  The default is word (32-bit) wide access, but the 
modifier byte sets this to byte-wide access.  

The address is calculated using r-adr and the operand2.  It can be another 
register (the shift specified as usual by a 5-bit literal and a shift type) or 
a 12-bit immediate offset.  

operand2 can be added to or subtracted from r-adr according to the addressing 
mode defined by two letters.  The first tells whether (i)ncreasing or 
(d)ecreasing should be used, the second whether the in/decreasing takes place 
(b)efore or (a)fter the memory access.  A "!" at the end means that 
"write-back" will take place.  So these modes are possible 
         da  ia  db  ib    \ decrease/increase after/before
         da  ia! db! ib!   \ as above plus write-back
____

Some macros make life a bit easier.  They are somewhat 68k-like and must 
follow a byte modifier, because the offset will be calculated by the assembler 
itself.  

    : )      0 #   ib ;
    : )+     @increment  ia ;
    : )-     @increment  da ;
    : -(     @increment  db! ;
    : +(     @increment  ib! ;
    
    : d)     dup abs # offset?  swap 0<  if db  else ib  then ;
    : d)!    dup abs # offset?  swap 0<  if db! else ib! then ;
    : push   -( str ;
    : pop    )+ ldr ;
Examples: 
      top  r6 byte )+ ldr
      top  up 8 d)    ldr
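
What the d) and d)! macros decide can be written down in Python as a small model.  It assumes that offset? checks the offset against the 12-bit immediate range mentioned for ldr; that check and the function name d_mode are my assumptions, not taken from the assembler source.  

```python
def d_mode(offset, write_back=False):
    """Return (magnitude, mode) the way d) and d)! choose them."""
    magnitude = abs(offset)
    if magnitude > 0xFFF:              # assumed 12-bit immediate offset limit
        raise ValueError("offset does not fit into 12 bits")
    mode = "db" if offset < 0 else "ib"
    return magnitude, mode + ("!" if write_back else "")


print(d_mode(8))                       # as in: top up 8 d) ldr
print(d_mode(-4, write_back=True))     # a negative offset with d)!
```

The sign of the offset therefore never reaches the instruction as a negative number; it only selects between the increasing and the decreasing mode.  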


See: str 


____ little-endian  ( -- )                                  
Switches assembler to little-endian target code 
____

____ mla            ( r-dest r-op1 r-op2 r-add )            
Assembles a multiply-and-accumulate instruction, r-dest is r-add + 
(r-op1*r-op2) 
____

____ mul            ( r-dest r-op1 r-op2 )                  
Assembles a multiply instruction.  
____

____ next           ( --  )                                 
Assembler macro which assembles the next routine, which is the Forth address 
interpreter.  

In RISC OS Forthmacs this is one single instruction.  
         pc  ip  )+  ldr
____

____ nop            ( --  )                                 
Assembler macro, equivalent to 
    r0 r0 mov
____

____ pcr            ( addr -- pc offset  )                  
Assembler macro, expects an address on the stack and calculates its address 
offset from pc. The addressing mode is also set.  
____

____ return         ( --  )                                 
macro for 
    pc  lk  mov
____

____ s              ( --  )                                 
modifier, the instruction will set the flags according to the result.  This is 
the default for tst, teq, tstp, teqp, cmp, cmn, cmpp and cmnp.  
____

____ stm            ( rx1 rx2 .. rxn  n#  r-adr --  )       
Store multiple registers to the address pointed to by r-adr; an addressing 
mode must be specified.  


See: ldm for more details.  
____

____ str            ( r-data r-adr operand2 --  )           
r-data is stored to memory.  The default is word (32-bit) wide access, but the 
modifier byte sets this to byte-wide access.  


See: ldr 
____

____ swi            ( swi# --  )                            
assembles a swi instruction, the number is swi#.  
____

____ swix           ( swi# --  )                            
assembles a swix instruction, the number is swi#.  
____
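
On RISC OS, the X form of a SWI is by convention the same SWI number with bit 17 set.  Assuming that swix simply sets this bit before assembling a normal swi instruction ( the glossary does not say so explicitly ), the two instruction words can be sketched in Python.  The function name encode_swi is hypothetical.  

```python
X_BIT = 0x20000  # RISC OS convention: bit 17 marks the error returning X form


def encode_swi(number, x_form=False, cond=0xE):
    """Return the 32-bit word of a SWI instruction with a 24-bit number."""
    if x_form:
        number |= X_BIT
    return (cond << 28) | 0x0F000000 | (number & 0x00FFFFFF)


OS_NEWLINE = 0x03
print(hex(encode_swi(OS_NEWLINE)))               # as assembled by swi
print(hex(encode_swi(OS_NEWLINE, x_form=True)))  # as assembled by swix
```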

____ swp            ( r-dest r-base r-source --  )          
assembles a swp instruction, provided ARM3 code has been allowed by ARM3 
____

____ t              ( --  )                                 
modifier, force -T pin.  
____

____ ^              ( --  )                                 
modifier, force access to user-mode registers.  
____
