Extended BASIC Assembler

Version 1.77, 1 March 1998
by Darren Salt <arcsalt@spuddy.mew.co.uk>
Contains code from v1.00 and v1.30 by Adrian Lees


It is recommended that you read through this document before first using
ExtBASICasm, and check through it quickly when upgrading. It's entirely
possible that one of the extras the module provides may cause a clash with
existing programs, for example the APCS-R register names clashing with
variables used as register names.

Note also that APCS-R register names are disabled by default.


The module ExtBASICasm provides a patch for versions 1.05, 1.06, 1.14 and
1.16 of BASIC V, as supplied with RISC OS 3.1, 3.5, 3.6 and 3.7 respectively,
to allow the direct use of the extra instructions provided by the ARM3, ARM6,
ARM7, ARM8 and StrongARM processors. The missing floating-point and general
coprocessor instructions, and some assembler directives more familiar (and a
few unfamiliar) to Acorn Assembler users have been added; also the APCS-R
register names may be used.

To make the necessary changes to the BASIC module it must be located in RAM.
The ExtBASICasm module will therefore attempt to RMFaster the BASIC module,
which will require a small amount of memory in the RMA in addition to that
required by the ExtBASICasm module itself. Attempting to run it while BASIC
is active and in ROM will not work - try "*RMFaster BASIC" at the BASIC
prompt and you'll see why.


Enabling ExtBASICasm


Unlike earlier versions, this version is initialised into a dormant state
whenever you start up the BASIC interpreter, eg. by double-clicking on a
BASIC program or by typing BASIC at the * prompt.

You can enable or disable the extensions by using the assembler pseudo-op
	EXT n
where n is 0 to disable and 1 to enable. (Other values are currently mapped
to 1; do not rely on this.)

Setting any of the three extension OPT bits is also enough to enable
ExtBASICasm.

Certain extensions remain enabled at all times: specifically, ALIGN always
zero-fills, and the ".foo = bar" bug remains fixed. I don't think that
this'll inconvenience anybody :-)


ExtBASICasm uses the BASIC data word TIMEOF, which is documented as "unused"
for all versions of BASIC V which it recognises, for its 'enabled' flag.


The instructions added by the module are as follows:


Extensions


	Optional parts are enclosed in {}

OPT	<value>
	Bit 4:	ASSERT control (1 = enabled on 'second pass')
	Bit 5:	APCS register names (1 = enabled)
	Bit 6:	UMUL/UMULL control (0 = short forms, 1 = long forms)

ALIGN
	Zero-initialises the memory if required.

ALIGN	<const>[,<const2>]
	Aligns to a multiple of const bytes plus an optional offset. const
	must be a power of 2 between 1 and 65536; const2 must be between 0
	and const-1 (default is 0). Also zero-initialises the memory.
	P% is advanced until (P% AND (const-1)) = const2; O% is also updated
	if necessary.
	Examples:
		ALIGN 4
		ALIGN 32
		ALIGN 16,8
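
The address arithmetic can be modelled in a few lines of Python. This is only
a sketch of one reading of the description above (advance to the next address
that is a multiple of const plus const2, zero-filling the gap); the function
name align is made up for illustration and is not part of the module.

```python
def align(p, const=4, const2=0):
    """Model of ALIGN const,const2: return (new P%, zero padding bytes)."""
    # const must be a power of two between 1 and 65536
    if const < 1 or const > 65536 or (const & (const - 1)) != 0:
        raise ValueError("const must be a power of 2 between 1 and 65536")
    if not 0 <= const2 < const:
        raise ValueError("const2 must be between 0 and const-1")
    # Number of bytes needed to reach the next address with
    # (address AND (const-1)) = const2
    pad = (const2 - p) % const
    return p + pad, bytes(pad)


new_p, padding = align(0x800C, 16, 8)
print(hex(new_p), len(padding))
```

With P% at &800C, ALIGN 16,8 would move on to &8018 in this model, emitting
twelve zero bytes.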

MUL{cond}{S}	Rd,Rm,#<const>
	Variable length. Rd may equal Rm only if fewer than two ADD/RSB
	instructions are needed.
	May cause 'duplicate register' if Rd=Rm and const is not simple - ie.
	not 0, (2^x)-1, 2^x, (2^x)+(2^y)
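
To check in advance whether a constant counts as "simple" in the sense above,
something like the following Python test could be used. It is a sketch that
only classifies the constant according to the list given here; it says
nothing about the code the module actually generates, and the name is_simple
is hypothetical.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def is_simple(c):
    """True if c is 0, 2^x, (2^x)-1 or (2^x)+(2^y)."""
    if c < 0:
        raise ValueError("this sketch only handles non-negative constants")
    if c == 0 or is_power_of_two(c) or is_power_of_two(c + 1):
        return True
    # (2^x)+(2^y) with x <> y has exactly two bits set
    return bin(c).count("1") == 2


for c in (0, 7, 8, 10, 11):
    print(c, is_simple(c))
```

Here 11 (binary 1011) is the only value of the five that is not simple.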

MLA{cond}{S}	Rd,Rm,#<const>,Ra
	Variable length. Rd may equal Rm only if fewer than two ADD/RSB
	instructions are needed.
	Rd=Ra causes 'duplicate register' error if const is not simple, as
	for MUL; Rd=Rm=Ra is special in that MLA Rd,Rd,#c,Rd = MUL Rd,Rd,#c+1
	If Rd=Ra and const=0, no code is generated (none necessary).

DIV		Rq,Rr,Rn,Rd,Rt [SGN Rs]
	Integer division by register
	Rq = quotient		Rn = numerator		Rt = temporary store
	Rr = remainder		Rd = denominator	Rs = sign store
	If Rs omitted then division is unsigned.
	Rr may be same register as Rn *or* Rn may be same as Rs.
	All other registers must be different.
	Rt and Rs (if specified) are corrupted.

DIV		Rq,Rr,Rn,#d[,[Rt]] [SGN Rs]
	Integer division by constant
	Registers as above
	If Rs omitted then division is unsigned.
	If Rt omitted and is required for this division then error given.
	All registers must be different.
	If specified, Rt and Rs are corrupted.
	(Uses generator to build code - fast but may be long)
	Notes:	Uses Fourier method. For unsigned values, this is fixed to
		handle unsigned top-bit-set properly, *except* for div by 3
		which works for values up to &C0000000. Ideas and code
		gratefully received...

	*** Note no conditional for either form of DIV!

ADR{cond}L	Rd,<const>
	Fixed length (two words)

ADR{cond}X	Rd,<const>
	Fixed length (three words)

ADR{cond}W	Rd,<const>
	Addressing relative to R12, one to three words
	<const> MUST be defined before it is used
	Adds/subtracts const to/from R12, storing result in Rd
	Up to you to ensure that R12 correctly set up...

LDR, STR
	xxx{cond}{B}W	Rd,<offset>
	  Load/store word/byte at [R12,#<offset>]

	LDR{cond}{B}L	Rd,<address>
	LDR{cond}{B}L	Rd,[Rm,#<offset>]{!}
	LDR{cond}{B}WL	Rd,<offset>
	STR equivalents
	  Addressing range is 1MB; some offsets outside this range are also
	  valid. Lengths are (in words):
	    LDR        2  ADD/SUB Rd,Rm,#a:LDR Rd,[Rd,#b]
	    LDR ...]!  2  ADD/SUB Rd,Rm,#a:LDR Rd,[Rd,#b]!
	    STR        3  ADD/SUB Rm,Rm,#a:STR Rd,[Rm,#b]:SUB/ADD Rm,Rm,#a
	    STR ...]!  2  ADD/SUB Rm,Rm,#a:STR Rd,[Rm,#b]!
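
The split of a long offset into the two constants a and b can be sketched in
Python. This assumes that b is simply the low 12 bits of the offset and that
a must be a normal ARM immediate (8 bits rotated right by an even amount);
the module may well choose a and b differently. The function names are
invented for this example.

```python
def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False


def split_long_offset(offset):
    """Return (op, a, b) for ADD/SUB ...,#a followed by a load/store using b."""
    op = "ADD" if offset >= 0 else "SUB"
    magnitude = abs(offset)
    b = magnitude & 0xFFF   # fits the 12-bit LDR/STR offset field
    a = magnitude - b       # must fit a single ADD/SUB immediate
    if not is_arm_immediate(a):
        raise ValueError("offset out of range")
    return op, a, b


print(split_long_offset(0x12345))
print(split_long_offset(0x1000010))
```

Any offset below 1MB splits cleanly under these assumptions, and the second
call shows why some larger offsets are also valid: &1000000 is itself a
legal immediate.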

	LDR{cond}{B}L	Rd,{Rn},<address>
	LDR{cond}{B}L	Rd,{Rn},[Rm,#<offset>]
	LDR{cond}{B}WL	Rd,{Rn},<offset>
	STR equivalents
	  [{Rn} is NOT optional]
	  Equivalent to the LDR/STRs above, except that Rn (rather than Rd)
	  is used to hold the address; always two words long. For example,
	  ADRL R0,wibble:LDR R1,[R0] may be replaced with LDRL R1,{R0},wibble
	  - one word shorter.
	  Rd=Rn is not allowed.
	  Assembles to ADD/SUB Rn,Rm,#a : LDR/STR Rd,[Rn,#b]!

	LDR{cond}{B}L	Rd,[Rm],#<offset>
	STR equivalent
	  Addressing range is 1MB; some offsets outside this range are also
	  valid. Two words long.
	  Assembles to LDR/STR Rd,[Rm],#b:ADD/SUB Rm,Rm,#a

	NOTE: You should try to avoid using *sequences* of LDRLs or STRLs -
	there is usually a more efficient way.

	Also supported are the new ARM7M and StrongARM forms:
	LDRxxH, LDRxxSH, LDRxxSB and STRxxH
		The standard forms are used, with the following exceptions:
			- no shifts
			- constant offsets in range -255 to 255
		The W forms are also supported.
		Long LDR{H|SH|SB} not yet implemented.

SWAP{cond}{S}	Rd,Rn
	Swaps Rd and Rn without using temporary store.
	Uses EOR method, is therefore three words long.
	If S is specified, then the flags are set according to Rn.
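
For anyone unfamiliar with the EOR method, here it is modelled in Python on
two integers. The order of the three EORs shown in the comments is an
assumption; the module may order its operands differently.

```python
def swap_eor(rd, rn):
    """Swap two register values with three exclusive-ORs and no temporary."""
    rd ^= rn  # EOR Rd,Rd,Rn
    rn ^= rd  # EOR Rn,Rn,Rd   (Rn now holds the original Rd)
    rd ^= rn  # EOR Rd,Rd,Rn   (Rd now holds the original Rn)
    return rd, rn


print(swap_eor(0x1234, 0xABCD))
```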

VDU{cond}{X}	<const>
	= SWI "OS_WriteI"+<const>
	With X present, XOS_WriteI is used instead.

NOP{cond}
	= MOV{cond} R0,R0

BRK{cond} [#<const>]
	Undefined instruction. If <const> is specified, then R14 is set to
	this value before the undefined instruction trap is taken.

EQUx, DCx, =
	xxx <value>[,<value>]^
	Extended form of EQUD, EQUW, DCB, etc.
	Instead of, eg. DCD 0 : DCD 12 : DCD branch
	you can now use DCD 0, 12, branch

Negative constants
	Allowed in the following instructions:
		ADD, SUB	ADC, SBC	ADF, SUF
		AND, BIC	MOV, MVN	MVF, MNF
		CMP, CMN	CMF, CNF	CMFE, CNFE
	If the constant is invalid for one of these, it is negated or
	inverted, as appropriate, and the instruction changed to the other of
	the pair (eg. ADC becomes SBC). If the constant is still invalid, the
	"bad immediate constant" error is generated as normal.

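The rule for the integer instructions can be sketched in Python as follows.
This is a model rather than the module's code: it assumes the usual ARM
immediate form (8 bits rotated right by an even amount), negates for ADD/SUB
and CMP/CMN, inverts for ADC/SBC, AND/BIC and MOV/MVN, and leaves out the
floating-point pairs. The names PARTNER, NEGATE and fix_immediate are made up
for the example.

```python
PARTNER = {"ADD": "SUB", "SUB": "ADD", "CMP": "CMN", "CMN": "CMP",
           "ADC": "SBC", "SBC": "ADC", "AND": "BIC", "BIC": "AND",
           "MOV": "MVN", "MVN": "MOV"}
NEGATE = {"ADD", "SUB", "CMP", "CMN"}   # the other pairs invert instead


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False


def fix_immediate(op, value):
    """Return (op, constant), switching to the partner op if that helps."""
    value &= 0xFFFFFFFF
    if is_arm_immediate(value):
        return op, value
    alt = (-value if op in NEGATE else ~value) & 0xFFFFFFFF
    if is_arm_immediate(alt):
        return PARTNER[op], alt
    raise ValueError("bad immediate constant")


print(fix_immediate("ADD", -4))
print(fix_immediate("ADC", -1))
```

In this model ADD with #-4 becomes SUB with #4, and ADC with #-1 becomes SBC
with #0.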

ARMv2a (ARM3, ARM250) and later


SWP{cond}{B}	Rd,Rm,[Rn]


ARMv3 (ARM6) and later


MRS{cond}	Rd,<psr>
	<psr> may only be CPSR or CPSR_all, or SPSR equivalents

MSR{cond}	<psr><f>,Rm
	<psr> must be CPSR or SPSR, and <f> must be one of the following:
		_ctl	control bits only
		_flg	flag bits only
		_all	both
	or a combination of _c, _x, _s, _f. _c_f, _cf, _f_c, _fc are all
	equivalent to _all.
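
The suffixes correspond to the four field mask bits of the MSR instruction
(bits 16 to 19 for c, x, s and f). A Python sketch of the mapping, assuming
lower-case suffixes and treating _ctl, _flg and _all as described above; the
function name msr_field_mask is hypothetical:

```python
FIELD_BITS = {"c": 1 << 16, "x": 1 << 17, "s": 1 << 18, "f": 1 << 19}
ALIASES = {"ctl": "c", "flg": "f", "all": "cf"}


def msr_field_mask(suffix):
    """Map a suffix such as '_ctl', '_cf' or '_f_c' to MSR field mask bits."""
    name = suffix.lstrip("_")
    # Either one of the named forms, or a run of c/x/s/f letters
    fields = ALIASES.get(name, name.replace("_", ""))
    if not fields or any(ch not in FIELD_BITS for ch in fields):
        raise ValueError("bad PSR field suffix: " + suffix)
    mask = 0
    for ch in fields:
        mask |= FIELD_BITS[ch]
    return mask


print(hex(msr_field_mask("_all")), hex(msr_field_mask("_f_c")))
```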


ARMv4 (ARM8, StrongARM) and later


UMUL, SMUL, UMLA, SMLA:
	xxx{cond}{S}	Rl,Rh,Rm,Rn

	The 'official' forms UMULL, SMULL, UMLAL, SMLAL are used *instead of*
	the 'short' forms if OPT bit 6 is set.
	Unfortunately it's not possible to allow both forms at once: how
	would you interpret "UMULLS" - UMUL condition LS or UMULL with S bit?
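
The ambiguity is easy to see with a toy parser. The Python below is only an
illustration of why OPT bit 6 has to choose between the two readings; it is
not how the module parses mnemonics, and parse_long_multiply is an invented
name.

```python
CONDS = {"EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
         "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"}
SHORT = ("UMUL", "SMUL", "UMLA", "SMLA")


def parse_long_multiply(mnemonic, opt):
    """Return (base, cond, s_flag) using the forms selected by OPT bit 6."""
    long_forms = bool(opt & (1 << 6))
    for base in SHORT:
        name = base + "L" if long_forms else base
        if mnemonic.startswith(name):
            rest = mnemonic[len(name):]
            cond = ""
            if rest[:2] in CONDS:
                cond, rest = rest[:2], rest[2:]
            if rest in ("", "S"):
                return name, cond, rest == "S"
    raise ValueError("not a long multiply: " + mnemonic)


print(parse_long_multiply("UMULLS", 0))       # short forms
print(parse_long_multiply("UMULLS", 1 << 6))  # long forms
```

The same text comes out as UMUL with condition LS in one mode and as UMULL
with the S bit in the other.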


Floating-point instructions


* Floating point coprocessor data transfer

LDF, STF:
	xxx{cond}prec	Fd,[Rn]{,#<offset>}
	xxx{cond}prec	Fd,[Rn,#<offset>]{!}
	xxx{cond}prec	Fd,<label | const>
	xxx{cond}precW	Fd,<offset>

LFM, SFM:
	xxx{cond}	Fd,m,[Rn]{,#<offset>}
	xxx{cond}	Fd,m,[Rn,#<offset>]{!}
	xxx{cond}	Fd,m,<label | const>

LFM{cond}{stack}	Fd,m,[Rn]{!}
SFM{cond}{stack}	Fd,m,[Rn]{!}
LFS{cond}{stack}	Rn{!},<fp register list>
SFS{cond}{stack}	Rn{!},<fp register list>

	LFM, SFM, LFS and SFS use extended precision. The <fp register list>
	is much as for LDM and STM, with restrictions: you must specify a
	register or a sequence of registers, and the list must be compatible
	with LFM and SFM - eg.
	LFSFD R13!,{F3}		LFMFD F3,1,[R13]!	LFM F3,1,[R13],#12
	SFSFD R13!,{F5-F0}	SFMFD F5,4,[R13]!	SFM F5,4,[R13,#-48]!
	LFSDB R13,{F1,F0}	LFMDB F0,2,[R13]	LFM F0,2,[R13,#-24]
	- for each row, all the instructions have the same effect.
	Available stack types are DB, IA, EA, FD.
	Note that example 2 wraps around - F5, F6, F7, F0 _in that order_.
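
The register sequence and the size of the block moved follow directly from
the first register and the count: each register takes 12 bytes in extended
precision, and numbering wraps from F7 back to F0. A small Python model (the
function name fp_transfer is made up):

```python
def fp_transfer(first, count):
    """Registers moved by LFM/SFM Fd,m and the number of bytes they occupy."""
    if not 0 <= first <= 7:
        raise ValueError("first register must be F0..F7")
    if not 1 <= count <= 4:
        raise ValueError("count must be between 1 and 4")
    # Register numbers wrap round from F7 to F0
    regs = ["F%d" % ((first + i) % 8) for i in range(count)]
    return regs, 12 * count


print(fp_transfer(5, 4))
```

For F5 with a count of 4 this gives F5, F6, F7, F0 and 48 bytes.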

* Floating point coprocessor register transfer

FLT{cond}prec{round}	Fn,Rd
FIX{cond}{round}	Rd,Fn
WFS, RFS, WFC, RFC:
	xxx{cond}	Rd

* Floating point coprocessor data operations

ADF, MUF, SUF, RSF, DVF, RDF, POW, RPW, RMF, FML, FDV, FRD, POL:
	xxx{cond}prec{round}	Fd,Fn,<Fm | #value>

MVF, MNF, ABS, RND, SQT, LOG, LGN, EXP,
SIN, COS, TAN, ASN, ACS, ATN, URD, NRM:
	xxx{cond}prec{round}	Fd,<Fm | #value>

* Floating point coprocessor status transfer

CMF, CNF, CMFE, CNFE:
	xxx{cond}	Fm,<Fn | #value>


General co-processor instructions


* Coprocessor data operations

CDO, CDP:
	xxx{cond}	CP#,copro_opcode,Cd,Cn,Cm{,<const>}

	The values of copro_opcode and the optional constant must lie within
	the range 0..15.

* Coprocessor data transfer

MCR, MRC:
	xxx{cond}	CP#,<const1>,Rd,Cn,Cm{,<const2>}

LDC, STC:
	xxx{cond}{L}{T}	CP#,Cd,[Rn]{,#offset}
	xxx{cond}{L}	CP#,Cd,[Rn{,#offset}]{!}

	L and T may be specified in either order. So if you want an
	unconditional LDC with both flags set, use LDCTL or LDCALLT since
	LDCLT will be assembled as "LDC with T and L clear, if less than".
	The T flag is retained for compatibility reasons; it is automatically
	set anyway.


Assembler directives


* Conditional - will STOP if expression is FALSE:

ASSERT	<expression>

	Bit 4 of the OPT value controls ASSERT. When it and bit 1 are zero,
	ASSERTs are ignored.

* Constants

=	<const|string>
	The bug causing an error when used in the form
		.label = "something"
	has been fixed.

EQUFS, EQUFD, EQUFE, EQUFP, EQUF
	xxx	<const>

also the DC.. and |.. equivalents

	These directives accept an expression that evaluates to either an
	integer or a real number. The result is then converted into the
	required precision and stored in the object code at P%, or O% if
	indirect assembly is being used. EQUF is a synonym for EQUFD.

	directive	EQUFS	EQUFD	EQUFE	EQUFP
	bytes used	4	8	12	12

EQUP, DCP, P
	xxx	<string>,<const>
	xxx	<const>,<string>
	Fixed-length string allocation. If the string is too short, then the
	remaining space is padded with nulls; if it is too long, it is
	truncated to the specified length.

EQUPW, DCPW, PW
	xxx	<pad_byte>,<string>,<const>
	xxx	<pad_byte>,<const>,<string>
	Like EQUP, except that you specify the padding byte.
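
The padding and truncation rules for EQUP and EQUPW amount to the following,
shown here as a Python sketch. The function name equp and the Latin-1
encoding of the string are assumptions made for the example.

```python
def equp(text, length, pad_byte=0):
    """Model of EQUP (pad_byte=0) and EQUPW: always exactly length bytes."""
    data = text.encode("latin-1")[:length]           # truncate if too long
    return data + bytes([pad_byte]) * (length - len(data))  # pad if too short


print(equp("abc", 5))
print(equp("abcdef", 4))
print(equp("ab", 4, 32))
```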

EQUZ, DCZ, Z
	xxx	<string>
	EQUS with automatic zero termination

EQUZA, DCZA, ZA
	xxx	<string>
	Equivalent to EQUZ followed by ALIGN

	Note: *ALL* the EQU... directives (and their equivalents) may have
	their arguments repeated as described in the Extensions section.

FILL, %
	xxx{B|W|D}	<const>{,{<value>}}
	Allocates <const> bytes of memory, initialised to <value> (or 0).
	B, W and D represent data lengths as for EQU; if omitted, then byte
	length is assumed. If the comma is present but no fill value, this is
	equivalent to adding the constant to P% (and O% if appropriate).

FILE	<filename>
	Loads the specified file, allocating just enough space for it.

^	<offset>
	Initialises the workspace address pointer to the given value.
	This is used and updated by #.
	Typical use:
		^ 0
		...
		# flags, 4
		...
		LDRW	R0,flags

#	<variable>, <length>
	Sets the variable to the current value of the workspace address
	pointer, which is then incremented by <length>.
	This does not alter P% or O%.
	(Note: the variable is assigned before the length is evaluated.)
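
The way ^ and # work together can be modelled as a simple counter. This
Python sketch keeps the symbols in a dictionary instead of BASIC variables;
the class and method names are invented for illustration.

```python
class Workspace:
    """Model of the workspace address pointer used by ^ and #."""

    def __init__(self):
        self.pointer = 0
        self.symbols = {}

    def set_origin(self, offset):
        # ^ offset
        self.pointer = offset

    def reserve(self, name, length):
        # # name, length : the variable is assigned first, then the
        # pointer is moved on by length
        self.symbols[name] = self.pointer
        self.pointer += length
        return self.symbols[name]


ws = Workspace()
ws.set_origin(0)
ws.reserve("flags", 4)
ws.reserve("buffer", 256)
print(ws.symbols, ws.pointer)
```

After these two reservations, flags is at offset 0, buffer at offset 4 and
the pointer stands at 260, so LDRW R0,flags would address [R12,#0].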

COND	<cond>
	Sets the condition code for use with = (when used as a condition
	code). It may be supplied as a condition code literal, a number (0 to
	15), or a string containing a condition code literal. For example,
	all of the following are equivalent:
		COND	7		; number
		COND	VC		; condition code literal
		COND	vc		; condition code literal
		COND	"Vc"		; string containing cond. code lit.
	Example code:
		COND	LT		; select LT condition code
		MOV=	R0,#2		; MOVLT  R0,#2
		MOV=S	R1,R2		; MOVLTS R1,R2
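
The numbers accepted by COND are the standard ARM condition code values. The
Python sketch below models the lookup and the substitution of = in a
mnemonic; it takes the literal forms as strings only, and the names
parse_cond and expand are hypothetical.

```python
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]


def parse_cond(arg):
    """Accept a number (0 to 15) or a condition code name in any case."""
    if isinstance(arg, int):
        if not 0 <= arg <= 15:
            raise ValueError("condition number out of range")
        return arg
    return CONDITIONS.index(arg.upper())  # ValueError if not a condition


def expand(mnemonic, cond):
    """Replace the = in a mnemonic with the selected condition code."""
    return mnemonic.replace("=", CONDITIONS[cond], 1)


cond = parse_cond("LT")
print(expand("MOV=", cond), expand("MOV=S", cond))
```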


Notes


* Registers are specified in the following form:

	ARM registers:			R0..R15
		using APCS-R names:	A1..A4 V1..V6 SL FP IP SP LR PC
	Floating-point registers:	F0..F7
	General co-processor registers:	C0..C15

  To help cope with any potential name clashes, the floating point and APCS-R
  register names (except for PC) must be terminated with some character not
  valid in a variable name in order to be recognised; they are otherwise
  treated as part of a variable name.

* Coprocessor numbers (CP#) may be specified using either of the following
  forms:

	P0..P15
	CP0..CP15

* Wherever a register or coprocessor number is specified, an expression may
  be substituted in the usual manner allowed by BASIC V. This module employs
  the routines used within BASIC to evaluate all expressions (eg. register
  numbers, offsets and labels) and hence its interpretation of expressions is
  guaranteed to be the same as BASIC.


Credits


  Adrian Lees (last known, AFAIK, at A.M.Lees-CSEE93@cs.bham.ac.uk):
  - for the original ExtBas and the EQU comma extension, and for the use of
    some of his code

  Michael Rozdoba (formerly of TechForum / Acorn Answers):
  - for including the "General recursive method for Rb := Ra * C, C a
    constant" from Appendix C of the manual for Acorn's desktop assembler,
    and the late Acorn Computing (Sept 1994) for printing it;
  - for the division code generator (Archimedes World, May 1995), which was
    included, slightly trimmed, and debugged to handle top-bit-set unsigned
    numbers properly... I hope!

  Dominic Symes of !Zap fame (dominic.symes@armltd.co.uk):
  - for pointing out that ANDEQ R0,R0,R0 could usefully be replaced by DCD 0

  Martin Willers (m.willers@tu-bs.de):
  - for bug hunting :-)

  Reuben Thomas (rrt1001@cam.ac.uk):
  - for pointing out it might be useful to disable the APCS register names,
    suggesting B/W/D suffix for FILL (and %) and -ve immediate constants, and
    bug encountering

  Mohsen Alshayef (mohsen@qatar.net.qa):
  - for some useful long MUL, STRH and [CS]PSR info
