
Forthmacs Database
******************

This chapter gives a short description of the RISC OS Forthmacs database 
system and includes a glossary of the user interface words.  

The model treats a database as n RECORDS, each having the same number and 
kind of ITEMS.  Neither the number of records nor the number of items is 
limited, as long as 

      record-size = size(item0) + size(item1) + .... + size(lastitem)
      2^31 > record-size * number-of-records


A database 'xyz' consists of several files in the directory 
<Forthmacs$Dir>.data.xyz.  The file DATA holds the actual records and STRUCT 
defines the database structure; all other files, named 0 ..  63, are index 
files.  

A database definition does two things: it constructs the DATA and STRUCT files 
(existing data files are not overwritten), and it defines Forth words in the 
current vocabulary as the programmer's user interface.  


Generating a database
=====================

Let us look at this example: 

        <database words
                address         /int
                wordname        /string 31
                vocname         32
        database>

First the database name is defined with <database words.  The directory 
'<Forthmacs$Dir>.data.words' and the data file 
'<Forthmacs$Dir>.data.words.data' are created if they don't exist already.  

Also the word DATA:WORDS is constructed in the current vocabulary.  This word 
is called a DATABASE DESCRIPTOR.  It is used in all database operations to 
identify the database being worked on.  It is a pointer (plus some more 
information hidden from the user) that points to itself if the database is NOT 
open; otherwise it points to an allocated data structure describing the 
database.  


Defining the record structure
=============================

Now all items within a record are defined.  The first word in a line is the 
item's name.  You may later get information about the item: each item has a 
name, a size, possibly a type, and an ordinal number giving its position in 
the record (for example, the 5th item in a record).  

To make item handling more convenient for the user, an ITEM OBJECT called 
DATABASE:ITEMNAME is created in the current vocabulary for each item.  In our 
example there would be 3 item objects: 
    words:address
    words:wordname
    words:vocname

Each item name is followed by one or two parameters.  Normally this is a 
decimal number giving the item's size in bytes/address units.  Items that have 
just a size lack one thing: the database system can NOT do any more advanced 
operations on them.  To be indexed for faster sorted access, an item must also 
have a type.  

Currently 3 item types are supported; more could be added easily: 

/INT defines a cell wide signed integer 

/DINT defines a signed double integer 

/STRING 31  defines a 'counted string' with 31 characters plus count byte 

database> finishes the database declaration.  

IMPLEMENTATION RESTRICTION: As RISC OS filenames can't be longer than 10 
characters, a database can't have a longer name than that.  Don't expect 
upper/lowercase letters to be significant.  Use only characters A-Z/a-z/0-9 
for database and item names.  
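
As a worked example of the size limit given at the start of this chapter, the 
record size of the WORDS database can be added up at the keyboard.  This is 
only a sketch: it assumes a cell is 4 bytes, as on a 32-bit ARM, and it counts 
the /STRING 31 item as 32 bytes (31 characters plus the count byte).  

    decimal
    4 32 + 32 +  .          \ /int + /string 31 + vocname  = 68 bytes
    2147483647 68 /  .      \ (2^31 - 1) / 68  = 31580641 records

Under these assumptions the limit 2^31 > record-size * number-of-records 
allows about 31 million records of this shape.  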


Database operations
===================

After defining the database you will want to do some operations on it.  First 
you must open-database before you can put any data into it.  The following 
example puts all words from the context vocabulary into such a database.  

Whenever you want to extend the database you have to do new-records. 

: include-current
        Data:words open-database
        astring 0 locals| rec buff |
        context token@ follow
        begin   another?
        while   1 Data:words new-records        \ extend the database by one record
                Data:words records# 1- is rec   \ set the current record#
                \ copy data to buff and set the database items
                dup buff !                      rec buff is words:address
                buff "copy                      rec buff is words:wordname
                context token@ >name buff "copy rec buff is words:vocname
        repeat
        Data:words close-database ;
forth include-current
hidden include-current

INCLUDE-CURRENT puts all words from the context vocabulary into the database.  
The begin ..  while ..  repeat loop threads through the vocabulary; for each 
word found, a new record is constructed and its items are set.  
    rec buff is words:wordname
puts the data found at buff into the object WORDS:WORDNAME within record rec.  

You may use this database to search for an unknown word.  
    : test blword astring "move astring >r searchin words:wordname
           nip r> words:wordname ". ;

TEST SWA will look for the first matching word beginning with SWA in the 
database.  
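
Once the database has been filled, its dimensions can be inspected with the 
descriptor words from the glossary below.  The following is a minimal sketch 
using only documented words; WORDS-INFO is a hypothetical name chosen for this 
illustration, and the exact output of .database is not shown here.  

    : words-info ( -- )
            Data:words open-database
            Data:words records# .  ." records of "
            Data:words recsize# .  ." bytes, "
            Data:words items# .    ." items each" cr
            Data:words .database            \ print general information
            Data:words close-database ;

The open-database / close-database pair brackets all other calls, because 
items#, records# and recsize# require an open database.  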


Glossary
========


____ <database      ( --  )        'name'                   
<database name starts the definition of a database.  It is followed by pairs 
of words each defining one ITEM .  

    itemname itemsize
    Name 32

defines an item called Databasename:Name with a size of 32 chars. 

Instead of the decimal number describing the size of an item, there may also 
be one of the following keywords: /INT is a single-cell signed integer, /DINT 
is a double-cell signed integer, /STRING XX is a "counted string" with a size 
of xx characters plus count byte.  

database> ends the definition of the database.  
____

____ database>      ( --  )                                 
ends a database definition.  

See: <database 
____

____ open-database  ( descriptor --  )                      
The database defined by the descriptor is opened for use.  

See: <database close-database 
____

____ close-database  ( descriptor --  )                     
The database defined by the descriptor is closed.  This ensures that all data 
and index files are written to disc, the allocated memory is released and the 
file handles are closed.  

It is very important to do this after using a database, as this updates the 
whole STRUCT file.  If you don't close the database, no corrupt files are 
left, BUT new-records are NOT included and valid indexes are NOT marked as 
valid.  

It would also be possible to do the updating while the database is in use, but 
as files are used for indexes and the descriptor, it would definitely slow 
down the process.  


See: open-database 
____

____ items#         ( descriptor -- items  )                
finds the number of items in a database.  The database must be opened before 
use.  

See: open-database 
____

____ records#       ( descriptor -- records  )              
finds the number of records in a database.  The database must be opened before 
use.  

See: open-database 
____

____ recsize#       ( descriptor -- size  )                 
finds the size of a record in a database.  The database must be opened before 
use.  

See: open-database 
____

____ ?find-item     ( str descriptor -- item# true | false  )   
looks for an item called STR in the database.  If it is found, its item# and 
true are returned, otherwise false.  The database must be opened before use.  

See: open-database 
____

____ index->record  ( idx item descriptor -- record  )      
calculates the record number from the index number IDX for item ITEM in the 
database DESCRIPTOR.  This requires a valid index file.  If none is available, 
an index file is generated, provided the item has a supported type such as 
/INT, /DINT or /STRING NN.  

The index generation is rather complex.  It uses a HEAP-SORT mechanism and 
works on all database files.  It runs in one of two modes: 

1 - if enough memory is available, the complete database plus index file is 
loaded into memory.  This is more than 100 times faster than mode 2.  

2 - otherwise, all comparing and swapping is done in the file.  This allows 
BIG databases to be indexed, but it is very slow.  


See: open-database 
____

____ get-record     ( record addr descriptor -- addr  )     
writes the contents of a database record to memory addr 

See: store-record swap-records 
____

____ store-record   ( record addr descriptor -- addr  )     
writes the contents at memory addr to a database record 

See: get-record swap-records 
____

____ swap-records   ( record1 record2 descriptor  --  )     
swaps two records in a database 

See: get-record store-record 
____

____ new-records    ( n descriptor --  )                    
creates n empty records in the database 
____

____ get-item       ( record item addr descriptor -- addr  )   
writes the contents of a database item to memory addr 

See: store-item 
____

____ store-item     ( record item addr descriptor --  )     
writes the contents at memory addr to a database item 

See: get-item 
____

____ search-item    ( parameter item descriptor -- index record  )   
Does an indexed search for PARAMETER in the given ITEM across the records of 
the database.  Depending on the item's type, PARAMETER can be an integer, a 
double-integer or a counted string.  

See: <database get-item index->record 
____

____ searchin       ( parameter --  ) 'item-object'         
This is a method to do a search for parameter on an item object.  The database 
must be open and the item-object must be an indexable type.  

See: <database open-database 
____

____ indexed        ( index -- record  ) 'item-object'      
This is a method to calculate the record number from an index on the item 
object.  

See: <database open-database 
____

____ .database      ( descriptor --  )                      
print some information about the database.  

See: open-database 
____
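
The record words above can be combined into small utilities.  The following 
sketch uses only words from this glossary plus standard stack words; the names 
COPY-RECORD and LAST-TO-FRONT are hypothetical.  It assumes the database is 
already open and that there is room for one record at HERE, which serves as a 
scratch buffer.  

    : copy-record ( from to -- )
            swap here Data:words get-record     ( to addr )
            Data:words store-record drop ;      \ drop the returned addr

    : last-to-front ( -- )
            0  Data:words records# 1-  Data:words swap-records ;

For example, 0 5 copy-record overwrites record 5 with the contents of 
record 0.  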

Item Objects
============

In the first part of this chapter the item objects were introduced.  We 
defined an item object called 'words:wordname'.  This object has 5 methods: 

Method 1 
    words:wordname ( record addr -- addr )
does a get-item on record and writes its content to addr.  

Method 2 
    to words:wordname  ( record addr -- )
does a store-item on record and writes the content at addr to the item.  

Method 3 
    addr words:wordname
returns the object's address.  A warning is printed, as this function is 
probably of no real use to the user.  

Method 4 
    searchin words:wordname ( parameter -- idx record )
searchin does an indexed search in the database for a given parameter.  Both 
the index number and the record number are returned.  If the found item is 
equal to the parameter, index and record refer to that item; if nothing 
matches exactly, the next higher index is given.  The database always knows 
whether its index files are valid; if an index is invalid, reindexing is done 
before the search.  If enough memory is available for the whole database file 
and the index, the reindexing is done in RAM, which is pretty fast.  Otherwise 
all indexing is done in the file, which is VERY slow for large databases but 
in principle allows databases to be very large.  If you have a very large 
database, you could split it into several files with smaller records, or you 
could supply a preindexed database and use a second database for user-added 
records.  

Method 5 
    indexed words:wordname ( index -- record )
indexed calculates the record number from an index on the item object.  So you 
could use the phrase: 
    100 0 do i indexed words:wordname here words:wordname ". loop
to list the first 100 alphabetically sorted words in the database.  
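
Methods 1 and 2 can be combined in the same way.  The sketch below reads two 
items of one record and overwrites a third; .WORD and CLEAR-ADDRESS are 
hypothetical names.  It assumes the database is open, uses HERE as a scratch 
buffer, and stores with IS in the same form as INCLUDE-CURRENT does.  

    : .word ( record -- )
            dup here words:wordname ".      \ print the stored name
            here words:address @ . ;        \ print the stored address

    : clear-address ( record -- )
            0 here !                        \ build the new value in memory
            here is words:address ;         \ store it into the item

0 .word then prints the name and address held in the first record.  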

