![]() | ||
| Database | ![]() ![]() ![]() ![]() | Introduction |
SIR2002 database management is covered under a number of topics:
Many of the features are available through the menus and dialogs which are the normal way to manage a database on a regular basis.
SIR2002 provides a concurrent environment if multiple people need to update a database at the same time across a network.
A database is made up of one or more record types. Each record type contains variables. Each record type may have one or more variables defined as keys. Keys are used to identify each individual record.
A record may have secondary indexes allowing access by VisualPQL and PQLForms through values in non-key variables.
The definition of the database consists of overall information about the database plus definitions for each record type with definitions of each variable and any secondary indexes. This is collectively known as the Schema. Records and variables are retrieved by name and applications do not need to know anything about the physical organisation of data. i.e. Applications are independent of the physical structure of data. Every application that accesses data from a database uses the schema which means that the information is always verified and conforms to the schema definition. An application can access any record type within a database and can access multiple record types for complex processing.
The schema can be defined through interactive dialogs or by creating a set of commands and running them in a similar way to a running a program. The resulting database is exactly the same regardless of the way that it is defined.
The database definition can be modified even after the data has been loaded without, in many cases, having to
There is a second method to store data known as Tables. A table holds a single record type and is independent of any database. Tables are held on tabfiles. A tabfile can hold many tables. An application can operate on many tables and many tabfiles.
Some databases have a natural structure known as a Case structure. For
example, a medical database might hold information about patients; there could be
a number of record types such as patient demographics, visits, procedures, followups, etc. most with multiple occurrences but all of the information about a single individual makes up a case. i.e. A case is a set of records which all refer to one single entity.
A case structure can be easier to use. Queries are simpler to write and less
error prone within a case. A case structure can also be faster. Since all of the
information for any given case is available with a minimum of searching, access
is very fast.
A case structure can be found in many applications. The following guidelines may
help determine whether a particular set of data has a case structure.
The main index which SIR2002 maintains to locate the records is built from the key and the key determines how the record is physically stored in the database.
Because the keys go into the index, they are relatively more "expensive" to maintain than non-key variables. Do not declare more keyfields than needed to make records unique or to define relationships. Whenever there is a choice as to which variables to use for a key, use variables which are short and well defined. Use numeric codes wherever possible, especially in place of long alphabetic strings. Using strings as keys leads to misspellings and possible confusions as well as being inefficient. Avoid floating point numbers as key fields.
Secondary indexes can be used to retrieve data in other sequences.
The most efficient retrieval of a single record is one in which the whole key is specified. Since there is an index built from the key, the record can be directly retrieved.
Records can also be located from just part of the key. The higher level key variables must be specified to find records efficiently. Higher level keys should be the ones that are more likely to retrieve by.
Variables are always referred to by their name. Descriptive names are usually best. For example, a variable called '
Within a record type, variable names must be unique. Variables in different record types may have the same name. Beware of using the same name for different things in different record types in the same database. For example, it would be a mistake to use the name '
A categorical variable is a character string which has a limited number of values specified as an ordered list. When the variable is input as a string, it is compared to the list and the number which corresponds to the matching position in the list is stored instead of the value. This has the advantage that only valid entries are held and considerable space is saved. In programs and reports, the full string is displayed and retrieved.
For example, a categorical variable might be a list of the names of American states. If 'Alabama' were the first entry in the list, a variable entered as 'Alabama' results in '1' being stored.
The list is held in the data dictionary and is searched sequentially. It is a very simple and easy to use facility for short lists which are not updated very often. If there are hundreds or thousands of entries, or there is more information about each entry than just the name, or users need to modify the entries, use tables or record types with indexes to store this type of reference data.
The software handles any scaling issues involved. For example, if a scaled integer has a scaling factor of 2 (hundreds), and it is computed equal to 50 x 4, the database stores a 2. If the variable is printed, 200 is output. If used in another calculation, it would be 200. For all practical purposes it has the value 200, except it saves storage space.
The major limitation on scaled integers is that the maximum integer value is 2,147,483,643. If scaled to a negative power, this may not be large enough. For example, scaling to -2 for money, gives a maximum value of 21,474,836.43. (For larger money values, use whole cents in a double precision floating point variable:R*8)
For most data definition purposes, specify how the variable looks externally and the appropriate internal format is created automatically. However it is as well to understand these transformations.
To input a negative number (whether integer or floating point), simply precede the number with a minus sign (-).
For example, a variable might have two valid values such as 'Y', and 'N' meaning 'Yes' and 'No'. A further three missing values could be defined such as 'X' for 'Not relevant', 'R' for 'Refused to Answer' and 'Z' for 'Invalid Answer'. When producing statistics on that variable, only the Y and N answers are included as the others are defined as missing values.
Blanks may be declared as a missing value. If a numeric field is blank on input and blank has been defined as a missing value, the variable is stored as missing. If blanks are not declared as a missing value for the variable, zero is stored when a numeric field is blank on input.
Some existing computer files may not have explicit decimal points but may have an implied decimal point at a given position. For example, a format of F4.1 or a scale of (-1) means that the number in these four positions has one decimal place if an explicit decimal point is not quoted. If this field contained "0012", it is read as "1.2", however if it contained 1.234, then 1.234 is the value that would be stored.
The decimal positions describe an implicit input format and an implicit output format. They do not describe the maximum number of decimal positions that can be stored in a field. If a number is defined as D10.0, it can still be computed equal to "0.12345" and .12345 is stored.
On some schema definition commands (
Overview
There can be any number of different databases for different applications. A SIR2002 session can have any number of databases connected at any one time and one of these is the current or default database. Any database operations happen on the default database. SQL, Forms, PQLForms and VisualPQL can access multiple databases.UNLOAD and
RELOAD data.CASE Structured Databases
SIR2002 is a true relational database and databases can be created which are simply sets of record types. However SIR2002 also allows databases which can take advantage of structure in the data:The Common Information Record
In a case structured database some information is maintained about each case. This is held in the Common Information Record or CIR. The CIR is a summary of data about the case. It holds the case identifier, counts of records belonging to the case and other specified common variables. It typically holds values which only occur once in the case although it can hold a copy of the last value entered for variables which occur multiple times in the case.CASELESS Databases
Records in a caseless database have no single common element that relates them. If a single database has various records types which are each independent from the other (e.g. Parts, Customers, Employees), then it should be a caseless database or perhaps the various entities should be in individual case structured databases.Keys
Each record in a SIR2002 database is unique and that uniqueness is defined in terms of the values of keys. A record may have one or more variables which make up the key. No two records in the database can have the same combination of record type and key values.Order of Variables in Keys
In keys made up of multiple variables, the order in which the variables are
specified in the key is important.Joining Records with Keyfields
Data from two record types can be joined by using a keyfield that is common to both record types. If keyfields in different record types have the same name and are in the same order, these record types are implicitly joined. A record type with one extra keyfield defines a hierarchy. For example, one record type may have CUSTNO as the key, a second record type CUSTNO and ORDERNO, a third record type CUSTNO, ORDERNO, LINENO, etc. These can go down through many levels of record types if needed, and the implicit joins
make retrievals simpler and more efficient.Defining Variables
Every variable in a database has a name and a structured definition. Whenever a variable is entered into the database, it is checked to ensure that it conforms to the definition.Variable Names
Database variable names are up to eight characters, begin with a letter and can contain letters, numbers and four special characters (dollar sign $, hash sign #, at sign @ , underscore _ ). Letters in names are translated to uppercase. (These same naming conventions apply to all database names such as record names and passwords.)PHONE' is easier to remember than one called 'VAR112'.DATE' for variables when they are different dates, such as Birth Date, Visit Date, etc.Variable Data Types
Every variable is one of three basic data types:INTEGER
FLOATING POINT
STRING
Variable Special Data Types
Although every variable is stored as one of the three basic types, there are several special data types which automatically translate information as they are entered or retrieved. These are:CATEGORICAL
DATE
TIME
SCALED
Variable Formats
Variables can be defined in terms of an external format. These formats are:Aw
Iw
Date 'format'
Time 'format'
Fw.n
Variable Size
The size of a variable is defined by either the external format or internal storage type:Variable Internal Formats
Variables are stored as:I* 1 | 2 | 4
I*1 have a range of -128 to +123;
I*2 have a range of -32,768 to +32,763;
I*4 have a range of -2,147,483,648 to 2,147,483,643.R* 4 | 8
STRING*n
Variable Quality Control
Whenever variables are entered, they are checked to make sure that the input conforms to the defined data type, size and format. Several checks may be specified:Valid ValuesVariable RangesMissing ValuesBatch Data InputVariable Labels
Variables on screen displays and printed output can be displayed with the variable name or with an optional, longer label. Names as are used as labels if there is no further description.Value Labels
Labels can be defined for specific values of variables. For instance, for a variable Gender, the value 1 may have the label 'Male' and the value 2 may have the label 'Female'. These labels may be used in reports, etc. but generally, internally within programs, the numeric values are used.Decimal Points
When reading numbers from external files or specifying numbers in definitions or programs, an actual decimal point "." can be present in the number. In definitions and programs, if the decimal point is omitted, it is assumed at the right of the number. For example:
COMPUTE A = 1.2
COMPUTE B = 100VALUE LABELS, VAR RANGES, MISSING VALUES etc.), numbers can be specified. If the number has a decimal component, specify an explicit decimal point and the actual value regardless of any input definition of implied decimals. For example:
MISSING VALUES RESULT (99.99)


