![]() | ||
| VisualPQL Procedures | ![]() ![]() ![]() ![]() | Tabulate |
TABULATE procedure selects and summarises data.
It then produces cross-tabulations with summary data, statistics or percentages.
See syntax.
A wide variety of cross-tabulations can be produced:
Syntax
TABULATE
FILENAME = { filename } /
expressions /
Record Filtering
BOOLEAN = (boolean expression)
SAMPLE = fraction, seed
WEIGHT = varname
Cell Statistics
TOTAL = varname ['label']...
COUNT = varname ['label']...
CSS = varname ['label']...
CV = varname ['label']...
CVERR = varname ['label']...
MAXIMUM = varname ['label']...
MEAN = varname ['label']...
MEDIAN = varname ['label']...
MINIMUM = varname ['label']...
MISSING = varname ['label']...
NORMALIZED = varname ['label'](n1)['label1'],(n2)['label2']...
PERCENT = varname ['label']...
QUANTILE = varname ['label'] n ['label1','label2'...]...
QUANTILE = varname ['label'](n1)['label1'],(n2)['label2']...
RANGE = varname (lo,hi) ['label']...
STDERR = varname ['label']...
STDEV = varname ['label']...
TSTATISTIC = varname ['label']...
USS = varname ['label']...
ISDNUMBER = n
Data Print Formatting
PRINTFORMATS= varlist (option)...
Page Formatting
PAGETITLE = 'string'
PAGELENGTH = n
PAGEWIDTH = n
COLLAPSE
Header Formatting
HEADERWIDTH = n
HEADERINDENTATION = n
HEADERDIVIDER = 'character'
NODIVIDERS
NOHEADERCENTER
Stub Formatting
STUBTITLE = option 'string'
STUBWIDTH = n
STUBINDENTATION = n
STUBCONTINUATION= n
STUBHEIGHT = n
STUBFILLER = 'character'
STUBDIVIDER = 'character'
Wafer Formatting
WAFERTITLE = option 'string'
NOWAFERTITLE
WAFERCENTER
WAFERSTART = n
WAFERINDENTATION = n
WAFERCONTINUATION = n
WAFERHEIGHT = n
Other Options
HTML
NOBOTTOMBORDER
NOLEFTBORDER
NORIGHTBORDER
NOTES= 'string'
FOOTNOTES= 'string'
WAFERNOTES= 'string'
NOZEROS
ZEROCHARACTER= 'character'
NOEMPTYROWS
DELETEMISSING
UPPERCASE
SPANNERS
DUMMY = varname 'label' ...
LANGUAGE = DUTCH | GERMAN
The FILENAME and the header expression
are the only required clauses.
FILENAME
|
Specify the filename
created by the procedure. If a filename is
not specified on a subsequent TABULATE procedure, the output
is appended to the last TABULATE output file.
If the filename is the special name
|
BY or THEN.Percentages can also be used in expressions.
The type of variable, Variable Modifiers, and statistics all effect the Cell contents.
header expression
| Defines the columns of the tabulation. This clause is required. |
stub expression
| Defines the rows of the tabulation. If this clause is specified, it precedes the header clause and is separated from the header clause by a comma. |
wafer expression
| Defines the sections of the tabulation. Sections are a number of two dimensional sections specified by the stub and header. If this clause is specified, it precedes the stub clause and is separated from the stub clause by a comma. |
| BY | Nest a variable within another thus producing combinations of the variable values within one dimension of the table. The asterisk (*) symbol may be used instead of the keyword. |
| THEN | Appends a second variable or statistical keyword to the set of variables displayed within one dimension of the table. The plus (+) symbol may be used instead of the keyword. |
| BY and THEN | Can be used in combination to create complex structures within a single dimension or expression. |
Errors may occur, not only if the table is misspecified, but also if it cannot fit the output page.
Expressions
A table consists of up to three components or dimensions: header, stub
and wafer. Each dimension is defined by an expression. Each expression
is identical in format but applies to a different dimension of the table. These
expressions name the
variables
to tabulate and their relation to each other in
the table. Separate the expressions with commas. The dimension defined by an
expression is determined by how many expressions there are and the expression's
position in the list.
The header defines variables whose values correspond to columns; the stub defines variables whose values correspond to rows. The rows and columns define a two-dimensional array called the wafer. The wafer is a higher level categerisation resulting in multiple individual row/column combinations. The table description consists of one to three expressions.
The TABULATE command has the general form:
TABULATE [ [wafer_express,] stub_express,] header_expressDefine an expression with variable names and keywords. At the most basic an expression consists of a single variable name. Multiple variables or statistics may be defined in a single expression separated with the keywords
BY and THEN. BY specifies
a level of nesting within a single dimension; THEN specifies
multiple variables within a single dimension.The structure of the table is defined by the values of the control variables specified in the various expressions. The values in the variables named in these expressions are used as the labels for the particular dimension being specified. The number of categories of any control variable is taken from the schema definition which contains information on variable ranges, valid values and value labels.
The body of the table consists of Cells. The content of a cell is a summary of data which applies to that particular combination of variables in the expressions. This might be a count of occurrences of data, a sum of values, a percentage or some other statistic.
If the command has a single expression, (i.e. the tabulate statement contains no commas), it is a header expression and a table containing a single value for each column is produced. That is one line of data is produced with one entry per column. For example:
TABULATE REGION TABULATE AGE BY SEX TABULATE SALARY BY (MIN THEN MEAN THEN MAX)If the tabulate statement consists of two expressions separated by a comma, a two-dimensional table is produced. The leftmost expression defines the stub or rows of the table going down the page. The second expression defines the header dimension or columns going across the page. For example,
TABULATE REGION, RACE TABULATE REGION, RACE THEN SEX TABULATE REGION BY RACE, SALARY BY (MEAN THEN MEDIAN)By specifying variables in two expressions, a cross-tabulation is produced with one cell for each value of each variable. For example, if a cross tabulation for two variables is specified, where each variable has two values, four cells are produced. The more values, the more cells.
Nesting variables with the BY operator also produces a
cross-tabulation within a single dimension. If the same variables are used as a
stub and header or as a header with a BY clause, the same
cells are produced but the table has a different shape.
If the tabulate statement contains three expressions separated by commas, a set of two-dimensional tables is produced. The two rightmost expressions of the tabulate statement define the stub (rows) and header (columns) for each wafer. The leftmost expression controls the number of wafers produced. When more than one wafer is generated a table of contents is also produced. If a particular wafer contains no data, it is not printed and is marked deleted in the table of contents. If only one wafer is produced, it is printed even if it is empty. For example:
TABULATE REGION, AGE, SEX TABULATE REGION BY RACE, AGE, SEX TABULATE COUNT THEN PCT, RACE BY SEX, AGE BY REGION
SEX BY AGE BY INCOME
When two variables are joined by
AGE THEN INCOME BY SEX
The next example concatenates
Other combinations might be:
AGE BY INCOME THEN SEX
AGE BY (INCOME THEN SEX)
Nesting and concatenation can also be used in the stub and wafer expressions.
Tabulate determines the tabulation type of a variable from the information in
the variable schema map according to the following rules:
If a control variable contains
missing values, the record is normally excluded
from the body of the tabulation. Special categories can be generated to show
cells for missing control variables. If an observation variable contains missing
values, the record is normally included but the variable is not computed into
the cell contents. This can give rise to situations where overall totals do not
agree with counts in the body of the table. Specify the
Control variables provide
classifications which determine the structure of the table and the presence of a
particular cell. Observation variables give the result that appears in the cell.
When two control variables are cross tabulated or nested, the content of the
cell is a count of the number of occurrences of data which match that
combination of values.
When an observation variable is cross tabulated or nested with a control
variable, the total (sum) for the observation variable is the default. That is, the
structure of the table is unaltered by the inclusion of an observation variable
but, instead of a count of data appearing in the cell contents, a total of the
observation variable is calculated and an appropriate header label printed.
Other statistics can be specified for observation variables.
Two observation variables cannot be cross tabulated or nested within each other
and therefore observation variables may be specified in one and only one
expression (dimension) of the
Different types of information are reported in the cell contents for Control and
Observation variables. For Control variables, which have discrete values,
counts or percentages are reported. For Observation variables, which have
continuous values, a variety of cell statistics are reported, the default being
a sum of the values. For any cell that cross tabulates a control and an
observation variable, the observation variable statistics are reported.
REGION,SALARY BY AGE BY SEX
AGE, SEX THEN (SALARY BY SEX)
REGION,AGE BY (SEX THEN SALARY)
The
The
When a control variable has a missing or undefined value, it is normally
excluded from the detail cells in the table. Specify the
By default, the rows and columns of a table are
labeled with value labels (if they exist) or with the "variable=value"
notation. The
REGION,AGE BY SEX
REGION.S,AGE.S BY SEX.S
The
There are two methods of specifying statistics.
The first method is to specify a pseudo-variable in the expression and to use one of the statistical option
clauses to specify the statistic and label to produce in the form:
The second method is to use a statistical keyword directly.
In effect, these are pre-defined, convenient pseudo-variables
with standard labels.
(This method cannot be used with
BY
Specifying BY between two variables nests each of the values
of the second variable in each of the values of the first variable. If a second
BY is specified, a third level of nesting takes place.
BY defines a structure within a particular dimension. Each
variable specifies a further level of nesting. The number of cells produced
is the product of the number of values for each variable. There is no limit to
the number of levels allowed within one expression. The asterisk (*)
symbol can be used instead of BY.Example: BY
AGE BY INCOME
Young Old
Rich Poor Rich Poor
6 4 8 3
Male Female
Young Old Young Old
Rich Poor Rich Poor Rich Poor Rich Poor
4 1 6 2 2 3 2 1
THEN
Specifying THEN between two variables or two clauses in effect
concatenates two independent tables with all values of the first table followed
by all values of the table. The plus (+) symbol can be used instead of
THEN.Example: THEN
The following header expression results in the independent tabulation of
individual occurrences of the two values for each of the two variables:AGE THEN SEX
Young Old Male Female
10 11 13 8 THEN, each value of each
variable defines a cell. The number of cells produced is the sum of the number
of values for each variable.
Combining BY and THEN
BY takes precedence over THEN
when the two operators are used in combination in an expression. Use
parentheses to specify a different precedence.Example: BY and THEN
The following examples show various combinations of the same variables in a
header expression. In the first example, since nesting takes precedence,
SEX is nested with INCOME, and the result of
that nesting operation is concatenated with AGE.
Young
Old Rich Poor
Male Female Male Female
10 11 10 4 3 4 AGE and INCOME
and nests SEX within both AGE and
INCOME, using parentheses to cause the concatenation operator
to be applied first.
(AGE THEN INCOME) BY SEX
Young Old Young Old
Male Female Male Female Male Female Male Female
5 5 8 3 10 4 3 4
Young Old
Male
Female
Rich Poor Rich Poor
6 4 8 3 13 8
Young Old
Rich Poor Male Female Rich Poor Male Female
6 4 5 5 8 3 8 3
Variable Types
Tabulate uses numeric variables only. This includes categorical variables,
which are numeric codes representing discrete string values.
In a VisualPQL program, string variables can be
recoded into numeric
categorical values.
There are two types of numeric variables;
variables with discrete categories, known as
CONTROL variables
and variables with continuous values, known as
OBSERVATION variables.
Most variables are either control or observation. For example, salary would be
an observation variable and department would be a control variable. However,
some variables may be reasonably used in either way. A variable containing
number of persons per family might be used as a control variable in one table
but might be aggregated in another table along with a count of families, in
order to get the average persons per family. Such a variable can be defined as
observation type in the VisualPQL program or database and then reclassified in the
CATEGORICAL, or have
VALID VALUES
or have
VALUE LABELS are
control variables. All other variables are observation variables.CONTROL VARS or OBSERVATION VARS in the data
dictionary. and OBSERVATION
VARS commands in a VisualPQL program assign an explicit tabulation type to
local variables created in the program.CONTROL variables
must have ranges specified.TABULATE statement as a control variable.DELETEMISSING keyword to exclude records where any
cross-tabulated variable has a missing or undefined value.
Cell Contents
Different types of variables combine to give different cell contents and
a different table.TABULATE statement.Example: Observation Variables
In the following three examples, the observation variable SALARY is
included in the expression. The number in any given cell is the sum of the
SALARY values from each record having the combination of
values for the control variables for that particular cell. The placement of the
observation variable in the header expression determines the placement of the
label, but does not alter the content of the table. The observation variable
does not alter the number of cells. Observation variables and counts may be
specified in a single expression in various ways:
SALARY
Young Old
Male Female Male Female
North..........
South.......... 2800
2200 5650
11700 20100
49503200
5000
Male
Female Current Monthly Salary
Male Female
Young...
Old..... 5
8 5
3 13050
21900 14450
8200
Young Old
Male
Female Current Monthly Salary
Male
Female Current Monthly
Salary
North....
South.... 3
2 2
3 13600
13900 6
2 1
2 20150
9950
Variable Modifiers
Modifiers change the attributes of a variable in an expression.
These have the form:
variable[.modifier][.modifier]...
Several modifiers can be appended to a single variable. The modifiers and their
abbreviations are:
.CONTROL (.C)
.OBS (.O)
.FIRST (.F)
.TOTAL (.T)
.MISSING (.M)
.UNDEFINED (.U)
.SPAN (.S)
.NOSPAN (.NOS)
The .CONTROL and .OBS modifiers alter the
default tabulation type of a variable (i.e. control or observation). The
.CONTROL modifier may only be specified for an observation
variable if valid ranges have been specified for it..FIRST modifier is used in conjunction with the percent
base marker as described in the section on percentages. It specifies that the
first occurrence of the percent base marker is used..TOTAL modifier automatically concatenates a
TOTAL control variable as described in the section on
TOTAL. The TOTAL is displayed before
the variable being modified..MISSING or .UNDEFINED modifiers to generate additional cells in the
table. The .UNDEFINED
modifier groups all missing and undefined values into a single cell. The
.MISSING modifier displays undefined and each defined missing
value as separate cells. These modifiers override any DELETEMISSING
specification for that variable.Example: Undefined Modifier
AGE.T.U
TOTAL Young Old AGE=
UNDEFINED
22 10 11 1 .SPAN modifier generates a second level of
heading for a variable. This heading is either the variable label (if it
exists) or the variable name. This new heading "spans" all the categories
of the variable, hence the name .SPAN.
Example: SPAN Modifier
The next two tables illustrate the effect of .SPAN. The only
difference between them is the .SPAN modifier in the second
table.
Young Old
Male Female Male Female
North..........
South.......... 3
2 2
3 6
2 1
2
AGE
Young Old
Gender
Male Female Male Female
REGION
North..........
South..........
3
2
2
3
6
2
1
2SPANNERS option can be used to turn on
spanning labels for all variables. The .NOSPAN modifier turns
off spanning labels for a particular variable when the
SPANNERS option is in effect.
Statistics
Other than TOTAL,
statistics are used with observation variables.
Statistics are specified with the BY keyword
to associate them to an observation variable.
keyword = pseudo-variable 'label'
The 'label' is optional but the variable name is required.NORMALIZED, PERCENT,
QUANTILE and RANGE, since additional data is
required on the specification.)
The statistical clauses are:
COUNT | A count of records in this category. |
CSS | Corrected Sum of Squares, where the cell = sum of squares - ((sum**2)/n). |
CV | Coefficient of Variance, where the cell = (standard deviation/mean) * 100. |
CVERR | Coefficient of Error, where the cell = (standard error/mean) * 100. |
MAXIMUM | Maximum value of the variable. |
MEAN | Mean value of the variable. |
MEDIAN | Median value of the variable. |
MINIMUM | Minimum value of the variable. |
MISSING | Count of missing values in the variable. |
| NORMALIZED * |
Normalised quantiles (by median value) where cell = (quantile/median) * 100. |
| PERCENT * | Percentage, where cell = (cell/base) * 100. |
| QUANTILE * | Quantile, equal or unequal quantiles can be specified. |
RANGE *
|
Range produces the range of the variable expressed
as the difference between the lowest and the highest values encountered.
The format of the range clause is:RANGE = variable (lo,hi) ['label']The specification of the lo and hi values defines the outer limits of any values to be included in the calculation. This can be used, for example, to exclude negative or zero values or very high numbers used as some coding convention. |
STDERR
| Standard Error where cell = standard deviation/sqrt(n). |
STDEV
| Standard Deviation about the mean. |
| TOTAL | Count of records. When cross tabulated with an observation variable, gives sum of values of the observation variable. |
TSTATISTIC
| T-Statistic where cell = mean/standard error. |
USS
| Uncorrected Sum of Squares where cell = sum of squares. |
TOTAL clause specifies that the pseudo-variable N is a
total, with the label 'Valid Cases'. The MEAN and
STDEV clauses assign labels for these pseudo-variables to 'X
Bar' and 'Sigma'.
TABULATE SEX, N THEN SALARY BY (MEAN THEN STDEV) /
TOTAL = N 'Valid Cases' /
MEAN = MEAN 'X Bar' /
STDEV = STDEV 'Sigma' /
| Valid Cases | Current monthly salary | ||
| X Bar | Sigma | ||
| Male........... Female......... | 13 8 | 2688 2831 | 486 746 |
The pseudo-variables created can be used anywhere that is valid. If the
pseudo-variable is a TOTAL it is a control variable, otherwise
it is an observation variable.
Multiple pseudo-variables can be created and assigned to the same keyword. In
the next example, both N and SUM are totals,
N is the count of records and SUM is the
total SALARY.
TABULATE SEX, N THEN SALARY BY (SUM THEN MEAN THEN STDEV) /
TOTAL = N 'Valid Cases' SUM 'Sum' /
MEAN = MEAN 'X Bar'/
STDEV = STDEV 'Sigma' /
| Valid Cases | Current monthly salary | |||
| Sum | X Bar | Sigma | ||
| Male........... Female......... | 13 8 | 34950 22650 | 2688 2831 | 486 746 |
| Current monthly salary | ||
| MINIMUM | MAXIMUM | |
| Male........... Female......... | 2000 1650 | 3600 4000 |
Keywords must be nested within an observation variable.
Keywords may not be nested within each other; they may be
concatenated. For example, SALARY BY MEAN BY STDEV is not
allowed; SALARY BY (MEAN THEN STDEV) is allowed.
ISD number
.
This number is initially set to 4
which means that the maximum relative error in a median or quantile estimate
should not exceed 12.5%. The value computed is usually more accurate if the
values are distributed smoothly. The ISD number can be set explicitly by using
the clause:
ISDNUMBER = n /which sets the
ISD number to n. Increasing the ISD
number by 1 cuts the maximum relative error in half while doubling the internal
storage required to compute the medians or quantiles. Whenever it is used, the
ISDNUMBER clause must precede the clauses it is to affect.
For example,
TABULATE TOTAL THEN REGION, SALARY BY (MEAN THEN Q) /
ISDNUMBER = 5 /
QUANTILES = Q 'Quantiles' 4 /
The control variable,
SALARY THEN TOTAL
SALARY BY AGE BY (TOTAL THEN SEX)
Since
In the first form, the quantity "n" specifies the number of equal-sized
quantiles to be produced. Of these, n-1 are printed. Thus for n=4,
A specification in the first form , with multiple labels:
A specification in the second form:
Tables can be specified which contain row percentages, column
percentages or overall percentages. Multiple percentages can be displayed in a single
table. A single table can have percentages computed from more than one base
value. However a table cannot have any cell which is an intersect of two
percentage calculations. This means that all percentage cells must appear in a
single expression; there cannot be percentages in both a row and column
expression.
To display percentages, specify where the percents appear and the base of each
percent calculation. Use the
A brief review of .T and %X notation: An expression such as
The basics of producing percentages in tables are covered in
Simple percentages.
You can also get
percentages with observation
variables and tables with
values and percentages.
Tables can have
percentages
for row totals and can be displayed in different
wafers.
There are numerous examples of different tables using percentages in
Percentage examples.
TOTAL THEN AGE,TOTAL THEN REGION
To add percentages to this table, define a pseudo-variable, say X, and reference
this on
Decide where to place the percent variable X, ensuring that it is nested
correctly. Specifying it in the header expression or in the stub expression
only affects the labeling of the table.
The total (100%) and regional percentages for each age group (row) are displayed
in the table. The table contains only percentages because the percent variable,
X, is at the highest level of the header expression nesting. The remainder of
the header,
Moving X to the stub expression changes the labeling of the table but not its
contents:
In the example, move the base marker,
As in the preceding table, the percent variable,
In this table, all percentages are measured using the first cell of the table as
the base. This cell is at the intersection of the stub
The corresponding percentage table is:
Moving the clause
The following has alternating columns of original values and percents. Note
only the cells nested within the percent variable X contain percents.
Instead of presenting the data in alternating columns, it could be grouped into
separate blocks of frequencies and percents by inverting the order of the header
expression.
An analogous set of tables can be designed for percentages and totals of an
observation variable. The following table displays alternating columns of total
salary and percentage.
This table groups all salary columns followed by all the percent columns.
The following shows total salary for both sexes and percentage by sex.
(On next page)
Suppose the table is a breakdown of males and females under and over forty. The
basic table (without percentages) is easily produced with a table statement such
as:
It is simple to add percentages to either the row totals, giving percentages of
males and females or the column totals giving percentages of age groups but not
both at once.
The solution is to produce two tables in the one run:
Total
TOTAL can be specified as an independent control variable
which is a count of records. It is specified in exactly the same way as other
statistics (using either pseudo-variables or a keyword).
However, since it is a control variable rather than an
observation variable, it operates in a different manner.Example: TOTAL
AGE THEN TOTAL
Young Old TOTAL
10 11 22 AGE, with two classifications is
concatenated with TOTAL. The total column equals the count of
all occurrences at the appropriate level of nesting. (Note that records
with undefined values have still been counted.)
Current
monthly
salary
TOTAL
57600 22
SALARY is an observation variable and thus the sum is
displayed by default. The keyword TOTAL is a control variable
and displays a count.
Cross tabulating or nesting TOTAL with either control or
observation variables is allowed since it is a control variable. As per the
standard rules for cell contents, if TOTAL is nested with a
control variable, a count is produced; if it is nested with an observation
variable, a sum of the observation variable is produced.Example: TOTAL with BY and THEN
The following header expression uses the observation variable
SALARY and the control variables AGE and
SEX:
Current Monthly Salary
Young Old
TOTAL Male Female TOTAL Male Female
27500 13050 14450 30100 21900 8200 TOTAL is nested within an observation variable, this
column contains the total SALARY for each
AGE group.
TOTAL shorthand notation
There is a shorthand notation for TOTAL. Appending a single T
to the variable name separated by a period specifies that a
TOTAL is produced before the variable. The following
two expressions are identical:TOTAL THEN AGE
AGE.T
A tabulate statement can be shortened as follows:TABULATE SALARY BY AGE BY (TOTAL THEN SEX) /
TABULATE SALARY BY AGE BY SEX.T /
Normalised and Quantiles
specifications are very similar
Normalised and Quantile to other statistics but have to supply some additional information. This can
take one of two forms:TABULATE prints output for the 25%, 50% and 75% quantiles. In
the second form a number of non-equal sized quantiles are produced and each n
specifies the quantiles to produce. For example, specifying Ns of 15 and 85
produces the 15th and 85th percentage quantiles. Both forms allow specification
of labels. In the first form, when labels are used for individual quantiles, the
commas separating 'label1', 'label2', etc. are required.1) keyword = variable ['label'] n ['label1', ...]
2) keyword = variable ['label'] (n1) ['label'] (n2)...
Example: Quantiles
TABULATE REGION.T, SALARY BY (MEAN THEN Q) /
QUANTILES = Q 'Quantiles' 4 /
Current monthly salary
MEAN Quantiles
QUANTILE-
=25 QUANTILE-
=50 QUANTILE-
=75
TOTAL.......
North........
South........ 2743
2813
2650
2311
2238
2369 2678
2825
2538 3222
3315
2913TABULATE REGION.T, SALARY BY (MEAN THEN Q) /
QUANTILES = Q 'Quantiles' 4 '25%', 'Median', '75%' /
Current monthly salary
MEAN Quantiles
25% Median 75%
TOTAL........
North........
South........ 2743
2813
2650
2311
2238
2369 2678
2825
2538 3222
3315
2913TABULATE REGION.T, SALARY BY (MEAN THEN D) /
QUANTILES = D 'Deciles' (10) '10th' (50) '50th' (90) '90th'/
Current monthly salary
MEAN Deciles
10th 50th 90th
TOTAL........
North........
South........ 2743
2813
2650
2051
1973
2267 2678
2825
2538 3407
3647
3265
Percentages
TABULATE can calculate and display percents. Tables can be
specified which display just percentages or which also include the original
values.PERCENT option to define a
pseudo-variable and then reference this in an expression to display the percent.
Specify the base variable to use to calculate the percentage by appending a
percentage symbol (%) and the same pseudo-variable name to a base variable.
The only valid base variables are TOTAL
control variables, typically cross tabulated against the observation variable
that is the percentage variable.
Shorthand Notation for Base Marker Specification
If a percent base marker is appended to a variable, it specifies that a TOTAL
precedes the variable and that this is the base. For example, the following
expressions are equivalent:TOTAL%X THEN REGION
REGION%X
This shorthand notation can simplify the tabulate expressions. For example, both
the following statements are equivalent:TABULATE X BY (TOTAL%X THEN AGE), TOTAL%X THEN REGION /
TABULATE X BY AGE%X, REGION%X /
The table output is the same in both cases. With the shorthand form, the label
of the created TOTAL category is always "TOTAL".AGE.T, produces three rows or columns - TOTAL,
Young and Old. An expression such as
REGION%X, produces three categories -
TOTAL, North and South. It
also specifies the TOTAL column as the percent base. The
sub-expression, X.T, is the same as TOTAL THEN
X and produces two cells for each category.
Percentages with Counts
The simplest percentage tables are produced with control variables
and cell contents are therefore counts.
Consider the following standard table:
TOTAL North South
TOTAL..........
Young..........
Old............ 22
10
11 12
5
7
9
5
4PERCENT clause, for example:
PERCENT = X 'Percent'
Choose the base row or column for the percentage computations. Base cells are
indicated within an expression by appending a percentage symbol (%) and the
pseudo-variable to the base variable. For
row percentages, there must be a total column for the base or 100%
column. In the example, the TOTAL variable in the header
expression is the total column. Therefore, append %X to
TOTAL to make it the base for percentage calculations.Row Percents
TABULATE TOTAL THEN AGE, X BY (TOTAL%X THEN REGION) /
PERCENT = X 'Percent'
Percent
TOTAL North South
TOTAL..........
Young..........
Old............ 100
100
100
55
50
64 41
50
36(TOTAL THEN REGION), is nested within X.
The percent marker, %X, specifies the header column
TOTAL as the 'set' of base cells for this table's percent
calculations. Each cell value for TOTAL is the base for the
percents in its row. For example, the calculations for the first row of the
table are:TOTAL = 2200/2200 * 100 = 100
North = 1200/2200 * 100 = 55
South = 800/2200 * 100 = 41
Note: The above illustrates the effect of missing values in a control
variable. Some records had missing values in REGION and thus
the columns do not reflect these records. If required, specify the
MISSING modifier on REGION to generate an
extra column for those records.TABULATE X BY (TOTAL THEN AGE), TOTAL%X THEN REGION /
PERCENT = X 'Percent' /
TOTAL North South
Percent
TOTAL..........
Young..........
Old............
100
100
100
55
50
64
41
50
36Column Percents
To display column percentages instead of row percentages, use a total row
to be the base or 100% row.%X, from TOTAL in the header and append it
to TOTAL in the stub:TABULATE TOTAL%X THEN AGE, X BY (TOTAL THEN REGION) /
PERCENT = X 'Percent' /
Percent
TOTAL North South
TOTAL..........
Young..........
Old............ 100
45
50
100
42
58 100
56
44X, is nested
with all cells of the table so that all cells contain percents. In this table,
however, the percent marker, %X, specifies the
TOTAL row in the stub expression as the set of base cells for
this table's percent calculations. For example, the calculations for the first
column of the table are:
TOTAL = 2200/2200 * 100 = 100
Young = 600/2200 * 100 = 45
Old = 1400/2200 * 100 = 50
Base Markers in Both the Header and the Stub
To get overall or grand total percentages, place the
percent base markers in both the stub and the header. By convention, the cell
(or cells) at the intersection of the base row and column are considered the
base cells for the table.
TABULATE TOTAL%X THEN AGE, X BY (TOTAL%X THEN REGION) /
PERCENT = X 'Percent' /
Percent
TOTAL North South
TOTAL..........
Young..........
Old............ 100
45
50
55
23
32 41
23
18TOTAL
row and the header TOTAL column.
Percentages with Observation Variables
Percentages of observation variables can be produced but at least one control
variable must be nested with the percent variable.Example: Percentages for Observation Variables
The following shows salary by age group:TABULATE SALARY BY (TOTAL THEN AGE) /
Monthly salary
TOTAL Young Old
57600 27500 30100
TABULATE X BY SALARY BY (TOTAL%X THEN AGE)
PERCENT = X 'Percent' /
Percent
Monthly salary
TOTAL Young Old
100 48 52
(TOTAL%X THEN AGE) to the front of the
expression, results in the same data in the table with different labeling:
TABULATE (TOTAL%X THEN AGE) BY X BY SALARY /
PERCENT = X 'Percent' /
TOTAL Young Old
Percent Percent
Monthly salary Monthly salary Monthly salary
100 48 52
Original Values and Percents
Original values can be included as well as the percents. In the examples that
follow, all of the manipulations are performed in the header expression.
However, they could equally be done in stub or wafer expressions.TABULATE AGE.T, REGION%X BY X.T /
PERCENT = X 'Pct' /
TOTAL North South
TOTAL Pct TOTAL Pct TOTAL Pct
TOTAL......
Young......
Old........ 22
10
11 100
100
10012
5
7 55
50
64 9
5
4 41
50
36TABULATE AGE.T, X.T BY REGION%X /
PERCENT = X 'Pct' /
TOTAL Pct
TOTAL North South TOTAL North South
TOTAL.....
Young.....
Old....... 22
10
11 12
5
7 9
5
4
100
100
100 55
50
64 41
50
36
Percents for Row Totals.
TABULATE AGE.T, TOTAL THEN (X BY REGION%X) /
PERCENT = X 'Pct' /
TOTAL Pct
TOTAL North South
TOTAL..........
Young..........
Old............ 22
10
11
100
100
100 55
50
64 41
50
36TABULATE AGE.T, SALARY BY SEX%X BY X.T /
PERCENT = X 'Pct' /
Current monthly salary
TOTAL Male Female
TOTAL Pct TOTAL Pct TOTAL Pct
TOTAL.....
Young.....
Old....... 57600
27500
30100 100
100
100
34950
13050
21900 61
47
73 22650
14450
8200 39
53
27TABULATE AGE.T, SALARY BY X.T BY SEX%X /
PERCENT = X 'Pct' /
Current monthly salary
TOTAL Pct
TOTAL Male Female TOTAL Male Female
TOTAL.....
Young.....
Old....... 57600
27500
30100
34950
13050
21900 22650
14450
8200 100
100
100 61
47
73
39
53
27TABULATE AGE.T, SALARY BY (TOTAL THEN (X BY SEX%X)) /
PERCENT = X 'Pct' /
Current monthly salary
TOTAL Pct
TOTAL Male Female
TOTAL.....
Young.....
Old....... 57600
27500
30100 100
100
10061
47
73 39
53
27
Percentages in Wafers
It is possible to display frequencies and percentages in different wafers
of the same table. For example:
TABULATE X.T, AGE%X, REGION%X /
PERCENT = X 'Percent' /
The wafer expression is X.T which stands for TOTAL THEN
X. Thus the table consists of two wafers, the first wafer contains
totals and the second contains percentages. Each wafer is a two-dimensional
table of AGE (in the stub) by REGION (in the
header). Since the base marker, %X, appears in both the stub
and the header, the percentage wafer contains overall percentages. The following
two wafers are the output of the previous tabulate statement:X.T,AGE%X,REGION%X
Table of Contents:
TOTAL........................................ page 1
Percent...................................... page 2
X.T,AGE%X,REGION%X
TOTAL..............
TOTAL North South
TOTAL..........
Young..........
Old............
22
10
11
12
5
7
9
5
4X.T,AGE%X,REGION%X
Percent............
TOTAL North South
TOTAL..........
Young..........
Old............ 100
45
50
55
23
32 41
23
18
Example percentage tables
The following examples illustrate some of the different ways to display
percentages. The examples are all based on the
following table of frequency counts.
TABULATE AGE.T BY SEX.T, REGION.T
TOTAL North South
TOTAL
TOTAL...........
Male............
Female..........
Young
TOTAL...........
Male............
Female..........
Old
TOTAL...........
Male............
Female..........
22
13
8
10
5
5
11
8
3
12
9
3
5
3
2
7
6
1
9
4
5
5
2
3
4
2
2Base marker appended to AGE
This table contains nine base (100%) cells. The TOTAL for
AGE is the base cell (AGE%X). Since
SEX.T is nested within AGE, there are three base cell rows,
and since the header contains three columns (REGION.T) there is a
total of nine base cells (three columns times three rows).TABULATE AGE%X BY SEX.T, X BY REGION.T/ PERCENT = X 'Percent'
Percent
TOTAL North South
TOTAL
TOTAL..........
Male...........
Female.........
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL..........
Male...........
Female.........100
100
100
45
38
63
50
62
38100
100
100
42
33
67
58
67
33100
100
100
56
50
60
44
50
40Base marker appended to SEX
Note that the nine base cells now occur where TOTAL for
SEX appears.TABULATE AGE BY SEX%X, X BY REGION.T / PERCENT = X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL..........
Male...........
Female.........
100
50
50
100
73
27
100
60
40
100
86
14
100
40
60
100
50
50
Use of FIRST
In the previous table, base rows appeared at every occurrence of the
TOTAL variable for SEX. The next table uses
.FIRST to restrict the base row to the first occurrence
of TOTAL for SEX.TABULATE AGE BY SEX%X.F, X BY REGION.T /
PERCENT = X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL.......
...
Male...........
Female.........
100
50
50
110
80
30
100
60
40
140
120
20
100
40
60
80
40
40Multiple Base Markers
The next three tables are similar to the first three with the addition of a
second base marker appended to REGION in the header. The base
cells occur at the intersections of the base rows and columns.TABULATE AGE%X BY SEX.T, X BY REGION%X /
PERCENT = X 'Percent'
Percent
TOTAL North South
TOTAL
TOTAL..........
Male...........
Female.........
Young
TOTAL.....
.....
Male...........
Female.........
Old
TOTAL..........
Male...........
Female.........
100
100
100
45
38
63
50
62
38
55
69
38
23
23
25
32
46
13
41
31
63
23
15
38
18
15
25
TABULATE AGE BY SEX%X, X BY REGION%X /
PERCENT = X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL.......
...
Male...........
Female.........
100
50
50
100
73
27
50
30
20
64
55
9
50
20
30
36
18
18TABULATE AGE BY SEX%X.F, X BY REGION%X /
PERCENT = X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL.......
...
Male...........
Female.........
100
50
50
110
80
30
50
30
20
70
60
10
50
20
30
40
20
20Multiple Percent Variables Within a Table
More than one percent distribution can be displayed in a single table by
specifying more than one percent variable. For example, the following two
variables could be specified representing row and column percentages:PERCENT= PR 'Row Pct' PC 'Col Pct' /
Since PR represents row percentages, the base marker associated
with PR must be in the header expression. Similarly, since
PC, represents column percentages, the base marker associated with
PC must be in the stub expression. In the example below, the
results are presented in two tables concatenated together. The first table
contains row percentages, the second contains column percentages.TABULATE AGE%PC, (PR + PC) BY REGION%PR /
PERCENT = PR 'Row Pct' PC 'Col Pct'
Row Pct Col Pct
TOTAL North South TOTAL North South
TOTAL.....
Young.....
Old....... 100
100
100 55
50
64
41
50
36 100
45
50 100
42
58 100
56
44Multiple percents in Rows
The next example presents the same data in a more conventional form. Notice
(PR + PC) is moved to the stub expression.TABULATE AGE%PC BY (PR + PC), REGION%PR /
PERCENT = PR 'Row Pct' /
PERCENT = PC 'Col Pct' /
TOTAL North South
TOTAL
Row Pct.........
Col Pct.........
Young
Row Pct.........
Col Pct.........
Old
Row Pct.........
Col Pct.........
100
100
100
45
100
50
55
100
50
42
64
58
41
100
50
56
36
44Example: Overall Percents
Overall percentages can be included in the same table by adding a third percent
variable, PT. Since PT is to represent overall
percentages, append the PT base marker to both the stub and the
header variables.TABULATE AGE%PC%PT BY (PR + PC + PT), REGION%PR%PT /
PERCENT = PR 'Row Pct' /
PERCENT = PC 'Col Pct' /
PERCENT = PT 'Tot Pct' /
TOTAL North South
TOTAL
Row Pct.........
Col Pct.........
Tot Pct.........
Young
Row Pct.........
Col Pct.........
Tot Pct.........
Old
Row
Pct.........
Col Pct.........
Tot Pct.........
100
100
100
100
45
45
100
50
50
55
100
55
50
42
23
64
58
32
41
100
41
50
56
23
36
44
18Counts and Multiple Percents
Counts can be included by adding the TOTAL keyword:TABULATE AGE%PC%PT BY (TOTAL + PR + PC + PT), REGION%PR%PT /
TOTAL = TOTAL 'Count' /
PERCENT = PR 'Row Pct' /
PERCENT = PC'Col Pct' /
PERCENT = PT 'Tot Pct'
TOTAL North South
TOTAL
Count...........
Row Pct.........
Col Pct.........
Tot
Pct.........
Young
Count...........
Row Pct.........
Col
Pct.........
Tot Pct.........
Old
Count...........
Row
Pct.........
Col Pct.........
Tot Pct.........
22
100
100
100
10
100
45
45
11
100
50
50
12
55
100
55
5
50
42
23
7
64
58
32
9
41
100
41
5
50
56
23
4
36
44
18Percentages in both Columns and Rows
It is not possible to produce a table which has a column of percentages on the
total and a row of percentages on the same total. The solution to this is to
produce a second table on the same file with a single row - the final
percentages.TABULATE GENDER.T, AGEGROUP.T
However it is impossible to produce both sets of percentages in the same table.
The required output is the following:
Total Young and Old Under 40 40+ % BY AGE
Total Males and Females 20 6 14 100%
Male 12 2 10 60%
Female 8 4 4 40%
% BY GENDER 100% 30% 70% TABULATE TOTX%X THEN GENDER, TOTY THEN AGEGROUP THEN X/
FILENAME = A /
TOTAL = TOTX 'Total Males and Females' /
PERCENT = X '% BY AGE' /
TABULATE Y, TOTY%Y THEN AGEGROUP /
TOTAL = TOTY 'Total Young and Old' /
PERCENT = Y '% BY GENDER' /
Note that the second tabulate does not have a FILE clause and
thus writes the output to the same file. This second tabulate produces one row
which is the required set of final percentages.
Record Filtering
BOOLEAN = (logical expression)
SAMPLE = fraction [,seed ]
WEIGHT = varname
BOOLEAN
| Specifies which procedure table records are used by the procedure. The procedure table records for which the logical expression is true are used by the procedure. If this option is not specified, all procedure table records are used. |
SAMPLE
|
Specifies that a random sample of the procedure table records are
used by the procedure. The fraction specifies the percent of records used and is specified as a positive decimal number less than or equal to 1 (one). .25, for example specifies that a 25% sample be used. The optional seed is an integer that is used to initialise the random number generator. |
WEIGHT
|
Specifies an integer procedure variable used to weight the frequency
counts and aggregations. By default, each record adds a count of 1 to frequency
counts. Specifying a weight variable, adds the value of that variable rather
than 1. For example, if WEIGHT = FAMSIZE were specified,
then, in a table of RACE by REGION, a procedure table
record would contribute a count of 5 to the RACE=1,
REGION=2 cell of the table if it contained the data:RACE=1 REGION=2 FAMSIZE =5 |
Print Formatting
PRINTFORMATS = var list (options,...) .../
By default, cell contents are printed as integers. The
PRINTFORMATS option is used to alter the defaults. The options
and their abbreviations are:
COMMAS (C)
| Separates groups of 3 digits by commas. |
DECIMAL=n (n)
| Specifies the number of decimal places to be printed. A decimal point is only printed when the number of decimal places is non-zero. |
DOLLAR (D)
| Places a floating dollar sign before the number. |
PERCENT (P)
| Places a trailing percent sign after the number. |
DOLLAR, PERCENT and COMMAS
options are not in effect and DECIMAL= 0 (i.e. numbers are
printed as integers).
AGE.T , SALARY BY (TOTAL THEN (X BY SEX%X)) /
PERCENT = X 'Pct' /
PRINTFORMATS = SALARY (D,C) X (P,2) /
| Current monthly salary | ||||
| TOTAL | Pct | |||
| TOTAL | Male | Female | ||
| TOTAL.......... Young.......... Old............ |
$57,600 $27,500 $30,100 | 100.00% 100.00% 100.00% |
60.68% 47.45% 72.76% | 39.32% 52.55% 27.24% |
TOTAL under SALARY is printed with dollar signs and
commas. However, TOTAL under PCT is printed with a
percent sign and two decimal places because of the precedence rules for
print formats.
PRINTFORMAT are not considered in the precedence analysis.
PRINTFORMATS for percentages have the highest
precedence. If a percent variable is not assigned an explicit
PRINTFORMAT, it is printed as an integer.
PRINTFORMAT takes
precedence.
MEAN, MAX, etc.)
is nested with an observation variable, the observation variable's
PRINTFORMAT is used. When nesting variables, the
PRINTFORMAT of the variable lower in the nesting takes
precedence.
TABULATE SALARY BY REGION BY (MIN THEN MAX) /
MINIMUM = MIN 'Lowest' /
MAXIMUM = MAX 'Highest' /
PRINTFORMATS = SALARY (D) MIN (2)MAX (2) /
The salaries are listed with 2 decimal points and no dollar sign. To print the
salaries with dollar signs and no decimal points the expression is:
TABULATE REGION BY (MIN THEN MAX) BY SALARY /
PRINTFORMAT of the lowest variable in the nesting takes
precedence.
PRINTFORMAT precedence, the
header expression has highest precedence followed by the stub and then the
wafer.Observation variables are displayed in a summary cell labelled with the variable label. If there is no variable label, the variable name is used.
When not enough room is provided in a table for label or title information, the
label or title is broken into two or more segments so that it can fit in the
allotted space. The VALUE LABELS command defines value labels
for missing values including BLANK and
UNDEFINED.
Page Formatting
PAGETITLE = 'string'['string'['string']]
PAGELENGTH = n
PAGEWIDTH = n
COLLAPSE
The following options alter the default formatting of the page.
PAGETITLE
| Specifies the title printed left justified at the top of the page. By default, the page title is the string "SIR/DBMS Tabulate Procedure" and the page number is printed at the top right-hand side of the page. The page title may consist of up to 3 lines. The first line contains the page number. The second and third lines are printed if there are wafers and/or if the wafers must be broken into chunks (i.e. the wafer does not fit on a single page). |
PAGELENGTH
|
The maximum number of lines that can appear on a page. Specifying
PAGELENGTH = NOEJECT turns off paging. The default
PAGELENGTH value is the current output file page length. This
must include space for:the page title and page number line(s), the wafer title, wafer label, and the blank lines between, the table (wafer) itself and any WAFERNOTES specified.
|
PAGEWIDTH
|
Controls the width of the printed page. By default, tables are
printed centred within PAGEWIDTH print positions. The default
value is the current output file page width.
|
COLLAPSE
|
If a wafer exceeds either PAGELENGTH or
PAGEWIDTH, an attempt is made to break the wafer into multiple
'chunks', each printed on a separate page. The chunks appear in the output in
order from left to right, then top to bottom. The wafer and stub labels appear
as needed. The COLLAPSE option allows chunks of broken up
wafers to be printed on the same page. No more than 2 chunks are printed per
page. The default is not to collapse chunks into a single page.
|
Header Formatting
HEADERWIDTH = n
HEADERINDENTATION = n
HEADERDIVIDER = 'character'
NODIVIDERS
NOHEADERCENTER
The following options alter the default formatting of the column header.
HEADERWIDTH
|
Specifies the width to be used in printing each column of the table.
This width does not include the column divider. The default header width is 10
print positions. If HEADERWIDTH is set too small, there may be
insufficient room to print a number in the table, in which case the field is
filled with Xs. Also, if HEADERWIDTH is
too small, there may not be enough room for the header labels to be printed and
they may be severely segmented. Conversely, if HEADERWIDTH is
set too large, the table header may contain too many print positions for one
page. When this occurs, the page is broken into chunks. Chunks are printed on
successive pages left to right, top to bottom.
|
HEADERINDENTATION
| Specifies the number of print lines for each level of the header nesting. The default is 3 and it cannot be set smaller. |
HEADERDIVIDER
| Specifies the character used as the column divider. The default is a vertical bar (|). This character is used for both the label area of the header as well as between the columns of data. |
NODIVIDERS
| Sets the divider character between columns of data to blank. It does not affect the label area of the header. |
NOHEADERCENTER
| Left justifies header labels in a column. By default header labels are centred in the column. |
Stub Formatting
STUBTITLE = option 'string'
STUBWIDTH = n
STUBINDENTATION = n
STUBCONTINUATION = n
STUBHEIGHT = n
STUBFILLER = 'character'
STUBDIVIDER = 'character'
The following options alter the default formatting of the stub:
STUBTITLE
|
Specifies a title which appears in the boxed in area to the left of
the header and directly above the stub labels. The options allowed are CENTER, LEFT and
RIGHT and control the justification of the stub label. If the
option is omitted, it is assumed to be CENTER. If the
stub title is too long, it is automatically segmented over two or more lines.
There is no default stub title.
|
STUBWIDTH
|
Specifies the number of print positions for stub labels. The default
is 20 print positions. If "too much" stub label segmentation occurs, TABULATE aborts with a WAFER/STUB OUTPUT FORMATTING
ERROR. The solution is to increase the STUBWIDTH.
|
STUBINDENTATION
| Specifies the indentation for each level of nesting. The default is 2 print positions. |
STUBCONTINUATION
| Specifies the indentation for continuation lines. A label which exceeds the stub width is segmented into two or more lines. Continuation lines are indented by the parameter. The default indentation is 3 print positions. |
STUBHEIGHT
|
Specifies the number of print lines used for each stub label. The
default is 1 line. Label segmentation works independently of the
STUBHEIGHT parameter and, therefore, some stub labels may
occupy more than STUBHEIGHT lines.
|
STUBFILLER
| Stub labels (on lines which contain data) are filled with the character to the full width of the stub. The default stub filler is a period (.). |
STUBDIVIDER
| Specifies the horizontal divider character. The default is the dash (-). |
TABULATE AGE BY SEX, REGION.T /
STUBTITLE = 'Age by Sex' /
STUBWIDTH = 20
| Age by Sex | TOTAL | North | South |
| Young Male............ Female.......... Old Male............ Female.......... | 5 5 8 3 | 3 2 6 1 | 2 3 2 2 |
TABULATE AGE BY SEX, REGION.T /
STUBFILLER = ' '
| TOTAL | North | South | |
| Young Male Female Old Male Female |
5 5 8 3 |
3 2 6 1 | 2 3 2 2 |
Wafer labels are formed in the same way that header and stub labels are
produced.
Wafer Formatting
WAFERTITLE = option 'string'
NOWAFERTITLE
WAFERCENTER
WAFERSTART = n
WAFERINDENTATION = n
WAFERCONTINUATION = n
WAFERHEIGHT = n
When a tabulate statement contains a wafer expression, each wafer produced
contains a wafer label in the upper left corner below the wafer title. If there
is no wafer expression in the tabulate statement, a wafer label is not printed.
WAFERTITLE
|
Specifies a wafer title printed above each wafer. By default, the
tabulate statement is printed as the wafer title, centred and one line above
the wafer (table). The option may be CENTER, LEFT or
RIGHT to position the wafer title over the wafer.
|
NOWAFERTITLE
| Suppresses printing of the wafer title. |
WAFERCENTER
| Centres the wafer on the page. By default wafers are left justifies on the page. |
WAFERSTART
|
Specifies the starting print position for the wafer. Does not apply
when WAFERCENTER is specified.
|
WAFERINDENTATION
| Specifies the indentation used for each level of nesting in the wafer title. The default is 2 print positions. |
WAFERCONTINUATION
| Specifies the indentation for continuation lines when a wafer label is broken up into multiple lines. The default is 3 print positions. |
WAFERHEIGHT
| Specifies the number of print lines for the wafer label. The default is 1. This height is automatically adjusted if a label does not fit in the width of the wafer title and has to be segmented. |
TABULATE reports a WAFER/STUB
OUTPUT FORMATTING ERROR. The solution is to increase the size of
STUBWIDTH.
TABULATE AGE BY REGION, SEX, RACE
Under 21
North.......
Under 21
South.......
21 and Over
North.......
21 and Over
South.......
When output is in html format, then the print formatting, page formatting and many of the general output control options can be specified but have no effect. There is no control over the detailed appearance of the table. The general shape of the table is dictated by the table expressions and the specifics of the table appearance depend on the software used to view and/or print the resulting html.
Example: Dummy Spanner Labels
Plus (+) simply concatenates two segments; blank forces a line break between
the two segments; minus (-) specifies optional segmentation as per the
following rules.
After the first line segment has been formed, all the remaining characters of
the original label to the right of the break character are considered next. If
the remaining segment fits in the allocated space, it is printed, otherwise, the
first n characters of this portion of the label (where n is the
space available) are also scanned from right to left and segmented as described
above. This process continues until the entire label has been printed.
If during compilation, a
If there is not enough room on a page to print an entire row or column,
Other Options
HTML
NOBOTTOMBORDER
NOLEFTBORDER
NORIGHTBORDER
NOTES = 'string'
FOOTNOTES = 'string'
WAFERNOTES = 'string'
NOZEROS
ZEROCHARACTER = 'character'
NOEMPTYROWS
DELETEMISSING
UPPERCASE
SPANNERS
DUMMY = variable 'label' ...
LANGUAGE
HTML
Specifies that the output file contains data in htm format for viewing by a browser or other package which expects this format.Border Generation
By default all borders are generated. The
NOBOTTOMBORDER, NOLEFTBORDER and
NORIGHTBORDER
options alter the production
of the left, right and bottom borders of each wafer. They can be used in
conjunction with the HEADERDIVIDER and
STUBDIVIDER.Footnotes
There are no default footnotes. NOTES specifies text printed
at the end of the Table of Contents which is produced if there are multiple
wafers or a NOTES option specified.
FOOTNOTES specifies text printed on a new page following all
wafers. WAFERNOTES specifies text printed at the end of each
wafer or chunk of a wafer.Zero Printing
By default zeros are printed as a 0 (or as specified in the
PRINTFORMAT statement). Specify NOZEROS to
print zeros as dashes (-). Specify ZEROCHARACTER to print
another character for zero cells.Suppression of Empty Rows and Wafers
An empty row or wafer is one that contains zero in all cells. By default empty
rows are printed. In a table with a single wafer or with no wafer expression at
all, the table is printed whether it is empty or not. If multiple wafers are
produced, empty wafers are automatically deleted. The deletion is noted in the
table of contents. Empty columns cannot be deleted.
NOEMPTYROWS suppresses the printing of empty rows.Exclusion of Records with Missing Values
Specify DELETEMISSING to exclude records
where any of the cross tabulation variables have missing or undefined values in
a variable.Converting Text to Uppercase
Specify UPPERCASE to convert text to uppercase. By default,
both upper and lower case are used.Spanner Labels
Specify the SPANNERS option to produce spanner labels for all
control variables. Specify spanner labels on a per variable basis with the
.SPAN modifier. When the SPANNERS option is
in effect, suppress spanner labels for an individual variable with the
.NOSPAN modifier.Dummy Spanner Labels
Specify the DUMMY clause to create label information that is
not related to any specific variable. The DUMMY clause creates
a dummy variable whose sole purpose is to carry label information into the
table. Dummy spanner labels can be used to span concatenated expressions.TABULATE J BY (SEX THEN REGION) /
DUMMY = J 'Sex and region Data'
Sex and region Data
Male Female North South
Dummy.......... 13 8 12 9 Language
If a language is specified, the effect is to produce translated versions of the following words or phrases:
SIR2002 Tabulate Procedure, Footnotes, Notes,Page, empty/deleted, Table of Contents, cont'd, of, Wafer, and, Chunk.
Recognised language keywords are HEBREW, DUTCH and GERMAN.String Specifications
Whenever an option includes the specification of strings or labels, a string can
be specified in multiple segments using the following format:'segment' option 'segment'
Option can be a plus (+), blank or minus(-); the different characters
specify where strings can be broken across lines:Label Segmentation
If a label does not fit, it is broken into segments by splitting the text at an
appropriate breakpoint. The breakpoint is chosen by searching the label from
right to left looking for one of the following conditions taken in sequence:
If no breakpoint is found, the label is truncated at a point which allows a
hyphen to be printed.
Error Processing
There are two stages at which errors in the TABULATE command
are detected, during compilation and during execution. Errors detected during
compilation cause error messages to be displayed and the VisualPQL program is not
executed. Errors detected during execution cause the program to terminate.CELL CONTENTS ERROR error occurs,
check that:
The table is formatted after the program is run, and some problems are diagnosed
at this stage. The error WAFER/STUB OUTPUT FORMATTING ERROR
implies that there is a problem inserting the wafer and/or stub labels within their allocated
space in each wafer. Usually there are too many levels of nesting, within the
wafer or stub, for the width of the STUBWIDTH.TABULATE breaks the wafer into + "chunks". If a
BY clause has been specified, all of the rows or columns
beneath the highest level in a nesting must fit on a single page, otherwise a
CHUNK FORMATTING ERROR is reported. This implies that a wafer is too large to print on a single page and that it cannot
be broken at any 'clean' place to produce multiple 'chunks' of output. A wafer
can be broken into chunks only along the highest variable nested within the stub
or header. The solutions are to:TABULATE statement, or


