Application Series 7
DSS
MASTERS
SURVEYS
Stuart Mitchenall, Computing Services Manager at the Department of Social Security,
describes the skills his Department has developed in complex survey analysis and the
essential role of SIR.
Background
Since the early 1980s the Department of Social Security (DSS), the UK Government
department that provides payments for pensions, disability, unemployment and other
(generally) low income groups, had been having problems with the level of detail available in
existing socio-economic and household survey data. The Department's analysts were having
to base forecasts of significant government expenditure on very small samples of particular
client groups (such as single male pensioners over 80 or disabled women in their 50s).
Solution
By the end of the decade it had been agreed that the best answer to this problem was to
launch a survey based on the needs of the Department with a sample size adequate to gain
statistically valid samples of a far higher percentage of interest groups. As some groups of
benefit recipients totalled only 4,000 to 5,000 in the entire population, it was obvious that without
specifically targeting groups of the population, adequate samples would not be achieved.
For a variety of reasons, however, the survey was targeted at a general sample rather than a
specific group. Issues such as confidentiality, problems with using DSS data to identify
clients, and the UK Data Protection Acts all had an effect.
The survey was named the Family Resources Survey (FRS) and the project launched using
contracted resources.
The Use of Computers
From day one DSS insisted these contractors used computer aided personal interviewing
(CAPI) techniques. We had, prior to the award of the contracts, conducted an analysis of the
various packages available and had decided to use Blaise, a tool developed and supplied by
the Central Bureau of Statistics (CBS) in the Netherlands. We had looked at various
solutions, but had decided that a specific tool able to run on the then available 286 portable
computers was better than trying to adapt other software.
Blaise performed best for the type of survey we were conducting, with complex routing and
validation features, combined with mathematical capability to validate quite complex
individual requirements for survey questions if needed.
SIR the Best Option
Where does SIR come in? The UK government had established a pattern of using SIR for
surveys with an inherently hierarchical structure, and we had some experience of the use of
the database from earlier analytical work. We had also used Ingres and Oracle for specific
solutions, but found these products inefficient when measured against the needs of our
analysts. That is not to say SIR was perfect, but our judgement was that it was better.
Further, it supported reporting tools designed for the analytical environment, and in that sense
it was unique. So we decided to use our existing expertise and go with a solution we had
some confidence was a good fit to our data.
SIR Co-operates
The next problem was the environment. We were obliged at that time to consider open systems
implementations very strongly, and we really wanted to use Unix, but SIR was not available on
the Unix variants we were using for other purposes. DSS pushed us towards an
ICL solution, and we were able to get preferential terms on DRS 6000 equipment. SIR
agreed to co-operate in porting SIR to DRS/NX, and ICL agreed to cover the costs of
installing equipment to allow the port. From that point all went pretty smoothly. ICL installed
the system at the SIR offices, the port was completed to schedule, and made available to
us in Central London in time for us to develop the database ready for the first pilot data.
How did SIR interact with a CAPI system? Blaise already had extensive output facilities,
generating output files suitable for import into, amongst others, SAS and SPSS. With
contract help we were able to generate a standard SIR export file to import the data into SIR,
and had it not been for some predicted, but unaddressed, problems, we would have had a
very simple importation system for our database.
The Problems
Firstly, we failed to treat the process as a single task. The companies conducting the survey
were also given the task of writing the survey instrument for use in the field, and they wrote
the survey without any thought of the output format they were generating or the eventual
format of the database. They generated an ideal survey instrument without further
consideration, and this was a mistake. Because of the way Blaise formats output, it would
have been quite possible to have ensured that the output data structure was block structured
in line with the database structure, and that output was consistent from case to case. Neither of
these was true, unfortunately, so large amounts of time have had to be spent analysing
individual cases to determine data value positions.
Secondly, Blaise follows some unique naming conventions relating to the recurring incidence
of data: income for head of household, spouse, children, lodger, granddad, etc. Thus
changes in data position within the output structure also result in changes in name.
Finally, when we embarked upon the survey the portables available were more restricted
than those today, and the questionnaire had to be split to enable the programs to run.
Ideally, had it been split into database record types,
problems would have been immensely reduced, but at the same time validation of records
would have been complicated across both individuals and households. We ended up with
two large records (Household and Benefit Unit) which have to be read into import database
records. We then process the import data to correct all the problems and derive the
analytical record structure of the database.
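As an illustration only, the kind of restructuring described above can be sketched in a few lines of Python (not a language used on the project). The field names (Region, Age1, Income1) and the numeric-suffix convention are hypothetical stand-ins; the actual Blaise names and the SIR record layouts are not reproduced here. The sketch splits one flat household import record into household-level fields and one record per person:

```python
import re
from collections import defaultdict

# Hypothetical convention: a repeated item is a base name followed by an
# occurrence number, e.g. "Income1", "Income2". This is an assumption made
# for the example, not the real Blaise naming scheme.
FIELD_PATTERN = re.compile(r"^(?P<base>[A-Za-z]+)(?P<index>\d+)$")


def split_household_record(flat_record):
    """Split a flat import record into household fields and person records."""
    household = {}
    persons = defaultdict(dict)
    for name, value in flat_record.items():
        match = FIELD_PATTERN.match(name)
        if match:
            # Numbered field: file it under the person it belongs to,
            # using the base name so every person has the same field names.
            person_no = int(match.group("index"))
            persons[person_no][match.group("base")] = value
        else:
            # Unnumbered field: belongs to the household as a whole.
            household[name] = value
    person_records = [
        {"PersonNo": person_no, **fields}
        for person_no, fields in sorted(persons.items())
    ]
    return household, person_records


# Example with made-up values.
flat = {"Region": "London", "Age1": 67, "Income1": 82.5, "Age2": 64, "Income2": 40.0}
household, people = split_household_record(flat)
```

With consistent names and positions in the field output, a mapping like this could be written once; without them, each case has to be inspected, which is where our time went.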
Lessons
I leave those to you, but remain confident that using a computerised field instrument in
conjunction with a SIR database provides, if properly implemented, a very rapid way of
generating a large socio-economic database with the majority of the data prevalidated. The
principle adopted early, that if you don't collect the right information at source then, however
you impute it, the data is never entirely dependable, remains true. Had we given more thought
to the interface between Blaise and SIR when designing the whole system, I am sure we would
have cut our processing times and resources required by an order of magnitude.
For further information about DSS use of SIR please contact:
Stuart Mitchenall
Computing Services Manager
Department of Social Security
Adelphi Building
1-11 John Adam Street
London WC2N 6HT UK