The Common BLSA Data Set
5. Structure and Content of the Common BLSA Data Set
The common BLSA data set is generated and published quarterly,
on or about the first business day of the calendar quarter (January
1, April 1, July 1, and October 1). Work schedules, major
late-breaking modifications, or equipment problems may delay
publication.
5.1 File Structure
The common BLSA data set, as resident on the VMS cluster, consists
of six sets of files. This structure was chosen to accommodate the
diversity of the information presented. These sets of files are:
- fixed-field, numeric coded quantitative data, saved in
BLS$CDS$FIX$<suffix> by id number and visit
- data of a non-longitudinal nature, such as race, saved in
BLS$CDS$NL$<suffix> by id number
- prescription and medication information, saved in
BLS$CDS$RX$<suffix> by id number and visit
- information on diagnoses, cumulative from the most recent
visit, saved in BLS$CDS$DX$<suffix> by id number and latest
visit number
- cumulative information on hospitalizations and major illnesses
(morbidity), saved in BLS$CDS$HI$<suffix> by id number
- cumulative information on surgical and test procedures, saved
in BLS$CDS$PRC$<suffix> by id number
For each of these sets, five files can be found on the VMS cluster.
Each is identified by a unique suffix, as follows:
- <suffix> = DAT is the flat, ASCII file of raw
data
- <suffix> = SAS is the description of the file in SAS
format
- <suffix> = SPS is the description of the file in
SPSS format
- <suffix> = SSD is the data set interpreted as
described by the SAS description and written in the SAS save
format
- <suffix> = SPX is the data set interpreted as
described by the SPS description and written in the SPSS save
format
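The naming scheme above (six file sets, five suffixes each) can be enumerated mechanically. The following is a minimal Python sketch, not part of the BLSA tooling; the function names are hypothetical, and the strings it builds are simply the BLS$CDS$<set>$<suffix> references as written in this section.

```python
# File-set codes and suffixes exactly as listed in this section.
FILE_SETS = ["FIX", "NL", "RX", "DX", "HI", "PRC"]
SUFFIXES = ["DAT", "SAS", "SPS", "SSD", "SPX"]


def file_reference(file_set, suffix):
    """Build one reference of the form BLS$CDS$<set>$<suffix>."""
    if file_set not in FILE_SETS:
        raise ValueError("unknown file set: " + file_set)
    if suffix not in SUFFIXES:
        raise ValueError("unknown suffix: " + suffix)
    return "BLS$CDS$" + file_set + "$" + suffix


def all_file_references():
    """List every set/suffix combination (6 sets x 5 suffixes)."""
    return [file_reference(s, x) for s in FILE_SETS for x in SUFFIXES]
```

For example, file_reference("RX", "SSD") yields the reference for the SAS save-format copy of the prescription and medication data.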
The data files are formatted as either fixed-field or delimited,
depending on the data presented. For the fixed-field format,
columns are specified in the next section and missing data are left
blank. For the delimited format, the delimiter is a comma (",");
text data are enclosed in single quotes ('); missing numeric data
are denoted by -.1; and missing text data are indicated by two
consecutive single quotes ('').
The SAS and SPSS-X code files provide descriptions of the flat,
ASCII files. The SAS files contain at least "input", "label" and
"format" statements. Similarly, the SPSS-X files hold "data list",
"variable label" and "value label" statements. Users who want to
create their own SAS or SPSS-X data sets can copy statements from
these files as needed. The SAS permanent data sets and the SPSS-X
system files are produced by processing the raw data with modified
versions of the SAS and SPSS-X code files provided in BLS$CDS. They
are provided as a convenience and possible shortcut for users.
SAS value formats are generated and stored permanently for all
core data set variables that are nominal or ordinal in nature and
have discrete interpretations. A value format name is formed by
appending an underscore ("_") to the variable name. Value format
names for character data are prefixed with a dollar sign ("$"). To
use a permanently saved format, the user needs to include the SAS
statement "LIBNAME LIBRARY 'BLS$CDS';" and a format statement such
as "FORMAT STAT $STAT_.;" in the SAS program.
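The format-naming rule can be written out as a small helper. This is an illustrative Python sketch only; the function names are hypothetical, and the variable SEX in the usage note is an invented example of a numeric variable, not a name taken from the data dictionaries.

```python
def value_format_name(variable, character=False):
    """Apply the naming rule: variable name plus "_", with a
    leading "$" when the variable holds character data."""
    name = variable + "_"
    return "$" + name if character else name


def format_statement(variable, character=False):
    """Build the SAS FORMAT statement that attaches the saved format."""
    return "FORMAT {} {}.;".format(
        variable, value_format_name(variable, character)
    )
```

Calling format_statement("STAT", character=True) reproduces the statement quoted in the text; a numeric variable such as the hypothetical SEX would get the format name SEX_.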
SPSS-X value labels are also defined and saved in the
dictionary of all SPSS-X system files for variables that have
discrete interpretations. SPSS-X value labels are automatically
displayed on the output of many procedures and require no
additional SPSS-X statements when using SPSS-X system files.
These files are all read-only. Researchers can use them in whatever
way best suits their work: they may extract information directly
from the raw data, use the packaged data sets in their entirety, or
use the statistical package to create a smaller, more specific file
that is easier and more efficient to handle.
In addition to the above data files, there is also a file
referenced as BLS$CDS$DOC$TXT which contains this document.
Additional hard copies of this documentation can be obtained by
printing this file.
5.2 Data Contained in Each Component Data File
The following sections provide the data dictionaries of the
component data files. For each data item, its identity, its
SAS/SPSS name, its location in the data file, and its source (in
the masterfile) are provided, along with relevant notes (if any)
regarding its use.
Due to the flexible nature of the BLSA and the history of
starting and ending tests and changing data collection protocols,
there are many data items that are missing for a large number of
participant visits, or have moved over the course of the study from
one variable to another. It is important to "know the data" in
order to interpret them sensibly.
General formatting guidelines for the data files are as
follows:
- fixed-length, fixed-column data are located in the columns
indicated in the following charts with missing data simply left
blank, no delimiters, and no special treatment of text data
- all variable-length, free-field data are delimited by commas
- text variable-length, free-field data are enclosed in single
quotes, with missing data represented by ''
- missing numeric variable-length, free-field data are
represented by -.1
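For users reading the raw DAT files outside SAS or SPSS-X, the guidelines above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a supported reader: the column layout shown is hypothetical (the real columns are given in the charts), blank unquoted delimited fields are treated as missing by assumption, and single quotes embedded inside text values are not handled because this document does not say how they are written.

```python
MISSING_NUMERIC = "-.1"  # missing-value marker stated in the guidelines


def parse_fixed_record(line, layout):
    """Slice one fixed-column record; blank columns become None.

    layout maps a variable name to (first_col, last_col, kind), with
    1-based inclusive columns and kind "num" or "text".
    """
    record = {}
    for name, (first, last, kind) in layout.items():
        raw = line[first - 1:last].strip()
        if raw == "":
            record[name] = None        # missing data are left blank
        elif kind == "num":
            record[name] = float(raw)
        else:
            record[name] = raw         # text needs no special treatment
    return record


def split_delimited(line):
    """Split on commas that fall outside single-quoted text."""
    fields, buf, in_quotes = [], [], False
    for ch in line.rstrip("\r\n"):
        if ch == "'":
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == "," and not in_quotes:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    fields.append("".join(buf))
    return fields


def convert_field(token):
    """Turn one raw delimited field into text, a float, or None."""
    token = token.strip()
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        return token[1:-1] or None     # '' marks missing text
    if token == MISSING_NUMERIC or token == "":
        return None                    # blank-as-missing is an assumption
    return float(token)


def parse_delimited_record(line):
    """Parse one comma-delimited record into a list of values."""
    return [convert_field(f) for f in split_delimited(line)]


# Hypothetical layout for illustration only; see the charts for real columns.
EXAMPLE_LAYOUT = {
    "ID": (1, 4, "num"),
    "VISIT": (5, 6, "num"),
    "HT": (7, 11, "num"),
}
```

The missing-value test compares the literal text -.1 rather than the number, so a field written in any other form is read as an ordinary value.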
Revised 11/7/97