The Common BLSA Data Set



5. Structure and Content of the Common BLSA Data Set
The common BLSA data set is generated and published quarterly, on or about the first business day of each calendar quarter (January 1, April 1, July 1, and October 1). Work schedules, major late-breaking modifications, or equipment problems may delay this schedule.

5.1 File Structure
The common BLSA data set, as resident on the VMS cluster, consists of six sets of files. This structure was developed to accommodate the diversity of the information presented. These sets of files are:
For each of these sets, five files can be found on the VMS cluster. Each is identified by a unique suffix, as follows:
The data files are formatted as either fixed-field or delimited, depending on the data presented. For the fixed-field format, columns are specified in the next section and missing data are left blank. For the delimited format: the delimiter is "," (comma); text data are enclosed in "'" (single quotes); missing numeric data are denoted by "-.1"; and missing text data are indicated by "''" (two single quotes).
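The delimited-format conventions described above can be illustrated with a short Python sketch. It is not part of the BLSA distribution. The function name, the sample record, and the idea of passing in the positions of the text columns are assumptions made for illustration; the actual column layout comes from the data dictionaries.

```python
import csv

# Sentinel for missing numeric data in the delimited files, as stated above.
MISSING_NUMERIC = "-.1"


def parse_delimited_line(line, text_columns):
    """Parse one comma-delimited record into a list of Python values.

    text_columns is a set of zero-based positions that hold text data
    (a hypothetical argument; take the real layout from the data dictionary).
    Missing values, numeric or text, are returned as None.
    """
    # quotechar="'" lets the csv module strip the single quotes around text
    # and keeps a comma inside a quoted value from splitting the field.
    fields = next(csv.reader([line], delimiter=",", quotechar="'"))
    record = []
    for index, raw in enumerate(fields):
        if index in text_columns:
            # Two single quotes ('') parse to an empty string: missing text.
            record.append(raw if raw != "" else None)
        elif raw.strip() == MISSING_NUMERIC:
            record.append(None)
        else:
            record.append(float(raw))
    return record


# Hypothetical record: numeric, text, missing numeric, missing text.
print(parse_delimited_line("1234,'M',-.1,''", text_columns={1, 3}))
```

The sketch matches the missing-value sentinel as the literal string "-.1", so a genuine measurement written differently (for example "-0.10") would not be treated as missing.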
The SAS and SPSS-X code files provide descriptions of the flat ASCII files. The SAS files contain at least "input", "label" and "format" statements. Similarly, the SPSS-X files hold "data list", "variable label" and "value label" statements. Users who want to create their own SAS or SPSS-X data sets can cut and paste from these files as needed. The SAS permanent data sets and the SPSS-X system files are the result of compiling the raw data with modifications of the SAS and SPSS-X code files provided in BLS$CDS. These data sets and system files are provided as a convenience and possible shortcut for users.
SAS value formats are generated and stored permanently for all core data set variables which are nominal or ordinal in nature and have discrete interpretations. A value format name is formed by appending an "_" (underscore) to the variable name. Value format names for character data are prefaced by a "$" (dollar sign). To use a permanently saved format, the user needs to include the SAS statement "LIBNAME LIBRARY 'BLS$CDS';" and a format statement such as "FORMAT STAT $STAT_.;" in the SAS routine.
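As an illustration of the naming rule only, the following Python sketch builds the value format name and the matching FORMAT statement for a given variable. The helper names are hypothetical, and the sketch assumes the rule applies uniformly; it does not check whether a format was actually stored for the variable.

```python
def sas_format_name(variable, is_character=False):
    """Return the value format name: variable name plus "_",
    prefaced by "$" when the variable holds character data."""
    name = variable.upper() + "_"
    return "$" + name if is_character else name


def sas_format_statement(variable, is_character=False):
    """Return a SAS FORMAT statement that applies the saved value format.
    The trailing period after the format name is SAS syntax for a format."""
    return "FORMAT {} {}.;".format(
        variable.upper(), sas_format_name(variable, is_character)
    )


# STAT is the character variable used as the example in the text above.
print(sas_format_statement("STAT", is_character=True))
```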
SPSS-X value labels are also defined and saved in the dictionary of all SPSS-X system files for variables which have distinct interpretations. SPSS-X value labels are automatically displayed on the output of many procedures and require no additional SPSS-X statements when using SPSS-X system files.
These files are all read-only. Researchers can use them in whatever way is most appropriate to their work. They may choose to use the raw data and extract information themselves. They may choose to use the entire packaged data sets, or use the statistical package to create a smaller file that is more specific to their work and can be handled more easily and efficiently.
In addition to the above data files, there is also a file referenced as BLS$CDS$DOC$TXT which contains this document. Additional hard copies of this documentation can be obtained by printing this file.

5.2 Data Contained in Each Component Data File
The following sections provide the data dictionaries of the component data files. For each data item, its identity, SAS/SPSS name, location in the data file, and source (in the masterfile) are provided, along with relevant notes (if any) regarding its use.
Due to the flexible nature of the BLSA and the history of starting and ending tests and changing data collection protocols, there are many data items that are missing for a large number of participant visits, or have moved over the course of the study from one variable to another. It is important to "know the data" in order to make the most sensible interpretation of them.
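For the fixed-field files, the column locations given in a data dictionary can be used to slice each record. The Python sketch below is a minimal illustration and is not part of the BLSA distribution. The variable names ID, SEX, and AGE and their column positions are hypothetical, and the sketch assumes locations are given as 1-based, inclusive column ranges.

```python
def parse_fixed_line(line, layout):
    """Slice one fixed-field record into a dict of strings.

    layout maps a variable name to (start, end) columns, assumed here to be
    1-based and inclusive. Blank fields are missing data and become None.
    """
    record = {}
    for name, (start, end) in layout.items():
        raw = line[start - 1:end].strip()
        record[name] = raw if raw else None
    return record


# Hypothetical layout; real names and columns come from the data dictionary.
LAYOUT = {"ID": (1, 4), "SEX": (5, 5), "AGE": (6, 8)}

print(parse_fixed_line("1234M 67", LAYOUT))
print(parse_fixed_line("1234    ", LAYOUT))  # SEX and AGE are blank (missing)
```

Values are kept as strings so that the caller decides, variable by variable, how to convert them and how to treat missing visits.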
General formatting guidelines for the data files are as follows:



Revised 11/7/97