The Common BLSA Data Set
1. Objectives and Introduction
The common BLSA data set is a set of files intended to provide
researchers working with the Baltimore Longitudinal Study of Aging
(BLSA) with a relatively current snapshot of the BLSA population so
that they may pursue their research objectives without the need to
repeatedly refer back to and interface with the BLSA masterfile
stored on the secure DUX computer system. The set of files is
designed to be dynamic in content, changing to reflect evolving
needs of BLSA researchers. It is also intended to be available to
BLSA researchers who can access it via their VMS accounts.
It should be further understood that these data should not be
used carelessly. Many of the data in the common BLSA data set have
been collected by a particular scientist in the course of his or
her research. While their inclusion in the common BLSA data set
implies that they are public, some of these data may still be
considered proprietary by the original scientist, and users should
be sensitive to that consideration. Also, data have changed over
the course of the Study. For example, lab tests have been handled
in different ways by different laboratories, both in-house and
third-party. Often, data items have been extracted and cleaned up
by various researchers. Sometimes these improved data are sent back
for inclusion in the masterfile, and other times they are not (either
intentionally or unintentionally). In short, any researcher using
the information in the common BLSA data set must be aware of the
limitations and history of the data. It is often wise to consult
the in-house experts regarding subtleties in data that might not be
obvious or widely known.
2. Security and Privacy Considerations
The ease of access to the data set raises the important
issue of data privacy. The BLSA files are a system of
records within the scope of PL 93-579, the Privacy Act of 1974,
which dictates that the information contained in the files may be
disclosed only on a "need to know" basis. As the "system manager"
for the BLSA, the Computer Scientist of the Data Management Services
Section, RRB, is the NIA official responsible for assuring compliance
with the Privacy Act. Every researcher who accesses these data has
the legal responsibility to respect and safeguard the
confidentiality of the information. While the nature of
collaborations necessitates sharing of some of these data, release
of files should be done only when the BLSA researcher is sure that
the implications of the Privacy Act are understood by the recipient
of the data, and only the necessary data items should be released.
Under no circumstances should identifying information (names,
addresses, etc.) be linked with individual records, and there
should be no cross-reference listings of identification numbers
(xray numbers, in BLSA parlance) and names generally available.
3. How to Access the Data
The common BLSA data set is referenced by system-wide dataset
names on the Gerontology Research Center's (GRC) VMS Cluster. In
order to access these data, two criteria must be met. First, the
user must be assigned or have access to a Username and Password
valid on the GRC VMS cluster. These can be obtained from the
Network, Computing and Telephony Section (NCTS) on extension 8000.
The second requirement is that the user be cleared to have
access to the BLSA dataset and be designated an authorized user of
those data. The clearance procedure includes completion of a small
amount of paperwork in which the user specifies the nature of their
access of the data, the time frame of their need for access, a
promise that they will provide feedback on any errors they may
find, and a certification that they will be responsible for
complying with the provisions of the Privacy Act of 1974 as they
relate to any data they retrieve. Form BLSA-CDS-01 should be used
for this purpose, and can be obtained from the Data Management
Services Section (DMSS) staff of the Research Resources Branch
(RRB), on extension 8144. A copy of this form is attached to this
document.
Once the above prerequisites are fulfilled, the user may log
on to the VMS cluster and read these data sets. Users are then free
to use the data in whatever way they see fit. The names, content,
and structure of the files are detailed in section 5.
4. Modifications of the Common BLSA Data Set
Modifications may take two forms: correction of errors and
changes in the contents of the data sets. Errors may be the result
of transcription mistakes, incorrect tabulation, improper
interpretation, or incomplete or missing information. Changes in
content are additions to or deletions from the dictionary of data
items contained in the data set; they do not affect the values
assigned to any of those data items.
Error corrections or approved modifications to the data
dictionary will appear in the next regularly published edition of
the common BLSA data set.
4.1 What To Do If Errors Are Found in the Data
The common BLSA data set as created on the VMS cluster reflects the
state of the BLSA masterfile at the time the snapshot is created.
All of the data items are extracted from the raw data in the
masterfile, and no massaging or pre-processing is done between
extraction from the masterfile and storage to the common data set.
This implies that the integrity and accuracy of the common data set
are no better or worse than those of the masterfile from which it
is drawn, and there is a reasonable likelihood that there are
errors in some masterfile data.
As stated above, various circumstances can result in errors in
the data set. Once the data are entered into the masterfile and
verified, it is unlikely that DMSS staff will find the errors,
unless they result in inconsistencies in the database. Therefore,
the individuals most likely to discover any errors are the BLSA
researchers, for two reasons. First, they will be using the data,
and will therefore be looking at them closely. Second, researchers
are expected to know their own data better than anyone else, and
will therefore be able to spot errors or omissions.
In order that the common data set be of the most use to the
most people, it is expected that researchers will inform RRB Data
Management Services whenever they encounter errors in the data. If the
problem is incompleteness of the data set, then additional data
should be provided to the DMSS staff for inclusion in the masterfile
(and, thus, subsequent common BLSA data sets). If the problem is
erroneous values, then corrections should be submitted, along with,
if possible, suggestions as to the source of the error. Bulk data may be
submitted on floppy disk, in a flat ASCII DOS file. Individual
corrections can be submitted in writing. Obviously, all data items
should identify the id number and visit to which they apply. Form
BLSA-CDS-02 should accompany any corrections, and can be obtained
from DMSS staff. A copy of this form is attached to this document.
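As an illustration only, a bulk correction file of the kind described above could be prepared along the following lines. This is a minimal sketch, not a DMSS specification: the text above requires only a flat ASCII DOS file in which every item identifies its id number and visit, so the comma-separated layout, the column names in FIELDS, and the function names are all assumptions made for the example. The layout actually expected should be confirmed with DMSS staff.

```python
import csv
import io

# Hypothetical column layout. The only stated requirement is that each
# item identify the id number and the visit to which it applies.
FIELDS = ["id_number", "visit", "item", "old_value", "new_value"]


def format_corrections(rows):
    """Return correction records as flat ASCII text with DOS (CRLF) line endings."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\r\n")
    writer.writeheader()
    for row in rows:
        # Reject any record that lacks an id number or a visit.
        for key in ("id_number", "visit"):
            if not str(row.get(key, "")).strip():
                raise ValueError("every correction needs an id number and a visit")
        writer.writerow(row)
    text = buf.getvalue()
    text.encode("ascii")  # raises UnicodeEncodeError if any character is not ASCII
    return text


def write_corrections(path, rows):
    """Write the formatted corrections to `path` without newline translation."""
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(format_corrections(rows))
```

A call such as write_corrections("corrections.txt", rows) would take one dictionary per corrected item in rows. In keeping with section 2, the records carry only the id number, never names or addresses.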
4.2 Changing the Common BLSA Data Set Dictionary
A data dictionary is a list of the data items that may be
found in a database, such as the one provided in section 5.2. The programs
used to generate the common data set have been designed so that
changes to the dictionary can be accommodated easily. However, any
such changes must be approved by the BLSA Steering Committee before
being incorporated into the common BLSA data set.
If a researcher desires additional data not found in the
common BLSA data set, there are two possible avenues. First, if they
believe that the data are genuinely of common long-term interest,
then application can be made, on form BLSA-CDS-03 (a copy of which
is attached), to the BLSA Steering Committee for inclusion of these
data. Exclusions can be handled similarly.
However, if the desired data are of individual and/or
short-term interest, then the researcher should meet with DMSS
staff. Individual programs can be written to assist the researcher
in retrieving data particular to their work.
JM
Revised 11/7/97