The Common BLSA Data Set



1. Objectives and Introduction
The common BLSA data set is a set of files intended to provide researchers working with the Baltimore Longitudinal Study of Aging (BLSA) with a relatively current snapshot of the BLSA population so that they may pursue their research objectives without the need to repeatedly refer back to and interface with the BLSA masterfile stored on the secure DUX computer system. The set of files is designed to be dynamic in content, changing to reflect evolving needs of BLSA researchers. It is also intended to be available to BLSA researchers who can access it via their VMS accounts.
These data should not be used carelessly. Much of the data in the common BLSA data set has been collected by a particular scientist in the course of his or her research. While their inclusion in the common BLSA data set implies that they are public, some of these data may still be considered proprietary by the original scientist, and users should be sensitive to that consideration. Also, data have changed over the course of the Study. For example, lab tests have been handled in different ways by different laboratories, both in-house and third-party. Often, data items have been extracted and cleaned up by various researchers. Sometimes these improved data are sent back for inclusion in the masterfile, and other times they are not (either intentionally or unintentionally). In short, any researcher using the information in the common BLSA data set must be aware of the limitations and history of the data. It is often wise to consult the in-house experts regarding subtleties in data that might not be obvious or widely known.

2. Security and Privacy Considerations
Because the data set is easy to access, the privacy of the data is an important concern. The BLSA files are a system of records within the scope of PL 93-579, the Privacy Act of 1974, which dictates that the information contained in the files may be disclosed only on a "need to know" basis. As the "system manager" for the BLSA, the Computer Scientist of the Data Management Services Section, RRB, is the responsible NIA official for assuring compliance with the Privacy Act. Every researcher who accesses these data has the legal responsibility to respect and safeguard the confidentiality of the information. While the nature of collaborations necessitates sharing of some of these data, files should be released only when the BLSA researcher is sure that the recipient of the data understands the implications of the Privacy Act, and only the necessary data items should be released. Under no circumstances should identifying information (names, addresses, etc.) be linked with individual records, and there should be no cross-reference listings of identification numbers (xray numbers, in BLSA parlance) and names generally available.

3. How to Access the Data
The common BLSA data set is referenced by system-wide dataset names on the Gerontology Research Center's (GRC) VMS Cluster. In order to access these data, two criteria must be met. First, the user must be assigned or have access to a Username and Password valid on the GRC VMS cluster. This can be obtained from the Network, Computing and Telephony Section (NCTS) on extension 8000.
The second requirement is that the user be cleared to have access to the BLSA dataset and be designated an authorized user of those data. The clearance procedure includes completion of a small amount of paperwork in which the user specifies the nature of their access to the data, the time frame of their need for access, a promise that they will provide feedback on any errors they may find, and a certification that they will be responsible for complying with the provisions of the Privacy Act of 1974 as they relate to any data they retrieve. Form BLSA-CDS-01 should be used for this purpose, and can be obtained from the Data Management Services Section (DMSS) staff of the Research Resources Branch (RRB), on extension 8144. A copy of this form is attached to this document.
Once the above prerequisites are fulfilled, the user may log on to the VMS cluster and read these datasets. They are then free to use the data in whatever way they see fit. The names, content and structure of the files are detailed in section 5.

4. Modifications of the Common BLSA Data Set
Modifications may take two forms: correction of errors and changes in the contents of the data sets. Errors may be the result of transcription mistakes, incorrect tabulation, improper interpretation, or incomplete or missing information. Changes in content are additions or deletions with respect to the dictionary of data items contained in the dataset, but not in the values assigned to any of those data items.
Error corrections or approved modifications to the data dictionary will appear in the next regularly published edition of the common BLSA data set.

4.1 What To Do If Errors Are Found in the Data
The common BLSA data set as created on the VMS cluster reflects the state of the BLSA masterfile at the time when the snapshot is created. All of the data items are extracted from the raw data in the masterfile, and no massaging or pre-processing is done between extraction from the masterfile and storage to the common data set. This implies that the integrity and accuracy of the common data set are no better or worse than those of the masterfile from which it is drawn, and there is a reasonable likelihood that there are errors in some masterfile data.
As stated above, there are various circumstances that would result in errors in the data set. Once the data are entered into the masterfile and verified, it is unlikely that DMSS staff will find the errors, unless they result in inconsistencies in the database. Therefore, the most likely individuals to discover any errors are the BLSA researchers, for two reasons. First, they will be using the data, and will therefore be looking at it closely. Second, it is expected that researchers know their own data better than anyone else, and will therefore be able to spot errors or omissions.
In order that the common data set be of the most use to the most people, it is expected that researchers will inform RRB Data Management Services whenever they encounter errors in the data. If the problem is incompleteness of the data set, then additional data should be provided to the DMSS staff for inclusion in the masterfile (and, thus, subsequent common BLSA data sets). If the problem is errors, then corrections should be submitted, together with suggestions as to the source of the error where possible. Bulk data may be submitted on floppy disk, in a flat ASCII DOS file. Individual corrections can be submitted in writing. Every data item should identify the id number and visit to which it applies. Form BLSA-CDS-02 should accompany any corrections, and can be obtained from DMSS staff. A copy of this form is attached to this document.
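As an illustration only, a bulk correction file of the kind described above might be prepared along the following lines. This is a minimal sketch in Python, not a DMSS specification: the comma-separated layout, the header row, and the field names (id_number, visit, item, old_value, new_value) are all assumptions made for the example. The only requirements taken from this document are that the file be flat ASCII in DOS format and that every item identify its id number and visit. Researchers should confirm the expected layout with DMSS staff before submitting.

```python
import csv
import io

# Hypothetical field names (assumption). The document only requires that each
# item identify the id number and the visit to which it applies.
FIELDS = ["id_number", "visit", "item", "old_value", "new_value"]


def format_corrections(rows):
    """Return correction records as flat ASCII text with DOS (CRLF) line endings.

    Each row is a dict of strings keyed by the names in FIELDS.
    """
    for row in rows:
        # Refuse any record that does not say which participant and visit it fixes.
        if row.get("id_number") in (None, "") or row.get("visit") in (None, ""):
            raise ValueError("every correction needs an id number and a visit")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\r\n")
    writer.writeheader()
    writer.writerows(rows)
    text = buffer.getvalue()
    # Raises UnicodeEncodeError if a non-ASCII character slipped in.
    text.encode("ascii")
    return text
```

To write the result to disk without the platform altering the line endings, the text can be saved with open(path, "w", newline="", encoding="ascii"). The sketch deliberately carries no names or addresses, only id numbers, in keeping with section 2.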

4.2 Changing the Common BLSA Data Set Dictionary
A data dictionary is a list of the data items that may be found in a database, as is provided in section 5.2. The programs used to generate the common data set have been designed so that changes to the dictionary can be accommodated easily. However, any such changes must be approved by the BLSA Steering Committee before being incorporated into the common BLSA data set.
If a researcher desires additional data not found in the common BLSA data set, there are two possible avenues. First, if they believe that the data are genuinely of common long-term interest, then application can be made, on form BLSA-CDS-03 (a copy of which is attached), to the BLSA Steering Committee for inclusion of these data. Exclusions can be handled similarly.
However, if the desired data are of individual and/or short-term interest, then the researcher should meet with DMSS staff. Individual programs can be written to assist the researcher in retrieving data particular to their work.


Revised 11/7/97