Existing Data - Office of Research, Innovation and

Existing Data
(Also see Data Security FAQs for more information)
Identifiable Data: Any information about a living individual that is linked to, associated with, or contains the name or any
other details of the individual that would allow someone to directly or indirectly identify a subject from the information
collected.
Direct identifiers: Identities of individual subjects are kept by the investigator. If subjects' identities are inseparable
from the data, then the data is directly identifiable. Direct identifiers in research data or records include names; postal
address information (other than town or city, state and ZIP code); telephone numbers; fax numbers; e-mail
addresses; social security numbers; medical record numbers; health plan beneficiary numbers; account numbers;
certificate/license numbers; vehicle identifiers and serial numbers, including license plate numbers; device identifiers
and serial numbers; web universal resource locators (URLs); internet protocol (IP) address numbers; biometric
identifiers, including finger and voice prints; and full face photographic images and any comparable images.
Indirect identifiers: Identities are kept separate from data, with information connecting them maintained by codes
and a master list. Indirect identifiers in research data or records include all geographic identifiers smaller than a state,
including street address, city, county, precinct, ZIP code, and their equivalent postal codes, except for the initial three
digits of a ZIP code; all elements of dates (except year) for dates directly related to an individual, including birth
date, admission date, discharge date, and date of death; and all ages over 89 and all elements of dates (including year)
indicative of such age, except that such age and elements may be aggregated into a single category of age 90 or
older.
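The generalization rules for indirect identifiers described above can be sketched in code. This is a minimal illustration, not an approved de-identification procedure: the function name `generalize` and the field names (`zip`, `birth_date`, `age`, `city`) are hypothetical, and the record values are made up. It applies only the rules listed above: keep the first three ZIP digits, keep only the year of a date, and fold ages over 89 (and the year that would reveal them) into a single "90+" category.

```python
from datetime import date

AGE_CAP = 89  # ages above this are aggregated into one "90+" category


def generalize(record: dict) -> dict:
    """Return a new record holding only coarsened indirect identifiers.

    Assumed (hypothetical) input fields: "zip" (str), "birth_date" (date),
    "age" (int). Any other field, such as city or street, is left out.
    """
    over_cap = record["age"] > AGE_CAP
    return {
        # Keep only the initial three digits of the ZIP code.
        "zip3": record["zip"][:3],
        # Ages over 89 are reported as a single category.
        "age": "90+" if over_cap else record["age"],
        # Keep only the year; withhold it when it would reveal an age over 89.
        "birth_year": None if over_cap else record["birth_date"].year,
    }


example = {"city": "Raleigh", "zip": "27695",
           "birth_date": date(1930, 5, 17), "age": 93}
print(generalize(example))
# {'zip3': '276', 'age': '90+', 'birth_year': None}
```

Building the output from a short list of allowed fields, rather than deleting fields from the input, means that a column nobody anticipated is dropped by default.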
Anonymous Data: Any information about a living individual that was collected in such a manner that identifiers were never
associated with the information and no one was ever able to identify from whom the information was collected.
Subjects' identities are unknown to the investigator, not requested, not recorded and not given. There is no way
that the researcher, the research team or anyone else could link the data to the participant.
De-Identified Data: Identifiers have been removed from the dataset in such a manner that no member of the research team is
able to identify the individual from whom the information was collected.
Coded Data: identifiers have been removed from the dataset but can readily be found through the use of a master list that
is accessible to the investigator.
• The link that cross-references the subject’s identity with the code should be stored in a separate location from the
data and should be locked. The Principal Investigator should consider how many and which staff should have access
to the link, and should consider limiting the number of staff with access for more sensitive, high-risk data.
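As a rough sketch of how coded data and a master list relate, the example below splits a set of records into a coded dataset and a separate master list. The function name `code_records`, the field names, and the sample records are all hypothetical; the sketch assumes the two outputs would then be saved to separate, access-restricted locations, which the code itself does not do.

```python
import secrets


def code_records(records, identifier_fields=("name", "email")):
    """Split records into (coded_data, master_list).

    coded_data:  rows with identifier fields removed and a random code added.
    master_list: maps each code back to the removed identifiers; it is meant
                 to be stored separately from coded_data.
    """
    coded_data = []
    master_list = {}
    for record in records:
        code = secrets.token_hex(4)      # random code, unrelated to identity
        while code in master_list:       # regenerate on the rare collision
            code = secrets.token_hex(4)
        master_list[code] = {f: record[f] for f in identifier_fields}
        row = {k: v for k, v in record.items() if k not in identifier_fields}
        row["code"] = code
        coded_data.append(row)
    return coded_data, master_list


records = [
    {"name": "A. Example", "email": "a@example.org", "score": 7},
    {"name": "B. Example", "email": "b@example.org", "score": 9},
]
coded, master = code_records(records)
# `coded` holds only scores and codes; `master` holds the identities.
```

The codes are random rather than derived from a name or ID number, so the coded dataset alone gives no route back to a subject without the master list.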
Confidential Data: Confidential data is not anonymous. Confidentiality is the protection of study participants’ data such
that an individual participant’s data will not be disclosed except to another authorized person.
Data Transfer:
• All electronic transmission of private and sensitive information over the Internet must be *encrypted*. This
includes email, file transfers and other data transfer modalities.
• Paper transfer: Will the data be sent by postal mail or FedEx, or hand-carried by a member of the study team? Data in
transit needs to be protected from a breach (e.g., data transferred separately from consent forms and codes).
Data Storage Basic Tips:
• Separate data files from consent forms and from master lists (if applicable).
• Paper records containing research data should be stored in a locked cabinet with access limited to the research team.
• Electronic data should be stored in a password-protected file (or encrypted if necessary) on a password-protected
computer in a locked cabinet/desk in a locked office.
• Electronic data should be stripped of identifying information as soon as possible.
• Master lists should be kept in a password-protected file (or hard drive) separate from the coded data set.
• The level of security and restriction should increase depending on the level of sensitive data being captured in the
research records.
• Data protection plans must consider all record-keeping processes and storage of data from the initial collection to
post-study storage, destruction, or complete de-identification of the data. Such plans should include details for all
modes of storage: paper, electronic, video/audio recordings, films, etc.
• Data storage plans should include notes about keeping all software and protective software up to date.
Questions to Answer when Handling Existing Data
1. Is the data identifiable or is it anonymous?
a. If the data is identifiable, your study does not qualify for Exemption.
2. What is the nature of the data?
a. Electronic (audio or text), hardcopy files, or biological specimens?
b. Do the data contain protected health information, personal identifying information
or other sensitive information?
c. Are identifiers retained and linked to the data? Who will have access to the data
and identifiers?
d. Are the data stripped of identifiers and the identifiers destroyed?
e. Are identifiers de-linked from the data and managed by use of a code? How are
the identifiers, data files and key managed and secured? Who will have access
to the identifiers, data files and key?
3. How do you have access to the data?
a. Is it appropriate for you to access the data? Is it a conflict of interest?
b. Is someone providing you with the data?
c. Was the data originally identifiable, with someone outside the research team
removing the identifying information?
4. How will data be transferred or transported?
a. Who is transferring the data to you?
b. How will electronic files be transmitted?
c. How will hardcopy files be transported?
d. How are the files and data protected while in transmission or when transported?
5. Where/how will the data be stored and what security measures will be used for each?
a. Office computer? Personal laptop? University laptop? Office file cabinet?
Thumb/jump drive? Departmental server, etc.?
b. What security measures will be used with each (password protection; encryption,
e.g., 128-bit encryption; locked file cabinet in locked office; etc.)?
c. Who will have access to the computer, laptop, or files?
d. Is the computer on a network?
e. Is the data behind a firewall?
6. When and how will data be deleted or destroyed?
a. Will cloud-computing resources be used? (refer to NCSU resources below)
b. What is the resource and what is the privacy policy for the resource?
c. Do you have a data agreement and is your procedure in line with their stated
procedure?
7. Is there a data use agreement?
a. Are your practices and expectations in line with what is articulated in that agreement?
b. Are you compiling multiple data sets? Does compiling those data sets put the
information about the people in those data sets at more risk because of how you have
compiled the data?
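One informal way to reason about the last question is to count how many records become unique once fields from several data sets sit side by side. The sketch below is an illustration only, not a risk standard or a required procedure: the function name `unique_combinations`, the field names, and the sample rows are hypothetical.

```python
from collections import Counter


def unique_combinations(rows, fields):
    """Count rows whose combination of values for `fields` appears exactly once."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return sum(1 for n in counts.values() if n == 1)


# Hypothetical compiled data set: ZIP prefix and birth year came from one
# source, occupation from another.
compiled = [
    {"zip3": "276", "birth_year": 1980, "occupation": "nurse"},
    {"zip3": "276", "birth_year": 1980, "occupation": "teacher"},
    {"zip3": "276", "birth_year": 1975, "occupation": "nurse"},
    {"zip3": "275", "birth_year": 1975, "occupation": "nurse"},
]

print(unique_combinations(compiled, ["zip3"]))                              # 1
print(unique_combinations(compiled, ["zip3", "birth_year"]))                # 2
print(unique_combinations(compiled, ["zip3", "birth_year", "occupation"]))  # 4
```

In this made-up example, each added field makes more rows unique, which is the kind of increased risk the question asks the researcher to consider.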
Best Practices for Data Security in Google Apps at NCSU
http://google.ncsu.edu/usinggoogleapps/best-practices-data-security-google-apps-nc-state#AppropriateUse
Storage Locations for University Data
http://oit.ncsu.edu/security-standards-compliance/storage
NCSU Office of Information Technology (OIT) Information
Units: http://oit.ncsu.edu/units
Training and Consulting: http://oit.ncsu.edu/n/welcome-training-consulting
Safe Computing: http://oit.ncsu.edu/safe-computing