Research: Data Security

Security of research data is of particular importance to researchers collecting or using data that is sensitive, confidential or related to human subjects. Safe data handling should be planned during the design of the project and implemented throughout the life cycle of the data.

To determine the level of security required, data must first be classified according to the U of T Guidelines for Data Classification, using U of T's data classification table.

The classification then dictates which controls are needed to protect the data. Once the classification has been identified, Education Commons can work with you to recommend appropriate tools and design data controls for your specific research project.

Education Commons will use industry cybersecurity standards, such as NIST SP 800-171 (National Institute of Standards and Technology Special Publication 800-171) and CMMC (Cybersecurity Maturity Model Certification), to identify appropriate data controls for your research project.
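The step from classification to controls can be pictured as a lookup in which each higher sensitivity level inherits the controls of the levels below it. The sketch below assumes four hypothetical levels and illustrative control names; it is not U of T's data classification table, and the actual controls for a project should come from that table and from Education Commons.

```python
# Hypothetical sketch: the level names and control lists below are illustrative
# placeholders, NOT the official U of T data classification table.
from enum import IntEnum


class DataLevel(IntEnum):
    """Hypothetical sensitivity levels, lowest to highest."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    HIGHLY_CONFIDENTIAL = 4


# Controls introduced at each level (illustrative only).
CONTROLS_BY_LEVEL = {
    DataLevel.PUBLIC: ["integrity checks"],
    DataLevel.INTERNAL: ["authenticated access", "encryption in transit"],
    DataLevel.CONFIDENTIAL: ["role-based access control", "encryption at rest"],
    DataLevel.HIGHLY_CONFIDENTIAL: ["multi-factor authentication", "access logging"],
}


def required_controls(level: DataLevel) -> list[str]:
    """Return every control for `level`, including those inherited from lower levels."""
    controls: list[str] = []
    for tier in DataLevel:  # iterates in definition order, lowest level first
        if tier <= level:
            controls.extend(CONTROLS_BY_LEVEL[tier])
    return controls


if __name__ == "__main__":
    print(required_controls(DataLevel.CONFIDENTIAL))
```

Making each level a superset of the one below keeps the rule "more sensitive data never gets fewer protections" true by construction.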

 

Data privacy breaches are a major risk for the University. The first step in preserving the privacy of data is to recognize where the vulnerabilities lie. A data breach can result from simple mistakes, such as:

  • losing a laptop, USB key or smartphone;
  • leaving files in a public place;
  • emailing the wrong person;
  • sharing personal information on social networking websites.

A breach may stem from an unfortunate event, such as a theft, or from a sophisticated attack, such as a deliberate hack. Each researcher should classify the types of information they hold and assess the level of risk attached to each type.

You should take a layered approach to preventing data exposure, including:

  • Physical security. Protect against break-ins and theft of equipment containing personal data.
  • Anti-virus and anti-malware software. Use it regularly and keep it up to date.
  • Access controls. Restrict access to systems for users and sources according to their roles and responsibilities. Each user must have their own username and password. Use strong passwords and change them on a regular basis.
  • Awareness. Employees need to be aware of their roles and responsibilities. Train your staff to recognize threats such as phishing emails and malware.
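The access-control principle above (individual accounts, permissions tied to roles) can be sketched as a minimal role-based access control check. The role names, permissions and usernames are hypothetical, and in practice a project would normally rely on the access controls built into its storage platform rather than custom code.

```python
# Minimal role-based access control (RBAC) sketch. The role names, permission
# strings and user accounts below are hypothetical examples.
from dataclasses import dataclass, field

# Each role grants a fixed set of permissions on the research data.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "principal_investigator": frozenset({"read", "write", "export", "manage_users"}),
    "research_assistant": frozenset({"read", "write"}),
    "external_reviewer": frozenset({"read"}),
}


@dataclass
class User:
    username: str  # one account per person; no shared logins
    roles: set[str] = field(default_factory=set)


def is_allowed(user: User, permission: str) -> bool:
    """Grant access only if at least one of the user's roles includes the permission.

    Unknown roles map to an empty permission set, so access is denied by default."""
    return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in user.roles)


if __name__ == "__main__":
    assistant = User("jdoe", {"research_assistant"})
    print(is_allowed(assistant, "write"))   # True
    print(is_allowed(assistant, "export"))  # False
```

Tying permissions to roles rather than to individuals means that when someone changes responsibilities, only their role assignment needs to change.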

Why anonymization?

Procedures to anonymize data should be considered together with informed consent for data sharing and the need for access restrictions. Anonymization may be needed:

  • for ethical reasons, to protect people's identities in research; or
  • for legal reasons, to avoid disclosing personal data that is protected by law.

Anonymization factors to consider

  1. Anonymizing research data can be time-consuming and therefore costly; early planning can help reduce costs.
  2. Personal data should never be disclosed from research information unless the respondent has given specific consent in writing.
  3. A person’s identity can be disclosed from:
    • Direct identifiers: these are often collected as part of the research administration process, but are usually not essential research information and can therefore easily be removed from the data. (Examples include names, email addresses, home addresses, telephone numbers, and pictures.)
    • Indirect identifiers: When linked with other publicly available information sources, these could identify someone. (Examples include information on workplace, occupation, or exceptional values of characteristics like salary or age.)
  4. Re-users of data have the same legal and ethical obligation as primary users NOT to disclose confidential information.
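As a rough illustration of factor 3, the sketch below drops direct identifiers, replaces them with a pseudonymous participant ID derived from a keyed hash (HMAC-SHA256), and generalizes two indirect identifiers: exact age becomes a 10-year band and salary is top-coded. The field names, key and thresholds are assumptions chosen for illustration, not a prescribed procedure.

```python
# Hypothetical sketch: the field names, secret key and thresholds are
# illustrative assumptions, not a prescribed U of T procedure.
import hashlib
import hmac

# Direct identifiers: dropped entirely from the shared dataset.
DIRECT_IDENTIFIERS = {"name", "email", "home_address", "phone", "photo"}

# Placeholder only: in practice, load the key from secure storage kept
# separately from the data, so others cannot recompute the pseudonyms.
SECRET_KEY = b"placeholder-key-stored-separately"


def pseudonym(value: str) -> str:
    """Stable participant ID from a keyed hash (HMAC-SHA256), truncated to 12 hex chars.

    Without the key, a guessed name or email cannot be hashed and compared."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_code(value: float, cap: float) -> float:
    """Cap exceptional values (such as a very high salary) at `cap`."""
    return min(value, cap)


def anonymize(record: dict) -> dict:
    """Drop direct identifiers, add a pseudonymous ID, generalize age and salary.

    Assumes every record has 'email', 'age' and 'salary' fields."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["participant_id"] = pseudonym(record["email"])
    out["age"] = age_band(record["age"])
    out["salary"] = top_code(record["salary"], cap=150_000)
    return out


if __name__ == "__main__":
    row = {"name": "Jane Doe", "email": "jane@example.org",
           "age": 37, "salary": 210_000, "occupation": "nurse"}
    print(anonymize(row))
```

Anyone holding the key can regenerate the IDs from known email addresses, so the key needs the same protection as the original data. Fields that remain, such as occupation, may still act as indirect identifiers.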

Quantitative anonymization

Special attention may be needed for relational data, where connections between variables in related datasets can disclose identities, and for geo-referenced data, where spatial references that could identify someone also carry geographical value for the research.
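One common way to check this kind of linkage risk is k-anonymity, a technique not named in the guidance above: count how many records share each combination of indirect identifiers (quasi-identifiers) and flag any combination shared by fewer than k records. The sketch below also coarsens coordinates into grid cells so geo-referenced data keeps some geographical value. Column names, rounding precision and k are hypothetical choices.

```python
# Hypothetical sketch: column names, rounding precision and k are assumptions.
from collections import Counter


def coarsen(lat: float, lon: float, decimals: int = 1) -> tuple[float, float]:
    """Round coordinates to a coarser grid (0.1 degree of latitude is about 11 km)."""
    return (round(lat, decimals), round(lon, decimals))


def smallest_group(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest set of records sharing identical quasi-identifier values.

    Returns 0 for an empty dataset."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values(), default=0)


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if the smallest quasi-identifier group has at least k records."""
    return smallest_group(records, quasi_identifiers) >= k


if __name__ == "__main__":
    rows = [
        {"age": "30-39", "occupation": "nurse", "lat": 43.6629, "lon": -79.3957},
        {"age": "30-39", "occupation": "nurse", "lat": 43.6532, "lon": -79.3832},
        {"age": "40-49", "occupation": "surgeon", "lat": 43.7001, "lon": -79.4163},
    ]
    for r in rows:
        r["cell"] = coarsen(r["lat"], r["lon"])  # replace exact location with a grid cell
    qi = ["age", "occupation", "cell"]
    # The single surgeon aged 40-49 forms a group of one, so the data is not 2-anonymous.
    print(smallest_group(rows, qi), is_k_anonymous(rows, qi, k=2))  # 1 False
```

A failing check points to the combination that needs further generalization or suppression; passing it does not rule out every linkage with outside datasets.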

Qualitative anonymization

When anonymizing qualitative material, such as transcribed interviews, identifiers should not be crudely removed or aggregated, as this can distort the data or even make it unusable. Instead, use pseudonyms (i.e., replacement terms) or vaguer descriptors. The aim is a reasonable level of anonymization that avoids unrealistic or overly harsh editing while preserving as much content as possible.
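The pseudonym approach for transcripts can be sketched as a consistent find-and-replace, where each identifying term always maps to the same pseudonym or descriptor so the narrative stays readable. The names and mapping below are invented; matching is case-sensitive and whole-word only, so a manual review of the output is still needed.

```python
# Hypothetical sketch: the names and replacement terms below are invented examples.
import re

# Built by a person reading the transcript: each identifying term maps to a
# pseudonym or a vaguer descriptor, used consistently throughout.
REPLACEMENTS = {
    "Maria Lopez": "[Participant A]",
    "Maria": "[Participant A]",
    "Riverside Clinic": "[a local clinic]",
    "Elm Street": "[a residential street]",
}


def pseudonymize(text: str, replacements: dict[str, str]) -> str:
    """Replace whole-word occurrences of each term with its substitute.

    Longer terms are tried first so 'Maria Lopez' is replaced as a unit rather
    than leaving 'Lopez' behind. Assumes terms begin and end with a letter or digit."""
    if not replacements:
        return text
    terms = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(0)], text)


if __name__ == "__main__":
    line = "Maria Lopez said Maria started at Riverside Clinic in 2019."
    print(pseudonymize(line, REPLACEMENTS))
    # [Participant A] said [Participant A] started at [a local clinic] in 2019.
```

Because replacements are bracketed and consistent, readers can still follow who said what and where events took place without learning the real identities.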

Training

Training events and information sessions are available from the Division of the Vice President, Research & Innovation.