FAIR Data in AI/ML: Exercise 1

Student Instructions

Data Organization

Background: A professor was interested in measuring the effect of exercise on heart rate. For this project, students were assigned to record their resting pulse rate, then either run or sit for one minute, and then record their pulse rate again.

In addition to their pulse, other information about the students and their lifestyle was collected, including height, weight, age, gender, and smoking and drinking habits.

This experiment was performed over three years, and the data files for each year are available to you here:

The hope is to combine all three of these data files into one analysis.

Activity

  1. (10 minutes) In small groups, open the three files and inspect them. What are some problems in the way the data are currently organized?
  2. (5 minutes) Before moving on to the next activity, have the class check in and discuss some of the findings.
  3. (10 minutes) Suggest a new system for organization. Create a new spreadsheet that can be used as a template for later years of data collection.

Following the activities, the class will discuss each group’s findings and suggestions.
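
Once a shared template exists, combining the yearly files becomes a mechanical step. The Python sketch below is one possible illustration, not part of the original exercise: the column names in TEMPLATE_COLUMNS, the years, and the sample values are all made up for demonstration, and the real files may need cleaning before they fit any template.

```python
import csv
import io

# Hypothetical template columns; the real files may use different names.
TEMPLATE_COLUMNS = ["year", "student_id", "ran", "pulse_before", "pulse_after"]


def combine_years(files_by_year):
    """Stack per-year CSV text into one list of rows sharing TEMPLATE_COLUMNS."""
    combined = []
    for year, text in sorted(files_by_year.items()):
        for row in csv.DictReader(io.StringIO(text)):
            # The year comes from the file, not from a column inside it.
            record = {"year": year}
            for column in TEMPLATE_COLUMNS[1:]:
                # Missing or blank cells become None rather than a guessed value.
                value = (row.get(column) or "").strip()
                record[column] = value or None
            combined.append(record)
    return combined


# Made-up example data, not taken from the exercise files.
example = {
    2021: "student_id,ran,pulse_before,pulse_after\n1,yes,72,110\n",
    2022: "student_id,ran,pulse_before,pulse_after\n1,no,68,\n",
}
rows = combine_years(example)
```

Storing the year as its own column, with one row per student and one variable per column, is what allows the files to be stacked without editing them by hand.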


Additional Information

After class discussion, we encourage you to read more about data management and spreadsheet organization.

Specifically, we suggest the following resources:

Notes about the datasets and activities

The datasets used in this exercise were derived from the Pulse Rates Before and After Exercise dataset provided by Dr Richard J. Wilson. The full dataset is clean and well organized; the example files in this exercise were intentionally created for teaching purposes.

This exercise is based on the DataONE learning exercise:

DataONE Community Engagement & Outreach Working Group (2017) “Data Entry and Manipulation”. Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/lessons/04_entry/index.