FAIR Data in AI/ML: Exercise 2


Student Instructions

After the last exercise, in which you compiled data on the effects of running on heart rate, you have started to think about doing more research on the overall effects of exercise on fitness. Perhaps this research will lead you to launch your company’s new line of fitness trackers!

As such, you decide to start looking through the existing data on the effects of different exercises on fitness. Many datasets on this topic have already been published, so why not make use of them?

Searching the literature for published datasets

When researchers publish their work, it is usually expected that they will also make their data available. How they do this has changed over time and often depends on the discipline. As recently as the early 2000s, it was common for authors to write something like:

Data are available upon request from the corresponding author.

But how do you track down someone who may have moved institutions? What if they don’t reply to your email requesting the data?

In this exercise, we will attempt to locate data for some studies. We will use the PubMed search engine.

  1. Start at the PubMed portal at: https://pubmed.ncbi.nlm.nih.gov/
  2. Each group will be assigned one of the following types of activities to research:
    • Running, jogging, cycling, walking, rowing, climbing stairs
  3. Enter the search: Effect of ______ on physical fitness, filling in the blank with your assigned activity, then click the Search button.
  4. Before the next step, add some filters to help increase your chances of finding a dataset:
    • Under the Article Attribute section, check the “Associated data” box.
    • Scroll down, find the Additional filters button, and click it.
      • In the “Article Type” section, select Clinical Study, Observational Study, and Validation Study, then click Show.
  5. Proceed to the next section to find the datasets.
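The search and filters above can also be written as a single PubMed query string, which is handy for recording exactly what each group searched for. The following is a minimal sketch, not an official recipe. It assumes that the `data[filter]` term corresponds to the “Associated data” checkbox and that the `[Publication Type]` field tag matches the Article Type selections; the function names `build_query` and `build_search_url` are hypothetical, and the result counts may differ slightly from what the web filters return.

```python
from urllib.parse import urlencode

PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/"

# Publication types selected in the "Article Type" filter step.
ARTICLE_TYPES = ["Clinical Study", "Observational Study", "Validation Study"]


def build_query(activity: str) -> str:
    """Combine the search phrase and the filters into one query string."""
    phrase = f"effect of {activity} on physical fitness"
    types = " OR ".join(f'"{t}"[Publication Type]' for t in ARTICLE_TYPES)
    # Assumption: data[filter] mirrors the "Associated data" checkbox.
    return f"({phrase}) AND data[filter] AND ({types})"


def build_search_url(activity: str) -> str:
    """Return a PubMed URL that runs the query when opened in a browser."""
    return PUBMED_URL + "?" + urlencode({"term": build_query(activity)})


if __name__ == "__main__":
    # "rowing" is one of the activities a group might be assigned.
    print(build_query("rowing"))
    print(build_search_url("rowing"))
```

Pasting the printed query into the PubMed search box, or opening the printed URL, should give results close to those from the manual steps.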

Finding datasets

  1. Now that you have filtered the search results, read through some of the titles. You are looking for studies that sound like they might have data to address your question (the effect of ______ on fitness).
  2. Your goal is to attempt to locate datasets from three to four studies.
  3. Click on a title and, on the next page, click on the DOI (Digital Object Identifier).

“The DOI system provides a technical and social infrastructure for the registration and use of persistent interoperable identifiers, called DOIs, for use on digital networks.” doi.org/


  4. Attempt to locate the data associated with the study.
    • Data are often linked in the supplementary materials, at the end of the article, or in the methods section.
    • You don’t need to read or understand the article (though you are welcome to). Mostly skim, looking for a way to download the data that were collected.
    • Note: This is often not possible! Don’t worry, that is part of the exercise. If you can’t find the data after a few minutes of looking at the article, that’s fine; fill out the Google form below indicating that.
  5. Fill in this Google form with your findings and submit it. Then repeat for the remaining articles until you have explored three to four studies. If you would rather have the form in a new window, click here.
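When noting which articles you checked, it helps to record each DOI in one consistent form, since a DOI may appear as a bare identifier, with a `doi:` prefix, or as a full link. Below is a minimal sketch of turning any of those into a `https://doi.org/` link. The helper names `normalize_doi` and `doi_url` are hypothetical, the shape check is a loose heuristic rather than the formal DOI syntax, and `10.1000/182` is used only as a placeholder identifier.

```python
import re

DOI_RESOLVER = "https://doi.org/"

# Prefixes commonly found in front of a DOI on article pages.
_PREFIXES = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Loose heuristic (an assumption): "10.", a numeric registrant code, "/", a suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Strip whitespace and known prefixes, returning the bare DOI."""
    doi = _PREFIXES.sub("", raw.strip())
    if not _DOI_SHAPE.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Build the resolver link for a DOI given in any of the common forms."""
    return DOI_RESOLVER + normalize_doi(raw)


if __name__ == "__main__":
    # Placeholder DOI; replace it with the one shown on your article's page.
    for text in ["10.1000/182", "doi:10.1000/182", "https://doi.org/10.1000/182"]:
        print(doi_url(text))
```

All three inputs print the same link. This sketch does not percent-encode unusual characters in the DOI suffix, so a DOI containing characters such as `#` or `?` may need extra handling.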

Additional Information

Note about the activities

This exercise was inspired by the DataONE learning exercise:

DataONE Community Engagement & Outreach Working Group (2017) “Introduction to Data Management”. Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/lessons/01_management/index