Information Services

Lightning Talk: Research Abstract

How Big Is Too Big: A Case Study Using Keyword Text Analysis and Unsupervised Machine Learning to Manage Large Search Results

Sunday, May 5
4:40 PM - 4:45 PM
Room: Columbus KL (East Tower, Ballroom/Gold Level)

Objectives: To use a data-driven technique to refine a complex search strategy by analyzing the contribution of individual keywords, and to use machine learning to prioritize results from a comprehensive, complex search on a broad topic.

Methods: We developed a comprehensive search strategy on a diffuse public health subject that returned more than 50,000 results from PubMed. To refine the search strategy, we conducted a keyword analysis that showed the relative contribution of individual keywords and informed our decisions about which terms to exclude from the final search strategy. Using this method, we revised the search strategy by eliminating several keywords. After running the final search, we used unsupervised machine learning (clustering) to prioritize a subset of studies for manual review. Specifically, we used a clustering algorithm that groups similar references based on text similarity in their titles and abstracts. For each cluster, the software produces a set of keywords, or topic “signature”, and we used these topic signatures to select clusters to prioritize for manual review.
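The abstract does not say how the keyword analysis was computed. One plausible approach, sketched here under stated assumptions, is to record which search terms each retrieved record matches, then count how many records each term retrieves overall and how many it alone retrieves. A term with a large unique contribution whose records turn out to be off-topic is a candidate for removal. The input format, the helpers `keyword_contributions` and `size_after_dropping`, and the placeholder term names are all hypothetical.

```python
from collections import Counter


def keyword_contributions(matches):
    """For each keyword, return (records it retrieves, records only it retrieves).

    `matches` maps a record ID to the set of keywords that record matched.
    This input format is a hypothetical assumption for illustration.
    """
    total, unique = Counter(), Counter()
    for keywords in matches.values():
        total.update(keywords)
        if len(keywords) == 1:
            # No other keyword retrieves this record, so dropping this
            # keyword would remove the record from the result set.
            unique.update(keywords)
    return {kw: (total[kw], unique[kw]) for kw in total}


def size_after_dropping(matches, dropped):
    """Count records still retrieved once the `dropped` keywords are removed."""
    return sum(1 for keywords in matches.values() if keywords - set(dropped))


if __name__ == "__main__":
    # Toy data with placeholder keyword names, not the authors' search terms.
    matches = {
        "r1": {"term_a", "term_b"},
        "r2": {"term_b"},
        "r3": {"term_c"},
        "r4": {"term_c"},
        "r5": {"term_a"},
    }
    for kw, (tot, uniq) in sorted(keyword_contributions(matches).items()):
        print(f"{kw}: retrieves {tot}, uniquely retrieves {uniq}")
    print("records left after dropping term_c:",
          size_after_dropping(matches, ["term_c"]))
```

In practice, the per-record keyword sets might be built by running each term as its own PubMed query and intersecting the returned IDs with the full result set; that retrieval step is not shown.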

Results: Our initial PubMed search yielded approximately 50,000 results. Keyword analysis informed our decisions to eliminate several terms that were causing excessive “noise”. The revised search returned approximately 15,000 records, a reduction of roughly 70%. We clustered these records using unsupervised machine learning and prioritized approximately 1,400 studies (roughly 9% of the revised set) for manual review.
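The abstract does not name the clustering software or its algorithm. As a stand-in, the sketch below uses TF-IDF vectors built from title/abstract text and spherical k-means (k-means on cosine similarity), takes each cluster centroid's highest-weighted terms as its topic signature, and keeps clusters whose signature mentions a term of interest. This is a swapped-in technique, not the authors' tool; the stopword list, the number of clusters `k`, the signature length `top_n`, the toy titles, and every function name are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is",
             "of", "on", "the", "to", "with"}

# Toy titles standing in for title/abstract text (not study data).
TOY_TITLES = [
    "School nutrition program improves child diet quality",
    "Child diet and school meal nutrition outcomes",
    "Nutrition education in school settings for child diet",
    "Air pollution exposure and asthma in urban adults",
    "Urban air quality and adult asthma hospital visits",
    "Asthma risk from traffic air pollution exposure",
]


def tokenize(text):
    """Lower-cased alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(texts):
    """One unit-length TF-IDF vector per text, using a smoothed IDF."""
    docs = [Counter(tokenize(t)) for t in texts]
    df = Counter(term for doc in docs for term in doc)  # document frequency
    n = len(docs)
    return [normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                       for t, c in doc.items()}) for doc in docs]


def dot(u, v):
    """Sparse dot product; equals cosine similarity for unit vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """k-means on cosine similarity; returns (labels, unit centroids)."""
    # Farthest-first seeding: start at record 0, then repeatedly add the
    # record least similar to all centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # no record changed cluster: converged
        labels = new_labels
        for j in range(k):
            summed = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    summed.update(v)
            if summed:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize(dict(summed))
    return labels, centroids


def topic_signature(centroid, top_n=4):
    """A cluster's highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(centroid.items(), key=lambda tw: (-tw[1], tw[0]))
    return [t for t, _ in ranked[:top_n]]


def prioritize(labels, signatures, terms_of_interest):
    """Indices of records in clusters whose signature hits a term of interest."""
    keep = {j for j, sig in enumerate(signatures) if terms_of_interest & set(sig)}
    return [i for i, lab in enumerate(labels) if lab in keep]


if __name__ == "__main__":
    labels, centroids = spherical_kmeans(tfidf_vectors(TOY_TITLES), k=2)
    signatures = [topic_signature(c) for c in centroids]
    for j, sig in enumerate(signatures):
        print(f"cluster {j}: {', '.join(sig)}")
    print("prioritized records:", prioritize(labels, signatures, {"diet"}))
```

Deterministic farthest-first seeding keeps this toy run reproducible; a real pipeline over thousands of records would more likely use a library implementation with random restarts and a tuned number of clusters, with reviewers reading the signatures rather than matching terms automatically.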

Conclusions: Keyword analysis alerts searchers to potentially irrelevant terminology in their search strategies and is useful for examining a set of results at any stage of search strategy development. Unsupervised machine learning (clustering) helps research groups prioritize a subset of literature from a large set of results and is particularly useful for broad, comprehensive literature reviews. Our initial results from manual screening indicate higher precision than would typically be expected from such a large set of search results.

Jennifer Walker

Barbara Rochen Renner

Library Services Evaluation Specialist/Allied Health Sciences Liaison and Adjunct Professor, Allied Health Sciences
University of North Carolina at Chapel Hill Health Sciences Library and School of Medicine, Department of Allied Health Sciences
Chapel Hill, North Carolina

Barbara Rochen Renner, PhD, is Library Services Evaluation Specialist and Liaison to Allied Health Sciences in the Health Sciences Library, and an Adjunct Professor in the Department of Allied Health Sciences in the School of Medicine, at the University of North Carolina at Chapel Hill. Her professional interests and areas of expertise include education in the health professions, curriculum development, program evaluation, mentoring, qualitative research resources, and supporting diversity, inclusion, and wellness in health professions education. As liaison to and a faculty member in the Department of Allied Health Sciences, she teaches course- and curriculum-integrated class sessions; supports faculty, staff, students, and administrators; and is serving a three-year term on the Research Advisory Committee. She is also involved in several diversity and inclusion activities in the department. She initiated and coordinates the Health Sciences Library's YOUR HEALTH® Radio partnership with faculty in the Department of Family Medicine at UNC-Chapel Hill. Before her current positions, she served as the Distance Learning Specialist for UNC-Chapel Hill’s libraries, was a faculty researcher at the Thurston Arthritis Research Center, and worked with TEACCH, the statewide autism program. She also served as Director of the Division of Student Affairs in the Office of the Dean of the School of Medicine.

Michelle Cawley

Head of Clinical, Academic, and Research Engagement
UNC Chapel Hill Health Sciences Library
Chapel Hill, North Carolina

Michelle Cawley, MA, MLS, is the Head of Clinical, Academic, and Research Engagement (CARE) with the Health Sciences Library at the University of North Carolina at Chapel Hill. In this role, she leads a team of librarians who engage and partner with the University’s schools of dentistry, medicine, nursing, pharmacy, and public health, as well as multiple clinical departments within UNC Medical Center. She supports innovation, outreach, and curriculum engagement and has deep experience applying machine learning solutions to improve the efficiency of scoping reviews, systematic reviews, and other large-scale literature reviews. In her previous role as project manager and senior librarian with ICF, a management consulting firm, Ms. Cawley supported clients from the U.S. EPA’s National Center for Environmental Assessment (NCEA), the U.S. EPA’s Office of Pollution Prevention and Toxics (OPPT), and the National Institute of Environmental Health Sciences (NIEHS). At ICF, she was on the development team for a machine learning application for reducing the manual burden of reviewing literature search results, managed multiple projects including the development of the IRIS Toxicological Review for hexavalent chromium, and created several courses for EPA’s Risk Assessment and Training Experience (RATE) program.