The Use of Text Mining and Data Visualization to Assist in Managing a Scientific Grants Portfolio

Elizabeth Ruben, Jerry Phelps, Kristianna Pettibone, and Christina H. Drew
Program Analysis Branch, Division of Extramural Research and Training
National Institute of Environmental Health Sciences
November 4, 2011

NIEHS Mission
Reduce the burden of human illness and disability by understanding how the environment influences the development and progression of human disease.

Purpose
To investigate the use of the text mining/data visualization tool OmniViz™ as a way to:
• Help us understand patterns in our portfolio that could inform the management of science in a new way.
• Visualize the assignment of grants to program officers.
• Explore emerging areas of science.
• Identify gaps in research.

What is OmniViz?
Software designed to find and display trends in large amounts of data. Designed specifically for the biomedical, healthcare, and pharmaceutical industries.

The Process:
1. Obtain our active grant portfolio data.
2. Limit our data set by grant type and program to focus on our Research Grant Program portfolio.
3. Import data into OmniViz.
4. Select text mining algorithm.
5. Identify words for the text mining algorithm to ignore (stop words).
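Steps 2 and 5 can be pictured with a minimal Python sketch. It is an illustration only: the slides do not describe how OmniViz stores records or which algorithms it applies, so the record fields, the grant type codes, the sample abstracts, and the stop-word list below are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical records; field names, type codes, and abstracts are illustrative only.
grants = [
    {"id": "G1", "type": "R01", "abstract": "The study of DNA repair in human cells."},
    {"id": "G2", "type": "T32", "abstract": "Training program in environmental health."},
    {"id": "G3", "type": "R01", "abstract": "Air pollution and the risk of asthma in children."},
]

# Assumed stop-word list; a real analysis would tune this to the portfolio.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "study"}


def select_grants(records, allowed_types):
    """Step 2: keep only records whose grant type is in allowed_types."""
    return [r for r in records if r["type"] in allowed_types]


def tokenize(text, stop_words):
    """Step 5: lowercase, split on non-letters, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in stop_words]


research = select_grants(grants, {"R01"})
term_counts = {
    r["id"]: Counter(tokenize(r["abstract"], STOP_WORDS)) for r in research
}
```

Here `term_counts` maps each remaining grant to the words that survive stop-word removal, which is the kind of input a text mining algorithm would typically work from.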

Question 1: Can OmniViz help us understand patterns in a portfolio that could inform the management of science in a new way?

Galaxy: DERT Active Research Grants
Legend: markers distinguish a cluster of grants from a single grant.
Note: Labels are created by NIEHS; they are not the OmniViz default.
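One way to think about why some grants end up in the same cluster is to measure how similar their texts are. The sketch below uses cosine similarity between word-count vectors, a common text mining measure. It is an assumption for illustration: the slides do not say which similarity measure or clustering method OmniViz uses, and the three word-count vectors are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical term counts for three grants.
g1 = Counter({"dna": 2, "repair": 2, "damage": 1})
g2 = Counter({"dna": 1, "repair": 1, "cancer": 1})
g3 = Counter({"asthma": 2, "air": 1, "pollution": 1})

sim_12 = cosine_similarity(g1, g2)  # shares "dna" and "repair"
sim_13 = cosine_similarity(g1, g3)  # shares no terms
```

Grants with a high pairwise score, such as `g1` and `g2` here, would plausibly sit close together in a galaxy-style view, while grants with no shared vocabulary would sit far apart.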

Initial View of Grant Clusters
DERT Active Research Project Grant Portfolio
Cluster labels: Human Studies, Transitional, Basic Science, Training/Education
Note: Labels are created by NIEHS; they are not the OmniViz default.

DNA Repair Grants

Program Officer      Number of DNA Repair Grants
Program Officer 1    39
Program Officer 2    2
Program Officer 3    1
Program Officer 4    1
Grand Total          43
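The concentration in this table is easier to see as shares of the total. The short Python sketch below uses only the counts from the slide; the variable names are chosen for the example.

```python
# Counts taken from the DNA Repair Grants table.
counts = {
    "Program Officer 1": 39,
    "Program Officer 2": 2,
    "Program Officer 3": 1,
    "Program Officer 4": 1,
}

total = sum(counts.values())  # 43, matching the Grand Total row
shares = {po: n / total for po, n in counts.items()}

for po, share in shares.items():
    print(f"{po}: {share:.1%}")
```

Program Officer 1 holds 39 of the 43 grants, about 90.7% of the DNA repair cluster, and the other three officers share the remaining four grants.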

Question 2: Understand Program Administrator Workload Distribution
• Examples of individuals across galaxy visualization
• Similar/Different
• Branch Distribution

Portfolio Distribution Across Program Officers
Legend: Program Officer 1, Program Officer 2

Portfolio Distribution Across Program Officers
Legend: Program Officer 1, Program Officer 3

Portfolio Distribution Across Branches
Legend: Branch A, Branch B, Branch C

Galaxy: DERT Active Research Grants
Cluster labels: Human Studies, Transitional, Basic Science, Training/Education
Note: Labels are created by NIEHS; they are not the OmniViz default.

Program Officers by Category of Science
Program Officer: KG LO LC LR FT DC CT CD DB DS JH LM SN CL KM MH AK CS
Human Studies: X X X X
Transitional: X X X
Basic Science: X X X X
Training and Education: X X
Number of Categories: 1 1 1 1 2 2 2 2 2 3 4

Pros and Cons of Using This Tool

Pro
• Big picture view of our portfolio
• Novel way of doing pattern analysis
• Ability to identify outliers
• Cool factor is very high

Con
• Cost of software ($1,000 annually for education and federal government)
• Output
• Steep learning curve
• Transferring output to PowerPoint is challenging
• Difficult to interpret

What questions could this method of analysis answer for you?
• Strategic planning
• Emerging areas of science
• Gaps in research
• Institute/Center niches/across all Institutes/Centers

Contact Information
• Elizabeth Ruben: elizabeth.ruben@nih.gov