Multivariate Analysis of Collaboration Patterns Among Researchers in an Epidemiological Cohort Study
Presented at the American Evaluation Association Annual Conference, October 18, 2013
Matthew Eblen, MPIA¹, Katherine Catevenis, MPH¹, Richard R. Fabsitz, Ph.D.², Jean L. Olson, MD², Mona Puggal, MPH², Robin M. Wagner, Ph.D., MS¹
¹ Division of Statistical Analysis and Reporting, Office of Planning, Analysis and Communication, Office of Extramural Research, Office of the Director, National Institutes of Health
² Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, National Institutes of Health
Data provided by the Division of Statistical Analysis & Reporting (DSAR)/OPAC/OER. Contact: OERStats@mail.nih.gov
Outline
• Research Questions
• Background on NHLBI-funded Cohort Study
  – Cardiovascular Health Study
• Methods with Examples
• Analysis
• Summary
• Next Steps
Research Questions
• General Question:
  – What motivates researchers to collaborate with one another?
• Specific Questions:
  – Are certain researcher characteristics associated with an increased likelihood to collaborate?
  – If so, can we measure the relative magnitude of each characteristic's association with collaboration?
Cardiovascular Health Study Background
• Cardiovascular Health Study (CHS): Started in 1988 to study development and progression of clinical coronary heart disease (CHD) and stroke in older adults
  – Cohort study funded by the National Heart, Lung, and Blood Institute (NHLBI)
  – Persons were recruited at 4 study field sites
  – Includes questionnaires, clinic exams, laboratory exams, and ongoing participant follow-up to identify clinical events
  – Includes occasional training events and a policy on data sharing
• Information was collected on journal articles associated with CHS and published between 1990 and June 2011
  – Publications reported by study coordinating centers, augmented through PubMed searches
  – Co-author linkages identified
Methods - I
• Builds on previous paper* applying social network analysis techniques to collaboration networks of two NHLBI-funded cohort studies, the CHS and Strong Heart Study
  – Two authors were said to have collaborated if they coauthored a publication together
  – Collaboration network was constructed using co-authorship linkages
  – Authors represented by nodes
  – Each collaboration (co-authorship) represented by a line drawn between the collaborating authors
  – Network measures of density, diameter and centralization were calculated
    • Similar to traditional summary statistics
*Eblen et al., Social network analysis comparing researcher collaborations in two cardiovascular cohort studies, Research Evaluation (2012) 21(5): 392-405. doi: 10.1093/reseval/rvs030
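The construction described above can be sketched in a few lines of Python. This is only an illustration of the co-authorship rule stated on the slide, not the authors' code; the function name `coauthor_edges` and the author names are hypothetical.

```python
from itertools import combinations


def coauthor_edges(publications):
    """Return the set of unordered author pairs that share at least one paper."""
    edges = set()
    for authors in publications:
        # Every pair of distinct authors on the same paper is one collaboration.
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges


# Hypothetical author lists, one list per publication.
papers = [["Adams", "Baker", "Chen"], ["Baker", "Chen"], ["Chen", "Diaz"]]
edges = coauthor_edges(papers)
nodes = {author for authors in papers for author in authors}
print(len(nodes), "authors,", len(edges), "collaborations")
```

Repeated co-authorships collapse into a single line, which matches the slide's yes/no definition of collaboration.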
Example - I
Here is one possible author collaboration network.
[Figure: example author collaboration network. Legend: node = author; line = instance of collaboration between two authors]
Density = 36% (16/45 = 36% of possible collaborations have occurred)
This is the probability that two randomly chosen authors in the network have collaborated.
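The density arithmetic can be checked directly. The sketch below assumes the example network has 10 authors, which is inferred from the 45 possible pairs (10 × 9 / 2 = 45) rather than stated on the slide; the function name `density` is illustrative.

```python
def density(n_authors, n_collaborations):
    """Share of all possible author pairs that have actually collaborated."""
    possible_pairs = n_authors * (n_authors - 1) // 2
    return n_collaborations / possible_pairs


# 10 authors (inferred) and the 16 collaborations quoted on the slide.
example_density = density(10, 16)
print(f"{example_density:.0%}")
```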
Example - II
Let color represent the author's field of science.
Estimates of how likely it is that two authors collaborated will improve if their fields of science are known.
e.g., if both authors are red there is a 90% chance they have collaborated (9/10 possible collaborations have occurred among red authors)
General observations about this network:
1. Red authors are more collaborative than blue authors (ten collaborations vs. seven collaborations).
2. Both red and blue authors are more likely to collaborate with authors of their own field of science than with an author of the other field (only one collaboration between blue and red).
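A small sketch makes the within-field and between-field rates concrete. The group sizes (5 red, 5 blue) and the counts (9 red-red, 6 blue-blue, 1 red-blue) are inferred from the figures quoted on the slides; which specific authors are linked is made up for illustration, and `tie_rate` is a hypothetical helper.

```python
from itertools import combinations

red = ["R1", "R2", "R3", "R4", "R5"]
blue = ["B1", "B2", "B3", "B4", "B5"]

# Hypothetical edge list matching the quoted counts: 9 + 6 + 1 = 16 collaborations.
edges = (
    set(list(combinations(red, 2))[:9])
    | set(list(combinations(blue, 2))[:6])
    | {("R1", "B1")}
)


def tie_rate(edges, group_a, group_b):
    """Observed collaborations between two groups divided by possible pairs."""
    a, b = set(group_a), set(group_b)
    if a == b:
        possible = len(a) * (len(a) - 1) // 2  # pairs within one group
    else:
        possible = len(a) * len(b)  # pairs across two disjoint groups
    observed = sum(
        1 for u, v in edges if (u in a and v in b) or (u in b and v in a)
    )
    return observed / possible


print(tie_rate(edges, red, red))    # red with red
print(tie_rate(edges, blue, blue))  # blue with blue
print(tie_rate(edges, red, blue))   # red with blue
```

Knowing both authors' colors moves the estimate far from the overall 36% density in either direction, which is the point of the example.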
Example - III
Let node size represent seniority (larger nodes = more seniority).
The more seniority an author pair has, the more likely it is they collaborated.
Are red authors more collaborative than blue authors? Or are high seniority authors more collaborative than low seniority authors? Or both?
Exponential Random Graph Models (ERGMs) are designed to answer such questions.
Methods - II
• Used ERGMs to estimate the likelihood that two CHS authors would collaborate
  – Similar to traditional multivariate logistic regression models
  – Dependent variable is the probability that two authors will collaborate given characteristics of the authors (independent variables)
  – Isolates the contribution of one characteristic to the likelihood of collaboration while "controlling" for all other characteristics
  – ERGMs were fitted with the Statnet package in R (3.0.1)
• Also modeled the Strong Heart Study collaboration network, but results not presented here due to time constraints
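The logistic-regression analogy can be illustrated as follows. When every model term depends only on the attributes of the two authors in a pair, the log odds of a collaboration is a weighted sum of that pair's statistics. The coefficient values and term names below are hypothetical, chosen only to show the arithmetic; they are not estimates from the CHS model.

```python
import math


def tie_probability(coefficients, pair_stats):
    """Logistic transform of the summed log-odds contributions for one author pair."""
    log_odds = sum(coefficients[name] * pair_stats[name] for name in coefficients)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical coefficients: a baseline term plus a bonus for sharing a field.
coefficients = {"baseline": -4.0, "same_field": 1.0}

p_same = tie_probability(coefficients, {"baseline": 1, "same_field": 1})
p_diff = tie_probability(coefficients, {"baseline": 1, "same_field": 0})
print(round(p_same, 4), round(p_diff, 4))
```

Holding the other statistics fixed and changing one of them is what "controlling for all other characteristics" amounts to in this form.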
Methods - III
• CHS author characteristics included in ERGM
  – Continuous Variables
    • Publication Productivity – Average # of publications per year
    • Network Seniority – Number of years since entering the network
  – Categorical Variables
    • Role in Study:
      – PI: Funded Principal Investigator
      – Co-Investigator: Any non-PI paid staff member of study (or NHLBI staff involved in study)
      – Neither: No formal study affiliation
Methods – III (cont.)
  – Categorical Variables (cont.)
    • Primary Field of Science (FOS):
      – Authors coded to the field of science associated with the journal they published in most often
      – Authors who published equally often in journals of two fields of science were classified as "More Than One FOS"
    • Training Events – Data Analysis Workshop Attendance
      – NHLBI sponsored workshops for new junior investigators in 2005 and 2007
    • Data Sharing – Utilization of NHLBI Data Repository Data Set
      – Annually updated de-identified data set available since 2000, which is easily distributed by NHLBI to any qualified investigator
      – Formerly known as Limited Access Data Set (LADS)
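The field-of-science coding rule can be sketched as below. This is one reading of the rule on the slide, with ties for the top spot mapped to "More Than One FOS"; the function name `primary_field` and the field labels are hypothetical.

```python
from collections import Counter


def primary_field(journal_fields):
    """Return the most frequent field, or the tie label if the top count is shared."""
    counts = Counter(journal_fields)
    top_count = max(counts.values())
    leaders = [field for field, count in counts.items() if count == top_count]
    return leaders[0] if len(leaders) == 1 else "More Than One FOS"


# One entry per publication: the field of the journal it appeared in.
print(primary_field(["Cardiology", "Cardiology", "Epidemiology"]))
print(primary_field(["Cardiology", "Epidemiology"]))
```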
Methods - IV
• For categorical variable characteristics, two types of estimates were calculated:
  – Sociality
    • The general propensity of authors of a particular category to collaborate, regardless of whom with
      – e.g., "red" authors were more collaborative than "blue" authors
      – If authors of a given pair differ in characteristic type, a different sociality estimate applies to each author in the pair
  – Assortative Mixing
    • The propensity of authors to collaborate specifically within their own categorical type
      – e.g., "red" authors were more likely to collaborate with "red" authors than with "blue" authors
      – An assortative mixing estimate only applies to author pairs that match on a particular characteristic
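For a single author pair, the two kinds of categorical estimate attach to two different pair-level statistics, sketched below. These resemble what the ergm package in statnet calls `nodefactor` and `nodematch` terms, but the slides do not say which terms were fitted, so that mapping is an assumption. The function names and the choice of "blue" as the reference category are hypothetical.

```python
def sociality_stats(cat_i, cat_j, levels, baseline):
    """Count how many authors in the pair fall in each non-baseline category."""
    return {
        level: int(cat_i == level) + int(cat_j == level)
        for level in levels
        if level != baseline
    }


def assortative_stat(cat_i, cat_j):
    """1 if both authors share the same category, else 0."""
    return int(cat_i == cat_j)


levels = ["red", "blue"]
print(sociality_stats("red", "blue", levels, baseline="blue"), assortative_stat("red", "blue"))
print(sociality_stats("red", "red", levels, baseline="blue"), assortative_stat("red", "red"))
```

A mixed pair picks up one sociality contribution per author and no matching bonus, while a matched pair picks up the sociality contribution twice plus the matching term.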
Methods – IV (cont.)
• For continuous variable characteristics, two types of estimates were calculated:
  – Combined
    • Add together the values of both authors in the pair
    • E.g., the more combined years of seniority an author pair had, the more likely it is they collaborated
    • Similar to sociality
  – Difference
    • Subtract the values of the author pair from one another
    • E.g., the greater the difference in years of seniority between the author pair, the less likely it is they collaborated
    • Similar to assortative mixing
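The two continuous statistics are simple to compute for a pair, and a short sketch shows why both are needed. It assumes "difference" means the absolute difference, which the slide implies but does not state; the function names are illustrative.

```python
def combined(x_i, x_j):
    """Sum of the two authors' values (e.g., years in network)."""
    return x_i + x_j


def difference(x_i, x_j):
    """Absolute gap between the two authors' values."""
    return abs(x_i - x_j)


# Two hypothetical pairs with the same combined productivity but different gaps.
pair_a = (3.0, 2.0)   # publications per year for each author
pair_b = (2.5, 2.5)
print(combined(*pair_a), difference(*pair_a))
print(combined(*pair_b), difference(*pair_b))
```

Both pairs have the same combined value, so only the difference statistic can distinguish them.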
Methods – V
• Estimates of characteristics (independent variables) shown in log odds (logit) form
  – Useful for showing the relative magnitude of each characteristic's contribution to collaboration
• 95% confidence intervals
  – Intervals that cross zero indicate the estimate is not statistically significant
• Sociality estimates require a baseline reference group; assortative mixing estimates do not
  – Sociality considers all collaborative ties, so degrees of freedom are exhausted
  – Assortative mixing only considers collaborative ties within characteristic types, so degrees of freedom are not exhausted
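Reading a log-odds estimate and its interval can be sketched as follows. The estimate and interval bounds are hypothetical numbers chosen for illustration, not values from the CHS model, and the helper names are made up.

```python
import math


def odds_ratio(log_odds):
    """Convert a log-odds estimate into a multiplicative change in odds."""
    return math.exp(log_odds)


def is_significant(ci_low, ci_high):
    """An interval that does not contain zero is treated as significant."""
    return not (ci_low <= 0 <= ci_high)


estimate, ci = 0.5, (0.2, 0.8)          # hypothetical estimate with 95% CI
print(round(odds_ratio(estimate), 2))   # multiplicative change in odds
print(is_significant(*ci))              # interval excludes zero
print(is_significant(-0.1, 0.3))        # interval crosses zero
```

A log-odds estimate of zero corresponds to an odds ratio of 1, that is, no association.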
CHS Collaboration Network
[Figure: CHS collaboration network, nodes colored by author role in study. Red = Principal Investigator; Orange = Co-Investigator; Green = Neither]
# Authors = 1749
Density = 2%
CHS Collaboration Estimates, by Publication Productivity and Network Seniority
[Figure: model estimates (log odds) with 95% confidence intervals. Horizontal axis from -1.5 to 1.5; negative values = less likely to collaborate, positive values = more likely to collaborate]
• Combined Pubs per Year (Range: 0 - 16.6)
  – Meaning: The more productive an author pair was, the more likely they collaborated.
  – E.g., a pair of authors who combined to publish 5 articles per year on average was 7% more likely to collaborate than a pair of authors who combined to publish 4 articles per year on average, all else equal.
• Difference in Pubs per Year (Range: 0 - 8.3)
  – Meaning: Productive authors tended not to collaborate with unproductive authors.
  – E.g., an author pair in which one author published 3 articles per year and the other published 2 articles per year was 2% less likely to collaborate than an author pair in which both authors published 2.5 articles per year, all else equal.
• Combined Years in Network (Range: 2 - 42)
  – Meaning: The more experience an author pair had, the more likely they collaborated.
• Difference in Years in Network (Range: 0 - 20)
  – Meaning: Experienced authors tended not to collaborate with inexperienced authors.
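The percentages quoted for CHS publication productivity can be related back to the log-odds scale of the chart. The sketch below assumes that "7% more likely" and "2% less likely" describe a change in odds per one-unit change in the statistic; the slides do not report the coefficients themselves, so the values here are back-calculated, and the function names are hypothetical.

```python
import math


def implied_log_odds(percent_change, units=1.0):
    """Per-unit log-odds coefficient implied by a percent change in odds."""
    return math.log(1 + percent_change / 100) / units


def percent_change(log_odds, units=1.0):
    """Percent change in odds for a given change in the statistic."""
    return (math.exp(log_odds * units) - 1) * 100


b_combined = implied_log_odds(7)     # combined pubs per year, 5 vs. 4
b_difference = implied_log_odds(-2)  # difference in pubs per year, 1 vs. 0
print(round(b_combined, 3), round(b_difference, 3))

# Under the same assumption, a 2-unit gap in combined productivity compounds.
print(round(percent_change(b_combined, units=2), 1))
```

Both implied coefficients sit close to zero relative to the chart's -1.5 to 1.5 axis, because each describes the effect of a single-unit change.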
Summary - I
• Publication productivity and seniority were associated with more collaboration (co-authorship) in general (sociality)
  – However, highly productive and high seniority authors preferred to collaborate with other highly productive and high seniority authors (assortative mixing)
• PIs and Co-Investigators tended to be more collaborative in general than other researchers (sociality)
  – There was no evidence that PIs and Co-Investigators preferred to collaborate exclusively with one another (no assortative mixing)
• Some fields of science were more collaborative in general than others (sociality)
  – Fields that were more peripheral in subject matter to CHS tended to be less collaborative
Summary - II
• All fields of science preferred to collaborate within their own field, though to varying degrees (assortative mixing)
  – Fields that were more peripheral in subject matter to CHS were more likely to collaborate exclusively with others in their own field
• NHLBI events designed to encourage collaboration were effective (sociality)
  – Invitees who attended workshops and who participated in NHLBI's data sharing program tended to be more collaborative in general than similar authors who did not
Next Steps
• Greater knowledge of author characteristics would enhance the ability of ERGMs to identify the main drivers of collaboration
• This methodology could be fruitfully combined with information on which co-authorships had greater impact (e.g., citation information)
  – ERGMs could estimate the factors associated with highly cited co-authorships
  – Results could suggest potential collaborating partners with a high likelihood of producing impactful publications
  – Knowledge gained could be integrated into design of new studies, building in or encouraging characteristics that would promote collaboration
Questions?
Supplemental Slides
Strong Heart Study Background
• Strong Heart Study (SHS): Started in 1988 to estimate cardiovascular disease (CVD) mortality and morbidity, and prevalence of known and suspected CVD risk factors in American Indians
  – Includes 13 American Indian tribes and communities
    • Phoenix, Arizona
    • Southwestern Oklahoma
    • Western and central North and South Dakota
  – Required participants to be 45-74 years old at entry
  – Includes questionnaires, clinic exams, laboratory exams, and ongoing participant follow-up to identify clinical events
• Strong Heart Family Study launched in 1998, includes family members of original participants to add genetic risk factors
• Largest multi-center epidemiologic study of American Indians
SHS Collaboration Estimates, by Publication Output and Network Seniority
[Figure: model estimates (log odds) with 95% confidence intervals. Horizontal axis from -2.00 to 2.00; negative values = less likely to collaborate, positive values = more likely to collaborate]
• Combined Pubs per Year (Range: 0 - 14.2)
  – Meaning: The more prolific an author pair was, the more likely they collaborated.
  – E.g., a pair of authors who combined to publish 4 articles per year on average was 54% more likely to collaborate than a pair of authors who combined to publish 2 articles per year on average, all else equal.
• Difference in Pubs per Year (Range: 0 - 7.4)
  – Meaning: Prolific authors tended not to collaborate with less prolific authors.
  – E.g., an author pair in which one author published 3 articles per year and the other published 1 article per year was 27% less likely to collaborate than an author pair in which both authors published 2 articles per year, all else equal.
• Combined Years in Network (Range: 2 - 44)
  – Meaning: The more experience an author pair had between them, the more likely they collaborated.
• Difference in Years in Network (Range: 0 - 21)
  – Meaning: Experienced authors tended not to collaborate with inexperienced authors.