TIES Cancer Research Network Y4 Face to Face Meeting
U24 CA180921
November 4th, 2016
University of Pittsburgh, PA

WELCOME
Meeting Goals, Summary of Y3 accomplishments and Y4 plans

Agenda
08:00 – 08:30 Goals for the meeting; overall summary of Y3 accomplishments
08:30 – 11:00 Session 1 – TCRN Partner Sites Reports (Feldman, Bollag, Gaudioso, Reber, Schoenfeld)
11:00 – 11:15 Break
11:15 – 11:45 Session 2 – ITCR Grant Renewal Planning (Group)
11:45 – 12:30 Session 3 – Integrating TIES and TCRN with Cancer Registry (All)
12:30 – 01:30 Session 4 – Working Lunch: Scaling TCRN (Jacobson, Chavan)
01:30 – 02:00 Session 5 – Updates on Y3 releases, work underway (Chavan)
02:00 – 02:15 Break
02:15 – 03:45 Session 6 – Year 4 Development Plans (Tseytlin, Chavan and Jacobson)

Conflict of Interest
• Jacobson, Tseytlin – Shareholder, Consultant to Nexi, Inc
• Chavan – Shareholder, Nexi, Inc

Goals for the Meeting
1. Review and discuss Y3 work
 – Where are we now and what is our long-term goal
 – Progress towards dissemination (users, institutions)
 – Pilot projects and scientific efforts resulting from project
2. Discuss Y4 work and agree on basics of project plan
 – We will refine these further and present something more formal soon after the meeting
3. Contribute to TIES development path to meet needs of institutions and investigators
4. Planning for Y4 manuscripts
5. Planning for Grant Renewal

U24 Specific Aims
• Specific Aim 1. Enhance the informatics technology to support interinstitutional “trust”, paraffin registry development, tissue microarray (TMA) development, and nondestructive tissue use.
• Specific Aim 2. Establish the TIES Cancer Research Network (TCRN) with four founding member institutions. Develop governance, network agreements, and policies for operating the TCRN.
• Specific Aim 3. Recruit and support pilot scientific collaborations across the network.
• Specific Aim 4. Disseminate the software and measure its impact.

Major accomplishments in Year 3
• Cancer Research manuscript
• First paper resulting directly from scientific effort using TCRN was accepted to The Breast Journal (Khoury, RPCI)
• Addition of TJU to network
• Further dissemination at all sites
• Three major releases (5.4, 5.5, 5.6): manual annotation, beginnings of an analytic framework
• Dual license model, with spinout Nexi, Inc

Overall Progress in Y3
Complete
Software releases (3): auditing reports, LDAP, manual annotation
All data loaded and coded at every site
QA ongoing
Pilot Projects
Pending
A little behind on pilot projects and dissemination
Optional additional de-identifier

Overview of Y1 – Y3 Accomplishments:
‐ All sites have working systems, actively coding new documents
‐ All sites have approved IRB protocols
‐ All sites have signed Network Agreement
‐ All sites have set up appropriate regulatory controls
‐ All sites have portals with forms for account request
‐ All sites disseminating to users
‐ Policies and recommendations have all been approved: de-identification QA, approval bodies, governance, verifying eligibility, study registration, auditing of users, incident reporting, joining of new members
‐ Adoption and Deployment Blueprints available
‐ Increased downloads of TIES
‐ Foundation for better social media and outreach (e.g. Insightly)
‐ Successful pilot projects

TIES Downloads ITCR

TIES 5.4, 5.5, and 5.6 releases
v5.4 – 11/11/2015
- Improved query builder to support structured data search.
- Query Activity Log report generation and auditing functions.
v5.5 – 03/23/2016
- Add LDAP support to TIES
- Adding features to Regulatory Administrator audit
v5.6 – 09/19/2016
- Add and export manual annotations to case sets
- Better and more versatile export capabilities

All TCGA clinical data in TIES
Demo: TCGA in TIES
DEMO: https://youtu.be/IWayY-gqSAc

Annotation workflows (diagram labels): TIES Manual Annotation; TIES NLP Engine; TIES Datastore; Specialized IE Engines & Other Data consumers; TIES Search and Cohort Development

Nexi, Inc.
• University of Pittsburgh exclusively licensed TIES and NOBLE Coder software to Nexi in June 2016
• License explicitly enables continued open source software for NFP, bidirectional flow of code
• Founders: Ed Engler (CEO), Rebecca Jacobson (Chair SAB), Girish Chavan, Eugene Tseytlin
• Nexi will
 – license code to commercial entities
 – develop new functionality that will not be owned by Pitt, and will customize to meet client needs
 – offer support packages for sites that want to deploy TIES
 – create other networks
• Currently developing customers
Ed Engler, Nexi CEO
http://nexihub.com/

Y4 will be another pivotal year of this grant…
New TCRN members
Multiple new pilot projects with process adjustments; open TCRN to investigators at all sites
Y2: Engaging Researchers
Major new functionality
Y3: Growing the network

Plans for Y4
Work to complete
• Help with LIMS Integration
• Cancer Registry Integration
• New NobleCoder Integration
• Image Annotation Tools
• Analytic framework
• Optional additional de-identifier
Other goals
• Publications from users
• Dissemination at all sites
• Additional adoptions at other institutions

SESSION 1 TCRN Partner Sites Reports

SESSION 2 ITCR Grant Renewal Planning

U24 Grant Funding
• Current funding period ends on 7/31/18
• ITCR grants are reviewed in two cycles. Due dates are
 – June 14th, 2017; November 20th, 2017; June 14th, 2018
• Our best chance to avoid a funding gap is probably June 14th, 2017
 – Peer review December 2017
 – Council review January 2018
 – Funding announcement earliest possible, Summer 2018

ITCR Possible Mechanisms
Enhancement and Dissemination (U24)
PAR-15-331: Advanced Development of Informatics Technologies for Cancer Research and Management
This FOA supports the advanced development and enhancement of emerging informatics technologies to improve the acquisition, management, analysis, and dissemination of data and knowledge in support of cancer research.
Sustainment (U24)
PAR-15-333: Sustained Support for Informatics Resources for Cancer Research and Management
This FOA supports the continued development and sustainment of high-value informatics research resources to serve current and emerging needs across the cancer research continuum.

Advanced Development (PAR-15-331) vs. Sustained Support (PAR-15-333)
Purpose
• Advanced Development: emerging informatics technology, defined as one that has passed the initial prototyping and pilot development stage but has not been widely adopted in the cancer research field.
• Sustained Support: improved user experience and availability of existing, widely-adopted informatics tools and resources. The proposed sustainment plan must provide clear justifications for why the research resource should be maintained and how it has benefited and will continue to benefit the cancer research field.
Budget
• Advanced Development: the amount of requested budget may not exceed $600,000 Direct Costs (excluding consortium F&A costs) per year.
• Sustained Support: application budgets are not limited but need to reflect the actual needs of the proposed project.
• Applicants requesting $500,000 or more in direct costs in any year (excluding consortium F&A) must contact a Scientific/Research Contact at least 6 weeks before submitting the application and follow the Policy on the Acceptance for Review of Unsolicited Applications that Request $500,000 or More in Direct Costs as described in the SF 424 (R&R) Application Guide.

Which mechanism?
• Maintaining both the software and the network takes resources. Adding new nodes takes resources at the local sites
• There are still many enhancements we want to make to support the community
• Potential for including development in Sustained Support mechanism

What do we need to do now to increase chance of success?
• Expand number of sites using software locally
• Expand number of users who are part of TCRN
• Expand number of users at local sites
• Increase number of studies using the network
• Multicenter publications that could not be done without the network
• Integration with other data sources
• Show that a national network is possible. How?
• Focus more on specific type of network… Pathomics? Rad-Path?

Timeline and collaboration
By end of December:
• Establish PIs, select mechanism strategy
• Determine sites to be added and approach them
• Position TCRN within the larger NCI landscape
• Position TCRN within the larger Cancer Research landscape
• Define major (new) goals
January – March:
• Specific Aims and Executive Summary
• Outline approach, deliverables, collaborators, scope of work
March – June:
• Grant writing
• Letters of Support
• Budgets, Budget justifications, and Biosketches

SESSION 3 Integrating TIES and TCRN with Cancer Registry

Adding Cancer Registry Data to TIES
• Identified as a high-value development target at last year's F2F meeting
• We have secured additional funding from our Institution for Precision Medicine to make this happen here
• Senior Developer Mike Davis will be leading the effort. Cancer Registry staff participating. Hiring two new staff members.
• Project kickoff scheduled for Nov 8th
• Starting with Breast Cancer
• Work that we do here can immediately be leveraged by all of you to similarly add CR data to your TIES instances

Your Use Cases
• How do you envision investigators at your institution using combinations of report data and Cancer Registry data through TIES?
• Can you provide an example of one or more queries (real or imagined) in which a user would select a cohort using Cancer Registry AND text data?
• Assuming that we make some subset of Cancer Registry data available, should this de-identified data be available for download by researchers?
• What interval should be used to update CR data? Would your institution be able to run the required scripts at some regular interval?
• Do you anticipate any regulatory challenges? Political challenges? Are there obvious solutions to those challenges?

Data Elements
Demographics: Race, Gender, Age @ Diagnosis, Smoking, Alcohol
Primary: Primary Site, Histology, Grade, Path TNM, Clinical TNM, Prognostic Factors (including site specific)
Treatment: Surgery, Chemotherapy, BRM, Hormonal, Immunotherapy, Rad Onc
Outcome: Vital Status, Cancer Status, Recurrence, Cause of Death

Regulatory Issues
• Modify IRB protocol at each site (Pitt has already done this)
• Discuss Network Agreement with Pitt lawyers
 – I do not think it will need to be changed
• Policies and processes
 – New process for assuring that no PHI is accidentally added to TIES
   • Dates and doctor names in text
   • Improper mapping of fields
 – Discussion with regulatory experts; how much information can we provide on sequences

TCRN Adoption Process
• Please plan to spend some of next year's budget to integrate your CR data
• Develop buy-in, work with your Cancer Registries now
 – Cancer Registry Directors, Cancer Center Directors, PM Directors
• Seek modification to your IRB soon. Pitt will investigate changes needed
• Provide flexibility to sites, we may not be adding the exact same fields
• TCRN Policy and Process group can act as the first pass for fields to be added
 – Value to your researchers
 – High quality of your data
• TCRN Exec Committee can set standard for MDS, help identify next cancers to be added

SESSION 4 Scaling TCRN to the Next Level

Where are the process pain points in TCRN right now?
• Confusion in the steps of the approval process
• Where do you see problems? Problems in the auditing process? Provisioning users? Any scale issues?
• Adding additional data: Cancer Registry data, specimen data
• Access to aggregate results requires same approval process
• Need for IRB approvals at each site for tissue
• No central site for curating TCRN information, processes, policies
• TCRN members don’t know where to email/ask for help
• No central site for getting technical help (other than developers)
• Managing the onboarding process and administrative setup
• Getting out to potential users
• Lack of an OSS de-identifier

More pain points…

Social Media Strategy
• Blog – catchy, Google-searchable titles (eventually get guest posts)
• Create SEO keywords
• Create XML sitemap
• Use Google Analytics
• Tweet regularly
• Post on LinkedIn regularly
• Send out press releases for new versions of TIES

Email Based Account Approval
• Continue to use WordPress Account Request Forms for their flexibility in form flow design.
• Reviewers record their decision through buttons/links in the account request email.
• Monitor the ties@pitt.edu email account for new account requests and responses.
• Create accounts in TIES based on the account request. Account is in Pending Review status. User cannot yet access TIES.
• Monitor reviewer decisions; once all reviewers approve, change account status to Approved.
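The status rule described on this slide can be sketched in a few lines of Python. This is only an illustration: the class and method names are hypothetical and are not part of TIES, and because the slide does not say what happens when a reviewer rejects a request, the sketch simply leaves such a request in Pending Review.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccountStatus(Enum):
    PENDING_REVIEW = "Pending Review"
    APPROVED = "Approved"


@dataclass
class AccountRequest:
    """Hypothetical model of one account request (not the TIES schema)."""
    username: str
    reviewers: frozenset  # everyone who must approve the request
    decisions: dict = field(default_factory=dict)
    status: AccountStatus = AccountStatus.PENDING_REVIEW

    def record_decision(self, reviewer: str, approved: bool) -> AccountStatus:
        """Store one reviewer's decision and recompute the account status."""
        if reviewer not in self.reviewers:
            raise ValueError(f"{reviewer!r} is not a reviewer for {self.username!r}")
        self.decisions[reviewer] = approved
        # Approved only when every required reviewer has explicitly approved.
        all_approved = all(self.decisions.get(r) is True for r in self.reviewers)
        self.status = (
            AccountStatus.APPROVED if all_approved else AccountStatus.PENDING_REVIEW
        )
        return self.status

    def can_access_ties(self) -> bool:
        """Users in Pending Review cannot yet access TIES."""
        return self.status is AccountStatus.APPROVED
```

A request with two reviewers stays in Pending Review after the first approval and moves to Approved after the second.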

SESSION 5 TIES Releases

v5.4 – Released November 2015
• Structured Data Search
• Auditing Support

v5.5 – Released March 2016
• Live Results Chart
• Pop-out Reports
• De-identified ID Search
• Export to Excel

v5.5 – Released March 2016
• Easy-to-see Section Headers
• Node Settings and LDAP Support

v5.6 – Released September 2016
• Manual Annotation
• Improved Login Dialog

Manual Annotation Tool
• Allows you to manually enter structured data associated with case sets. Eliminates the need to store it in a separate spreadsheet as the expert reviews the reports.
• Data organized by forms and fields. Forms are study specific and can be shared with other study members or made public. Fields can be of Text, Number, Boolean and Category data types.
• Data is exported to Excel with each field stored in a separate column and a row for each report.
• Access the tool under the My Case Sets tab. Click Annotate from the Available Tasks menu under the name of the Case Set.
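To make the form/field model and the export layout concrete, here is a minimal Python sketch. The form definition, field names, and function names are all invented for illustration and do not reflect how TIES stores forms. The sketch also writes CSV text (which Excel can open) rather than a native Excel file.

```python
import csv
import io

# Hypothetical study-specific form: field name -> (data type, allowed categories).
FORM = {
    "Tumor size (cm)": ("Number", None),
    "Margins involved": ("Boolean", None),
    "Histologic type": ("Category", ["Ductal", "Lobular", "Other"]),
    "Reviewer comment": ("Text", None),
}


def check_value(field_name, value, form=FORM):
    """Validate a value against the field's declared type and return it."""
    kind, categories = form[field_name]
    if kind == "Text":
        ok = isinstance(value, str)
    elif kind == "Number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif kind == "Boolean":
        ok = isinstance(value, bool)
    elif kind == "Category":
        ok = value in categories
    else:
        ok = False
    if not ok:
        raise ValueError(f"{value!r} is not a valid {kind} for {field_name!r}")
    return value


def export_annotations(annotations, form=FORM):
    """Return CSV text with one column per field and one row per report.

    `annotations` maps a report id to a dict of field name -> value.
    Fields left unannotated are exported as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Report ID", *form])
    for report_id, values in annotations.items():
        row = [report_id]
        for name in form:
            value = values.get(name)
            row.append("" if value is None else check_value(name, value, form))
        writer.writerow(row)
    return buffer.getvalue()
```

Keeping the column order tied to the form definition means every export of the same form has the same layout, regardless of which fields a reviewer filled in.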

Manual Annotation Tool

SESSION 6 Year 4 Development Plans

Annotation workflows (diagram labels): TIES Manual Annotation; TIES NLP Engine; TIES Datastore; Specialized IE Engines & Other Data consumers; TIES Search and Cohort Development

Analytics Use Cases
• Whole Slide Image (WSI) nucleus segmentation algorithm
• morphex report feature extraction to correlate with above analysis
• BIRADS category extractor for radiology reports
• BIRADS pathology report classifier (coming soon)

Analytics Framework (diagram labels): TIES Analytics App Store; Docker Image; Case Set; TIES DATA CENTER; TIES Analytical Server; TIES Data

Why Docker for analytics?
• No complex installation of 3rd party software and dependencies. All code is self-contained inside an image
• No need to mandate a single technology stack on algorithm developers
• Lightweight compared to a VM
• Seems to be a de facto standard for distribution of complex software
INPUT:
/input/text – for text data
/input/images – for WSI imaging data
/input/data – for delimited structured data
All filenames are constructed as <patient id>.<document id>.(txt|tsv|svs)
OUTPUT:
/output – any output can be put here.
Any delimited file (.tsv, .bsv, .csv) whose first column contains filenames of input files will be imported back into TIES.
Any other output file will be zipped up and made available for download to the user who launched the analysis.
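As an illustration of the input/output convention above, the following Python sketch shows what the entry point of an analysis image might look like. The function names, the `results.tsv` filename, and the word-count "analysis" are placeholders invented for this example. Whether TIES expects a header row in the delimited output is not stated on the slide, so none is written.

```python
import csv
import re
from pathlib import Path

# <patient id>.<document id>.(txt|tsv|svs), per the naming convention above.
FILENAME = re.compile(
    r"^(?P<patient_id>[^.]+)\.(?P<document_id>[^.]+)\.(?P<ext>txt|tsv|svs)$"
)


def parse_name(filename):
    """Split an input filename into (patient id, document id, extension)."""
    match = FILENAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected input filename: {filename!r}")
    return match["patient_id"], match["document_id"], match["ext"]


def run_analysis(input_root="/input", output_root="/output"):
    """Read every text report and write one tab-delimited row per input file.

    The first column holds the input filename, so the file matches the
    stated rule for output that can be imported back into TIES.
    """
    text_dir = Path(input_root) / "text"
    out_dir = Path(output_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "results.tsv"  # hypothetical output filename
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for path in sorted(text_dir.glob("*.txt")):
            parse_name(path.name)  # reject files that break the convention
            word_count = len(path.read_text(encoding="utf-8").split())
            writer.writerow([path.name, word_count])
    return out_path
```

Inside a container, the image's entry point would call `run_analysis()` with the default paths; passing other directories makes the same code testable outside Docker.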

Correlating Radiology and Pathology information
Radiology: Right Breast BIRADS 4. Suspicious abnormality. Biopsy is recommended
?
Pathology: Breast, Right, excision: Atypical Ductal Hyperplasia and Radial Scar

BIRADS extraction

IMPRESSION: OVERALL BI-RADS Category 1. Negative mammogram.
1st: BI-RADS category annotation (CRF model)
2nd: BI-RADS laterality classification (NB, SVM, DT)
Entity: Definition
Left: BI-RADS category assigned to the left breast
Right: BI-RADS category assigned to the right breast
Bilateral: BI-RADS category assigned to both breasts
Overall: Corresponds to the most abnormal BI‑RADS of the two breasts, based on the highest likelihood of malignancy. It is usually found after the detailed description of the BI‑RADS category for each breast. In some cases, it is the only BI‑RADS class in the report.

Corpus Development and Inter Annotator Agreement (IAA)

Gold Corpus Metrics
Corpus Split   Total Docs   Total Word Tokens   Total Number Tokens   Total Gold BIRADS Tokens
Training       368          105333              7588                  608
Devel          58           16311               1189                  101
Test           173          55711               4077                  305
Total          599          177355              12854                 1014

Gold BIRADS tokens by category
Corpus Split   BI-RADS 0   BI-RADS 1   BI-RADS 2   BI-RADS 3   BI-RADS 4   BI-RADS 5   BI-RADS 6
Training       83          79          129         100         90          77          50
Devel          12          15          16          32          14          7           5
Test           48          34          67          50          46          36          24
Total          143         128         212         182         150         120         79
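The per-category counts on this slide can be checked against the reported totals with a short Python sketch. The numbers are copied from the slide; the variable and function names are illustrative only.

```python
# Gold BI-RADS token counts per category (BI-RADS 0..6), copied from the slide.
GOLD_BY_CATEGORY = {
    "Training": [83, 79, 129, 100, 90, 77, 50],
    "Devel": [12, 15, 16, 32, 14, 7, 5],
    "Test": [48, 34, 67, 50, 46, 36, 24],
}
# "Total Gold BIRADS Tokens" column as reported on the slide.
REPORTED_GOLD_TOTAL = {"Training": 608, "Devel": 101, "Test": 305}


def category_totals(by_split):
    """Sum each BI-RADS category across the corpus splits."""
    return [sum(column) for column in zip(*by_split.values())]


def split_shares(by_split):
    """Fraction of all gold BI-RADS tokens that falls in each split."""
    grand_total = sum(sum(counts) for counts in by_split.values())
    return {name: sum(counts) / grand_total for name, counts in by_split.items()}


for split, counts in GOLD_BY_CATEGORY.items():
    # Each row's category counts should add up to its reported total.
    assert sum(counts) == REPORTED_GOLD_TOTAL[split], split
```

The same helper reproduces the slide's Total row when applied across the three splits.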

BIRADS Token Annotation Results
Corpus Split      Features                                            Recall   Precision   F1     Accuracy
Development       Section, Token Type, Context Token                  0.92     0.83        0.87   0.99
Development       Section, Token Type, Context Token, Anchor, Time    0.93     0.95        0.96   0.99
Development 250   Section, Token Type, Context Token, Anchor, Time    0.95     0.99        0.97   0.99
Test              Section, Token Type, Context Token, Anchor, Time    0.93     0.98        0.95   0.99
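Assuming the F1 column is the usual harmonic mean of precision and recall (the slide does not define it), the Test row can be reproduced with a small Python sketch. The function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-split row from the slide: recall 0.93, precision 0.98.
test_f1 = f1_score(precision=0.98, recall=0.93)
print(round(test_f1, 2))  # 0.95, matching the F1 reported for the Test split
```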

BIRADS classification results
Classifiers: NB, SVM, PART; BI-RADS category ranges: 0-6 and 3-5
Recall: 0.83 0.88 0.89 0.90 0.91 0.93
Precision: 0.84 0.88 0.90 0.91 0.93
F1: 0.83 0.87 0.89 0.90 0.91 0.93
Accuracy: 0.91 0.93 0.94 0.95 0.96
Features: Bag-of-word (BoW) by line; Total number of BI-RADS token annotations; BI-RADS category; BI-RADS sequence; Imaging Study type; Laterality; Breast(s) studied; Laterality word token counts

Pathology Report Classification

BIRADS Extraction UIMA pipeline
• cTAKES to extract token-level features
• CRF classifier from Mallet to identify BIRADS category number
• Weka PART classifier to classify BIRADS laterality
• Docker image to wrap the pipeline
INPUT
/input/text – location of text reports with filenames matching de-identified ids.
OUTPUT
/output – location of output annotations as tab-delimited files
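For readers who want a feel for the two-stage task (find the category number, then assign laterality), here is a deliberately simple Python stand-in. It uses a regular expression and keyword rules in place of the cTAKES features, Mallet CRF, and Weka PART classifier named above, so it shows the shape of the output, not the method or its accuracy. All names and the output column order are assumptions.

```python
import re

# Stage 1 stand-in: a regex instead of the CRF token classifier.
CATEGORY = re.compile(
    r"\bBI-?RADS\b(?:\s+category)?\s*:?\s*([0-6])\b", re.IGNORECASE
)


def extract_categories(text):
    """Return every BI-RADS category number (0-6) mentioned in the text."""
    return [int(match.group(1)) for match in CATEGORY.finditer(text)]


def guess_laterality(text):
    """Stage 2 stand-in: keyword rules instead of a trained classifier."""
    lowered = text.lower()
    if "overall" in lowered:
        return "Overall"
    has_left = re.search(r"\bleft\b", lowered) is not None
    has_right = re.search(r"\bright\b", lowered) is not None
    if "bilateral" in lowered or (has_left and has_right):
        return "Bilateral"
    if has_left:
        return "Left"
    if has_right:
        return "Right"
    return None  # no laterality cue found


def annotate(document_name, text):
    """Build tab-delimited rows: filename, category, laterality."""
    laterality = guess_laterality(text) or ""
    return [
        f"{document_name}\t{category}\t{laterality}"
        for category in extract_categories(text)
    ]
```

Applied to the impression line shown earlier in the deck, `annotate` yields a single row with category 1 and laterality Overall.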

Future Development Ideas
• New Coding Pipeline
 – Integrate NobleCoder v1.1. More accurate coding, faster coding. Uncertainty, polarity, experiencer and temporality annotations.
 – Latest NCIM terminology with more fine-tuned sources.
• Cancer Registry data integration
• Email-based management of account review and approvals
• Patient-level search index and visualization
• Manual Annotation Tool Enhancements
 – Link report text annotations to data in form fields.
 – Intelligent auto-highlighting and filling of form fields.
 – Library of forms to choose from, making it easy to share and reuse previously created forms.

Feature AU PENN RPCI SB TJU
