
Classification of hepatocellular carcinoma stages from free-text clinical and radiology reports
Session Title: Extracting and Classifying Knowledge from Text
Session Number: S92
Wen-wai Yim, PhD (1, 2); Sharon Kwan, MD (3); Guy Johnson, MD, PharmD (3); Meliha Yetisgen, PhD (4, 5)
1 Veterans Affairs Palo Alto Health Care System
2 Biomedical Informatics, Stanford University
3 Radiology, University of Washington
4 Biomedical and Health Informatics, University of Washington
5 Linguistics, University of Washington

Disclosure
The authors have no relevant relationships with commercial interests to disclose.

Hepatocellular carcinoma (HCC)
Deadly disease
• 5-year survival from all stages <20%
Heterogeneous disease/patients
• Comorbidities a big factor in survival
Heterogeneous treatments
• Complex treatment algorithms
• Combinatorial treatments
• Sequential treatments
Example:
• Liver disease: (Child-Pugh B)
• 4 tumors, both 3 cm
• No regional/distal spread
• No vascular invasion
Treatment recommendation per guideline
• Alberta University: 90Yttrium
• Barcelona Clinic Liver Cancer: Transcatheter Arterial Chemoembolization (TACE)
• Fudan University: TACE, Sorafenib
• Japan Society: TACE, HAIC, Resection, RFA

[Figure: collection of patient notes, each with sections such as CC, History, Med/Medication, and Impression]
For all the patients that are similar to me:
• What treatments did they receive?
• What was the outcome?

Cancer stages
Measures similarity of cancer extent
• How many tumors?
• Has it invaded nearby organs?
Liver cancer stages take into account liver function
• Normal levels of albumin?
• Sign of ascites?
        Tumor size/number   Tumor %   Metastasis   ECOG   Child-Pugh   Portal hypert.
AJCC    X                             X
BCLC    X                             X            X      X            X
CLIP                        X                             X
AJCC: American Joint Committee on Cancer
BCLC: Barcelona Clinic Liver Cancer
CLIP: Cancer of the Liver Italian Program
> 6 liver cancer staging systems

Why we need natural language processing (NLP)
Stages not recorded in structured or unstructured form
• 70% completely staged in one 3-year study
• Some estimates of up to 30% inaccurate
Debate on how to classify stages
• >6 classification systems for HCC
• Many ways of quantifying liver function / cancer extent
NLP benefits:
• Expedite chart review
• Re-train if stages change
• Access historical information

Related work

Previous work
Cancer stage prediction
• Document classification
  Nguyen et al. Multi-classification of cancer stages from free-text histology reports using support vector machines. IEEE 2007.
• Sentence classification > document classification
  McCowan et al. Collection of cancer stage data by classifying free-text medical reports. JAMIA 2007.
  Martinez and Li. Information extraction from pathology reports in a hospital setting. Proceedings of the 20th ACM CIKM, 2011.
• Rule-based patient classification
  Nguyen et al. Symbolic rule-based classification of lung cancer stages from free-text pathology reports. JAMIA 2010.
Cancer characteristic information extraction

Previous work
Cancer stage prediction
Cancer characteristic information extraction
• Dictionary / regular expression methods
  Ping et al. Information extraction for tracking liver cancer patients statuses: from mixture of clinical narrative report types. Telemedicine Journal and E-Health 2013.
• Document classification methods
  Kavuluru et al. Automatic extraction of ICD-O-3 primary sites from cancer pathology reports. AMIA Jt Summits Transl Sci Proc 2007.
• Sequential labeling methods
  Ou and Patrick. Automatic population of structured reports from narrative pathology reports. Proceedings of the Seventh HIKM 2014.
  Wang et al. Extracting important information from Chinese operation notes with natural language processing methods. JBI 2014.

This work
INPUT: clinical notes (CC, History, Med, Impression) and laboratory values
  AFP   Bilir   Alb
  700   3.5     1.1
  500   4.5     6.3
OUTPUT: 11 stage parameters, 3 liver cancer stages
Calculate cancer stage
• Information extraction + aggregate to patient level + use stage logic
• Compare to shallow NLP in supervised prediction task

Dataset

Corpus description
Patient inclusion criteria
• University of Washington Medical Center primary liver cancer clinic, 1/2011 to 12/2013
• At least one clinical report, radiology report, and full set of labs (4) needed for staging
• Clinic notes: day of visit; radiology notes: (-3, 1) months; labs: (-30, 30) days
Annotation
• Subdocument
  • Stage parameter values
• Patient level
  • Stage parameter values
(Yim et al., EMNLP 2015, In-depth annotation for patient level liver cancer staging)

Extraction challenges
Within-sentence: negation, drugs, abbreviations
• “He has never had diagnosis of ascites” -> Ascites: none
• “lactulose” -> Hepatic encephalopathy: mild
• “CTP-A 6 cirrhosis” -> Child-Pugh: A
Within-sentence: other inference required
Multiple-sentence: summarize multiple ideas

Extraction challenges
Within-sentence: negation, drugs, abbreviations
• “he has no known liver disease” -> Ascites: none; Hepatic encephalopathy: none
Within-sentence: other inference required
• “Moderate to severe splenomegaly” -> Portal hypertension: yes
• “There is thrombus in the right posterior branch of the portal vein [...] possibly [...] tumor thrombus” -> Macrovascular invasion: minor branch
• “There is enhancing tumor thrombus in the right portal vein” -> Macrovascular invasion: major branch
Multiple-sentence: summarize multiple ideas

Extraction challenges
Within-sentence: negation, drugs, abbreviations
Within-sentence: other inference required
Multiple-sentence: summarize multiple ideas
  27: Focal lesions:
  28: Total number: 4:
  29: Lesion 1: segment 8, 2.2 x 2.0 cm, image 4/41, hypervascular with washout on veinous phase - HCC
  30: Lesion 2: segment 5, 0.7 cm, image 4/49, hypervascular with no washout.
  31: Lesion 3: segment 4A, 0.5 cm, image 4/24, hypervascular with no washout.
  32: Lesion 4: Segment 5, 0.6 cm, image 4/56, hypervascular with no washout.
  ……
  37: Impression:
  38: 1 focal lesion in segment 8, highly suggestive of HCC.
  39: Technically limited study for the characterization of HCC due to the absence of the delayed phase.
  40: 3 indeterminate focal lesions in segment 4a and 5.

Experimental setup
Document level evaluation of stage parameters
• Sub-document classifications
• Baseline: compare to a document-level classification (max-ent)
  • 1-, 2-, 3-gram with frequencies
Patient level evaluation
• Stage parameter
• Cancer stages
Dataset split
• 160 training, 40 test
• Training: 5-fold cross-validation
Total: 200 patients, 545 documents (303 clinic, 242 radiology)
Text annotations: 2108 total
  AJCC           BCLC          CLIP
  I      108     A1     27     0     66
  II      48     A2     21     1     62
  IIIA    16     A3     13     2     41
  IIIB    14     A4     17     3     18
  IIIC     0     B      23     4      8
  IVA      6     C      70     5      4
  IVB      7     D      14     6      0
  Total  199     Total 185     Total 199

System description

System description
[Diagram: classification systems and the stage parameters they draw on]
CLASSIFICATION SYSTEMS: AJCC stage, BCLC stage, CLIP stage
STAGE PARAMETERS: Child-Pugh, ECOG, hepatic encephalopathy, ascites, portal hypertension, extrahepatic invasion, metastasis, macrovascular invasion, tumor number, tumor morphology, tumor size
Laboratory values: AFP, albumin, bilirubin, prothrombin

System description
[Diagram: stages and stage parameters grouped by extraction method]
Rule-based classification: AJCC stage, BCLC stage, CLIP stage
Regular expression: Child-Pugh, ECOG
Sentence classification: hepatic encephalopathy, ascites, portal hypertension, extrahepatic invasion, macrovascular invasion, metastasis
Radiology report structuring: tumor number, tumor morphology, tumor size
Laboratory values: AFP, albumin, bilirubin, prothrombin

CHILD-PUGH
• “Mr. Xxxxxxx is a 65 year-old man with CPA(6) HCV-related cirrhosis”
• “He is Child class A”
• “CTP-A 6 cirrhosis”
• “CTP score was 5”
• “Childs-Pugh A”
• “Child A”
                      P      R      F1
  Training_baseline   0.55   0.51   0.53
  Training_system     0.86   0.95   0.91
  Test_system         0.87   0.84   0.85
ECOG
• “(ECOG performance status): (0)”
• “ECOG = 0”
• “ECOG status would be considered to be 0”
• “ECOG score of 0 to 1”
• “ECOG status zero”
• “ECOG is 0”
                      P      R      F1
  Training_baseline   0.75   0.61   0.67
  Training_system     0.98   0.69   0.81
  Test_system         0.97   0.61   0.75
Non-explicit mentions missed: “He is cachetic. He is deconditioned and needs a wheelchair to walk greater than 10 feet”
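
The regular-expression step for explicit Child-Pugh and ECOG mentions can be illustrated with a minimal sketch. The patterns below are hypothetical: they were written only to cover the example mentions quoted on this slide and are not the authors' actual expressions. The score-to-class mapping uses the standard Child-Pugh boundaries (5-6 = A, 7-9 = B, 10-15 = C), and returning the first value of a range such as "0 to 1" is an assumption.

```python
import re

# Hypothetical patterns, written only to cover the example mentions on the slide.
_CP_NAME = r"\b(?:CTP|CP|Childs?(?:[- ]Pugh)?)"
CHILD_PUGH_CLASS = re.compile(
    _CP_NAME + r"(?:\s+class)?[\s-]*([ABC])\b", re.IGNORECASE
)
CHILD_PUGH_SCORE = re.compile(
    _CP_NAME + r"\s+score\s+(?:(?:was|is|of)\s+)?(\d{1,2})\b", re.IGNORECASE
)
ECOG = re.compile(
    r"\bECOG\b[^.\d]{0,40}?\b([0-4]|zero|one|two|three|four)\b", re.IGNORECASE
)
WORD_TO_INT = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4}


def extract_child_pugh(text):
    """Return 'A', 'B', or 'C' for an explicit Child-Pugh mention, else None."""
    match = CHILD_PUGH_CLASS.search(text)
    if match:
        return match.group(1).upper()
    match = CHILD_PUGH_SCORE.search(text)
    if match:
        score = int(match.group(1))
        # Standard Child-Pugh class boundaries: 5-6 = A, 7-9 = B, 10-15 = C.
        if 5 <= score <= 6:
            return "A"
        if 7 <= score <= 9:
            return "B"
        if 10 <= score <= 15:
            return "C"
    return None


def extract_ecog(text):
    """Return the first explicit ECOG value (0-4) in the text, else None."""
    match = ECOG.search(text)
    if not match:
        return None
    value = match.group(1).lower()
    return WORD_TO_INT[value] if value in WORD_TO_INT else int(value)
```

As on the slide, a sentence that only implies performance status (the wheelchair example) yields no ECOG value with this kind of pattern, which is consistent with the lower recall reported for ECOG.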

Sentence classification: ranked n-gram + UMLS concept features and assertion classification
• Features ranked by, e.g., chi-squared, t-test, PMI
Example: Ascites. Top-ranked features per class (1-gram, 2-gram, 3-gram, then assertion-UMLS):
None
  n-grams: Ascites No Girth Free fluid Prior encephalopathy Any sequela Or peritoneal Denies increase Without ascites Problems with ascites Denies any issues Pt denies symptoms Without ascites or cirrhosis including no
  assertion-UMLS: Present-ascites, Absent-edemaoflowerextremity, Absent-complicated, Absent-hepaticencephalopathy
Mild/Suppressed
  n-grams: Refractory Perihepatic Tipss Pericolic ascites Resolved after Prior varices Ascites after Instances of Cirrhosis appears Moderate perihepatic lasix and spironolactone Encephalopathy in the Of trace ascites Controlled on medications had refractory ascites
  assertion-UMLS: Present-ascites, Absent-transjugularintrahepaticportosystemicshuntprocedure, Present-refractoryascites, Present-mild(qualifiervalue), Possible-historyofpreviousevents
Moderate/Severe
  n-grams: Ascitic Ascites Abdominal Irretractable Moderate High volume Progressive distension Large volume Volume abdominal Ascites secondary Moderate volume Volume paracenteses had large Refractory ascites with Paracentesis x 3 since Volume abdominal paracentesis His refactory ascites
  assertion-UMLS: Absent-intraperitoneal, Present-ascites, Present-volume, absent-moderate, Present-refractoryascites
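
Ranking n-gram features by a chi-squared statistic can be sketched as follows. This is an illustrative one-vs-rest version under stated assumptions, not the authors' implementation: whitespace tokenization, presence/absence counts per sentence, and the filter that keeps only features over-represented in the target class are all choices made for this example, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi2_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]], no continuity correction.

    a: target-class sentences with the feature, b: other sentences with it,
    c: target-class sentences without it,      d: other sentences without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_features(sentences, labels, target, max_n=3, top_k=5):
    """Rank n-grams positively associated with `target` by chi-squared."""
    in_class, out_class = Counter(), Counter()
    n_in = sum(1 for label in labels if label == target)
    n_out = len(labels) - n_in
    for sentence, label in zip(sentences, labels):
        tokens = sentence.lower().split()
        feats = {g for n in range(1, max_n + 1) for g in ngrams(tokens, n)}
        (in_class if label == target else out_class).update(feats)
    scored = []
    for feat in set(in_class) | set(out_class):
        a, b = in_class[feat], out_class[feat]
        c, d = n_in - a, n_out - b
        if a * d > b * c:  # keep only features over-represented in the target class
            scored.append((feat, chi2_2x2(a, b, c, d)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

A t-test or PMI ranking would replace only `chi2_2x2`; the counting loop stays the same.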

Sentence classification approach (Document level evaluation)
                          Training set (5-fold cross validation)      Testing set
Label                     TP    P      R      F1     Baseline F1      TP   P      R      F1
Ascites                   187   0.52   0.89   0.66   0.41             52   0.58   1.00   0.73
Extrahepatic invasion     53    0.90   0.87   0.88   0.81             6    0.86   0.55   0.67
Hepatic encephalopathy    111   0.63   0.85   0.73   0.72             26   0.63   0.76   0.69
Macrovascular invasion    145   0.72   0.94   0.81   0.78             33   0.80, 0.81 (incomplete)
Metastasis                103   0.74   0.85   0.79   0.69             25   0.81   0.86   0.83
Portal hypertension       83    0.87   0.93   0.90   0.78             27   0.84   0.93   0.89
(Yim et al., EMNLP 2015, In-depth annotation for patient level liver cancer staging)

Tumor characteristics pipeline
[Figure: clinical note (CC, History, Med, Impression) -> processing -> tumor number, tumor size, and tumor morphology]
1 Template extraction
  Example: Lesion 1, location: segment 3, size: 3.6 cm; Lesion 2, location: segment 7, size: 2.3 cm
2 Reference resolution
  “Lesion 1” <-> “segment 3 lesion”
  “Lesion 2” <-> “segment 7 lesion”
3 Rule-based heuristics
  Rule: get the MAXIMUM of all sizes for malignant tumors

1 Template extraction
• Identify “Findings” and “Impressions”
• Rule-based sentence identifications
• CRF labeling of important elements (0.87 F1)
• Relation classification (0.89 gold, 0.74 system F1)
Example report:
  CC/HISTORY OF PRESENT ILLNESS
  xxxxxxxxxxxxx
  MEDICATIONS
  xxxx 10 ml x 3 days
  xxxx 3 tablets, twice a day
  FINDINGS
  xxxx Lesion 1: 1.0 x 2.3 cm, segment VII xxxxx
  IMPRESSIONS
  There are 2 lesions, suspicious for HCC xxxxxxxxxxxx
(Yim et al., AMIA Jt Summits 2016, Tumor information extraction in radiology reports for hepatocellular carcinoma patients)

2 Reference resolution
  Focal lesions:
  Total number: 2:
  Lesion 1: segment 3, 3.6 x 2.3 cm hypervascular with washout on delayed imaging, image 3/32 and 7/33.
  Lesion 2: segment 4A/B, 0.8 cm hypodense on all phases, image 7/32
  Lesion 3: Segment 7, 2.3 cm …
  Impression:
  Agree with outside report: 3 focal lesions:
  Segment 3 lesion is consistent with HCC.
  Segment 4A/B lesion is indeterminate.
  Segment 7 lesion is suspicious for HCC.
Classification
• Top-to-bottom
• Greedy
Equivalence: Avg(MUC, B3, CEAF) 0.66 F1
Particularization: 0.43 F1
(Yim et al., Journal of Biomedical Informatics 2016, Tumor reference resolution and characteristic extraction in radiology reports for liver cancer stage prediction)

3 Rule based logic
Input: tumor templates + references
• Tumor 1: “mass” [malignant], Segment VII
• Tumor 2: “Lesion 1” [indet], Segment 4
• ……
Reference consolidator
• Malignancy statuses changed according to references: Malignant, Indeterminate, Benign, Unknown
Logic: >10 cm? “invasive” tumor; Max(*); Sum(*)
F1 (assuming gold input templates + reference resolution)
• >50%? 0.94
• Largest size? 0.93
• Tumor counts 0.69
(Yim et al., Journal of Biomedical Informatics 2016, Tumor reference resolution and characteristic extraction in radiology reports for liver cancer stage prediction)
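
The "largest size" and "tumor counts" rules can be sketched over consolidated tumor templates. This is a hedged illustration only: the `Tumor` structure and function names are hypothetical, and the choice of which malignancy statuses count toward each rule is an assumption made for the example, not the published logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tumor:
    """Hypothetical consolidated tumor template (after reference resolution)."""
    name: str
    malignancy: str  # "malignant", "indeterminate", "benign", or "unknown"
    dimensions_cm: Tuple[float, ...]  # one or more measured dimensions


def largest_malignant_size(tumors: List[Tumor]) -> Optional[float]:
    """Maximum over all measured dimensions of malignant tumors, or None."""
    sizes = [
        max(t.dimensions_cm)
        for t in tumors
        if t.malignancy == "malignant" and t.dimensions_cm
    ]
    return max(sizes) if sizes else None


def malignant_tumor_count(tumors: List[Tumor]) -> int:
    """Number of tumors whose consolidated status is malignant."""
    return sum(1 for t in tumors if t.malignancy == "malignant")


# Statuses here are assumptions: "consistent with HCC" and "suspicious for HCC"
# are both treated as malignant for the sake of the example.
report = [
    Tumor("Lesion 1 (segment 3)", "malignant", (3.6, 2.3)),
    Tumor("Lesion 2 (segment 4A/B)", "indeterminate", (0.8,)),
    Tumor("Lesion 3 (segment 7)", "malignant", (2.3,)),
]
```

Because the status of each tumor comes from the reference-resolution step, an error there (for example, failing to link "Segment 7 lesion" to "Lesion 3") changes both the count and possibly the largest size.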

Tumor characteristics stage parameters (Document level evaluation)
                    Training set (5-fold cross validation)      Testing set
Label               TP    P      R      F1     Baseline F1      TP   P      R      F1
Tumor morphology    114   0.65   0.68   0.66   0.58             25   0.63   0.60   0.61
Tumor number        102   0.58, 0.63, 0.60 (incomplete)         23   0.58   0.56   0.57
Tumor size          129   0.83   0.75   0.79   0.50             32   0.86   0.74   0.80
(Yim et al., Journal of Biomedical Informatics 2016, Tumor reference resolution and characteristic extraction in radiology reports for liver cancer stage prediction)

Stage parameter: Sub-document > patient level classification
[Figure: within-document values from each note (e.g. “Chief Complaint: Severe headache. Hx of present illness: Patient came in today”) are aggregated into the 11 patient-level stage parameters]
System rule: take most severe
• E.g. Doc 1: mild ascites; Doc 2: no ascites -> patient level: mild ascites
Baseline rule: take most frequent class in training data (straight classifier)
• E.g. “no ascites” most frequent value -> patient level: no ascites
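
The two aggregation rules on this slide are simple enough to sketch directly. The severity orderings below are assumptions for illustration (in particular, ranking a minor-branch invasion below a major-branch one), and the function names are hypothetical.

```python
from collections import Counter

# Assumed severity orderings, least to most severe (labels follow the slides).
SEVERITY = {
    "ascites": ["none", "mild", "moderate/severe"],
    "macrovascular invasion": ["no", "yes - minor branch", "yes - major branch"],
}


def most_severe(parameter, document_values):
    """System rule: the patient-level value is the most severe document-level value."""
    if not document_values:
        return None
    order = SEVERITY[parameter]
    return max(document_values, key=order.index)


def most_frequent(training_values):
    """Baseline rule: always predict the most frequent class seen in training."""
    return Counter(training_values).most_common(1)[0][0]
```

With the slide's example, `most_severe("ascites", ["mild", "none"])` yields "mild", while the baseline ignores the patient's documents entirely and returns whatever class dominated the training data.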

Stage parameter: Sub-document > patient level classification
                          Training set (5-fold cross validation)              Test set
Label                     System TP   System F1   Baseline TP   Baseline F1   System TP   System F1
Ascites                   130         0.81        120           0.75          29          0.73
Child-Pugh                139         0.87        99            0.62          35          0.88
ECOG                      135         0.84        96            0.60          32          0.80
Extrahepatic invasion     156         0.98                                    40          1.00
Hepatic encephalopathy    145         0.91        136           0.85          35          0.88
Macrovascular invasion    134         0.84        138           0.86          34          0.85
Metastasis                145         0.91        147           0.92          39          0.98
Portal hypertension       131         0.82        96            0.60          34          0.85
Tumor morphology          114         0.71        90            0.56          23          0.58
Tumor number              106         0.66        102           0.64          23          0.58
Tumor size                128         0.80        74            0.46          33          0.83
ALL                       1463        0.83        1254          0.71          357         0.81
System rule: take most severe (e.g. Doc 1: mild ascites; Doc 2: no ascites -> patient level: mild ascites)
Baseline rule: take most frequent class in training data (straight classifier) (e.g. “no ascites” most frequent value -> patient level: no ascites)

Stage classification
         Training set (5-fold cross validation)                              Test set
Stage    System TP   System F1   B1 TP   B1 F1   B2 TP   B2 F1               System TP   System F1
AJCC     103         0.64        87      0.54    86      0.54                22          0.55
BCLC     98          0.61        57      0.36                                20          0.50
CLIP     88          0.55        53      0.33    48      0.30                17          0.43
System: translated logic for each classification scheme
B1 (Baseline 1): most frequent classifier (straight classifier)
B2 (Baseline 2): maxent classification using n-gram + UMLS concept frequencies
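
As an example of what "translated logic" for one scheme can look like, here is a sketch of the CLIP score using its standard published definition (Child-Pugh class, tumor morphology, AFP, portal vein thrombosis; total 0 to 6). How the slides' stage parameters map onto these four inputs is an assumption, and the function and label names are hypothetical rather than taken from the authors' system.

```python
# Points per component, following the standard CLIP definition.
CHILD_PUGH_POINTS = {"A": 0, "B": 1, "C": 2}
MORPHOLOGY_POINTS = {
    "uninodular": 0,    # single nodule, extension <= 50% of the liver
    "multinodular": 1,  # multiple nodules, extension <= 50% of the liver
    "massive": 2,       # massive tumor or extension > 50% of the liver
}


def clip_score(child_pugh, morphology, afp_ng_ml, portal_vein_thrombosis):
    """Return the CLIP score (0-6) from patient-level stage parameters."""
    score = CHILD_PUGH_POINTS[child_pugh] + MORPHOLOGY_POINTS[morphology]
    score += 1 if afp_ng_ml >= 400 else 0        # AFP >= 400 ng/mL adds one point
    score += 1 if portal_vein_thrombosis else 0  # portal vein thrombosis adds one point
    return score
```

Because the score is a sum over extracted parameters, a single wrong parameter shifts the predicted class, which is the kind of error propagation the sensitivity analysis examines.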

Sensitivity analysis
[Table: change in stage F1 per stage parameter. Columns: REFERENCE, Ascites, Child-Pugh, ECOG, Extrahepatic invasion, Hepatic encephalopathy, Macrovascular invasion, Metastasis, Portal hypertension, Tumor morphology, Tumor number, Tumor size. Column alignment is not recoverable; values are listed in source order.]
Testing (system with 1 gold standard, 10 system stage parameters)
  Stage   SYS reference   Change per parameter
  AJCC    0.55            --, --, --, +0.05, +0.03, --, --, +0.30, --
  BCLC    0.50            --, --, +0.13, --, --, +0.05, +0.03, --, +0.13, +0.05
  CLIP    0.43            --, +0.07, --, --, --, +0.37, --, --
Testing (system with 10 gold standard, 1 system stage parameters)
  Stage   GOLD reference  Change per parameter
  AJCC    1.00            -0.03, -0.02, -0.10, -0.05, -0.02, -0.37, -0.05
  BCLC    1.00            -0.07, -0.17, -0.07, -0.10, -0.12, -0.07, -0.25, -0.15
  CLIP    1.00            --, -0.12, --, --, --, -0.07, --, --, -0.42, --, --

Discussion and conclusions

Summary
System
• Liver cancer stage classification + multiple extraction sub-systems
• Provided intrinsic and extrinsic evaluations
• Identified challenging areas for improvement
Results
• Stage parameters: 0.83 (training), 0.81 (test) F1
• Stage: AJCC: 0.55, BCLC: 0.50, CLIP: 0.43 F1
• Need most improvement: tumor size, morphology, number
Limitations
• Small dataset
• Single institution

Applications and future work
[Figure: note and lab values (AFP, Bilir, Alb) mapped to AJCC: II, BCLC: A4, CLIP: 3]
Example extractions
• Ascites: Mild/Severe
  “…… ABDOMEN: soft. Moderate to large distention but not tight. Positive large ascites on exam. ……”
• Portal hypertension: Yes
  “…… There are changes of hepatic cirrhosis with portal hypertension, mild splenomegaly, and small gastroesophageal varices. ……”
• Tumor number: 2-3
  “FINDINGS xxxx Lesion 1: 1.0 x 2.3 cm, segment VII xxxxx IMPRESSIONS There are 2 lesions, suspicious for HCC xxxxxxxxxxxx”
Applications
• Sub-modules can be used for other tasks
• System can be implemented to speed annotation task
Future work
• Expand dataset, more training data!
• Try higher order modeling (e.g. deep learning)
• Pair with structured information

Acknowledgements
University of Washington
UW Biomedical Language Processing Group
• Meliha Yetisgen (advisor)
• Prescott Klassen
• Sharon W Kwan
• Michael Semanik
• Lucy Vanderwende
• Fei Xia
• Gina-Anne Levow
UW Radiology and School of Medicine
• Guy Johnson
• Tyler Denman
Funding
• NLM Training Grant T15LM007442
• NIH, National Center for Advancing Translational Sciences (KL2TR000421)
• UW Institute of Translational Health Sciences (UL1TR000423)
• VA Big Data Scientist Training Enhancement Program

Publications
[1] W. Yim, S. Kwan, M. Yetisgen. In-depth annotation for patient level liver cancer staging. Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis at EMNLP 2015 (Louhi’15), Lisbon, Portugal. September 2015.
[2] W. Yim, T. Denman, S. Kwan, M. Yetisgen. Tumor information extraction in radiology reports for hepatocellular carcinoma patients. In Proceedings of the American Medical Informatics Association Clinical Research Informatics Summit (AMIA CRI’16), San Francisco. March 2016. (Best Student Paper)
[3] W. Yim, M. Yetisgen, W. Harris, S. Kwan. Natural Language Processing in Oncology: A Review. Journal of the American Medical Association: Oncology. June 2016.
[4] W. Yim, S. Kwan, M. Yetisgen. Tumor reference resolution and characteristic extraction in radiology reports for liver cancer stage prediction. Journal of Biomedical Informatics. December 2016.
[5] W. Yim, S. Kwan, M. Yetisgen. Classifying tumor event attributes in radiology reports. Journal of the Association for Information Science and Technology. September 2017.
[6] W. Yim, G. Johnson, S. Kwan, M. Yetisgen. Classification of hepatocellular carcinoma stages from free-text clinical and radiology reports. American Medical Informatics Association Fall Symposium 2017.

Thank you for listening. Questions?
wwyim@stanford.edu

Interannotator agreement
20 patients, 71 files
  Comparison            TP    FP   FN   P       R       F1
  Text span (partial)   145   56   52   0.721   0.736   0.729
  Patient               126   18   7    0.875   0.947   0.910
• Agreement of text annotations at patient level is higher because less resolution is needed
(Yim et al., EMNLP 2015, In-depth annotation for patient level liver cancer staging)
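
The P, R, and F1 columns follow from the TP/FP/FN counts by the usual definitions. A minimal sketch (the function name is hypothetical) that reproduces the agreement figures on this slide:

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For the text-span row, 145 / (145 + 56) gives precision 0.721 and 145 / (145 + 52) gives recall 0.736, matching the table.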

ECOG Performance Status
Example note 1:
  PRINCIPLE DIAGNOSIS: CHRONIC LYMPOCYTIC LEUKEMIA
  OTHER PROBLEMS: 1. FREQUENT HEADACHES. ….
  HISTORY OF PRESENT ILLNESS: He […] six months later he developed rapidly increasing leukocytosis and he became more symptomatic. … However he did have relative pancytopenia following his therapy. …
  SOCIAL HISTORY: He works as an airline pilot for Ven He is currently flying on Hwy routes. ….
  GENERAL: He was well-appearing and in no apparent distress. …
  HOSPITAL COURSE: The patient remained afebrile until day +11 when he spiked and was started on ceftazidime.
Example note 2:
  TRANSFER DIAGNOSIS (ES): STATUS POST METASTIC CANCER, ORIGIN UNKNOWN.
  SECONDARY DIAGNOSIS (ES): 1. GLAUCOMA. 2. HISTORY OF BLADDER CANCER …. 3. COGNITIVE IMPAIRMENT ….
  ISSUES: The patient’s left lower extremity had severe pain which rendered the patient immobile. … Orthopedics was consulted because the patient cannot ambulate due to severe pain and also due to possibility of further fracture of the left lower extremity. … She has been cognitively confused in the mornings but during the day she has been alert. …. Physical Therapy and Occupational Therapy consulted…

Sentence classification performance evaluated at document level
Columns: category, Freq, BASELINE P / R / F1, SENT. CLASS. P / R / F1 (rows marked incomplete have fewer values in the source)
ASCITES
  Mild                 44    0.24  0.18  0.21    0.44  0.86  0.58
  Moderate/Severe      20    0.50, 0.38, 0.67, 0.50, 0.59 (incomplete)
  None                 146   0.77  0.36  0.49    0.55  0.95  0.67
EXTRAHEPATIC INVASION
  No                   59    0.81, 0.85, 0.83, 0.90 (incomplete)
  Yes                  2     0.00 (incomplete)
HEPATIC ENCEPHALOPATHY
  No                   127   0.70  0.76  0.73    0.65  0.82  0.72
  Yes - distal         20    0.71  0.73  0.72    0.62  0.87  0.73
  Yes - minor branch   8     0.00 (incomplete)

Sentence classification performance evaluated at document level
Columns: category, Freq, BASELINE P / R / F1, SENT. CLASS. P / R / F1 (rows marked incomplete have fewer values in the source)
MACROVASCULAR INVASION
  No                   127   0.71  0.96  0.82    0.78  0.97  0.86
  Yes - major branch   20    0.55, 0.52, 0.46, 0.90, 0.61 (incomplete)
  Yes - minor branch   8     1.00  0.50  0.67    0.80  0.50  0.62
METASTASIS
  No                   108   0.70, 0.74, 0.75, 0.94, 0.83 (incomplete)
  Yes - distal         6     0.50, 0.17, 0.25, 0.00 (incomplete)
  Yes - regional       7     0.00, 0.50, 0.29, 0.36 (incomplete)
PORTAL HYPERTENSION
  No                   5     0.00, 0.33, 0.20, 0.25 (incomplete)
  Yes                  84    0.80, 0.82, 0.89, 0.98, 0.93 (incomplete)

Sentence classification error analysis
ASCITES {NONE, MILD, MODERATE/SEVERE}
• Sentence identification FP: mostly correct
• Confusion of “moderate/severe” for mild cases
• “moderate/severe” cases have more variations: “gross ascites”, “extensive ascites”, “ascites […] required multiple large paracenteses”
• Other sentence FP due to co-occurring signs and symptoms
METASTASIS {NO, YES-REGIONAL, YES-DISTAL}
• Lots of straightforward evidence: “extrahepatic metastatic disease: none”
• One confusing part for Metastasis-Yes_regional: “no enlarged lymph nodes”, “no lymphadenopathy”
• For Metastasis-Yes_distal: findings in other non-liver locations
MACROVASCULAR INVASION {NO, YES-MAJOR_BRANCH, YES-MINOR_BRANCH}
• Yes-major_branch and Yes-minor_branch confused
• Word order, or other types of semantic variations: “the portal vein”, “superior mesenteric vein”, “anterior branches of the right portal vein”
• Variations: “involvement”, “infiltrated”, “distended”, “occluded”

Sentence classification error analysis
PORTAL HYPERTENSION
• Very imbalanced
• “large gastrosphageal varicose”, “spleen: enlarged”
• Mistakes for both Yes and No classes when “portal hypertension” was actually very clearly mentioned

Sentence classification error analysis

Not enough annotation
• N (number of features) relies on significance value spread

Assumption of sentence classification
• Example: He reports drinking heavily up until his HCV diagnosis, but became abstinent since due to concern for the health of his liver. In the past few months Mr. Xxxxxxxx endorses sometimes getting confused, and has had others tell him that he is not acting like himself and not making sense. Mr. Xxxxxxxx denies any other symptoms or complications related to his cirrhosis including nausea, vomiting, ascites, jaundice, fatigue, edema or bleeding tendency.

Sentence classification discussion
• Design (annotation/classification): many levels of built-in redundancy
  – Sentence classification
  – Tumor characteristics
  – Multiple data feeds

HISTORY OF PRESENT ILLNESS ... Over the last 3 weeks he has developed abdominal distention consistent with ascites. ...
PHYSICAL EXAMINATION ... ABDOMEN: soft. Moderate to large distention but not tight. Positive large ascites on exam. ...
ASSESSMENT AND PLAN: ... He is showing evidence of increasing ascites most likely related to some mild hepatic impairment in conjunction with extensive vascular invasion.

Reference Resolution Agreement
N = 20 files

COREF
• MUC: P 0.956, R 0.915, F1 0.935
• B-3: P 0.980, R 0.964, F1 0.972

ANNOTATOR 2
• Clusters (no singletons): 40
• Average size: 2.7
• Clusters (with singletons): 149

Particularization
• TP 95, FP 36, FN 1
• P 0.725, R 0.990, F1 0.837
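The particularization scores on this slide follow directly from the reported counts (TP 95, FP 36, FN 1). A minimal sketch of that arithmetic; the function name is an illustrative choice, not something from the talk:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the particularization row of the slide.
p, r, f1 = precision_recall_f1(tp=95, fp=36, fn=1)
print(f"P {p:.3f}  R {r:.3f}  F1 {f1:.3f}")  # P 0.725  R 0.990  F1 0.837
```

The high recall with lower precision reflects the 36 false positives against a single false negative.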

Peculiarities
• Measurements are used as referring expressions
• At times unclear if single lesion or multiple
• Measurements
  - May be approximate
  - Multi-dimensional (largest usually referred to)

The following lesions are hypervascular with delayed washout, characteristic for HCC:
Segment VII: 2.6 x 2.4 cm (37/4).
Segment VI/VII: 5.6 x 4.5 cm (47/4).
Segment III: 2.6 x 2.0 cm (45/4).
Segment III subcapsular: 0.9 cm (52).
Segment II/III: 1.5 x 1.4 cm (35/4). The latter lesion could also represent 2 separate smaller lesions, measuring 1.4 and 1.0 cm, best appreciated on delayed phase (37/6).
…
Impression:
1. 5 or 6 hypervascular lesions within the liver with delayed washout, characteristic for HCC, the largest over 5 cm.
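Because measurements are multi-dimensional and the largest dimension is usually the one referred to, one way to normalize them is to keep the maximum dimension per measurement. The sketch below is an assumption-laden illustration (the regular expression and function name are hypothetical, not the authors' system), and it deliberately ignores coordinated forms such as "1.4 and 1.0 cm" and ranges such as "5-6 mm":

```python
import re

# Hypothetical pattern: up to three dimensions joined by "x", then a unit.
MEASUREMENT = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*x\s*(\d+(?:\.\d+)?))?(?:\s*x\s*(\d+(?:\.\d+)?))?\s*(cm|mm)\b",
    re.IGNORECASE,
)


def largest_dimensions_cm(text: str) -> list[float]:
    """Return the largest dimension (in cm) of each measurement found in text."""
    sizes = []
    for match in MEASUREMENT.finditer(text):
        dims = [float(g) for g in match.group(1, 2, 3) if g is not None]
        largest = max(dims)
        if match.group(4).lower() == "mm":
            largest = largest / 10  # convert millimetres to centimetres
        sizes.append(largest)
    return sizes


report = (
    "Segment VII: 2.6 x 2.4 cm (37/4). Segment VI/VII: 5.6 x 4.5 cm (47/4). "
    "Segment III: 2.6 x 2.0 cm (45/4). Segment III subcapsular: 0.9 cm (52). "
    "Segment II/III: 1.5 x 1.4 cm (35/4)."
)
sizes = largest_dimensions_cm(report)
print(sizes)       # [2.6, 5.6, 2.6, 0.9, 1.5]
print(max(sizes))  # 5.6
```

The maximum of 5.6 cm agrees with the impression line, "the largest over 5 cm". Image references such as "(37/4)" are skipped because no unit follows them.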

Number is not always clear
• 5-6-mm segment 6/7 and 5 hypervascular foci without washout suggesting indeterminate lesion
• A small enhancing area seen along the lateral aspect of segment 6, image 62/4, segment 7, image 42/4 and segment 4b/5, image 52/4 likely small dysplastic nodule or THAD
• 1. Subcapsular 2.4 x 1.7 cm arterial enhancing lesion with delayed washout in segment 7/4 A/B, characteristic for HCC.

Tumor-related stage parameters

Focal lesions:
Total number: 2:
Lesion 1: segment 3, 3.6 x 2.3 cm hypervascular with washout on delayed imaging, image 3/32 and 7/33.
Lesion 2: segment 4 A/B, 0.8 cm hypodense on all phases, image 7/32
Lesion 3: Segment 7, 2.3 cm
…
Impression:
Agree with outside report:
3 focal lesions:
Segment 3 lesion is consistent with HCC.
Segment 4 A/B lesion is indeterminate.
Segment 7 lesion is suspicious for HCC.

Tumor number
• Single
• 2-3
• >3

Tumor size
• <3 cm
• 3-5 cm
• >5 cm

Tumor morphology
• Single AND <50% liver
• >1 AND <50% liver
• >50% liver

Tumor-related stage parameters
Tumor number (Single, 2-3, >3): ADD, i.e. count the focal lesions across the report.

Tumor-related stage parameters
Tumor size (<3 cm, 3-5 cm, >5 cm): MAX, i.e. take the largest lesion measurement.
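The ADD and MAX labels on these slides suggest that tumor number comes from counting lesions and tumor size from the largest measurement. A minimal sketch of the binning, under two assumptions not stated on the slides: the boundary values 3 cm and 5 cm fall in the middle bin, and every listed lesion is counted (whether indeterminate lesions should count is a separate decision). Function names are hypothetical.

```python
def tumor_number_category(lesion_count: int) -> str:
    """Bin a lesion count into the slide's categories: Single, 2-3, >3."""
    if lesion_count < 1:
        raise ValueError("expected at least one lesion")
    if lesion_count == 1:
        return "Single"
    if lesion_count <= 3:
        return "2-3"
    return ">3"


def tumor_size_category(lesion_sizes_cm: list[float]) -> str:
    """Bin the largest lesion size (MAX) into <3 cm, 3-5 cm, >5 cm."""
    largest = max(lesion_sizes_cm)
    if largest < 3:
        return "<3 cm"
    if largest <= 5:  # assumption: 3 and 5 belong to the middle bin
        return "3-5 cm"
    return ">5 cm"


# Largest dimension of each lesion in the example report: 3.6, 0.8 and 2.3 cm.
sizes = [3.6, 0.8, 2.3]
print(tumor_number_category(len(sizes)))  # 2-3
print(tumor_size_category(sizes))         # 3-5 cm
```

Note that the example report states "Total number: 2" but lists three lesions, which is one reason counting the described lesions can be more reliable than reading the stated total.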

Tumor-related stage parameters
>50% of liver invaded logic:
• Any tumor >10 cm?
• Right lobe invaded?
• Left lobe + some right lobe?
• >4 segments invaded?

>50% of liver?
Criteria
• Tumor >= 10 cm
• >4 segments of liver involved
• Right lobe of liver involved
• Entire left lobe + some right lobe parts involved
• Some description to suggest much of liver involved, e.g. “massive”
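The criteria above read as a disjunction: any one of them is enough to label the case as >50% liver involvement. A minimal sketch under that assumption; the `LiverFindings` container and its field names are hypothetical stand-ins for whatever the upstream extraction produces.

```python
from dataclasses import dataclass


@dataclass
class LiverFindings:
    """Hypothetical summary of extracted tumor and anatomy information."""
    largest_tumor_cm: float
    segments_involved: int
    right_lobe_involved: bool
    entire_left_lobe_involved: bool
    part_of_right_lobe_involved: bool
    massive_description: bool  # e.g. the report uses a word like "massive"


def more_than_half_liver(f: LiverFindings) -> bool:
    """True if any of the slide's criteria for >50% involvement is met."""
    return (
        f.largest_tumor_cm >= 10
        or f.segments_involved > 4
        or f.right_lobe_involved
        or (f.entire_left_lobe_involved and f.part_of_right_lobe_involved)
        or f.massive_description
    )


small = LiverFindings(3.6, 3, False, False, False, False)
large = LiverFindings(4.0, 2, False, True, True, False)
print(more_than_half_liver(small))  # False
print(more_than_half_liver(large))  # True
```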

Need knowledge of hierarchy
• segment 4: Segment IV of liver
• left medial section: Left medial division of liver
• left lobe: Left lobe of liver
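To decide whether a segment-level mention implies lobe-level involvement, the part-of chain has to be walked upward. A minimal sketch using only the three concepts named on the slide; the hard-coded dictionary is a toy stand-in for an ontology lookup, and the root entry "Liver" is an addition for illustration.

```python
# Toy part-of map: child concept -> parent concept.
PART_OF = {
    "Segment IV of liver": "Left medial division of liver",
    "Left medial division of liver": "Left lobe of liver",
    "Left lobe of liver": "Liver",  # assumed root, not shown on the slide
}


def ancestors(concept: str) -> list[str]:
    """Return the chain of containing structures, nearest first."""
    chain = []
    while concept in PART_OF:
        concept = PART_OF[concept]
        chain.append(concept)
    return chain


def is_part_of(part: str, whole: str) -> bool:
    """True if `whole` contains `part` somewhere up the hierarchy."""
    return whole in ancestors(part)


print(ancestors("Segment IV of liver"))
print(is_part_of("Segment IV of liver", "Left lobe of liver"))  # True
```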

Some issues
Assume: we have perfect text spans for information regarding anatomy
• all segments
• both hepatic lobes
• both kidneys
• left and right lobes
• left and right portal veins
• main, right and proximal left portal veins
• main portal, left and right portal veins
• main right, left portal vein
• portal and hepatic veins
• segment 5/6
• segment 4 b/5
• segment 6/7 (15/39) and 5
• segment 7/4 A/B
• segment 8 and 6
• segments II – III
• segments VIII and V/Ivb
• segments II, Iva and IVb
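Even with perfect spans, expressions like these still have to be normalized into individual segments. The sketch below shows one possible approach for the segment spans only; it is not the authors' method. It assumes eight liver segments, treats a dash between two numerals as a range, drops parenthesized image references, and ignores the a/b sub-segment suffix.

```python
import re

ROMAN = {"i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5, "vi": 6, "vii": 7, "viii": 8}
# A segment numeral (Roman or Arabic), optionally followed by an a/b suffix.
TOKEN = re.compile(r"\b(viii|vii|vi|iv|v|iii|ii|i|[1-8])\s?[ab]?\b", re.IGNORECASE)
RANGE_SEP = re.compile(r"^\s*[-–]\s*$")


def segment_numbers(span: str) -> set[int]:
    """Return the set of liver segment numbers mentioned in an anatomy span."""
    text = re.sub(r"\([^)]*\)", " ", span)  # drop image refs like "(15/39)"
    if re.search(r"\ball segments\b", text, re.IGNORECASE):
        return set(range(1, 9))
    found = set()
    previous = None  # (number, end offset) of the preceding numeral
    for match in TOKEN.finditer(text):
        raw = match.group(1).lower()
        number = int(raw) if raw.isdigit() else ROMAN[raw]
        if previous and RANGE_SEP.match(text[previous[1]:match.start()]):
            low, high = sorted((previous[0], number))
            found.update(range(low, high + 1))  # fill in a dashed range
        found.add(number)
        previous = (number, match.end())
    return found


for span in ["segment 6/7 (15/39) and 5", "segment 7/4 A/B", "segments VIII and V/Ivb"]:
    print(span, "->", sorted(segment_numbers(span)))
```

Spans that name lobes or vessels rather than segments (for example "both hepatic lobes") return an empty set here and would need their own handling.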

Line to organ mapping

Comparison: Ultrasound examination dated xxxxx
Findings:
Lungs bases: There is calcification of the coronary arteries. There is a new 1.3 x 0.9 cm subpleural nodule in the right base. No pleural effusion.
Abdomen:
Liver: Nodular cirrhotic liver. There is conventional hepatic arterial anatomy, which is patent. There is recanalized paraumbilical vein and few gastrohepatic ligament and splenic hilum varices.
Biliary tree: No intra or extra-hepatic biliary ductal dilation.
Gallbladder: Present and normal.
Spleen: Splenomegaly measuring 16 cm in craniocaudal dimension.
Pancreas: Normal.

Line to organ mapping
Have map of organ to synonyms and organ adjectives
• Organ list created by taking subclasses of Organ in FMA
• Synonyms of the concept are saved
• Adjectives created by taking all pertainyms from WordNet that point to an organ

Organ: Adjective forms
• Kidney: Nephritic, renal, adrenal
• Liver: Hepatic
• Lung: Pulmonic, lung-like, pulmonary, pneumogastric, pneumonic, cardiopulmonary, intrapulmonary
• Prostate: Prostatic, prostate
• Spleen: Lienal, splenetic, splenic
• Tibia: Tibial
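With such a map in hand, a report line can be assigned to organs by looking for the organ name or one of its adjective forms. The sketch below hard-codes the sample rows from the table rather than querying FMA or WordNet, and the matching rule (whole-word match, optional plural "s") is an illustrative assumption.

```python
import re

# Sample rows from the slide; the organ name itself is added as a term.
ORGAN_TERMS = {
    "Kidney": ["kidney", "nephritic", "renal", "adrenal"],
    "Liver": ["liver", "hepatic"],
    "Lung": ["lung", "pulmonic", "lung-like", "pulmonary", "pneumogastric",
             "pneumonic", "cardiopulmonary", "intrapulmonary"],
    "Prostate": ["prostate", "prostatic"],
    "Spleen": ["spleen", "lienal", "splenetic", "splenic"],
    "Tibia": ["tibia", "tibial"],
}

# One whole-word pattern per organ, allowing a plural "s".
PATTERNS = {
    organ: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")s?\b",
        re.IGNORECASE,
    )
    for organ, terms in ORGAN_TERMS.items()
}


def organs_in_line(line: str) -> list[str]:
    """Return every organ whose name or adjective form appears in the line."""
    return [organ for organ, pattern in PATTERNS.items() if pattern.search(line)]


print(organs_in_line("Spleen: Splenomegaly measuring 16 cm in craniocaudal dimension."))
print(organs_in_line("Biliary tree: No intra or extra-hepatic biliary ductal dilation."))
print(organs_in_line("Pancreas: Normal."))
```

The second call maps a biliary line to the liver through "extra-hepatic", and the third returns nothing because the pancreas is not among the sample rows; both show why the full organ list and some disambiguation are needed in practice.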

Definition of involvement

Segment
• Involved if anything that is part of the segment (e.g. a blood vessel) has tumor
• Any liver section lower than lobe (e.g. “anterior section of left liver”) with tumor

Right/Left/Whole liver
• If there is a modifier “majority”, “most”, “entire”, “all”
• A modifier is considered to modify an anatomy entity if it has the shortest path to it compared to other anatomy entities

Extensive or Massive tumor
• If there is a modifier “extensive”, “infiltrative”, “massive”
• Look at the shortest dependency path; if no other “tumor reference” entity is in the way, then considered true

Tumor characteristics annotator
Column headers: No Ref. Res. Gold temp. System temp. /Gold / Sys ref ref
• Largest size: 0.79 0.93 0.86 0.77
• Tumor #: 0.14 0.69 0.55 0.50
• >50% liver invaded: 0.93 0.94 0.90

Patient classification results
Parentheses enclose a “relaxed” score
• AJCC: {IIIa, IIIb, IIIc} -> III; {IVa, IVb} -> IV
• BCLC: {A1, A2, A3, A4} -> A

Label: TRAINING SET (TP, F1); TEST SET (TP, F1)
• AJCC: training TP 103, F1 0.64 (0.68); test TP 22, F1 0.55 (0.60)
• BCLC: training TP 98, F1 0.61 (0.69); test TP 20, F1 0.50 (0.55)
• CLIP: training TP 88, F1 0.55 (0.55); test TP 17, F1 0.43 (0.43)
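The relaxed scoring amounts to collapsing sub-stages to their parent stage before comparing gold and predicted labels. A minimal sketch of that mapping, using only the collapses listed on the slide; the function names are illustrative.

```python
# Sub-stage -> parent stage, per staging system, as listed on the slide.
RELAXED = {
    "AJCC": {"IIIa": "III", "IIIb": "III", "IIIc": "III", "IVa": "IV", "IVb": "IV"},
    "BCLC": {"A1": "A", "A2": "A", "A3": "A", "A4": "A"},
}


def relax(system: str, stage: str) -> str:
    """Collapse a sub-stage to its parent; other labels pass through unchanged."""
    return RELAXED.get(system, {}).get(stage, stage)


def is_match(system: str, gold: str, predicted: str, relaxed: bool = False) -> bool:
    """Compare gold and predicted stages under strict or relaxed scoring."""
    if relaxed:
        return relax(system, gold) == relax(system, predicted)
    return gold == predicted


print(is_match("BCLC", "A1", "A3"))                # False
print(is_match("BCLC", "A1", "A3", relaxed=True))  # True
```

CLIP has no collapsed labels in this mapping, which is consistent with its strict and relaxed scores being identical in the results.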