
Electron Identification Based on Boosted Decision Trees
Hai-Jun Yang
University of Michigan, Ann Arbor
(with T. Dai, X. Li, A. Wilson, B. Zhou)
ATLAS Egamma Meeting, October 2, 2008

Motivation
• High-efficiency lepton (e, μ, τ) identification is crucial for new-physics discoveries at the LHC.
• Considerable effort has gone into developing electron identification algorithms in ATLAS:
  – Cut-based algorithm: IsEM
  – Multivariate algorithms: Likelihood and BDT
• Further improvement could be achieved by treating the multivariate training better with the Boosted Decision Trees (BDT) technique.
Electron ID with BDT

MC Samples for e-ID Studies
Electron signal samples:
  Process        Dataset       SW version
  W→eν           5104          v13
  W→eν           5104          v12
  Z→ee           5144          v12
  WW→eνμν        5922, 5925    v12
  ZZ→4l          5931          v12
Jet samples:
  Process                              Dataset   SW version
  J0: di-jet (8 < pT < 17 GeV)         5009      v12, v13
  J1: di-jet (17 < pT < 35 GeV)        5010      v12, v13
  J2: di-jet (35 < pT < 70 GeV)        5011      v12, v13
  J3: di-jet (70 < pT < 140 GeV)       5012      v12
  J4: di-jet (140 < pT < 280 GeV)      5013      v12, v13
  J5: di-jet (280 < pT < 560 GeV)      5014      v12, v13
  J6: di-jet (560 < pT < 1120 GeV)     5015      v12, v13
  ttbar → WbWb (all jets)              5204      v12

Electron Identification Studies
Electrons are selected in two steps:
1) Pre-selection: an EM cluster matched to a track.
2) Apply electron ID to the pre-selected samples with different e-ID algorithms (IsEM and Likelihood for SW release v12 samples; BDT added for v13).
New BDT e-ID development at U. Michigan:
  – Based on version 12 datasets (talk by H. Yang): http://indico.cern.ch/conferenceDisplay.py?confId=38991
  – Further study based on version 13 datasets
Performance comparisons:
  – electron ID efficiency
  – jet fake rate

Signal Pre-selection: MC Electrons
• MC true electrons from W→eν, requiring
  – |η_e| < 2.5 and E_T^true > 10 GeV (N_e)
• Match the MC e/γ to an EM cluster:
  – ΔR < 0.2 and 0.5 < E_T^rec / E_T^true < 1.5 (N_EM)
• Match the EM cluster with an inner-detector track:
  – eg_trkmatchnt > -1 (N_EM/track)
• Pre-selection efficiency = N_EM/track / N_e
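The three pre-selection steps above can be sketched as follows. This is a minimal illustration, not the analysis code. The `Obj` record and the function names are hypothetical. The sketch assumes the closest cluster is taken when several pass the ΔR and E_T-ratio cuts, and `trk_match` stands in for `eg_trkmatchnt`.

```python
import math
from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical kinematic record standing in for ntuple branches."""
    et: float   # transverse energy [GeV]
    eta: float
    phi: float


def delta_r(a: Obj, b: Obj) -> float:
    """ΔR = sqrt(Δη² + Δφ²), with Δφ wrapped into [-π, π)."""
    dphi = (a.phi - b.phi + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a.eta - b.eta, dphi)


def preselect_signal(true_electrons, clusters, trk_match):
    """Count (N_e, N_EM, N_EM/track) for the signal pre-selection.

    trk_match[i] plays the role of eg_trkmatchnt for cluster i
    (-1 means no track was matched).
    """
    n_e = n_em = n_em_trk = 0
    for e in true_electrons:
        # Step 1: acceptance of the MC truth electron
        if abs(e.eta) >= 2.5 or e.et <= 10.0:
            continue
        n_e += 1
        # Step 2: clusters with ΔR < 0.2 and 0.5 < ET_rec / ET_true < 1.5
        candidates = [i for i, c in enumerate(clusters)
                      if delta_r(e, c) < 0.2 and 0.5 < c.et / e.et < 1.5]
        if not candidates:
            continue
        n_em += 1
        # assumption: keep the closest passing cluster
        best = min(candidates, key=lambda i: delta_r(e, clusters[i]))
        # Step 3: the matched cluster must have an associated track
        if trk_match[best] > -1:
            n_em_trk += 1
    return n_e, n_em, n_em_trk


electrons = [Obj(40.0, 0.5, 0.1), Obj(25.0, 2.8, 0.0), Obj(30.0, -1.0, 3.1)]
clusters = [Obj(38.0, 0.52, 0.12), Obj(29.0, -1.02, -3.12)]
n_e, n_em, n_em_trk = preselect_signal(electrons, clusters, [3, -1])
print(n_e, n_em, n_em_trk, n_em_trk / n_e)  # 2 2 1 0.5
```

In the toy input, the third electron sits near φ = +π and its cluster near φ = −π. Wrapping Δφ is what lets that pair match. It then fails only the track requirement.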

Electrons from W→eν
[Figure panels: MC electrons; electrons after pre-selection]

Electron Pre-selection Efficiency
The inefficiency is mainly due to track matching.
[Figure: pre-selection efficiency for W→eν]

Electron Pre-selection Efficiency
  Electrons from process     Dataset      SW version   EM/track match
  W→eν (N_e = 135000)        5104         v13          89.1%
  W→eν (N_e = 485489)        5104         v12          88.2%
  Z→ee (N_e = 29383)         5144         v12          87.3%
  WW→eνμν (N_e = 39822)      5922, 5925   v12          87.8%
  ZZ→4l (N_e = 97928)        5931         v12          87.4%

Pre-selection of Jet-Faked Electrons
• Count the number of jets with
  – |η_jet| < 2.5, E_T^jet > 10 GeV (N_jet)
• Loop over all EM clusters; each cluster is matched to a jet:
  – E_T^EM > 10 GeV (N_EM)
• Match the EM cluster with an inner-detector track:
  – eg_trkmatchnt > -1 (N_EM/track)
• Pre-selection acceptance = N_EM/track / N_jet
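A similar sketch for the jet side follows. It assumes jets and clusters are plain tuples, which is a hypothetical layout, and the function name is also hypothetical. The fake-rate table associates each EM/track object with the closest jet, so the sketch does the same. It returns that jet's E_T so the fake rate could be binned in jet E_T.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """ΔR with Δφ wrapped into [-π, π)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def jet_fake_preselection(jets, clusters):
    """jets: (et, eta, phi) tuples; clusters: (et, eta, phi, trkmatch) tuples.

    Returns N_jet, N_EM, N_EM/track, the acceptance N_EM/track / N_jet,
    and the E_T of the selected jet closest to each EM/track object.
    """
    good_jets = [j for j in jets if abs(j[1]) < 2.5 and j[0] > 10.0]
    n_jet = len(good_jets)
    n_em = n_em_trk = 0
    matched_jet_et = []
    for et, eta, phi, trk in clusters:
        if et <= 10.0 or not good_jets:
            continue
        n_em += 1
        if trk > -1:  # eg_trkmatchnt > -1
            n_em_trk += 1
            closest = min(good_jets, key=lambda j: delta_r(eta, phi, j[1], j[2]))
            matched_jet_et.append(closest[0])
    acceptance = n_em_trk / n_jet if n_jet else 0.0
    return n_jet, n_em, n_em_trk, acceptance, matched_jet_et
```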

Jets (from ttbar) and Faked Electrons
[Figure panels: jet E_T (with EM/track selected); EM/track E_T]

Faked Electrons from Top Jets vs. EM E_T Threshold
[Figure panels: E_T > 10 GeV; E_T > 20 GeV]

Jet Fake Rate from Pre-selection
E_T^jet > 10 GeV, |η_jet| < 2.5; the EM/track object is matched to the closest jet.
  From process                       Dataset   N_jets     v13       v12
  J0: di-jet (8 < pT < 17 GeV)       5009      404363     4.8E-3    6.0E-3
  J1: di-jet (17 < pT < 35 GeV)      5010      724033     1.5E-2 (version not specified)
  J2: di-jet (35 < pT < 70 GeV)      5011      713308     9.1E-2    1.1E-1
  J3: di-jet (70 < pT < 140 GeV)     5012      42330      N/A       3.2E-1
  J4: di-jet (140 < pT < 280 GeV)    5013      1185538    3.3E-1    4.3E-1
  J5: di-jet (280 < pT < 560 GeV)    5014      1606039    3.6E-1    5.1E-1
  J6: di-jet (560 < pT < 1120 GeV)   5015      1828401    3.3E-1    5.0E-1
  ttbar → WbWb (all jets)            5204      675046     N/A       3.2E-1

Existing ATLAS e-ID Algorithms
1) IsEM & 0x7FFFFF == 0 (v13)
2) Likelihood: DLH = log(EMWeight / PionWeight) > 6.5 (v13)
3) BDT: Ele_BDTScore > 7 (Rel. v13)
e-ID in v12 (talk by H. Yang on Sept. 10, 2008): http://indico.cern.ch/conferenceDisplay.py?confId=38991
1) IsEM & 0x7FF == 0
2) Likelihood: DLH = EMWeight / (EMWeight + PionWeight) > 0.6
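The cut definitions above reduce to a bitmask test and two forms of the likelihood discriminator. The sketch below assumes two conventions that the slide does not state. First, a set IsEM bit marks a failed cut, which is consistent with `IsEM & mask == 0` meaning "pass". Second, `log` is the natural logarithm. With r = EMWeight/PionWeight, the v12 form equals r/(1+r). Both discriminators are therefore monotonic in r, so each threshold corresponds to a cut on r. The function and constant names are hypothetical.

```python
import math

# Bit masks quoted on the slide
ISEM_MASK_V13 = 0x7FFFFF
ISEM_MASK_V12 = 0x7FF


def passes_isem(isem_word: int, mask: int) -> bool:
    """Pass if none of the cut bits selected by the mask is set (assumed convention)."""
    return (isem_word & mask) == 0


def dlh_v13(em_weight: float, pion_weight: float) -> float:
    """v13 form: log(EMWeight / PionWeight); natural log assumed."""
    return math.log(em_weight / pion_weight)


def dlh_v12(em_weight: float, pion_weight: float) -> float:
    """v12 form: EMWeight / (EMWeight + PionWeight), bounded in (0, 1)."""
    return em_weight / (em_weight + pion_weight)


def v12_cut_as_ratio(d: float) -> float:
    """Value of r = EMWeight / PionWeight at which dlh_v12 equals d: r = d / (1 - d)."""
    return d / (1.0 - d)


print(passes_isem(0x800, ISEM_MASK_V12), passes_isem(0x800, ISEM_MASK_V13))  # True False
```

Under these assumptions, the v12 requirement DLH > 0.6 is the same as r > 1.5. A bit outside the v12 mask, such as 0x800, is ignored by the v12 test but rejected by the v13 test.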

e-ID Multivariate Discriminators (v13)
[Figure panels: Likelihood discriminator; Ele_BDTScore discriminator]

Variables Used for BDT e-ID (UM)
The same variables as in IsEM are used:
egammaPID::ClusterHadronicLeakage
  – Fraction of transverse energy in TileCal 1st sampling
egammaPID::ClusterMiddleSampling
  – Ratio of energies in 3×7 and 7×7 windows
  – Ratio of energies in 3×3 and 7×7 windows
  – Shower width in LAr 2nd sampling
  – Energy in LAr 2nd sampling
egammaPID::ClusterFirstSampling
  – Fraction of energy deposited in 1st sampling
  – ΔEmax2 in LAr 1st sampling
  – Emax2 − Emin in LAr 1st sampling
  – Total shower width in LAr 1st sampling
  – Shower width in LAr 1st sampling
  – Fside in LAr 1st sampling
egammaPID::TrackHitsA0
  – B-layer hits, pixel-layer hits, precision hits
  – Transverse impact parameter
egammaPID::TrackTRT
  – Ratio of high-threshold to all TRT hits
egammaPID::TrackMatchAndEoP
  – Δη between track and egamma cluster (1st sampling)
  – Δφ between track and egamma cluster
  – E/p: ratio of egamma energy to track momentum
Track η and EM η
Electron isolation variables:
  – Number of tracks (ΔR = 0.3)
  – Sum of track momenta (ΔR = 0.3)
  – Ratio of energy in ΔR = 0.2–0.45 to energy in ΔR = 0.2
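The three isolation variables at the end of the list can be sketched as follows. The cone sizes come from the list above. The rest are assumptions. The energy ratio is read as E_T in the ring 0.2 ≤ ΔR < 0.45 divided by E_T inside ΔR < 0.2. Calorimeter deposits are plain (et, eta, phi) tuples. Every track passed in is counted, so whether the electron's own track is excluded is left to the caller (the slide does not say). The function names are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """ΔR with Δφ wrapped into [-π, π)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def isolation_variables(el_eta, el_phi, tracks, deposits):
    """tracks: (pt, eta, phi); deposits: calorimeter (et, eta, phi).

    Returns (N_trk in ΔR < 0.3, sum of track pT in ΔR < 0.3,
             E_T(0.2 <= ΔR < 0.45) / E_T(ΔR < 0.2)).
    """
    n_trk, sum_pt = 0, 0.0
    for pt, eta, phi in tracks:
        if delta_r(el_eta, el_phi, eta, phi) < 0.3:
            n_trk += 1
            sum_pt += pt
    core = ring = 0.0
    for et, eta, phi in deposits:
        dr = delta_r(el_eta, el_phi, eta, phi)
        if dr < 0.2:
            core += et
        elif dr < 0.45:
            ring += et
    ratio = ring / core if core > 0 else 0.0
    return n_trk, sum_pt, ratio
```

A well-isolated electron gives a small ring-to-core ratio and few nearby tracks. Jets tend to spread energy into the ring and bring additional tracks.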

EM Shower Shape: Distributions of Discriminating Variables (signal vs. background)
[Figure panels: energy leakage in HCal; EM shower shape in ECal]

ECal and Inner Track Match
[Figure panels: Δη of EM cluster and track; E/p ratio of EM cluster]

Electron Isolation Variables
[Figure panels: N_trk around electron track; E_T(ΔR = 0.2–0.45) / E_T(ΔR = 0.2) of EM cluster]

BDT e-ID Training (UM)
• BDT multivariate pattern recognition technique:
  – [H. Yang et al., NIM A 555 (2005) 370–385]
• BDT e-ID training signal and backgrounds (jet-faked e):
  – W→eν as electron signal
  – Di-jet samples (J0–J6), pT = [8–1120] GeV
  – ttbar hadronic decay samples (Rel. v12 only)
• BDT e-ID training procedure:
  – Event-weight training based on background cross sections [H. Yang et al., JINST 3 P04004 (2008)]
  – Apply additional cuts to the training samples to select hard-to-identify jet-faked electrons as background, which makes the BDT training more effective.
  – Apply an additional event weight to high-pT backgrounds to reduce the jet fake rate effectively in the high-pT region.
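The slide does not give the weighting formula. The sketch below is therefore only one plausible reading, and every detail is an assumption. Each background candidate gets a weight proportional to σ/N_generated of its sample. Candidates above a hypothetical pT threshold get a hypothetical extra factor. The weights are then normalised to a chosen total, for example the total signal weight. `Sample` and `training_weights` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical description of one background sample."""
    xsec_pb: float                                  # production cross section [pb]
    n_generated: int                                # number of generated events
    pts: List[float] = field(default_factory=list)  # pT of each training candidate [GeV]


def training_weights(samples, pt_threshold=None, pt_boost=1.0, total=1.0):
    """Per-candidate weights proportional to xsec / N_generated.

    Candidates with pT above pt_threshold get an extra factor pt_boost;
    the weights are rescaled so that they sum to `total`.
    """
    raw = []
    for s in samples:
        w0 = s.xsec_pb / s.n_generated
        for pt in s.pts:
            boost = pt_boost if pt_threshold is not None and pt > pt_threshold else 1.0
            raw.append(w0 * boost)
    norm = sum(raw)
    return [w * total / norm for w in raw] if norm > 0 else raw


low_pt = Sample(1000.0, 100, [10.0, 20.0])
high_pt = Sample(10.0, 10, [150.0])
print(training_weights([low_pt, high_pt], pt_threshold=100.0, pt_boost=5.0))
```

The cross-section factor makes the background mixture resemble the physical one. The optional boost shifts training emphasis toward the high-pT region, where the slide aims to reduce the fake rate.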

Use Independent Samples to Test the BDT e-ID Performance
• BDT test signal (e) samples:
  – W→eν (Rel. v12, v13)
  – WW→eνμν (Rel. v12)
  – Z→ee (Rel. v12)
  – ZZ→4l (Rel. v12)
• BDT test background (jet-faked e) samples:
  – Di-jet samples, pT = [8–1120] GeV (Rel. v12, v13)
  – ttbar hadronic decay samples (Rel. v12)
  – W→μν + jets (Rel. v12)
  – Z→μμ + jets (Rel. v12)

BDT e-ID discriminator (UM)

Comparison of e-ID Algorithms
BDTs have high e-ID efficiency and a low jet fake rate; BDT (UM) achieves the best performance.

Comparison of IsEM vs. BDT-UM
[Figure panels: IsEM; Eff(BDT-UM) / Eff(IsEM)]

Comparison of Likelihood vs. BDT-UM
[Figure panels: Likelihood; Eff(BDT-UM) / Eff(Likelihood)]

Comparison of BDT-v13 vs. BDT-UM
[Figure panels: BDT-v13; Eff(BDT-UM) / Eff(BDT-v13)]

Jet Fake Rate (IsEM vs. BDT-UM)
[Figure panels: IsEM; BDT-UM; Fakerate(IsEM), Fakerate(BDT-UM)]

Jet Fake Rate (Likelihood vs. BDT-UM)
[Figure panels: Likelihood; BDT-UM; Fakerate(Likelihood), Fakerate(BDT-UM)]

Jet Fake Rate (BDT-v13 vs. BDT-UM)
[Figure panels: BDT-v13; BDT-UM; Fakerate(BDT-v13), Fakerate(BDT-UM)]

Overall Electron Efficiency and Fake Rate from Jets (E_T(EM) > 10 GeV)
  From process                       IsEM      Likelihood   BDT (Rel. v13)   BDT (U. Michigan)
  W→eν (signal)                      65.7%     78.5%        78.6%            82.3%
  J0: di-jet (8 < pT < 17 GeV)       1.8E-4    7.1E-5       8.7E-5           6.0E-5
  J1: di-jet (17 < pT < 35 GeV)      3.8E-4    1.5E-4       1.6E-4           1.1E-4
  J2: di-jet (35 < pT < 70 GeV)      6.3E-4    2.9E-4       1.8E-4           6.7E-5
  J3: di-jet (70 < pT < 140 GeV)     N/A       N/A
  J4: di-jet (140 < pT < 280 GeV)    5.2E-4    3.8E-4       1.6E-4           8.7E-5
  J5: di-jet (280 < pT < 560 GeV)    5.5E-4    4.6E-4       1.7E-4           1.2E-4
  J6: di-jet (560 < pT < 1120 GeV)   4.4E-4    6.5E-4       2.2E-4           2.0E-4
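The relative gains implied by this table can be made explicit with a few lines of arithmetic. The numbers below are copied from the W→eν and J4 rows. Only the dictionary and function names are introduced here.

```python
# Values copied from the table above (E_T(EM) > 10 GeV)
efficiency = {"IsEM": 0.657, "Likelihood": 0.785, "BDT (v13)": 0.786, "BDT (UM)": 0.823}
fake_j4 = {"IsEM": 5.2e-4, "Likelihood": 3.8e-4, "BDT (v13)": 1.6e-4, "BDT (UM)": 8.7e-5}


def compare_to_reference(eff, fake, ref="BDT (UM)"):
    """Return {algorithm: (eff_ref / eff_alg, fake_alg / fake_ref)}."""
    return {name: (eff[ref] / eff[name], fake[name] / fake[ref]) for name in eff}


for name, (eff_ratio, fake_ratio) in compare_to_reference(efficiency, fake_j4).items():
    print(f"{name:12s} efficiency ratio {eff_ratio:.3f}, J4 fake-rate ratio {fake_ratio:.2f}")
```

For example, BDT (UM) keeps about 1.25 times as many W→eν electrons as IsEM (0.823/0.657). In the J4 sample it also has a jet fake rate roughly 6 times lower (5.2E-4 / 8.7E-5).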

Summary and Future Plan
• Electron ID efficiency can be improved by using the BDT multivariate particle identification technique:
  – e eff. = 65.7% (IsEM), 78.5% (LH), 82.3% (BDT)
• The BDT technique also reduces the jet fake rate.
• Incorporate the BDT-based electron ID into the official ATLAS reconstruction package.
• BDT training with real data:
  – Select electron signals from Z→ee (tag-and-probe)
  – Select fake electrons from di-jet samples

Backup Slides

Comparison of e-ID Algorithms (v13)
Di-jet: pT = 140–280 GeV

Performance of the BDT e-ID (v12)
[Figure panels: BDT output distribution (jet fake vs. e signal, with cut); jet fake rate vs. e-ID efficiency]

Comparison of e-ID Algorithms (v12)
Di-jet samples:
  J0: pT = [8–17] GeV
  J1: pT = [17–35] GeV
  J2: pT = [35–70] GeV
  J3: pT = [70–140] GeV
  J4: pT = [140–280] GeV
  J5: pT = [280–560] GeV
  J6: pT = [560–1120] GeV
ttbar: all-hadronic decays
BDT e-ID:
  – High efficiency
  – Low fake rate

Comparison of e-ID Algorithms (v12)
Di-jet samples:
  J0: pT = [8–17] GeV
  J1: pT = [17–35] GeV
  J2: pT = [35–70] GeV
  J3: pT = [70–140] GeV
  J4: pT = [140–280] GeV
  J5: pT = [280–560] GeV
  J6: pT = [560–1120] GeV
ttbar: all-hadronic decays
BDT results:
  – High electron efficiency
  – Low jet fake rate

Electron ID Efficiency vs. η (W→eν)
[Figure: BDT, Likelihood, IsEM]

Electron ID Efficiency vs. pT (W→eν)

Overall e-ID Efficiency (E_T > 10 GeV)
  From process              IsEM    Likelihood   BDT (no isolation)   BDT (isolation)
  W→eν                      65.6%   75.4%        81.7%                81.6%
  Z→ee                      66.7%   75.8%        82.6%                82.4%
  WW→eνμν                   66.9%   76.4%        82.6%                81.7%
  ZZ→4l                     67.5%   77.0%        83.1%                81.4%
  H→WW→eνμν (140 GeV)       66.1%   75.4%        80.7%                78.7%
  H→WW→eνμν (150 GeV)       66.4%   76.0%        81.2%                78.6%
  H→WW→eνμν (160 GeV)       66.8%   76.7%        81.9%                78.6%
  H→WW→eνμν (165 GeV)       67.3%   77.2%        82.1%                78.8%
  H→WW→eνμν (170 GeV)       67.7%   77.3%        82.3%                79.5%
  H→WW→eνμν (180 GeV)       67.7%   77.5%        82.4%                80.1%

Overall Electron Fake Rate from Jets (E_T(EM) > 10 GeV)
  From process                       IsEM      Likelihood   BDT (no isolation)   BDT (isolation)
  J0: di-jet (8 < pT < 17 GeV)       2.6E-4    2.8E-4       1.0E-4 (one value missing)
  J1: di-jet (17 < pT < 35 GeV)      6.3E-4    7.7E-4       4.9E-4               2.0E-4
  J2: di-jet (35 < pT < 70 GeV)      1.7E-3    2.3E-3       1.4E-3               4.4E-4
  J3: di-jet (70 < pT < 140 GeV)     1.5E-3    2.0E-3       6.6E-4               4.7E-5
  J4: di-jet (140 < pT < 280 GeV)    1.4E-3    1.7E-3       8.4E-4               1.7E-4
  J5: di-jet (280 < pT < 560 GeV)    1.5E-3    2.0E-3       1.2E-3               2.3E-4
  J6: di-jet (560 < pT < 1120 GeV)   1.1E-3    2.5E-3       1.4E-3               2.1E-4
  ttbar → WbWb (all jets)            4.2E-3    4.8E-3       3.0E-3               2.8E-4

Overall Electron Fake Rate from μ + Jets Events
Why does the fake rate increase from single-μ to di-μ events?
  From process     IsEM      Likelihood   BDT (no isolation)   BDT (isolation)
  W→μν, J1         1.6E-3    4.8E-3       1.7E-3               8.2E-4
  W→μν, J2         2.0E-3    4.6E-3       1.8E-3               9.6E-4
  W→μν, J3         1.8E-3    3.5E-3       1.6E-3               7.6E-4
  W→μν, J4         2.0E-3    4.0E-3       1.6E-3               7.8E-4
  W→μν, J5         2.0E-3    3.6E-3       1.8E-3               6.7E-4
  Z→μμ, J2         2.3E-3    6.8E-3       2.1E-3 (one value missing)
  Z→μμ, J3         2.0E-3    6.1E-3       2.1E-3               1.7E-3
  Z→μμ, J4         2.2E-3    5.5E-3       2.5E-3               1.6E-3
  Z→μμ, J5         2.1E-3    5.1E-3       2.3E-3               1.3E-3

Fake Electrons from an EM Cluster Associated with a Muon Track
These fakes can be suppressed by requiring ΔR between the μ and the EM cluster to be greater than 0.1.
[Figure panels: ΔR between μ and EM cluster (two panels)]
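The ΔR > 0.1 requirement quoted above acts as an overlap removal between muons and EM/track candidates. A minimal sketch follows. It assumes candidates and muons are given as (η, φ) tuples and that every reconstructed muon in the event is used. Both assumptions are illustrative, as are the function names.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """ΔR with Δφ wrapped into [-π, π)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def remove_muon_overlaps(em_objects, muons, min_dr=0.1):
    """Keep EM/track candidates (eta, phi) farther than min_dr from every muon."""
    return [(eta, phi) for eta, phi in em_objects
            if all(delta_r(eta, phi, m_eta, m_phi) > min_dr for m_eta, m_phi in muons)]


print(remove_muon_overlaps([(0.0, 0.0), (1.0, 1.0)], [(0.05, 0.0)]))  # [(1.0, 1.0)]
```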

Fake Electrons from an EM Cluster Associated with a Muon Track

Rank of Variables (Gini Index)
1. Ratio E_T(ΔR = 0.2–0.45) / E_T(ΔR = 0.2)
2. Number of tracks in ΔR = 0.3 cone
3. Energy leakage to hadronic calorimeter
4. EM shower shape E237/E277
5. Δη between inner track and EM cluster
6. Ratio of high-threshold to all TRT hits
7. η of inner track
8. Number of pixel hits
9. Emax2 − Emin in LAr 1st sampling
10. Emax2 in LAr 1st sampling
11. D0: transverse impact parameter
12. Number of B-layer hits
13. EoverP: ratio of EM energy to track momentum
14. Δφ between track and EM cluster
15. Shower width in LAr 2nd sampling
16. Sum of track pT in ΔR = 0.3 cone
17. Fraction of energy deposited in LAr 1st sampling
18. Number of pixel hits and SCT hits
19. Total shower width in LAr 1st sampling
20. Fracs1: ratio (E7strips − E3strips) / E7strips in LAr 1st sampling
21. Shower width in LAr 1st sampling
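The slide does not say exactly how the ranking was accumulated. The sketch below shows the ingredient it is named after, using a common weighted form of the Gini index. For a node with signal weight W_s and background weight W_b, W = W_s + W_b and purity P = W_s / W, the index is Gini = W·P(1 − P). A split is scored by the decrease Gini(parent) − Gini(left) − Gini(right). As a simplification, variables are ranked here by the decrease of their best single root-node split. A full BDT ranking would instead accumulate decreases over all nodes of all trees. All names are hypothetical.

```python
def gini(w_sig: float, w_bkg: float) -> float:
    """Weighted Gini index W * P * (1 - P), with purity P = w_sig / W."""
    total = w_sig + w_bkg
    if total == 0:
        return 0.0
    p = w_sig / total
    return total * p * (1.0 - p)


def best_split(values, labels, weights):
    """Scan cuts on one variable; labels are 1 (signal) or 0 (background).

    Returns (cut, Gini decrease) for the cut maximising
    Gini(parent) - Gini(left) - Gini(right); cut is None if no split helps.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    tot_s = sum(w for w, y in zip(weights, labels) if y == 1)
    tot_b = sum(weights) - tot_s
    parent = gini(tot_s, tot_b)
    best_cut, best_gain = None, 0.0
    left_s = left_b = 0.0
    for k in range(len(order) - 1):
        i, nxt = order[k], order[k + 1]
        if labels[i] == 1:
            left_s += weights[i]
        else:
            left_b += weights[i]
        if values[i] == values[nxt]:
            continue  # cannot cut between identical values
        gain = parent - gini(left_s, left_b) - gini(tot_s - left_s, tot_b - left_b)
        if gain > best_gain:
            best_cut, best_gain = 0.5 * (values[i] + values[nxt]), gain
    return best_cut, best_gain


def rank_variables(columns, labels, weights):
    """Order variable names by the Gini decrease of their best root-node split."""
    gains = {name: best_split(vals, labels, weights)[1] for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)
```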

Boosted Decision Trees
Relatively new in HEP: MiniBooNE, BaBar, D0 (single top discovery), ATLAS.
Advantages: robust, identifies the most 'powerful' variables, relatively transparent, …
"A procedure that combines many weak classifiers to form a powerful committee"
BDT Training Process
• Split the data recursively based on the input variables until a stopping criterion is reached (e.g. purity, too few events).
• Every event ends up in a "signal" or a "background" leaf.
• Misclassified events are given larger weights in the next decision tree (boosting).
H. Yang et al., NIM A 555 (2005) 370; NIM A 543 (2005) 577; NIM A 574 (2007) 342

A set of decision trees can be developed, each re-weighting the events to enhance the identification of backgrounds misidentified by earlier trees ("boosting"). For each tree, an event is assigned +1 if it is identified as signal and −1 if it is identified as background. The results of all trees are combined into a "score".
[Figure: BDT discriminator; negative scores are background-like, positive scores are signal-like]
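The boosting loop described above can be illustrated with AdaBoost. To keep the example short, it uses decision stumps (one-cut trees) instead of the full trees described on the previous slide. Each tree's weight is α_m = β·ln((1 − ε_m)/ε_m), where ε_m is the weighted misclassification rate. Misclassified events are multiplied by exp(α_m), and the score is Σ α_m·T_m(x) with T_m = ±1. The value β = 0.5 is a hypothetical default, and the function names are illustrative. This is a sketch of the standard AdaBoost scheme, not the UM training code.

```python
import math


def train_stump(X, y, w):
    """Weighted one-cut tree: returns (feature, cut, polarity) with minimal weighted error.

    The stump votes `polarity` when x[feature] > cut, else -polarity.
    """
    best = None
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or [vals[0]]
        for cut in cuts:
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[f] > cut else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, cut, pol)
    return best[1:]


def stump_vote(stump, row):
    """+1 (signal-like) or -1 (background-like)."""
    f, cut, pol = stump
    return pol if row[f] > cut else -pol


def adaboost(X, y, n_trees=10, beta=0.5):
    """y holds +1 for signal, -1 for background. Returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_trees):
        stump = train_stump(X, y, w)
        votes = [stump_vote(stump, row) for row in X]
        err = sum(wi for wi, v, yi in zip(w, votes, y) if v != yi) / sum(w)
        if err >= 0.5:
            break  # no better than random guessing
        clipped = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = beta * math.log((1.0 - clipped) / clipped)
        ensemble.append((alpha, stump))
        if err == 0:
            break  # nothing misclassified, so nothing left to boost
        # boosting: misclassified events get weight * exp(alpha), then renormalise
        w = [wi * math.exp(alpha) if v != yi else wi for wi, v, yi in zip(w, votes, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def bdt_score(ensemble, row):
    """Weighted sum of ±1 votes: positive is signal-like, negative background-like."""
    return sum(alpha * stump_vote(stump, row) for alpha, stump in ensemble)
```

Replacing `train_stump` with a deeper tree grown until a purity or minimum-events criterion is met would give the tree structure described on the previous slide. The boosting and scoring steps stay the same.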