Reinforcement Learning in High Energy Physics
Maciej W. Majewski – AGH UST (Kraków, Poland)

Agenda
• A short mention of my main projects
• What is Reinforcement Learning
• DeepLearnPhysics
• RL in LArTPC

LHCb

Machine Learning for Velo
• Anomaly detection system for Velo calibration – uses a Bayesian network to detect anomalies before a miscalibrated detector is used
• Calibration time prediction (survival analysis) – when should we recalibrate?
• Dimensionality reduction for monitoring
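
None of these are detailed further in the deck; purely as a flavour of the last item, here is a minimal, hypothetical sketch of dimensionality reduction for run monitoring (plain scikit-learn PCA on made-up features; the variable names are my own, not the actual Velo pipeline's):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical per-run monitoring features (e.g. per-sensor pedestal summaries).
rng = np.random.default_rng(0)
nominal_runs = rng.normal(0.0, 1.0, size=(500, 40))  # healthy reference runs
new_run = rng.normal(0.0, 1.0, size=(1, 40))

# Compress to a few components for plotting/monitoring.
pca = PCA(n_components=3).fit(nominal_runs)
compressed = pca.transform(new_run)

# A large reconstruction error hints that the run drifted away from nominal.
reconstruction = pca.inverse_transform(compressed)
error = float(np.mean((new_run - reconstruction) ** 2))
print(f"low-dimensional summary: {compressed.ravel()}, reconstruction error: {error:.3f}")
```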

Velo Calibration – complete model

Results of calibration prediction
• We can predict the calibration based only on the pedestal-subtracted data!

Reinforcement Learning in High Energy Physics

LArTPC and 3D imaging
• Liquid Argon Time Projection Chamber – LArTPC
• A 3D detector! (both tracks and energy)
• The same old problem: the Monte Carlo model is very time-consuming
• Hence the use of Machine Learning

LArTPC and 3D imaging
• The DeepLearnPhysics Collaboration (I'm not associated with it)
• But the dataset is open to the public

LArTPC – semantic segmentation
• The input is a 3D readout (192 x 192 x 192) of a single float value per voxel (e.g. 1.3)
• The output is 3D (same shape) but of four categorical values per voxel (e.g. [0, 0, 1, 0])
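
In plain NumPy terms, the input/output contract looks roughly like this sketch (the 192-per-side shape and four classes follow the slide; everything else is illustrative):

```python
import numpy as np

N = 192  # voxels per side, as quoted on the slide

# Input: one float (deposited charge) per voxel, mostly zeros.
charge = np.zeros((N, N, N), dtype=np.float32)
charge[10, 20, 30] = 1.3  # e.g. a single hit voxel

# Output: one of four categories per voxel, one-hot encoded.
n_classes = 4
labels = np.zeros((N, N, N, n_classes), dtype=np.float32)
labels[10, 20, 30] = [0, 0, 1, 0]  # this voxel belongs to class 2
```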

LArTPC – state-of-the-art solution
• By Laura Dominé and Kazuhiro Terao
• The solution uses a sparse-convolution U-Net

LArTPC – U-Net
• Convolutions
• Deconvolutions
• Max pooling
• Copies propagated through the layers
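
For readers who haven't met U-Nets, here is a deliberately tiny PyTorch sketch of those four ingredients (dense convolutions, a single resolution level; the actual solution is sparse and much deeper, and all names here are my own):

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """One-level 3D U-Net: conv -> max pool -> conv -> deconv -> concat skip -> conv."""

    def __init__(self, in_ch=1, n_classes=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool3d(2)
        self.mid = nn.Sequential(nn.Conv3d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2)  # "deconvolution"
        self.dec = nn.Sequential(nn.Conv3d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv3d(16, n_classes, 1)  # per-voxel class scores

    def forward(self, x):
        skip = self.enc(x)                         # the copy propagated to the decoder
        y = self.mid(self.pool(skip))
        y = self.up(y)
        y = self.dec(torch.cat([y, skip], dim=1))  # skip connection
        return self.head(y)

logits = TinyUNet3D()(torch.zeros(1, 1, 32, 32, 32))
print(logits.shape)  # torch.Size([1, 4, 32, 32, 32])
```

The tensor kept from the encoder and concatenated into the decoder is the "copies propagated through the layers" arrow on the slide.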

LArTPC – sparse convolutions
• Think of them as very picky convolutions
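
A toy NumPy illustration of the "picky" idea, in the submanifold style used by libraries such as SparseConvNet or MinkowskiEngine (this sketch is mine, not the paper's code):

```python
import numpy as np

# Active (non-zero) voxels stored as coordinate -> value, instead of a dense 192^3 grid.
active = {(10, 20, 30): 1.3, (10, 20, 31): 0.7, (10, 21, 31): 2.1}

# A 3x3x3 kernel of random weights (one input/output channel for simplicity).
rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 3))

def sparse_conv(active, kernel):
    """Submanifold-style sparse convolution: outputs are computed only at
    already-active sites, and only active neighbours contribute."""
    out = {}
    for (x, y, z) in active:                      # picky: skip the empty voxels entirely
        acc = 0.0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = (x + dx, y + dy, z + dz)
                    if nb in active:              # picky: skip the empty neighbours too
                        acc += kernel[dx + 1, dy + 1, dz + 1] * active[nb]
        out[(x, y, z)] = acc
    return out

print(sparse_conv(active, kernel))
```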

LArTPC – state of the art
• Accuracy ~98%

LArTPC – My Idea
• Let's do something really crazy: use reinforcement learning to solve this problem
• The agent should move through non-zero voxels and categorise them
• Hopefully it will learn to behave like a particle!
• This is work in progress

What is Reinforcement Learning
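
In a nutshell: an agent repeatedly observes a state, takes an action, and receives a reward from the environment; learning means adjusting the action-selection policy to maximise the cumulative reward. A minimal toy loop, nothing LArTPC-specific (all names illustrative):

```python
import random

# The canonical RL loop on a toy 1-D environment: walk right to reach position 5.
pos, done, total = 0, False, 0.0
while not done:
    state = pos
    action = random.choice([-1, +1])     # a real agent is a learned policy, not a coin flip
    pos = max(0, pos + action)           # environment dynamics
    reward = 1.0 if pos == 5 else -0.01  # sparse goal reward, small per-step cost
    total += reward
    done = (pos == 5)
print(f"episode return: {total:.2f}")
```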

RL
(diagram: Source, Target, Result – ?)

RL – environment interaction
(diagram: a window – e.g. 3 x 3 x 3 – of the Source and the current Result at the agent's Position is fed to the Actor (a neural network), which outputs a Movement and a New Result)

RL – environment interaction
(diagram: as above, with the Target additionally used to compute a Reward for the Actor)
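
Expressed as hypothetical code, one interaction step of this setup could look like the sketch below; the window size follows the slide, but the function names, action encoding, and ±1 reward are my own illustration, not the work-in-progress implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
source = rng.random((192, 192, 192)).astype(np.float32)  # detector readout
target = rng.integers(0, 4, (192, 192, 192))             # true per-voxel classes
result = np.full((192, 192, 192), -1)                    # -1 = not yet categorised

def step(pos, actor):
    """One agent-environment interaction: observe a window, act, get a reward."""
    x, y, z = pos
    window = source[x - 1:x + 2, y - 1:y + 2, z - 1:z + 2]  # e.g. 3x3x3 observation
    move, label = actor(window, result[x, y, z])            # actor = neural network
    result[x, y, z] = label                                 # categorise current voxel
    reward = 1.0 if label == target[x, y, z] else -1.0      # compare against target
    return tuple(np.add(pos, move)), reward

# A stand-in "actor" that moves randomly and guesses a class.
dummy_actor = lambda window, cur: (rng.integers(-1, 2, 3), int(rng.integers(0, 4)))
pos, r = step((96, 96, 96), dummy_actor)
print(pos, r)
```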

RL – environment learning
(diagram: the Actor (neural network) learns by comparing its Result against the Target, given the Source)

RL
REWARD(Target, Result) = ??
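
Designing that function is the open problem. Below is one hedged candidate purely for illustration (per-voxel agreement minus a small step cost); the slides intentionally leave the real answer open:

```python
import numpy as np

def reward(target, result, step_cost=0.01):
    """One possible reward: reward correctly categorised voxels, penalise wrong ones,
    and charge a small cost that discourages wandering forever. Purely illustrative."""
    visited = result >= 0                    # voxels the agent has categorised so far
    correct = (result == target) & visited
    wrong = (~correct) & visited
    return correct.sum() - wrong.sum() - step_cost

t = np.array([2, 2, 0, 1])
r = np.array([2, -1, 0, 3])                  # one wrong label, one voxel unvisited
print(reward(t, r))                          # 2 correct - 1 wrong - step cost = 0.99
```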

Demo time! (If we still have some time)

RL
Pros:
• The underlying model should be able to learn how a single particle behaves, so we get a model of the particle, not of the detector itself
• It could be independent of the detector (and its size)
• We can run the model multiple times to improve the results
• In principle, we could generate examples with said model
• AFAIK, RL has not been applied to HEP problems … really cool

Cons:
• It's really, really difficult to find a good reward system…
• …and then a model that will be able to learn