Time Series Forecasting with Recurrent Neural Networks: NN3 Competition

Time Series Forecasting with Recurrent Neural Networks
NN3 Competition
Mahmoud Abou-Nasr
Research & Advanced Engineering, Ford Motor Company
Email: mabounas@ford.com

Software
• NTOOL software package, developed at Ford, used for training the networks.

RMLP Architecture
• Typically 1-4R-2-1L:
  – One input node
  – Four fully recurrent nonlinear (bipolar sigmoid) nodes in the first hidden layer
  – Two nonlinear nodes in the second hidden layer
  – One linear output node
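
A minimal NumPy sketch of what a 1-4R-2-1L network could look like is given below. The weight names, the initialization, and the use of tanh as the bipolar sigmoid are assumptions made for illustration; this is not the NTOOL implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class RMLP:
    """Illustrative 1-4R-2-1L recurrent multilayer perceptron (sketch only)."""

    def __init__(self):
        self.W_in  = rng.normal(scale=0.1, size=(4, 1))   # input -> recurrent hidden layer
        self.W_rec = rng.normal(scale=0.1, size=(4, 4))   # fully recurrent connections
        self.b1    = np.zeros(4)
        self.W_h2  = rng.normal(scale=0.1, size=(2, 4))   # recurrent layer -> second hidden layer
        self.b2    = np.zeros(2)
        self.W_out = rng.normal(scale=0.1, size=(1, 2))   # second hidden layer -> linear output
        self.b3    = np.zeros(1)
        self.h     = np.zeros(4)                          # recurrent state

    def reset(self):
        self.h = np.zeros(4)

    def step(self, x):
        """Advance one time step on scalar input x; return the scalar prediction."""
        self.h = np.tanh(self.W_in @ np.array([x]) + self.W_rec @ self.h + self.b1)
        z = np.tanh(self.W_h2 @ self.h + self.b2)
        return float((self.W_out @ z + self.b3)[0])
```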

Training Details
• EKF multi-stream training, with typically 25 streams.
• Each trajectory/stream is of length P, where P is no longer than half the number of points N in the series.
• The input for training the network is taken from the actual series for the first P-M points, and from the network output for the last M points (M is the number of points to be predicted).
  – Switching logic is internal to the network.
• Typical training time: about 2 minutes per network.
• MSE error function.
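
The sketch below illustrates only the input-switching scheme for one stream, assuming a network object with the reset/step interface of the RMLP sketch above; the EKF weight update itself is deliberately omitted.

```python
import numpy as np

def run_stream(net, segment, M):
    """Run one training stream of length P through the network.

    Inputs come from the actual series for the first P-M steps and from the
    network's previous output for the last M steps, mimicking the switching
    logic described in the slide. Targets and the EKF update are not shown.
    """
    P = len(segment)
    net.reset()
    preds, prev = [], 0.0
    for t in range(P):
        x = segment[t] if t < P - M else prev   # switch to the network's own output
        prev = net.step(x)
        preds.append(prev)
    return np.array(preds)
```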

• P varies depending on the length of the series.
  – For a short series: P is 35, or about 0.5N.
  – For a long series: P is 60, or about 0.4N.
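
One possible reading of this rule as code; the threshold separating "short" from "long" series is an assumption, since the slides only give the two example values and the P ≤ N/2 constraint.

```python
def choose_P(N, short_threshold=100):
    """Pick stream length P from series length N (assumed rule).

    The slides state P = 35 (about 0.5N) for short series and P = 60 (about
    0.4N) for long series, with P never longer than N/2; the threshold used
    here is a guess for illustration.
    """
    P = 35 if N < short_threshold else 60
    return min(P, N // 2)
```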

[Diagram: the RMLP receives its input from the actual series for the first P-M training steps, and from its own output for the last M training steps.]

[Diagram: typical training stream for a long series, with N = 143, P = 60, M = 18, and P-M = 42 points taken from the actual series.]
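
A hypothetical way to cut 25 length-P streams out of a 143-point series is sketched below; the slides do not say how the stream start points were actually chosen (random, evenly spaced, overlapping or not), so uniformly spaced overlapping starts are assumed purely for illustration.

```python
import numpy as np

def make_streams(series, P, n_streams=25):
    """Cut n_streams trajectories of length P out of one series (assumed layout)."""
    N = len(series)
    starts = np.linspace(0, N - P, n_streams).astype(int)   # evenly spaced, overlapping
    return [series[s:s + P] for s in starts]

# Example with the numbers from the slide: N = 143, P = 60, M = 18, P-M = 42.
series = np.sin(0.1 * np.arange(143))      # stand-in for an actual NN3 series
streams = make_streams(series, P=60)
print(len(streams), len(streams[0]))       # -> 25 60
```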

Ensemble of Networks
• A maximum of ten networks of the same architecture were used to form an ensemble.
• The trained networks were embedded in one architecture, with an output-averaging node.
• The networks used in the ensemble were the only networks trained; they were not selected from a larger universe of trained networks.
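
A minimal sketch of the output-averaging node, assuming member networks with the reset/step interface used in the RMLP sketch above:

```python
import numpy as np

class Ensemble:
    """Average the step-wise outputs of several trained networks.

    The member networks are stepped in lockstep on the same input and their
    predictions are averaged by a single output node, as described in the slide.
    """
    def __init__(self, members):
        self.members = members

    def reset(self):
        for net in self.members:
            net.reset()

    def step(self, x):
        return float(np.mean([net.step(x) for net in self.members]))

# e.g. ensemble = Ensemble([RMLP() for _ in range(10)])   # up to ten members
```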