Application of Data Mining and Machine Learning for Weather Forecasting: A Comparative Study
Nasimul Hasan (C 121046), Nayan Chandra Nath (C 121038)
Department of CSE, International Islamic University Chittagong

Outline
• Introduction
• Motivation and Goal
• Methodology
• Experiment Design
• Result Analysis
• Conclusion

Introduction
Weather has great significance for our agriculture, but forecasting it is difficult because of:
• Its nature as a deterministically chaotic system
• Lack of proper data
• Continuous change of climate

Problem
• The main challenge is to predict weather as accurately as possible
• Much work has already been done on this problem
• Weather patterns change with the seasons

Previous Work
• A. Mellit, A. Massi Pavan & M. Benghanem developed an SVM model that produced predictions of up to 99% accuracy for different models.
• Hall and Tony proposed a neural network model that uses input from the Eta model and upper-air soundings to produce the probability of precipitation (PoP) and quantitative precipitation forecasts (QPF) for the Dallas-Fort Worth, Texas area. With their model, over 70% of the PoP forecasts were either below 5% or above 95%.

Motivation & Goal
Motivation:
• SVR and ANN are powerful machine learning techniques for pattern recognition.
• Using different kinds of windowing functions for data preprocessing is a new idea.
• Combining windowing functions with support vector regression can yield a good model for time series prediction.
Goal:
• Propose a good machine learning model to predict rainfall and temperature.

Methodology
Support Vector Regression
• Support vector machine (SVM) is an artificial intelligence-based method developed from statistical learning theory.
• SVM has two main forms: classification (SVC) and regression (SVR).
• In SVM regression, the input is first mapped onto an m-dimensional feature space using some fixed (nonlinear) mapping, and a linear model is then constructed in this feature space.
• A margin of tolerance (epsilon) is set in the approximation: deviations smaller than epsilon are not penalized.
• This type of function is called the epsilon-insensitive loss function.
• Slack variables are used to cope with noise in the data and non-separability.
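To make the epsilon-insensitive loss concrete, here is a minimal sketch of the standard per-sample form, max(0, |y - f(x)| - epsilon). The function name and the example values are illustrative assumptions, not taken from the authors' implementation.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample epsilon-insensitive loss: max(0, |y - f(x)| - epsilon).

    Residuals inside the epsilon tube cost nothing; residuals outside it
    are penalized linearly by the amount they exceed epsilon.
    (Hypothetical helper for illustration only.)
    """
    return [max(0.0, abs(y - f) - epsilon) for y, f in zip(y_true, y_pred)]


# A true value of 10.0 and three candidate predictions (illustrative values).
losses = epsilon_insensitive_loss([10.0, 10.0, 10.0], [10.05, 10.5, 9.0], epsilon=0.1)
print([round(v, 6) for v in losses])  # [0.0, 0.4, 0.9]
```

The first prediction lies inside the tube and costs nothing, which is why SVR ignores small noise while still penalizing larger errors linearly rather than quadratically.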

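Methodology (cont.)
The regression problem of SVM can be expressed as the following optimization problem.

Minimize:
$$\frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)$$

Subject to:
$$\begin{cases} y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i \\ \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \dots, n \end{cases}$$

where $\phi$ is the fixed nonlinear mapping into the m-dimensional feature space, $w$ and $b$ define the linear model in that space, $\varepsilon$ is the margin of tolerance, $\xi_i, \xi_i^{*}$ are the slack variables, and $C > 0$ trades off the flatness of the model against deviations larger than $\varepsilon$.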
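Eliminating the slack variables turns the epsilon-SVR problem into the unconstrained objective 0.5·||w||² + C·Σ max(0, |y_i − f(x_i)| − ε). The sketch below minimizes that objective for a linear model (identity feature map, i.e. no kernel) by plain full-batch subgradient descent. The learning rate, epoch count, C, and toy data are assumptions for illustration; a real experiment would use a dedicated solver such as LIBSVM.

```python
def predict(w, b, x):
    """Linear model f(x) = <w, x> + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, C=10.0, epsilon=0.1, lr=0.001, epochs=3000):
    """Linear epsilon-SVR trained by full-batch subgradient descent.

    Minimizes 0.5 * ||w||^2 + C * sum_i max(0, |y_i - f(x_i)| - epsilon).
    Assumption: identity feature map (no kernel); hyperparameters are illustrative.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        grad_b = 0.0
        for x_i, y_i in zip(X, y):
            residual = y_i - predict(w, b, x_i)
            if residual > epsilon:       # above the tube: slack ξ_i is active
                sign = -1.0
            elif residual < -epsilon:    # below the tube: slack ξ_i* is active
                sign = 1.0
            else:                        # inside the tube: zero loss, zero gradient
                continue
            for j in range(n_features):
                grad_w[j] += C * sign * x_i[j]
            grad_b += C * sign
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Illustrative noiseless data y = 2x + 1 on [0, 1] (not the paper's dataset).
X = [[i / 10] for i in range(11)]
y = [2.0 * x[0] + 1.0 for x in X]
w, b = train_linear_svr(X, y)
print("w =", w, "b =", b)
print("f(0.5) =", predict(w, b, [0.5]))
```

Because epsilon-SVR prefers the flattest function whose residuals stay inside the tube, the learned slope comes out below the true value 2: for this data and C the exact optimum is w = 1.8, b = 1.1, which subgradient descent approaches only approximately.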

Methodology (cont.)
Artificial neural networks have their origins in efforts to find mathematical representations of information processing in biological systems [31]. Indeed, the term has been used very broadly to cover a wide range of different models, many of which have been the subject of exaggerated claims regarding their biological plausibility. From the perspective of practical applications of pattern recognition, however, biological realism would impose entirely unnecessary limitations.

The ANN Network
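As an illustration of how such a network computes its output and learns by backpropagation, here is a minimal sketch of a feed-forward network with one hidden layer. The layer size, learning rate, initialization, ReLU hidden units (suggested by the later "Rectifier" slide), and the toy target |x| are all assumptions, not the authors' configuration.

```python
import random


def relu(z):
    """Rectifier activation: max(0, z)."""
    return z if z > 0.0 else 0.0


class TinyMLP:
    """Hypothetical example: one hidden layer of ReLU units and a linear output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are reproducible
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return pre-activations z, hidden activations h, and the output y_hat."""
        z = [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(self.W1, self.b1)]
        h = [relu(zk) for zk in z]
        y_hat = sum(w * hk for w, hk in zip(self.w2, h)) + self.b2
        return z, h, y_hat

    def train_step(self, x, y, lr):
        """One stochastic gradient (backpropagation) step on 0.5 * (y_hat - y)^2."""
        z, h, y_hat = self.forward(x)
        err = y_hat - y  # derivative of the loss with respect to y_hat
        for k in range(len(h)):
            # Error signal at hidden unit k; the ReLU passes gradient only where z > 0.
            delta = err * self.w2[k] * (1.0 if z[k] > 0.0 else 0.0)
            self.w2[k] -= lr * err * h[k]
            for j in range(len(x)):
                self.W1[k][j] -= lr * delta * x[j]
            self.b1[k] -= lr * delta
        self.b2 -= lr * err


def mse(model, data):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((model.forward(x)[2] - y) ** 2 for x, y in data) / len(data)


# Toy target y = |x| on [-1, 1] (illustrative only; not the rainfall/temperature data).
data = [([i / 10], abs(i / 10)) for i in range(-10, 11)]
model = TinyMLP(n_inputs=1, n_hidden=8, seed=0)
before = mse(model, data)
for _ in range(500):
    for x, y in data:
        model.train_step(x, y, lr=0.05)
after = mse(model, data)
print(f"MSE before training: {before:.4f}, after: {after:.4f}")
```

For forecasting, the inputs x would be the windowed past values of rainfall or temperature and y the value to predict.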

Methodology (cont.)
Windowing operator:
• Transforms the time series data into a generic (cross-sectional) data set
• Converts the last row of a window within the time series into a label or target variable
• Parameters:
  - Horizon (h)
  - Window size
  - Step size
  - Training window width
  - Testing window width
• The cross-sectional values are then fed as inputs to a machine learning technique such as linear regression, a neural network, or a support vector machine.
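The windowing transformation can be sketched as follows, assuming that each row takes `window_size` consecutive values as features, that the label is the value `horizon` steps after the window's last element, and that consecutive windows start `step_size` values apart. The function name is hypothetical, and the training/testing window widths (used for sliding-window validation) are not modeled here.

```python
def make_windows(series, window_size, horizon=1, step_size=1):
    """Turn a univariate series into (features, label) rows.

    Assumption: features are `window_size` consecutive values and the label is
    the value `horizon` steps after the last feature value.
    """
    rows = []
    last_start = len(series) - window_size - horizon  # last start with a valid label
    for start in range(0, last_start + 1, step_size):
        features = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((features, label))
    return rows


rows = make_windows([1, 2, 3, 4, 5, 6], window_size=3, horizon=1)
print(rows)  # [([1, 2, 3], 4), ([2, 3, 4], 5), ([3, 4, 5], 6)]
```

Each resulting row is an ordinary feature vector with a target, so any regression learner, including SVR or an ANN, can consume it directly.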

Methodology (cont.)
Moving Average:
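The slide does not state which moving average was used. A minimal sketch, assuming a trailing simple moving average of window length k, MA_t = (x_t + x_(t-1) + ... + x_(t-k+1)) / k, computed with a running sum:

```python
def simple_moving_average(series, k):
    """Trailing simple moving average: MA_t = mean(series[t-k+1 .. t]).

    Assumption: simple (unweighted) moving average. The first k-1 positions
    have no full window and are returned as None.
    """
    if k <= 0:
        raise ValueError("window length k must be positive")
    out = []
    running = 0.0
    for t, value in enumerate(series):
        running += value
        if t >= k:
            running -= series[t - k]  # drop the value that has left the window
        out.append(running / k if t >= k - 1 else None)
    return out


ma = simple_moving_average([10, 20, 30, 40, 50], k=3)
print(ma)  # [None, None, 20.0, 30.0, 40.0]
```

The running sum keeps the cost at one addition and one subtraction per step instead of re-summing every window.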

Experiment Design
Data
• The experiment dataset was collected from the Meteorological Department, Bangladesh.
• Seven years of historical data (2008-2014) for Chittagong were collected.
• Six attributes were used in the experiment: Date, total, avg, max, min, and MA.

Experiment Design
Data Preprocessing
• The data were prepared for machine learning using:
  - Missing value replacement
  - An 80% / 20% split for training and testing
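The slide names the preprocessing steps but not how they were carried out. A minimal sketch, assuming missing values are replaced by the mean of the observed values and that the 80/20 split preserves time order (both are assumptions; the monthly values below are hypothetical):

```python
def replace_missing_with_mean(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def chronological_split(rows, train_fraction=0.8):
    """First 80% of rows (in time order) for training, the remaining 20% for testing.

    Assumption: no shuffling, so no future observation leaks into training.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


monthly_rain = [12.0, None, 30.0, 45.0, None, 60.0, 20.0, 10.0, 5.0, 15.0]  # hypothetical
filled = replace_missing_with_mean(monthly_rain)
train, test = chronological_split(filled)
print(len(train), len(test))  # 8 2
```

Keeping time order matters for forecasting: a random split would let the model train on months that come after the ones it is tested on.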

Experiment Design
Rectifier
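Assuming "Rectifier" here refers to the rectified linear unit (ReLU) activation used in the ANN's hidden layer, the function and the derivative used during backpropagation are:

$$f(z) = \max(0, z), \qquad f'(z) = \begin{cases} 1, & z > 0 \\ 0, & z < 0 \end{cases}$$

The derivative is undefined at $z = 0$; implementations conventionally use 0 there.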

Experiment Flowchart
(Figure: training and test stages)

Experiment Result
Evaluation technique:
Here, the quantities in the error measure are:
• the original value of a point for a given time period t
• n = the total number of fitted points
• the fitted forecast value for the time period t
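The slide's formula is not preserved in the text. Two standard error measures built from exactly these quantities are the root mean squared error (RMSE) and the mean absolute percentage error (MAPE); which one the authors used is not stated, so both are sketched here, with y_t as the original value and f_t as the fitted forecast (assumed symbol names).

```python
import math


def rmse(actual, fitted):
    """Root mean squared error over n fitted points: sqrt(sum((y_t - f_t)^2) / n)."""
    n = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, fitted)) / n)


def mape(actual, fitted):
    """Mean absolute percentage error in percent: (100 / n) * sum(|(y_t - f_t) / y_t|).

    Undefined when any actual value y_t is zero (e.g. a month with no rainfall).
    """
    n = len(actual)
    return 100.0 / n * sum(abs((y - f) / y) for y, f in zip(actual, fitted))


actual = [100.0, 200.0, 300.0]  # hypothetical observations
fitted = [110.0, 190.0, 300.0]  # hypothetical forecasts
print(rmse(actual, fitted))     # sqrt((100 + 100 + 0) / 3) ≈ 8.165
print(mape(actual, fitted))     # (10% + 5% + 0%) / 3 ≈ 5.0
```

RMSE is in the units of the data (mm or °C), while MAPE is a percentage, which makes it easier to compare rainfall and temperature results but fragile for near-zero rainfall values.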

Correlation between features using the Pearson correlation matrix
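A Pearson correlation matrix holds r = cov(a, b) / (σ_a σ_b) for every pair of features. The sketch below computes it in plain Python over a dict of feature columns; the feature values are hypothetical, and the attribute names are borrowed from the dataset description only for illustration. (Python 3.10+ also provides `statistics.correlation` for a single pair.)

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict {feature_name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical monthly temperature values for the max, min, and avg attributes.
features = {
    "max": [30.0, 32.0, 34.0, 33.0, 31.0],
    "min": [18.0, 21.0, 24.0, 23.0, 20.0],
    "avg": [24.0, 26.5, 29.0, 28.0, 25.5],
}
matrix = correlation_matrix(features)
for name in matrix:
    print(name, [round(matrix[name][other], 3) for other in matrix])
```

Strongly correlated inputs (here avg is exactly the mean of max and min) carry largely redundant information, which is one reason to inspect this matrix before choosing model inputs.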

Here, the quantities are:
• the actual observations time series
• the estimated (forecasted) time series
• SAE = the sum of the absolute errors (or deviations)
• N = the number of non-missing data points
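Given SAE and N as defined here, the mean absolute error is MAE = SAE / N. A minimal sketch, assuming missing data points are represented as None and that a pair is skipped when either value is missing (the function name and values are hypothetical):

```python
def mean_absolute_error(actual, forecast):
    """MAE = SAE / N: SAE sums |actual - forecast|, N counts non-missing pairs.

    Assumption: a missing value is None, and a pair with any missing value is skipped.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a is not None and f is not None]
    if not pairs:
        raise ValueError("no non-missing data points")
    sae = sum(abs(a - f) for a, f in pairs)  # sum of absolute errors
    return sae / len(pairs)                   # divide by N, the non-missing count


result = mean_absolute_error([10.0, None, 30.0, 40.0], [12.0, 25.0, 27.0, 40.0])
print(result)  # (2 + 3 + 0) / 3 ≈ 1.667
```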

SVM produced its best result with about 98.65% accuracy for rainfall and 95% for temperature prediction.
ANN produced its best result with about 97.45% accuracy for rainfall and 96.7% for temperature prediction.

Results for different models using SVR

ANN Monthly Rainfall Horizon 1

ANN Monthly Temperature Horizon 1

SVM Monthly Rainfall Horizon 1

SVM Monthly Temperature Horizon 1

Conclusion
Discussion:
• Different windowing functions can produce different prediction results.
Limitations & Future Work:
• Only moving average and windowing operators were used.
• Data from only one station were used in the experiments.
• The results were not compared with other machine learning techniques.
• In future, we will apply our model to other rainfall datasets and compare our results with other types of data mining techniques.

Thank You