DEPARTMENT OF PLANT BIOLOGY AND BIOTECHNOLOGY
FACULTY OF LIFE SCIENCES
UNIVERSITY OF BENIN CITY
(SEMINAR)
TOPIC: PRINCIPAL COMPONENT ANALYSIS (PCA) AS AN IDEAL TOOL FOR ANALYSING ON-FARM RESEARCH DATA
SPEAKER: Odoligie IMARHIAGBE
DATE: 27th FEBRUARY, 2013
PRINCIPAL COMPONENT ANALYSIS (PCA) AS AN IDEAL TOOL FOR ANALYSING ON-FARM RESEARCH DATA by Odoligie IMARHIAGBE is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
INTRODUCTION
- On-farm research trials have rapidly gained popularity in the past few years, with due consideration being given to the knowledge, problems and priorities of farming ‘families’. Researchers such as plant breeders and agronomists, who have been trained in techniques of on-station research, are now under pressure to move on-farm (Gomez and Gomez, 1984).
- During data collection in on-farm research, one encounters datasets comprising large numbers of variables. One of the key steps in data analysis is finding ways to reduce dimensionality without sacrificing accuracy. Principal component analysis (PCA) is a multivariate technique that can be used to compress higher-dimensional data sets to lower-dimensional ones (Davis, 1986).
ON-FARM RESEARCH
- On-farm research is a set of procedures for adaptive research whose purpose is to develop recommendations for representative groups of farmers.
- In on-farm research, farmers participate in identifying problems and their priorities, managing experiments and evaluating results (SSC, 1998).
- The objective of on-farm research is to identify existing inputs or practices that might help to solve major problems of many farmers in a defined study area (Wuest, 1999).
Plate 1: On-farm tobacco test to evaluate and compare varietal resistance to the bacterial disease Granville wilt. Photo credit: Craven County Center, (2012).
REASON FOR ON-FARM RESEARCH
- On-farm research gives high-quality results regarding the suitability of the investigated technological innovations under small, medium and large scale farming conditions (Ashby, 1986).
- Farmers are the adopters, the adapters, and often the innovators of new farming techniques. It would therefore be unwise to undertake farm-based research without involving farmers in the research process as much as possible (Wuest et al., 1991).
- Using accepted methods of on-farm testing, farmers can achieve experimental precision comparable to that of intensive university research trials (Spencer, 1993).
STAGES OF ON-FARM RESEARCH
- Diagnosis
- Planning
- Experimentation
- Assessment / Evaluation of results
- Recommendation and diffusion
TYPES OF ON-FARM RESEARCH
- Researcher-designed and Managed Trials
- Researcher-designed and Farmer-managed Trials
- Farmer-designed and Managed Trials
Table 1: Three Types of On-farm Research
OBJECTIVES | TRIAL TYPE | TRIAL DESIGN | TRIAL MANAGEMENT
Biophysical feasibility | 1 | Researcher-led | Researcher-led
Profitability; farmers’ assessment of prototype | 2 | Researcher-led | Farmer-led
Acceptability: farmers’ own innovations | 3 | Farmer-led | Farmer-led
Source: Rudebjer, (2001)
MEASUREMENTS OF ON-FARM RESEARCH DATA
In on-farm trials we can distinguish between three types of measurement:
- Measurements of the type that are taken in on-station trials. These are usually yield components, time to flowering, milk yields, disease scores, and so on.
- Measurements of concomitant variables. These can be at the plot level, for example problems of waterlogging, or at the farm level, for example rainfall or soil type. Some variables, such as dates of sowing and weeding and other management practices, may be at either level.
- Measurements of the farmers’ opinions. These are obtained from informal discussions or questionnaires (SSC, 1998).
ANALYSIS OF ON-FARM DATA
Analysis of on-farm data can be approached in three ways:
(I) Analysis of questionnaire-type data: this set of data results from interviews and other observations. The information is normally at the farmer level, though some questions can relate to particular plots.
(II) Analysis of yield-type data: this information is mainly at the plot level, though with some observations at the farm level.
(III) Combination of (I) and (II) above: using the results from interviews to understand the variation in yield-type data.
STATISTICAL ANALYSIS
Statistical analysis refers to a collection of methods used to process large amounts of data. Source: SSC, (1998).
Principal Component Analysis
- Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (Davis, 1986). The number of principal components is less than or equal to the number of original variables.
- This transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to the preceding components (Cattel, 1966).
- PCA was invented in 1901 by Karl Pearson. Now it is mostly used as a tool in exploratory data analysis and for making predictive models (Davis, 1986).
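The definition above can be illustrated with a minimal NumPy sketch. The data are synthetic and hypothetical (they are not from the seminar), and eigendecomposition of the covariance matrix is only one of several ways to carry out the orthogonal transformation. The sketch checks the two properties stated above: the components are uncorrelated, and their variances decrease from the first component onwards.

```python
import numpy as np

# Hypothetical data: 50 observations of 3 variables, two of them correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(50, 1)),
    2.0 * base + 0.5 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])

# Centre the data, then eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order

# Rank components from largest to smallest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Orthogonal transformation: project the observations onto the eigenvectors.
scores = Xc @ eigenvectors

# The covariance matrix of the scores is diagonal (uncorrelated components),
# and its diagonal holds the eigenvalues in decreasing order.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))
```

The off-diagonal entries printed are zero up to rounding, which is what "linearly uncorrelated" means in practice.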
APPLICATIONS OF PRINCIPAL COMPONENT ANALYSIS
- PCA as a multivariate technique can be used in analyzing relationships among several quantitative variables.
- PCA can be used to analyze variables that are measured in different units.
- PCA provides information about the relative importance of each variable in characterizing the objects.
- PCA is used to reduce the number of variables in the data set while retaining most of the original variability in the data. A small number of these new variables will usually be sufficient to describe the observational objects (Rencher, 2002).
WAYS OF ANALYSING DATA
PCA can be done in two ways:
- Eigenanalysis of the covariance matrix: here the data are analysed without standardizing them.
- Eigenanalysis of the correlation matrix: here the data are standardized. When the variables are measured in different units, the correlation matrix must be used.
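A minimal sketch of why the choice matters, assuming two hypothetical soil-like variables on very different scales (the names `ph` and `sand` and their values are invented for illustration). It shows that the correlation matrix is the covariance matrix of the standardized data, and that without standardization the variable with the larger numerical spread dominates the first component.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
# Hypothetical variables in different units (illustrative values only).
ph = 6.0 + 0.3 * rng.normal(size=n)        # pH units, small spread
sand = 600.0 + 80.0 * rng.normal(size=n)   # g/kg, large spread
X = np.column_stack([ph, sand])


def leading_share(matrix):
    """Share of total variance carried by the first principal component."""
    eigenvalues = np.linalg.eigvalsh(matrix)  # ascending order
    return eigenvalues[-1] / eigenvalues.sum()


cov = np.cov(X, rowvar=False)        # eigenanalysis on unstandardized data
corr = np.corrcoef(X, rowvar=False)  # eigenanalysis on standardized data

# The correlation matrix equals the covariance matrix of the z-scores.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_z = np.cov(Z, rowvar=False)

print("PC1 share, covariance matrix: ", round(leading_share(cov), 4))
print("PC1 share, correlation matrix:", round(leading_share(corr), 4))
```

With the covariance matrix, almost all of the variance is attributed to the first component simply because `sand` is recorded in larger numbers; the correlation matrix removes that unit effect.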
THE PCA TECHNIQUES - STAGE 1
- The first stage in rotating the data cloud is to standardize the data.
- The standardized axes are labeled S1, S2 and S3.
Source: Hotelling, (1993)
THE PCA TECHNIQUES - STAGE 2
- PCA chooses the first PCA axis as the line that goes through the centroid and also through the maximum variation in the data.
- The second PCA axis must also go through the centroid, go through the maximum remaining variation in the data, and be “orthogonal” to PCA Axis 1.
Source: Hotelling, (1993)
THE PCA TECHNIQUES - STAGE 3
- The coordinate frame is rotated so that PCA Axis 1 lies on the X-axis and PCA Axis 2 lies on the Y-axis; the rotated data are then plotted as a scatter diagram.
Source: Hotelling, (1993)
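The three stages can be sketched in a few lines of NumPy. The data here are hypothetical (30 synthetic observations on three variables, not the data behind the slides), and the eigenvector approach is an assumption about how the axes are found. The sketch checks that the axes pass through the centroid of the standardized data, that they are mutually orthogonal, and that the rotation only changes the coordinate frame, not the shape of the data cloud.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical raw data: the first two variables are correlated by construction.
mixing = np.array([[2.0, 1.8, 0.0],
                   [0.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0]])
raw = rng.normal(size=(30, 3)) @ mixing

# Stage 1: standardize each variable (zero mean, unit variance) -> S1, S2, S3.
S = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
centroid = S.mean(axis=0)  # the origin, up to rounding

# Stage 2: principal axes are the eigenvectors of the correlation matrix,
# ranked by the variation they capture.
eigenvalues, axes = np.linalg.eigh(np.corrcoef(S, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, axes = eigenvalues[order], axes[:, order]

# Stage 3: rotate the coordinate frame so PCA Axis 1 becomes the first
# coordinate, PCA Axis 2 the second, and so on.
rotated = S @ axes
rotated_variance = rotated.var(axis=0, ddof=1)

print("centroid of standardized data:", np.round(centroid, 10))
print("variance along each rotated axis:", np.round(rotated_variance, 4))
```

Plotting the first two columns of `rotated` against each other gives the kind of scatter diagram described in Stage 3.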
HOW DO WE DETERMINE HOW MANY PRINCIPAL AXES ARE WORTH INTERPRETING?
- As many principal components can be computed as there are original variables; however, only the most important ones are relevant for further analysis. These can be found by checking the eigenvalues.
- Every axis has an eigenvalue (also called a latent root) associated with it, and the axes are ranked from the highest eigenvalue to the lowest.
PCA Axis 1: 63%
PCA Axis 2: 33%
PCA Axis 3: 4%
Source: Hotelling, (1993)
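The ranking of eigenvalues and their conversion to percentages of variance can be sketched as follows. The correlation matrix `R` is hypothetical and chosen only for illustration; it is not the matrix behind the percentages quoted above.

```python
import numpy as np

# Hypothetical correlation matrix for three standardized variables.
R = np.array([
    [1.0, 0.9, -0.1],
    [0.9, 1.0, -0.1],
    [-0.1, -0.1, 1.0],
])

# One eigenvalue (latent root) per axis, ranked from highest to lowest.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# For a correlation matrix the eigenvalues sum to the number of variables,
# so each eigenvalue divided by that sum is the share of variance on its axis.
percent = 100.0 * eigenvalues / eigenvalues.sum()

for axis, (value, pct) in enumerate(zip(eigenvalues, percent), start=1):
    print(f"PCA Axis {axis}: eigenvalue = {value:.3f}, variance explained = {pct:.1f}%")
```

Axes whose share is small contribute little to the description of the data and are the usual candidates for being dropped.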
HOW DO WE KNOW WHAT EACH VARIABLE CONTRIBUTES TO THE VARIOUS PRINCIPAL COMPONENT AXES?
- We look at the component loadings (or "factor loadings"), which show to what degree each original variable enters into each component. These component loadings are important when interpreting the “meaning” of the components.
Table 2: Component Loadings
Variables | PCA 1 | PCA 2 | PCA 3
S1 | 0.9688 | 0.0664 | -0.2387
S2 | 0.9701 | 0.0408 | 0.2391
S3 | -0.1045 | 0.9945 | 0.0061
Source: Hotelling, (1993)
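As a worked check on Table 2, the sketch below recovers the share of variance on each axis from the loadings. It assumes the loadings are correlations between the standardized variables and the components, so that the sum of squared loadings in a column is that component's eigenvalue; this interpretation is an assumption, although the result agrees with the 63%, 33% and 4% quoted for the three PCA axes.

```python
import numpy as np

# Component loadings copied from Table 2 (rows: S1, S2, S3; columns: PCA 1-3).
loadings = np.array([
    [0.9688, 0.0664, -0.2387],
    [0.9701, 0.0408, 0.2391],
    [-0.1045, 0.9945, 0.0061],
])

# Assumption: loadings are variable-component correlations, so the column
# sums of squares are the eigenvalues of the correlation matrix.
eigenvalues = (loadings ** 2).sum(axis=0)

# Each standardized variable has variance 1, so total variance = 3 variables.
percent = 100.0 * eigenvalues / loadings.shape[0]

# Row sums of squares: how much of each variable the three axes reproduce.
communalities = (loadings ** 2).sum(axis=1)

print("eigenvalues:        ", np.round(eigenvalues, 3))
print("percent of variance:", np.round(percent, 1))
print("communalities:      ", np.round(communalities, 3))
```

Reading the table this way, S1 and S2 load almost entirely on PCA 1 and S3 almost entirely on PCA 2, which is how the loadings give the axes their "meaning".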
Using PCA to Analyse On-farm Research Data
- Chemical and textural properties were measured on soil from 18 farmers’ fields in Yamrat, Bauchi State, Nigeria (Table 3). The table has 18 observational units (fields), each with 11 measured variables (soil characteristics). The questions which arise are:
- Which soil properties are correlated with one another?
- Which soil properties contribute most to the overall variance in soil characteristics?
- How can the number of variables be reduced without losing too much information?
Source: Mutasaers et al., (1997).
Table 3: Soil Characteristics of 18 Farmers’ Fields Source: Mutasaers et al., (1997).
Table 4: Correlation Coefficients of Soil Characteristics of 18 Farmers' Fields Source: Mutasaers et al., (1997).
Table 5: Eigenvalues of the Correlation Matrix and the Proportion and Total of Variance Explained by the Five Largest Principal Components Source: Mutasaers et al., (1997).
Table 6: Principal Components with their Percentage Variability
Principal Component | Percentage Variability (%)
PRIN1 | 43.4
PRIN2 | 20.2
PRIN3 | 12.9
PRIN4 | 7.8
PRIN5 | 7.1
Source: Mutasaers et al., (1997).
Table 7: Eigenvectors of Principal Components Representing a Linear Combination of the Original Variables Source: Mutasaers et al., (1997).
Table 8: Standardized Principal Component Scores used as Three New Variables Representing 76.6% of the Variance from the Original 11 Soil Characteristics. Source: Mutasaers et al., (1997).
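The choice of three new variables can be retraced from the percentages in Table 6. The sketch below accumulates those percentages and, because Table 5 refers to the correlation matrix of 11 variables, converts them back to approximate eigenvalues. The "eigenvalue greater than 1" cut-off used here is a common rule of thumb and is an assumption; the source does not say which rule was applied.

```python
import numpy as np

# Percentage variability copied from Table 6 (PRIN1 to PRIN5).
percent = np.array([43.4, 20.2, 12.9, 7.8, 7.1])
n_variables = 11  # original soil characteristics

# Running total of variance explained as components are added.
cumulative = np.cumsum(percent)

# For a correlation matrix, total variance equals the number of variables,
# so each percentage can be converted back to an approximate eigenvalue.
eigenvalues = percent / 100.0 * n_variables

# Rule of thumb (an assumption, not stated in the source): keep components
# whose eigenvalue exceeds 1, i.e. that explain more than one variable's worth.
retained = int((eigenvalues > 1.0).sum())

for i, (pct, cum, ev) in enumerate(zip(percent, cumulative, eigenvalues), start=1):
    print(f"PRIN{i}: {pct:5.1f}%  cumulative {cum:5.1f}%  eigenvalue ~ {ev:.2f}")
print("components retained:", retained)
```

Under this rule the first three components are kept. The rounded table values add up to 76.5%, close to the 76.6% quoted in the caption of Table 8; the small gap presumably reflects rounding in Table 6.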
Some Statistical Software Used for PCA
- GENSTAT: General Statistics
- AGSTATS: Agricultural Statistics
- PAST: Paleontological Statistics
Conclusion
- The correct design of experimental studies, the selection of the appropriate statistical analysis of data and the efficient presentation of results are key to the good conduct and communication of science.
- On-farm research has been shown to be site-specific, broader and more easily adopted than on-station research. During data collection in on-farm research, one encounters situations where there are large numbers of variables, and a good statistical analysis is needed to draw valid conclusions from such research.
- Principal component analysis is a powerful tool for reducing a number of observed variables to a smaller number of variables that account for most of the variance in the data set, and it allows the use of variables that are not measured in the same units. This makes it a good tool for analysing on-farm research data.
THANKS