Principal Component Analysis (PCA)

The main purpose of PCA is to reduce a set of variables to a smaller number of dimensions (factors) and to group variables and clusters of observations that share similar characteristics with respect to these factors. Each principal component (PC) is interpreted by finding which variables are most strongly correlated with it (i.e., which variable-factor correlations are large in absolute value). The PCA results are interpreted below for three scenarios.

Scenario 1

The first scenario comprises 14 variables, as shown in Table 5.14. Because the analysis is based on the correlation matrix, the sum of all eigenvalues equals the number of variables, 14. The number of factors was chosen in accordance with Kaiser’s criterion and Cattell’s scree test. The scree plot in Figure 5.7 indicates that the continuous drop in eigenvalues levels off at factor 3. Therefore, three factors were retained for analysis, with a cumulative variance of 77.47%. The remaining eigenvalues each account for less than 25% of the total variance.

The first principal component (first factor) corresponds to the largest eigenvalue (7.04) and accounts for approximately 50.32% of the total variance. It is most strongly correlated with water consumption, number of females, household size, the first and third age categories (AG1 and AG3), three education levels (primary, high school and university), monthly income and number of cars (negative correlation). Because these correlations are negative, the first principal component increases as the ten variables decrease. This suggests that the ten variables vary together: if one increases, the others tend to increase as well. The results show that water consumption is very strongly correlated with the first factor.

The second factor, corresponding to the second eigenvalue (2.37), accounts for 16.88% of the total variance. It is correlated with the number of males, the second age category (AG2) and the medium-school education level (positive correlation), so the second PC increases as these three variables increase. The third factor, corresponding to the eigenvalue 1.44, accounts for 10.27% of the total variance. It is correlated with the fourth age category (positive correlation), so the third PC increases as this variable increases.

To complete the analysis, the correlation circle (or variables chart) shows the correlations between the components and the initial variables. Figure 5.8 displays the variable coordinates for the three factors.
Because the current analysis is based on correlations, the largest factor coordinate (variable-factor correlation) that can occur is 1, and the sum of all squared factor coordinates for a variable (its squared correlations with all factors) cannot exceed 1. Based on the magnitude of the factor coordinates for the variables in the analysis, the first factor can be labeled “household water consumption determinants”, the second factor “household male teenagers” and the third factor “old residents”. Figure 5.9 shows the factor coordinates for all houses.
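The properties used in this scenario (eigenvalues summing to the number of variables, Kaiser’s criterion, and factor coordinates bounded by 1) can be illustrated with a minimal sketch of a correlation-matrix PCA. The data here are synthetic and purely hypothetical: they are not the household survey data, and the function name `correlation_pca` is an illustrative choice, not the software used in the study.

```python
import numpy as np

def correlation_pca(X):
    """PCA on the correlation matrix of X (rows = observations, columns = variables)."""
    R = np.corrcoef(X, rowvar=False)                # correlation matrix of the variables
    eigenvalues, eigenvectors = np.linalg.eigh(R)   # eigh: R is symmetric
    order = np.argsort(eigenvalues)[::-1]           # sort from largest to smallest
    eigenvalues = np.clip(eigenvalues[order], 0.0, None)  # remove tiny negative rounding noise
    eigenvectors = eigenvectors[:, order]
    # Factor coordinates of the variables = variable-factor correlations
    loadings = eigenvectors * np.sqrt(eigenvalues)
    return eigenvalues, loadings

# Hypothetical synthetic data: 200 observations of 14 variables driven by 3 latent factors
rng = np.random.default_rng(0)
n_obs, n_vars = 200, 14
latent = rng.normal(size=(n_obs, 3))
mixing = rng.normal(size=(3, n_vars))
X = latent @ mixing + 0.5 * rng.normal(size=(n_obs, n_vars))

eigenvalues, loadings = correlation_pca(X)
explained_pct = 100 * eigenvalues / eigenvalues.sum()   # % of total variance per factor
cumulative_pct = np.cumsum(explained_pct)
n_kaiser = int((eigenvalues > 1.0).sum())               # Kaiser: keep eigenvalues above 1

print("sum of eigenvalues:", round(float(eigenvalues.sum()), 6))
print("factors retained by Kaiser's criterion:", n_kaiser)
print("largest absolute factor coordinate:", float(np.abs(loadings).max()))
```

For any data set, the eigenvalues of a correlation matrix sum to the number of variables, and the squared factor coordinates of each variable sum to 1 when all factors are kept, so they cannot exceed 1 for a subset of factors.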

Scenario 2

The scree plot in Figure 5.10 indicates that the continuous drop in eigenvalues levels off at factor 2. Therefore, two factors were selected for analysis, with a cumulative variance of 86.79% (Table 5.16).

Table 5.16: Eigenvalues of correlation matrix, and related statistics (Scenario 2)

| Value number | Eigenvalue | % Total variance | Cumulative eigenvalue | Cumulative % |
|---|---|---|---|---|
| 1 | 3.841056 | 64.01760 | 3.841056 | 64.0176 |
| 2 | 1.366578 | 22.77630 | 5.207634 | 86.7939 |
| 3 | 0.648320 | 10.80533 | 5.855954 | 97.5992 |
| 4 | 0.098915 | 1.64858 | 5.954869 | 99.2478 |
| 5 | 0.045131 | 0.75218 | 6.000000 | 100.0000 |

Fig 5.10: Eigenvalues of correlation matrix (Scenario 2)

Table 5.17 and Figure 5.11 present the variances of the factors and their loadings on the variables. The first factor corresponds to the largest eigenvalue (3.84) and accounts for 64.02% of the total variance. It is most strongly correlated with water consumption, the total area of the house, building area and number of rooms (positive correlation). The first factor is labeled “household water consumption determinants”. The second factor corresponds to the eigenvalue 1.37 and accounts for 22.78% of the total variance. It is correlated with garden area and frequency of garden watering (positive correlation) and can be labeled “garden area”. Figure 5.12 presents the factor coordinates for all houses.
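As a quick check on the statistics of Table 5.16, the percentage columns can be recomputed from the listed eigenvalues alone. The sketch below assumes only that each factor's share is its eigenvalue divided by the sum of the eigenvalues; the variable names are illustrative.

```python
# Eigenvalues as listed in Table 5.16 (Scenario 2)
eigenvalues = [3.841056, 1.366578, 0.648320, 0.098915, 0.045131]

total = sum(eigenvalues)                      # equals the cumulative eigenvalue in the last row
pct = [100 * ev / total for ev in eigenvalues]

# Running sums give the "Cumulative eigenvalue" and "Cumulative %" columns
cum_eigen, cum_pct = [], []
running = 0.0
for ev in eigenvalues:
    running += ev
    cum_eigen.append(running)
    cum_pct.append(100 * running / total)

# Kaiser's criterion: retain factors whose eigenvalue exceeds 1
n_kaiser = sum(ev > 1.0 for ev in eigenvalues)

for i, ev in enumerate(eigenvalues):
    print(f"{i + 1}  {ev:.6f}  {pct[i]:.5f}  {cum_eigen[i]:.6f}  {cum_pct[i]:.4f}")
print("factors with eigenvalue > 1:", n_kaiser)
```

Under this calculation, two eigenvalues exceed 1, the cumulative share after two factors is about 86.79%, and the share after three factors is about 97.60%.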

Scenario 3

From the eigenvalues of the correlation matrix for Scenario 3 (Table 5.18) and the scree plot (Figure 5.13), three factors were chosen for analysis, with a cumulative variance of 60.90%.
