International Journal of Advanced and Applied Sciences

Int. j. adv. appl. sci.

EISSN: 2313-3724

Print ISSN: 2313-626X

Volume 3, Issue 8 (August 2016), Pages: 7-13


Title: Comparison of PLSR and PCR techniques in terms of dimension reduction: an application on internal migration data in Turkey

Authors:  Hatice Samkar *, Gamze Guven

Affiliations:

Department of Statistics, Faculty of Arts and Sciences, Eskisehir Osmangazi University, Campus of Meselik, 26480 Eskisehir, Turkey

http://dx.doi.org/10.21833/ijaas.2016.08.002


Abstract:

Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR) are dimension reduction techniques used especially in the presence of multicollinearity. In this study, the two techniques are described and their performance in dimension reduction is compared, with the Root Mean Square Error of Cross Validation (RMSECV) as the comparison criterion. Both techniques are applied to internal migration data from Turkey, and PLSR is found to be superior to PCR in terms of dimension reduction.
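The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic collinear predictors rather than the migration data, the function names (`pcr_coef`, `pls_coef`, `rmsecv`) are hypothetical, predictors are centered but not scaled, PLSR is fitted with the single-response NIPALS algorithm, and 5-fold cross-validation is assumed because the abstract does not state the scheme used.

```python
import numpy as np

def pcr_coef(X, y, k):
    """PCR coefficients from the first k principal components of centered X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Map the regression on the k leading scores back to the original predictors.
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

def pls_coef(X, y, k):
    """PLS1 coefficients with k components (NIPALS) for centered X and y."""
    Xr, yr = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(k):
        w = Xr.T @ yr                 # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                    # score vector
        tt = t @ t
        p = Xr.T @ t / tt             # X loading
        qa = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - qa * t              # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def rmsecv(coef_fn, X, y, k, n_folds=5):
    """Root mean square error of cross-validation for a k-component model."""
    n = len(y)
    sq_err = 0.0
    for test in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), test)
        # Center with training-fold means only, so the held-out rows stay unseen.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        beta = coef_fn(X[train] - x_mean, y[train] - y_mean, k)
        pred = y_mean + (X[test] - x_mean) @ beta
        sq_err += np.sum((y[test] - pred) ** 2)
    return np.sqrt(sq_err / n)

# Synthetic collinear data: 5 predictors driven by 2 latent factors (an assumption).
rng = np.random.default_rng(0)
n, p = 60, 5
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.2 * rng.normal(size=n)

for k in range(1, p + 1):
    print(k, round(rmsecv(pcr_coef, X, y, k), 4), round(rmsecv(pls_coef, X, y, k), 4))
```

In a comparison of this kind, the method that reaches its lowest RMSECV with fewer components is the one that reduces dimension more effectively. When all p components are retained and X has full column rank, both methods coincide with ordinary least squares, so their RMSECV values agree and only the small-k rows are informative.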

© 2016 The Authors. Published by IASE.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Multicollinearity, dimension reduction, PLSR, PCR, RMSECV

Article History: Received 29 June 2016, Received in revised form 7 August 2016, Accepted 7 August 2016


Citation:

Samkar H, Guven G (2016). Comparison of PLSR and PCR techniques in terms of dimension reduction: an application on internal migration data in Turkey. International Journal of Advanced and Applied Sciences, 3(8): 7-13

http://www.science-gate.com/IJAAS/V3I8/Samkar.html

