Model cross-validation based on data-splitting approach

Lacra
Real name and title: Radomir Lacramioara

Post by Lacra »

Following the suggestion of Hair et al. (2011), I intend to cross-validate my model using a holdout sample. However, I do not know which values should be compared in order to claim that the results generalize. The only paper I have found that uses a holdout sample for this purpose is the one by Silvia Boßow-Thies and Soenke Albers in the Handbook of PLS (Chapter 25), and I have trouble understanding the values the two authors report.

In Handbook of PLS, Chapter 25, Silvia Boßow-Thies and Soenke Albers state:

“For further cross-validation of the model, the data-splitting approach is applied as simultaneous methods like the Stone-Geisser approach are only applicable for mode A models (reflective constructs). The sample was randomly split into an estimation sample and a hold-out sample. According to the recommendations of Steckel and Vanhonacker (Steckel and Vanhonacker 1993), 75 % of the cases were used for the estimation sample, while 25% created the hold-out sample.” (p. 599)


The two authors further state:

“High correlations (r) between the calculated and observed values of the holdout sample (0.443–0.852) indicate a good predictive validity of the model and the generality of the results. The same is shown by the small difference between the calculated r2 and the R2 of the hold-out sample.” (p. 599)

My questions are:

1. What do the authors mean by calculated and observed values of the holdout sample? Do they take the PLS scores of the endogenous LVs for both the estimation sample and the holdout sample and then calculate the correlations between corresponding LVs (i.e. sales strategy in sample 1 correlated with sales strategy in sample 2)? If so, how do they handle the fact that the holdout sample is smaller, which would leave missing values when it is paired with the estimation sample (in SPSS, for example)? Or do you think they use the Split File function in SPSS? If so, is it possible to obtain the correlation between the same variable across the two groups?
2. What do the authors mean by the calculated r2 and the R2 of the holdout sample? Do they simply inspect the difference in R2 for each corresponding endogenous LV between the estimation sample and the holdout sample?
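
To make the two questions concrete, here is a minimal Python sketch of one possible reading of the quoted passage. It is only my assumption, not something the authors confirm: "calculated" values are predictions for the holdout cases obtained with the coefficients estimated on the estimation sample, "observed" values are the holdout cases' own scores, r is the correlation between the two within the holdout sample, and r2 is compared with the R2 of the estimation sample. The data are synthetic, the variable names are hypothetical, and ordinary least squares stands in for the PLS structural model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for latent variable scores (NOT real data):
# two exogenous scores in X and one endogenous score y.
n = 200
X = rng.normal(size=(n, 2))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Random 75/25 split into estimation and holdout samples.
idx = rng.permutation(n)
n_est = int(0.75 * n)
est, hold = idx[:n_est], idx[n_est:]


def add_intercept(a):
    """Prepend a column of ones so the regression has an intercept."""
    return np.column_stack([np.ones(len(a)), a])


# Estimate coefficients on the estimation sample only
# (OLS used here as a stand-in for the structural model).
beta, *_ = np.linalg.lstsq(add_intercept(X[est]), y[est], rcond=None)

# R2 of the estimation sample.
fitted_est = add_intercept(X[est]) @ beta
ss_res = np.sum((y[est] - fitted_est) ** 2)
ss_tot = np.sum((y[est] - y[est].mean()) ** 2)
r2_est = 1 - ss_res / ss_tot

# "Calculated" holdout values: estimation-sample coefficients applied
# to the holdout predictors. "Observed" values: the holdout y itself.
y_calc = add_intercept(X[hold]) @ beta
r = np.corrcoef(y_calc, y[hold])[0, 1]
r_squared_holdout = r ** 2

print(f"R2 (estimation sample):        {r2_est:.3f}")
print(f"r  (holdout, calc. vs. obs.):  {r:.3f}")
print(f"r2 (holdout):                  {r_squared_holdout:.3f}")
```

Under this reading the correlation is computed within the holdout sample only, so the unequal sample sizes would not create missing values. In a real PLS application the holdout LV scores would presumably also have to be built with the outer weights from the estimation sample, which this sketch does not cover.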


Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). "PLS-SEM: Indeed a Silver Bullet", Journal of Marketing Theory and Practice, Vol. 19, No. 2, pp. 139-151.

Boßow-Thies, S. & Albers, S. (2010). "Application of PLS in Marketing: Content Strategies on the Internet". In Esposito Vinzi, V., Chin, W. W., Henseler, J. & Wang, H. (Eds.), Handbook of Partial Least Squares: Concepts, Methods and Applications (pp. 589-604), Springer, Berlin.