
Treatment of outliers/extremes in PLS

Posted: Wed Feb 08, 2006 11:38 am
by HMO
Dear all,

one big advantage of PLS, compared with covariance-based SEM, is that it does not require any particular data distribution.

However, what is the best way to deal with outliers/extremes? A generally accepted statistical principle is that results should not rest on outliers/extremes.

The background of my question is this: I have a formative latent variable in my model. The indicators (ordinal scale) with the highest weights turned out to be the ones with very low medians in the descriptive analysis. A stem-and-leaf plot in SPSS showed that exactly these indicators had many outliers and even extremes. So it is possible that the weights are driven by these outliers/extremes.
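A minimal Python sketch of the boxplot rule behind the SPSS labels may help (Python is my choice here; nobody in the thread posts code). It assumes the convention SPSS uses for its Explore boxplot: an outlier lies more than 1.5 box lengths (IQR) beyond the edge of the box, an extreme more than 3. The function name `classify_boxplot` is hypothetical, and the quartiles come from Python's `statistics.quantiles`, which may differ slightly from the hinges SPSS draws.

```python
from statistics import quantiles

def classify_boxplot(values, k_out=1.5, k_ext=3.0):
    """Label each value 'normal', 'outlier' or 'extreme' via boxplot fences.

    Assumption: quartiles use statistics.quantiles (default 'exclusive'
    method), which may not match SPSS's Tukey hinges exactly.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1  # box length
    labels = []
    for v in values:
        # Distance beyond the nearer edge of the box (0 inside the box)
        if v > q3:
            dist = v - q3
        elif v < q1:
            dist = q1 - v
        else:
            dist = 0.0
        if dist > k_ext * iqr:
            labels.append("extreme")
        elif dist > k_out * iqr:
            labels.append("outlier")
        else:
            labels.append("normal")
    return labels

# Hypothetical item: most respondents answered 1 on a 1-5 scale
item = [1] * 20 + [2] * 5 + [5] * 2
labels = classify_boxplot(item)
print({lab: labels.count(lab) for lab in ("normal", "outlier", "extreme")})
# → {'normal': 25, 'outlier': 2, 'extreme': 0}
```

If nearly everyone answers 1, Q1 and Q3 coincide and the box has zero length. Every answer above 1 is then labelled extreme, however few such cases there are.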

What would you recommend? Should I replace the outliers/extremes with missing values or with the median? Or should I drop the indicators that have outliers/extremes entirely? I personally would rule out the last option: the construct is formative, and deleting indicators always entails a loss of content validity.
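The two substitution options raised here can be sketched in Python. The function name `substitute_flagged` and its arguments are hypothetical, and the caller decides which cases are flagged (for example with a boxplot rule). For the median option I assume the median of the unflagged cases. I compute it with `statistics.median_low` so the fill value stays a valid point on the ordinal scale.

```python
from statistics import median_low

def substitute_flagged(values, flagged, strategy="missing"):
    """Return a copy of `values` with flagged cases replaced (illustrative only).

    strategy="missing": flagged cases become None, left for the PLS tool's
    missing-value handling.
    strategy="median": flagged cases take the lower median of the unflagged
    cases, so the result is still an observed scale point.
    """
    if strategy not in ("missing", "median"):
        raise ValueError("strategy must be 'missing' or 'median'")
    kept = [v for v, f in zip(values, flagged) if not f]
    fill = None if strategy == "missing" else median_low(kept)
    return [fill if f else v for v, f in zip(values, flagged)]

# Hypothetical item with two flagged answers of 5
item = [1, 1, 2, 1, 5, 1, 5]
flags = [False, False, False, False, True, False, True]
print(substitute_flagged(item, flags, "missing"))  # → [1, 1, 2, 1, None, 1, None]
print(substitute_flagged(item, flags, "median"))   # → [1, 1, 2, 1, 1, 1, 1]
```

Both variants remove the high answers from the item. Running the model once with and once without the substitution shows whether the weights actually depend on those cases.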

Thanks for your answers!

Heike Moses

Posted: Fri Feb 10, 2006 9:08 am
by derfuss
Hi Heike,

you report many outliers for several indicators. So the question is: how many are there relative to your sample size? Are they really outliers, or a characteristic feature of the respondents in question? If the latter, treating the outliers in any way would mean losing potentially valuable information.

A second question that comes to mind: what kind of scale are you using, and how many points does it have? In other words, how far out is an outlier?
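The point about scale length can be made concrete. On a 5-point scale the boxplot fences depend entirely on the quartiles. The Python sketch below uses hypothetical quartile values and assumes the same 1.5 / 3 box-length convention as the SPSS boxplot. It prints only the upper fences, because for an item with a low median the flagged values sit above the box. The function name `boxplot_fences` is hypothetical.

```python
def boxplot_fences(q1, q3, k_out=1.5, k_ext=3.0):
    """Return (outlier_fence, extreme_fence) above the box for given quartiles."""
    iqr = q3 - q1  # box length
    return q3 + k_out * iqr, q3 + k_ext * iqr

# Hypothetical quartile patterns for a 1-5 item with a low median
for q1, q3 in [(1, 3), (1, 2), (1, 1)]:
    out_fence, ext_fence = boxplot_fences(q1, q3)
    print(f"Q1={q1}, Q3={q3}: outlier above {out_fence}, extreme above {ext_fence}")
```

With Q1 = 1 and Q3 = 3, no answer on the scale can be flagged. With Q1 = 1 and Q3 = 2, answers of 4 or 5 are outliers, but nothing can be extreme. With Q1 = Q3 = 1, any answer above 1 is extreme.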

Just my quick thoughts.

Greetings
Klaus

Posted: Mon Feb 13, 2006 8:06 am
by HMO
Dear Klaus, dear all,

thanks for your answer.

The problem arises in only one of the 13 latent variables. This formative latent variable consists of 9 items, all measured on a 5-point ordinal scale. Two of the 9 items show extremes in 5 of 325 cases, identified with the SPSS boxplot. So these really are outliers and not a characteristic of the study. Yet these few extremes give these items a large weight in the latent variable.
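To answer the question about sample size in numbers, the 5 flagged cases out of 325 amount to

$$\frac{5}{325} = \frac{1}{65} \approx 0.0154,$$

i.e. about 1.5 % of the sample.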

In such a case, would it be justified to substitute these few extremes? And if yes, in which way?

Many thanks!

Heike