Treatment of outliers/extremes in PLS

HMO
PLS Junior User
Posts: 2
Joined: Mon Jan 02, 2006 8:03 am

Post by HMO »

Dear all,

one big advantage of PLS, compared with covariance-based SEM, is that it makes no assumptions about the distribution of the data.

However, what is the best way to deal with outliers/extremes? A generally accepted statistical principle is that results should not rest on outliers/extremes.

The background of my question is the following: I have one formative latent variable in my model. The indicators (ordinal scale) with the highest weights turned out to be the ones with very low medians in the descriptive analysis. A stem-and-leaf plot in SPSS showed that exactly these indicators had many outliers and even extremes. It is therefore possible that the weights are driven by these outliers/extremes.

What would you recommend? Replacing the outliers/extremes with missing values or with the median? Or dropping the affected indicators altogether? I personally would reject the last option, since this is a formative construct and deleting indicators always entails a loss of content validity.
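The two substitution options mentioned above can be sketched as follows. This is only an illustration in Python with NumPy, not an SPSS or PLS procedure; the function name `substitute_flagged`, the made-up responses, and the way cases are flagged here are all assumptions for the example.

```python
import numpy as np

def substitute_flagged(values, flagged, how="median"):
    """Return a copy of `values` with the flagged cases replaced.

    how="median"  -> replace with the median of the unflagged cases
    how="missing" -> replace with NaN (to be handled as missing later)
    """
    out = np.asarray(values, dtype=float).copy()
    flagged = np.asarray(flagged, dtype=bool)
    if how == "median":
        out[flagged] = np.median(out[~flagged])
    elif how == "missing":
        out[flagged] = np.nan
    else:
        raise ValueError("how must be 'median' or 'missing'")
    return out

# Made-up responses on a 5-point scale; the single 5 is treated as flagged
# purely for illustration.
item = [1, 1, 1, 2, 1, 1, 5, 1, 2, 1]
flagged = [v == 5 for v in item]

by_median = substitute_flagged(item, flagged, "median")
by_missing = substitute_flagged(item, flagged, "missing")
```

Estimating the model once with the raw item and once with each substituted version would show whether the weights actually depend on the flagged cases.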

Thanks for your answers!

Heike Moses
derfuss
PLS Junior User
Posts: 6
Joined: Fri Jan 27, 2006 5:09 pm

Post by derfuss »

Hi Heike,

you report many outliers for several indicators. So the question is: how many are there relative to your sample size? Are they really outliers, or a characteristic feature of the respondents in question? If the latter were the case, treating the outliers in any way would mean losing potentially valuable information.

A second question that comes to mind: what kind of scale are you using, and how many points does it have? In other words, how far out can an outlier be?

Just my quick thoughts.

Greetings
Klaus
HMO
PLS Junior User
Posts: 2
Joined: Mon Jan 02, 2006 8:03 am

Post by HMO »

Dear Klaus, dear all,

thanks for your answer.

The problem arises in only one of the 13 latent variables. This formative latent variable consists of 9 items, all measured on a 5-point ordinal scale. Two of the 9 items show extremes in 5 out of 325 cases, as identified by the boxplot in SPSS, so these really are outliers and not a characteristic of the study. Yet these few extremes give the item a large weight in the latent variable.
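For readers who want to reproduce the boxplot classification outside SPSS, here is a rough Python sketch. It assumes the usual boxplot convention (outliers lie more than 1.5 box lengths beyond the box, extremes more than 3) and uses NumPy's default percentile interpolation, which may differ slightly from the hinges SPSS computes. The function name `classify_boxplot` and the data are made up; only the sample size of 325 and the 5 flagged cases follow the numbers above.

```python
import numpy as np

def classify_boxplot(values):
    """Flag outliers (>1.5 box lengths) and extremes (>3 box lengths)."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    # Distance beyond the box; zero or negative for values inside it.
    dist = np.maximum(q1 - x, x - q3)
    extreme = dist > 3.0 * iqr
    outlier = (dist > 1.5 * iqr) & ~extreme
    return outlier, extreme

# Made-up item: 320 respondents answer 1, five answer 5.
item = np.array([1] * 320 + [5] * 5)
outlier, extreme = classify_boxplot(item)
n_outliers = int(outlier.sum())
n_extremes = int(extreme.sum())
share_flagged = (n_outliers + n_extremes) / item.size
```

In this made-up example both quartiles equal 1, so the box has length zero and every response other than 1 is classified as an extreme. That is one way a 5-point item can produce "extremes" at all, and it is worth checking before substituting anything.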

In such a case, would it be justified to substitute these few extremes? And if so, how?

Many thanks!

Heike