About Outliers

PLS Junior User
Posts: 4
Joined: Wed Dec 02, 2015 12:23 pm
Real name and title: Thomas Mak

About Outliers

Post by thomasmakz@gmail.com » Wed Dec 02, 2015 12:46 pm

Dear all,

I have a question. My primary data has 472 observations after removing straight-lining responses, and I found more than 30 outliers (univariate outliers on the manifest variables). I do not think removing these outliers is a wise solution. Should I keep them as a subgroup and run a multigroup analysis? Or is there another treatment I can apply?
Thank you all.
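
For readers who want to reproduce this kind of screening, here is a minimal sketch of the two steps mentioned above: dropping straight-liners, then flagging univariate outliers per manifest variable so the flag can serve as a grouping variable. The post does not say which outlier rule was used. The 1.5 x IQR boxplot fences, the function names, and the toy 5-point data are all assumptions made for illustration.

```python
from statistics import quantiles

def is_straight_liner(responses):
    """True when a respondent gave the same answer to every item."""
    return len(set(responses)) == 1

def iqr_bounds(values, k=1.5):
    """Boxplot fences Q1 - k*IQR and Q3 + k*IQR (k=1.5 is an assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outlier_rows(rows, k=1.5):
    """One boolean per respondent: True if any item lies outside that item's fences."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        any(v < lo or v > hi for v, (lo, hi) in zip(row, bounds))
        for row in rows
    ]

# Hypothetical responses: one row per respondent, one column per manifest variable.
rows = [
    [4, 4, 5],
    [4, 5, 4],
    [5, 4, 4],
    [4, 4, 4],  # straight-liner
    [5, 5, 4],
    [4, 5, 5],
    [1, 4, 5],  # extreme value on the first item
    [5, 4, 5],
]

kept = [r for r in rows if not is_straight_liner(r)]
flags = flag_outlier_rows(kept)  # usable as a group indicator, not only for deletion
```

Keeping the flag as a column, rather than deleting the rows, leaves both options open: the full set can be analysed as is, or the flag can define the two groups for a multigroup comparison.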

PLS Junior User
Posts: 4
Joined: Wed Dec 02, 2015 12:23 pm
Real name and title: Thomas Mak

Re: About Outliers

Post by thomasmakz@gmail.com » Wed Dec 02, 2015 3:20 pm

I have run PLS; here are my findings.
Full set (including outliers)
Construct A: AVE = 0.697; CR = 0.920; Cronbach's Alpha = 0.891
Construct B: AVE = 0.756; CR = 0.925; Cronbach's Alpha = 0.892
Construct C: AVE = 0.691; CR = 0.870; Cronbach's Alpha = 0.779
Construct D: AVE = 0.887; CR = 0.959; Cronbach's Alpha = 0.936
Path A-->D: 0.456 (p<0.001)
Path B-->D: 0.204 (p<0.001)
Path C-->D: 0.132 (ns)
D adjusted R-square: 0.5, Q-square: 0.441

Dataset without outliers (30 outliers in total out of 481 cases)
Construct A: AVE = 0.736; CR = 0.933; Cronbach's Alpha = 0.910
Construct B: AVE = 0.776; CR = 0.933; Cronbach's Alpha = 0.904
Construct C: AVE = 0.733; CR = 0.892; Cronbach's Alpha = 0.819
Construct D: AVE = 0.917; CR = 0.959; Cronbach's Alpha = 0.954
Path A-->D: 0.381 (p<0.001)
Path B-->D: 0.236 (p<0.001)
Path C-->D: 0.189 (p<0.01)
D adjusted R-square: 0.514, Q-square: 0.441

Outlier dataset
Construct A: AVE = 0.495; CR = 0.826; Cronbach's Alpha = 0.738
Construct B: AVE = 0.662; CR = 0.887; Cronbach's Alpha = 0.836
Construct C: AVE = 0.418; CR = 0.586; Cronbach's Alpha = 0.418
Construct D: AVE = 0.732; CR = 0.891; Cronbach's Alpha = 0.817
Path A-->D: 0.673 (p<0.001)
Path B-->D: 0.117 (ns)
Path C-->D: -0.008 (ns)
D adjusted R-square: 0.552, Q-square: 0.441

However, when I ran the multigroup analysis, I got:
A-->D: path mean difference = 0.292 (ns)
B-->D: path mean difference = 0.119 (ns)
C-->D: path mean difference = 0.197 (ns)

So, should I drop the 30 outliers to get better reliability and validity as well as a higher R-square? On the other hand, the multigroup analysis shows no significant difference in the path coefficients. Please help.

SmartPLS Developer
Posts: 1110
Joined: Tue Mar 28, 2006 11:09 am
Real name and title: Dr. Jan-Michael Becker

Re: About Outliers

Post by jmbecker » Wed Dec 02, 2015 5:14 pm

I would always be careful with excluding outliers. You need to explain why you think the outliers are not meaningful and are not just extreme cases from the population. The mere presence of some very unsatisfied customers in a customer satisfaction survey does not make them worth excluding. You would lose valuable information.
However, given the lower reliability of your measures in the outlier group, it seems that these respondents have different response styles and interpret the measurement items differently (or respond randomly?). You really need to investigate the outliers to judge their situation.
Dr. Jan-Michael Becker, University of Cologne, SmartPLS Developer
Researchgate: https://www.researchgate.net/profile/Jan_Michael_Becker
GoogleScholar: http://scholar.google.de/citations?user ... AAAJ&hl=de

PLS Super-Expert
Posts: 1618
Joined: Sun Apr 24, 2011 10:13 am
Real name and title: Hengky Latan
Location: AMQ, Indonesia

Re: About Outliers

Post by Hengkov » Fri Dec 11, 2015 9:21 am


In some situations, outliers cannot be excluded, because the results would then no longer describe the real situation. If the results remain good with the outliers present, why should they be removed?
Merely pursuing the fulfillment of assumptions and the like is not a good reason to remove outliers, and replacing them through winsorizing is not a good option either.


PLS Junior User
Posts: 1
Joined: Mon Mar 09, 2020 7:51 am
Real name and title: Matheus Ylatki

Re: About Outliers

Post by PLStudent » Mon Mar 09, 2020 8:15 am

Hello! I have a dilemma regarding data cleaning and preparation before conducting a PLS-SEM analysis. I want to test a model that contains 5 latent constructs, all measured reflectively. All manifest variables are Likert-type items with responses ranging from 1 (strongly disagree) to 9 (strongly agree). The results show fairly low outer loadings and only a few significant path coefficients, with coefficients of determination of 0.01, 0.06 and 0.144. My research is in the social sciences, so these values can be considered acceptable. My main concern is outliers. Which methods of identifying outliers are acceptable, bearing in mind that all my manifest variables are Likert-type? I examined the data for cases where respondents answered with only one or two values (they did not try to discriminate between their answers) or answered in patterns (straight-lining). I also used boxplots in SPSS to identify extreme values for each manifest item, because I saw in some papers that Likert ordinal data can be treated as an interval scale. Also, some researchers propose using the median absolute deviation (MAD) method to detect outliers. Could you please give me some advice on how to handle this outlier identification issue?
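
As a point of reference for the median absolute deviation method mentioned above, here is a minimal sketch for a single manifest item. The cut-off of 3 and the scaling constant 1.4826 (which makes the MAD comparable to a standard deviation under normality) are common conventions assumed here, not values taken from this thread, and the sample responses are made up.

```python
from statistics import median

def mad_outliers(values, threshold=3.0):
    """Flag values whose MAD-based robust z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the answers equal the median, which is common
        # for Likert items; the rule is undefined, so nothing is flagged.
        return [False] * len(values)
    scale = 1.4826 * mad  # assumed normal-consistency constant
    return [abs(v - med) / scale > threshold for v in values]

# Hypothetical answers to one item on the 1-9 scale.
item = [7, 8, 7, 6, 8, 7, 9, 7, 1, 8]
flags = mad_outliers(item)  # only the response "1" is flagged
```

With few response categories the MAD is often zero or very small, so the rule either flags nothing or flags many legitimate answers. Flagged cases are best treated as candidates for inspection rather than as automatic deletions.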
