zero-inflated (very skewed) data

Powgie
PLS Junior User
Posts: 1
Joined: Wed Feb 13, 2019 2:57 pm
Real name and title: Dr Georgie Powell


Post by Powgie »

My main dependent measure has a zero-inflated distribution. It's not count data; it comes from a questionnaire item scored on a scale from 0 to 10, but most people happen to score 0 on it.

I originally thought of using PLS-SEM because the bootstrapping is robust against deviations from normality. However, I'm aware that it still relies on correlations underneath, so perhaps this doesn't solve the problem (e.g. if the relationship is non-linear).

My sample size is quite large (900+) and there is variance in the measure – but it definitely has a zero-inflated highly skewed distribution.

I am finding strong relationships between my DV and IVs, and they are theoretically sound (e.g. it correlates with the things you’d expect, but not the things you wouldn’t).

So I guess my question is – how much of a problem is this? Will I run into problems when I try to publish the results?

Any advice very much appreciated.
jmbecker
SmartPLS Developer
Posts: 1281
Joined: Tue Mar 28, 2006 11:09 am
Real name and title: Dr. Jan-Michael Becker

Re: zero-inflated (very skewed) data

Post by jmbecker »

"Will I run into problems when I try to publish the results?"
That is hard to answer. It depends on many things, most importantly the knowledge of the reviewer and how honestly you report these issues.

I think you have generally described the dilemma quite well. The estimation of the underlying correlations and regressions might be affected by the high skewness, and hence you might encounter strange coefficients or simply bias in the coefficients. On the other hand, the inference is bootstrapping-based and therefore relatively robust against deviations from normality.
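To illustrate the point about bootstrapping, here is a minimal sketch of a percentile bootstrap interval for a slope when the dependent variable is zero-inflated. It uses a plain OLS slope on simulated data, not the PLS-SEM algorithm or SmartPLS itself; the data-generating values, the function names, and the number of resamples are all assumptions made for illustration.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def bootstrap_slope_ci(x, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated (hypothetical) data: about 70% zeros, otherwise a 1-10 score
# that increases with the predictor x.
sim = random.Random(1)
x = [sim.gauss(0, 1) for _ in range(900)]
y = []
for xi in x:
    if sim.random() < 0.7:
        y.append(0.0)
    else:
        y.append(float(min(10, max(1, round(4 + 1.5 * xi + sim.gauss(0, 1))))))

slope = ols_slope(x, y)
ci_lower, ci_upper = bootstrap_slope_ci(x, y)
print(f"slope = {slope:.3f}, 95% percentile CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
```

The interval makes no normality assumption about the errors, but it only quantifies uncertainty around the linear coefficient. If the linear model itself is a poor description of the relationship, the bootstrap does not fix that.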

Given that your findings are plausible, you may try to proceed. You may also take the estimated latent variable scores for all the variables and then regress your dependent measure on them in a separate regression that accounts for the zero-inflation in your measure.
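One way to set up such a separate regression is a two-part (hurdle-style) model: a logistic regression for zero versus non-zero, plus an ordinary regression on the non-zero cases only. This is just one possible choice, not necessarily the model meant above. The sketch below assumes the latent variable scores have already been exported; here they are simulated, and `fit_logit`, `fit_two_part`, and all numeric values are hypothetical.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_two_part(scores, dv):
    """Part 1: P(dv > 0) by logit. Part 2: OLS of dv on scores where dv > 0."""
    X = np.column_stack([np.ones(len(dv)), scores])
    nonzero = dv > 0
    beta_zero = fit_logit(X, nonzero.astype(float))
    beta_pos, *_ = np.linalg.lstsq(X[nonzero], dv[nonzero], rcond=None)
    return beta_zero, beta_pos

# Simulated stand-in for one exported latent variable score (assumption).
rng = np.random.default_rng(0)
n = 900
lv = rng.standard_normal(n)
p_nonzero = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * lv)))
is_nonzero = rng.random(n) < p_nonzero
positive = np.clip(np.rint(4 + 1.5 * lv + rng.standard_normal(n)), 1, 10)
dv = np.where(is_nonzero, positive, 0.0)

beta_zero, beta_pos = fit_two_part(lv, dv)
print("logit part (intercept, slope):", beta_zero)
print("positive part (intercept, slope):", beta_pos)
```

The two parts answer different questions: the logit coefficients describe who scores above zero at all, and the second set describes how high the score is among those who do. Standard errors are left out of this sketch; they could be obtained by bootstrapping both parts together.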

If your measure is a single-item construct, you might also try a transformation of the variable, such as a log transformation, which could make its distribution closer to normal. Of course, this changes the interpretation of the coefficients, but that should not be a general problem. However, it does not work well if the variable is part of a construct with multiple items.
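As a small illustration of the transformation idea, the sketch below compares skewness before and after a log transformation on made-up 0-10 scores. Because log(0) is undefined, it uses log(1 + x) via `math.log1p`; that choice, the `skewness` helper, and the example scores are assumptions for illustration, not something prescribed above.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical 0-10 questionnaire scores with 60% zeros.
raw = [0] * 60 + [1] * 10 + [2] * 8 + [3] * 6 + [5] * 5 + [7] * 4 + [9] * 4 + [10] * 3

# log(1 + x) keeps zeros at zero and compresses the upper tail.
transformed = [math.log1p(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(transformed)
zero_share_raw = raw.count(0) / len(raw)
zero_share_log = transformed.count(0.0) / len(transformed)

print(f"skewness raw: {skew_raw:.2f}, after log1p: {skew_log:.2f}")
print(f"share of zeros raw: {zero_share_raw:.2f}, after log1p: {zero_share_log:.2f}")
```

The transformation pulls in the long right tail, so skewness goes down, but the share of zeros is unchanged. A monotone transformation cannot remove the spike at zero, so the result is less skewed rather than normal.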
Dr. Jan-Michael Becker, BI Norwegian Business School, SmartPLS Developer
Researchgate: https://www.researchgate.net/profile/Jan_Michael_Becker
GoogleScholar: http://scholar.google.de/citations?user ... AAAJ&hl=de