My main dependent measure has a zero-inflated distribution. It's not count data: it comes from a questionnaire with a scale from 0-10, but it so happens that most people score 0 on it.
I originally thought of using SEM-PLS because the bootstrapping is robust against deviations from normality. However, I'm aware that it still relies on correlations underneath, so perhaps this doesn't solve the problem (e.g. if the relationship is non-linear).
My sample size is quite large (900+) and there is variance in the measure, but it definitely has a zero-inflated, highly skewed distribution.
I am finding strong relationships between my DV and IVs, and they are theoretically sound (e.g. it correlates with the things you’d expect, but not the things you wouldn’t).
So I guess my question is – how much of a problem is this? Will I run into problems when I try to publish the results?
Any advice very much appreciated.
zero-inflated (very skewed) data
- SmartPLS Developer
- Real name and title: Dr. Jan-Michael Becker
Re: zero-inflated (very skewed) data
"Will I run into problems when I try to publish the results?"
That is hard to answer. It depends on many things, most importantly the knowledge of the reviewer and how honestly you report these things.
I think you have generally described the dilemma quite well. The estimation of the underlying correlations and regressions might be affected by the high skewness, and hence you might encounter strange or simply biased coefficients. On the other hand, the inference is bootstrapping-based and therefore relatively robust against deviations from normality.
Given that your findings are plausible, you may try to proceed. You may also use the estimated latent variable scores for all the variables and then regress your dependent measure on them in a separate regression that accounts for the zero-inflation in your measure.
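One possible reading of "a separate regression that accounts for the zero-inflation" is a two-part (hurdle) model: a logistic regression for whether the score is above 0, and a linear regression for the size of the score among those above 0. The sketch below is only an illustration of that idea, not necessarily the approach meant above. The data are simulated, the single predictor `scores` is a hypothetical stand-in for exported latent variable scores, and the fitting routines are bare-bones (no standard errors); in practice a statistics package would be used.

```python
import math
import random

def fit_logistic(x, z, iters=25):
    """Newton-Raphson fit of logit P(z = 1) = a + b * x (one predictor)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += zi - p            # gradient w.r.t. intercept
            g1 += (zi - p) * xi     # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def fit_ols(x, y):
    """Ordinary least squares fit of y = a + b * x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Simulated data (assumption): 900 respondents, one latent variable score,
# and a 0-10 outcome where most respondents score 0.
random.seed(1)
scores, dv = [], []
for _ in range(900):
    s = random.gauss(0.0, 1.0)
    p_nonzero = 1.0 / (1.0 + math.exp(-(-1.0 + 1.0 * s)))
    if random.random() < p_nonzero:
        y = min(10, max(1, round(3.0 + 1.5 * s + random.gauss(0.0, 1.0))))
    else:
        y = 0
    scores.append(s)
    dv.append(y)

# Part 1: does the predictor relate to scoring above 0 at all?
is_nonzero = [1 if y > 0 else 0 for y in dv]
a_zero, b_zero = fit_logistic(scores, is_nonzero)

# Part 2: among non-zero scorers, does it relate to how high they score?
positive = [(s, y) for s, y in zip(scores, dv) if y > 0]
a_pos, b_pos = fit_ols([s for s, _ in positive], [y for _, y in positive])

zero_share = 1.0 - sum(is_nonzero) / len(dv)
print(f"share of zeros: {zero_share:.2f}")
print(f"logit slope (zero vs. non-zero): {b_zero:.2f}")
print(f"linear slope (non-zero scores only): {b_pos:.2f}")
```

Splitting the model this way lets the predictor have different effects on "any score at all" and on "how high", which a single linear regression on the raw measure cannot separate.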
If your measure is a single-item construct, you might also try a transformation of the variable, such as a log transformation, that could make its distribution appear more normal. Of course this changes the interpretation of the coefficients, but that should not generally be a problem. However, it does not work well if the variable is part of a construct with multiple items.
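As a rough illustration of the transformation idea: because the measure contains zeros and log(0) is undefined, a shifted log such as log(1 + x) is one common choice (an assumption here; the post does not name a specific variant). The frequencies below are made up purely to mimic a 0-10 scale with most respondents at 0. The sketch compares skewness before and after the transformation.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical frequencies for scores 0..10 (900 respondents, 70% at zero).
freq = {0: 630, 1: 90, 2: 60, 3: 40, 4: 30, 5: 20,
        6: 12, 7: 8, 8: 5, 9: 3, 10: 2}
raw = [score for score, count in freq.items() for _ in range(count)]

# log1p(x) = log(1 + x), so a raw score of 0 stays exactly 0.
logged = [math.log1p(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness raw:    {skew_raw:.2f}")
print(f"skewness logged: {skew_log:.2f}")
print(f"zeros before/after: {raw.count(0)} / {logged.count(0.0)}")
```

The transformation compresses the right tail, but the spike at zero stays where it is, so the transformed variable can look less skewed without becoming anything close to normal.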
Dr. Jan-Michael Becker, BI Norwegian Business School, SmartPLS Developer
Researchgate: https://www.researchgate.net/profile/Jan_Michael_Becker
GoogleScholar: http://scholar.google.de/citations?user ... AAAJ&hl=de