Difference between Original sample and sample mean

Kata
PLS Junior User
Posts: 2
Joined: Mon Jan 14, 2019 2:14 pm
Real name and title: Catherine E Batt, PhD student

Post by Kata » Mon Jan 14, 2019 2:21 pm

Hello,

I have a formative model with 1 IV (12 indicators) and 2 DVs (6 and 7 indicators).
I ran a bootstrap with 5,000 subsamples, and the results I get are a little strange.

The path coefficient between the IV and DV1 is -0.501 for the original sample, but in the Sample Mean column the path coefficient is -0.079.

Can someone explain why the difference is so huge between the original sample and the sample mean?

Kind regards.
K

jmbecker
SmartPLS Developer
Posts: 971
Joined: Tue Mar 28, 2006 11:09 am
Real name and title: Dr. Jan-Michael Becker

Re: Difference between Original sample and sample mean

Post by jmbecker » Fri Jan 18, 2019 9:05 am

The original sample estimate is the parameter obtained by estimating the model on your original dataset, exactly as you would get it from a normal PLS algorithm estimation.
The sample mean estimate is the average of the estimates from all the subsamples of your dataset drawn during the bootstrapping procedure.
If the two deviate strongly, there is likely a data problem in your sample or a model problem that causes large outliers in the sampling distribution of your parameter estimates (you may also want to inspect the histogram of the parameter estimates from the bootstrapping).
This can have many causes: wrong coding of variables, severe multicollinearity, model problems (e.g., using PLSc although your model is not a common factor model), a very small sample size, excessive missing values, or wrongly coded missing values, to name only a few. You need to carefully trace the problem by looking more deeply into your model and data.
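
To make the two columns concrete, here is a minimal sketch in Python with NumPy. It is not SmartPLS code: the data are simulated, and a plain Pearson correlation between two scores stands in for a path coefficient (both are assumptions made only for illustration). It shows how the "original sample" value and the bootstrap "sample mean" are computed, and that in a well-behaved case the two are close.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 148 cases, two scores with a true correlation of 0.5.
n = 148
x = rng.standard_normal(n)
y = 0.5 * x + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)


def estimate(x, y):
    """Stand-in for a path coefficient: the Pearson correlation of x and y."""
    return np.corrcoef(x, y)[0, 1]


# "Original sample": the estimate on the full, unresampled dataset.
original = estimate(x, y)

# "Sample mean": the average of the estimates over all bootstrap subsamples.
n_boot = 5000
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)  # draw n cases with replacement
    boot[b] = estimate(x[idx], y[idx])
sample_mean = boot.mean()

print(f"original sample: {original:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
print(f"difference:      {sample_mean - original:.3f}")

# Counts per bin of the bootstrap distribution (a text-only histogram).
counts, edges = np.histogram(boot, bins=10)
print(counts)
```

When the difference is large, as in the question above, the histogram of `boot` is the first thing to look at, because the mean only summarizes a distribution well if that distribution has a single peak.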
Dr. Jan-Michael Becker, University of Cologne, SmartPLS Developer
Researchgate: https://www.researchgate.net/profile/Ja ... v=hdr_xprf
GoogleScholar: http://scholar.google.de/citations?user ... AAAJ&hl=de

Post by Kata » Wed Feb 13, 2019 11:41 am

Thank you very much for your reply. I do not know if this can help but here is what we found based on what you suggested:

Wrong coding of variables: we looked again and again and everything seems coded in the correct direction.
Severe multicollinearity: there is no multicollinearity, all VIF are around 1,...
Model problems (i.e., using PLSc although your model is not a common factor model): we double checked and this was not the problem.
Very small sample size: our sample size is n=148. The latent variable with the largest number of arrows pointing at it has 12 arrows.
Excessive missing values: we have no missing values
Wrong coded missing values: n/a as we have no missing values

We studied the histograms from the bootstrapping and found that the histograms for our DVs are bimodal (with two peaks). These are exactly the paths with the large differences between the original sample and the sample mean.
Would that be an indication of heterogeneity?

Best regards,
K

Post by jmbecker » Wed Feb 13, 2019 12:09 pm

It could be heterogeneity, but it looks more like another problem that is known in the PLS literature. A bimodal distribution is likely the effect of a very weak (or zero) relationship when one of the variables also has no relations with other constructs in the model. In that case the weight estimation procedure is not empirically identified.
There are two things that you could do:
1) Use equally weighted sum scores instead of PLS weights (you can double-click on the construct and select that manually) to fix the weights and thereby circumvent the identification problem.
2) Connect the LV with another variable for which you expect a strong (or at least medium) correlation, so that the weight estimation is identified.

In any case, you will likely find that the effect is close to zero and not significant.
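
The mechanism can be illustrated with a small simulation in Python with NumPy. This is only a sketch under strong simplifying assumptions, not the SmartPLS algorithm: the DV is a single observed variable, the composite weights are the OLS coefficients of the DV on the 12 indicators (a regression-weight analogue of formative weights), the true relationship is zero, and the sign rule "first weight must be positive" is a hypothetical convention chosen for the example. Because estimated weights always chase whatever chance correlation is in a subsample, the absolute path coefficient stays clearly above zero while its sign depends on noise; equal-weight sum scores do not have this freedom.

```python
import numpy as np


def estimated_weight_path(X, y):
    """Path coefficient when the composite weights are estimated from the data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)  # OLS weights of y on the indicators
    # Assumed sign convention (hypothetical): orient the composite so that
    # the first indicator's weight is positive.
    if w[0] < 0:
        w = -w
    return np.corrcoef(Xc @ w, yc)[0, 1]


def sumscore_path(X, y):
    """Path coefficient when all indicators get the same fixed weight."""
    return np.corrcoef(X.sum(axis=1), y)[0, 1]


def bootstrap_paths(X, y, n_boot, rng):
    """Bootstrap both estimators on the same resampled cases."""
    n = len(y)
    est = np.empty(n_boot)
    ss = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        est[b] = estimated_weight_path(X[idx], y[idx])
        ss[b] = sumscore_path(X[idx], y[idx])
    return est, ss


rng = np.random.default_rng(1)
n, k = 148, 12
X = rng.standard_normal((n, k))  # 12 simulated indicators
y = rng.standard_normal(n)       # DV simulated independently of X (true effect is zero)

est, ss = bootstrap_paths(X, y, n_boot=2000, rng=rng)

print(f"estimated weights: original {estimated_weight_path(X, y):+.3f}, "
      f"bootstrap mean {est.mean():+.3f}, mean |path| {np.abs(est).mean():.3f}, "
      f"share positive {np.mean(est > 0):.2f}")
print(f"sum scores:        original {sumscore_path(X, y):+.3f}, "
      f"bootstrap mean {ss.mean():+.3f}, mean |path| {np.abs(ss).mean():.3f}")
```

If the share of positive replicates is well away from 0 or 1, the bootstrap distribution of the estimated-weight path has two peaks, one on each side of zero, and its mean lands somewhere in between. How often the sign flips depends on the data and on the sign rule, so the printed values will vary with the seed.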
