Dear Forum,
I need a little help interpreting FIMIX results in conjunction with a multigroup analysis (MGA) that was carried out independently of FIMIX. The model is based on a total of 270 usable responses.
Based on considerations regarding the control variables, an MGA was already performed, which revealed significant differences (p < 0.01 and p < 0.05) for four relationships between variables (the controls were not included in the MGA, as they enter the model as single items). The requirements for an MGA are fulfilled (based on MICOM step 2, but not step 3).
The FIMIX analysis was performed for the model shown in the attached Model.PNG (i.e., including control variables and moderators). The FIMIX results are shown in the attached FIMIXResults.PNG.
My questions to the forum are:
1) Is it correct that, for an MGA, it is advisable not to include control variables that are measured with a single indicator?
2) Is it correct that the FIMIX results indicate a solution with between 1 and at most 3 segments (regardless of the EN value for the 4-segment solution)?
3) Is it correct that there is no significant problem of unobserved heterogeneity, as CAIC indicates one segment and the segment sizes of the 2- and 3-segment solutions (50 cases in the second segment of the 2-segment solution; 85 and 32 cases in the second and third segments of the 3-segment solution) are too small for further examination? Or would you limit the analysis to the first two segments of the 3-segment solution (with 153 and 85 cases)?
4) I would have expected the MGA results to be more or less reflected in the FIMIX analysis. The MGA groups contain 145 (group 1) and 125 (group 2) cases. What could explain why this grouping does not show up in the FIMIX results?
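The retention criteria mentioned in questions 2 and 3 are simple functions of the FIMIX log-likelihood (lnL), the number of estimated parameters, the sample size and, for EN, the posterior membership probabilities. A minimal sketch using the definitions commonly cited in the FIMIX-PLS literature; the function names are hypothetical helpers (not part of SmartPLS), and in practice the inputs would be read off the FIMIX output.

```python
import math


def fimix_criteria(log_lik: float, n_params: int, n_obs: int) -> dict:
    """Common FIMIX-PLS retention criteria; lower values indicate a better fit/complexity trade-off."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + math.log(n_obs) * n_params,
        "CAIC": deviance + (math.log(n_obs) + 1) * n_params,
    }


def normed_entropy(post: list) -> float:
    """EN statistic; post[i][k] is the probability that case i belongs to segment k.

    EN = 1 - sum_i sum_k (-p_ik * ln p_ik) / (N * ln K); values near 1 mean crisp separation.
    """
    n_obs, n_seg = len(post), len(post[0])
    if n_seg < 2:
        raise ValueError("EN is only defined for K >= 2 segments")
    entropy = -sum(p * math.log(p) for row in post for p in row if p > 0)
    return 1.0 - entropy / (n_obs * math.log(n_seg))


# Hypothetical log-likelihoods and parameter counts for K = 1..3 (NOT real FIMIX output).
hypothetical_runs = {1: (-1000.0, 10), 2: (-960.0, 21), 3: (-945.0, 32)}
scores = {k: fimix_criteria(ll, npar, 270) for k, (ll, npar) in hypothetical_runs.items()}
for name in ("AIC3", "CAIC"):
    best_k = min(scores, key=lambda k: scores[k][name])
    print(name, "favours K =", best_k)
```

EN only describes how sharply cases are separated for a given K; it does not compare different values of K, which is one reason it is a weak guide to the number of segments on its own.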
I would be very grateful for your feedback!
Best regards
Holger
FIMIX - Assessment of heterogeneity level
Attachments: Model.PNG (64.34 KiB), FIMIXResults.PNG (22.06 KiB)

 SmartPLS Developer
 Real name and title: Dr. Jan-Michael Becker
Re: FIMIX - Assessment of heterogeneity level
Dear Holger,
1) In my opinion, excluding control variables in the MGA is generally not a good idea. You want to estimate the same model as initially proposed, including all controls if they are deemed important. The only exception is the grouping variable itself. If, for example, gender is a control in your original model and you also use it as the grouping variable, then you need to exclude it in the MGA (because you already control for it through the grouping, and it would have zero variance within each group).
2) Yes. The EN by itself is usually a poor indicator of the number of segments. AIC3 and CAIC usually indicate the range of meaningful solutions.
3) Well, it depends. I would have a slight preference for the 1-segment solution because of the relatively small sizes of the second and third segments. However, I would still try to investigate the second and third segments in more detail: Are they plausible (in terms of their coefficients), and do they relate to some other observable behavior or characteristics? You might also have a dataset in which you simply undersampled the other segments (maybe these are respondents who are harder to reach in a survey, etc.). Thus, you want to understand the segmentation solution before dismissing unobserved heterogeneity.
Sometimes the small segments also consist of outliers or faulty responses, for example, straight-liners or respondents with many missing values, whose shared behavior is what groups them together. You might then consider excluding these responses from your analysis.
4) Not necessarily. Unobserved heterogeneity might go beyond observed heterogeneity. Thus, the observed grouping might not be optimal, and the segmentation finds other, better-fitting solutions. Your grouping might represent one local optimum among the many different ways of grouping the data. However, what you find with FIMIX is always limited by its assumptions (i.e., multivariate normally distributed endogenous latent variables).
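The local-optimum issue can be seen in any EM-type mixture algorithm. The toy sketch below is not FIMIX-PLS (which works on the latent variable scores of the whole path model); it is a hypothetical two-segment mixture of simple regressions with simulated data, but it follows the same recipe: start from a random partition, iterate to a (possibly local) optimum, repeat from several starts, and keep the solution with the highest log-likelihood. All function names and data are illustrative assumptions.

```python
import math
import random


def _normal_pdf(residual: float, var: float) -> float:
    return math.exp(-0.5 * residual * residual / var) / math.sqrt(2.0 * math.pi * var)


def em_mixture_regression(x, y, k, rng, iters=100):
    """Toy EM for a K-segment mixture of regressions y = b_k * x + e_k.

    Each run starts from a random partition of the cases.
    Returns (log-likelihood, [(slope, error variance, segment share), ...]).
    """
    n = len(x)
    resp = [[0.0] * k for _ in range(n)]
    for i in range(n):  # random hard assignment as the starting point
        resp[i][rng.randrange(k)] = 1.0
    log_lik, params = float("-inf"), []
    for _ in range(iters):
        # M-step: weighted least-squares slope, error variance and share per segment
        params = []
        for j in range(k):
            w = [resp[i][j] for i in range(n)]
            sw = sum(w)
            slope = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(
                wi * xi * xi for wi, xi in zip(w, x))
            var = sum(wi * (yi - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
            params.append((slope, max(var, 1e-6), sw / n))
        # E-step: posterior membership probabilities and the log-likelihood
        log_lik = 0.0
        for i in range(n):
            dens = [share * _normal_pdf(y[i] - b * x[i], v) for b, v, share in params]
            total = sum(dens)
            log_lik += math.log(total)
            resp[i] = [d / total for d in dens]
    return log_lik, params


def best_of_restarts(x, y, k, n_starts, seed):
    """Run EM from several random partitions; keep the run with the highest log-likelihood."""
    rng = random.Random(seed)
    runs = [em_mixture_regression(x, y, k, rng) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[0]), runs


# Hypothetical data: two equally likely segments with slopes 0.1 and 0.9.
data_rng = random.Random(1)
x, y = [], []
for _ in range(600):
    xi = data_rng.gauss(0.0, 1.0)
    true_slope = 0.1 if data_rng.random() < 0.5 else 0.9
    x.append(xi)
    y.append(true_slope * xi + data_rng.gauss(0.0, 0.3))

best, runs = best_of_restarts(x, y, k=2, n_starts=5, seed=42)
print("log-likelihood per start:", [round(run[0], 2) for run in runs])
print("slopes of best run:", sorted(round(s, 2) for s, _, _ in best[1]))
```

In this clean, well-separated simulation the starts will typically agree; with small samples, more segments, or subtle path differences they are more likely to end in different local optima, which is why several repetitions are needed.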
Dr. Jan-Michael Becker, University of Cologne, SmartPLS Developer
Researchgate: https://www.researchgate.net/profile/Jan_Michael_Becker
GoogleScholar: http://scholar.google.de/citations?user ... AAAJ&hl=de
Re: FIMIX - Assessment of heterogeneity level
Dear Dr. Becker,
Thank you for your friendly and fast response!
Given my (limited) experience with PLS models so far, I still have the following questions.
1) On the subject of MGA: In various models, I have found that control variables measured with a single indicator more often lead to problems in an MGA. If the prerequisites for an MGA are met for the variables of the main model, can one conclude that an analysis of the main model is possible, but that no conclusions can be drawn about the relevance of the control variables? This would then be a limitation of the study, similar to studies that do not use control variables at all. Or would you rule out an MGA because the MICOM requirements are not fully met?
3) On the subject of FIMIX: Is it correct that the results of the FIMIX procedure vary for methodological reasons? When I try to reproduce the FIMIX results, I find differences in the number of possible segments and in the segment sizes. Even an exact copy of the path model yields results that are not comparable to those of the original model (I followed the recommendations of Hair et al. 2018 on repetitions). Can you explain why the results can differ and how this should be handled in reporting?
4) Related to observed and unobserved heterogeneity: Hair et al. 2018, p. 177, describe the aim of the FIMIX procedure as follows: "The aim of FIMIX-PLS is to disentangle the overall mixture distribution and estimate parameters (e.g., the path coefficients) of each group in a regression framework." Is my interpretation correct that the FIMIX procedure searches for the optimal regression coefficients of complete path models (segments) and thus does not necessarily reveal individual significant differences? In other words, because FIMIX optimizes all regression coefficients per segment jointly, individual path differences that may be significant are not necessarily captured.
I really appreciate the work in this forum and look forward to your feedback!
Best regards
Holger

 SmartPLS Developer
 Real name and title: Dr. Jan-Michael Becker
Re: FIMIX - Assessment of heterogeneity level
1) You first say that single indicators lead to problems with MGA (which particular problems?). You later mention problems with MICOM. Are these the problems you referred to earlier?
MICOM doesn't really make sense for single-indicator variables. Step 2 should always be fulfilled because you actually do not have a measurement model (the results sometimes tell a different story, but then simply ignore them and do not report step 2 for single indicators). Step 3 is basically a test of group mean differences in the single indicators. If you find differences, then it is even more important to include the controls in each group, because there are observed differences in this control (for example, differences in the gender distribution).
Again, if you think that the controls are important, then you also want to include them in the MGA, especially as you have them available. The only reasons to exclude controls are that you were not able to collect them or that you have good reasons to believe they are not important.
3) FIMIX always starts from a new random partition of the data. Thus, each time you run FIMIX, it starts from a different starting point and improves toward a (local) solution. Ideally, with enough repetitions, you should find the global optimum among all these different solutions, which is stable and replicable across consecutive runs. Otherwise, you only find local optima, which might differ slightly.
4) Generally, yes. The assumption is that subpopulations generally affect the whole model and not individual paths. Hence, FIMIX optimizes for the whole model and not for individual path differences. Individual path differences are of course included in the process, but they are harder to identify. Ideally, this should not matter much, but with limited sample sizes and information it will make a difference.
Consider the following model: A -> C and B -> C, and suppose there are 4 groups in reality:
1) A -> C: 0.2; B -> C: 0.2
2) A -> C: 0.6; B -> C: 0.2
3) A -> C: 0.2; B -> C: 0.6
4) A -> C: 0.6; B -> C: 0.6
Ideally, you should find the 4 groups with FIMIX. But you may also find only two groups: either 1&2 vs. 3&4 (difference on B -> C) or 1&3 vs. 2&4 (difference on A -> C). In both cases, the other effect would not be (significantly) different.
These two solutions are local optima for FIMIX and are especially likely if the sample is not large enough for 4 groups. Of course, in more complex models there may be even more possible solutions, and if a segment differs only slightly in one path, it will be hard to identify.
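The four-group example can be reproduced in a small simulation. The sketch below assumes standardized, independent A and B, noise with SD 0.5, 2,000 hypothetical cases per group, and ordinary least squares on observed scores instead of PLS; it estimates both paths in each segment of the two possible two-segment splits. Names and settings are illustrative only.

```python
import random

# Path values of the four hypothetical "true" groups: (A -> C, B -> C)
TRUE_PATHS = {1: (0.2, 0.2), 2: (0.6, 0.2), 3: (0.2, 0.6), 4: (0.6, 0.6)}


def simulate(n_per_group: int, seed: int) -> dict:
    """Standardized A and B, C = a*A + b*B + noise, separately for each group."""
    rng = random.Random(seed)
    data = {}
    for g, (pa, pb) in TRUE_PATHS.items():
        rows = []
        for _ in range(n_per_group):
            a, b = rng.gauss(0, 1), rng.gauss(0, 1)
            rows.append((a, b, pa * a + pb * b + rng.gauss(0, 0.5)))
        data[g] = rows
    return data


def ols_paths(rows) -> tuple:
    """OLS coefficients of C on A and B (with intercept) via the centred normal equations."""
    n = len(rows)
    ma = sum(r[0] for r in rows) / n
    mb = sum(r[1] for r in rows) / n
    mc = sum(r[2] for r in rows) / n
    saa = sum((r[0] - ma) ** 2 for r in rows)
    sbb = sum((r[1] - mb) ** 2 for r in rows)
    sab = sum((r[0] - ma) * (r[1] - mb) for r in rows)
    sac = sum((r[0] - ma) * (r[2] - mc) for r in rows)
    sbc = sum((r[1] - mb) * (r[2] - mc) for r in rows)
    det = saa * sbb - sab * sab
    return (sbb * sac - sab * sbc) / det, (saa * sbc - sab * sac) / det


def split_paths(data: dict, groups) -> tuple:
    """Pool the listed groups into one segment and estimate its two paths."""
    return ols_paths([row for g in groups for row in data[g]])


data = simulate(n_per_group=2000, seed=7)
for seg1, seg2 in (((1, 2), (3, 4)), ((1, 3), (2, 4))):
    (a1, b1), (a2, b2) = split_paths(data, seg1), split_paths(data, seg2)
    print(f"{seg1} vs {seg2}: A->C {a1:.2f} vs {a2:.2f}; B->C {b1:.2f} vs {b2:.2f}")
```

Each split shows a clear difference on one path, while the other path comes out at roughly the same pooled value (about 0.4) in both segments, so either two-segment solution reveals only one of the two true path differences.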

 PLS Junior User
 Real name and title: Hassan Mohamed
Re: FIMIX - Assessment of heterogeneity level
Dear All,
I hope you are safe and well.
Recently, I ran FIMIX to check for unobserved heterogeneity, because my earlier PLS-MGA analyses revealed very few significant differences across all the observed groupings I compared.
Given the sample size (327 after casewise deletion), K cannot exceed 6. The fit indices are as follows:
According to AIC, AIC3, BIC, CAIC, and MDL5, the appropriate number of segments is between 2 and 5.
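The bound of at most 6 segments appears to follow from dividing the usable sample by the per-segment minimum (327 / 54 is about 6.06). Assuming that rule, the following sketch shows how the bound and the set of admissible solutions shift with the chosen minimum and sample size; the helper names and the segment sizes are hypothetical.

```python
def max_segments(n_obs: int, n_min: int) -> int:
    """Largest K for which K segments of at least n_min cases each are arithmetically possible."""
    return n_obs // n_min


def admissible_solutions(solutions: dict, n_min: int) -> list:
    """K values whose smallest segment still reaches the per-segment minimum n_min."""
    return [k for k, sizes in sorted(solutions.items()) if min(sizes) >= n_min]


print(max_segments(327, 54))  # 6
print(max_segments(327, 84))  # 3
print(max_segments(426, 54))  # 7 (if the total sample were used instead)

# Hypothetical segment sizes (each summing to 327), for illustration only.
sizes = {2: [200, 127], 3: [150, 110, 67], 4: [120, 100, 60, 47]}
print(admissible_solutions(sizes, 54))  # [2, 3]
print(admissible_solutions(sizes, 84))  # [2]
```

The choice between 54 and 84, and between 327 and 426 cases, therefore directly changes which solutions survive the size screen.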
My questions are:
1) In Matthews, Sarstedt, Hair, and Ringle (2016), "Identifying and treating unobserved heterogeneity with FIMIX-PLS: Part II - A case study", European Business Review, p. 211, the authors state that the minimum sample size per segment is 54, based on a 5% significance level, an R² of 0.25, and a maximum of 8 indicators. However, in the PLS-SEM Primer (Hair et al., 2014), p. 21, I found that the minimum sample size for the same determinants should be 84, not 54, and that 54 applies to a minimum R² of 0.50. Since I have the same determinants, what should the minimum sample size be: 84 or 54?
2) Should the minimum sample size per segment, which is used to select the number of segments, be based on the total sample (426 in my case) or on the sample after casewise deletion (327 in my case)?
3) Based on the indices, the number of segments lies between 2 and 5. Given the sample size consideration, I have to discard the 4- and 5-segment solutions, as they contain segments with fewer than 54 observations. Hence, only the 2- and 3-segment solutions remain. Am I right?
4) Comparing the 2- and 3-segment solutions, I found the 3-segment solution more practical, as the 2-segment solution is not well balanced (84.1% of cases in one segment), which would give biased results. Am I right?
5) In neither solution could I match segment membership with any of the explanatory variables. The application article mentioned above suggests that more than one explanatory variable can be combined to explain membership. Do you know how this can be done?
6) When I apply confirmatory composite analysis to each segment of the 2- or 3-segment solution, followed by an MGA, I find no significant differences between the segments of the same solution, with either casewise deletion or mean replacement. So where is the heterogeneity in my data? Should I then analyze the whole data set and show that FIMIX does not yield significant differences among the segments?
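For question 5, and for the earlier advice to check whether segments relate to other observable characteristics, one simple descriptive option (not necessarily the procedure used in the cited article) is to cross-tabulate FIMIX membership with candidate variables and summarize the association with Cramér's V. Combining several explanatory variables then means treating each combination of categories as one category. A sketch with hypothetical data and a hypothetical helper name:

```python
import math
from collections import Counter


def cramers_v(segments, observed) -> float:
    """Cramér's V between segment labels and an observed categorical variable (0 = none, 1 = perfect)."""
    n = len(segments)
    cells = Counter(zip(segments, observed))
    rows, cols = Counter(segments), Counter(observed)
    k = min(len(rows), len(cols))
    if k < 2:
        raise ValueError("need at least two categories on each side")
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n  # expected count under independence
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical membership and two hypothetical observed variables.
segment = [1, 1, 1, 2, 2, 2, 3, 3, 3]
gender = ["f", "f", "m", "m", "m", "f", "f", "m", "f"]
age = ["<30", "<30", "<30", "30+", "30+", "30+", "<30", "30+", "30+"]
print("gender:", round(cramers_v(segment, gender), 2))
print("age:", round(cramers_v(segment, age), 2))
# Combining variables: each (gender, age) pair is treated as one category.
print("gender x age:", round(cramers_v(segment, list(zip(gender, age))), 2))
```

This is purely descriptive (no p-value is computed), and with many combined categories the table quickly becomes sparse, so the combinations should be kept coarse.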
Thanks for your patience.