Low incidence vs missing data

PLS Junior User
Posts: 1
Joined: Wed Jan 17, 2018 4:29 pm
Real name and title: Jean Azevedo - CX/ VoC Analytics

Low incidence vs missing data

Post by jeanazevedo


I come from a regression background, where it is common to include observations for which some independent variables are zero.

For instance, in a market mix model, if a price promotion, another promotion, advertising, etc. does not take place, I simply enter zero as the value of that variable. The data are not missing; the activity simply did not take place.

Currently, I am evaluating client interactions using surveys (measured on 5-, 7-, or 10-point scales). My target variable is the "Overall Satisfaction" score, and I use only observed variables; there was no need to create latent constructs. The observed variables are the satisfaction scores for specific touchpoint interactions (call center, web center, etc.), each with its own incidence: not every customer has the same types of interactions.

My current issues with SmartPLS:

- Let's say I have 1,000 interactions with the call center and 100 with the website, but only 50 customers interacted with both.
- To use both interactions in the same model, SmartPLS will treat the data as having 950 missing values (website satisfaction) for the people who interacted only with the call center, and 50 missing values (call center satisfaction) for the people who interacted only with the website.
- But the data are not actually missing; the interaction simply did not take place. However, if I enter zero, SmartPLS will treat it as a valid value and standardize it before it enters the model.
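To make the problem in the list above concrete, here is a minimal Python sketch on hypothetical data laid out like the example (950 call-center-only, 50 both, 50 website-only respondents, with `None` marking an interaction that did not happen). The scores are placeholders, not real survey data. The sketch counts the missing values per touchpoint and compares standardizing the website score over the observed cases only with standardizing after filling non-interactions with zero.

```python
import math

# Hypothetical layout mirroring the post: respondents 0..949 used only the
# call center, 950..999 used both channels, 1000..1049 used only the website.
N_CALL_ONLY, N_BOTH, N_WEB_ONLY = 950, 50, 50


def build_scores():
    """Return (call, web) score lists; None means 'no interaction'."""
    call, web = [], []
    for i in range(N_CALL_ONLY + N_BOTH + N_WEB_ONLY):
        has_call = i < N_CALL_ONLY + N_BOTH
        has_web = i >= N_CALL_ONLY
        score = 7 + i % 3  # placeholder scores cycling through 7, 8, 9
        call.append(score if has_call else None)
        web.append(score if has_web else None)
    return call, web


def mean_sd(values):
    """Sample mean and standard deviation (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, sd


call, web = build_scores()
missing_web = sum(v is None for v in web)    # call-center-only respondents
missing_call = sum(v is None for v in call)  # website-only respondents

# Option A: treat non-interaction as missing; standardize observed cases only.
observed_web = [v for v in web if v is not None]
m_obs, sd_obs = mean_sd(observed_web)

# Option B: fill non-interaction with 0 and standardize every respondent.
zero_filled_web = [0 if v is None else v for v in web]
m_zero, sd_zero = mean_sd(zero_filled_web)

# z-score of an ordinary website score of 8 under each option
z_obs = (8 - m_obs) / sd_obs
z_zero = (8 - m_zero) / sd_zero

print(f"missing web: {missing_web}, missing call center: {missing_call}")
print(f"web mean/sd, observed only: {m_obs:.2f}/{sd_obs:.2f}, z(8) = {z_obs:.2f}")
print(f"web mean/sd, zero-filled:   {m_zero:.2f}/{sd_zero:.2f}, z(8) = {z_zero:.2f}")
```

With zero-filling, the 950 non-users dominate the mean and standard deviation, so an ordinary website score of 8 ends up looking like an outlier after standardization. That is the distortion the question is worried about.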

My questions:

- In the context of analyzing survey data (where zero might be a valid point on the scale), how can SmartPLS handle an interaction that did not take place without treating it as a missing value?

Thank you,

SmartPLS Developer
Posts: 1265
Joined: Tue Mar 28, 2006 11:09 am
Real name and title: Dr. Jan-Michael Becker

Re: Low incidence vs missing data

Post by jmbecker

Why don't you want to use a missing value?
Using anything other than a missing value would probably not be correct, because your observations do not have a satisfaction score of zero. They might in fact have any satisfaction score, but you do not know it because you do not observe it. Thus, it is a missing value.
If you use pairwise deletion, the method will calculate the correlations/regressions/etc. for each sub-model using only the available data. Thus, it will assess the influence of your predictors for each touchpoint satisfaction only on those observations that have a score for that touchpoint, i.e., the 1,000 observations for the call center and the 100 for the website. Observations that have a satisfaction score for both touchpoints are always used, and those that have a score for only one touchpoint are used only in the corresponding regression.
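The pairwise-deletion idea can be sketched in plain Python. This illustrates the general technique, not SmartPLS's internal code: each correlation is computed only on respondents who have both scores, so the number of cases differs from pair to pair. The layout (1,000 call-center and 100 website respondents, 50 with both) follows the thread; the scores and the `overall` placeholder (mean of the available touchpoint scores) are hypothetical.

```python
import math
from itertools import combinations


def pairwise_corr(x, y):
    """Pearson r using only rows where both x and y are observed
    (pairwise deletion). Returns (r, number_of_rows_used)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n  # no variation: correlation undefined
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical data: 0..949 call center only, 950..999 both, 1000..1049 web only.
call, web, overall = [], [], []
for i in range(1050):
    c = 5 + i % 5 if i < 1000 else None  # placeholder call-center score
    w = 4 + i % 6 if i >= 950 else None  # placeholder website score
    seen = [s for s in (c, w) if s is not None]
    call.append(c)
    web.append(w)
    overall.append(sum(seen) / len(seen))  # placeholder overall score

variables = {"call": call, "web": web, "overall": overall}
results = {}
for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
    r, n = pairwise_corr(a, b)
    results[(name_a, name_b)] = (r, n)
    print(f"r({name_a}, {name_b}) = {r:+.3f} using n = {n}")
```

Here the call/web correlation rests on only the 50 overlapping cases, while the correlations with `overall` use 1,000 and 100 cases, so the precision of each estimate depends on its own overlap.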

You might also consider not modelling every satisfaction separately, but instead modelling overall satisfaction as a composite latent variable (composed of the satisfaction scores for the different touchpoints). This way, you aggregate the responses of those who interacted with several touchpoints into one overall satisfaction score.
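As a rough illustration of aggregating touchpoints into one overall score, the Python sketch below averages, per respondent, only the touchpoint scores that exist. It is a simplification: a composite in SmartPLS uses estimated indicator weights, whereas this sketch uses equal weights. The scale bounds in the hypothetical `SCALES` mapping (used to put a 5-point and a 10-point item on a common 0 to 1 range) are assumptions, not values from the thread.

```python
# Hypothetical scale bounds per touchpoint (assumed, not from the thread).
SCALES = {"call_center": (1, 5), "web": (1, 10)}


def rescale(value, low, high):
    """Map a score from [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


def overall_composite(scores):
    """Equal-weighted mean of the rescaled touchpoint scores a respondent
    actually has; returns None if no touchpoint was used at all."""
    available = [
        rescale(v, *SCALES[touchpoint])
        for touchpoint, v in scores.items()
        if v is not None  # skip touchpoints with no interaction
    ]
    if not available:
        return None
    return sum(available) / len(available)


respondents = [
    {"call_center": 4, "web": None},   # call center only
    {"call_center": None, "web": 10},  # website only
    {"call_center": 5, "web": 7},      # both touchpoints
]
composites = [overall_composite(r) for r in respondents]
print(composites)
```

Each respondent contributes one overall score regardless of how many touchpoints they used, which is the aggregation effect described above; the weighting itself is where a PLS composite would differ.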
Dr. Jan-Michael Becker, BI Norwegian Business School, SmartPLS Developer
Researchgate: https://www.researchgate.net/profile/Jan_Michael_Becker
GoogleScholar: http://scholar.google.de/citations?user ... AAAJ&hl=de