### Control Variable

Posted: **Tue Mar 05, 2019 2:53 pm**

by **skr**

Hi,

Suppose age and gender are control variables in my model. Values for gender are as follows: 1=female, 2=male (nominal qualitative). Values for 5 different age groups are denoted by 1 to 5 (ordinal qualitative). Should I simply include both as single-indicator constructs in the same way, irrespective of their measurement level, or is some other modification required? What does a negative path coefficient for gender signify if the path is significant?

Thanks

### Re: Control Variable

Posted: **Wed Mar 06, 2019 1:11 pm**

by **jmbecker**

Gender is easy to include: treat it as a dummy variable (0/1) in a single-indicator construct (you can also leave it as 1/2, because it will be standardized anyway). The effect is then interpreted as the average difference in your DV between females and males, weighted by the sample sizes of the two groups (if both are equal, it is simply the difference). Hence, with your coding (1=female, 2=male), a negative coefficient means that on average your DV is lower for males than for females.
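To illustrate why the 1/2 coding can stay as it is, here is a minimal sketch using only the Python standard library. The data and the `standardize` helper are hypothetical and only mimic the standardization step; they are not SmartPLS code.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores (mean 0, standard deviation 1) for a list of numbers."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical gender data, coded 1 = female, 2 = male as in the question.
gender_12 = [1, 2, 2, 1, 1, 2, 1, 1]
# The same data recoded as a 0/1 dummy (0 = female, 1 = male).
gender_01 = [g - 1 for g in gender_12]

z_12 = standardize(gender_12)
z_01 = standardize(gender_01)

# Both codings give the same standardized values (up to floating-point error),
# so the estimated standardized path coefficient would not change.
same = all(abs(a - b) < 1e-12 for a, b in zip(z_12, z_01))
print(same)
```

Shifting every value by a constant changes the mean but not the spread, which is why both codings collapse to the same z-scores.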

Age groups are more complicated: if the categories of the ordinal variable are equidistant, you may use it as it is. Many people use Likert scales (5 or 7 rating points) directly in PLS as quasi-metric variables. If your age groups are not equidistant, you may need to create several dummies and add them all to the model as single-indicator constructs. The effects are then all interpreted relative to the reference group (the one group that does not get a dummy variable and is therefore always zero on all dummies).

### Re: Control Variable

Posted: **Wed Mar 06, 2019 1:21 pm**

by **skr**

Thank you Dr. Becker.

### Re: Control Variable

Posted: **Sat Mar 09, 2019 7:32 pm**

by **skr**

Dr. Becker,

If the values for 4 different age groups are coded as 18-26=1, 27-34=2, 35-50=3, 51 onward=4, could I skip creating dummies and include the variable as it is, as a single-indicator construct?

### Re: Control Variable

Posted: **Sat Mar 09, 2019 7:58 pm**

by **jmbecker**

That does not quite sound like equidistant categories, because the age ranges are of unequal width. You would need to justify that a change from 1 to 2 is conceptually the same as a change from 2 to 3 or from 3 to 4.

### Re: Control Variable

Posted: **Wed Mar 13, 2019 11:32 am**

by **skr**

Dr. Becker,

Please correct me if I am wrong. Since there are four different age groups, I have to create three dummies (e.g. Age1, Age2, Age3). I can take the first age group, i.e. 18-26, as the reference group, since most of the participants are from this group. I then include Age1, Age2, and Age3 in the model as single-indicator constructs; the first group doesn't get a dummy. I just want to ask whether the effect of the first group (not directly visible in the model) is embedded in the joint effect of the three dummies.

### Re: Control Variable

Posted: **Wed Mar 20, 2019 9:59 am**

by **jmbecker**

Yes, that is correct. The dummies Age1, Age2, Age3 each show the difference from the first group (the reference group), which does not have a dummy. Thereby, the effect of the reference group is embedded within the model.
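The dummy coding discussed here can be sketched in plain Python as follows. This is only an illustration under assumptions: the respondent data, the function `make_age_dummies`, and the column names `age_group_2` to `age_group_4` are hypothetical, and the resulting columns would still have to be added to the data file and modeled as single-indicator constructs.

```python
# Age-group codes from the thread: 1 = 18-26, 2 = 27-34, 3 = 35-50, 4 = 51 onward.
REFERENCE_GROUP = 1  # the group that does not get its own dummy

def make_age_dummies(age_codes, reference=REFERENCE_GROUP, groups=(1, 2, 3, 4)):
    """Return {column_name: list of 0/1} for every group except the reference."""
    dummies = {}
    for g in groups:
        if g == reference:
            continue  # reference group: zero on all dummies
        dummies[f"age_group_{g}"] = [1 if code == g else 0 for code in age_codes]
    return dummies

# Hypothetical respondents (not data from the thread).
age_codes = [1, 1, 2, 3, 1, 4, 2, 1]
dummies = make_age_dummies(age_codes)

for name, column in dummies.items():
    print(name, column)
```

Respondents in the reference group are zero on every dummy, so each dummy's coefficient is read as the difference between its group and the 18-26 group.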

### Re: Control Variable

Posted: **Wed Mar 20, 2019 10:03 am**

by **skr**

Thank you.

### Re: Control Variable

Posted: **Sat Mar 23, 2019 9:27 am**

by **skr**

Dr. Becker,

I am facing another problem. Bootstrapping does not produce a proper result (all values are zero) when I include the three dummies (age1, age2, age3) as control variables with the first age group, i.e. 18-26, as the reference group (the group with the most participants). However, bootstrapping works well when I use the last group, i.e. 51 onward=4, as the reference group (the group with the fewest participants, only 6 out of 581). Please comment.

### Re: Control Variable

Posted: **Sat Apr 13, 2019 1:42 pm**

by **jmbecker**

Dummy variables may cause problems with the bootstrapping procedure if they are unevenly distributed. That is because bootstrapping is a random sampling procedure: it samples with replacement from the original dataset. Hence, there might be subsamples in which one of your dummies contains only ones or only zeros, because only those observations were drawn. For example, if you have a dummy with only a few ones, then it can happen that the random procedure does not pick any of them, and you end up with a variable that has only zeros.

That creates a variable with zero variance, which cannot be estimated in a *standardized* regression, and that is what the PLS path model uses.
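As a rough back-of-the-envelope check, the chance of this happening can be computed for the numbers mentioned above (6 ones among 581 cases). The sketch below assumes each bootstrap sample draws 581 cases with replacement; the figure of 5000 bootstrap samples is an assumption for illustration, not something stated in this thread.

```python
def prob_all_zero(n_cases, n_ones):
    """Probability that a bootstrap sample of n_cases draws none of the n_ones cases."""
    return (1 - n_ones / n_cases) ** n_cases

n_cases = 581        # sample size reported in the thread
n_ones = 6           # participants in the "51 onward" group
n_bootstrap = 5000   # assumed number of bootstrap samples

p_single = prob_all_zero(n_cases, n_ones)
expected_failures = n_bootstrap * p_single
p_at_least_one = 1 - (1 - p_single) ** n_bootstrap

print(f"P(dummy is all zeros in one sample): {p_single:.4f}")
print(f"Expected affected samples out of {n_bootstrap}: {expected_failures:.1f}")
print(f"P(at least one affected sample): {p_at_least_one:.6f}")
```

A single sample is rarely affected (well under 1%), but across thousands of samples at least one zero-variance dummy becomes almost certain, which is consistent with the bootstrapping run failing.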