Endogeneity issues in recursive SEM (use copulas?)
Posted: Mon Sep 25, 2023 10:33 am
Consider the following structural equations:
store sales := f(store visits, x_1, x_2)
store visits := f(x_1, x_2)
where x_1 and x_2 denote promotional activity spending.
Note that x_1 and x_2 are not the only drivers of store visits; there is also a baseline level of store visits.
Assume that we model store sales and store visits as:
log(store sales) := dayofweekbaseline + dayofmonthbaseline + b_3 * log(x_1) + b_4 * log(x_2) + b_5 * \hat{log(store visits)}
where the hat denotes that log(store visits) is replaced by its predicted values from the equation below:
log(store visits) := dayofweekbaseline + dayofmonthbaseline + b_1 * log(x_1) + b_2 * log(x_2)
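One point that may be worth checking in this specification: if both equations use exactly the same dummy baselines and the same log(x_1), log(x_2) terms, and the visits equation is fit by OLS, then the fitted log(store visits) is an exact linear combination of regressors already in the sales equation, so b_5 is not identified at all. A minimal numpy sketch on simulated data checks this by comparing the rank of the second-stage design with its number of columns. The 30-day "month" calendar, coefficients and noise levels are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 365
t = np.arange(n)
dow = t % 7    # day of week
dom = t % 30   # hypothetical 30-day "months", for simplicity

def dummies(codes):
    """One-hot encode integer codes, dropping the first level as reference."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)

# Hypothetical promotional spend
log_x1 = np.log(rng.uniform(1, 100, n))
log_x2 = np.log(rng.uniform(1, 100, n))

# Regressors shared by both equations: intercept, baselines, log promo spend
Z = np.column_stack([np.ones(n), dummies(dow), dummies(dom), log_x1, log_x2])
log_visits = Z @ rng.normal(size=Z.shape[1]) + rng.normal(scale=0.3, size=n)

# Stage 1: OLS fit of log(store visits); keep the fitted values
beta1, *_ = np.linalg.lstsq(Z, log_visits, rcond=None)
log_visits_hat = Z @ beta1

# Stage 2 design: the same regressors plus the fitted values
X2 = np.column_stack([Z, log_visits_hat])
print("columns:", X2.shape[1], "rank:", np.linalg.matrix_rank(X2))
```

The rank comes out one short of the column count. A UCM baseline, or any regressor that enters only the visits equation, removes this exact collinearity, although identification then rests on that extra variation.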
Note that the baselines of store visits and store sales stem from the same underlying demand, which is unobserved, although we try to model it with, e.g., an unobserved components model (UCM) or a basic regression with dummies.
In the store-visits equation we aim to capture this demand through the baselines. However, when we plug the predicted values into the store-sales equation, the estimate of b_5 will be severely biased, because the predicted store visits now carry this underlying demand, which should have been attributed to the baselines in the store-sales equation.
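A small simulation of the mechanism described above, under assumed parameter values: one unobserved AR(1) demand series drives both log visits and log sales, and the sales regression cannot include it. For simplicity it regresses on observed log visits rather than on fitted values, so it illustrates the omitted-demand bias rather than reproducing the plug-in estimator exactly. The persistence, coefficients and noise scales are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
phi, b5_true = 0.9, 0.6  # hypothetical demand persistence and true effect

# Unobserved common demand: AR(1) process shared by visits and sales
demand = np.zeros(n)
for t in range(1, n):
    demand[t] = phi * demand[t - 1] + rng.normal(scale=0.3)

log_x1 = np.log(rng.uniform(1, 100, n))  # hypothetical promo spend
log_visits = 0.8 * demand + 0.3 * log_x1 + rng.normal(scale=0.2, size=n)
log_sales = (demand + b5_true * log_visits + 0.2 * log_x1
             + rng.normal(scale=0.2, size=n))

# Sales regression that omits demand (it is unobserved)
X = np.column_stack([np.ones(n), log_x1, log_visits])
beta, *_ = np.linalg.lstsq(X, log_sales, rcond=None)
b5_hat = beta[2]
print(f"true b_5 = {b5_true}, OLS b_5 = {b5_hat:.2f}")
```

Because the demand term sits in the sales error and is correlated with visits, the estimated b_5 absorbs part of the demand effect and lands well above 0.6.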
I've been thinking about using copulas as latent instrumental variables to address this issue. I wonder, however, whether normality tests on my predicted store-visit values would be sufficient. It bothers me that we assume normally distributed errors for the store-visits equation, but that does not imply that the predicted values themselves are normally distributed. Or am I missing something?
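A minimal sketch of the Gaussian copula control-function idea (as we understand the approach of Park and Gupta, 2012), on simulated data where p stands in for the endogenous regressor, e.g. log store visits. In this approach the normality assumption concerns the structural error, while the endogenous regressor itself must be non-normal; that is why the sketch checks the skewness of p rather than of any error or fitted values. The control term is Phi^{-1}(H(p)) with H the empirical CDF; dividing ranks by n + 1 is our assumption to keep H inside (0, 1). The baselines and other regressors are omitted for brevity, and all parameter values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def copula_control_term(p):
    """Phi^{-1}(H(p)) with H the empirical CDF (ranks / (n + 1), no ties assumed)."""
    n = len(p)
    ranks = np.argsort(np.argsort(p)) + 1  # ranks 1..n
    inv = NormalDist().inv_cdf
    return np.array([inv(r / (n + 1)) for r in ranks])

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def skewness(v):
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(1)
n, rho, b_true = 20_000, 0.6, 0.5

# Endogeneity via a Gaussian copula: u1 drives the regressor, and the
# normal structural error has correlation rho with u1.
u1 = rng.standard_normal(n)
error = rho * u1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
p = np.exp(u1)  # non-normal (log-normal) endogenous regressor
y = 1.0 + b_true * p + error

ones = np.ones(n)
b_ols = ols(np.column_stack([ones, p]), y)[1]
b_cop = ols(np.column_stack([ones, p, copula_control_term(p)]), y)[1]

print(f"skewness(p) = {skewness(p):.2f}")
print(f"OLS slope    = {b_ols:.3f}  (true {b_true})")
print(f"Copula slope = {b_cop:.3f}")
```

With these assumed values, plain OLS overstates the slope, while adding the control term should bring it close to 0.5. If p were itself normal, the control term would be (nearly) a linear function of p, and the slope would no longer be identified.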
Any tips on how to proceed?