Interaction background variable & predictor logistic regression

Rich Ulrich

2020-10-16 19:08:34 UTC

Post by ***@gmail.com
Hi,
I have a question about whether or not to include a significant interaction between a background variable (SES) and my predictor (Years in firm).
I have 2 background variables: Gender and SES. My main predictor is years in firm, and my outcome variable is whether people do or do not prefer to cooperate with colleagues. I am trying to build a logistic regression model.
I first built the following model.
block 1: Gender & SES. I entered these in the first block because I wanted to control for those background variables.
block 2: Years.
However, I found that the interaction term SES*Years is significant. Should I include it in my final model, or should I leave the model as described above?
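One way to put the question in numbers, sketched with hypothetical log-likelihoods rather than the poster's data: compare the block-2 model with the same model plus SES*Years using a likelihood-ratio test (1 degree of freedom for the single added term). Once the product term is in, the log-odds slope for Years is no longer one number but depends on SES. The function and coefficient names below are made up for illustration.

```python
import math


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for a model that adds one parameter (df = 1).

    For 1 degree of freedom the chi-square upper tail equals
    erfc(sqrt(stat / 2)), so no statistics library is needed.
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def years_slope(b_years, b_ses_years, ses):
    """Hypothetical helper: change in log-odds per extra year at a given SES.

    With logit(p) = b0 + b_gender*Gender + b_ses*SES + b_years*Years
                    + b_ses_years*SES*Years,
    the effect of Years depends on the value of SES.
    """
    return b_years + b_ses_years * ses


# Hypothetical log-likelihoods: block 2 vs. block 2 plus SES*Years.
stat, p = lr_test_1df(-520.0, -517.5)
print(round(stat, 2), round(p, 3))  # 5.0 0.025
```

A small p here only says the interaction is detectable; it says nothing about whether the slope differences across SES are large enough to matter.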

How big are the effects, alone and compared to each other?
What is the N?

What is taught in classes does not always mention that huge
Ns can make trivial effects "significant." When I've had
F-tests over 200 for main effects (say), I've been happy to ignore
barely significant tests of interaction, and to say so in the
write-up, of course.
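A small illustration of the large-N point, using a pooled two-proportion z-test instead of the F-tests mentioned above (a simpler, swapped-in test that suits a yes/no outcome). The difference is held at a trivial 50% vs. 52% "prefer to cooperate" while the hypothetical per-group N grows; the p-value shrinks anyway.

```python
import math


def two_prop_p(p1, p2, n):
    """Two-sided p-value of a pooled two-proportion z-test with n per group."""
    pooled = (p1 + p2) / 2  # simple average is the pooled rate when groups are equal
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (p2 - p1) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Same trivial effect, increasing sample size per group.
for n in (100, 1_000, 10_000, 100_000):
    print(n, two_prop_p(0.50, 0.52, n))
```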

What does the plot look like? I can readily imagine an interaction
that reflects /only/ the fact that "Years in firm" has been
treated as linear and equal-interval when the plot shows a
diminishing effect per year. That's just what /I/ would expect for
Years-in-firm: the step from 1 to 2 years matters far more than
the step from 30 to 31.
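On a log scale (or, less strongly, a square-root scale) the early step really is much larger than the late one. A minimal sketch; if "Years in firm" can be 0, something like log(years + 1) would be needed instead, which is an assumption not discussed in the post.

```python
import math


def step(transform, a, b):
    """Distance between two tenure values on a transformed scale."""
    return transform(b) - transform(a)


for name, f in (("raw", lambda t: t), ("sqrt", math.sqrt), ("log", math.log)):
    early = step(f, 1, 2)
    late = step(f, 30, 31)
    print(f"{name:>4}: 1->2 = {early:.3f}, 30->31 = {late:.3f}, ratio = {early / late:.1f}")
```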

Taking the log or square root of years would better reflect
"equal intervals" and could eliminate the significance of the
interaction. I admit that this consequence is not an obvious one.
What I am referring to is the fact that "bad modeling" of any
strong predictor variable tends to draw other variables and
interactions into the model, which then patch the error of the
direct model. Artifacts.
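The artifact can be shown exactly with a toy example. Assume, hypothetically, that the outcome depends only on log(years), that SES has no effect at all, and that the low-SES group simply has shorter tenure (years 1 to 20) than the high-SES group (years 10 to 30). To keep it exact and dependency-free, this uses ordinary least squares on a continuous outcome rather than logistic regression. With a binary SES, the model y ~ years + SES + SES*years is equivalent to fitting a separate line in each group, so the interaction coefficient is just the difference between the two slopes.

```python
import math


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical tenure ranges; SES itself has no effect on the outcome.
low_ses_years = list(range(1, 21))    # years 1..20
high_ses_years = list(range(10, 31))  # years 10..30


def interaction(transform):
    """SES*Years coefficient, i.e. slope(high SES) - slope(low SES).

    `transform` is applied to years before fitting; the true outcome is
    always log(years) in both groups.
    """
    lo_x = [transform(t) for t in low_ses_years]
    hi_x = [transform(t) for t in high_ses_years]
    lo_y = [math.log(t) for t in low_ses_years]
    hi_y = [math.log(t) for t in high_ses_years]
    return ols_slope(hi_x, hi_y) - ols_slope(lo_x, lo_y)


print(interaction(lambda t: t))  # raw years: clearly negative
print(interaction(math.log))     # log(years): zero up to rounding
```

With raw years the interaction comes out clearly negative, purely because a straight line is forced through a curve over two different ranges of tenure; with log(years) it vanishes.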

"Artifacts" are one reason that "Ranks" are disparaged for multiple-
variable models -- the non-Normal, not-natural scaling of Ranks
can invite other variables to reduce the error variance created by
the bad scaling. Then you see those artifacts as "significant."
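A sketch of what ranking does to spacing: whatever gaps exist are replaced by steps of exactly one, with tied values given the average of their ranks. The tenure values are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


years = [1, 2, 3, 4, 30]
print(ranks(years))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

The gap from 4 to 30 years becomes the same single step as the gap from 1 to 2.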

If the plot needs the interaction to describe it adequately for
whatever audience you are addressing, then you surely need
to keep the interaction in the model.

And whether or not this is exploratory work, you should
let your audience know how many tests you have run, etc.
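For a rough sense of why the count matters: with k independent tests each at level alpha, the chance of at least one false positive is 1 - (1 - alpha)^k. The Bonferroni correction (alpha / k per test) keeps that chance at or below alpha whether or not the tests are independent. A minimal sketch:

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test level that keeps the familywise error at or below alpha."""
    return alpha / k


for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 3), bonferroni_alpha(0.05, k))
```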

--
Rich Ulrich