Discussion:
Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12
tHatDudeUK
2005-04-17 14:51:27 UTC
Hi,

My sample size is 149. I have one dependent variable and 10
independent (or predictor) variables which I'm analysing using
multiple linear regression (with the enter method).

There are some assumptions of multiple regression I'm not sure how to
test for, including normality, homoscedasticity and linearity. I have
already considered collinearity (that's the easy one). Not sure if
I've missed any out. I've been told there's an easy graphical way to
check these but I'm not sure how. I think it might be
preferable to use a statistical test where possible, though.

If you can provide any pointers to where I should be looking I'd be
very grateful.

Thanks in advance,

Regards

Alan.
Reef Fish
2005-04-17 16:39:42 UTC
Post by tHatDudeUK
Hi,
My sample size is 149. I have one dependent variable and 10
independent (or predictor) variables which I'm analysing using
multiple linear regression (with the enter method).
There are some assumptions of multiple regression I'm not sure how to
test for, including normality,
Do a normal probability plot of the residuals.
Post by tHatDudeUK
homoscedasticity
Do a scatter plot of the residuals vs the FITTED dependent variable.

These two are assumptions about the ERRORS, being iid N(0, sigma-sq.),
so the third component of the assumption is INDEPENDENCE of the errors.

Here you need to do some sequence plots of the residuals vs fitted
values (possibly other variables as well).
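For anyone who wants the same pictures outside SPSS, here is a rough
sketch in Python (statsmodels + matplotlib) on synthetic data -- the
variable names and numbers are placeholders, not anything from this
thread:

import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(149, 3))                     # stand-in predictors
y = 2 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=149)

fit = sm.OLS(y, sm.add_constant(X)).fit()
resid, fitted = fit.resid, fit.fittedvalues

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# normality: normal probability (Q-Q) plot of the residuals
stats.probplot(resid, dist="norm", plot=axes[0])

# homoscedasticity: residuals vs the FITTED dependent variable
axes[1].scatter(fitted, resid, s=10)
axes[1].axhline(0, color="grey")
axes[1].set(xlabel="fitted values", ylabel="residuals")

# independence: sequence plot of the residuals, in observation order
axes[2].plot(resid, marker=".")
axes[2].axhline(0, color="grey")
axes[2].set(xlabel="observation order", ylabel="residuals")

plt.tight_layout()
plt.show()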
Post by tHatDudeUK
and linearity.
This is the assumption about the deterministic part of the model.
The linearity is about the regression COEFFICIENTS in the linear
model, or whether the data fits the postulated hyperplane.

A recent thread here discussed various (graphical) methods to examine
this model assumption.
Post by tHatDudeUK
I have already considered collinearity (that's the easy one).
How did you do it? It's not necessarily an easy one unless it's so
severe that the program bombs by trying to invert a near-singular
matrix. :-)
Post by tHatDudeUK
Not sure if
I've missed any out. I've been told there's an easy graphical way to
check these but I'm not sure how. I think it might be
preferable to use a statistical test where possible, though.
An effective graphical method is ALWAYS preferable to a statistical
test!
Post by tHatDudeUK
If you can provide any pointers to where I should be looking I'd be
very grateful.
Any GOOD textbook on APPLIED Regression Analysis should treat ALL of
the above (and more) GRAPHICAL methods, each aimed at detecting
a specific departure of your data from the functional (linear) and
probability (iid N(0, sigma-sq)) assumptions.
Post by tHatDudeUK
Thanks in advance,
Nada.

-- Bob.
tHatDudeUK
2005-04-17 17:34:16 UTC
On 17 Apr 2005 09:39:42 -0700, "Reef Fish"
Post by Reef Fish
Post by tHatDudeUK
I have already considered collinearity (that's the easy one).
How did you do it? It's not necessarily an easy one unless it's so
severe that the program bombs by trying to invert a near-singular
matrix. :-)
Hmmm I read a book which suggested multicollinearity occurs when
collinearity tolerance values are less than or equal to 0.1. All mine
were above 0.4. This is probably too simplistic a method I assume?
Post by Reef Fish
Any GOOD textbook on APPLIED Regression Analysis should treat ALL of
the above (and more) GRAPHICAL methods, each aimed at detecting
specific departure of your data from the functional (linear) and
probability (iid N(0, sigma-sq)) assumptions.
Went to the library but Tabachnik and Fidell was missing :-( and it's
expensive on amazon :-(
Reef Fish
2005-04-17 18:27:38 UTC
Post by tHatDudeUK
On 17 Apr 2005 09:39:42 -0700, "Reef Fish"
Post by Reef Fish
Post by tHatDudeUK
I have already considered collinearity (that's the easy one).
How did you do it? It's not necessarily an easy one unless it's so
severe that the program bombs by trying to invert a near-singular
matrix. :-)
Hmmm I read a book which suggested multicollinearity occurs when
collinearity tolerance values are less than or equal to 0.1. All mine
were above 0.4. This is probably too simplistic a method I assume?
You guessed right.

I assumed your "collinearity tolerance values" had something to do
with the statistical significance of the INDIVIDUAL coefficients.

In a multivariable multicollinearity situation, you may have
X1 = aX2 + bX3 + cX4 + 10^(-10) (or however small the inexact
linear relation between the independent variables is), with
NONE of the pairwise correlations between (Xi, Xj) high enough
to be detected by the packaged software to warn the users.
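A quick simulation (mine, not from the thread) makes this concrete:
below, X1 is an almost exact linear combination of X2, X3 and X4, yet
no pairwise correlation looks alarming; only the tolerance of X1
(one minus its squared multiple correlation with the others) gives it
away.

import numpy as np

rng = np.random.default_rng(1)
n = 149
X2, X3, X4 = rng.normal(size=(3, n))
X1 = X2 + X3 + X4 + 1e-5 * rng.normal(size=n)     # near-exact linear relation

X = np.column_stack([X1, X2, X3, X4])
print(np.round(np.corrcoef(X, rowvar=False), 2))  # pairwise r's only ~ 0.6

# residuals of X1 on X2, X3, X4 -> tolerance of X1 is essentially zero
Z = np.column_stack([np.ones(n), X2, X3, X4])
e = X1 - Z @ np.linalg.lstsq(Z, X1, rcond=None)[0]
print("tolerance of X1:", e.var() / X1.var())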
Post by tHatDudeUK
Post by Reef Fish
Any GOOD textbook on APPLIED Regression Analysis should treat ALL of
the above (and more) GRAPHICAL methods, each aimed at detecting
specific departure of your data from the functional (linear) and
probability (iid N(0, sigma-sq)) assumptions.
Went to the library but Tabachnik and Fidell was missing :-( and it's
expensive on amazon :-(
Did they take a vacation to Cuba? :-) Surely there are better books
in APPLIED regression analysis than Tabachnik and Fidell. Besides, I
don't think you're up to the task of reading a book (even a good one)
on regression and multicollinearity to understand all the theory and
nuances behind the PRACTICE of how to do a multiple linear regression
well. JMHO. You need the watchful eye of someone who is experienced
in the problem for guidance and help.

-- Bob.
Thom
2005-04-19 12:28:04 UTC
Post by Reef Fish
I assumed your "collinearity tolerance values" has something to do
with the statistical significance of the INDIVIDUAL coefficients.
No, tolerance just refers to the degree to which predictors are
independent of other predictors. If tolerance is high (close to 1),
there is little evidence of collinearity; tolerance close to zero is to
be avoided. Values in between reflect degrees of collinearity. A sharp
cut-off for tolerance should be avoided, as collinearity is not an
all-or-nothing thing.

Thom
Reef Fish
2005-04-19 13:05:27 UTC
Post by Thom
Post by Reef Fish
I assumed your "collinearity tolerance values" has something to do
with the statistical significance of the INDIVIDUAL coefficients.
No, tolerance just refers to the degree to which predictors are
independent of other predictors. If tolerance is high (close to 1)
there is little evidence of collinearity. tolerance close to zero is to
be avoided. Values in between reflect degrees of collinearity. A sharp
cut-off for tolerance should be avoided as collinearity is not an all
or nothing thing.
Thom
Thanks for the clarification.

Can you elaborate on HOW this tolerance is defined and/or calculated
in a multiple regression of Y on 10 independent variables Xs, say?

-- Bob.
Ray Koopman
2005-04-20 08:09:11 UTC
Post by Reef Fish
Can you elaborate on HOW this tolerance is defined and/or calculated
in a multiple regression of Y on 10 independent variables Xs, say?
In a multiple regression context, the "tolerance" of a predictor =
1 - (its squared multiple correlation with the other predictors).

I think this usage originated in the 60s, in a program (BMD?) that
asked the user to specify how small a pivot value could get before
being considered to be zero.
Bruce Weaver
2005-04-20 11:18:44 UTC
Post by Ray Koopman
Post by Reef Fish
Can you elaborate on HOW this tolerance is defined and/or calculated
in a multiple regression of Y on 10 independent variables Xs, say?
In a multiple regression context, the "tolerance" of a predictor =
1 - (its squared multiple correlation with the other predictors).
I think this usage originated in the 60s, in a program (BMD?) that
asked the user to specify how small a pivot value could get before
being considered to be zero.
And variance inflation factor (VIF) is 1/tolerance. I tried to post
this link yesterday, but my news server was not cooperating.

http://www2.chass.ncsu.edu/garson/pa765/regress.htm#toleranc
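A minimal sketch (Python, not SPSS; the data are synthetic) of those
two definitions -- tolerance as 1 minus the squared multiple
correlation of a predictor on the rest, and VIF as its reciprocal:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(149, 4))
X[:, 0] += 0.8 * X[:, 1]                          # build in some collinearity

for i in range(X.shape[1]):
    others = sm.add_constant(np.delete(X, i, axis=1))
    r2 = sm.OLS(X[:, i], others).fit().rsquared   # squared multiple corr.
    tol = 1 - r2
    print(f"X{i + 1}: tolerance = {tol:.3f}, VIF = {1 / tol:.2f}")

(statsmodels also ships this ready-made, as
statsmodels.stats.outliers_influence.variance_inflation_factor.)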
--
Bruce Weaver
***@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
Reef Fish
2005-04-20 15:18:04 UTC
Post by Reef Fish
Post by Ray Koopman
Post by Reef Fish
Can you elaborate on HOW this tolerance is defined and/or calculated
in a multiple regression of Y on 10 independent variables Xs, say?
In a multiple regression context, the "tolerance" of a predictor =
1 - (its squared multiple correlation with the other predictors).
Not a bad indicator for the "degree of multicollinearity due to the
linear dependence of a predictor on the OTHER predictors".

It's just another way of looking at the "near singularity", "ill-
conditioning", determinant, eigenvalues, etc. of the (X'X) matrix.
Post by Reef Fish
Post by Ray Koopman
I think this usage originated in the 60s, in a program (BMD?) that
asked the user to specify how small a pivot value could get before
being considered to be zero.
That sounds right. I am sure Wil Dixon (ASA Fellow 1955) and his BMD
development staff are well familiar with all of the theory and methods
we are talking about now.
Post by Reef Fish
And variance inflation factor (VIF) is 1/tolerance. I tried to post
this link yesterday, but my news server was not cooperating.
http://www2.chass.ncsu.edu/garson/pa765/regress.htm#toleranc
That's not a bad link, which could be called "Multiple Regression
for Dummies" (I used that in a complimentary way). :-)


*> When there are two or more independents, the b coefficient is
*> a partial regression coefficient, though it is common simply
*> to call it a "regression coefficient"

Right! It reflects the PARTIAL correlation information between Y and
Xi
given all the other Xj,j.ne.i in the model.


The link discusses the role of partial correlations and their relation
to a multiple regression coefficient in a rather obscure way:

*> From statistical output for regression, how may we derive the
*> significance of a partial correlation coefficient like rYX1.X2X3??

I would have answered it this way, which is the way I wrote in my
unpublished Data Analysis textbook (Notes) over 30 years ago:

It is the SIMPLE correlation between two sets of residuals: the
residuals of regressing Y on X2 and X3; and the residuals of
regressing X1 on X2 and X3, in theory and in computation!

This is INTUITIVE. You remove the part of Y that is FITTED (the
word "explained" promotes abuse) by X2 and X3. You remove the
part of X1 that is FITTED by X2 and X3. The simple linear
relation between these two sets of residuals is precisely what
the PARTIAL correlation is about.

In the MR of Y on X1, X2, and X3, the fitted coefficient of X1 =

rYX1.X2X3 times sqrt( (partial variance of Y, given X2 and X3) /
(partial variance of X1, given X2 and X3) ),

completely analogous to b = rxy(sy/sx) in a simple regression!
(with Y and X replaced by two specific sets of residuals)


In my chapter (which I could not locate now <G>), I made the
point that ALL Multiple Rs and ALL partial correlations are
SIMPLE correlations between two sets of values -- you simply have
to identify WHICH sets of values those are.


In the GENERAL multiple linear regression case of Y on X1, ..., Xk

the regression coefficient of Xi, for i = 1, 2, ..., k,

is (up to a positive scale factor) the partial correlation between Y
and Xi, given all Xj, j.ne.i,
which is the SIMPLE correlation between

the RESIDUALS of Y regressed on all Xj, ..., j.ne.i
and the RESIDUALS of Xi regressed on all Xj, ..., j.ne.i


The above is the EXPLICIT descriptions/expressions of the partial
correlations I've been talking about throughout this and related
threads.

The Multiple R of that regression is the SIMPLE correlation between
the observed Y and the FITTED Y, or corr(Y, Y-hat).

Your link says,

*> The significance will be the same as for the regression coefficient,

*> b sub YX1.X2X3

which doesn't really explain anything. Of course the significance of
b is the same as the significance of the partial correlation, because
each b is just a "non-negative scale factor" times the partial
correlation, which is WHY ....

drum roll please, maestro ...

The SIGN of the multiple regression coefficient of Xi is the SAME as
the SIGN of the partial correlation between Y and Xi, given all the
OTHER Xj, for j not equal to i!
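Both of these relations -- the Multiple R as corr(Y, Y-hat) above, and
the sign statement -- are easy to check numerically. Here is a small
sketch of mine in plain numpy, on synthetic data with made-up
variable names:

import numpy as np

rng = np.random.default_rng(3)
n, k = 149, 4
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([0.5, -1.0, 0.2, 2.0]) + rng.normal(size=n)

def ols(y, X):
    """Coefficients (without intercept) and fitted values of y on X."""
    Z = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    return b[1:], Z @ b

coefs, yhat = ols(y, X)
print("multiple R =", np.corrcoef(y, yhat)[0, 1])       # = sqrt(R-squared)

for i in range(k):
    others = np.delete(X, i, axis=1)
    ey = y - ols(y, others)[1]                # residuals of Y on the other Xs
    ei = X[:, i] - ols(X[:, i], others)[1]    # residuals of Xi on the other Xs
    partial_r = np.corrcoef(ey, ei)[0, 1]
    print(i, np.sign(coefs[i]) == np.sign(partial_r))   # True for every i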

-- Bob.
Reef Fish
2005-04-20 16:35:33 UTC
Post by Reef Fish
In my chapter (which I could not locate now <G>), I made the
point that ALL Multiple Rs and ALL partial correlations are
SIMPLE correlations between two sets of values -- you simply have
to identify WHICH sets of values those are.
I should have acknowledged that what I posted here and what I had
written in that chapter were the results of numerous statisticians,
who had written about the Analysis of Residuals, such as Frank
Anscombe, Tukey; and various authors who had written about the
use of the SWEEP operator in linear models computation (which
actually facilitated theoretical discussion of the related
topics).

Beaton (1964; SWP in his dissertation), Marty Schatzoff in
his development of COMB (console-oriented model building),
Jim Goodnight, in his various articles on the use of SWEEP in
SAS computation, as well as other authors of Multiple Regression
theory and methodology, too numerous to enumerate. These
include: Box, Chambers, Chatterjee, Cook, David Cox, Dempster,
Efron, Graybill, Hadi, Hartigan, Hoaglin, Hocking, Mallows,
Mosteller, Neter, Rubin, Velleman, Weisberg, and Welsch, all
of whom are Fellows of the ASA and all made significant
contributions to the APPLICATION of statistics in general, and
the application of regression methods in particular.


The knowledge and understanding is NOT something one can pick up
from an SPSS or SAS or whatever computer software manual and then
expect to understand the role various quantities play and how they
are inter-related.

The ABUSE of the "expected SIGN" by social scientists and other
practitioners of computer-software supported statistical methodology
is the penalty we pay for the illusion that reading a chapter from
a brain-surgeon's manual is sufficient for one to do brain surgery.

-- Bob.
Aleks Jakulin
2005-04-20 18:58:56 UTC
Post by Reef Fish
This is INTUITIVE. You remove the part of Y that is FITTED (the
word "explained" promotes abuse) by X2 and X3. You remove the
part of X1 that is FITTED by X2 and X3. The simple linear
relation between these two sets of residuals is precisely what
the PARTIAL correlation is about.
There is something one should be careful about here, and that I haven't
yet seen explicated: it's how the covariance matrix used for working out
the partial correlation coefficients is obtained. Usually, the
covariance matrix is the maximum *joint* likelihood one. This matrix is
different from the maximum *conditional* likelihood one that's implicit
in multiple regression.

If someone wants to toy around with the difference between maximizing
conditional likelihood (or least squares) and maximizing joint
likelihood (or orthogonal least squares), this applet is pretty neat:

http://mathcs.holycross.edu/~spl/java/LeastSquare/
--
mag. Aleks Jakulin
http://kt.ijs.si/aleks/
Department of Knowledge Technologies,
Jozef Stefan Institute, Ljubljana, Slovenia.
Reef Fish
2005-04-21 03:16:20 UTC
Post by Aleks Jakulin
Post by Reef Fish
This is INTUITIVE. You remove the part of Y that is FITTED (the
word "explained" promotes abuse) by X2 and X3. You remove the
part of X1 that is FITTED by X2 and X3. The simple linear
relation between these two sets of residuals is precisely what
the PARTIAL correlation is about.
This result assumes the usual Conditional model of Y given X, and the
Ordinary Least Squares solution/estimation to the regression problem.
Post by Aleks Jakulin
There is something one should be careful about here, and that I haven't
yet seen explicated: it's how the covariance matrix used for working out
the partial correlation coefficients is obtained.
I can't give you any precise attribution of the facts in my paragraph
above because I don't have any reference books. Besides, the FACTS
came from many different sources, and the stated self-contained results
relating all Multiple and Partial correlations to simple correlations,
AFAIK, are found only in my unpublished Lecture Notes for my Data
Analysis course.

But the stated result can be EITHER the theoretical model (without
estimating anything) version or the OLS estimated quantities in a
multiple regression.

The simplest way to actually SEE that the relation holds is to take
ANY multiple regression package to fit Y on X2 and X3; calculate
the residual vector, and its sample std dev., denoting it by sy.23.

Then do the same thing, regressing X1 on X2 and X3; calculate the
residuals vector, and its sample std dev., denoting it by s1.23.

Finally, calculate the simple correlation r between those two sets of
residuals.

r * ( sy.23/s1.23 )

should be EXACTLY the multiple regr. coefficient of X1, when you
regress Y on X1, X2, and X3.

The THEORETICAL/model version of the above is analogous.

IMHO, the beauty of this conceptual AND empirical method is its
simplicity.
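Transcribed into Python (numpy only; any OLS routine would do, and
the data here are synthetic), the recipe reads:

import numpy as np

rng = np.random.default_rng(4)
n = 149
X1, X2, X3 = rng.normal(size=(3, n))
Y = 2 + 1.5 * X1 - 0.7 * X2 + 0.4 * X3 + rng.normal(size=n)

def resid(y, *xs):
    """Residuals of y regressed (with intercept) on the given predictors."""
    Z = np.column_stack([np.ones(n)] + list(xs))
    return y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

ey = resid(Y, X2, X3)                  # residual vector of Y on X2, X3
e1 = resid(X1, X2, X3)                 # residual vector of X1 on X2, X3
r = np.corrcoef(ey, e1)[0, 1]          # simple correlation of the residuals
sy23, s123 = ey.std(ddof=1), e1.std(ddof=1)

full = np.linalg.lstsq(np.column_stack([np.ones(n), X1, X2, X3]), Y,
                       rcond=None)[0]
print(r * sy23 / s123, full[1])        # the two numbers agree, up to rounding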


Here's a CONCEPTUAL exercise to see if you understand the phenomenon --
which is an EXERCISE (and sometime EXAM question) I give to students
of the course after these results are discussed and presented.

Exercise. Suppose you have a computer program which can do ONLY
simple regressions. Show how that simple regression program can be
used to compute the Multiple Regression of Y on X1, X2, ..., Xk,
and get exactly the same result as you would if you had an OLS
program to perform the multiple regression.

Hint: Each coefficient of a higher order regression requires TWO
lower-order regressions and one simple correlation between the
residuals.
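For what it's worth, here is one possible sketch of a solution in
Python (numpy only, my own function names), following the hint
literally: each coefficient is built from two lower-order regressions
plus one simple slope between the two residual vectors. The recursion
is exponential in k, so it is purely conceptual, not a practical
algorithm.

import numpy as np

def simple_slope(y, x):
    """Slope of the simple regression of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

def residuals(y, X):
    """Residuals of y on the columns of X, using only simple regressions."""
    if X.shape[1] == 0:
        return y - y.mean()
    b = np.array([coefficient(y, X, i) for i in range(X.shape[1])])
    fitted = X @ b
    return y - (fitted + (y.mean() - fitted.mean()))    # add the intercept

def coefficient(y, X, i):
    """Coefficient of column i of X: two lower-order regressions plus one
    simple regression between the two residual vectors (the hint)."""
    others = np.delete(X, i, axis=1)
    return simple_slope(residuals(y, others), residuals(X[:, i], others))

# check against an ordinary OLS fit
rng = np.random.default_rng(5)
X = rng.normal(size=(60, 3))
y = 1 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=60)
print([coefficient(y, X, i) for i in range(3)])
print(np.linalg.lstsq(np.column_stack([np.ones(60), X]), y, rcond=None)[0][1:])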

-- Bob.
Reef Fish
2005-04-21 03:50:09 UTC
The subject was: Simple, Partial, and Multiple Correlations explained.
Post by Reef Fish
Post by Reef Fish
This is INTUITIVE. You remove the part of Y that is FITTED (the
word "explained" promotes abuse) by X2 and X3. You remove the
part of X1 that is FITTED by X2 and X3. The simple linear
relation between these two sets of residuals is precisely what
the PARTIAL correlation is about.
This result assumes the usual Conditional model of Y given X, and the
Ordinary Least Squares solution/estimation to the regression problem.
It has just occurred to me, after explaining explicitly how
the multiple regression coefficient of a single variable depends
on ALL other variables in the model, that this is an excellent
ALTERNATE way to view the "expected sign" fallacy.
Post by Reef Fish
The simplest way to actually SEE that the relation holds is to take
ANY multiple regression package to fit Y on X2 and X3; calculate
the residual vector, and its sample std dev., denoting it by sy.23.
Then do the same thing, regressing X1 on X2 and X3; calculate the
residuals vector, and its sample std dev., denoting it by s1.23.
Finally, calculate the simple correlation r between those two sets of
residuals.
r * ( sy.23/s1.23 )
should be EXACTLY the multiple regr. coefficient of X1, when you
regress Y on X1, X2, and X3.
Here's a CONCEPTUAL exercise to see if you understand the phenomenon --
which is an EXERCISE (and sometime EXAM question) I give to students
of the course after these results are discussed and presented.
This conceptual exercise should convince one of the ABSURDITY of
stating (a priori) that the "expected sign" of the coefficient
of X1, say, is POSITIVE, when there are DOZENS of variables
in the multiple regression model.
Post by Reef Fish
Exercise. Suppose you have a computer program which can do ONLY
simple regressions. Show how that simple regression program can be
used to compute the Multiple Regression of Y on X1, X2, ..., Xk,
and get exactly the same result as you would if you had an OLS
program to perform the multiple regression.
Hint: Each coefficient of a higher order regression requires TWO
lower-order regressions and one simple correlation between the
residuals.
For a multiple regression model with dozens of independent variables,
the SIGN of any of the coefficients can be determined ONLY from
such a "pyramid" of dozens of levels, of which the "expected-sign"
guesser has done/observed none.

It's equally absurd to think that one can argue (a priori) what to
"expect" in this 30-story (or more or less) pyramid -- the SIGNS at
each level of the pyramid, on residuals never observed --
which would have been REQUIRED to justify any "expected sign"
statement.

Perhaps this will help some of the "expected sign" folks THINK
harder about whether they really know what to "expect", once they
realize the theoretical and empirical mechanism of the process called
Multiple Regression Analysis which they thought they knew.

-- Bob.
Richard Ulrich
2005-04-21 21:59:54 UTC
On 20 Apr 2005 20:50:09 -0700, "Reef Fish"
<***@Yahoo.com> wrote:


[]
Post by Reef Fish
This conceptual exercise should convince one the ABSURDITY of
stating (a priori) that the "expected sign" of the coefficient
of X1, say is POSITIVE, say, when there are DOZENS of variables
in the multiple regression model.
Oh, my. You think I have been defending an "expected sign"
when there are DOZENS of variables in the multiple
regression model. No. I haven't seen people present
models that big, either. I thought we were clearly talking
about, say, six to ten, at most.


I said, many times, we pay very CAREFUL attention to
variables for a model; I never said "dozens" or imagined
a model with that many. I could not give so much care,
to THAT many. If your complaint depends on "dozens",
it could have been mentioned back then.

Perhaps it would have come up earlier, too, if you had
not ignored attempts I made (we made) to explore your
objections in related contexts.

So, here is a negative clue -- what does *not* separate us.

[snip, rest]
--
If you have a Newsreader that follows conventions....
Note: a double-dash line (above) usefully causes a Reply
to be truncated at that mark. My reader had trouble picking
up the whole message I was quoting because there was a
double-dash line in it, from previous wrap-around of a quoted
line. I got past it by high-lighting the whole message
before hitting Reply.
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-22 01:32:26 UTC
Post by Richard Ulrich
On 20 Apr 2005 20:50:09 -0700, "Reef Fish"
[]
Post by Reef Fish
This conceptual exercise should convince one the ABSURDITY of
stating (a priori) that the "expected sign" of the coefficient
of X1, say is POSITIVE, say, when there are DOZENS of variables
in the multiple regression model.
Oh, my. You think I have been defending an "expected sign"
when there are DOZENS of variables in the multiple
regression model. No. I haven't seen people present
models that big, either.
If you defended those "expected sign" abusers talking about six
to ten variables, you are in the same boat as those talking about
dozens. You made the SAME blunders.

And don't flatter yourself that I was talking only to you.

Newsgroups: comp.soft-sys.stat.spss, sci.stat.math
From: jim clark <***@uwinnipeg.ca> -
Date: Sun, 17 Apr 2005 23:27:43 -0500
Subject: Re: The "expected sign" of a multiple regression coefficient

JC> In Streisguth's work on the
JC> influence of maternal alcohol consumption on subsequent
JC> children's iqs, for example, the predicted (and observed) sign
JC> for the alcohol variable is negative and remains so even with
JC> dozens of other predictors in the equation.
JC> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Post by Richard Ulrich
I said, many times, we pay very CAREFUL attention to
variables for a model;
You did say that many times. But your "careful attention"
was clearly mis-directed, ill-founded, and nebulously stated,
because you had NOTHING to explain how you or anyone else arrived
at the "expected sign", in the light of the present thread on
partial correlations relative to the pyramid of simple correlations,
and the correlations between RESIDUALS which none of you "expected-
sign" abusers had ever seen!
Post by Richard Ulrich
I never said "dozens" or imagined a model with that many.
Your present remark reminded me of the real or apocryphal story
about George Bernard Shaw and his conversation with a woman ...

You're like the woman who would sleep with Shaw for $1,000,000 but
said "what do you think I am" when offered $20. Shaw's alleged reply,

"We already know WHAT you are, we are just haggling now."
Post by Richard Ulrich
I could not give so much care,
to THAT many. If your complaint depends on "dozens",
it could have been mentioned back then.
Mosteller and Tukey showed that even with two or three you CANNOT do
it (a priori). We had a recent thread in which someone wanted to
know how to look at a 1000 x 1000 scatter matrix for his "data
mining". "Social scientists" throw dozens of predictor variables into
a regression package more often than you think. They are as successful
as YOU are in the abuse of "expected sign" with 6 independent variables
(when sometimes you don't even KNOW what those OTHER variables are,
let alone how the DOZENS of residuals in your unexplored simple
regressions relate to each other!)

THAT was my point in using the CONCEPTUAL scheme in the "pyramid" of
simple correlations as an ALTERNATE view of the "expected sign" fallacy.
Post by Richard Ulrich
[snip, rest]
If you have a Newsreader that follows conventions....
Note: a double-dash line (above) usefully causes a Reply
to be truncated at that mark. My reader had trouble picking
up the whole message I was quoting because there was a
double-dash line in it, from previous wrap-around of a quoted
line. I got past it by high-lighting the whole message
before hitting Reply.
Your problem with your Newsreader is the LEAST of all of your
problems in a discussion.

I read and post from groups.google.com, which archives ALL posts
in newsgroups since 1981.

-- Bob.
Richard Ulrich
2005-04-22 02:30:39 UTC
- A digression about "newsreaders". No statistics.

On 21 Apr 2005 18:32:26 -0700, "Reef Fish"
Post by Reef Fish
Post by Richard Ulrich
On 20 Apr 2005 20:50:09 -0700, "Reef Fish"
[snip, Bob discussing MR, signs; I thought he said
(elsewhere?) that he was done with that.]

Below - This was my note-to-interested readers, who might
not know why a certain message was truncated when
they attempted to Reply to it. I told why, and what to do.

I could have been more redundant, but I thought it was
awfully wordy as it was.
Post by Reef Fish
Post by Richard Ulrich
If you have a Newsreader that follows conventions....
Note: a double-dash line (above) usefully causes a Reply
to be truncated at that mark. My reader had trouble picking
up the whole message I was quoting because there was a
double-dash line in it, from previous wrap-around of a quoted
line. I got past it by high-lighting the whole message
before hitting Reply.
RF >
Post by Reef Fish
You problem with your Newsreader is the LEAST of all of your
problems in a discussion.
(Can't you stop trying to pick a fight?)

My lines above had been preceded by "-- ".
If your Google-groups reader did not try to strip off the
lines above, then it doesn't follow the long-standing
convention. That's one convention that might be
disappearing, even without Google's choice (assuming
they've made that choice). A lot of the posts that quote
me do include my .sig, even though I always put in "-- ".
Post by Reef Fish
I read and post from groups.google.com, which archives ALL posts
in newsgroups since 1981.
I look there for old posts, of course. My ISP keeps notes for
several months. Reading daily, I never had any concern
about old posts with my previous ISP, who only kept posts
on their server for a week. My reader downloads everything
new, and saves whatever I elect to save.

From what I've read about Google's group-reading, it takes a
couple of extra steps to Reply with the preceding message
there at all; and the server strips leading blanks from lines when
you are trying to post that way -- which annoys folks who
try to post functional computer code.

Anything to add about that?
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-22 05:40:21 UTC
Post by Richard Ulrich
- A digression about "newsreaders". No statistics.
Then you should have changed the subject, as I did, for you.
That's one of the Netiquette Rules in USENET.
Post by Richard Ulrich
I could have been more redundant, but I thought it was
awfully wordy as it was.
RF >
Post by Reef Fish
You problem with your Newsreader is the LEAST of all of your
problems in a discussion.
(Can't you stop trying to pick a fight?)
A simple statement of personal opinion. Why use your newsreader
as your excuse for missing what others posted?
Post by Richard Ulrich
My lines above had been preceded by "-- ".
If your Google-groups reader did not try to strip off the
lines above, then it doesn't follow the long-standing
convention.
Complain to Google-groups.com, if you wish.
Post by Richard Ulrich
Post by Reef Fish
I read and post from groups.google.com, which archives ALL posts
in newsgroups since 1981.
I look there for old posts, of course.
"Archives" mean old posts. When I am travelling, sometimes I don't
read any ng for weeks, and when I return, I read any "old" posts
in CONTEXT, in completely threaded threads, as if I had read them the
day before, whether the post is 2 months or 2 years old.
Post by Richard Ulrich
My ISP keeps notes for several months.
I have no problem with folks who PREFER to use whatever Newsreaders
they use. But when they complain about what OTHERS use, such as
google, then it's something else. ANYONE should learn how to use
google for archive-retrieval.
Post by Richard Ulrich
From what I've read about Google's group-reading, it takes a
couple of extra steps to Reply with the preceding message
there at all; and the server strips leading blanks from lines when
you are trying to post that way -- which annoys folks who
try to post functional computer code.
Anything to add about that?
When in Rome, do as the Romans do.

"When thou enter a city, abide by its customs." -- The Talmud.


Nothing in the sci.stat.math ng tells readers WHICH newsreader
they should use.


When groups.google.com FORCED readers to use the new beta version,
I disliked its clumsiness and complained for several days about it.
That was mostly because of MY unfamiliarity with the new version.

Then the folks in rec.games.bridge pointed out I could do EVERYTHING
in the beta version as I did in the old version -- single line
with subject and author, chronologically arranged, etc., so that I
could easily scan what to read and what to skip. I was satisfied
that the new beta groups.google is still far better than any other
ng reader I have used, on UNIX, Windows, and Mac, and the
archive-retrieval with "advanced groups search" is invaluable!

The "Find messages by this author" link -- an optional piece of info
with any post -- is valuable too. That gives me instantly some idea
about the experience/expertise/interests of the poster. When
someone who has been reading ngs for 3 weeks tells others who have
been around for years what they've been doing wrong, it is a pretty
good sign that the poster is a Clueless Newbie, in the parlance
of newsgroup netiquette.

There are many other advantages in using google to read posts.

But that's getting OT on an OT topic.

-- Bob.
Richard Ulrich
2005-04-22 18:25:22 UTC
On 21 Apr 2005 22:40:21 -0700, "Reef Fish"
<***@Yahoo.com> wrote:
[... ]
me >
Post by Reef Fish
Post by Richard Ulrich
My lines above had been preceded by "-- ".
If your Google-groups reader did not try to strip off the
lines above, then it doesn't follow the long-standing
convention.
RF >
Post by Reef Fish
Complain to Goggle-groups.com. if you wish.
umm. I wasn't thinking through all the details before,
but your "Google-groups reader" would be on your
own computer -- a web browser: Netscape or Internet
Explorer for most people.

I don't think that either of them act as Newsreaders,
whereas Newsreading is a function that can be added
on to "mail" programs like Outlook or Outlook Express --
at least one of those can read Newsgroups. As of a
couple of years ago, their threading, etc., was not nearly
as good as what you got from a reader designed as a
reader, such as Forte Agent, or Opera.
- Google is providing the "newsreading" and threading,
and so on. It sounds like they do it better than OE.

My reader shows me text with or without word-wrap,
in fixed or proportional font, depending on settings.
When I post a Reply, it shortens the lines and word-wraps,
and inserts " >" (or what I select) at the front, and it
shortens the message if there was a line that was "--"
in column 1 followed by nothing.

It seems to me that Google would not do word-wrap or
other editing of what it passed along to your browser
when you try to do a Reply -- that would be left to your
browser.
So, Google might be unlikely to shorten at "--", and your
browser wasn't designed with Newsreading conventions
in mind.

[snip, some]
Thanks for these comments -
Post by Reef Fish
When gorups.google.com FORCED readers to use the new -beta version,
I disliked its clumsiness and complained for several days about it.
That was mostly becaues of MY unfamiliarity of the new version.
Then the folks in rec.games.bridge pointed out I could do EVERYTHING
in the beta version as I did in the old version -- sincle line
with subjet and author, chronologically arranged, etc., so that I
could easily scan what to read and what to skip, I was satisfied
that the new beta groups.google is still far better than any other
ng readers I have used, on UNIX, Windows, and Mac, and the
archive-retrieval with "advanced groups search" is invaluable!
The "Find messages by this author" -- an optional info with any
post -- is valuable too. That gives me instantly some idea
about the experience/expertise/interests of the poster. When
someone has been reading ngs for 3 weeks tells other who have
been around for years what they've been doing wrong is a pretty
good sign that the poster is a Clueless Newbie, in the parlance
of newsgroup nettiquette.
There are many other advanteges in using google to read posts.
But that's get OT on an OT topic.
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-22 20:54:52 UTC
Post by Richard Ulrich
On 21 Apr 2005 22:40:21 -0700, "Reef Fish"
[... ]
umm. I wasn't thinking through all the details before,
but your "Google-groups reader" would be on your
own computer -- a web browser: Netscape or Internet
Explorer for most people.
I don't think that either of them act as Newsreaders,
You should have done a little bit more research before making
statements like that, which are your unsubstantiated opinion, not
fact.
QUOTE
2. What is a Usenet Newsgroup?

Usenet refers to the distributed online bulletin board system begun in
1979 at Duke University. Usenet users can post messages in newsgroups
that can be read or contributed to by anyone with access to the
Internet and special newsreader software. Over the years, the number of
newsgroups has grown to the thousands, hosted all over the world and
covering every conceivable topic.

Google Groups contains the world's most comprehensive archive of
postings to Usenet, dating back to 1981.

Google Groups eliminates the need for newsreading software
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
END QUOTE
Post by Richard Ulrich
It seems to me that Google would not do word-wrap
See the 5-lines of word-wrapped paragraph above.

-- Bob.
Richard Ulrich
2005-04-23 21:17:54 UTC
On 21 Apr 2005 18:32:26 -0700, "Reef Fish"
[... ]
Post by Reef Fish
Your present remark reminded me of the real or apocryphal story
about George Bernard Shaw and his conversation with a woman ...
You're like the woman who would sleep with Shaw for $1,000,000 but
said "what do you think I am" when offered $20. Shaw's alledged reply,
"We already know WHAT you are, we are just haggling now."
I'm a little puzzled. There's a contradiction.

What I grasped immediately from the metaphor of the
story does not seem consistent with "everything else."

From the story, I wanted to conclude that you are
opposed to drawing any scientific conclusions - any,
whatsoever - from "observational studies" or convenience
samples. That epistemological stance was used to defend
tobacco against critics, up through the 1950s. And it was
a fair defense, at a time when the evidence was badly arrayed.
Fisher, I think, was a notable critic of early associational
studies that blamed smoking for lung cancer and heart disease.

Then there were a couple of fine statements, laying out how
to use epidemiological evidence, and the position disappeared.

You have been willing to use prediction equations, without
any attribution of meaning (I guess) to any of the terms.
If you are just saying that we don't have to understand
*all* the coefficients... shoot, we have no argument there.

The practice of epidemiology is essentially an exercise
in direct contradiction to your mantra: there are always
multiple terms, and most of them must be interpretable.
We start with an association, and bring in every "reasonable"
factor that might explain it. Sometimes there are arguments
about what has "reason." This is true for looking at
chronic diseases, or for looking at an epidemic of
stomach-aches.

The single-predictor association is the *starting* place
for figuring out what might be going on. ("People got
sick from eating at Chi-Chi's" was a local example.)


Your example of prediction of course grades was not
"scientific" in the sense of drawing any conclusions about
the terms. Is that the use you see for MR -- No conclusion
about the terms? Only, specific predictions for a population
that is well-circumscribed?

What do you say about epidemiology?
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-23 23:02:23 UTC
Post by Richard Ulrich
On 21 Apr 2005 18:32:26 -0700, "Reef Fish"
[... ]
Post by Reef Fish
Your present remark reminded me of the real or apocryphal story
about George Bernard Shaw and his conversation with a woman ...
You're like the woman who would sleep with Shaw for $1,000,000 but
said "what do you think I am" when offered $20. Shaw's alledged reply,
"We already know WHAT you are, we are just haggling now."
I'm a little puzzled. There's a contradiction.
What I grasped immediately from the metaphor of the
story does not seem consistent with "everything else."
I am sorry if the intended analogy/metaphor wasn't clear.

The "haggling" referred to your statement that you did not support
those "expected sign" abusers who use DOZENS of variables in their
model, because you wouldn't use more than 6-10.

That "WHAT you are" is -- the "expected sign" abuser.

I went on to explain that even if there are only THREE independent
variables in the model, Mosteller and Tukey explained why you
CANNOT know what sign to expect on X1, without knowing the OTHER
carriers.

Those who talk about "expected sign" without explicit mention of other
variables (or without explaining how they arrived at the "expected
sign" in the presence of the other variables) would have to explain HOW
they know how various RESIDUAL vectors would behave or relate to
each other BEFORE they did any regression!

The ALTERNATE method helps one focus on the "correlation" pyramid.

You would have to know the correlation between two sets of residuals
just to know the sign of a FIRST ORDER partial correlation, such
as r x1.x2.

So, with 6 independent variables, you would have to know the signs
of 15 sets of simple correlations between different sets of residuals
just to know the required FIRST-order PARTIAL correlations,
r xi.xj, for all i .ne. j.

THEN you have to argue HOW you know a priori all the signs of the
SECOND order partial correlations r xi.xjxk, for all i.ne.j.ne.k.

Then you have to argue HOW you know a priori, ...the THIRD order
partial correlations, and the 4th order, and the 5th order PARTIAL
correlations.
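(For what it's worth, that pyramid can be written down as a short
recursion -- a sketch of mine using the standard recursion formula,
not anything from the thread: every partial correlation of order m is
built from three partial correlations of order m-1, so the signs at
the top depend on every level below.)

import numpy as np

def partial_r(i, j, given, R):
    """r_ij.given, built recursively from the simple-correlation matrix R."""
    if not given:
        return R[i, j]
    k, rest = given[0], given[1:]
    rij = partial_r(i, j, rest, R)
    rik = partial_r(i, k, rest, R)
    rjk = partial_r(j, k, rest, R)
    return (rij - rik * rjk) / np.sqrt((1 - rik ** 2) * (1 - rjk ** 2))

rng = np.random.default_rng(6)
data = rng.normal(size=(149, 6)) @ rng.normal(size=(6, 6))   # 6 correlated Xs
R = np.corrcoef(data, rowvar=False)
print(partial_r(0, 1, (2, 3, 4, 5), R))    # a 4th-order partial correlation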


If you had understood my "correlation pyramid" CONCEPTUAL exercise,
you would have realized the above, as well as WHY any a priori
"expected sign" is NOT tenable, and those who say they know
what the "expected sign" of X1 is merely show that:

1. They mistake the sign to reflect the SIMPLE correlation
between X1 and the dependent variable Y.

2. They are DEFICIENT in their understanding of both the theory
and the methodology of Multiple Regression Analysis,
irrespective of what FIELD or what DATA they misapply their
Multiple Regression on.


< off target comments snipped >
Post by Richard Ulrich
What do you say about epidemiology?
See (2) above, if anyone uses the "expected sign" fallacy in any
Multiple Regression.

-- Bob.
Richard Ulrich
2005-04-24 21:19:48 UTC
On 23 Apr 2005 16:02:23 -0700, "Reef Fish"
Post by Reef Fish
Post by Richard Ulrich
On 21 Apr 2005 18:32:26 -0700, "Reef Fish"
[... ]
Post by Reef Fish
Your present remark reminded me of the real or apocryphal story
about George Bernard Shaw and his conversation with a woman ...
[snip, restatement of how hard it is to build models
and know the algebraic signs... ]

So far, you have stated again and again that it is tough to
make numerical models with interpretations. I have agreed
with that, again and again. Except, you believe it is *impossible*,
whereas I believe that it is a conventional methodology.
(I also grant it is one that is subject to much lousy application).
The "correlation pyramid" is a new point; it illustrates
why the choice and scaling of variables is important
(as I've said before). It is a good thing to be as orthogonal
as possible.

I liked the Tukey quote and example -- To my reading, it only
confirmed the degree that I go, "it is difficult." (I really
enjoyed the Mosteller-Tukey book years ago, so I don't think
it offers a different style of support to your thesis. I will look
at it again whenever I have my books out of their boxes.)


So, Bob. It was my observation that Epidemiology
seemed to defeat all those difficulties. If you read
what I posted, I gave some details, and suggested a
couple of historical examples. I agree, it is not *always*
possible to build a useful model, because the variables
may be too fuzzy, or the data may be too confounded.
But history shows *me* that useful models have been
built, so the effort is not a total waste.
Post by Reef Fish
< off target comments snipped >
Let's see, across several Replies to me, this is at
least the 3rd time you have used something like that,
in response to my attempts to probe the issue at hand.
The direct implication, to me, is that you have less
interest in "discussion" than in lecturing.
Post by Reef Fish
Post by Richard Ulrich
What do you say about epidemiology?
See (2) above, if anyone uses the "expected sign" fallacy in any
Multiple Regression.
Isn't that a "Don't accept it"?

- This coyness seems to imply that you really, really
don't want to make open admissions that ruin your "case"
since these folks always want to interpret their main
parameters. (And, you don't want to overtly state the fact
you won't accept epidemiological studies.)




Do you reject Observational studies on principle?

If you accept only some of them, can you define *which*,
either by pointing to examples, or describing principles?
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-24 21:57:58 UTC
Post by Richard Ulrich
On 23 Apr 2005 16:02:23 -0700, "Reef Fish"
Post by Reef Fish
Post by Richard Ulrich
On 21 Apr 2005 18:32:26 -0700, "Reef Fish"
[... ]
Post by Reef Fish
Your present remark reminded me of the real or apocryphal story
about George Bernard Shaw and his conversation with a woman ...
[snip, restatement of how hard it is to build models
and know the algebraic signs... ]
But THAT is the ONE and ONLY key issue under consideration -- the one
you have been steadfastly trying to evade through your own confusion
and obfuscation.
Post by Richard Ulrich
So far, you have stated again and again that is it tough to
make numerical models with interpretations.
So far, you have misquoted me on EVERY post of yours. And you EVADED
the issue that you CANNOT know what "sign" of a multiple regression
coefficient to "expect".
Post by Richard Ulrich
The "correlation pyramid" is a new point; it illustrates
why the "expected sign" abusers are committing a FALLACY.
Post by Richard Ulrich
why the choice and scaling of variables is important
(as I've said before).
That's entirely IRRELEVANT to the point of "expected sign". Scaling
would not change an iota. Choice of variable would not tell you
anything about the expected sign either.

See, you don't COMPREHEND what you read. And instead of paying a
little more attention and reading some more of the STANDARD literature
on the subject of Multiple Regression, by competent statisticians,
you continue to waste your time defending yourself and other QUACKS.
Post by Richard Ulrich
It is a good thing to be as orthogonal as possible.
Except for complete orthogonality, "as orthogonal as possible" is NOT
a consideration of the "expected sign" abusers, nor does it make
the "expected sign" argument any more valid!

Just tell me HOW you can "expect" all the partial correlation SIGNS,
first order, second order, third order, ..., involving ALL of the
"almost orthogonal independent variables".



Richard, you are wasting your time, everyone's time, arguing through
your hat, on the theory and methods of Multiple Regression you have
not learned properly, and have been abusing.

Just start doing things and THINKING about those signs the PROPER
way, rather than wasting the rest of your life making the same blunders
you've been making.

Better late than never -- as they say -- in learning how to do
Multiple Regression CORRECTLY and PROPERLY.
Post by Richard Ulrich
But history shows *me* that useful models have been
built, so the effort is not a total waste.
You are again evading the issue that an "expected sign" in a multiple
regression is an INVALID notion when the user is UNEDUCATED about the
sign being that of a PARTIAL correlation, one that depends on ALL
the other independent variables, and not the intuitive SIMPLE
correlation sign.

The fact that some abusers found useful models is beside the point
of what is the CORRECT interpretation and CORRECT way of doing things.
If a useful model is found, the "expected sign" fallacy, if used, is
nothing but a worthless, irrelevant red herring.
Post by Richard Ulrich
Post by Reef Fish
< off target comments snipped >
Let's see, across several Replies to me, this is at
least the 3rd time you have used something like that,
in response to my attempts to probe the issue at hand.
No, when your response did NOT address the issue of "expected sign"
and went off on tangents of irrelevant issues, that's when I made those
statements.
Post by Richard Ulrich
The direct implication, to me, is that you have less
interest in "discussion" than in lecturing.
When it comes to what is CORRECT, and what is 100% wrong, you can say
that. There is NOTHING to discuss when you don't know the BASICS of
the theory and methods of regression, and continue ARGUING when
the "solution manual" has been shown to you, and the same TRUTH has
been explained in several different ways.
Post by Richard Ulrich
Do you reject Observational studies on principle?
See, you keep bringing up these rhetorical questions that have NOTHING
to do with the issue of the "expected sign" FALLACY.
Post by Richard Ulrich
If you accept only some of them, can you define *which*,
either by pointing to examples, or describing principles?
I do not IMMEDIATELY REJECT those that don't VIOLATE the theory and
methods of Multiple Regression. Some are good and some are useless.
That evaluation comes later.

I IMMEDIATELY reject the validity of anyone who use the "expected
sign" fallacy because it shows the "researcher's" LACK of expertise
and LACK of credibility in their understanding of the methodology used.

I don't waste time on QUACKS once one has been identified to be a QUACK.

Comprendez?

-- Bob.
Richard Ulrich
2005-04-25 03:00:22 UTC
- forty minutes between the date on my post and
the date on this. RF, it shows the signs of haste.
Or indifference.

On 24 Apr 2005 14:57:58 -0700, "Reef Fish"
Post by Reef Fish
Post by Richard Ulrich
On 23 Apr 2005 16:02:23 -0700, "Reef Fish"
[...]

RU > >
Post by Reef Fish
Post by Richard Ulrich
So far, you have stated again and again that is it tough to
make numerical models with interpretations.
RF >
Post by Reef Fish
So far, you have misquoted me on EVERY post of yours.
Damn! I thought you once taught classes!
Aren't you happier to hear a re-statement of the lesson
in the student's own words, rather than parroting of phrases?
Even if he gets something wrong? Communication has
to be two-way to be efficient, right?

If I've "misquoted" in "EVERY post", you have failed to show
me my error, before. You fail again, here. I don't see any
substantive distinction between the *cases* based on what's
above, and what you have said. Echoing the same critique
does not add anything. Please clarify?

It seems to me to be as trivial as the difference between
Multiple regression as the analysis tool of psychologists
(who make some of the sloppiest models) and Logistic regression
as the analysis tool of epidemiologists; they show the same
confounding through covariances.


RF > And you EVADED
Post by Reef Fish
the issue that you CANNOT know what "sign" of a multiple regression
coefficient to "expect".
Gee, if you read what I said about epidemiology,
you know that I addressed it. They take it plumb
seriously. Sometimes they learn something new, but
that's a fair game in everyone's book. It's the *arbitrary*
wrong sign that has to be explained, and they try to
explain it, and are not satisfied until they do.

I assume that it is fair to take an Observational arena like
psychology and replace it with another one where the
gains have been clearer. Success in epidemiology may
be credited to variables with higher reliability, and to the fact that
the medical solutions are more tangible than the
psychological ones. Does your *fundamental* complaint
reduce to the argument that psychologists (or other social
scientists) have relatively lousy variables?

You see, I am trying to *generalize*, and to find the
*principles* of your argument - if there are any.
It seems to me that you are taking a well-recognized
*difficulty* and elevating it to an impossibility.

You are saying "This cannot be done." I am saying,
"Hasn't it been, already?"

[...]
RU >
Post by Reef Fish
Post by Richard Ulrich
why the choice and scaling of variables is important
(as I've said before).
RF >
Post by Reef Fish
That's entirely IRRELEVANT to the point of "expected sign". Scaling
would not change an iota. Choice of variable would not tell you
anything about the expected sign either.
Okay, here is a lesson for you about scaled variables
of symptoms (psychiatry). A regression with a reasonable
criterion, stepping up, can put in a variable whose *function*
is to account for the bad scaling of a previous variable:
You look at plots, and you see that the top end *without*
the variable in question is under- or over-predicting; and
the variable added *fixes* a few outliers ... perhaps, while
adding some new outliers.

Next lesson: After a few variables have been entered in
a stepwise fashion, *most* of the partial coefficients of
the variables-not-entered are decreased. It is not uncommon
to see a whole set of F-to-Enter that are rather less than 1.0,
and they go lower when another variable is put in.

That's what can happen to your pyramid of partial-correlations
when you start with a *careful set* of meaningful variables.
Truly random variables will act more like what you describe.
That is a problem of overfitting, and why I've said through the
years that "stepwise" with a bunch of variables does not lead
to intelligible solutions.

[...]
Post by Reef Fish
Richard, you are wasting your time, everyone's time, arguing through
your hat, on the theory and methods of Multiple Regression you have
not learned properly, and have been abusing.
But, I conclude from your previous statements, the vast
majority of social scientists are doing it my way, or
hoping to learn to. I think it is still useful for me to
refine my statement of what these principles are; I haven't
been through these arguments before, or I expect I
would have gotten bored with your repetition.
- I had not noticed before that the modeling of
epidemiologists serves as a such good example for what
other social scientists are doing, or trying to do.

[...]

RU >
Post by Reef Fish
Post by Richard Ulrich
Do you reject Observational studies on principle?
RF >
Post by Reef Fish
See, you keep bringing up these rhetorical questions that has NOTHING
to do with the issue of the "expect sign" FALLACY.
I *think* that rejection of Observational studies, which
you are just a hair away from, would put you on the outs
with 99.9% of all scientists.

RU >
Post by Reef Fish
Post by Richard Ulrich
If you accept only some of them, can you define *which*,
either by pointing to examples, or describing principles?
RF >
Post by Reef Fish
I do not IMMEDIATE REJECT those that don't VIOLATE the theory and
methods of Multiple Regression. Some are good and some are useless.
That evaluation comes later.
That's a weaseling response, isn't it?

RF >
Post by Reef Fish
I IMMEDIATELY reject the validity of anyone who use the "expected
sign" fallacy because it shows the "researcher's" LACK of expertise
and LACK of credibility in their understanding of the methodology used.
[ ... ]

You seem to lack experience in epidemiology and its
methodology of the last 40 years, so far as I can tell.
<sarcasm > Is it okay what they do, so long as they don't
use certain words when they describe it? </sarcasm>

Back up to my parallel: (Isn't it parallel?) Epidemiologists
insist on making sense of terms in their multiple-variables
models: the direction and also the magnitude. That is *all*
the variables in epidemiology, even more so than in psychology.
- Now, they are frequently wrong in what gets in the media,
even though the experts recognize what is potentially due
to overfitting, capitalizing on chance, or the omission of
important variables. I'm not blindly admiring of epidemiology.
But, aren't their reports (whether right or wrong) *precisely*
like what you object to "in principle" in multiple regression?

If this is the comparison - I rank their success as a more
fruitful lesson than your total faith in your unique rule-of-thumb.
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-25 03:18:11 UTC
Permalink
Post by Richard Ulrich
- forty minutes between the date on my post and
the date on this. RF, it shows the signs of haste.
Or indifference.
It shows efficiency in pinpointing what your problem has been,
as I had repeatedly done the SAME so many times before, to
your deaf ears, about your misunderstanding and ABUSE of the
"expected sign" practice.
Post by Richard Ulrich
If I've "misquoted" in "EVERY post", you have failed to show
me my error, before.
I did so every time. Your "expected sign" ABUSE and FALLACY.
Post by Richard Ulrich
RF > And you EVADED
Post by Reef Fish
the issue that you CANNOT know what "sign" of a multiple regression
coefficient to "expect".
You see, I am trying to *generalize*, and to find the
*principles* of your argument -
You need to SPECIALIZE, to that one blatant BLUNDER of yours in
Multiple regression theory and methodology.

Take a refresher course in Multiple Regression Analysis or
Applied Linear Models, from a non-disreputable Statistics
Department, and learn about the SIGN of a multiple regression
coefficient before coming back here to spew further garbage and
pollution.

I've been VERY, VERY, patient with you, Richard. I am losing my
patience. If you were one of my students, I would have flunked
you half way through this "expected sign" thread.

-- Bob.
Reef Fish
2005-04-25 03:32:19 UTC
Permalink
Post by Reef Fish
Post by Richard Ulrich
- forty minutes between the date on my post and
the date on this. RF, it shows the signs of haste.
Or indifference.
It was only 18 minutes this time. The only reason it took so long
was that I was responding to a different post at 8:06, which took
a few minutes, and then responded to your post of vacuous
Multiple Regression "expected sign" SUBSTANCE (see subject),
which is deja vu.
Post by Reef Fish
It shows efficiency in pinpointing what your problem has been,
It didn't take more than a minute or two to see that you haven't
said anything of relevance.

All you have to do is go back and read my last DOZEN responses to
your posts and see that you've missed or evaded the POINT of
this subject ("expected sign") every time.

-- Bob.
Richard Ulrich
2005-04-25 18:37:43 UTC
Permalink
- If anyone else is still reading,
can I call for a referee's opinion? -
- Is Reef Fish evasive, or am I? Both?


On 24 Apr 2005 20:32:19 -0700, "Reef Fish"
<***@Yahoo.com> wrote:

[snip, personal slurs]
Post by Reef Fish
All you have to do is go back and read my last DOZEN responses to
your posts and see that you've missed or evaded the POINT of
this subject ("expected sign") every time.
In that most recent post by me -
===== start
RF > And you EVADED
Post by Reef Fish
the issue that you CANNOT know what "sign" of a multiple regression
coefficient to "expect".
(me) >
Gee, if you read what I said about epidemiology,
you know that I addressed it. They take it plumb
seriously. Sometimes they learn something new, but
that's a fair game in everyone's book. It's the *arbitrary*
wrong sign that has to be explained, and they try to
explain it, and are not satisfied until they do.
===== end
I'll expand on that - MODELS.
Most often the expected sign and value are near the univariate
one. Other values are okay if there's a model for it.

To use your word, I've only EVADED confessing to gross error,
so far as I can tell. Can you point to anything else?

To use it again, it seems to me that you have EVADED giving
an explanation of why your indictment of multi-variable
models in Observation studies does not extend, fully, to
epidemiology.


In your post of 5:57 p.m.,
====== start
me>
Post by Reef Fish
If you accept only some of them, can you define *which*,
either by pointing to examples, or describing principles?
RF >
I do not IMMEDIATE REJECT those that don't VIOLATE the theory and
methods of Multiple Regression. Some are good and some are useless.
That evaluation comes later.
====== end

It seems to me that your personal "theory and methods of
Multiple Regression" states firmly, over and over, that you
can't even be sure of the sign of a variable (not to mention
its value) without considering the highly-arbitrary effects
of all the other variables; which can become increasingly
STRONG as more variables are included.

So, to me, your statement there seems inconsistent with all
your other posts, where you say, "Some are good...."
Perhaps you would expand on that notion of what makes
them good, and our differences would disappear -
--
Rich Ulrich, ***@Pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-25 18:56:22 UTC
Permalink
Post by Richard Ulrich
- If anyone else is still reading,
can I call for a referee's opinion? -
- Is Reef Fish evasive, or am I? Both?
On 24 Apr 2005 20:32:19 -0700, "Reef Fish"
[snip, personal slurs]
Post by Reef Fish
All you have to do is go back and read my last DOZEN responses to
your posts and see that you've missed or evaded the POINT of
this subject ("expected sign") every time.
In that most recent post by me -
===== start
RF > And you EVADED
Post by Reef Fish
the issue that you CANNOT know what "sign" of a multiple regression
coefficient to "expect".
(me) >
Gee, if you read what I said about epidemiology,
you know that I addressed it. They take it plumb
seriously. Sometimes they learn something new, but
that's a fair game in everyone's book. It's the *arbitrary*
wrong sign that has to be explained, and they try to
explain it, and are not satisfied until they do.
===== end
That's rationalization. Not addressing how they ARRIVED
at the "expected sign" in the first place.
Post by Richard Ulrich
To use your word, I've only EVADED confessing to gross error,
so far as I can tell. Can you point to anything else?
You evaded ALL the partial correlation issue in the "expected
sign" abuse.
Post by Richard Ulrich
To use it again, it seems to me that you have EVADED giving
an explanation of why your indictment of multi-variable
models in Observation studies does not extend, fully, to
epidemiology.
Never said so, if they applied it CORRECTLY. I indicted only
the "expected sign" abusers. Epidemiology is irrelevant.
Post by Richard Ulrich
In your post of 5:57 p.m.,
====== start
me>
Post by Reef Fish
If you accept only some of them, can you define *which*,
either by pointing to examples, or describing principles?
RF >
I do not IMMEDIATE REJECT those that don't VIOLATE the theory and
methods of Multiple Regression. Some are good and some are useless.
That evaluation comes later.
====== end
It seems to me that your personal "theory and methods of
Multiple Regression" states firmly, over and over, that you
can't even be sure of the sign of a variable (not to mention
its value) without considering the highly-arbitrary effects
of all the other variables;
You are FINALLY getting that POINT!

That is a FACT based on the THEORY (of Multiple Regression Models)
which precedes any proper application of its methodology.


Post by Richard Ulrich
which can become increasingly
STRONG as more variables are included.
So, to me, your statement there seems inconsistent with all
your other posts, where you say, "Some are good...."
Perhaps you would expand on that notion of what makes
them good, and our differences would disappear -
There is absolutely NO inconsistency in anything I've posted on
this subject.

Get past the first step of learning about the "expected sign"
fallacy FIRST. After you take your remedial refresher course,
I'll be glad to discuss OTHER issues of Data Analysis and
Model Building, without charging you the tuition the Harvard
Dept of Statistics would charge you for such a course.
Post by Richard Ulrich
--
http://www.pitt.edu/~wpilib/index.html
-- Bob.
Anon.
2005-04-25 19:03:29 UTC
Permalink
Post by Richard Ulrich
- If anyone else is still reading,
can I call for a referee's opinion? -
- Is Reef Fish evasive, or am I? Both?
Well, I think Reef Fish is, if not actually being evasive, then not
fully engaging with your argument. Which is a pity, as it would help
the rest of us see explicitly what role he feels multiple regression
has, and how he feels it can be used.

Unfortunately, you've both reached the point where the rhetoric is
preventing the possibility of mutual understanding.

Bob
--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
Reef Fish
2005-04-26 02:12:47 UTC
Permalink
Richard Ulrich wrote: <at 11:36 am>
Post by Richard Ulrich
- If anyone else is still reading,
can I call for a referee's opinion? -
- Is Reef Fish evasive, or am I? Both?
That's really getting to the gutter/tabloid level for someone
who is deficient in his knowledge about Multiple Regression
theory and methods!

What referee? Anyone who understood what BOTH of us were saying
was and always is welcome to put in their comments one way or another,
without this pointless call, which served no purpose other than
getting an officious poster completely clueless about the SUBSTANCE
of the discussion to put in his NOISE <that's you, Anon, aka
Bob O'Hara>.


I replied to Ulrich at 11:56 am and gave an absolutely unequivocal
and non-evasive response to ALL of Ulrich's points, and concluded
with

RF> Get past the first step of learning about the "expected sign"
RF> fallacy FIRST. After you take your remedial refresher course,
RF> I'll be glad to discuss OTHER issue of Data Analysis and
RF> Model Building, without charging you the tuition the Harvard
RF> Dept of Statistics would charge you for such a course.
Well, I think Reef Fish is, if not actually being evasive, then not
fully engaging with your argument. Which is a pity, as it would help
the rest of us see explicitly what role multiple regression he feels
has, and how he feels it can be used.
That only gave an unmistakable self-indictment that you didn't
get the point of the "expected sign" thread. You had already proven
that fact the couple of times you jumped in the middle without a clue,
and I suspect you haven't read 1/10 of what I posted on the subject
and your understanding of the technical substance about THIS SUBJECT
(look at it: the "expected sign" fallacy in Multiple Regression)
is exactly ZERO, ZILCH.
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
From your LIFETIME posting history in sci.stat.math, it appears
that virtually ALL your postings concerned trivial undergraduate
questions in prob and stat. In the few times (topics) in which
I was in the same thread, your points were at best sophomoric,
if not vacuous.

I don't think at the time you posted your impertinence, you had
read my response to Ulrich, which appeared 7 minutes before
your post. You should read it.

This topic is far above your head, Bob O'Hara.

-- Bob.
Richard Ulrich
2005-04-26 04:40:47 UTC
Permalink
On 25 Apr 2005 19:12:47 -0700, "Reef Fish"
<***@Yahoo.com> wrote:
[... ]
Post by Reef Fish
I replied to Ulrich at 11:56 am and gave an absolutely unequivocal
and non-evasive response to ALL of Ulrich's points, and concluded
with
[... ]

My reader showed me exactly THESE lines in your response :

=== (deleting/ summarizing my lines)
( ... me, quoting my previous post where I described an origin
for "expected sign.")
That's rationalization. Not addressing how they ARRIVED
at the "expected sign" in the first place.
...
You evaded ALL the partial correlation issue in the "expected
sign" abuse.
( ... about epidemiology)
Never said so, if they applied it CORRECTLY. I indicted only
the "expected sign" abusers. Epidemiology is irrelevant.
( ... Bob says, can't know anything about coefficients)
You are FINALLY getting that POINT!

That is a FACT based on the THEORY (of Multiple Regression Models)
which precedes any proper application of its methodology.
( ... )
There is absolutely NO inconsistency in anything I've posted on
this subject.

Get past the first step of learning about the "expected sign"
fallacy FIRST. After you take your remedial refresher course,
I'll be glad to discuss OTHER issue of Data Analysis and
Model Building, without charging you the tuition the Harvard
Dept of Statistics would charge you for such a course.
==== end

That's cute - Here is the "full and unequivocal answer"
about the difference between epidemiology and other
Observational studies, and an explanation for why
they work differently from other studies - "irrelevant".

... a little weak on articulation, yes?

If that's the deepest you can go - you keep choking
at that point - then I guess that's the best you can do.
--
Rich Ulrich, ***@Pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-26 04:58:06 UTC
Permalink
Post by Richard Ulrich
On 25 Apr 2005 19:12:47 -0700, "Reef Fish"
[... ]
Post by Reef Fish
I replied to Ulrich at 11:56 am and gave an absolutely unequivocal
and non-evasive response to ALL of Ulrich's points, and concluded
with
[... ]
=== (deleting/ summarizing my lines)
( ... me, quoting my previous post where I described an origin
for "expected sign.")
That's rationalization. Not addressing how they ARRIVED
at the "expected sign" in the first place.
...
You evaded ALL the partial correlation issue in the "expected
sign" abuse.
( ... about epidemiology)
Never said so, if they applied it CORRECTLY. I indicted only
the "expected sign" abusers. Epidemiology is irrelevant.
( ... Bob says, can't know anything about coefficients)
You are FINALLY getting that POINT!
That is a FACT based on the THEORY (of Multiple Regression Models)
which precedes any proper application of its methodology.
( ... )
There is absolutely NO inconsistency in anything I've posted on
this subject.
Get past the first step of learning about the "expected sign"
fallacy FIRST. After you take your remedial refresher course,
I'll be glad to discuss OTHER issue of Data Analysis and
Model Building, without charging you the tuition the Harvard
Dept of Statistics would charge you for such a course.
==== end
You quoted me CORRECTLY. Especially the last paragraph.
Post by Richard Ulrich
That's cute - Here is the "full and unequivocal answer"
about the difference between epidemiology and other
Observational studies, and an explanation for why
they work differently from other studies - "irrelevant".
... a little weak on articulation, yes?
STILL did not address the question of "expected sign", did you?
Post by Richard Ulrich
If that's the deepest you can go - you keep choking
at that point - then I guess that's the best you can do.
Richard, you're choking yourself with IGNORANCE of the
Methodology of Multiple Regression, and with your inability to
address the "expected sign" abuse you've engaged in ever since you
started, and which you continue to evade.

You have proven beyond a shadow of a doubt your
INCOMPETENCE in the theory and methodology of Multiple
Regression Analysis.

You can quote me on this and show any of my posts about
your "expected sign" ABUSE and FALLCY, to ANY competent
statistician (there must be some at your university), and
tell us what they say, if you so desire.

Don't show the snippets you clip out of context or your
mis-paraphrase and misrepresentation. Show them the ENTIRE
TRANSCRIPT.

That's the best I can offer you.

-- Bob.
Richard Ulrich
2005-04-17 19:57:14 UTC
Permalink
On Sun, 17 Apr 2005 15:51:27 +0100, tHatDudeUK
Post by tHatDudeUK
Hi,
My sample size is 149. I have one dependent variable and 10
independent (or predictor) variables which I'm analysing using
multiple linear regression (with the enter method).
Stepwise regression is seldom a good idea. None of the tests
are good. For a typical problem, the wrong variables are chosen.
Where is it good? -- When you know there will be a prediction
from any set of the variables, and you want a short equation.

You can see my stat-FAQ for posted comments on the subject,
which I collected a few years ago. You can search Google Groups
in < sci.stat.* > for additional comments on Stepwise, and
for references.
Post by tHatDudeUK
There are some assumptions of multiple regression I'm not sure how to
test for, including normality, homoscedasticity and linearity. I have
Going in, the important thing to consider is that there are
no extreme outliers -- or gaps in the distributions.
That matters for the predictors and for the outcome.
Are there any "basement" or "ceiling" effects on scales?

The content of the variables and how they arose, etc., is
usually a pretty good guide to the main aspects of linearity.

Those assumptions matter for the purpose of a *test* rather
than the purpose of *doing* a linear regression; you can do
a regression with anything you have (barring extreme multi-
collinearity).
Post by tHatDudeUK
already considered collinearity (that's the easy one). Not sure if
I've missed any out. I've been told there's an easy graphical way to
do this but I'm not sure how to do this. I think it might be
prefereable to use a statistical test where possible though.
If you can provide any pointers to where I should be looking I'd be
very grateful.
...
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-17 20:54:41 UTC
Permalink
Post by Richard Ulrich
On Sun, 17 Apr 2005 15:51:27 +0100, tHatDudeUK
Post by tHatDudeUK
Hi,
My sample size is 149. I have one dependent variable and 10
independent (or predictor) variables which I'm analysing using
multiple linear regression (with the enter method).
Stepwise regression is seldom a good idea.
And stepping backwards is almost always better than stepping forward,
for the obvious reason that X1 and X2 may fit Y perfectly as a pair, but
neither will do as well by itself as X3.
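
Here is a minimal sketch of that situation (Python with numpy; an
invented illustration, not from any of the texts discussed here). A
one-variable-at-a-time screen ranks X3 ahead of X1 or X2, even though
the (X1, X2) pair fits Y almost exactly, so a forward search starts
from the wrong place, while a backward search that begins with the
full set can keep the pair and drop X3:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = x1 - x2                          # X1 and X2 fit Y perfectly as a pair
    x3 = y + rng.normal(size=n)          # X3 is only a noisy proxy for Y

    def r2(y, *cols):
        # R-squared of an OLS fit of y on an intercept plus the given columns
        X = np.column_stack([np.ones(len(y))] + list(cols))
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        return 1.0 - resid.var() / y.var()

    print("R2(Y ~ X1)      =", round(r2(y, x1), 3))       # about 0.5
    print("R2(Y ~ X2)      =", round(r2(y, x2), 3))       # about 0.5
    print("R2(Y ~ X3)      =", round(r2(y, x3), 3))       # higher than either alone
    print("R2(Y ~ X1 + X2) =", round(r2(y, x1, x2), 3))   # essentially 1.0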
Post by Richard Ulrich
Going in, the important thing to consider is that there are
no extreme outliers -- or gaps in the distributions.
This comment needs to be further clarified.

An extreme outlier can be in the (X-y) space or in the space of
residuals (observed errors).

In the older books, papers, and older software packages, undue
emphasis had often been placed on the observed error space only.
But some of those "outliers" near the center of the X-space may
have as little as ZERO influence on the fitted model (if it
coincides with X-bar).

Conversely, there may be points far from the center of X that
exert tremendous influence on the fitted model but would not
exhibit themselves as "outliers" in the residual space.


Jerry Dallal gave a simple expository explanation of these concepts
of "leverage points" and "influential points" in the link:

http://www.tufts.edu/~gdallal/diagnose.htm
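
To put rough numbers on the same distinction, here is a small sketch
(Python with numpy; a made-up one-predictor example of my own, not
taken from Dallal's page). The hat-matrix diagonal measures leverage
(distance from the center of the X-space), and Cook's distance
combines leverage with the residual to measure influence on the fit:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=50)
    y = 2.0 * x + rng.normal(scale=0.5, size=50)

    # point A: far out in the X-space and well off the underlying line
    # point B: at the center of the X-space and well off the line
    x = np.append(x, [30.0, 0.0])
    y = np.append(y, [0.0, 5.0])

    X = np.column_stack([np.ones_like(x), x])
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)      # leverages (hat diagonal)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                                   # raw residuals
    p = X.shape[1]
    s2 = (e @ e) / (len(y) - p)
    cooks = (e ** 2 / (p * s2)) * h / (1.0 - h) ** 2   # Cook's distances

    for name, i in [("A (far out in X)", -2), ("B (central in X)", -1)]:
        print(name, "leverage=%.2f" % h[i],
              "residual=%.1f" % e[i], "Cook's D=%.2f" % cooks[i])

Point A drags the fitted line toward itself, so its raw residual looks
unremarkable even though its Cook's distance is enormous; point B is
the obvious residual outlier but moves the fit comparatively little.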


The new subject area has been known as "regression diagnostics",
which covers much more than just the analysis of "outliers", e.g.
http://www.google.com/search?hl=en&q=textbook+on+regression+diagnostics

Cook and Weisberg (1982) is another standard reference text:
http://www.math.montana.edu/Rweb/Rhelp/influence.measures.html

Chatterjee, Hadi, and Price (1984?) is still another:
http://www.ats.ucla.edu/stat/examples/chp/default.htm

See also Chatterjee, S., and A. S. Hadi, "Influential Observations,
High Leverage Points, and Outliers in Linear Regression," Statistical
Science, 1:379-416, 1986.

and many other references in post-1980 literature.

-- Bob.
Reef Fish
2005-04-18 03:28:12 UTC
Permalink
Post by Reef Fish
An extreme outlier can be in the (X-y) space or in the space of
residuals (observed errors).
In the older books, papers, and older software packages, undue
emphasis had often been placed on the observed error space only.
But some of those "outliers" near the center of the X-space may
have as little as ZERO influence on the fitted model (if it
coincides with X-bar).
Conversely, there may be points far from the center of X that
exert tremendous influence on the fitted model but would not
exhibit themselves as "outliers" in the residual space.
Jerry Dallal gave a simple expository explanation of these concepts
http://www.tufts.edu/~gdallal/diagnose.htm
Having endorsed Jerry's expository explanation, I found a disturbing
statement with which I feel obliged to explain my complete disagreement:

JD> Belsley, Kuh, and Welch. Roy Welch tells of getting interested
JD> in regression diagnostics when he was once asked to fit models
JD> to some banking data. When he presented his results to his clients,
JD> they remarked that the model could not be right because the sign
JD> of one of the predictors was different from what they expected.

The "expected sign" of a (multiple) regression coefficient is the one
single ERROR most often committed by social scientists and economists in
their interpretation of regression coefficients.

Over the years, I have not found a SINGLE CASE in which a justification
was given (or even hinted at) of where the "expectation" of the expected sign
came from.

The ERROR was always that the user thinks of the sign of a multiple
regression coefficient as the sign of the SIMPLE correlation between
that X and Y, whereas the SIGN of the coefficient is the sign of the
PARTIAL correlation between that X and Y, GIVEN all the rest of the
independent variables in the regression!

I am skeptical of Jerry's attribution of the anecdotal account to Roy
Welsch, but I KNOW Belsley and Kuh (economists) made that ERROR numerous
times in their book.

One of the latest items in the national news is the NEW SAT exam, consisting of
Verbal, Quantitative, and Essay. Let's say those are THREE of some
10 independent variables used to predict (or fit) the GPR of admitted
college Freshmen after their first year, so that the data can be used
in FUTURE years (by the admissions office) to decide whether to admit
certain students based on their SAT scores and the info in the other
predictor variables (whatever they are). This is commonly done by
the admissions office of universities. In fact they have different
predicted success models for students in Engineering Colleges and
Liberal Arts and other Colleges.

Now what would you say is the "expected sign" of the coefficients of
the SAT verbal and SAT math variables?

If you say "positive", then you are WRONG!!!!! because you don't even
know what the other variables are. In the PRESENCE of some other
variables (such as another math achievement score), the coefficient
of the SAT Math variable MAY very well be NEGATIVE, because in the
PRESENCE of the other variable, the partial correlation between that
SAT math score and the GPR at the end of the Freshmen year may be
theoretically and empirically negative, and there would be nothing
wrong about that sign.

The MIS-interpretation of the "expected sign" of a multiple regression
coefficient gave rise to a flurry of papers on Ridge Regression, for
the sole purpose of making the observed sign "correct", when they could
give NO reason (or even know that those are not SIMPLE correlation
signs) why any sign should have a "positive" or "negative" expectation.

I have rejected more submitted journal papers based on that faulty and
false premise than you can imagine. But such misinterpretations are
EVERYWHERE in the applied journals of economics and social sciences.

VERY Important FACT in Multiple Regression
(about the SIGN of a regression coefficient)

It does NOT have the sign of the SIMPLE correlation between that X
and Y, unless the independent variables are completely and mutually
orthogonal to each other -- which is almost NEVER the case in the
observational data used in multiple regression analysis.

The "expected sign" is the "expected" (non mathematical) sign of
the PARTIAL CORRELATION (or the correlation between that X in the
presence of ALL the other X's in the regression equation.

Thus, both the "expected sign" and the magnitude of the
multiple regression coefficient change, depending on WHICH VARIABLES
are in the full equation.

I've seen the "expected sign" MIS-interpreted every time I've seen
that term used in a multiple regression context; I've NEVER seen
anyone argue why that sign is expected to be "positive" or
"negative" by arguing from a partial correlation point of view!

-- Bob.
jim clark
2005-04-18 04:27:43 UTC
Permalink
Hi
Post by Reef Fish
Having endorsed Jerry's expository explanation, I found a disturbing
JD> Belsley, Kuh, and Welch. Roy Welch tells of getting interested
JD> in regression diagnostics when he was once asked to fit models
JD> to some banking data. When he presented his results to his clients,
JD> they remarked that the model could not be right because the sign
JD> of one of the predictors was different from what they expected.
The "expected sign" of a (multiple) regression coefficient is the one
single ERROR most often committed by social scientists and economist in
their interpretation of regression coefficients.
Over the years, I have not found a SINGLE CASE in which a justification
was given (nor hinted) on where the "expectation" of the expected sign
came from.
The ERROR was always when the user think of the sign of a multiple
correlation coefficient as the sign of the SIMPLE correlation between
that X and Y, whereas the SIGN of the coefficient is the sign of the
PARTIAL correlation between that X and Y, GIVEN all the rest of the
independent variables in the regression!
If you are using expectation in the sense of hypothesized, your
statement would appear incorrect for many literatures that use
multiple regression. That is, many researchers predict a sign on
the basis of an hypothesized causal relationship between x1 and y
controlling for other predictors that could produce spurious
results or mask an observed effect. In Streisguth's work on the
influence of maternal alcohol consumption on subsequent
children's iqs, for example, the predicted (and observed) sign
for the alcohol variable is negative and remains so even with
dozens of other predictors in the equation.

Best wishes
Jim

============================================================================
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
Winnipeg, Manitoba R3B 2E9 ***@uwinnipeg.ca
CANADA http://www.uwinnipeg.ca/~clark
============================================================================
Reef Fish
2005-04-18 05:21:44 UTC
Permalink
Post by jim clark
Hi
Post by Reef Fish
Having endorsed Jerry's expository explanation, I found a
disturbing
Post by jim clark
Post by Reef Fish
JD> Belsley, Kuh, and Welch. Roy Welch tells of getting interested
JD> in regression diagnostics when he was once asked to fit models
JD> to some banking data. When he presented his results to his clients,
JD> they remarked that the model could not be right because the sign
JD> of one of the predictors was different from what they expected.
Post by jim clark
Post by Reef Fish
The "expected sign" of a (multiple) regression coefficient is the one
single ERROR most often committed by social scientists and
economist in
Post by jim clark
Post by Reef Fish
their interpretation of regression coefficients.
Over the years, I have not found a SINGLE CASE in which a
justification
Post by jim clark
Post by Reef Fish
was given (nor hinted) on where the "expectation" of the expected sign
came from.
The ERROR was always when the user think of the sign of a multiple
correlation coefficient as the sign of the SIMPLE correlation between
that X and Y, whereas the SIGN of the coefficient is the sign of the
PARTIAL correlation between that X and Y, GIVEN all the rest of the
independent variables in the regression!
If you are using expectation in the sense of hypothesized, your
statement would appear incorrect for many literatures that use
multiple regression.
Read my post again, more carefully. That's exactly what I mean --
that the researchers have no A PRIORI reason to hypothesize the
relation of the partial correlation information. They merely
mistook the expected sign of the SIMPLE correlation for that of
the partial correlation.
Post by jim clark
That is, many researchers predict a sign on
the basis of an hypothesized causal relationship between x1 and y
That's still another blunder in application, hypothesizing a CAUSAL
relation without doing a carefully designed controlled experiment.
Post by jim clark
In Streisguth's work on the
influence of maternal alcohol consumption on subsequent
children's iqs, for example, the predicted (and observed) sign
for the alcohol variable is negative and remains so even with
dozens of other predictors in the equation.
But can he argue A PRIORI why it should remain negative in the
presence of dozens of other variables?

It appears to be a classic case of the MISUSE and MIS-interpretation
of multiple regression coefficients to me.
Post by jim clark
Best wishes
Jim
============================================================================
Post by jim clark
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
CANADA http://www.uwinnipeg.ca/~clark
============================================================================

-- Bob.
jim clark
2005-04-18 20:43:19 UTC
Permalink
Hi
Post by Reef Fish
Post by jim clark
If you are using expectation in the sense of hypothesized, your
statement would appear incorrect for many literatures that use
multiple regression.
Read my post again, more carefully. That's exactly what I mean --
that the researchers have no A PRIORI reason to hypothesize the
relation of the partial correlation information. They merely
mistook the expected sign of the SIMPLE correlation as if it were
the partial correlation.
But right below I give the A PRIORI reason ... i.e., a theory!
Post by Reef Fish
Post by jim clark
That is, many researchers predict a sign on
the basis of an hypothesized causal relationship between x1 and y
That's still another blunder in application, hypothesizing CAUSAL
relation without doing a carefully designed controlled experiment.
No, you can hypothesize all kinds of causal connections. That's
what a theory is. The strongest test of them is through
controlled experiments.
Post by Reef Fish
Post by jim clark
In Streisguth's work on the
influence of maternal alcohol consumption on subsequent
children's iqs, for example, the predicted (and observed) sign
for the alcohol variable is negative and remains so even with
dozens of other predictors in the equation.
But can he argue A PRIORI why it should remain negative in the
presence of dozens of other variables?
She would argue that if the hypothesized causal connection
between maternal alcohol consumption and brain development is
correct, then the effect should remain significant even if we
control for a number of other potentially confounding predictors
(e.g., mother's education, medication, diet, mother's age,
smoking, ...). And that is what she finds. Supplementary
evidence comes from controlled animal experiments.
Post by Reef Fish
It appears to be a classic case of the MISUSE and MIS-interpretation
of multiple regression coefficients to me.
Not so to me. Multiple regression tells you whether or not
variation in a predictor independent of all other predictors in
the equation has any significant relationship to the criterion
variable. One use of this principle is to determine whether the
predicted effect of a predictor remains when potential
confounding variables are controlled statistically (i.e., their
covariation with the target predictor is removed).

Another use that again leads to predicted results is to control
statistically for inter-relationships among predictors that mask
their effects. In research on the contribution of aptitude and
study time to grades, for example, one often observes weak and
perhaps even slightly negative simple relationships between study
time and grades. But this is because aptitude and study time
tend to be negatively correlated (brighter students study less to
learn the material). The hypothesis that more study time
increases grades (other things being equal) predicts a positive
relationship between study time and grades when both aptitude and
study time are included as predictors. And that is exactly what
happens.
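
For what it is worth, that pattern is easy to reproduce in a toy
simulation (Python with numpy; made-up numbers chosen only to mimic
the verbal description above, not anyone's real data):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5000
    aptitude   = rng.normal(size=n)
    study_time = -0.6 * aptitude + 0.8 * rng.normal(size=n)   # brighter students study less
    grades     = 0.8 * aptitude + 0.3 * study_time + 0.5 * rng.normal(size=n)

    X = np.column_stack([np.ones(n), aptitude, study_time])
    b = np.linalg.lstsq(X, grades, rcond=None)[0]

    print("simple corr(study_time, grades)   =",
          round(np.corrcoef(study_time, grades)[0, 1], 3))        # weakly negative
    print("coef of study_time given aptitude =", round(b[2], 3))  # clearly positive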

There are surely innumerable examples in various literatures of
these uses of multiple regression and the researchers are
certainly NOT confusing simple and partial correlation.

Best wishes
Jim

============================================================================
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
Winnipeg, Manitoba R3B 2E9 ***@uwinnipeg.ca
CANADA http://www.uwinnipeg.ca/~clark
============================================================================
Reef Fish
2005-04-18 22:32:32 UTC
Permalink
Post by jim clark
Hi
Post by Reef Fish
Post by jim clark
If you are using expectation in the sense of hypothesized, your
statement would appear incorrect for many literatures that use
multiple regression.
Read my post again, more carefully. That's exactly what I mean --
that the researchers have no A PRIORI reason to hypothesize the
relation of the partial correlation information. They merely
mistook the expected sign of the SIMPLE correlation as if it were
the partial correlation.
But right below I give the A PRIORI reason ... i.e., a theory!
No. Your examples are of those who SPECULATE on the sign, when they
have no earthly idea why the partial correlation between two variables,
in the presence of DOZENS of other variables, should be the sign they
expect!

Then they INSIST on empirical models that fit that speculation.
Post by jim clark
Post by Reef Fish
Post by jim clark
That is, many researchers predict a sign on
the basis of an hypothesized causal relationship between x1 and y
That's still another blunder in application, hypothesizing CAUSAL
relation without doing a carefully designed controlled experiment.
No, you can hypothesize all kinds of causal connections.
Yup. The same abusers practice path-analysis and think that
observational data fitted to multiple regression ascertains CAUSAL
effect.
Post by jim clark
Post by Reef Fish
Post by jim clark
In Streisguth's work on the
influence of maternal alcohol consumption on subsequent
children's iqs, for example, the predicted (and observed) sign
for the alcohol variable is negative and remains so even with
dozens of other predictors in the equation.
But can he argue A PRIORI why it should remain negative in the
presence of dozens of other variables?
She would argue that if the hypothesized causal connection
Strike 1. Causation through regression?
Post by jim clark
between maternal alcohol consumption and brain development is
correct, then the effect should remain significant even if we
control for a number of other potentially confounding predictors
(e.g., mother's education, medication, diet, mother's age,
smoking, ...). And that is what she finds.
Strike 2. Garbage In, Garbage Out.
Post by jim clark
Supplementary
evidence comes from controlled animal experiments.
Strike 3. Cart before the Horse (is that what you mean by animal
experiment?). <g> Should have done the controlled
experiment FIRST.
Post by jim clark
Post by Reef Fish
It appears to be a classic case of the MISUSE and
MIS-interpretation
Post by jim clark
Post by Reef Fish
of multiple regression coefficients to me.
I stand on that remark.
Post by jim clark
Not so to me. Multiple regression tells you whether or not
variation in a predictor independent of all other predictors in
the equation has any significant relationship to the criterion
variable. One use of this principle is to determine whether the
predicted effect of a predictor remains when potential
confounding variables are controlled statistically (i.e., their
covariation with the target predictor is removed).
That's skirting around the issue of NOT UNDERSTANDING how to
interpret the multiple regression coefficients as having signs
of particular PARTIAL correlations.
Post by jim clark
There are surely innumerable examples in various literatures of
these uses of multiple regression and the researchers are
certainly NOT confusing simple and partial correlation.
Name ONE, among those who talk about "expected sign", especially
when the "conditioned variables" are not even identified.

Read my reply to Richard Ulrich on this same topic.
Post by jim clark
Best wishes
Jim
============================================================================
Post by jim clark
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
CANADA http://www.uwinnipeg.ca/~clark
============================================================================


-- Bob.
jim clark
2005-04-19 04:36:26 UTC
Permalink
Hi
Post by Reef Fish
Post by jim clark
But right below I give the A PRIORI reason ... i.e., a theory!
No. Your examples are those who SPECULATES on the sign, when they
have no earthly idea why the partial correlation between two variables,
in the presence of DOZENS of other variables, should be the sign they
expect!
You appear to be saying that no one is ever justified in
hypothesizing a causal relationship between two variables,
irrespective of how they test it. You cannot imagine a
rationale, for example, to warrant the expectation of increases
in grades with increases in study time (all other things being
equal)? Or a rationale to warrant the expectation of decreases
in children's intelligence with increases in maternal alcohol
consumption (other things being equal)? It is completely
irrelevant to these expectations how the researchers plan to test
them.
Post by Reef Fish
Then they INSISTS on empirical models that fit that speculation.
I don't understand this statement. The results, whether
non-experimental or experimental, turn out to be consistent with
the prediction or not.
Post by Reef Fish
Post by jim clark
Post by Reef Fish
Post by jim clark
That is, many researchers predict a sign on
the basis of an hypothesized causal relationship between x1 and y
That's still another blunder in application, hypothesizing CAUSAL
relation without doing a carefully designed controlled experiment.
No, you can hypothesize all kinds of causal connections.
Yup. The same abusers practice path-analysis and think that
observational data fitted to multiple regression ascertains CAUSAL
effect.
Wrong again. A theory is completely independent of the means
that you use to test it. Often the multiple tests of a theory
involve all kinds of non-experimental, experimental, and
quasi-experimental phenomena. You seem to lack any conception of
a theory independent of how it is tested.
Post by Reef Fish
Post by jim clark
Post by Reef Fish
Post by jim clark
In Streisguth's work on the
influence of maternal alcohol consumption on subsequent
children's iqs, for example, the predicted (and observed) sign
for the alcohol variable is negative and remains so even with
dozens of other predictors in the equation.
But can he argue A PRIORI why it should remain negative in the
presence of dozens of other variables?
She would argue that if the hypothesized causal connection
Strike 1. Causation through regression?
No; causation is specified by a theoretical model.
Post by Reef Fish
Post by jim clark
between maternal alcohol consumption and brain development is
correct, then the effect should remain significant even if we
control for a number of other potentially confounding predictors
(e.g., mother's education, medication, diet, mother's age,
smoking, ...). And that is what she finds.
Strike 2. Garbage In, Garbage Out.
What does this mean?
Post by Reef Fish
Post by jim clark
Supplementary
evidence comes from controlled animal experiments.
Strike 3. Cart before the Horse (is that what you mean by animal
experiment?). <g> Should have done the controlled
experiment FIRST.
For all I know they may have. But what difference does it make?
Or perhaps the hypothesis was originally developed based on
simple observational data; e.g., children of mothers who drank
during pregnancy were observed informally to be slower at school.
A bell goes off in someone's head and they (and others) decide to
test a causal link. Some studies involve non-experimental data
on humans, others tightly controlled experiments with animals.
Post by Reef Fish
Post by jim clark
Post by Reef Fish
It appears to be a classic case of the MISUSE and
MIS-interpretation
Post by jim clark
Post by Reef Fish
of multiple regression coefficients to me.
I stand on that remark.
So your view is that the regression results are just as
consistent with the competing hypotheses of alcohol having a
positive effect on children's intelligence, no effect, or a
negative effect? The findings add nothing to our understanding
of the relationship between maternal alcohol use and child
intelligence?
Post by Reef Fish
Post by jim clark
Not so to me. Multiple regression tells you whether or not
variation in a predictor independent of all other predictors in
the equation has any significant relationship to the criterion
variable. One use of this principle is to determine whether the
predicted effect of a predictor remains when potential
confounding variables are controlled statistically (i.e., their
covariation with the target predictor is removed).
That's skirting around the issue of NOT UNDERSTANDING how to
interpret the multiple regression coefficients as having signs
of particular PARTIAL correlations.
No, it is not. The sign is that of the correlation between
the _residual_ predictor and the criterion (or the residual
criterion in the case of partial rs).
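
That is easy to check numerically (Python with numpy; a purely
illustrative sketch). Residualize the predictor on the other
predictors, regress the criterion on that residual, and you recover
the multiple regression coefficient -- so its sign is exactly the sign
of that residual-predictor relationship:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 500
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + rng.normal(size=n)
    y  = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

    def ols(y, X):
        return np.linalg.lstsq(X, y, rcond=None)[0]

    ones = np.ones(n)
    b_full = ols(y, np.column_stack([ones, x1, x2]))        # full multiple regression

    # residualize x2 on the remaining predictors (the intercept and x1)
    Z = np.column_stack([ones, x1])
    x2_resid = x2 - Z @ ols(x2, Z)
    b_resid = ols(y, np.column_stack([ones, x2_resid]))     # y on the residualized x2

    print("coef of x2 in the full model:", round(b_full[2], 3))   # about -1.0
    print("coef on the residualized x2 :", round(b_resid[1], 3))  # same value, same sign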
Post by Reef Fish
Post by jim clark
There are surely innumerable examples in various literatures of
these uses of multiple regression and the researchers are
certainly NOT confusing simple and partial correlation.
Name ONE, in those who talk about "expected sign", especially
when the "conditioned variables" are not even identified.
I already did, but without even reading the literature on
maternal alcohol consumption, you were able to ascertain that it
was "garbage." And I doubt that giving a citation for my other
example (study time and grades) would fare any better in your
mind.
Post by Reef Fish
Read my reply to Richard Ulrich on this same topic.
I started to, but it did not enlighten me any more than your
comments in this thread, which truly strike me as bizarre (e.g.,
one cannot hypothesize a causal relation in a theory, predicting
and finding an expected effect [as opposed to the opposite or no
effect] in a non-experimental study is meaningless even if
numerous potential confounds are controlled statistically).
Sorry to be so testy ... too much marking right now!

Best wishes
Jim

============================================================================
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
Winnipeg, Manitoba R3B 2E9 ***@uwinnipeg.ca
CANADA http://www.uwinnipeg.ca/~clark
============================================================================
Reef Fish
2005-04-19 07:16:36 UTC
Permalink
Post by jim clark
Hi
Post by Reef Fish
Post by jim clark
But right below I give the A PRIORI reason ... i.e., a theory!
No. Your examples are those who SPECULATES on the sign, when they
have no earthly idea why the partial correlation between two
variables,
Post by jim clark
Post by Reef Fish
in the presence of DOZENS of other variables, should be the sign they
expect!
You appear to be saying that no one is ever justified in
hypothesizing a causal relationship between two variables,
irrespective of how they test it.
I wasn't saying that at all, even though a multiple regression on
observational data is definitely NOT a valid way to ascertain cause.

But here, the "expected sign", as used by the practitioners, is just
an un-tested, un-reasoned, invalid GUESS foisted upon a regression
model, as if it were necessarily true.
Post by jim clark
Post by Reef Fish
Then they INSISTS on empirical models that fit that speculation.
I don't understand this statement.
Take the Ridge Regression folks who insist that certain signs must
agree with what they expected <sic for speculated> because they
wrongly take simple correlations to be where partial correlations
are supposed to be.
Post by jim clark
Wrong again. A theory is completely independent of the means
that you use to test it. Often the multiple tests of a theory
involve all kinds of non-experimental, experimental, and
quasi-experimental phenomena. You seem to lack any conception of
a theory independent of how it is tested.
You lack the understanding of TESTING METHODOLOGY, and that certain
methods are simply not valid for testing causal hypotheses.

In the INTERPRETATION of the sign of the multiple regression
coefficient, there is NO assumption, NO test, NO speculation
necessary. They are the signs of PARTIAL correlations, no ifs
or buts.
Post by jim clark
Post by Reef Fish
Post by jim clark
Post by Reef Fish
It appears to be a classic case of the MISUSE and
MIS-interpretation
Post by jim clark
Post by Reef Fish
of multiple regression coefficients to me.
I stand on that remark.
So your view is that the regression results are just as
consistent with the competing hypotheses of alcohol having a
positive effect on children's intelligence, no effect, or a
negative effect? The findings add nothing to our understanding
of the relationship between maternal alcohol use and child
intelligence?
I don't speculate that kind of hypothesis on mere regression results
on observational data. Many regression-methodology ABUSERS do.

Besides, make it easy for yourself by thinking of PREDITIVE models.
They don't have to "explain" anything. Then you'll rid yourself
of all those red herrings dangling everywhere, and will be able
to concentrate on the TECHNICAL meaning of the coefficients, that
they are NECESSARILY the partial correlation information, and
not what some researcher speculates in the presence of DOZENS
of variables in the partial correlations in question.

Show me a person who can reasonably argue what they can "expect"
(a priori) the SIGN of a partial correlation between X and Y, in
the presence of a dozen other variables z1, z2, ..., z12, and
I'll show you a wishful thinking fool, or one deficient in the
understanding of the THEORY behind multiple regression analysis.

Take your pick. It's 100% guaranteed to be one of the two, or
both.
Post by jim clark
Post by Reef Fish
Post by jim clark
There are surely innumerable examples in various literatures of
these uses of multiple regression and the researchers are
certainly NOT confusing simple and partial correlation.
Name ONE, in those who talk about "expected sign", especially
when the "conditioned variables" are not even identified.
I already did, but without even reading the literature on
maternal alcohol consumption, you were able to ascertain that it
was "garbage."
The garbage part is the misuse of multiple regression to "EXPLAIN"
causal phenomena. Let me repeat what you kept on overlooking:

Show me a person who can reasonably argue what they can "expect"
(a priori) the SIGN of a partial correlation between X and Y, in
the presence of a dozen other variables z1, z2, ..., z12, and
I'll show you a wishful thinking fool, or one deficient in the
understanding of the THEORY behind multiple regression analysis
Post by jim clark
Post by Reef Fish
Read my reply to Richard Ulrich on this same topic.
I started to, but it did not enlighten me any more than your
comments in this thread, which truly strike me as bizarre (e.g.,
Only to one who is deficient in his understanding of what regression
CAN or CANNOT do.
Post by jim clark
one cannot hypothesize a causal relation in a theory, predicting
and finding an expected effect [as opposed to the opposite or no
effect] in a non-experimental study is meaningless even if
numerous potential confounds are controlled statistically).
Sorry to be so testy ... too much marking right now!
You are wandering in the wrong ball park, light-years away from
the central issue of the meaning of the SIGNS of multiple
regression coefficients.
Post by jim clark
Best wishes
Jim
============================================================================
Post by jim clark
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
CANADA http://www.uwinnipeg.ca/~clark
============================================================================

-- Bob.
Anon.
2005-04-19 08:27:54 UTC
Permalink
<snip>
Post by Reef Fish
Post by jim clark
Wrong again. A theory is completely independent of the means
that you use to test it. Often the multiple tests of a theory
involve all kinds of non-experimental, experimental, and
quasi-experimental phenomena. You seem to lack any conception of
a theory independent of how it is tested.
You lack the understanding of TESTING METHODOLOGY, and that certain
methods are simply not valid for testing causal hypotheses.
I think you're talking at cross purposes here. Jim (and I) read RF's
original comment as being about producing hypotheses, whereas RF is now
talking about testing hypotheses. These are usually taken to be
separate processes: you dream up a hypothesis (sometimes literally), and
then you test it. But you can come to a hypothesis in any way you wish
- it's the testing where you need a methodology.

<snip>
Post by Reef Fish
Post by jim clark
So your view is that the regression results are just as
consistent with the competing hypotheses of alcohol having a
positive effect on children's intelligence, no effect, or a
negative effect? The findings add nothing to our understanding
of the relationship between maternal alcohol use and child
intelligence?
I don't speculate that kind of hypotheses on mere regression results
on observational data. Many regression-methodolgy ABUSERS do.
Besides, make it easy for yourself but thinking of PREDITIVE models.
I guess you missed a C there (not that I'm one to talk). Alternatively,
you could have meant predative, and in a twisted way I like the idea of
models stalking the landscape, looking for data to chew up.

"Aaagh, no! I'm being attacked by the fearsome loglinear model. AND
it's over-dispersed!"

Bob
--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
Reef Fish
2005-04-20 06:45:35 UTC
Permalink
Post by Anon.
<snip>
Post by Reef Fish
You lack the understanding of TESTING METHODOLOGY, and that certain
methods are simply not valid for testing causal hypotheses.
I think you're talking at cross purposes here.
Only as a comment of Jim's tangent into TESTING, while the topic
had always been the interpretation of the SIGN of a multiple
regression coefficient.

I think plenty had been posted about that subject and using
Mosteller and Tukey's explanation and example.

Anyone who still doesn't understand that you CANNOT "expect
the sign" of one coefficient when you don't know the other
variables in the regression model is just WASTING everyone's
time.

There had already been tons of garbage in the literature
on that one single abuse. We need no more.

-- Bob the Reef Fish.
(Fellow of the ASA also)
Post by Anon.
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
Bruce Weaver
2005-04-20 11:29:06 UTC
Permalink
---- big snip ----
Post by Reef Fish
Post by jim clark
She would argue that if the hypothesized causal connection
Strike 1. Causation through regression?
Careful now. This suggests that the form of analysis, rather than the
study design, determines whether one can make a causal attribution.
That's obviously not so, because the data from a good old fashioned
two-group experiment could be analyzed with regression (instead of the
more common t-test or ANOVA). Karl Wuensch has a webpage on that topic
("When does Correlation Imply Causation?").

http://core.ecu.edu/psyc/wuenschk/StatHelp/Correlation-Causation.htm
--
Bruce Weaver
***@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
Reef Fish
2005-04-20 12:52:36 UTC
Permalink
Post by Bruce Weaver
---- big snip ----
Post by Reef Fish
Post by jim clark
She would argue that if the hypothesized causal connection
Strike 1. Causation through regression?
Careful now. This suggests that the form of analysis, rather than the
study design, determines whether one can make a causal attribution.
Not so. The form of the ANALYSIS has nothing to do with it.

She did NOT do any pre-requisite controlled experiment in the first
place. She "hypothesized" a causal connection.

You CANNOT draw a valid causal inference by assuming it's causal
(drawing little arrows, as in path analysis), and then use regression
analysis of observational data to draw causal inference.

That was why Strike 3 was that she was putting cart before the horse.
Post by Bruce Weaver
That's obviously not so, because the data from a good old fashioned
two-group experiment could be analyzed with regression (instead of the
more common t-test or ANOVA).
That was neither the point nor the counterpoint to the issue of how
the multiple regression coefficient MUST be interpreted, as explained
by Mosteller and Tukey which I have now referenced several times.

In the present discussion, in observational studies, the independent
variables are nearly always, if not always, correlated among themselves.
That's why the SIGN of a multiple regression coefficient is the sign
of the PARTIAL correlation, rather than the SIMPLE correlation.

The common ABUSE is to act as if the "expected sign" is that of the
SIMPLE correlation between the particular X and the dependent variable Y,
which is true IF and ONLY IF all the X's are orthogonal.

The only redeeming value of the use of indicator variables in a
multiple regression set up for an ANOVA problem is that the indicator
variables are orthogonal, hence there are NO partial correlations
in the coefficients.

But all the other prerequisites of a controlled study to ascertain
cause still have to apply. Randomization, double-blind, and the
rest of the step in a well-designed experiment come into play.


Even though nobody ever said Mosteller and Tukey are Popes, what
they said about WOES of REGRESSION (especially relative to the
SIGN of a coefficient) is ... unequivocal and INFALLIBLE.
Post by Bruce Weaver
Karl Wuensch has a webpage on that topic
("When does Correlation Imply Causation?").
That does address SOME of the control issues, but leaves out completely
the issue of PARTIAL correlations in the observed X variables, even
if they were observed in a controlled experiment.

-- Bob.
http://core.ecu.edu/psyc/wuenschk/StatHelp/Correlation-Causation.htm
Post by Bruce Weaver
--
Bruce Weaver
www.angelfire.com/wv/bwhomedir
Bruce Weaver
2005-04-20 14:28:26 UTC
Permalink
Post by Reef Fish
Post by Bruce Weaver
---- big snip ----
Post by Reef Fish
Post by jim clark
She would argue that if the hypothesized causal connection
Strike 1. Causation through regression?
Careful now. This suggests that the form of analysis, rather than the
Post by Bruce Weaver
study design, determines whether one can make a causal attribution.
Not so. The form of the ANALYSIS has nothing to do with it.
That was *exactly* my point.

Cheers,
Bruce
--
Bruce Weaver
***@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
Richard Ulrich
2005-04-18 19:27:33 UTC
Permalink
- by the way, I do note that the OP was not (apparently)
asking about stepwise. My mistake.

Here are some comments about interpreting regression in
social sciences, and why the "opposite sign" is bad.

There might be a bit of my discussion in my stats-FAQ, which
collected some earlier posts on stepwise regression. I don't
remember anyone defending the position that RF has taken.



On 17 Apr 2005 20:28:12 -0700, "Reef Fish"
<***@Yahoo.com> wrote:
[snip, much...]
Post by Reef Fish
The "expected sign" of a (multiple) regression coefficient is the one
single ERROR most often committed by social scientists and economist in
their interpretation of regression coefficients.
I seem to differ greatly from you on the nature of this error....
Post by Reef Fish
Over the years, I have not found a SINGLE CASE in which a justification
was given (nor hinted) on where the "expectation" of the expected sign
came from.
Deaf to all explanations? No, making a rhetorical point.
Post by Reef Fish
The ERROR was always that the user thinks of the sign of a multiple
regression coefficient as the sign of the SIMPLE correlation between
that X and Y, whereas the SIGN of the coefficient is the sign of the
PARTIAL correlation between that X and Y, GIVEN all the rest of the
independent variables in the regression!
Yes, the sign of the simple correlation is important to
pay attention to. I was considering adding a brief Reply
to your earlier post advocating the "step-down" version
of stepwise: Given that Stepwise is ordinarily a bad idea,
the Stepdown version is more prone to start with variables
in the "wrong direction" -- and that is a very bad indicator
(historically and rationally) for the robustness of equations
in the social sciences.

[...]
Post by Reef Fish
One of the latest national news is the NEW SAT exam, consisting of
Verbal, Quantitative, and Essay. Let's say those are THREE of some
10 independent variables used to predict (or fit) the GPR of admitted
[snip, invented example of "opposite-sign" prediction.]

Here is a *real* example of opposite-sign prediction.

It was either the SAT or another achievement test which figured
out that they achieved more reliable estimation of Verbal by
subtracting off some of the achievement on a Reading Speed
sub-scale that was computed internally (not reported to users).
Folks who read faster could get farther through the test, without
knowing more, so it provided a *rational* correction.

Here is a rule, I think, for using opposite-sign predictors:
Make sure that they actually work. I think, too, there will
have to be a face-valid interpretation of them. The easiest
instances that I know of have involved pairs of variables,
so that the (B-k*A) term can be explicitly used in prediction;
also, you can figure out separately whether (B-k*A) works
better than another model of difference, like k*log(B/A).
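One way to run that comparison, sketched here in SPSS syntax purely as a
hedged illustration (the variable names y, a, b and the data are
hypothetical, not the achievement-test data described above):

* Assumes an active file with y, a, and b, and a, b > 0 for the ratio.
COMPUTE logratio = LN(b / a).
EXECUTE.
* Entering the pair jointly lets the fit estimate its own (B - k*A) weighting.
regre /vari = y a b /dep = y /enter a b.
* The alternative "model of difference" based on the log ratio.
regre /vari = y logratio /dep = y /enter logratio.

Comparing the two R-squares, and whether each holds up in a fresh
sample, is the sort of "make sure they actually work" check being
described.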
Post by Reef Fish
The MIS-interpretation of the "expected sign" of a multiple regression
coefficient gave rise to a flurry of papers on Ridge Regression, for
the sole purpose of making the observed sign "correct", when they could
give NO reason (or even know that those are not SIMPLE correlation
signs) why any sign should have a "positive" or "negative" expectation.
But, I take it, you forever *missed* the explanation for why
people did not like the opposite-sign predictions: They
didn't hold up. That's why the ridge-regressions *did* tend
to work -- they replicated. "Reduced variance" was the goal.
Post by Reef Fish
I have rejected more submitted journal papers based on that faulty and
false premise than you can imagine. But such misinterpretations are
EVERYWHERE in the applied journals of economics and social sciences.
*All* those people seem to differ greatly from you, on the
nature of the problems in regressions.

EVERYWHERE ... it seems to me like this ought to have provoked
a response of curiosity. Do you *still* not wonder why?


[snip, some]
Post by Reef Fish
I've seen the "expected sign" MIS-interpreted every time I've seen
that term used in a multiple regression context; I've NEVER seen
anyone argue on why that sign is expected to be "positive" or
"negative" by arguing from a partial correlation point of view!
Artifact, Bob, artifact. We are avoiding artifacts that
will not be consistent between samples. Here is some
background of why your rant does not move me, and
must have frustrated a good many good researchers whom
you have reviewed.


Psychometrics figured out a long time ago that rating
scales are not created by multiple regression of items.
(Certain ideas in making *scales*, I believe, have carried over
usefully to good intuitions while *using* multiple regression.)

The most common way to create additive scales in the
social sciences makes use of simple sums of items, or
of item responses "0,1,2,3". It takes a huge sample to
justify using differential weighting of items, or of scores
(for all items, or for single items).

These additive scales, it turns out, are usually okay for
all the simple analyses intended on moderate sample Ns.
But they may suffer from defects when used on unusual
samples (for the scale) or for very large N. There can
be basement and ceiling effects, in addition to a general
tendency toward Poisson or log-normal behavior of totals.
In more detail: a scale designed for outpatients may look
fairly Gaussian for outpatients, while bottoming out (half
zeroes for totals) for Normal subjects, and topping out
for the worst sets of inpatients.


The "bad scaling" also has an influence in multi-variable
contexts, such as multiple regression. So, if you use 10
Symptom variables, mostly rating scales, to predict Good
versus Bad outcome, there is an obvious direction for
prediction: Worse status leads to worse outcome. It is
a bad indicator for the *technique* when a variable
enters in the "wrong direction" because it usually indicates
(say) a ceiling effect for a good variable, which is being
compensated by another variable. "X^2 minus X", except
that second X is a correlated variable.

This bad scaling is akin to the scaling problem of using
raw Ranks as predictors, which everyone speaks against
because it can produce artifactual interaction terms.
- It is wrong to interpret the negative direction as
meaningful for the predictor, and
- it is not robust to use the prediction equation in
any other sample.

Now, if the opposite sign can replicate, I would certainly
search for the reason. However, these suppressors
are usually accidents.

Similar "bad scaling" has been prominent in social sciences.
It turns out to be the rational explanation for a suppressor,
most of the time -- often enough so that we have learned to
treat it as the default.

Hope this helps.
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Richard Ulrich
2005-04-19 23:59:25 UTC
Permalink
[I have put the SPSS group back into the Newsgroups list,
since most of the comments were posted there. The post
I am responding to - was not.]
- At the risk of repeating other people -

On 18 Apr 2005 15:15:02 -0700, "Reef Fish"
<***@Yahoo.com> wrote:

The overall conclusion that I draw, from this and his other posts,
is that Bob has a conception of *proper* social science aims
of multiple regression which varies widely from what social
scientists think they can achieve.

Bob may tell me whether this is a decent summary or not.

Social scientists, as I know them, want to use multiple
regression in order to

1) show a set of relations that are true and meaningful;
2) gain validity and robustness through the face-validity;
3) achieve good prediction.

Bob, as I read him so far, has no respect for (1) or (2) in MR.
(Appreciation of "reliability analysis" or other aspects
of psychometrics is also questionable, as he slings barbs
that direction, too.)

His only purpose for MR is (3). That puts him at odds
with most of us, who are hoping to learn something about
the underlying sciences, and think that it can be done.
This is a self-defined limit on his capacity or range
as a consultant -- He knows mathematical statistics,
but he is uninterested in the tough job of "data analysis"
when it comes to refining the variables themselves
(and their scalings) in order to find solutions that not
merely seem to "work" but which are intelligible.

Unfortunately, he exhibits no interest in discussing
(1) or (2), or listening to them, or admitting that he has
heard anything or read anything of potential interest.

Selected lines of Bob's last reply to me are shown below,
with a few interposed comments,
Post by Richard Ulrich
On 17 Apr 2005 20:28:12 -0700, "Reef Fish"
[ ... ]

RF's rhetorical overstatement, or evidence of his deafness -
Post by Richard Ulrich
Post by Reef Fish
Over the years, I have not found a SINGLE CASE in which a justification
was given (nor hinted) on where the "expectation" of the expected sign
came from.
[snip, SAT, etc., student's regression; 80 lines or so]
His intention for the regression -
The PREDICTIVE model is neither meant to be a CAUSAL model nor
a CONTROL model. To use it as such would just be another common
abuse of regression models.
Over the years, those predicted models (developed by MY graduate
students in the course, not the models actually used by the Admissions
Office) with a NEGATIVE sign for SAT Math, consistently stood
all tests of cross-validation, subsampling, and the rest of the
data-analytic techniques to see if a developed model is "stable"
and hold for future predictions.
- There is weak writing above, or weak statistics. Those
features of "subsampling .. to see if a developed model is
'stable' and hold" are largely irrelevant to my concern about
changed circumstances and samples. However, the opening
phrase, "Over the years," suggests convincing validation over
the years. Subsampling, etc., is method, not result.
I think I was echoing the conclusions of everyone who
has posted on the topic in the last 8 years - our usual data
sets don't give good replication. I went into reasons, later.


[snip 15 or so; RU - opposite-sign didn't hold up, Ridge.]
WRONG! I explained why they don't like it, because it's counter-
intuitive and people MIS-interpret such predictive models as if they
"explain" or "control" the GPR average in the students performance.
You are partly right. The lack of replication and the lack
of sense are both important. If we only want predictive results
for the short term, we can accept a lack-of-sense. That's not
usually what social scientists want.
Post by Richard Ulrich
Post by Reef Fish
I have rejected more submitted journal papers based on that faulty and
false premise than you can imagine. But such misinterpretations are
EVERYWHERE in the applied journals of economics and social sciences.
... and you never 'heard' a word of explanation. Yep.

[snip 35 lines or so, social sciences, partial correlation, ...]
I consider that my contribution to stop/lessen statistical ABUSE and
statistical POLLUTION, by those in the social and economic sciences
who are no more equipped to practice statistics than they are to
practice brain surgery or law, after reading a book or a chapter
of a computer manual and think they can practice statistics correctly.
By itself, that paragraph sounds okay, only a little rant-y.
[snip me, mentioning psychometrics, to describe the
alternative universe of social sciences, compared to Bob's
engineering statistics.]
Don't change the subject.
We are NOT talking about rating scales or any of what psychometricians
do. We are SPECIFICALLY talking about the CORRECT and PROPER
interpretation of the SIGN of multiple regression coefficients.
[snip me, on simple additive scales, for reliability, etc.]
I've had colleagues who talked about "unit weighting" too. They were
talking about STATISTICS. They were exercising psychometric voodoo and
quackery in the name of statistics!
Did you intend, "not" talking? Is all psychometry voodoo to you?
< irrelevant tangent to the interpretation of SIGN of the multiple
regression coefficients snipped >
- That is the sum of Bob's considered response to my
moderately decent explanation of why social scientists try to
use MR the way they do, and how it can work.

me > >
Post by Richard Ulrich
Now, if the opposite sign can replicate, I would certainly
search for the reason.
RF >
The reasons would be PARTIAL correlation of one variable with another
in the PRESENCE of the remaining variables.
Simple! Too bad you never learned that.
- That's Bob, proving he heard no reasons given this time, either.

[snip, a few more]
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-20 04:21:19 UTC
Permalink
Post by Richard Ulrich
[I have put the SPSS group back into the Newsgroups list,
since most of the comments were posted there. The post
I am responding to - was not.]
- At the risk of repeating other people -
On 18 Apr 2005 15:15:02 -0700, "Reef Fish"
The overall conclusion that I draw, from this and his other posts,
is that Bob has a conception of *proper* social science aims
of multiple regression which varies widely from what social
scientists think they can achieve.
Bob may tell me whether this is a decent summary or not.
It is over 90% true. I am leaving 10% room for those social
scientists who do NOT abuse the use of statistics and statistical
methods, including three Psychometric Society past Presidents
(whom I'll cite below) with whom I have had close professional
contact for at least 15 years, and none of them would EVER
make blunders like the ones YOU and others are making about
"expected sign" in a multiple regression problem!
Post by Richard Ulrich
Social scientists, as I know them, want to use multiple
regression in order to
1) show a set of relations that are true and meaningful;
2) gain validity and robustness through the face-validity;
3) achieve good prediction.
Save your breath! It doesn't matter WHAT they try to use
multiple regression for, as soon as they opened their mouths
about the "expected sign" of a multiple regression coefficient,
I see a very conspicuous FOOT dangling out of the same mouth.

It's that simple!

< snip Ulrich's pointless paragraphs about the issue at hand >

This is a COMPLETELY FALSE and ERRONEOUS characterization of
what I posted.

The ONLY aspect of multiple regression I addressed was the
PROPER and IMPROPER interpretation of the SIGN (positive or
negative) of the regression coefficient, completely independent
of Ulrich's (1) - (3). There are many other uses for
multiple regression too. But they are ALL subject to the
proper interpretation of the SIGN of the coefficients, and
those who talk about "expected sign" (a priori) as if it
reflects the SIMPLE correlation sign, are simply DEFICIENT
in their understanding of Multiple Regression theory and
methodology.
Post by Richard Ulrich
I consider that my contribution to stop/lessen statistical ABUSE and
statistical POLLUTION, by those in the social and economic sciences
who are no more equipped to practice statistics as they are to
practice brain surgery or law, after reading a book or a chapter
of a computer manual and thinks they can practice statistics
correctly.
Post by Richard Ulrich
By itself, that paragraph sounds okay, only a little rant-y.
But RANT <tm> is my trademark. :-)

I RANTED 10 times as long on that same THEME and SUBJECT in JASA.

My professional opinion about this statistical "mal-practice" was
expressed in the flagship journal of the American Statistical
Association
(JASA 1982, 489-491) over two decades ago, about a book, titled
"Correlation and Causation" written by a social scientist, obviously
inadequately trained in the subject of Statistics:


*> "I am less perturbed by the poor substantive quality of this
*> book than by the fact that we are witnessing the emergence of
*> a subculture of economists and social scientists, who are no
*> more qualified or equipped to practice statistics than law
*> or medicine, yet who nonetheless do practice it among
*> their circles of nonstatisticians, without much visible signs
*> of protest from the community of statisticians. I feel
*> obliged to register my strongest protest against this type of
*> malpractice, fostered by the title and content of this book."


This was after I toned down my wording when the editor of JASA
was getting a bit nervous because I've also used these terms to
describe those mal-practitioners -- "quacks" and "black magic" --
and those terms DID stay in the review. :-)
Post by Richard Ulrich
Is all psychometry voodoo to you?
Only those voodoo techniques such as your "expected signs" discussed
here, and many other abuses not discussed here. Some/MANY of my good
friends in STATISTICS are psychometricians, such as Doug Carroll
(ASA Fellow 1979), Joe Kruskal (ASA Fellow 1971), Phipps Arabie
(ASA Fellow 1989), to name just three. All three had been
President of the Psychometric Society: Doug Carroll (1974-1977).
Phipps Arabie (1990-1991). So was Joe Kruskal.
http://cm.bell-labs.com/cm/ms/departments/sia/kruskal/

I can assure you that NONE of them will EVER make the kind of ABUSE
and BLUNDERS committed by the "expected sign" folks.
Post by Richard Ulrich
[snip, a few more]
You snipped too much.

RF> Of all the textbooks on regression, the one that best articulates
RF> the FALLACY of the "expected sign" phenomenon (by people like
yourself
RF> and the other social scientists and economists) is the book by
RF> Mosteller and Tukey!! "Data Analysis and Regression" (1977).
RF>
RF> Get a copy of that book, and read the relevant chapters related to
RF> partial correlations (and the misnomer of "keeping the other
RF> variables constant" when speaking of the given variables in a
RF> partial corr.),

RF> and try to read it CAREFULLY and read it WELL.

I don't think you did. You were too busy ARGUING with your
irrelevance.

Let me try to help the OTHERS a bit more.

Chapter 13 of that book is titled "Woes of Regression Coefficients"
That was ONE of about 10 books I kept, of the HUNDREDS I've donated
to needy students and libraries in China and India.

It starts with "Meaning of Coefficients in Multiple Regression"

It was written in elementary math that even an 8th grader can
understand.

"The only general fact is that the coefficient of any one carrier -
in our example, x - depends on which other carriers are offered at
the same time," says Moteller and Tukey.

The term "carrier" is an independent variable or predictor variable
in the context of the quoted text.

That's saying what I've been saying in SEVERAL lengthy posts, to
deaf ears, that the SIGN of a regression coefficient depends on
what OTHER variables are in the multiple regression equation.

Mosteller and Tukey did better. They skipped the notion of PARTIAL
correlation and presented a DETERMINISTIC example in which both the
SIGN and the MAGNITUDE of the coefficient of x change depending on
what OTHER variables are in the model!!

QUOTE from Mosteller and Tukey:

" in our example,

-1 = coefficient of x when 1 and x^2 are also offered
+1 = coefficient of x when 1 and (x-1)^2 are also offered
+5 = coefficient of x when 1 and (x-2)^2 are also offered

a coefficient in a multiple regression, EITHER IN THEORY OR IN FIT
(caps my emphasis) depends on MORE (authors' emphasis) than just:

<> the set of data and the method of fitting
<> the carrier it multiplies

It also depends on

<> what else is offered in part of the fit

END QUOTE.
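
(Algebraic aside, using generic coefficients b0, b1, b2 rather than the
numbers behind M&T's own table: since (x-c)^2 = x^2 - 2*c*x + c^2, a fit
offering the carriers 1, x, x^2,

  y = b0 + b1*x + b2*x^2,

is exactly the same function, when 1, x, (x-c)^2 are offered instead, as

  y = (b0 - b2*c^2) + (b1 + 2*b2*c)*x + b2*(x-c)^2.

The coefficient of x shifts by 2*b2*c purely from re-centering the
quadratic carrier, with no change at all in the fitted curve.)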

"WHAT ELSE IS OFFERED IN PART OF THE FIT" (Mosteller and Tukey)

Now take any Y and any X your psychometricians had ever used
in ANY multiple regression fit, and ask them what SIGN they expect
the regression coefficient x to have, without knowing what OTHER
variables are in the equation.

I ask YOU the same question.

Then tell them to read Mosteller and Tukey, as I told you.

I should have brought out Mosteller and Tukey in the first place.
I believe what I quoted from two PAGES of their book effectively
TKO'd YOU and all of your "expected sign" social scientists and
psychometricians and anyone ELSE who does not understand the FIRST
principle about MULTIPLE regression COEFFICIENTS in a multiple
regression analysis!


Remember, that's only ONE of the woes in that chapter most
social scientists don't understand! But that one single woe was
sufficient to have created TONS of GARBAGE in the literature
of the social sciences, economics, psychometrics, and other
areas of statistical application of the MULTIPLE REGRESSION
theory and methodology.
Post by Richard Ulrich
--
http://www.pitt.edu/~wpilib/index.html
-- Bob.
Richard Ulrich
2005-04-20 19:21:19 UTC
Permalink
On 19 Apr 2005 21:21:19 -0700, "Reef Fish"
[... ]
Post by Reef Fish
Post by Richard Ulrich
On 18 Apr 2005 15:15:02 -0700, "Reef Fish"
The overall conclusion that I draw, from this and his other posts,
is that Bob has a conception of *proper* social science aims
of multiple regression which varies widely from what social
scientists think they can achieve.
Bob may tell me whether this is a decent summary or not.
It is over 90% true. I am leaving 10% room for those social
scientists who do NOT abuse the use of statistics and statistical
Okay, we agree on that much.

Bob, the social scientists believe that they can tinker
with the variables to get *clean* variables that measure
what they are supposed to, and do so with good scaling.
Once they do that, equations should reflect causation.

Nowhere have you explained why that is stupid or wrong.

- "There is a hazard of tautology, where you only get out
what you put in".... I *think* that is a proper criticism from
your position, if you were connecting (listening) enough
to be constructive. Yes, folks are aware of that, and the
competition from other modelers is one corrective to
such attempts. *Interesting* things emerge, in the process
of trying to statistically validate the face-valid model --
that is not a corrective, but an explanation of why the
research programs continue.

[snip, a dozen lines]
Post by Reef Fish
< snip Ulrich's pointless paragraphs about the issue at hand >
This is a COMPLETELY FALSE and ERRONEOUS characterization of
what I posted.
The ONLY aspect of multiple regression I addressed was the
PROPER and IMPROPER interpretation of the SIGN (positive or
negative) of the regression coefficient, completely independent
of Ulrich's (1) - (3). There are many other uses for
multiple regression too.
- yes, "other uses." Was that what I was talking about in
"Ulrich's pointless paragraphs about the issue at hand"?
Won't you talk about this as "another use"?
Post by Reef Fish
But they are ALL subject to the
proper interpretation of the SIGN of the coefficients, and
those who talk about "expected sign" (a priori) as if it
reflects the SIMPLE correlation sign, are simply DEFICIENT
in their understanding of Multiple Regression theory and
methodology.
We KNOW what you are talking about, I think, much better
than you KNOW what we are talking about.

The interest that social science has in a good prediction
that is not face-valid -- proper signs -- is in explaining
why that prediction works. The prediction "that just happens
to work" is considered risky, too, because it won't warn
of regime-change. (I don't know if anything could have
saved the two Nobelists whose investment company did
not predict a 7 SD excursion of rates -- that was a 'regime change'.)

[snip, a bunch]
RU >
Post by Reef Fish
Post by Richard Ulrich
Is all psychometry voodoo to you?
[snip, major name-dropping ]
Post by Reef Fish
I can assure you that NONE of them will EVER make the kind of ABUSE
and BLUNDERS committed by the "expected sign" folks.
Can you quote them on that?

[snip, about Tukey's books]

I really enjoyed Tukey's texts, and recommended them
to others. The first one was easier for non-statisticians than
for people who had taken a couple of courses, it seemed
to me, because Tukey uses his private terminology.

[ snip, some good quotation of a Tukey text.]
Post by Reef Fish
That's saying what I've been saying in SEVERAL lengthy posts, to
deaf ears, that the SIGN of a regression coefficient depends on
what OTHER variables are in the multiple regression equation.
And we have replied: THAT is why we care about
what the other variables are, and why we insist on
paying *close* attention to them -- choosing them,
rescaling them, looking for oddities in the sample-space.
Whereas, by contrast, you seem to be happy to accept
the variables offered, whatever they are.

Is playing with variables and equations a fruitless
enterprise? - THAT is a more precise criticism, I think,
than what you have been offering. I can agree that I
haven't seen much from path analysis or structural
equations. Nobody has enough data, the right data or
the right form of hypotheses, for the most part.

I can agree that folks read a bit of Kenny and become
over-enthusiastic about finding moderators and mediators,
which remains difficult.


[snip, more, QUOTE from Mosteller and Tukey:]
Post by Reef Fish
a coefficient in a multiple regression, EITHER IN THEORY OR IN FIT
<> the set of data and the method of fitting
<> the carrier it multiplies
It also depends on
<> what else is offered in part of the fit
END QUOTE.
And that is why we care about them, all the variables.
Is it futile?
Post by Reef Fish
"WHAT ELSE IS OFFERED IN PART OF THE FIT" (Mosteller and Tukey)
Now take any Y and any X your psychometricians had ever used
in ANY multiple regression fit, and ask them what SIGN they expect
the regression coefficient x to have, without knowing what OTHER
variables are in the equation.
I ask YOU the same question.
(reprise) And that is why we care about them.
Is it futile?
Post by Reef Fish
Then tell them to read Mosteller and Tukey, as I told you.
We know all that. We don't stop at the place that
you do.... Is it a foregone conclusion that it is
futile to try to devise variables that will act the
way we expect? We can't learn anything along the way?


[snip, more recommendation]
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-20 20:43:53 UTC
Permalink
Post by Richard Ulrich
On 19 Apr 2005 21:21:19 -0700, "Reef Fish"
[... ]
Post by Reef Fish
Post by Richard Ulrich
On 18 Apr 2005 15:15:02 -0700, "Reef Fish"
The overall conclusion that I draw, from this and his other posts,
is that Bob has a conception of *proper* social science aims
of multiple regression which varies widely from what social
scientists think they can achieve.
Bob may tell me whether this is a decent summary or not.
It is over 90% true. I am leaving 10% room for those social
scientists who do NOT abuse the use of statistics and statistical
Okay, we agree on that much.
Bob, the social scientists believe that they can tinker
with the variables to get *clean* variables that measure
You're off track again! The "social scientists" (which is really
an oxymoron, IMHO) have not learned their regression methodology
well, and have made blunders about the SIGN of multiple regression
coefficients as a result of it.

That's tinkering already. Blind tinkering without a statistical
leg to stand on.

Pay attention: SIGN of the coefficients. Read Mosteller and Tukey
and see WHY you have all been making the blunder that is called one
of the WOES of those applying regression methods.

Learn your STATISTICS first, before tinkering and MISAPPLYING it.
Post by Richard Ulrich
[snip, a dozen lines] < and more >
Post by Reef Fish
The ONLY aspect of multiple regression I addressed was the
PROPER and IMPROPER interpretation of the SIGN (positive or
negative) of the regression coefficient, completely independent
of Ulrich's (1) - (3). Three are many other uses for
multiple regression too.
- yes, "other uses." Was that what I was talking about in
"Ulrich's pointless paragraphs about the issue at hand"?
Won't you talk about this as "another use"?
Ulrich never understood why the SIGNS don't necessarily follow the
intuitive SIMPLE correlation signs, because they do NOT contain
simple correlation information. You keep ignoring and overlooking
Post by Richard Ulrich
Post by Reef Fish
But they are ALL subject to the
proper interpretation of the SIGN of the coefficients, and
those who talk about "expected sign" (a priori) as if it
reflects the SIMPLE correlation sign, are simply DEFICIENT
in their understanding of Multiple Regression theory and
methodology.
We KNOW what you are talking about,
You bet!
Post by Richard Ulrich
I think, much better than you KNOW what we are talking about.
I also KNOW what you're talking about -- invalid and unjustifiable
practice of Multiple Regression Analysis, according to those who
do know what they are talking about -- the DOZENS of statisticians
I've cited, including Mosteller and Tukey -- whom you obviously
have not read or mastered.
Post by Richard Ulrich
[snip, a bunch]
RU >
Post by Reef Fish
Post by Richard Ulrich
Is all psychometry voodoo to you?
[snip, major name-dropping ]
To show you that SOME Presidents of Psychometric societies are trained
statisticians and the ones *I* know AFAIK do not practice your kind of
"expected sign" voodoo.
Post by Richard Ulrich
Post by Reef Fish
I can assure you that NONE of them will EVER make the kind of ABUSE
and BLUNDERS committed by the "expected sign" folks.
Can you quote them on that?
You can quote ME on that.

Anything I post is "public". Their work in classification, clustering,
correspondence analysis, and other psychometric methods in which they
are well-known and respected, do not involve multiple regression sign-
guessing abuses.
Post by Richard Ulrich
[snip, about Tukey's books]
Too bad.
Post by Richard Ulrich
I really enjoyed Tukey's texts, and recommended them
to others. The first one was easier for non-statisticians than
for people who had taken a couple of courses, it seemed
to me, because Tukey uses his private terminology.
Quit beating around the bush! What about the chapter on WOES of
Regression, from which I have taken excerpts (which you snipped) that
tell you plainly and unmistakably HOW and WHY you and the other
sign-guessers are WRONG, both in THEORY and in PRACTICE?

You are sweeping your dirt under a rug with a vengeance.
Post by Richard Ulrich
[ snip, some good quotation of a Tukey text.]
You shouldn't have. You should have faced the music!

< snip irrelevant tangents to the SIGN issue of MR coefficients >
Post by Richard Ulrich
[snip, more, QUOTE from Mosteller and Tukey:]
You snipped them all because you stubbornly refused to recognize
the TRUTH, explained to you plainly and straightforwardly, preferring
to continue your off-tangent obfuscation.
Post by Richard Ulrich
Post by Reef Fish
a coefficient in a multiple regression, EITHER IN THEORY OR IN FIT
<> the set of data and the method of fitting
<> the carrier it multiplies
It also depends on
<> what else is offered in part of the fit
END QUOTE.
And that is why we care about them, all the variables.
Is it futile?
No, and a thousand times NO. That's WHY you can't GUESS the "expected
sign" when you have DOZENS of other variables which you haven't even
observed!

I think we have come to the end of our conversation on this.

You're hopeless, in your understanding of the proper application of
Multiple Regression Methodology, based on what you've posted, in
the light of explanations by myself, by Mosteller and Tukey, and
by dozens of eminent statisticians who have written papers and
text books on the subject, about the meaning of the SIGNS of the
MR coefficients.

You can QUOTE me on the preceding paragraph, to anyone, at any time.

It's really quite simple. Any statistics UNDERGRADUATE, who has
taken a good applied regression course, would have understood the
theory and method that escaped you.
Post by Richard Ulrich
[snip, more recommendation]
--
http://www.pitt.edu/~wpilib/index.html
-- Bob.
jim clark
2005-04-20 04:38:14 UTC
Permalink
Hi

Here is Bob's description of one study that he views as
illustrative of his point about the "expected sign" fallacy. Au
contraire (we're bilingual up here in Canada), I see it as a
fascinating clue as to some underlying theoretical model for the
observed effect, and multiple regression as a useful tool to try
and discern that underlying model, as illustrated below.

*That turned out to be a GOLDEN set of data to use for pedagogical
*purposes to demonstrate the "expected sign" fallacy as well as
*showing how the SIGN of the SAT math variable in the predictive
*models can be POSITIVE or NEGATIVE, statistically insignificantly
*negative, OR statistically significantly negative, ALL depending on
*what OTHER variables are in the predictive models. The variable
*that would make the sign of the SAT Math variable NEGATIVE in
*predicting GPR was the PRESENCE of the Math Achievement Score in
*the same model, in combination with certain other variables.

It always helps, of course, to actually know what the underlying
causal model is, so below is one possibility that reliably
produces Bob's reported effect. In this model, there is some
underlying factor (represented by #z2 in the equations) that has
a positive influence on SAT scores and a negative influence on
GPA. In essence, however, any model in which r_gs - r_ga*r_as
produces a negative value will produce the effect, where r_gs is
the simple correlation between gpa and sat, r_ga is the simple
correlation between gpa and achievement, and r_as is the simple
correlation between achievement and sat. As shown below, the
simple slope between gpa and sat is positive, whereas the slope
for sat with ach also in the equation is negative.

SET WIDTH 120 SEED = RANDOM.
INPUT PROGRAM.
LOOP SUBJ = 1 TO 12000.
COMPUTE #z1 = NORM(1).
COMPUTE #z2 = NORM(1).
COMPUTE sat = #z1*.5 + #z2*.5 + NORM(1)*.7071.
COMPUTE ach = #z1*.7 + NORM(1)*SQRT(1-.7**2).
COMPUTE gpa = #z1*.7 - #z2*.4 + NORM(1)*.5916.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

corr gpa ach sat /stat.

Mean Std. Deviation N
GPA -.001792 .9940461 12000
ACH .009514 1.0043003 12000
SAT .013515 1.0028287 12000

GPA ACH SAT
ACH Pearson Correlation .485 1 .343
Sig. (2-tailed) .000 . .000
N 12000 12000 12000

SAT Pearson Correlation .151 .343 1
Sig. (2-tailed) .000 .000 .
N 12000 12000 12000

regre /vari = gpa ach sat /dep = gpa /enter sat /enter.

Model R R Square Adjusted R Square Std. Error of the
Estimate
1 .151(a) .023 .023 .9827038
2 .485(b) .235 .235 .8694894

Model Sum of Squares df Mean Square F Sig.
1 Regression 269.994 1 269.994 279.582 .000(a)
Residual 11586.550 11998 .966
Total 11856.544 11999

2 Regression 2786.670 2 1393.335 1843.006 .000(b)
Residual 9069.875 11997 .756
Total 11856.544 11999

Unstandardized Coeff Standardized Coeff t Sig.
Model B Std. Error Beta
1 (Constant) -3.813E-03 .009 -.425 .671
SAT .150 .009 .151 16.721 .000
ACH

2 (Constant) -6.181E-03 .008 -.779 .436
SAT -1.691E-02 .008 -.017 -2.008 .045
ACH .485 .008 .490 57.696 .000
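
(As a check, using only the printed correlations above -- this
arithmetic is added for illustration and is not part of the SPSS
output: the standardized SAT coefficient in Model 2 is

  beta_sat = (r_gs - r_ga*r_as) / (1 - r_as^2)
           = (.151 - .485*.343) / (1 - .343^2)
           = -.0154 / .8824
           = -.017 (to the printed precision),

matching the Beta column in Model 2, and negative precisely because
r_gs - r_ga*r_as < 0, as described above.)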

What might #z2 represent in reality and how might we determine if
that is the case? One possibility is that weaker students (i.e.,
those predicted to obtain lower gpas) are more likely to receive
intensive coaching on the sat, which "inflates" their sat scores.
We could obtain self-reports about coaching to test this
hypothesis. Or we could actually undertake an experiment in which
we coached people to varying degrees on the sat to see whether
experimental coaching produces the naturally observed effect.
Others could undoubtedly come up with other hypotheses and ways
to test them.

As Rich stated, I find it strange that someone would NOT be
motivated to ask why the observed effect occurs, but would be
satisfied with the tautological answer that it represents the
effect of partialling out other variables. Certainly those
interested in theoretical models would want more. And even if
one is interested only in prediction, is it not intuitively the
case that the more complete your causal model, the better your
prediction will ultimately become? It seems extremely unlikely,
for example, that blind empiricism could ever have led engineers
to adequately predict (and control) machines that can put people
on the moon or explore the deep reaches of space. It is the
underlying causal model that allows such precise prediction. And
that should be the ultimate goal of social science researchers.

Best wishes
Jim

============================================================================
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
Winnipeg, Manitoba R3B 2E9 ***@uwinnipeg.ca
CANADA http://www.uwinnipeg.ca/~clark
============================================================================
L***@Yahoo.com
2005-04-20 05:14:28 UTC
Permalink
jim clark wrote:

Your post was logged by google at Apr 19, 9:38 pm (Pacific).

I don't think you had enough time to read, let alone digest
my post and reply to R. Ulrich, logged at Apr 19, 9:21 pm

What was said by Mosteller and Tukey in their book in my post
would have TKO'd you.

You not only wasted bandwidth, you wasted computer time in an
attempt to resolve a question of THEORY and METHOD you don't
understand.

The SIGN of a multiple regression coefficient depends on

RF> "WHAT ELSE IS OFFERED IN PART OF THE FIT" (Mosteller and Tukey)

=========== excerpt from my Apr 19, 9:21 pm post:
Now take any Y and any X your psychometricians had ever used
in ANY multiple regression fit, and ask them what SIGN they expect
the regression coefficient x to have, without knowing what OTHER
variables are in the equation.


I ask YOU the same question.


Then tell them to read Mosteller and Tukey, as I told you.


I should have brought out Mosteller and Tukey in the first place.
I believe what I quoted from two PAGES of their book effectively
TKO'd YOU and all of your "expected sign" social scientists and
psychometricians and anyone ELSE who does not understand the FIRST
principle about MULTIPLE regression COEFFICIENTS in a multiple
regression analysis!

Remember, that's only ONE of the woes in that chapter most
social scientists don't understand! But that one single woe was
sufficient to have created TONS of GARBAGE in the literature
of the social sciences, economics, psychometrics, and other
areas of statistical application of the MULTIPLE REGRESSION
theory and methodology.

============ end excerpt from my Apr 19, 9:21 pm post
Post by jim clark
Hi
< much noise snipped >
Post by jim clark
Here is Bob's description of one study that he views as
illustrative of his point about the "expected sign" fallacy. Au
contraire (we're bilingual up here in Canada),
Au contraire, mon frère. (That's George Carlin's line)

Vous pouvez être bilingue en anglais et français, mais vous êtes
illettré dans l'analyse de régression, j'ai peur.

<You may be bilingual in English and French, but you are
illiterate in regression analysis, I am afraid. >

-- Bob.
Post by jim clark
Jim
============================================================================
Post by jim clark
James M. Clark (204) 786-9757
Department of Psychology (204) 774-4134 Fax
University of Winnipeg 4L05D
CANADA http://www.uwinnipeg.ca/~clark
============================================================================

-- Bob.
d***@autobox.com
2005-04-27 12:26:12 UTC
Permalink
Jim,

Whereas I just thought I would reflect on the issue ...

With thanks to previous posters ...


One of my clients, a major insurance company, wishes to predict the
number of new contracts based upon various forms of media expense,
such as National Print Media, Local Print Media, and other possible
explanatory variables.

Now what would you say is the "expected sign" of the coefficients of
the National Print Media and Local Print Media variables? The sign of
a coefficient in a mirm regression (correlation) equation has the sign
of the simple regression (correlation) between that X and Y only when
the independent variables are completely and mutually orthogonal to
each other, which is almost never the case in the observational data
used in multiple regression analysis.
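
Purely as a hedged illustration -- simulated data in the style of Jim
Clark's syntax earlier in this thread, with invented weights, not the
client's figures -- the following sketch makes both media variables
correlate positively with contracts, while the local-media coefficient
turns negative once national media is also entered:

* Hedged toy simulation -- invented weights, not the client's data.
* Both media variables correlate positively with contracts.
* Yet locmedia takes a negative multiple-regression coefficient.
SET SEED = 20050427.
INPUT PROGRAM.
LOOP subj = 1 TO 10000.
COMPUTE #base = NORM(1).
COMPUTE natmedia = #base*.8 + NORM(1)*.6.
COMPUTE locmedia = #base*.8 + NORM(1)*.6.
COMPUTE contracts = natmedia*.7 - locmedia*.2 + NORM(1)*.6.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

corr contracts natmedia locmedia.

regre /vari = contracts natmedia locmedia /dep = contracts
  /enter locmedia /enter.

The first /enter shows a positive simple slope for locmedia (its simple
correlation with contracts works out to about +.3 under these weights);
the second shows the built-in -.2 once natmedia is also offered -- the
same "what else is offered" point, in the media-spend setting.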

As George Box pointed out, if you don't physically change the X
values (as you do in a designed experiment) you can't unambiguously
state what would happen if you do! The regression coefficient in a
mirm reflects the expected change in Y only if, when you change
that one X, none of the other X's change (orthogonality assumption).
Analysts who endeavor to do marginal analysis of a mirm result need to
be aware of this.

I recall as a young man I asked Prof. Kendall about the phenomenon where,
after adding one more variable to my mirm, one of my other
(all significant) x's changed sign. He responded that these were
simply a simultaneous set of weights (optimized to minimize the error
sum of squares) and as such each had no meaning whatsoever, just that
together they best predicted (fit) the observed y values from the set
of X's. I asked about the marginal impact of each of the input
series, and he reflected that no such marginal analysis was possible if
the input series were cross-correlated with each other.

Another point suggested by Prof. Harry Roberts was that regression
is a particular case of time series (Transfer Function) and that all
that is relevant to regression is also relevant to time series
analysis; thus it is quite possible to have "bad signs" for
coefficients in a Transfer Function Model, because each simply reflects
the "simultaneous" response of the Y variable to the collective
impact of the X variables.


Regards

Dave Reilly
Automatic Forecasting Systems
http://www.autobox.com
Reef Fish
2005-04-27 16:22:31 UTC
Permalink
Post by d***@autobox.com
Jim,
Whereas I just thought I would reflect on the issue ...
With thanks to previous posters ...
One of my clients. a the major insurance company who wishes to
predict
Post by d***@autobox.com
the number
of new contracts based upon various forms of media expense, such as
National Print Media, Local Print Media and other possible
explanatory
Post by d***@autobox.com
variables.
Now what would you say is the "expected sign" of the coefficients of
the National Print Media and Local Print Media variables? The sign of
coefficient in a mirm regression (correlation) equation has the sign of
the simple regression (correlation) between that X and Y, when the
independent variables are completely and mutually orthogonal to each
which is almost never the case in the observational data used in
multiple regression analysis.
As George Box pointed out, if you don't physically change the X
values ( as you do in a designed experiment ) you can't unambiguously
state what would happen if you do !. The regression coefficient in a
mirm reflects the expected change in Y if and only if when you change
that 1 X none of the other X's change (orthogonality assumption ).
Analysts who endeavor to do marginal analysis of a mirm result need to
be aware of this.
I recall as a young man I asked Prof. Kendall about the phenomena where
after adding 1 more variable to my mirm the sign of one of my other
(all significant ) x's changed sign. He responded that these were
simply a simultaneous set of weights (optimized to minimize the error
sum of squares ) and as such each had no meaning whatsoever just that
together they best predicted (fit) the observed y values from the set
of X's. I asked what about the marginal impact of each of the input
series and he reflected that no such marginal anlysis was possible if
the input series were cross-correlated with each other.
Another point suggested by Prof. Harry Roberts was that regression
was a particular case of time series (Transfer Function ) and that all
that was relevant to regression was also relevant to time series
analysis thus it is quite possible to have "bad signs" for
coefficients in a Transfer Function Model because it simply reflects
the "simultaneous" response of the Y variable to the collective
impact of the X variables..
Regards
Dave Reilly
Automatic Forecasting Systems
http://www.autobox.com
Dave,

Thanks for your cogent and on-the-mark remarks on the SUBJECT of this
thread and your informative references to George Box, Kendall, and
Harry Roberts on the topic.

Although your comments were more directly related to time series, the
idea of expected sign IN THE PRESENCE of OTHER (correlated) variables
in a Multiple Regression is embedded in all your references. Harry
Roberts used Multiple Regression methods to analyze time-series data
using Box-Jenkins' ARIMA models!

Harry Roberts was my former colleague at the University of Chicago.
We co-authored a textbook in elementary statistics, which was heavily
"multiple regression methods" oriented. Harry's "Conversational
Statistics" book, and later our "Conversational Statistics with
IDA: An Introduction to Data Analysis and Regression", McGraw-Hill/
Scientific Press (1982), were used for many years by grad students
at the GSB of Chicago. Harry also wrote a textbook on the Box-Jenkins
ARIMA models, using Multiple Regression methods with lagged and
differenced variables as the independent variables in the multiple
regression models.

Nearly ALL of the students who took Harry's courses were NOT
statisticians, let alone mathematicians, but they were ENLIGHTENED
consumers of Statistics. They included many presidents and high-level
officers of business corporations who analyzed their company data
THEMSELVES, and did so WELL.

The MBA students at Chicago make excellent examples of statistics
consumers and USERS (those who actually DO statistical analyses) who
fit these descriptions:

1. I am quite certain none of them are the "expected sign" ABUSERS
such as those we have seen in this forum and elsewhere.

2. The "expected sign" fallacy is not only TECHNICALLY flawed in terms
of mistaking a simple correlation for a partial correlation, but
is obviously promoted by the faulty notion that a regression model
necessarily EXPLAINS any phenomenon, rather than merely FITTING data
to some model which may or may not PREDICT a future outcome well.

The "price of eggs in China" may indeed FIT many observational data
in Y well, without EXPLAINING anything about the phenomenon of FIT.

3. Most of them do not NEED to know anything about "Measure Theory"
or measure-theoretic Probability Theory to do APPLIED statistics,
and do them WELL. In fact, I would bet Herman Rubin my farm that
NONE of the MBA students do. :-)



My comment (3) above is directed toward Herman Rubin, on our many
very strong disagreements as to what is "applied statistics" and what
is needed to apply statistics intelligently and apply it well. Herman
and I have no problem with each other, standing AGREE TO DISAGREE on
our OPINIONS about applied statistics.

My comments (1) and (2) above are directed toward Richard Ulrich and
other "expected sign" ABUSERS in multiple regression. There, it's not
a matter of opinion that Richard and the others are WRONG, but an
unequivocal FACT that they are WRONG, and DEFICIENT in their knowledge
about the theory and methodology of Multiple Regression. There is NO
ROOM in their cases for me to point out their errors as BLUNDERS,
and no room to "agree to disagree".

In spite of the tedious rhetoric and obfuscation provided by Richard
Ulrich and others, they are 100% WRONG in their notion they can
argue a priori about the "expected sign" without reference to, or
without perfect knowledge about, the OTHER variables in the regression
model.

Thank you again for your valuable contribution to this "discussion"
which was beginning to turn from a discussion to full-blown flamewars,
between myself and Richard and a couple others in the Blunder Group.

-- Bob.
Reef Fish
2005-04-27 16:52:20 UTC
Permalink
The omission typo was MINE (my keyboard Devil was not to be blamed).
The omission of one single word "BUT" changed the meaning of a key
sentence to its opposite. :-) Although I think most readers would
have inferred my omission typo, it's sufficiently important for
me to correct it, to leave no room for misunderstanding.
Post by Reef Fish
My comments (1) and (2) above is directed toward Richard Ulrich and
other "expected sign" ABUSERS in multiple regression. There, it's not
a matter of opinion that Richard and the others are WRONG, but an
unequivocal FACT that they are WRONG, and DEFICIENT in their
knowledge
Post by Reef Fish
about the theory and methodology of Multiple Regression. There is NO
ROOM in their cases for me to point out their errors as BLUNDERS,
and no room to "agree to disagree".
There is NO ROOM in their cases *** BUT *** for me to point out their
errors as BLUNDERS, and no room to "agree to disagree".

-- Bob.
Richard Ulrich
2005-04-28 21:55:27 UTC
Permalink
- commenting on Dave's post and Bob's reply.

On 27 Apr 2005 09:22:31 -0700, "Reef Fish"
[snip, autoregression problem; sign behavior. ]
Post by Reef Fish
Post by d***@autobox.com
As George Box pointed out, if you don't physically change the X
values ( as you do in a designed experiment ) you can't unambiguously
state what would happen if you do !. The regression coefficient in a
mirm reflects the expected change in Y if and only if when you change
that 1 X none of the other X's change (orthogonality assumption ).
Analysts who endeavor to do marginal analysis of a mirm result need
to
Post by d***@autobox.com
be aware of this.
"Be aware of this" is a minimal condition.

Starting from a simple, small model, it is frequently possible
to tell what is going on. You can track down confounding,
sometimes, by looking at the behavior of particular residuals
between one step and another. That is probably more difficult
in cross-correlated time series, compared to flat models.

Whether you want to bother refining variables may depend on
whether your aim is "accurate prediction" (for a small universe)
or scientific generalization.
Post by Reef Fish
Post by d***@autobox.com
I recall as a young man I asked Prof. Kendall about the phenomena
where
Post by d***@autobox.com
after adding 1 more variable to my mirm the sign of one of my other
(all significant ) x's changed sign. He responded that these were
I notice that *one* of them changed sign, not dozens.

For a scientist trying to write a valid generalization, that
is a warning about a couple of variables.

For a statistical technician with efforts limited to prediction,
I think it *can* provide the same warning. That would be so,
especially, if it is seen a lot of times; and if those predictions
(pragmatically) turn out to be less robust to changes. "This variable
is frequently confounded: WHY?" - That check can reveal
undesirable artifacts, and lead to improvement in measurement,
choice of variables, what-have-you.

However, some of that "predictive modeling" (stocks? etc.) is
going to be re-done every year, in any case.
Post by Reef Fish
Post by d***@autobox.com
simply a simultaneous set of weights (optimized to minimize the error
sum of squares ) and as such each had no meaning whatsoever just that
together they best predicted (fit) the observed y values from the set
of X's. I asked what about the marginal impact of each of the input
series and he reflected that no such marginal anlysis was possible if
the input series were cross-correlated with each other.
Another point suggested by Prof. Harry Roberts was that regression
was a particular case of time series (Transfer Function ) and that
all
Post by d***@autobox.com
that was relevant to regression was also relevant to time series
analysis thus it is quite possible to have "bad signs" for
coefficients in a Transfer Function Model because it simply reflects
the "simultaneous" response of the Y variable to the collective
impact of the X variables..
[snip, sig]

Time series in business applications - I keep thinking
to myself that, surely, economists keep arguing "causation"
from their primary factors, such as what happens when the
Fed changes the discount rate. But then they keep getting
surprised by new phenomena, so it is wiser for them to
consistently refer back to "this is what we have seen in the
past" instead of claiming knowledge of "causation."

Still, economics seems to fall into a "predictive/science"
framework, I think. That is, when an expectation *fails*, from
changing an interest rate or something else,
a slew of "explanations" are generated which provide
new variables for new models.


Bob/Reef Fish >
Post by Reef Fish
Thanks for your cogent and on-the-mark remarks on the SUBJECT of this
thread and your informative references to George Box, Kendall, and
Harry Hoberts on the topic.
Although your comments were more directly related to time series, the
idea of expected sign IN THE PRESENCE of OTHER (correlated) variables
in a Multiple Regression are imbedded in all your references. Harry
Roberts used Multiple Regression methods to analyze time-series data
using Box-Jenkins' ARIMA models!
[snip, for space. Harry Roberts...
Post by Reef Fish
Nearly ALL of the students who took Harry's courses were NOT
statisticians.
let alone mathematicians, but who are ENLIGHTENED consumers of
Statistics.
They included many presidents and high level officers of business
corporations who analyzed their company data THEMSELVES, and did them
WELL.
This is useful exposition to me, Bob.

It says something beyond your basic lecture; and beyond your
one-word replies whenever I have asked a question, or laid
out a counter-example.
Post by Reef Fish
The MBA studenst at Chicago make excellent examples of statistics
consumers and USERS (those who actually DO statistical analyses) who
1. I am quite certain none of them are the "expected sign" ABUSERS
such as those we have seen in this forem and elsewhere.
Okay. I see that they are not doing science, either, so
that is just fine. I don't object to their using regression
that way -- I don't think I ever did, did I?

I still don't see your *principle* for opposing regression in
another way than your own - and you sloughed off my
suggestions when I suggested a couple of specific practical
reasons.


Today, I understand a news report that stuck in my mind for
years. - There was statistical testimony in a discrimination case
about banks' "red-lining" certain communities (drawing a red line
on the map, around the areas where loans would not be offered).

The testimony claimed, basically, what you do: Areas were
defined by something like discriminant function. Nobody was
crediting the coefficients with "meaning anything," so they did
not *mind* that they were refusing to loan, based somewhat on
the percentage of minority population.
- I think the judge told them to stop.
Post by Reef Fish
2. The "expected sign" fallacy is not only TECHNICALLY flawed in terms
of mistaking a simple correlation for a partial correlation, but
is obviously promoted by the faulty notion that a regression model
necessarily EXPLAIN any phenomenon, rather than merely FITTING data
to some model which may or may not PREDICT a future outcome well.
Okay, let's be more precise. It isn't a notion that a regression
model *will* explain; it is a notion that a regression model
*might* explain. It is a *different* application that pure
prediction. There is a procedure of modeling, within a research
program of refining concepts and measures. If it were always
successful, it would probably be too easy to be useful.

In epidemiology, there is external model-confirmation from
biological studies. I think that there are plenty of good examples
(which you have not dared to face) of using the coefficients
to confirm "which model to believe."


Here is a book review from the Journal of Marketing Research,
Feb. 2002, on a couple of books on causality and causation.
http://bayes.cs.ucla.edu/BOOK-2K/rigdon-review.pdf
One book is by Pearl, the other by Spirtes, Glymour, and
Scheines.

The review is relevant enough that it cites Bob's 1982
review of Kenny, before moving on the substance that
(I figure) Bob must continue to object to.
Post by Reef Fish
The "price of eggs in China" may indeed FIT many observational data
in Y well, without EXPLAINING anything about the phenomenon of FIT.
And, for that reason, YOUR school would use "the price
of eggs" in China; without noting that it is not "reasonable"
and therefore, it is not likely to be *robust* or scientifically
useful. (Obviously, your school should be preferred on all
counts, if there are no possible 'reasonable' models in the
universe.... ) And, "the price of eggs" might be worth
looking at closer, for refinement, if the prediction did not
fail as fast as most specious correlations do. (surrogate for
weather?)
Post by Reef Fish
3. Most of them do not NEED to know anything about "Measure Theory"
or measure-theoretic Probability Theory to do APPLIED statistics,
and do them WELL. In fact, I would bet Herman Rubin my farm that
NONE of the MBA students dod. :-)
My comment (3) above is directed toward Herman Rubin, on our many
very strong disagreement as to what is "applied statistics" and what
is needed to apply statistics intelligently and apply it well. Herman
and I have no problem with each other, standing AGREE TO DISAGREE on
our OPINIONS about applied statistics.
My comments (1) and (2) above is directed toward Richard Ulrich and
other "expected sign" ABUSERS in multiple regression. There, it's not
a matter of opinion that Richard and the others are WRONG, but an
unequivocal FACT that they are WRONG, and DEFICIENT in their knowledge
about the theory and methodology of Multiple Regression. There is NO
ROOM in their cases for me to point out their errors as BLUNDERS,
and no room to "agree to disagree".
What I find unequivocal is that I have replied, in sentences,
to every *point* that I could discern Bob making.

In contrast, Bob has failed to articulate - beyond
single words like "irrelevant" - any response to
(a) direct questions, and (b) counter-examples to his points.

Oh, yes, besides single words, he comes out with personal
calumnies. That's given me new referents for literary
descriptions that I never appreciated as widely as now,
"inarticulate rage" and "incoherent mumbling."
Post by Reef Fish
In spite of the tedious rhetoric and obfuscation provided by Richard
Ulrich and others, they are 100% WRONG in their notion they can
argue a priori about the "expected sign" without reference to, or
without perfect knowledge about, the OTHER variables in the regression
model.
Try this: You are grabbing hold of the wrong end of the stick.
We would say: If the sign is wrong, then the model is wrong,
or the variables are wrong; so something needs more work.

The "research programs" are in progress - still trying to
avoid tautology. The review in the reference I cited (above)
faults SEM - fairly, I think - for being a practice that has
been largely tautological. I forget exactly how they said that.
Post by Reef Fish
Thank you again for your valuable contribution to this "discussion"
which was beginning to turn from a discussion to full-blown flamewars,
between myself and Richard and a couple others in the Blunder Group.
In 10 years of reading and posting, I don't remember any
"full-blown flame wars" in the sci.stat.* hierarchy.
W. Chambers attacked anyone, occasionally (even) the folks
siding with him. I responded in various ways, as did others,
but not at all "tit for tat" or showing vituperation. That would
have to be #1, such as it was, if you count his periodic
re-appearances as continuations.

What you have posted recently, mild compared to his,
combined with my even-milder comments,
is probably #2 on our "flame war list" already.
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
2005-04-29 00:41:30 UTC
Permalink
Post by Richard Ulrich
- commenting on Dave's post and Bob's reply.
On 27 Apr 2005 09:22:31 -0700, "Reef Fish"
Dave > I recall as a young man I asked Prof. Kendall about the phenomena
Dave > where after adding 1 more variable to my mirm the sign of one of my
Dave > other (all significant) x's changed sign. He responded that these were
Post by Richard Ulrich
I notice that *one* of them changed sign, not dozens.
Richard, you're missing Dave's point, Kendall's point, my point,
and Mosteller and Tukey's point, and the point that everyone who
understands Multiple Regression would know -- that you CANNOT "expect
any sign" to be positive or negative. How many signs change is
IRRELEVANT <that's why you are irrelevant> and beside the point!
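
A small illustration of that point (not from the original post; it assumes
Python with numpy is available): the coefficient of x1 in a multiple
regression is a PARTIAL effect, adjusted for the other x's, so its sign
need not match the sign of the simple correlation between x1 and y, and
can even be opposite.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 149                                  # same n as the original poster
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=n)    # corr(x1, x2) near 0.9
    y = 2.0 * x2 - 1.0 * x1 + rng.normal(scale=0.5, size=n)   # true partial slope of x1 is -1

    print("simple corr(x1, y):", np.corrcoef(x1, y)[0, 1])    # positive (about +0.6)

    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("fitted [const, b1, b2]:", beta)                    # b1 is near -1, opposite sign

Which sign one "expects" for x1 therefore depends on which other
(correlated) variables are in the model.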
Post by Richard Ulrich
However, some of that "predictive modeling" (stocks? etc.) is
going to be re-done every year, in any case.
There are NO valid predictive models in stocks based on historical
prices of a single stock, a group of stocks, or the entire stock
market, whether on a daily, weekly, monthly or yearly basis.
Chicago GSB is the stronghold of the Efficient Market Hypothesis,
where several of my former colleagues are Nobel winners in economics,
including Mert Miller, who was NOT even an economist but one in
Finance, and a strong contributor to the "monkey and the dart" method
of picking and predicting stocks.
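
A minimal sketch of that claim (not part of the thread, and an
illustration rather than a proof of the Efficient Market Hypothesis; it
assumes Python with numpy): if returns are a pure random walk in prices,
a regression of today's return on yesterday's fits only noise, and its
out-of-sample R^2 sits at essentially zero.

    import numpy as np

    rng = np.random.default_rng(1)
    ret = rng.normal(scale=0.01, size=2000)        # i.i.d. daily returns

    r_t, r_lag = ret[1:], ret[:-1]
    slope, intercept = np.polyfit(r_lag[:1000], r_t[:1000], 1)   # "fit" on the first half
    pred = intercept + slope * r_lag[1000:]                       # predict the second half
    ss_res = np.sum((r_t[1000:] - pred) ** 2)
    ss_tot = np.sum((r_t[1000:] - r_t[1000:].mean()) ** 2)
    print("out-of-sample R^2:", 1 - ss_res / ss_tot)   # essentially zero, often negative

None of this proves markets are efficient; it only illustrates why an
in-sample fit to past prices is not evidence of predictive value.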
Post by Richard Ulrich
Time series in business applications - I keep thinking
to myself that, surely, economists keep arguing "causation"
from their primary factors, such as what happens when the
Fed changes the discount rate. But then they keep getting
surprised by new phenomena, so it is wiser for them to
consistently refer back to "this is what we have seen in the
past" instead of claiming knowledge of "causation."
They are making EXACTLY the same blunders as you and other social
scientists are, in your use of Multiple Regression to "explain" a
phenomenon or to ascertain CAUSE, without any valid basis. That's
exactly why I indicted the economists and social scientists in my
book review on "Correlation and Causation" as Quacks advocating
black magic.
Post by Richard Ulrich
Still, economics seems to fall into a "predictive/science"
framework, I think. That is, when an expectation *fails*, from
changing an interest rate or something else,
a slew of "explanations" are generated which provide
new variables for new models.
Your few comments on economics pretty much show that your knowledge
about economics is on par with your knowledge about Multiple Regression
-- next to ZERO. But why should that keep you from posting on subjects
you know nothing about?
Post by Richard Ulrich
Bob/Reef Fish > <thanked Dave Reilly>
Post by Reef Fish
Thanks for your cogent and on-the-mark remarks on the SUBJECT of this
thread and your informative references to George Box, Kendall, and
Harry Roberts on the topic.
Although your comments were more directly related to time series, the
idea of expected sign IN THE PRESENCE of OTHER (correlated) variables
in a Multiple Regression is imbedded in all your references. Harry
Roberts used Multiple Regression methods to analyze time-series data
using Box-Jenkins' ARIMA models!
[snip, for space. Harry Roberts...
Post by Reef Fish
Nearly ALL of the students who took Harry's courses were NOT
statisticians, let alone mathematicians, but they were ENLIGHTENED
consumers of Statistics. They included many presidents and high-level
officers of business corporations who analyzed their company data
THEMSELVES, and did it WELL.
This is useful exposition to me, Bob.
I wish I knew how, in the light of your foregoing and subsequent
comments.
Post by Richard Ulrich
It says something beyond your basic lecture; and beyond your
one-word replies whenever I have asked a question, or laid
out a counter-example.
I've taught some of the same students Harry and others at Chicago
taught. The students there UNDERSTOOD the theory and methods as
presented to them; or, once an answer is given, they don't ARGUE ad
infinitum as you do, because you are too obtuse to understand the
simple principles. My "Data Analysis" course (an advanced course)
at UC GSB used to get students who reported that they worked, on
average, 13 or 14 hours a week on that course (harder than they
worked on any other course, where the average was about 4-8 hours
per week) and who consistently rated my course and my teaching
around 4.5 (on a scale of 1 to 5, where 5 is perfect) -- about
5 out of every 100 courses at the UC GSB got that kind of high
rating, whether in professor rating, course rating, or time spent.

In most OTHER universities, the "expected sign" of the SIMPLE
correlation between professor rating and "difficulty of course,
or time actually spent outside class per week" would be strongly
NEGATIVE.

Not so at the Chicago GSB! Some of the worst professor ratings
were on those who taught "gut" courses requiring no more than 2 or
3 hours a week outside of class. The GSB ratings reflect both
the QUALITY and the MATURITY of the STUDENTS there.

That's why they are among the highest in demand when they graduate.


You are getting freebie lectures here. You waste ALL your time
resisting some simple statistical facts in theory and methods, instead
of trying to LEARN them.
Post by Richard Ulrich
Post by Reef Fish
The MBA students at Chicago make excellent examples of statistics
consumers and USERS (those who actually DO statistical analyses) who
1. I am quite certain none of them are the "expected sign" ABUSERS
such as those we have seen in this forum and elsewhere.
Okay. I see that they are not doing science, either,
Science has NOTHING to do with using Multiple Regression methodology
CORRECTLY? How obtuse can you get, Richard? Here it would have
been appropriate to attach another "IRRELEVANCE" label to your remark.
Post by Richard Ulrich
I still don't see your *principle* for opposing regression done in
any way other than your own - and you sloughed off my
suggestions when I offered a couple of specific practical
reasons.
It's not "my own". It's THE only correct way for anyone -- to
understand the methodology and the "expect sign" fallacy and BLUNDER.
Post by Richard Ulrich
Today, I understand a news report that stuck in my mind for
years. - There was statistical testimony in a discrimination case
about banks' "red-lining" certain communities (drawing a red line
on the map, around the areas where loans would not be offered).
That's still another one of your IRRELEVANT tangents.
Post by Richard Ulrich
Post by Reef Fish
2. The "expected sign" fallacy is not only TECHNICALLY flawed in terms
of mistaking a simple correlation for a partial correlation, but
is obviously promoted by the faulty notion that a regression model
necessarily EXPLAINS any phenomenon, rather than merely FITTING data
to some model which may or may not PREDICT a future outcome well.
Okay, let's be more precise. It isn't a notion that a regression
model *will* explain; it is a notion that a regression model
*might* explain.
NO. A regression model can NEVER explain, nor is it ever intended to.

< More IRRELEVANCE snipped >
Post by Richard Ulrich
Here is a book review from the Journal of Marketing Research,
Feb. 2002, on a couple of books on causality and causation.
http://bayes.cs.ucla.edu/BOOK-2K/rigdon-review.pdf
One book is by Pearl, the other by Spirtes, Glymour, and
Scheines.
The Journal of Marketing Research is not exactly the poster boy for
good or correct application of statistics.
Post by Richard Ulrich
The review is relevant enough that it cites Bob's 1982
review of Kenny, before moving on the substance that
(I figure) Bob must continue to object to.
The reviewer was merely paying lip-service to my "scathing review",
and proceeded in the same sentence to describe Kenny's book as "a classic
text well worth reading". He should have told the readers to read
MY review instead of Kenny's quackery.

The reviewer is just another "causal inference" QUACK, though
through his rhetoric, he tried to give the impression that he knew
something about the proper use of Regression methodology.


What I said in my 1982 JASA review will stand FOREVER. Nothing will
ever change it. It says exactly the same thing as I am saying now:
that the "expected sign" fallacy, and the invalid use of multiple
regression (in the absence of a controlled experiment), are QUACKERY
of the social "scientists", economists, and others who practice that
kind of voodoo and call it statistics.
Post by Richard Ulrich
Post by Reef Fish
The "price of eggs in China" may indeed FIT many observational data
in Y well, without EXPLAINING anything about the phenomenon of FIT.
Your comment is beneath contempt, all because of your complete lack of
understanding of how Multiple Regression methodology SHOULD be practiced,
according to EVERY competent statistician I know -- including Box,
Kendall, Roberts, Mosteller, Tukey -- those mentioned in this post.
Post by Richard Ulrich
Post by Reef Fish
My comments (1) and (2) above are directed toward Richard Ulrich and
other "expected sign" ABUSERS in multiple regression. There, it's not
a matter of opinion that Richard and the others are WRONG, but an
unequivocal FACT that they are WRONG, and DEFICIENT in their knowledge
about the theory and methodology of Multiple Regression. There is NO
ROOM in their cases except to point out their errors as BLUNDERS,
and no room to "agree to disagree".
What I find unequivocal is that I have replied, in sentences,
to every *point* that I could discern Bob making.
See the above. You missed EVERY POINT I've ever made on the subject,
and you kept making the SAME ERROR and thought you discerned my points.
Post by Richard Ulrich
In contrast, Bob has failed to articulate - beyond
single words like "irrelevant" -
I explained WHY some of your remarks are IRRELEVANT above, relative
to what you argued this round.
Post by Richard Ulrich
Post by Reef Fish
Thank you again for your valuable contribution to this "discussion"
which was beginning to turn from a discussion to full-blown flamewars,
between myself and Richard and a couple others in the Blunder Group.
< More IMPERTINENCE <like that better? <g>> snipped.

-- Bob.
Post by Richard Ulrich
--
http://www.pitt.edu/~wpilib/index.html
Bruce Weaver
2005-04-18 14:53:16 UTC
Permalink
Post by Richard Ulrich
On Sun, 17 Apr 2005 15:51:27 +0100, tHatDudeUK
Post by tHatDudeUK
Hi,
My sample size is 149. I have one dependent variable and 10
independent (or predictor) variables which I'm analysing using
multiple linear regression (with the enter method).
Stepwise regression is seldom a good idea. None of the tests
are good. For a typical problem, the wrong variables are chosen.
Where is it good? -- When you know there will be a prediction
from any set of the variables, and you want a short equation.
I don't think the OP was doing stepwise, were they? The "enter" method
(in SPSS) means that the OP chose which variables to include, and forced
them into the model.
--
Bruce Weaver
***@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
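
A small sketch of why that distinction matters (not part of the thread;
it assumes Python with numpy and scipy): with 10 pure-noise predictors
and n = 149, the single "best" variable found by a first forward-selection
step often looks nominally "significant", whereas the "enter" method
forces all 10 in without any data-driven search.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, p = 149, 10
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)                  # y is unrelated to every predictor

    pvals = []
    for j in range(p):                      # first forward step: try each x alone
        _, pval = stats.pearsonr(X[:, j], y)
        pvals.append(pval)

    best = int(np.argmin(pvals))
    print("best of 10 noise predictors: x%d, nominal p = %.3f" % (best + 1, min(pvals)))
    # Over repeated simulations the minimum of 10 such p-values falls below
    # 0.05 roughly 40% of the time, so the nominal test is no longer honest.

With the enter method there is no selection step, so the printed tests at
least mean what they claim, subject to the usual model assumptions.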