- commenting on Dave's post and Bob's reply.

On 27 Apr 2005 09:22:31 -0700, "Reef Fish"

[snip, autoregression problem; sign behavior.]

> > As George Box pointed out, if you don't physically change the X
> > values (as you do in a designed experiment) you can't unambiguously
> > state what would happen if you do! The regression coefficient in a
> > mirm reflects the expected change in Y if and only if, when you change
> > that one X, none of the other X's change (orthogonality assumption).
> > Analysts who endeavor to do marginal analysis of a mirm result need
> > to

"Be aware of this" is a minimal condition.

Starting from a simple, small model, it is frequently possible

to tell what is going on. You can track down confounding,

sometimes, by looking at the behavior of particular residuals

between one step and another. That is probably more difficult

in cross-correlated time series, compared to flat models.

Whether you want to bother refining variables may depend on

whether your aim is "accurate prediction" (for a small universe)

or scientific generalization.

> > I recall as a young man I asked Prof. Kendall about the phenomenon
> > where, after adding 1 more variable to my mirm, the sign of one of
> > my other (all significant) x's changed. He responded that these were

I notice that *one* of them changed sign, not dozens.

For a scientist trying to write a valid generalization, that

is a warning about a couple of variables.

For a statistical technician with efforts limited to prediction,

I think it *can* provide the same warning. That would be so
especially if it is seen many times, and if those predictions
turn out, in practice, to be less robust to changes. "This variable

is frequently confounded: WHY?" - That check can reveal

undesirable artifacts, and lead to improvement in measurement,

choice of variables, what-have-you.

However, some of that "predictive modeling" (stocks? etc.) is

going to be re-done every year, in any case.
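The sign change Dave asked Kendall about is easy to reproduce: when two predictors are highly correlated, the simple slope on one of them can have the opposite sign from its coefficient in the joint fit. A minimal sketch with invented data (every number here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Made-up data: x2 is nearly collinear with x1, and y truly
# responds positively to x2 but negatively to x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
y = 2.0 * x2 - 1.0 * x1 + 0.5 * rng.normal(size=n)

# Simple regression of y on x1 alone: x1 proxies for x2,
# so the slope comes out positive.
slope_simple = np.polyfit(x1, y, 1)[0]

# Multiple regression with both predictors: x1's coefficient
# is negative -- the "sign change" in question.
X = np.column_stack([np.ones(n), x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"simple slope on x1: {slope_simple:+.2f}")   # positive
print(f"joint coef on x1:   {coefs[1]:+.2f}")       # negative
print(f"joint coef on x2:   {coefs[2]:+.2f}")       # positive
```

Nothing is "wrong" with either fit; x1's simple slope is positive only because x1 stands in for x2, exactly the point about marginal interpretation under cross-correlated inputs.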

> > simply a simultaneous set of weights (optimized to minimize the error
> > sum of squares) and as such each had no meaning whatsoever, just that
> > together they best predicted (fit) the observed y values from the set
> > of X's. I asked what about the marginal impact of each of the input
> > series and he reflected that no such marginal analysis was possible if
> > the input series were cross-correlated with each other.
> >
> > Another point suggested by Prof. Harry Roberts was that regression
> > was a particular case of time series (Transfer Function) and that
> > all that was relevant to regression was also relevant to time series
> > analysis; thus it is quite possible to have "bad signs" for
> > coefficients in a Transfer Function Model because it simply reflects
> > the "simultaneous" response of the Y variable to the collective
> > impact of the X variables.

[snip, sig]

Time series in business applications - I keep thinking

to myself that, surely, economists keep arguing "causation"

from their primary factors, such as what happens when the

Fed changes the discount rate. But then they keep getting

surprised by new phenomena, so it is wiser for them to

consistently refer back to "this is what we have seen in the

past" instead of claiming knowledge of "causation."

Still, economics seems to fall into a "predictive/science"

framework, I think. That is, when an expectation *fails*, from

changing an interest rate or something else,

a slew of "explanations" are generated which provide

new variables for new models.

Bob/Reef Fish wrote:

> Thanks for your cogent and on-the-mark remarks on the SUBJECT of this
> thread and your informative references to George Box, Kendall, and
> Harry Roberts on the topic.
>
> Although your comments were more directly related to time series, the
> idea of expected sign IN THE PRESENCE of OTHER (correlated) variables
> in a Multiple Regression is embedded in all your references. Harry
> Roberts used Multiple Regression methods to analyze time-series data
> using Box-Jenkins' ARIMA models!

[snip, for space. Harry Roberts... ]

> Nearly ALL of the students who took Harry's courses were NOT
> statisticians, let alone mathematicians, but they were ENLIGHTENED
> consumers of Statistics. They included many presidents and high-level
> officers of business corporations who analyzed their company data
> THEMSELVES, and did so WELL.

This is useful exposition to me, Bob.

It says something beyond your basic lecture; and beyond your

one-word replies whenever I have asked a question, or laid

out a counter-example.

> The MBA students at Chicago make excellent examples of statistics
> consumers and USERS (those who actually DO statistical analyses) who
>
> 1. I am quite certain none of them are the "expected sign" ABUSERS
> such as those we have seen in this forum and elsewhere.

Okay. I see that they are not doing science, either, so

that is just fine. I don't object to their using regression

that way -- I don't think I ever did, did I?

I still don't see your *principle* for opposing any use of
regression other than your own - and you sloughed off my
suggestions when I offered a couple of specific practical
reasons.

Today, I understand a news report that stuck in my mind for

years. - There was statistical testimony in a discrimination case

about banks' "red-lining" certain communities (drawing a red line

on the map, around the areas where loans would not be offered).

The testimony claimed, basically, what you do: Areas were

defined by something like discriminant function. Nobody was

crediting the coefficients with "meaning anything," so they did

not *mind* that they were refusing to loan, based somewhat on

the percentage of minority population.

- I think the judge told them to stop.

> 2. The "expected sign" fallacy is not only TECHNICALLY flawed in terms
> of mistaking a simple correlation for a partial correlation, but
> is obviously promoted by the faulty notion that a regression model
> necessarily EXPLAINS any phenomenon, rather than merely FITTING data
> to some model which may or may not PREDICT a future outcome well.

Okay, let's be more precise. It isn't a notion that a regression

model *will* explain; it is a notion that a regression model

*might* explain. It is a *different* application than pure
prediction. There is a procedure of modeling, within a research

program of refining concepts and measures. If it were always

successful, it would probably be too easy to be useful.

In epidemiology, there is external model-confirmation from

biological studies. I think that there are plenty of good examples

(which you have not dared to face) of using the coefficients

to confirm "which model to believe."
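The simple-versus-partial distinction in Bob's point 2 can be sketched numerically. In this made-up example (every number invented for illustration), a common factor z drives both x and y, so the simple correlation of x and y is positive while the partial correlation, controlling for z, is negative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Made-up data: z confounds both x and y.
z = rng.normal(size=n)
x = z + 0.4 * rng.normal(size=n)
y = -x + 3.0 * z + 0.4 * rng.normal(size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def residuals(v, w):
    # Residuals of v after a simple linear regression on w.
    slope, intercept = np.polyfit(w, v, 1)
    return v - (slope * w + intercept)

# Simple correlation: driven by the shared factor z.
r_xy = corr(x, y)

# Partial correlation: correlate the parts of x and y that z
# does not account for.
r_xy_given_z = corr(residuals(x, z), residuals(y, z))

print(f"simple  corr(x, y):     {r_xy:+.2f}")          # positive
print(f"partial corr(x, y | z): {r_xy_given_z:+.2f}")  # negative
```

Reading the simple correlation's sign as the "expected sign" of x's regression coefficient gets the direction wrong here; the partial quantity is what the multiple-regression coefficient reflects.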

Here is a book review from the Journal of Marketing Research,

Feb. 2002, on a couple of books on causality and causation.

http://bayes.cs.ucla.edu/BOOK-2K/rigdon-review.pdf

One book is by Pearl, the other by Spirtes, Glymour, and

Scheines.

The review is relevant enough that it cites Bob's 1982

review of Kenny, before moving on to the substance that

(I figure) Bob must continue to object to.

> The "price of eggs in China" may indeed FIT many observational data
> in Y well, without EXPLAINING anything about the phenomenon of FIT.

And, for that reason, YOUR school would use "the price

of eggs" in China; without noting that it is not "reasonable"

and therefore, it is not likely to be *robust* or scientifically

useful. (Obviously, your school should be preferred on all

counts, if there are no possible 'reasonable' models in the

universe.... ) And, "the price of eggs" might be worth

looking at closer, for refinement, if the prediction did not

fail as fast as most specious correlations do. (surrogate for

weather?)
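The "price of eggs" point is easy to illustrate: two series that share nothing but a time trend can produce an excellent in-sample fit. A small sketch with invented data (both series and all constants are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
t = np.arange(n)

# Two series with no causal link -- each merely trends over time.
y = 0.5 * t + rng.normal(scale=3.0, size=n)      # series to be "explained"
eggs = 0.2 * t + rng.normal(scale=1.0, size=n)   # stand-in "price of eggs"

# Regress y on the unrelated trending series and compute R^2.
slope, intercept = np.polyfit(eggs, y, 1)
fitted = slope * eggs + intercept
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"in-sample R^2 of y on 'eggs': {r_squared:.2f}")  # close to 1
```

The high R^2 reflects only the shared trend (the "surrogate for weather" idea), which is exactly why such a fit explains nothing and tends not to predict robustly.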

> 3. Most of them do not NEED to know anything about "Measure Theory"
> or measure-theoretic Probability Theory to do APPLIED statistics,
> and do them WELL. In fact, I would bet Herman Rubin my farm that
> NONE of the MBA students do. :-)

> My comment (3) above is directed toward Herman Rubin, on our many
> very strong disagreements as to what is "applied statistics" and what
> is needed to apply statistics intelligently and apply it well. Herman
> and I have no problem with each other, standing AGREE TO DISAGREE on
> our OPINIONS about applied statistics.
>
> My comments (1) and (2) above are directed toward Richard Ulrich and
> other "expected sign" ABUSERS in multiple regression. There, it's not
> a matter of opinion that Richard and the others are WRONG, but an
> unequivocal FACT that they are WRONG, and DEFICIENT in their knowledge
> about the theory and methodology of Multiple Regression. There is NO
> ROOM in their cases for me to point out their errors as anything but
> BLUNDERS, and no room to "agree to disagree".

What I find unequivocal is that I have replied, in sentences,

to every *point* that I could discern Bob making.

In contrast, Bob has failed to articulate - beyond

single words like "irrelevant" - any response to

(a) direct questions, and (b) counter-examples to his points.

Oh, yes, besides single words, he comes out with personal

calumnies. That's given me new referents for literary

descriptions that I never appreciated as widely as now,

"inarticulate rage" and "incoherent mumbling."

> In spite of the tedious rhetoric and obfuscation provided by Richard
> Ulrich and others, they are 100% WRONG in their notion that they can
> argue a priori about the "expected sign" without reference to, or
> without perfect knowledge about, the OTHER variables in the regression
> model.

Try this: You are grabbing hold of the wrong end of the stick.

We would say: If the sign is wrong, then the model is wrong,

or the variables are wrong; so something needs more work.

The "research programs" are in progress - still trying to

avoid tautology. The review in the reference I cited (above)

faults SEM - fairly, I think - for being a practice that has

been largely tautological. I forget exactly how they said that.

> Thank you again for your valuable contribution to this "discussion"
> which was beginning to turn from a discussion to full-blown flame wars
> between myself and Richard and a couple of others in the Blunder Group.

In 10 years of reading and posting, I don't remember any

"full-blown flame wars" in the sci.stat.* hierarchy.

W. Chambers attacked anyone, occasionally (even) the folks

siding with him. I responded in various ways, as did others,

but not at all "tit for tat" or showing vituperation. That would

have to be #1, such as it was, if you count his periodic

re-appearances as continuations.

What you have posted recently, mild compared to his,

combined with my even-milder comments,

is probably #2 on our "flame war list" already.

--

Rich Ulrich, ***@pitt.edu

http://www.pitt.edu/~wpilib/index.html