Post by y***@hotmail.com
Hello Rich,
It was very nice of you to reply to my question; I actually didn't expect any response to a six-year-old post.
The OP's question hits most of the concerns I have with my current data analysis. Unfortunately, I don't have a strong background in statistics and am in the process of self-learning most of it, so I was having a hard time understanding both your and Bruce's comments.
I have used the Kolmogorov-Smirnov normality test.
The data are not normal largely because there are a lot of zeros (continuous numeric data with an absolute zero), which makes the distribution positively skewed. My supervisor advised me to clean up the outliers, if possible, and then try a data transformation. However, I have done both, and most of the dependent variables still violate the assumption of normality.
Post by y***@hotmail.com
So I turned to non-parametric solutions. My research has a between-subjects component (2 groups) and time (3 time points) as the within-subjects component.
Since there is no non-parametric equivalent of a mixed-design ANOVA, I have to find a solution similar to what the parametric ANOVA does and teach myself how to do it in SPSS.
I have examined Friedman's test, the Mann-Whitney U test, the Kruskal-Wallis H test, and the Wilcoxon signed-rank test. As I see it, most of them are based on the ranks of the data and are only a partial solution to my analysis goal, so I was trying to find a way to work around that. I wonder whether I can still analyze the
interaction effect (group x time) in this context?
Post by y***@hotmail.com
If I run the between-group and within-group tests separately on my data, what problems/issues would follow?
That's why I posted the question: to see how other researchers usually deal with this kind of situation.
The most common thing that researchers and statisticians
do about non-normality is IGNORE IT. And for pretty
good reasons.
Taking a rank-transformation of the scores is the starting
point for (all?) those tests you mention. When you replace
your scores with their rank-transformed versions ... DO you
get a set of numbers that improve the "interval" distances
between what you think those original scores should
represent? If the original scores look better, more "equal
interval", then you don't want the loss of detail from converting
to ranks.
Non-parametric tests.
- By the way, their complicated formulas (in their simple form)
include the assumption that there are no ties. So, your
data, with many zeroes, also fail to meet the assumptions
for the "exact" nonparametric tests. However, there are
"approximations" available. About them -
WJ Conover showed in the 1980s that most of the rank-
order tests with complicated formulas can be replaced by
performing ANOVA on the rank-transformed numbers.
Conover showed that the ANOVA on ranks can be better
(more accurate tests) than the approximations in use by
stat packages, especially when there are many ties.
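Conover's point can be tried directly: rank the pooled scores, then run an ordinary ANOVA on the ranks. Here is a minimal one-way sketch in Python with scipy (the two groups and their values are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent groups, with many tied zeroes.
group_a = np.array([0.0, 0.0, 0.0, 1.2, 3.4, 0.0, 2.1])
group_b = np.array([0.0, 0.5, 2.8, 4.1, 3.9, 0.0, 5.2])

# Rank-transform the pooled scores; ties receive average ranks.
pooled = np.concatenate([group_a, group_b])
ranks = stats.rankdata(pooled)
ranks_a = ranks[:len(group_a)]
ranks_b = ranks[len(group_a):]

# Ordinary one-way ANOVA performed on the ranks.
f_stat, p_value = stats.f_oneway(ranks_a, ranks_b)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

For the full mixed design, the same idea applies: rank-transform the outcome and fit the usual group x time ANOVA on the ranks.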
Outliers and zeroes.
You mention "outlier" as if (maybe) you have just one.
Should you (maybe) drop that case, and mention it only
as a case report, because the score is so very atypical?
Or should one (or more) extreme be drawn in, scored as
the next-highest value? What makes sense?
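If "drawing in" an extreme to the next-highest value is the choice, it is only a few lines. A sketch with numpy (the helper name and the data are made up; this is a simple one-sided winsorization):

```python
import numpy as np

def pull_in_extremes(x, n_extreme=1):
    """Replace the n_extreme largest values with the next-highest
    remaining value (one-sided winsorization)."""
    x = np.asarray(x, dtype=float).copy()
    order = np.argsort(x)
    cutoff = x[order[-(n_extreme + 1)]]   # next-highest value kept
    x[order[-n_extreme:]] = cutoff
    return x

scores = np.array([0, 0, 1, 2, 3, 4, 40])   # 40 is the suspect outlier
print(pull_in_extremes(scores))             # 40 is drawn in to 4
```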
"Many zeroes" is sometimes the justification to rescore
everything as 0/1. Does that lose much sense in your
data? Do those other values matter? I don't know of
anybody giving advice on this topic, but if half your
scores are zero, I think you have a good case to (at
least) try out this alternate scoring.
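Trying out the 0/1 rescoring is also simple: code each score as zero versus non-zero and compare the groups as proportions. A sketch with scipy, again on made-up data:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two groups, with many zeroes.
group_a = np.array([0.0, 0.0, 0.0, 0.0, 1.2, 3.4, 0.0, 2.1])
group_b = np.array([0.0, 0.5, 2.8, 4.1, 3.9, 0.0, 5.2, 1.7])

# Rescore everything as 0/1: zero vs. any non-zero response.
pos_a = int((group_a > 0).sum())
pos_b = int((group_b > 0).sum())
table = np.array([[pos_a, len(group_a) - pos_a],
                  [pos_b, len(group_b) - pos_b]])

# Chi-squared test comparing the two groups on the dichotomy.
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```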
Non-normality.
The problem with non-normality in the residuals of the
model-fit is that the resulting F-test might not be accurate;
it might reject too often, or it might reject too seldom.
But ANOVA is pretty robust. Analyzing a 0/1 variable is
not a problem when the proportion of 1s is between 20% and 80%. And it is not a
problem for rank-transformed data, if the transformation
does not screw up the intervals more than it helps them.
Now, if you have one or more cases whose scores are all
zero, that could distort the picture. "Too many zeroes"
in repeated measures is where the Greenhouse–Geisser
correction is used.
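The Greenhouse-Geisser epsilon is easy to compute by hand if your package does not report it: it is a function of the covariance matrix of the repeated measures, equal to 1 under sphericity with a floor of 1/(k-1). A sketch with numpy (the function name and the data are made up):

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for a subjects-by-timepoints array.
    epsilon = 1 means sphericity holds; the minimum is 1/(k-1)."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)          # k x k covariance of time points
    H = np.eye(k) - np.ones((k, k)) / k     # centering matrix
    D = H @ S @ H                           # double-centered covariance
    return np.trace(D) ** 2 / ((k - 1) * np.trace(D @ D))

rng = np.random.default_rng(0)
scores = rng.gamma(2.0, 1.0, size=(20, 3))  # 20 subjects, 3 time points
eps = gg_epsilon(scores)
print(f"epsilon = {eps:.3f}")               # lies between 1/2 and 1 here
```

The corrected test multiplies both degrees of freedom of the repeated-measures F by epsilon before looking up the p-value.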
Hope this helps.
--
Rich Ulrich