Computing a new Variable: 5 categorical variables into 1 categorical variable

Discussion:

Joy Nico

2020-04-02 13:31:11 UTC

Hi :)

I want to do a cross-sectional analysis
1st) to see what life domains (categorical variable) people chose. I am thinking of a frequency analysis.

2nd) to see what life domains people from different age groups chose (once with 2 age groups, once with 3, so a categorical variable with 2 or 3 values). Still unsure; maybe a frequency analysis separately for each age group? I tried the Binomial and Chi2 tests but could not figure out how to compute them separately for the different values of my group variable.

3rd) I want to test whether the age groups differ significantly in the most often (or least often) chosen life domains. I think a Chi2 test would be suitable, but I am unsure.

Now to my question:

I did generate 1 variable for the age groups, this worked. For the domains it did not work so far: I somehow have to generate a new variable, containing 5 values.

My domains are now in 5 variables as follows:
Aspect_1_t19
Aspect_2_t19
Aspect_3_t19
Aspect_4_t19
Aspect_5_t19

Each participant has those 5 variables for the aspects (since I wanted to get their 5 most important life aspects).
Every Aspect-variable contains a value from 1 to 19, representing an aspect (family, friendship, work, finances etc.).

I do not want to add them up, since the information would then be lost. I tried the code from the SPSS tutorial linked below, but it led to error messages and absurdly high values instead of a list or string of values:
https://www.spss-tutorials.com/combine-categorical-variables/

I would be very grateful for some tips/advice!

Joy

Bruce Weaver

2020-04-02 15:36:20 UTC

Post by Joy Nico
Each participant has those 5 variables for the aspects (since I wanted to get their 5 most important life aspects).
Every Aspect-variable contains a value from 1 to 19, representing an aspect (family, friendship, work, finances etc.).

What do the values 1-19 represent? Are they scores of some kind? Are means & SDs sensible?

--- snip the rest ---

Rich Ulrich

2020-04-02 17:55:45 UTC

Here is my guess as to what your data looks like.
You do have age, which you have successfully recoded into
two new variables for groups.

You have 5 variables which show what participants
consider important, among 19 "domains". For each person,
Aspect_1 ranks as their most important, Aspect_5 ranks 5th.

You can do Freq to see counts for each.
You can do Mult-response to see counts for each, and
for counts per age group -- but you can't do statistical
testing on those numbers because they are not independent.
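
A minimal sketch of the Mult Response approach mentioned above. It assumes the five aspect variables are adjacent in the file and that the age-group variable is called age_gr and coded 1 to 3; both the name and the coding are assumptions, so adjust the ranges to the actual data. $domains is a hypothetical name for the multiple-category set.

```spss
* Define a multiple-category set from the five aspect variables (codes 1 to 19).
* The name age_gr and its range (1,3) are assumptions about the grouping variable.
MULT RESPONSE GROUPS=$domains 'Life domains chosen'
    (Aspect_1_t19 TO Aspect_5_t19 (1,19))
  /VARIABLES=age_gr (1,3)
  /FREQUENCIES=$domains
  /TABLES=$domains BY age_gr
  /CELLS=COLUMN
  /BASE=CASES.
```

With /BASE=CASES the percentages are based on the number of respondents; /BASE=RESPONSES bases them on the total number of responses instead. MULT RESPONSE reports counts and percentages only, with no significance tests.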

You can do a meaningful chi-squared contingency test
on Age_gr by Aspect_1, "most important". It is not very
meaningful to consider the same test for Aspect_2 to _5.
This test might lack statistical power if some of the 19
domains have very low counts. In that case, you might
consider combining some categories, creating a new variable
using RECODE ... / into= ... .
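
A sketch of what such a RECODE could look like, under the reading that Aspect_1 is the "most important" domain. The grouping of codes into three broader categories is invented purely for illustration, Aspect_1_grp is a hypothetical new variable, and age_gr is an assumed name for the age-group variable.

```spss
* Hypothetical grouping: the mapping of the 19 codes to 3 categories is invented for illustration.
RECODE Aspect_1_t19 (1,2=1) (3 THRU 6=2) (7 THRU 19=3) INTO Aspect_1_grp.
VALUE LABELS Aspect_1_grp 1 'group A' 2 'group B' 3 'group C'.

* Chi-squared test of age group by the combined categories, with expected counts shown.
CROSSTABS /TABLES=age_gr BY Aspect_1_grp
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.
```

Requesting EXPECTED in /CELLS makes it easy to see whether any cells still have very low expected counts after combining.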

If you want to see testing about the 19 domains, the obvious
route is to create 19 new variables, one for each domain.

That would be something like this (untested) --

comment create 19 variables with format as F2.
comment the TO list assumes the 5 Aspect variables are adjacent in the file.
vector Dom(19, F2).
loop #i = 1 to 19.
compute Dom(#i) = any(#i, Aspect_1_t19 to Aspect_5_t19).
end loop.
execute.

value labels Dom1 to Dom19 0 "no" 1 "yes".
var labels Dom1 "family" ....etc....

Then you can do testing on whether age groups are
similar in their profiles for how often they used each
domain, comparing each one of the 19.

crosstabs /tables= age_gr by Dom1 to Dom19 /statistics= chisq.

You pointed to the "Combining variables" help that was
not useful to you. I would consider using that technique
to create a new variable from only a few of the most
frequent among the 0/1 variables, if there are patterns
that could be interesting.

Of course, if I have misunderstood what your basic data
look like, please post a revised description.

--
Rich Ulrich

Joy Nico

2020-04-03 08:51:48 UTC

Hi Rich Ulrich,
Thank you for your reply!

I used an altered version of the SEIQoL-DW to assess quality of life. The SEIQoL-DW consists of 3 steps; I am using the first step (and, in other analyses, the overall score).
In the first step the respondents chose their 5 most important aspects (in any order, so Aspect 1 is not necessarily the most important).
This is unfortunate now, since it means that I can only look at which domains occur most often overall, in the entire sample and within different groups (you are right, I did compute new variables coding age).
This is why I would love to compute a new variable that contains all 5 aspects for each respondent, in a list or something similar. Then I could see the frequency of a domain at 1 timepoint across all 5 answers/aspects a respondent gave.

Did I make my data-structure and problems stemming from it clear? If not, I gladly try again :)

I will think more about the option you provided. The problem I see with it is that the percentages and everything else will be for only one variable, which is only 1/5th of the answer.
If you have any further tips, I will take them gladly! :)
Thank you so much for your time!

Sincerely,
Joy Tieg

Bruce Weaver

2020-04-03 14:12:04 UTC

Post by Joy Nico
Hi Rich Ulrich,
Thank you for your reply!
I used an altered version of the SEIQoL-DW to assess Quality of Life.

If Rich (or anyone else) has time to help you, it might help them to have some information about how to score this thing. E.g.,

https://www.researchgate.net/publication/237753111_Schedule_for_the_Evaluation_of_Individual_Quality_of_Life_SEIQoL_a_Direct_Weighting_procedure_for_Quality_of_Life_Domains_SEIQoL-DW

Joy Nico

2020-04-03 15:53:32 UTC

Thank you for the advice! I will keep that in mind in case I have to come back again. It seems that I have found a solution: I needed to restructure my dataset so that 1 person had several lines. In that changed set I had to compute a new variable.... Thank you all for the help!
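
The post does not show the syntax that was used. One way to do this kind of restructuring in SPSS is VARSTOCASES; the sketch below is an assumption about what it might look like, not the poster's actual code. The names id and age_gr are assumed to exist in the data, while domain and slot are hypothetical names for the new variables.

```spss
* Restructure from one row per person to one row per chosen aspect.
* The names id and age_gr are assumptions, and domain and slot are hypothetical new variables.
VARSTOCASES
  /MAKE domain FROM Aspect_1_t19 Aspect_2_t19 Aspect_3_t19 Aspect_4_t19 Aspect_5_t19
  /INDEX=slot(5)
  /KEEP=id age_gr
  /NULL=DROP.

* Overall frequency of each domain, then the breakdown by age group.
FREQUENCIES VARIABLES=domain.
CROSSTABS /TABLES=domain BY age_gr /CELLS=COUNT COLUMN.
```

After the restructuring, the file holds up to five rows per respondent, so the counts in these tables are numbers of responses, not numbers of people.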

Rich Ulrich

2020-04-04 04:03:33 UTC

Post by Joy Nico
Thank you for the advice! I will keep that in mind in case I have to come back again. It seems that I have found a solution: I needed to restructure my dataset so that 1 person had several lines. In that changed set I had to compute a new variable.... Thank you all for the help!

The big problem with that solution is that you have 5
lines for every person, so every tabulation has 5 times
the count for the total number of people. You can
ask for statistical tests, which will highlight which
differences are largest, but they are not reportable
as valid tests, because the 5 lines from one person
are not independent.

That also does not get you any "combination" of domains
which you were interested in.

I'll repeat what I suggested -
If you use "Mult response", you can get the fraction
of responses "per person" as well as "per total responses."

I've looked at the source Bruce cited. The "most frequently
used" seems to offer a start for looking at combination of
domains. Computing combinations will also be made
easier if you follow what I suggested by creating 19
new variables, No/Yes for each of the 19 domains.

Combine just 3 (say) of them at a time. Use the most
frequently mentioned domains.

If a particular domain is relevant to age, you might
use that, instead. Or start with the 3 most relevant to
age.
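
A sketch of how three of the 0/1 variables could be combined into one pattern variable. It assumes Dom1 to Dom19 already exist as 0/1 indicators, as created by the vector/loop code earlier in the thread. Dom1, Dom2 and Dom3 merely stand in for whichever three domains are picked, combo3 is a hypothetical name, and age_gr is an assumed name for the age-group variable.

```spss
* Assumes Dom1 TO Dom19 already exist as 0/1 indicators from the VECTOR and LOOP code.
* Dom1, Dom2 and Dom3 are placeholders for the three domains actually chosen.
COMPUTE combo3 = 100*Dom1 + 10*Dom2 + Dom3.
* N3 shows leading zeros, so each digit reads as yes/no for one domain.
FORMATS combo3 (N3).
VARIABLE LABELS combo3 'Pattern of Dom1, Dom2, Dom3 (1 = chosen)'.

* One row per person, so the counts are people and not responses.
CROSSTABS /TABLES=age_gr BY combo3 /CELLS=COUNT ROW.
```

Each respondent gets one of eight patterns (000 to 111), for example 101 for someone who chose the first and third of the three domains but not the second.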

--
Rich Ulrich