Discussion:
computing new variable with lots of 'IF's
(too old to reply)
Erin Holloway
2020-08-20 06:08:06 UTC
Permalink
So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other variable names change the same way _1 to _30.
I assume its a loop? not sure how to write it though?
The variables are all numeric

DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
COMPUTE Participation_1 = $SYSMIS.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
COMPUTE Participation_1 = 0.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
COMPUTE Participation_1 = 1.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 2.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 3.
END IF.
EXECUTE.
Bruce Weaver
2020-08-20 13:03:44 UTC
Permalink
Post by Erin Holloway
So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other variable names change the same way _1 to _30.
I assume its a loop? not sure how to write it though?
The variables are all numeric
DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
COMPUTE Participation_1 = $SYSMIS.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
COMPUTE Participation_1 = 0.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
COMPUTE Participation_1 = 1.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 2.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 3.
END IF.
EXECUTE.
Hello Erin. To answer one of your questions, you can use a DO-REPEAT structure to loop through the 30 variables.

Regarding your DO-IF structure, I found myself wondering if it was going to work the way you intended. E.g., I wondered if you needed more parentheses, as follows:

ELSE IF (a EQ value) OR (b eq value) OR (c EQ value).

If it is that basic structure each time, you can use the ANY function instead, like this:

ELSE IF ANY(a EQ value, b EQ value, c EQ value).

With that in mind, I wondered if this is close to what you want:

DO REPEAT
cs = CurrentlyStudying_status_1 to CurrentlyStudying_status_30 /
ce = CurrentlyEmployed_1 to CurrentlyEmployed_30 /
lw = LookingForWork_1 to LookingForWork_30 /
part = Participation_1 to Participation_30.
DO IF NMISS(cs, ce, lfw) EQ 3.
COMPUTE part = -999. /* Better not to use $SYSMIS.
ELSE IF ce EQ 0 AND cs EQ 0 and lw EQ 6.
COMPUTE part = 0.
ELSE IF ce EQ 0 AND cs EQ 0 AND lw EQ 9.
COMPUTE part = 1.
ELSE IF ANY(ce EQ 2 AND cs EQ 0, ce EQ 1 AND cs EQ 0, ce EQ 0 AND cs EQ 1).
COMPUTE part = 2.
ELSE IF ANY(ce EQ 2 AND cs EQ 1, ce EQ 1 AND cs EQ 1, ce EQ 3 AND cs EQ 1).
COMPUTE part = 3.
END IF.
END REPEAT.
MISSING VALUES Participation_1 to Participation_30 (-999).
Rich Ulrich
2020-08-20 19:02:08 UTC
Permalink
On Wed, 19 Aug 2020 23:08:06 -0700 (PDT), Erin Holloway
Post by Erin Holloway
So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other variable names change the same way _1 to _30.
I assume its a loop? not sure how to write it though?
The variables are all numeric
DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
COMPUTE Participation_1 = $SYSMIS.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
COMPUTE Participation_1 = 0.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
COMPUTE Participation_1 = 1.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 2.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 3.
END IF.
EXECUTE.
I agree with Bruce, that you need parentheses.
Damned if I know how SPSS will parse a long string
of AND and OR statements,

But, instead of all that --
I suggest that you create a temporary variable so that
you have a logical table that combines Emp and Study,

#Temp= Study*3 + Emp, yields values in this table -
Emp 0 1 2 3
Study 0 0 1 2 3
Study 1 4 5 6 7

(That does depend on not-having values out of those ranges.)

Then recode,
RECODE #Temp(0=0), (1,2,4= 2)(else=3) INTO= partic.

Better, you could name the values explicitly so that 3,5,6,7 will be
partic=3, and use ELSE to define some version of MISSING.

This leaves the simple test for whether partic=0 should
remain 0 or be set to 1, depending on the value of Look.
*partic stays set of 0 if Look is 6 -- or anything but 9.

IF (partic=0) AND (Look = 9) partic=1.

Make sure of the MISSING (which the Recode of #Temp using
ELSE may have changed), and you are set.
--
Rich Ulrich
Rich Ulrich
2020-08-20 19:09:06 UTC
Permalink
- corrected version of my reply, with Study*4 in place of Study*3.

On Wed, 19 Aug 2020 23:08:06 -0700 (PDT), Erin Holloway
Post by Erin Holloway
So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other variable names change the same way _1 to _30.
I assume its a loop? not sure how to write it though?
The variables are all numeric
DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
COMPUTE Participation_1 = $SYSMIS.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
COMPUTE Participation_1 = 0.
ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
COMPUTE Participation_1 = 1.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 2.
ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
COMPUTE Participation_1 = 3.
END IF.
EXECUTE.
I agree with Bruce, that you need parentheses.
Damned if I know how SPSS will parse a long string
of AND and OR statements,

But, instead of all that --
I suggest that you create a temporary variable so that
you have a logical table that combines Emp and Study,

#Temp= Study*4 + Emp, yields values in this table -
Emp 0 1 2 3
Study 0 0 1 2 3
Study 1 4 5 6 7

(That does depend on not-having values out of those ranges.)

Then recode,
RECODE #Temp(0=0), (1,2,4= 2)(else=3) INTO= partic.

Better, you could name the values explicitly so that 3,5,6,7 will be
partic=3, and use ELSE to define some version of MISSING.

This leaves the simple test for whether partic=0 should
remain 0 or be set to 1, depending on the value of Look.
*partic stays set of 0 if Look is 6 -- or anything but 9.

IF (partic=0) AND (Look = 9) partic=1.

Make sure of the MISSING (which the Recode of #Temp using
ELSE may have changed), and you are set.
--
Rich Ulrich
Erin Holloway
2020-08-21 04:10:14 UTC
Permalink
Thanks to you both

@Bruce I have run the syntax, it runs fine...but coded 22.4% as '2' and the rest as either Missing -999 or system missing. No 0's 1's or 3's.
I cannot work out why. I will keep having a play with it.

@Rich, that makes sense to me, and I have created new variables that have values only in the required range for the temp table, but I am not familar with a logical table or how to create one?

Variables are:
CurrentlyEmployed (0,1,2,3)
CurrentlyStudying_status (0,1,2)
LookingForWork_status (0,1)
Erin Holloway
2020-08-21 04:34:18 UTC
Permalink
Found the error, I had messed up the calc for one of the variables.
So great! thank you
Erin Holloway
2020-08-21 05:17:50 UTC
Permalink
Thanks to you both

@Bruce I have run the syntax, I can get the syntax to run, but it is not coding the '3''s correctly. I assume its because of the ce=3 and cs=2 being included in the 'ANY' command and needs to be separated?

ce EQ 2 AND cs EQ 1 or
ce EQ 1 AND cs EQ 1 or
ce EQ 3 or
cs EQ 2.

DO REPEAT
cs = CurrentlyStudying_status_1 to CurrentlyStudying_status_10 /
ce = CurrentlyEmployed_1 to CurrentlyEmployed_10 /
lw = LookingForWork_status_1 to LookingForWork_status_10 /
part = Participation_1 to Participation_10.
DO IF NMISS(cs, ce, lw) EQ 3.
COMPUTE part = -999.
ELSE IF ce EQ 0 AND cs EQ 0 and lw EQ 0.
COMPUTE part = 0.
ELSE IF ce EQ 0 AND cs EQ 0 AND lw EQ 1.
COMPUTE part = 1.
ELSE IF ANY(ce EQ 2 AND cs EQ 0, ce EQ 1 AND cs EQ 0, ce EQ 0 AND cs EQ 1).
COMPUTE part = 2.

ELSE IF ANY(ce EQ 2 AND cs EQ 1, ce EQ 1 AND cs EQ 1, ce EQ 3, cs EQ 2).
COMPUTE part = 3.

END IF.
END REPEAT.
MISSING VALUES Participation_1 to Participation_10 (-999).


@Rich, that makes sense to me, and I have created new variables that have values only in the required range for the temp table, but I am not familar with a logical table or how to create one? Im keen to give it a go.

Variables are:
CurrentlyEmployed (0,1,2,3)
CurrentlyStudying_status (0,1,2)
LookingForWork_status (0,1)
Rich Ulrich
2020-08-21 18:11:31 UTC
Permalink
On Thu, 20 Aug 2020 22:17:50 -0700 (PDT), Erin Holloway
Post by Erin Holloway
@Rich, that makes sense to me, and I have created new variables that have values only in the required range for the temp table, but I am not familar with a logical table or how to create one? Im keen to give it a go.
What I wrote before --

#Temp= Study*4 + Emp, yields values in this table -
Emp 0 1 2 3
Study 0 0 1 2 3
Study 1 4 5 6 7

#Temp has all the combinations that you care about
except for the split needed for for (emp=0, Study=0) --

RECODE #Temp into a new var with desired codes.
Then split out the (0,0) instances according to the third variable.


By the way -- I see a simplification available in what you
posted for the solution with many IFs -

ELSE IF ce EQ 0 AND cs EQ 0 and lw EQ 0.
COMPUTE part = 0.
ELSE IF ce EQ 0 AND cs EQ 0 AND lw EQ 1.
COMPUTE part = 1.

Assuming "lw" is defined as 0 or 1 (always), that says:

ELSE IF ce EQ 0 AND cs EQ 0 .
COMPUTE part = lw .
--
Rich Ulrich
Bruce Weaver
2020-08-21 18:50:39 UTC
Permalink
Post by Erin Holloway
Thanks to you both
--- snip ---
Post by Erin Holloway
CurrentlyEmployed (0,1,2,3)
CurrentlyStudying_status (0,1,2)
LookingForWork_status (0,1)
Here is another variation on Rich's theme which *might* be a bit easier to follow. I start with a self-contained version to demonstrate the concept--hence the "toy" dataset.

* Generate a toy dataset with all combinations of those variables.
NEW FILE.
DATA LIST LIST / ce cs lw (3F1.0).
BEGIN DATA
0 0 0
0 0 1
0 0 .
0 1 0
0 1 1
0 1 .
0 2 0
0 2 1
0 2 .
0 . 0
0 . 1
0 . .
1 0 0
1 0 1
1 0 .
1 1 0
1 1 1
1 1 .
1 2 0
1 2 1
1 2 .
1 . 0
1 . 1
1 . .
2 0 0
2 0 1
2 0 .
2 1 0
2 1 1
2 1 .
2 2 0
2 2 1
2 2 .
2 . 0
2 . 1
2 . .
3 0 0
3 0 1
3 0 .
3 1 0
3 1 1
3 1 .
3 2 0
3 2 1
3 2 .
3 . 0
3 . 1
3 . .
. 0 0
. 0 1
. 0 .
. 1 0
. 1 1
. 1 .
. 2 0
. 2 1
. 2 .
. . 0
. . 1
. . .
END DATA.

FREQUENCIES ce cs lw.

* [1] Recode SYSMIS to 9, but do not (yet) treat it as missing.
* This will allow step 2 below to work properly.

RECODE ce cs lw (SYSMIS=9).
FREQUENCIES ce cs lw.

* [2] Combine the 3 variables into a single variable.

COMPUTE cecslw = ce*100 + cs*10 + lw.
FORMATS cecslw (F3.0).
FREQUENCIES cecslw.

* [3] RECODE cecslw into new variable Part.
* You'll have to fill in the value lists and values according to your rules.
RECODE cecslw
(999 = 999)
(value list = value)
(value list = value)
etc.
(value list = value) INTO Part.
FORMATS Part (F5.0).
MISSING VALUES Part (999) / ce cs lw (9).
FREQUENCIES Part.

* --------------------------------------.

Now here it is inserted in a DO-REPEAT structure that allows you to loop through your 30 variables. Notice that I have changed variable cecslw to #cecslw. The # makes it a scratch variable that will not be written to your dataset. This bit is untested, as I do not have your dataset. But I think it ought to work. But as before, you have to fill in the details of the RECODE command, because only you know the rules for that.


DO REPEAT
cs = CurrentlyStudying_status_1 to CurrentlyStudying_status_30 /
ce = CurrentlyEmployed_1 to CurrentlyEmployed_30 /
lw = LookingForWork_1 to LookingForWork_30 /
part = Participation_1 to Participation_30.

RECODE ce cs lw (SYSMIS=9).
COMPUTE #cecslw = ce*100 + cs*10 + lw.
RECODE #cecslw
(999 = 999)
(value list = value)
(value list = value)
etc.
(value list = value) INTO Part.
END REPEAT.
MISSING VALUES Participation_1 to Participation_30 (999).


HTH.
Rich Ulrich
2020-08-24 19:18:04 UTC
Permalink
On Fri, 21 Aug 2020 11:50:39 -0700 (PDT), Bruce Weaver
Post by Bruce Weaver
Post by Erin Holloway
Thanks to you both
--- snip ---
Post by Erin Holloway
CurrentlyEmployed (0,1,2,3)
CurrentlyStudying_status (0,1,2)
LookingForWork_status (0,1)
Here is another variation on Rich's theme which *might* be a bit easier to follow. I start with a self-contained version to demonstrate the concept--hence the "toy" dataset.
* Generate a toy dataset with all combinations of those variables.
NEW FILE.
DATA LIST LIST / ce cs lw (3F1.0).
BEGIN DATA
0 0 0
0 0 1
0 0 .
0 1 0
0 1 1
0 1 .
0 2 0
0 2 1
0 2 .
0 . 0
0 . 1
0 . .
1 0 0
1 0 1
1 0 .
1 1 0
1 1 1
1 1 .
1 2 0
1 2 1
1 2 .
1 . 0
1 . 1
1 . .
2 0 0
2 0 1
2 0 .
2 1 0
2 1 1
2 1 .
2 2 0
2 2 1
2 2 .
2 . 0
2 . 1
2 . .
3 0 0
3 0 1
3 0 .
3 1 0
3 1 1
3 1 .
3 2 0
3 2 1
3 2 .
3 . 0
3 . 1
3 . .
. 0 0
. 0 1
. 0 .
. 1 0
. 1 1
. 1 .
. 2 0
. 2 1
. 2 .
. . 0
. . 1
. . .
END DATA.
FREQUENCIES ce cs lw.
* [1] Recode SYSMIS to 9, but do not (yet) treat it as missing.
* This will allow step 2 below to work properly.
RECODE ce cs lw (SYSMIS=9).
FREQUENCIES ce cs lw.
* [2] Combine the 3 variables into a single variable.
COMPUTE cecslw = ce*100 + cs*10 + lw.
FORMATS cecslw (F3.0).
FREQUENCIES cecslw.
* [3] RECODE cecslw into new variable Part.
* You'll have to fill in the value lists and values according to your rules.
RECODE cecslw
(999 = 999)
(value list = value)
(value list = value)
etc.
(value list = value) INTO Part.
FORMATS Part (F5.0).
MISSING VALUES Part (999) / ce cs lw (9).
FREQUENCIES Part.
* --------------------------------------.
Now here it is inserted in a DO-REPEAT structure that allows you to loop through your 30 variables. Notice that I have changed variable cecslw to #cecslw. The # makes it a scratch variable that will not be written to your dataset. This bit is untested, as I do not have your dataset. But I think it ought to work. But as before, you have to fill in the details of the RECODE command, because only you know the rules for that.
DO REPEAT
cs = CurrentlyStudying_status_1 to CurrentlyStudying_status_30 /
ce = CurrentlyEmployed_1 to CurrentlyEmployed_30 /
lw = LookingForWork_1 to LookingForWork_30 /
part = Participation_1 to Participation_30.
RECODE ce cs lw (SYSMIS=9).
COMPUTE #cecslw = ce*100 + cs*10 + lw.
RECODE #cecslw
(999 = 999)
(value list = value)
(value list = value)
etc.
(value list = value) INTO Part.
END REPEAT.
MISSING VALUES Participation_1 to Participation_30 (999).
HTH.
Bruce,
You are right, it is more transparent to create the temporary
variable usling multiples of 10.

For this problem, the "lw" is used only for one decision, so I
would probably ignore it at first, and just ...
COMPUTE #cecs= ce*10 + cs.

That gives legal values of 0, 10, 20, 30, 1, 11, 21, 31, --
which are easy to interpret directly on reading. Any other values
must be bad data (presumably, no correction was not done earlier).

All in all, the question of what is "missing" and how they should be
dealt with is more taxing than the original classification problem.
--
Rich Ulrich
Erin Holloway
2020-09-02 23:24:39 UTC
Permalink
Alright, its working! I have re-calculated the variables so css and ces = 0,1,2 and lws = 0,1. So its clearer with the culculation which is which.
Thank you so so much. This has been driving me nuts for ages.
Erin
Bruce Weaver
2020-09-03 14:28:12 UTC
Permalink
Post by Erin Holloway
Alright, its working! I have re-calculated the variables so css and ces = 0,1,2 and lws = 0,1. So its clearer with the culculation which is which.
Thank you so so much. This has been driving me nuts for ages.
Erin
Good to hear. Please consider posting your code so that others who find this discussion later on can see the finished product.

Cheers,
Bruce
Erin Holloway
2020-09-03 22:16:43 UTC
Permalink
No worries. I did the 'Looking or work variable separately' as I couldnt make the ce*100 + cs*10 + lw work.
Thanks again.

DO REPEAT
css = CurrentlyStudying_status_1 to CurrentlyStudying_status_10 /
ces = CurrentlyEmployed_status_1 to CurrentlyEmployed_status_10 /
part = Participation_1 to Participation_10.
RECODE css ces (SYSMIS=-999).
COMPUTE #cecs = ces*10 + css.
RECODE #cecs
(0=0) (10=2) (1=2) (2=3) (11 thru Highest =3) INTO Part.
END REPEAT.

Loading...