Discussion:
Finding a number in a string in SPSS
(too old to reply)
e***@gmail.com
2020-01-31 03:56:37 UTC
Permalink
I have a two string variables (DiagnosisPrimary & DiagnosisOther) that have diagnosis coded into numerical values and delineated with '~' .
So person one was diagnosed with '20' on their first visit and then nothing on their second visit.

DiagnosisPrimary
20~
12~~~
20~20~
12~12~
12~~
~~~~~
12~12

If 20 represents having been diagnoised with depression, I want to see if there is a '20' among the string. Basically, find out who was diagnosed with dpression at any of their visits. In the past I have used the formula below, but now the data is a mix of numerical and text it is no longer working.

COMPUTE Dx_Depression =ANY(20, DiagnosisPrimary) OR ANY(20, DiagnosisOther).

I want to create a new variable with a true/false response.

Any help much appreicated!
Rich Ulrich
2020-01-31 05:59:31 UTC
Permalink
Post by e***@gmail.com
I have a two string variables (DiagnosisPrimary & DiagnosisOther) that have diagnosis coded into numerical values and delineated with '~' .
So person one was diagnosed with '20' on their first visit and then nothing on their second visit.
DiagnosisPrimary
20~
12~~~
20~20~
12~12~
12~~
~~~~~
12~12
If 20 represents having been diagnoised with depression, I want to see if there is a '20' among the string. Basically, find out who was diagnosed with dpression at any of their visits. In the past I have used the formula below, but now the data is a mix of numerical and text it is no longer working.
COMPUTE Dx_Depression =ANY(20, DiagnosisPrimary) OR ANY(20, DiagnosisOther).
Why not two logical comparisons where you use ANY( ) ?
COMPUTE Dx_Depression=
(20 eq DiagnosisPrimary) or (20 eq DiagnosisOther).

The original also could be simplified by writing
COMPUTE Dx_Depression = ANY(20, DiagnosisPrimary, DiagnosisOther).

ANY( ) tests if the first item matches whatever follows.
One value followed by several variables is common.
One variable followed by several values is common.

I've never had cause to check other possibilities, and I
don't remember if the documentation says anything.
Post by e***@gmail.com
I want to create a new variable with a true/false response.
Any help much appreicated!
It sounds like the only difference is that you now have
strings instead of numbers.

So you would write ANY("20", ...) .
--
Rich Ulrich
e***@gmail.com
2020-02-06 21:58:37 UTC
Permalink
Thanks Rich,
Tried:
COMPUTE Dx_Depression =ANY("20", DiagnosisPrimary, DiagnosisOther).

It did run fine, but did not find the 20's from within the string.

Any other tricks? Do you think I will need to remove the "~"?

Erin
e***@gmail.com
2020-02-06 22:02:35 UTC
Permalink
This also works, however it will take far to long to run on the big data set...

COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or
(CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or
(CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
(CHAR.INDEX(DiagnosisOther, '~20~') >0) or
(CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or
(CHAR.INDEX(DiagnosisOther, '20~')=1).
Bruce Weaver
2020-02-06 22:41:55 UTC
Permalink
Post by e***@gmail.com
This also works, however it will take far to long to run on the big data set...
COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or
(CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or
(CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
(CHAR.INDEX(DiagnosisOther, '~20~') >0) or
(CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or
(CHAR.INDEX(DiagnosisOther, '20~')=1).
Does this work?

COMPUTE Dx_Depression = SUM(CHAR.INDEX(DiagnosisPrimary,'20'),CHAR.INDEX(DiagnosisOther,'20')) GT 0.
FORMATS Dx_Depression (F1).
FREQUENCIES Dx_Depression.

Or are there cases you have not shown us where '20' appears in ways that do not indicate depression. E.g., are there codes like ~201~ or ~120~ that would louse things up? If so, how about this variation on the theme?

STRING NewDx (A50).
COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~").
COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
FORMATS Dx_Depression (F1).
FREQUENCIES Dx_Depression.

When you are done with variable NewDx, delete it if you wish.

HTH.
e***@gmail.com
2020-02-07 00:04:11 UTC
Permalink
Ah thats great Bruce!
Also cutes down the code I need for the other diagnoses.
Thanks all
Post by Bruce Weaver
Post by e***@gmail.com
This also works, however it will take far to long to run on the big data set...
COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or
(CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or
(CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
(CHAR.INDEX(DiagnosisOther, '~20~') >0) or
(CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or
(CHAR.INDEX(DiagnosisOther, '20~')=1).
Does this work?
COMPUTE Dx_Depression = SUM(CHAR.INDEX(DiagnosisPrimary,'20'),CHAR.INDEX(DiagnosisOther,'20')) GT 0.
FORMATS Dx_Depression (F1).
FREQUENCIES Dx_Depression.
Or are there cases you have not shown us where '20' appears in ways that do not indicate depression. E.g., are there codes like ~201~ or ~120~ that would louse things up? If so, how about this variation on the theme?
STRING NewDx (A50).
COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~").
COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
FORMATS Dx_Depression (F1).
FREQUENCIES Dx_Depression.
When you are done with variable NewDx, delete it if you wish.
HTH.
Rich Ulrich
2020-02-07 06:27:46 UTC
Permalink
On Thu, 6 Feb 2020 14:41:55 -0800 (PST), Bruce Weaver
Post by Bruce Weaver
Post by e***@gmail.com
This also works, however it will take far to long to run on the big data set...
COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or
(CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or
(CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
(CHAR.INDEX(DiagnosisOther, '~20~') >0) or
(CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or
(CHAR.INDEX(DiagnosisOther, '20~')=1).
Does this work?
COMPUTE Dx_Depression = SUM(CHAR.INDEX(DiagnosisPrimary,'20'),CHAR.INDEX(DiagnosisOther,'20')) GT 0.
FORMATS Dx_Depression (F1).
FREQUENCIES Dx_Depression.
Or are there cases you have not shown us where '20' appears in ways that do not indicate depression. E.g., are there codes like ~201~ or ~120~ that would louse things up? If so, how about this variation on the theme?
I think I was distracted by the strange use of ANY( ) , so I
failed to respond to the example of the STRINGs to be searched.
- Sorry -
Post by Bruce Weaver
STRING NewDx (A50).
COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~").
COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
FORMATS Dx_Depression (F1).
FREQUENCIES Dx_Depression.
When you are done with variable NewDx, delete it if you wish.
I like this solution because I figure that concatenating and
using one CHAR.INDEX( ) search is both easier to read and
faster to execute that doing two searches.

I'd probably use the name TempDx for NewDx, to reduce
the chance that it gets included in other places.
--
Rich Ulrich
Bruce Weaver
2020-02-07 19:46:29 UTC
Permalink
On Friday, February 7, 2020 at 1:27:52 AM UTC-5, Rich Ulrich wrote:

--- snip ---
Post by Rich Ulrich
Post by Bruce Weaver
STRING NewDx (A50).
COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~").
COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
FORMATS Dx_Depression (F1).
FREQUENCIES Dx_Depression.
When you are done with variable NewDx, delete it if you wish.
I like this solution because I figure that concatenating and
using one CHAR.INDEX( ) search is both easier to read and
faster to execute that doing two searches.
Good point about speed, Rich. I suspect you're right.

Loading...