Aggregate and count by date syntax help

e***@lakeheadu.ca

2019-06-03 20:26:55 UTC

I have a data set and am trying to number each record in chronological order by date, separately within each ID.

For example this data set:

ID1 02/01/2018
ID1 02/10/2018
ID1 04/04/2017

ID2 01/10/2015
ID2 03/04/2016

Would give me a new variable like this:

ID1 02/01/2018 2
ID1 02/10/2018 3
ID1 04/04/2017 1

ID2 01/10/2015 1
ID2 03/04/2016 2

I am having a hard time figuring out the syntax for this. I think the command is AGGREGATE, but I'm not sure about the rest.

Any help is greatly appreciated!

Bruce Weaver

2019-06-03 20:59:08 UTC

Something like this (untested), perhaps? Change the variable names as needed.

AGGREGATE
/BREAK=IDvar DATEvar
/N = NU.

For more info:

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_25.0.0/statistics_reference_project_ddita/spss/base/syn_aggregate_functions.html

Notice this bullet point re N and NU functions:

- The N and NU functions do not require arguments. Without arguments, they return the number of weighted and unweighted valid cases in a break group. If you supply a variable list, they return the number of weighted and unweighted valid cases for the variables specified.
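
As a rough illustration of what NU returns, here is a minimal sketch (untested; IDvar is the placeholder name used above, and Nrecords is a made-up name). It breaks on IDvar alone and appends the result to the active dataset, so every case receives the number of records for its ID:

```spss
* Sketch only: IDvar and Nrecords are placeholder names.
* NU without arguments = unweighted number of cases in each break group.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=IDvar
  /Nrecords = NU.
```

With the example data this should put 3 on every ID1 row and 2 on every ID2 row, that is, a group size rather than a running order. Breaking on both IDvar and DATEvar would instead count the records per ID/date combination.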

HTH.

Rich Ulrich

2019-06-04 17:26:48 UTC

The question by itself says "count", which suggests N( ).
I read the example differently. Isn't it asking for this?

RANK DATEvar by IDvar into Session.

--
Rich Ulrich

Bruce Weaver

2019-06-04 21:20:35 UTC

Right you are, Rich--although INTO doesn't appear to work with RANK.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / ID (F1) Date(date) x(F1).
BEGIN DATA
1 02/01/2018 2
1 02/10/2018 3
1 04/04/2017 1
2 01/10/2015 1
2 03/04/2016 2
END DATA.

RANK VARIABLES=Date (A) BY ID.
RENAME VARIABLES (Rdate=Session).
FORMATS Session(F2.0).
LIST.
* x = desired result, Session = obtained result.

OUTPUT from LIST:

ID        Date x Session

 1 02-JAN-2018 2       2
 1 02-OCT-2018 3       3
 1 04-APR-2017 1       1
 2 01-OCT-2015 1       1
 2 03-APR-2016 2       2

Number of cases read: 5 Number of cases listed: 5
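
One thing the listing shows is that the DATE format read 02/01/2018 as 2 January 2018, that is, as day/month/year. If the real dates are month/day/year (an assumption; the thread does not settle it), a hedged variation is to read them with ADATE10 instead:

```spss
* Variation on the demo, assuming the dates are month/day/year.
DATA LIST LIST / ID (F1) Date (ADATE10) x (F1).
BEGIN DATA
1 02/01/2018 2
1 02/10/2018 3
1 04/04/2017 1
2 01/10/2015 1
2 03/04/2016 2
END DATA.

* Without INTO, RANK assigns its default name to the new variable.
RANK VARIABLES=Date (A) BY ID.
LIST.
```

For these five records the within-ID order is the same under either reading, but that will not hold for every data set.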

Elyse Cottrell-Martin

2019-06-04 23:02:04 UTC

Rank worked for what I needed. Thank you both so much.

Rich Ulrich

2019-06-06 04:23:14 UTC

Well, INTO is available. I got INTO from the syntax manual,
but I didn't go far enough to get reminded of the complication.

The proper syntax seems to be:

RANK Datevar by IDvar/ rank into Session.

The second "rank" -- "rank into" -- is parsed as
<function> INTO

where a set of functions is available, as documented at
https://www.ibm.com/support/knowledgecenter/en/SSLVMB_24.0.0/spss/base/syn_rank_function.html#syn_rank_function

"The functions assign default names to the new variables unless
keyword INTO is specified."
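
As a sketch of that <function> INTO pattern (untested; Datevar and IDvar are the placeholder names from above, and Session and Nsessions are made-up names), each function subcommand can carry its own INTO:

```spss
* RANK gives the within-ID order and N gives the sum of case weights per ID group.
RANK VARIABLES=Datevar (A) BY IDvar
  /RANK INTO Session
  /N INTO Nsessions
  /PRINT=NO.
```

Here /PRINT=NO only suppresses the summary table of created variables. If two records of the same ID can share a date, the default /TIES=MEAN gives them the mean of the tied ranks, so /TIES=LOW or /TIES=HIGH may be preferable.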

--
Rich Ulrich

Bruce Weaver

2019-06-06 14:54:44 UTC

Well done, Rich. That does it. Here is the revised syntax for the complete example.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / ID (F1) Date(date) x(F1).
BEGIN DATA
1 02/01/2018 2
1 02/10/2018 3
1 04/04/2017 1
2 01/10/2015 1
2 03/04/2016 2
END DATA.

RANK VARIABLES=Date (A) BY ID /RANK INTO Session.
FORMATS Session(F2.0).
LIST.

Output from LIST:

ID        Date x Session

 1 02-JAN-2018 2       2
 1 02-OCT-2018 3       3
 1 04-APR-2017 1       1
 2 01-OCT-2015 1       1
 2 03-APR-2016 2       2

Number of cases read: 5 Number of cases listed: 5
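
For comparison, a different technique sometimes used for the same job is a running counter built with SORT CASES and LAG. This is only an untested sketch against the demo data above (Session2 is a made-up name), not something proposed in the thread:

```spss
* Alternative sketch: running counter within ID, using the demo variables.
SORT CASES BY ID (A) Date (A).
* Start every case at 1, then continue the count while the ID repeats.
COMPUTE Session2 = 1.
IF (ID = LAG(ID)) Session2 = LAG(Session2) + 1.
FORMATS Session2 (F2.0).
EXECUTE.
```

Unlike RANK, this reorders the file, and it numbers tied dates consecutively rather than giving them a mean rank.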