Stratified Random Sample Syntax

Rich Ulrich

2020-02-19 19:18:46 UTC

Post by a***@gmail.com
Hi Rich,
I am trying to do something similar. I want to create a stratified random sample of 72 students who have an average of 60% full-time, and are also all Latino, Male, and receive financial aid.
Can you offer advice or a general code to accomplish this?

Post by Rich Ulrich
On Thu, 19 Apr 2018 16:05:23 -0700 (PDT),

Post by b***@sloperesearch.com
This post saved my life! I don't know if you're still on here, but if you are Rich I owe you big!

You're welcome.
The Giganews archive does not go back to 2000, but Google
shows me the thread. That was a clever solution by me.
--
Rich Ulrich

The example in 2000 had groups, not a continuous variable.
I suppose you can create work_pct ranges to create discrete groups,
and sample within those groups. Perhaps three groups, such as (0, 1-99, 100),
with further subdivisions within the 1-99 range?
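
As a rough sketch of that grouping idea, a RECODE along these lines would turn work_pct into three discrete strata. The name pct_grp and the cut points are assumptions for illustration, not something from the original thread; add more ranges inside 1-99 if finer strata are wanted.

```spss
* Sketch only: pct_grp and the cut points are illustrative assumptions.
* RECODE applies the first matching specification, so 0 and 100 are caught before the range.
RECODE work_pct (0=1) (100=3) (0 THRU 100=2) (ELSE=SYSMIS) INTO pct_grp.
VALUE LABELS pct_grp 1 'None' 2 'Partial' 3 'Full'.
EXECUTE.
```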

Having said that: I don't follow your precise problem.
Are you trying to sample FROM the 72 students, who all are
Latino, Male, and receive aid? Or pull out 72 cases with only
those characteristics?

"Average of 60% full-time" seems to be the only thing to
stratify by, if you are selecting 72 from a larger sample. So,
what fraction is that of the total? Or, what fraction do you
want?

The description I provided before seems to serve. Create a
randomized number; rank within this work_pct grouping;
select the appropriate fraction.
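
Put together for this case, those three steps might look like the following. This is a sketch under assumptions: pct_grp is a hypothetical discrete grouping of work_pct, the seed is arbitrary, and 0.25 stands in for whatever fraction you actually want.

```spss
* Sketch only: pct_grp, the seed, and the 0.25 fraction are assumptions.
SET SEED=12345.
* One uniform random number per case.
COMPUTE rvar = RV.UNIFORM(0,1).
* Fractional rank (rank divided by group size) of rvar within each stratum.
RANK VARIABLES=rvar BY pct_grp /RFRACTION INTO ordvar /PRINT=NO.
* Keep roughly the first 25% of each stratum in random order.
SELECT IF (ordvar LE 0.25).
EXECUTE.
```

Note that SELECT IF permanently drops the unselected cases from the active dataset, so run this on a copy or save the result under a new name.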

Google still shows me my post from 2000, which I copy here:

Post by a***@gmail.com
I have a 9,000+ employee file from which I want to draw a stratified
random sample based on race and gender. I want to draw about 900 cases
from the file that includes approximately the same percentage of White-
Males, White Females, Black Males, Black Females, Asian Males, Asian
Females, Hispanic Males, Hispanic Females, and Indian, etc., as there
are in the total workforce.
Does anyone have syntax for this? The only way I can see to do this is
through the menu where I'd have to select a particular gender-race
combo first, then subselect the appropriate workforce %, then write
these cases to a new file, delete them from the 9,000+ file, and start
again with a new gender-race combo.
Anyone have a simpler way? Or syntax file I can edit to my application?

Comment set a seed so the draw is repeatable; 12345 is an arbitrary choice.
SET SEED=12345.
COMPUTE RVAR = RV.UNIFORM(0,1).
Comment use Rank to create a var with % of cum-dist for RVAR.
RANK vars= RVAR by RACE,ETHNIC,SEX /RFRACTION into ORDVAR.
Comment select 10% of each subgroup, using new order-var.
SELECT IF ORDVAR LT .10.

--
Rich Ulrich