proc sql;

create table newdataset as

select *, count(*) as cty

from olddata

group by gvkey

having cty > 6;

quit;

* In this data set, the two ID variables are GVKEY and YEAR. I want to count the number of rows

of data for each value of GVKEY and only keep those GVKEYs that have at least 7 rows of data.

This code takes the existing data set named OLDDATA and creates a new dataset named NEWDATATSET

that has in it all the variables in OLDDATA (that is what select * does) *and* it creates a

variable named CTY that is the count of the number of rows of data and names it CTY.

It does this counting for each value of the ID GVKEY the number of rows of data. Finally, it keeps in

the NEWDATASET only those GVKEYs that have at least 7 rows of data.