Estimating yearly trends for the Comprehensive File

Francisco Perez Arce
May 6, 2024
Updated: May 23, 2024

(Center for Economic and Social Research, University of Southern California)

This note describes how to produce nationally representative statistics by year using the UAS Comprehensive File and the newly created CF-yearly weight file. The weight file can be accessed in the UAS file [ADD LINK].

OVERVIEW

In 2023, the UAS team added a new data file containing weights that can be used to produce nationally representative yearly estimates from the core surveys in the UAS.

As the reader likely knows, the UAS core surveys are fielded on a rolling basis over periods that span approximately two years. Hence, the standard weights produced for those surveys (and those produced for the Comprehensive File (CF)) can be used to estimate population quantities for a two-year period, but not for a single calendar year.

DESCRIPTION

Respondents answer the core surveys in the Comprehensive File on a rolling basis. Respondents are invited to complete new versions of the surveys approximately two years from the time of invitation to the prior round. Because panel members join the panel at different times, responses to the core surveys are added continually. Hence, in any given year there will be new observations for about half the sample, which makes it possible to produce nationally representative statistics per calendar year.

The Comprehensive File includes wave-specific weights (r12final_weight, r13final_weight, r14final_weight) that make the sample representative of the reference population along several demographic dimensions. These dimensions include gender, race/ethnicity, age, education, household size, household income, and the census region and urban/rural characteristics of the area of residence. You can find a complete description of the UAS weighting procedure here. These weights, however, are designed to produce nationally representative estimates for the (approximately) two-year period that each wave spans.

Since 2023, the UAS also includes weights specifically created to produce nationally representative statistics of certain variables for every calendar year. Details on how these weights are computed can be found here.

The yearly weights are constructed separately for each “topic” in the Comprehensive File. There are separate weight variables for each topic because different panelists respond to different surveys each calendar year. For instance, a respondent may have answered the “What do people know about Social Security” survey (the source for the k-prefix variables) in 2015 and 2017 but answered the HRS surveys (the basis for the r-prefix variables) in 2016 and 2018. Hence, that respondent will have a non-zero weight for kweight in 2015 and 2017 and a non-zero rweight in 2016 and 2018.
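The respondent described above can be illustrated with a small toy dataset. This is only a sketch: the weight values are made up, and the long layout (one row per respondent and year) is assumed for illustration rather than taken from the weight file itself.

```stata
* Toy data for one hypothetical respondent (all values are made up)
clear
input uasid year kweight rweight
1 2015 1.1 0
1 2016 0   0.9
1 2017 1.3 0
1 2018 0   1.0
end

* Years in which this respondent counts toward k-topic estimates.
* The !missing() check matters because Stata treats missing as larger than any number.
list year kweight if kweight > 0 & !missing(kweight)

* Years in which the same respondent counts toward r-topic estimates
list year rweight if rweight > 0 & !missing(rweight)
```

The two listings should show alternating years, which is why a single weight variable cannot serve both topics.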

This guide intends to help users who want to quickly produce estimates at the national level per calendar year. For instance, suppose a researcher seeks to build a graph like Figure 1 below.

Figure 1. Example of a graph tracking average knowledge of Social Security

Expert researchers may prefer to program this their own way. For everyone else, the steps below provide an easy-to-use guide for Stata, based on the provided code (hyperlink here):

• Step 1. Select the variables you want to use to produce the statistics by calendar year.

• Step 2. Organize them by topic (prefix). Add them in lists as indicated in the do-file.

• Step 3. Change the directory to the folder where you have the Comprehensive File saved.

• Step 4. Run the do-file

The do-file will create a dataset that looks like this:

UASID  Year  r_iearn  k_KS_ssret_comp  rweight  kweight
1      2016  ####     .                ####     .
1      2017  .        ####             .        ####
1      2018  ####     .                ####     .
2      2016  .        ####             .        ####
2      2017  ####     .                ####     .
2      2018  .        ####             .        ####

Note that the variable names no longer carry the wave prefix, since the file is now in long form. That is, each row represents one year for one uasid. Also, note that there are now separate weights for each topic (in this case, k and r).
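To see how the topic-specific weights are used with this layout, the following minimal sketch builds a toy dataset with the same columns as the table above and computes a weighted mean by year. All values are invented for illustration; only the variable names come from the table.

```stata
* Toy long-form dataset mirroring the layout above (values are made up)
clear
input uasid year r_iearn k_KS_ssret_comp rweight kweight
1 2016 40000 .   1.2 .
1 2017 .     0.6 .   0.9
1 2018 42000 .   1.1 .
2 2016 .     0.4 .   1.0
2 2017 55000 .   0.8 .
2 2018 .     0.5 .   1.1
end

* Weighted mean of the knowledge score by year, using the k-topic weight.
* Rows with a missing score or a missing weight are excluded automatically.
tabstat k_KS_ssret_comp [aweight = kweight], by(year) statistics(mean n)

* The earnings variable is paired with the r-topic weight instead.
tabstat r_iearn [aweight = rweight], by(year) statistics(mean n)
```

Pairing each variable with the weight of its own topic is the key point: using kweight with an r-prefix variable would leave no usable rows in most years.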

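Now you can use this dataset to estimate the statistics you want. For instance, suppose you want to graph the mean of earnings or of the Social Security knowledge score, for which you need the weighted average of those variables by year. In that case, you could type the following (note that collapse replaces the data in memory, so reload the dataset or use preserve and restore between the two commands):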

• for Social Security knowledge score:

collapse (mean) k_KS_ssret_comp (semean) seKS_ssret_comp=k_KS_ssret_comp [aweight=kweight], by(year)

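And then follow with your preferred graph type. For Figure 1, I used: twoway line k_KS_ssret_comp year, lcolor(black) lwidth(thick) lpattern(solid) || line uc year, lcolor(red) lpattern(dash) || line lc year, lcolor(red) lpattern(dash) graphregion(color(white)). Here uc and lc are the upper and lower confidence bounds, which must be generated from the mean and its standard error before graphing.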

• for earnings:

collapse (mean) r_iearn (semean) seiearn=r_iearn [aweight=rweight], by(year)
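The confidence bounds uc and lc used in the graph command are not created by collapse itself. One way to build them, sketched here on toy data, is a normal-approximation 95% interval around the weighted mean. The 1.96 multiplier, the name se_k, and the toy values are assumptions for illustration; the interval behind Figure 1 may have been computed differently.

```stata
* Toy long-form data with several respondents per year (values are made up)
clear
input uasid year k_KS_ssret_comp kweight
1 2016 0.50 1.0
2 2016 0.70 0.8
3 2016 0.40 1.2
4 2016 0.60 1.0
5 2017 0.55 0.9
6 2017 0.65 1.1
7 2017 0.45 1.0
8 2017 0.75 1.0
1 2018 0.60 1.0
2 2018 0.80 0.9
3 2018 0.50 1.1
4 2018 0.70 1.0
end

* preserve/restore keeps the respondent-level data available afterwards,
* since collapse replaces the dataset in memory.
preserve
collapse (mean) k_KS_ssret_comp (semean) se_k = k_KS_ssret_comp [aweight = kweight], by(year)

* Assumed 95% bounds: mean plus or minus 1.96 standard errors
gen uc = k_KS_ssret_comp + 1.96 * se_k
gen lc = k_KS_ssret_comp - 1.96 * se_k

twoway (line k_KS_ssret_comp year, lcolor(black) lwidth(thick)) ///
       (line uc year, lcolor(red) lpattern(dash))               ///
       (line lc year, lcolor(red) lpattern(dash)),              ///
       graphregion(color(white)) legend(off)
restore
```

The same pattern applies to earnings by swapping in r_iearn and rweight.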

Example Stata Code

See [INSERT LINK] to download in .do format

set more off

capture log close

graph drop _all

global counter=0

cd "/Users/perezarc/Dropbox/projects/SSA projects/visualization/"

*Francisco Perez-Arce

*January 25, 2023

***Uses yearly-weight file to produce by-year estimates for Comprehensive File variables

*As an example, the do-file produces two graphs, for Social Security knowledge scores (KS_ssret_comp) and for earnings (iearn)

*Step 2. Add the variables of interest below

global kVars KS_ssret_comp

global rVars iearn

global iVars

global fVars

global pVars

global wVars

global aVars

global nVars

global vVars

/*TBA

global dVars

*/

*Step 3:

*Choose the directory.

*Data folder should contain the Comprehensive File and the weights-by-year data file (cf_weights_byyear_wide.dta)

global data "/Users/perezarc/Dropbox/datasets"

*The log file is saved in the working directory set with cd above

log using "graph_examples", replace

use "$data/cf_weights_byyear_October2022.dta", clear

reshape wide *in *weight, i(uasid) j(year)

tempfile weights_wide

save `weights_wide'

use "$data/uas_comprehensive922.dta", clear

joinby uasid using `weights_wide', unmatched(master)

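*For each topic and wave (12-14), copy the survey's inuas## indicator into [topic][wave]in

*and store the calendar year in which the interview began in [topic][wave]year

clonevar k12in=inuas16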

clonevar k13in=inuas94

clonevar k14in=inuas231

gen k12year=year(r12_uas16iwbeg)

gen k13year=year(r13_uas94iwbeg)

gen k14year=year(r14_uas231iwbeg)

clonevar i12in=inuas26

clonevar i13in=inuas113

clonevar i14in=inuas238

gen i12year=year(r12_uas26iwbeg)

gen i13year=year(r13_uas113iwbeg)

gen i14year=year(r14_uas238iwbeg)

clonevar f12in=inuas18

clonevar f13in=inuas119

clonevar f14in=inuas239

gen f12year=year(r12_uas18iwbeg)

gen f13year=year(r13_uas119iwbeg)

gen f14year=year(r14_uas239iwbeg)

clonevar n12in=inuas42

clonevar n13in=inuas83

clonevar n14in=inuas292

gen n12year=year(r12_uas42iwbeg)

gen n13year=year(r13_uas83iwbeg)

gen n14year=year(r14_uas292iwbeg)

clonevar v12in=inuas43

clonevar v13in=inuas84

clonevar v14in=inuas293

gen v12year=year(r12_uas43iwbeg)

gen v13year=year(r13_uas84iwbeg)

gen v14year=year(r14_uas293iwbeg)

clonevar a12in=inuas44

clonevar a13in=inuas85

clonevar a14in=inuas294

gen a12year=year(r12_uas44iwbeg)

gen a13year=year(r13_uas85iwbeg)

gen a14year=year(r14_uas294iwbeg)

clonevar p12in=inuas1

clonevar p13in=inuas121

clonevar p14in=inuas237

gen p12year=year(r12_uas1iwbeg)

gen p13year=year(r13_uas121iwbeg)

gen p14year=year(r14_uas237iwbeg)

clonevar w12in=inuas2

clonevar w13in=inuas121

clonevar w14in=inuas237

gen w12year=year(r12_uas2iwbeg)

gen w13year=year(r13_uas121iwbeg)

gen w14year=year(r14_uas237iwbeg)

*topic c only has two rounds so far

clonevar c12in=inuas38

clonevar c13in=inuas177

gen c14in=.

gen c12year=year(r12_uas38iwbeg)

gen c13year=year(r13_uas177iwbeg)

gen c14year=.

*topic r has three rounds

clonevar r12in=inuas20

clonevar r13in=inuas95

clonevar r14in=inuas185

gen r12year=year(r12_uas20iwbeg)

gen r13year=year(r13_uas95iwbeg)

gen r14year=year(r14_uas185iwbeg)

********The following lines create the by-year variables.

*Uncomment the relevant routines

*Run only the ones for which you are specifying variables

foreach var in $kVars {