
Weights

Cross–Sectional Weights

Wave 1

In wave 1, we essentially had a complex cross–sectional survey. The initial (or design) weights are derived from the probability of selecting the households into the sample. These household weights are initially adjusted according to information collected about all selected households (both responding and non–responding) and further adjusted so that weighted household estimates from the HILDA Survey match several known household–level benchmarks.

The person–level weights are based on the household–level weights, with adjustments made based on information collected about all the people listed in the responding households. These weights are also adjusted to ensure that the weighted person estimates match several known person–level benchmarks.
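As a rough illustration of benchmarking, the sketch below starts from design weights (the inverse of hypothetical selection probabilities) and adjusts them until the weighted totals match two sets of made-up person-level benchmarks. Raking (iterative proportional fitting) is used only because it is easy to follow; it is a stand-in and not necessarily the calibration method used for the HILDA weights. All records, probabilities and benchmark totals are invented.

```python
# Toy records: (selection probability, sex, state). All values are invented.
people = [
    (0.001, "F", "NSW"), (0.001, "M", "NSW"), (0.002, "F", "VIC"),
    (0.002, "M", "VIC"), (0.001, "F", "VIC"), (0.002, "M", "NSW"),
]

# Hypothetical benchmark totals, keyed by the position of the field in a record.
benchmarks = {
    1: {"F": 2600.0, "M": 2400.0},      # field 1 = sex
    2: {"NSW": 3000.0, "VIC": 2000.0},  # field 2 = state
}


def rake(records, benchmarks, iterations=50):
    """Return weights whose margins approximately match the benchmarks."""
    # Design weight = inverse of the selection probability.
    weights = [1.0 / rec[0] for rec in records]
    for _ in range(iterations):
        for field, targets in benchmarks.items():
            # Current weighted total in each category of this margin.
            totals = {category: 0.0 for category in targets}
            for rec, w in zip(records, weights):
                totals[rec[field]] += w
            # Scale the weights so this margin matches its benchmark.
            weights = [w * targets[rec[field]] / totals[rec[field]]
                       for rec, w in zip(records, weights)]
    return weights


weights = rake(people, benchmarks)
```

Each pass fixes one margin at a time, so the procedure is repeated until all margins agree with their benchmarks to within a small tolerance.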

The benchmarks were reviewed for Release 4.0 and these changes have been carried over to later releases. The changes made to the weighting process include:

  • The household and enumerated person weights are determined at the same time (rather than sequentially as was done in earlier releases). This is known as integrated weighting. The weights are adjusted to the household benchmarks at the same time as they are adjusted to the enumerated person benchmarks. The household weight will be the same as the enumerated person weight for each person in the household, resulting in identical estimates where the same concept can be determined from the two files.[27]
  • Due to the demands placed on the weights through the integrated weighting process, some of the benchmarks used have been simplified. Also, following some concerns about the representativeness of the sample, some additional benchmarks on marital status and occupation have been included (based on the ABS Labour Force Survey).
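The consistency property of integrated weighting can be checked with a few lines of code. The sketch below uses invented households and weights: because every enumerated person carries the weight of their household, the number of people living in two-person households is the same whether it is estimated from the household file or from the enumerated person file.

```python
# Toy households: (household id, household weight, number of usual residents).
# All values are invented for illustration.
households = [("h1", 950.0, 2), ("h2", 1200.0, 2), ("h3", 800.0, 3)]

# Under integrated weighting every enumerated person carries the weight of
# their household, so a toy enumerated person file can be built like this.
persons = [(hid, weight, size)
           for hid, weight, size in households
           for _ in range(size)]

# Method 1: estimated number of two-person households, multiplied by two.
from_household_file = 2 * sum(w for _, w, size in households if size == 2)

# Method 2: sum of person weights over people in two-person households.
from_person_file = sum(w for _, w, size in persons if size == 2)

print(from_household_file, from_person_file)
```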

In summary, the household benchmarks have been revised to:

  • Number of adults by number of children; and
  • State by part of State[28].

The enumerated person benchmarks have been revised to:

  • Sex by broad age;
  • State by part of State;
  • Labour force status; and
  • Marital status[29].

The responding person benchmarks were simplified in Release 6. They have been revised to:

  • State by part of State;
  • State by broad age;
  • State by labour force status;
  • Marital status; and
  • Occupation[30].

The person benchmarks for State, part of State, sex and age are from the Estimated Resident Population figures produced by the ABS based on the 2001 Census, updated for births, deaths, immigration, emigration and interstate migration. The household benchmarks are now also based on the 2001 Census and are similarly updated from that time point.[31] The remaining benchmarks come from the ABS Labour Force Survey.

From Release 5.0 onwards, the very remote parts of New South Wales, Queensland, South Australia, Western Australia and the Northern Territory have been excluded from the benchmarks, which is in line with the practice adopted in similar large-scale surveys run by the ABS.[32] Information about the other aspects of the weighting procedure can be found in Watson and Fry (2002).

Wave 2 onwards

From wave 2 onwards, the ‘selection’ of the sample depends on the wave 1 responding sample and on household and individual attrition after wave 1. The cross–sectional weights for wave 2 onwards opportunistically include temporary sample members (i.e., those people who are part of the sample only because they currently live with a continuing sample member). The underlying probability of selection for these households is amended to account for the various pathways from wave 1 into the relevant wave household. Following this, non–response adjustments are made, which require within–sample modelling of non–response probabilities, and the weights are benchmarked to known population estimates at both the household and person level.
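To make the idea of a non–response adjustment concrete, the sketch below inflates the weights of responding units by the inverse of the weighted response rate within each adjustment cell. This weighting-class approach is a simplified stand-in for the modelled non–response probabilities described above; the cells, weights and response flags are invented, and the actual procedure is documented in Watson (2004b).

```python
from collections import defaultdict


def adjust_for_nonresponse(units):
    """units: list of dicts with keys 'weight', 'cell' and 'responded'."""
    total = defaultdict(float)       # weight of all selected units per cell
    responding = defaultdict(float)  # weight of responding units per cell
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            responding[u["cell"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            # Inverse of the estimated response rate in this cell.
            factor = total[u["cell"]] / responding[u["cell"]]
            adjusted.append(u["weight"] * factor)
        else:
            # Non-respondents drop out; respondents in the cell stand in for them.
            adjusted.append(0.0)
    return adjusted


# Invented example: one renter did not respond, so the two responding
# renters are weighted up to represent all three.
units = [
    {"weight": 100.0, "cell": "renter", "responded": True},
    {"weight": 100.0, "cell": "renter", "responded": False},
    {"weight": 100.0, "cell": "renter", "responded": True},
    {"weight": 200.0, "cell": "owner", "responded": True},
    {"weight": 200.0, "cell": "owner", "responded": True},
]
adjusted = adjust_for_nonresponse(units)
```

The adjusted weights of respondents sum to the total weight of all selected units, which is the property a non–response adjustment is meant to preserve.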

The benchmarks used in Release 4 and later have been amended as described above. Other aspects of the weighting process for wave 2 onwards are detailed in Watson (2004b).[33]

Longitudinal Weights

By comparison, the construction of the longitudinal weights is more straightforward: it includes only an adjustment for attrition and benchmarking back to wave 1 characteristics.

In Release 4 and later, the benchmarks for the longitudinal weights have been modified to mirror those used in the cross-section weights.

The longitudinal enumerated person benchmarks have been revised to:

  • Sex by broad age;
  • State by part of State;
  • Labour force status; and
  • Marital status[34].

The longitudinal responding person benchmarks have been revised to mirror the Release 6 changes made to the cross–sectional responding person weights. The benchmarks are:

  • Sex by broad age;
  • State by part of State;
  • State by labour force status;
  • Marital status; and
  • Occupation[35].

From Release 6, we have provided longitudinal weights for the balanced panel of responding persons or enumerated persons from every wave to every other wave and for the balanced panel of any combination of a pair of waves.[36] These weights adjust for attrition from the initial wave and are benchmarked back to the key characteristics of the initial wave. For instance, if you were interested in a panel of respondents from waves 2 through 6, the weight provided for this panel would adjust for attrition from the balanced panel from wave 2 to 6 and would ensure key characteristics of the wave 2 population are matched.

Other aspects of the longitudinal weights are described in Watson (2004b).

Replicate Weights

Replicate weights have been provided for users to calculate standard errors that take into account the complex sample design of the HILDA Survey. These weights can be used by the SAS GREGWT macro, the STATA ‘svy jackknife’ commands (more detail is provided below on Calculating Standard Errors), or you can write your own routine to use these weights. As of Release 6, weights for 45 replicate groups are provided.
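If you write your own routine, the calculation might look like the sketch below: the estimate is recomputed once with each set of replicate weights, and the spread of those replicate estimates around the full-sample estimate gives the variance. The (G − 1)/G multiplier is the usual delete-a-group jackknife factor and is an assumption here; confirm the appropriate factor against Hayes (2008) before relying on it. The data are invented and use four replicate groups rather than 45 to keep the example short.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_se(values, main_weights, replicate_weights):
    """replicate_weights: one list of weights per replicate group."""
    full = weighted_mean(values, main_weights)
    g = len(replicate_weights)
    replicate_estimates = [weighted_mean(values, rw) for rw in replicate_weights]
    # Delete-a-group jackknife: (G - 1) / G times the sum of squared
    # deviations of the replicate estimates from the full-sample estimate.
    variance = (g - 1) / g * sum((r - full) ** 2 for r in replicate_estimates)
    return full, math.sqrt(variance)


# Invented data: four observations, each forming its own replicate group.
values = [1.0, 2.0, 3.0, 4.0]
main_weights = [1.0, 1.0, 1.0, 1.0]
# Each replicate drops one group and scales up the remaining weights.
replicate_weights = [
    [0.0 if i == dropped else 4.0 / 3.0 for i in range(4)]
    for dropped in range(4)
]

estimate, standard_error = jackknife_se(values, main_weights, replicate_weights)
```

With the HILDA files, the main weight would be a variable such as the responding person weight and the replicate weights would be the 45 corresponding replicate weight variables for the same wave.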

Weights Provided on the Data Files

Table 36 below provides a list of the weights provided on the data files together with a description of those weights. The longitudinal weights provided on the enumerated and responding person files are the ones you are most likely to use, though other longitudinal weights are provided on the Longitudinal Weights File.

Irrespective of the changes made to the construction of the weights, some changes are expected to the weights with each new release. There are three reasons for this. Firstly, corrections may be made to age and sex variables when these are confirmed with individuals in subsequent wave interviews. Secondly, the benchmarks are updated from time to time. Thirdly, duplicate or excluded people in the sample may be identified after the release (very occasionally).

Table 36: Weights
| File | Weights | Description |
|---|---|---|
| Household File | _hhwth | The household weight is the cross-section population weight for all households responding in the relevant wave. Note the sum of these household weights for wave 1 is approximately 7.4 million. |
| | _hhwths | This is the cross-section household population weight rescaled to the sum of the sample size for the relevant wave (i.e. 7,682 responding households in wave 1). Use this weight when the statistical package requires the weights to sum to the sample size. |
| | _hhwte01 to _hhwte16 | The enumerated person weights are provided on both the household file and the enumerated person file. See description below. |
| | _rwh1 to _rwh45 | Cross-section household population replicate weights. |
| Enumerated Person File | _hhwte | The enumerated person weight is the cross-section population weight for all people who are usual residents of the responding households in the relevant wave (this includes children, non-respondents and respondents). The sum of these enumerated person weights for wave 1 is 19.0 million. |
| | _hhwtes | This is the cross-section enumerated person population weight rescaled to the sum of the sample size for the relevant wave (i.e. for wave 1, 19,914 enumerated persons). Use this weight when the statistical package requires the weights to sum to the sample size. |
| | _lnwte | This longitudinal enumerated person weight is the longitudinal population weight for all people who were enumerated (i.e. in responding households) each wave from wave 1 to the wave where this variable resides. This weight applies to children, non-respondents, intermittent respondents, and full respondents in responding households. blnwte is for the balanced panel of enumerated persons from wave 1 to 2; clnwte is for the balanced panel from wave 1 to 3; dlnwte is for the balanced panel from wave 1 to 4, etc. These variables are also on the Longitudinal Weights File, but are named differently: wlea_b; wlea_c; wlea_d, etc. We expect to drop _lnwte in future. |
| | _rwe1 to _rwe45 | Cross-section enumerated person population replicate weights. |
| | _rwlne1 to _rwlne45 | Longitudinal enumerated person population replicate weights. |
| Responding Person File | _hhwtrp | The responding person weight is the cross-section population weight for all people who responded in the relevant wave (i.e. they provided a personal interview). The sum of these responding person weights for wave 1 is 15.0 million. |
| | _hhwtrps | This is the cross-section responding person population weight rescaled to sum to the number of responding persons in the relevant wave (i.e. 13,969 in wave 1). Use this weight when the statistical package requires the sum of the weights to be the sample size. |
| | _lnwtrp | This longitudinal responding person weight is the longitudinal population weight for all people responding (i.e. provided an interview) each wave from wave 1 to the wave where this variable resides. blnwtrp is for the balanced panel of respondents from wave 1 to 2; clnwtrp is for the balanced panel from wave 1 to 3; dlnwtrp is for the balanced panel from wave 1 to 4, etc. These variables are also on the Longitudinal Weights File, but are named differently: wlra_b; wlra_c; wlra_d, etc. We expect to drop _lnwtrp in future. |
| | _rwrp1 to _rwrp45 | Cross-section responding person population replicate weights. |
| | _rwlnr1 to _rwlnr45 | Longitudinal responding person population replicate weights. |
| Longitudinal Weights File* | wlet1_tn | Longitudinal enumerated person weight for the balanced panel of all people who were enumerated (i.e. part of a responding household) each wave from wave t1 to tn. Wave letters are used in place of t1 and tn. For example, wlec_f is the longitudinal enumerated person weight for the balanced panel from wave 3 to 6. |
| | wlet1tn | Longitudinal enumerated person weight for the balanced panel of all people who were enumerated (i.e. part of a responding household) in wave t1 and tn. Wave letters are used in place of t1 and tn. The paired longitudinal weights do not restrict individuals in any way based on their response status in waves between t1 and tn. For example, wlecf is the longitudinal enumerated person weight for the balanced panel of enumerated people in wave 3 and 6 (they may or may not have been enumerated in other waves). |
| | wlrt1_tn | Longitudinal responding person weight for the balanced panel of all people who were interviewed each wave from wave t1 to tn. Wave letters are used in place of t1 and tn. For example, wlrc_f is the longitudinal responding person weight for the balanced panel of respondents from wave 3 to 6. |
| | wlrt1tn | Longitudinal responding person weight for the balanced panel of all people who were interviewed in wave t1 and tn. Wave letters are used in place of t1 and tn. The paired longitudinal weights do not restrict individuals in any way based on their response status in waves between t1 and tn. For example, wlrcf is the longitudinal responding person weight for the balanced panel of respondents in wave 3 and 6 (they may or may not have been responding in other waves). |

* Replicate weights for the weights provided on the Longitudinal Weights File are available on request. Email hilda-inquiries@unimelb.edu.au.
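The naming pattern for the weights can be expressed as a small helper. The sketch below is illustrative only: the function names are hypothetical and not part of any HILDA tooling, and the pattern is inferred from the examples given in Table 36 (the leading underscore stands for the wave letter, with wave 1 = a, wave 2 = b, and so on).

```python
import string


def wave_letter(wave):
    # Wave 1 -> "a", wave 2 -> "b", and so on.
    return string.ascii_lowercase[wave - 1]


def cross_section_weight(wave, stem):
    # The leading underscore in Table 36 stands for the wave letter,
    # e.g. wave 2 with stem "hhwtrp" gives "bhhwtrp".
    return wave_letter(wave) + stem


def longitudinal_weight(first, last, responding=True, paired=False):
    # "r" = responding persons, "e" = enumerated persons.
    kind = "r" if responding else "e"
    # Balanced panels over every wave use an underscore between the wave
    # letters; paired-wave panels do not.
    separator = "" if paired else "_"
    return "wl" + kind + wave_letter(first) + separator + wave_letter(last)


print(longitudinal_weight(3, 6))                                 # wlrc_f
print(longitudinal_weight(3, 6, responding=False, paired=True))  # wlecf
print(cross_section_weight(2, "lnwte"))                          # blnwte
```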

 

Advice on Using Weights

Which Weight to Use

For some users, the array of weights on the dataset may seem confusing. This section provides examples of when it would be appropriate to use the different types of weights.

If you want to make inferences about the Australian population from frequencies or cross–tabulations of the HILDA sample then you will need to use weights. If you are only using information collected during the wave 4 interviews (either at the household level or person level) then you would use the wave 4 cross–section weights. Similarly, if you are only using wave 3 information, then you would use the wave 3 cross–section weights, and so on. If you want to infer how people have changed across the five years between waves 1 and 6, then you would use the longitudinal weights for waves 1 through 6.

The following five examples show how the various weights may be used to answer questions about the population:

  • What proportion of households rent in 2006? We would use the cross–section household weight for wave 6 and obtain a weighted estimate of the proportion of households that were renting as at the time of interview.
  • How many people live in poor households in 2002? We are interested in the number of individuals with a certain household characteristic, such as having low equivalised household incomes. We would use the cross-section enumerated person weight for wave 2 and count the number of enumerated people in households with the poorest 10 per cent of equivalised household incomes. (We do not need to restrict our attention to responding persons only as total household incomes are available for all households after the imputation process. We also want to include children in this analysis and not just limit our analysis to those aged 15 years or older.)
  • What is the average salary of professionals in 2003? This is a question that can only be answered from the responding person file using the cross–section responding person weight for wave 3. We would identify those reportedly working in professional occupations and take the weighted average of their wages and salaries.
  • For how many years have people been poor between 2001 and 2006? We might define the ‘poorest’ 10 per cent of households as having the lowest equivalised household incomes in each wave. We could then calculate how many years people were poor between wave 1 and wave 6, and apply the longitudinal enumerated person weight (flnwte or equivalently wlea_f) for those people enumerated every wave between wave 1 and 6.
  • What proportion of people have changed their employment status between 2002 and 2006? This question can only be answered by considering the responding persons in both waves. We would use the longitudinal responding person weight for the pair of waves extracted from the Longitudinal Weights File (wlrbf) and construct a weighted cross–tabulation of the employment status of respondents in wave 2 against the employment status of respondents in wave 6.
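The estimates in these examples all reduce to sums of weights. The sketch below shows a weighted count, proportion, mean and cross–tabulation on invented records; in practice the weights would be the relevant HILDA weight variable for the wave and population of interest, and the helper names here are hypothetical.

```python
from collections import defaultdict


def weighted_count(weights, flags):
    # Estimated population count of units for which the flag is true.
    return sum(w for w, f in zip(weights, flags) if f)


def weighted_proportion(weights, flags):
    return weighted_count(weights, flags) / sum(weights)


def weighted_mean(weights, values, flags):
    # Weighted mean of the values, restricted to flagged units.
    selected = [(w, v) for w, v, f in zip(weights, values, flags) if f]
    return sum(w * v for w, v in selected) / sum(w for w, _ in selected)


def weighted_crosstab(weights, rows, cols):
    # Sum of weights in each (row category, column category) cell.
    table = defaultdict(float)
    for w, r, c in zip(weights, rows, cols):
        table[(r, c)] += w
    return dict(table)


# Invented responding-person records.
weights = [1100.0, 900.0, 1000.0]
is_professional = [True, True, False]
salary = [80000.0, 60000.0, 50000.0]
status_then = ["employed", "employed", "unemployed"]
status_now = ["employed", "unemployed", "employed"]

print(weighted_count(weights, is_professional))         # 2000.0
print(weighted_mean(weights, salary, is_professional))  # 71000.0
print(weighted_crosstab(weights, status_then, status_now))
```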

When constructing regression models, the researcher needs to be aware of the sample design and non–response issues underlying the data and will need to take account of this in some way.

Calculating Standard Errors

The HILDA survey has a complex survey design that needs to be taken into account when calculating standard errors. It is:

  • clustered – 488 areas were originally selected from which households were chosen and people are clustered within households;
  • stratified – the 488 areas were selected from a frame of areas stratified by State and part of State; and
  • unequally weighted – the households and individuals have unequal weights due to some irregularities in the selection of the sample in wave 1 and the non–random non–response in wave 1 and the non–random attrition in waves 2 to 4.

Some options available for the calculation of appropriate standard errors and confidence intervals include:

  • Standard Error Tables – Based on the wave 1 data, approximate standard errors have been constructed for a range of estimates (see Horn (2004)). Similar tables for waves 2 to 4 have not been produced.
  • Use of the SPSS Release 12 add-on module "SPSS Complex Samples". The add–on module produces standard errors via the Taylor Series approximation. SPSS does not have a built-in feature to handle replicate weights.
  • Use of SAS procedures SURVEYMEANS, SURVEYREG, SURVEYFREQ and SURVEYLOGISTIC (the last two only in version 9 onwards). The SAS procedures produce standard errors via the Taylor Series approximation. SAS does not have a built-in feature to handle replicate weights.
  • Use of GREGWT macro in SAS – Some users within FaHCSIA, ABS and other organisations may have access to the GREGWT macro that can be used to construct various population estimates. The macro uses the jackknife method to estimate standard errors using the replicate weights.
  • Use of ‘svy’ commands in STATA – Stata has a set of survey commands that deal with complex survey designs. Using the ‘svyset’ command, the clustering, stratification and weights can be assigned. You can request that the standard errors be calculated by the jackknife method using ‘svy jackknife’ and the replicate weights. Various statistical procedures are available within the suite of ‘svy’ commands including means, proportions, tabulations, linear regression, logistic regression, probit models and a number of other commands.

A User Guide for calculating the standard errors in HILDA is provided as part of our technical paper series, see Hayes (2008). Example code is provided in SAS, SPSS and STATA.

To assist you in the calculation of appropriate standard errors, the wave 1 area (cluster) and proxy stratification variables have been included on the master file. These are listed in Table 37 and need to be specified for the SPSS, SAS and Stata Taylor Series approximation standard error calculations suggested above. Any new entrants to the household are assigned the same sample design information as the permanent sample member. As of Release 6 the proxy stratification variable (ahhstrat) has replaced major statistical region (ahhmsr) on the master file as the variable to be used in the Taylor Series approximation method. The new stratification variable is essentially a collapsed area unit variable that approximates the effect of both the systematic selection and stratification of the survey selection better than only using the variable for the major statistical region.

Table 37: Sample design variables
| Variable | Description | Design element |
|---|---|---|
| AHHRAID | DV: randomised area id | Cluster |
| AHHSTRAT | DV: Wave 1 Strata | Proxy stratification |
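For readers who want to see what the Taylor Series approximation does with the cluster and stratification variables, the sketch below computes a linearised standard error for a weighted mean. It is a textbook with-replacement approximation under stated assumptions (no finite population correction, and strata containing a single cluster are skipped), not a reproduction of what SPSS, SAS or Stata do internally. The function name and data are invented.

```python
import math
from collections import defaultdict


def linearised_se_of_mean(values, weights, strata, clusters):
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_weight

    # Linearised score for a ratio mean, summed within each cluster.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for y, w, h, c in zip(values, weights, strata, clusters):
        cluster_totals[h][c] += w * (y - mean) / total_weight

    variance = 0.0
    for totals in cluster_totals.values():
        n_h = len(totals)
        if n_h < 2:
            # Assumption: a stratum with one cluster contributes nothing.
            continue
        mean_h = sum(totals.values()) / n_h
        # Between-cluster variability within the stratum.
        variance += n_h / (n_h - 1) * sum(
            (z - mean_h) ** 2 for z in totals.values())
    return mean, math.sqrt(variance)


# Invented data: two strata, two clusters in each.
values = [1.0, 2.0, 3.0, 4.0, 2.0, 5.0]
weights = [100.0, 120.0, 90.0, 110.0, 100.0, 80.0]
strata = ["s1", "s1", "s1", "s2", "s2", "s2"]
clusters = ["c1", "c1", "c2", "c3", "c4", "c4"]

mean, standard_error = linearised_se_of_mean(values, weights, strata, clusters)
```

In a HILDA analysis the strata and cluster arguments would be taken from the proxy stratification and randomised area variables listed in Table 37.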

 

Also, a few users may be interested in the sample design weight in wave 1 before any benchmark or non–response adjustments have been made. This is available on the household file as ahhwtdsn.


Endnotes:

27 For example, the number of people living in a household with two people can be derived by two methods. Firstly, this can be calculated from the household file by estimating the number of two person households and multiplying by two. Secondly, it can be estimated from the enumerated file by summing the weights of people living in two person households.
28 Prior to Release 4, the household benchmarks were number of adults by number of children by broad geography and State by part of State (the bolded text indicates what has been dropped).
29 Prior to Release 4, the enumerated person benchmarks were State by part of State by sex by broad age, and State by part of State by labour force status (the bolded text indicates what has been dropped, but note that State by part of State is now included as a separate benchmark). The marital status benchmark has been added from Release 4.
30 Prior to Release 6, the responding person benchmarks were State by part of State by sex by broad age; State by part of State by labour force status; marital status by broad age; and occupation by broad geography (the bolded text indicates what has been dropped, but note that State by part of State is now included as a separate benchmark). From Release 4 the marital status and occupation benchmarks have been included.
31 Prior to Release 5, only household estimates based on the 1996 Census were available.
32 Prior to Release 5, only the sparsely settled parts of the Northern Territory were excluded.
33 While this paper is written in relation to the wave 2 weighting, the process in later waves follows the same methodology.
34 Prior to Release 4, the longitudinal enumerated person benchmarks were State by part of State by sex by broad age, and State by part of State by labour force status (the bolded text indicates what has been dropped, but note that State by part of State is now included as a separate benchmark). The marital status benchmark has been added from Release 4 onwards.
35 Prior to Release 6, the longitudinal responding person benchmarks were State by part of State by sex by broad age; State by part of State by labour force status; marital status by broad age; and occupation by broad geography (the bolded text indicates what has been dropped, but note that State by part of State is now included as a separate benchmark). From Release 4 the marital status and occupation benchmarks have been included.
36 Prior to Release 6, weights were only provided for the balanced panel of respondents or enumerated persons from wave 1 to every other wave.

 
