
Income Variables and Income Imputation

Income, Tax and Family Benefits Model

Figure 19, Figure 20 and Figure 21 show how the numerous income questions in the Person Questionnaire are combined to form several financial year income components and a windfall income component on the household file, enumerated person file and responding person file respectively. The Family Tax Benefit and Maternity Allowance are calculated on the interim income to produce a total financial year income.[12] The Child Care Benefit is also calculated but not included in total financial year income (as it is considered a social transfer in kind rather than a cash benefit).[13]

Current wages and salaries and current benefits are asked about separately from the financial year questions.

Since Release 4, the income components have been imputed for both respondents and non-respondents within responding households. The enumerated file, as a result, contains component level data (rather than just total financial year income and windfall income as occurred in earlier releases). This has also permitted the calculation of these components at the household level as detailed in Figure 19. Market income, private income and Australian public transfers have also been calculated.

The HILDA income tax model calculates the financial year tax typically payable by an Australian taxpayer in circumstances akin to those of the respondent. It does not attempt to calculate every individual variation in tax available under the Australian taxation system. Only the major components contributing to income tax (income tax, business income tax, Medicare Levy, private pensions tax, deductions and offsets) are estimated for the individual. When aggregated, these variables compare favourably with the national aggregates. The following key points should be noted about the income tax model:

  • The input data are the imputed income variables and the data collected in the Person Questionnaire. The components which the Australian Taxation Office (ATO) treats as taxable income are summed: wages and salaries, business income, investment income and Australian pensions and benefits.
  • Deductions are calculated as a percentage of income for each of 20 income ranges; the average deduction ranges from 6% for low incomes to 4% for the highest incomes (Taxation Statistics 1999-2000, ATO, 2002, CD Table s3.8). Gross income is reduced by these deductions.
  • Business income is separated from general income and then business tax is calculated. Business incomes up to $50,000 are taxed at the same rate as labour incomes. For business income exceeding $50,000 the rates applied are 15 percent up to $100,000, 10 percent up to $500,000 and 6 percent beyond $500,000. These rates reflect what is actually paid on business incomes (Taxation Statistics 1999-2000, ATO, 2002, CD Table s3.10).
  • The four standard marginal tax rates are applied for non-retired people who earn just labour incomes (Table 28). A low income offset is incorporated into the rates for those earning up to $20,000.
  • Low tax rates are applied to retired people. The rates we impute reflect what is actually paid by retired people on different levels of income (Taxation Statistics 1999-2000, ATO, 2002, CD Table s3.11). Non–respondents are presumed to be retired if aged over 65.
  • The Medicare Levy is estimated as a flat 1.5%. HILDA does not collect private health insurance status, so the Medicare surcharge cannot be applied. An adjustment is made for seniors.
  • As an approximation, low income pension and benefit recipients (taxable income less than $20,000) are deemed to pay no income tax.
  • The largest offsets are dividend imputation and eligible termination payments, but these are not collected in HILDA. As an approximation, an average national offset of 2% of taxable income is applied as a flat rate to all taxpayers.
  • As an approximation, private pensions are taxed at a flat rate of 5%. The same rate is applied to Workers’ Compensation.
  • Total income tax is calculated as the sum of income tax, business tax, Medicare Levy and private pensions tax less offsets.
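
The final summation in the list above, together with the flat-rate approximations, can be sketched as follows. This is only an illustration under stated assumptions, not the HILDA implementation: the function name and parameters are hypothetical, the income tax and business tax amounts are taken as given inputs, and applying the 1.5% Medicare Levy to taxable income is an assumption (the text does not state the base). The seniors adjustment, the deduction percentages and the treatment of low income benefit recipients are not modelled.

```python
def total_income_tax(taxable_income, income_tax, business_tax, private_pension_income):
    """Combine the tax components as described in the bullet list (illustrative only)."""
    # Flat 1.5% Medicare Levy; using taxable income as the base is an assumption.
    medicare_levy = 0.015 * taxable_income
    # Private pensions (and Workers' Compensation) taxed at a flat 5%.
    private_pension_tax = 0.05 * private_pension_income
    # Average national offset of 2% of taxable income, applied to all taxpayers.
    offsets = 0.02 * taxable_income
    # Total = income tax + business tax + Medicare Levy + private pensions tax - offsets.
    return income_tax + business_tax + medicare_levy + private_pension_tax - offsets


# Hypothetical inputs, chosen only to show the arithmetic.
example = total_income_tax(
    taxable_income=40000,
    income_tax=8000,
    business_tax=0,
    private_pension_income=10000,
)
```

With these made-up inputs the levy is $600, the private pensions tax is $500 and the offset is $800, so the total is about $8,300. The sketch does not floor the result at zero; whether the model does so is not stated in the text.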
Table 28: Australian Resident Income Tax Rates, Waves 1-6
Income              Tax Rate

Waves 1, 2, 3 (Financial Years 2000–01, 2001–02, 2002–03)
$0 – $6000          Nil
$6001 – $20000      Nil plus 17c for each $ over $6000
$20001 – $50000     $2380 plus 30c for each $ over $20000
$50001 – $60000     $11380 plus 42c for each $ over $50000
$60001 and over     $15580 plus 47c for each $ over $60000

Wave 4 (Financial Year 2003–04)
$0 – $6000          Nil
$6001 – $21600      Nil plus 17c for each $ over $6000
$21601 – $52000     $2652 plus 30c for each $ over $21600
$52001 – $62500     $11772 plus 42c for each $ over $52000
$62501 and over     $16182 plus 47c for each $ over $62500

Wave 5 (Financial Year 2004–05)
$0 – $6000          Nil
$6001 – $21600      Nil plus 17c for each $ over $6000
$21601 – $58000     $2652 plus 30c for each $ over $21600
$58001 – $70000     $13572 plus 42c for each $ over $58000
$70001 and over     $18612 plus 47c for each $ over $70000

Wave 6 (Financial Year 2005–06)
$0 – $6000          Nil
$6001 – $21600      Nil plus 15c for each $ over $6000
$21601 – $63000     $2340 plus 30c for each $ over $21600
$63001 – $95000     $14760 plus 42c for each $ over $63000
$95001 and over     $28200 plus 47c for each $ over $95000
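
As a worked illustration of how the schedules in Table 28 are read, the sketch below encodes each bracket as a lower threshold, a base amount and a rate in cents per dollar over the threshold. The names `TAX_SCHEDULES`, `schedule_for_wave` and `resident_income_tax` are hypothetical. The sketch covers only the standard rates for non-retired people with labour income; the separate rates for retired people and business income are not in the table and are not modelled.

```python
# Table 28 as (lower threshold in $, base tax in $, cents per $ over the threshold).
TAX_SCHEDULES = {
    (1, 2, 3): [(6000, 0, 17), (20000, 2380, 30), (50000, 11380, 42), (60000, 15580, 47)],
    (4,): [(6000, 0, 17), (21600, 2652, 30), (52000, 11772, 42), (62500, 16182, 47)],
    (5,): [(6000, 0, 17), (21600, 2652, 30), (58000, 13572, 42), (70000, 18612, 47)],
    (6,): [(6000, 0, 15), (21600, 2340, 30), (63000, 14760, 42), (95000, 28200, 47)],
}


def schedule_for_wave(wave):
    """Return the bracket list that Table 28 gives for a wave."""
    for waves, brackets in TAX_SCHEDULES.items():
        if wave in waves:
            return brackets
    raise ValueError(f"Table 28 has no schedule for wave {wave}")


def resident_income_tax(taxable_income, wave):
    """Tax payable under the Table 28 schedule for the given wave."""
    tax = 0.0  # incomes up to $6000 pay nil
    # Brackets are in ascending order, so the last threshold exceeded determines the tax.
    for threshold, base, cents in schedule_for_wave(wave):
        if taxable_income > threshold:
            tax = base + cents * (taxable_income - threshold) / 100
    return tax
```

For example, a wave 4 taxable income of $30,000 falls in the $21601 – $52000 bracket, giving $2652 plus 30c for each of the $8400 over $21600, that is $5172.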

 

Figure 19: Financial Year Income Model: Household

 

Figure 20: Financial Year Income Model: Enumerated Person

 

Figure 21: Financial Year Income Model: Responding Person

A list of additional derived income variables is provided in Table 29 (those that are directly related to the income imputation are provided later in Table 31). There are several issues to note in this table:

  • Wages and salaries were asked of respondents for their main job, then for all their other jobs combined. The suffixes ‘g’ and ‘e’ refer to gross and estimated gross incomes: where respondents did not know their gross income, they were asked for their after-tax income, which was translated back into an estimated gross income. The ‘e’ variables will have fewer cases with missing wages and salaries than the ‘g’ variables, as the ‘e’ variables include all the known ‘g’ values.
  • The variable labels indicate when top-coding has occurred. The actual value replacing the top–coded value will be the weighted mean of the top–coded units (see section on Confidentialisation).
  • Child support is calculated from the questions asked about the children in the family formation grid, rather than from the single category listed in the ‘other income’ question in the income section. This is because respondents are more likely to give an accurate response to the detailed questions than to the broad ‘catch all’ question.
  • The components feeding into the ‘windfall’ income are those thought irregular (such as inheritances, redundancies, payments from parents).
  • In wave 1, respondents were asked how different their current wage and salary income was from one year ago. This has been provided in dollar terms in awsly.
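
The weighted top-coding mentioned in the list above can be illustrated with a short sketch. It assumes a hypothetical threshold and treats values strictly above it as top-coded; the actual thresholds and rules are described in the section on Confidentialisation, not here. The function name `weighted_topcode` is also hypothetical.

```python
def weighted_topcode(values, weights, threshold):
    """Replace values above the threshold with their weighted mean (illustrative only)."""
    # Collect the (value, weight) pairs that would be top-coded.
    top = [(v, w) for v, w in zip(values, weights) if v > threshold]
    if not top:
        return list(values)
    # Weighted mean of the top-coded units.
    weighted_mean = sum(v * w for v, w in top) / sum(w for _, w in top)
    return [weighted_mean if v > threshold else v for v in values]


# Hypothetical incomes and weights: two units exceed the threshold of 50.
topcoded = weighted_topcode([10, 20, 100, 200], [1, 1, 3, 1], threshold=50)
```

In this made-up example the two top-coded values are both replaced by (100 × 3 + 200 × 1) / 4 = 125. Using the weighted mean leaves the weighted total of the variable unchanged, which is one reason to prefer it over substituting the threshold itself.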

The imputation method and derived variables are discussed in the following sections.

Table 29: Other derived income variables
Variable Description

Current wages and salaries and current benefits
_WSCG DV: All jobs, current gross wages per week ($). Weighted topcode.*
_WSCMG DV: Main job, current weekly gross wages & salary ($). Weighted topcode.*
_WSCOG DV: Other jobs, current weekly gross wages & salary ($). Weighted topcode.*

Financial year income – Unimputed variables
AWSLY DV: Gross weekly current wages & salary (from all jobs) one year ago ($)
_WSFG DV: Financial year gross wages & salary ($). Weighted topcode.*
_OIINT DV: Financial year interest including nil ($)
_OIRNTN DV: Financial year rental income including nil ($) Negative value
_OIRNTP DV: Financial year rental income including nil ($) Positive value
_OIDIV DV: Financial year dividends including nil ($)
_OIROY DV: Financial year royalties including nil ($)
_OIDVRY DV: Financial year dividends plus royalties including nil ($)
_TIFMKTP DV: Financial year market (factor) income ($) Positive values. Weighted topcode.*
_TIFMKTN DV: Financial year market (factor) income ($) Negative values
_TIFPRIP DV: Financial year private income ($). Positive values. Weighted topcode.*
_TIFPRIN DV: Financial year private income ($). Negative values

Financial year income – Estimated CCB, FTB A, FTB B, income tax and medicare levy
_HIFCCB DV: Household Child Care Benefit ($) financial year
_BNCCBF1 DV: Family number 1 Child Care Benefit ($) financial year
_BNCCBF2 DV: Family number 2 Child Care Benefit ($) financial year
_BNCCBF3 DV: Family number 3 Child Care Benefit ($) financial year
_BNFTAF1 DV: Family number 1 Family Tax Benefit A ($) financial year
_BNFTAF2 DV: Family number 2 Family Tax Benefit A ($) financial year
_BNFTAF3 DV: Family number 3 Family Tax Benefit A ($) financial year
_BNFTBF1 DV: Family number 1 Family Tax Benefit B ($) financial year
_BNFTBF2 DV: Family number 2 Family Tax Benefit B ($) financial year
_BNFTBF3 DV: Family number 3 Family Tax Benefit B ($) financial year
_BNMATF1 DV: Family number 1 Maternity Allowance ($) financial year
_BNMATF2 DV: Family number 2 Maternity Allowance ($) financial year
_BNMATF3 DV: Family number 3 Maternity Allowance ($) financial year
_TXINC DV: Financial year taxes – income tax - estimate ($). Weighted topcode.*
_TXMED DV: Financial year taxes - medicare - estimate ($). Weighted topcode.*
* See the section on Confidentialisation for an explanation of top–coding

 

Imputation Method

Since Release 3, the primary method for imputing income has been based on a method developed by Little and Su (1989). This longitudinal imputation method incorporates trend and individual level information into the imputed amounts by using a multiplicative model based on row (person) and column (wave) effects. The model is of the form:

    imputation = (row effect) × (column effect) × (residual).

Ideally, the record with missing information (called the recipient) should be imputed using information from a record with complete information (called the donor) that has similar characteristics for the variable of interest. The Little and Su methodology was improved by extending it to take into account additional characteristics of the donors and recipients. Donors and recipients are matched within imputation classes which have similar characteristics. The imputation classes used were age groups defined by the following ranges: 15-19, 20-24, 25-34, 35-44, 45-54, 55-64, 65+.[14] The formulae for the Little and Su method are provided in Appendix 3, together with a worked example.
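
To make the multiplicative form concrete, the following is a minimal sketch of the basic Little and Su procedure as it is generally described: column effects are wave means relative to the overall mean, row effects are a person's average income after removing the column effects, and the residual is borrowed from the complete case with the closest row effect. It is not the HILDA implementation. The function name is hypothetical, the imputation classes are not modelled, and the sketch assumes every person has at least one observed wave and that donors have non-zero incomes. Appendix 3 remains the reference for the formulae actually used.

```python
from statistics import mean


def little_su_impute(incomes):
    """incomes: one list per person, one value per wave, None where missing."""
    n_waves = len(incomes[0])
    complete = [row for row in incomes if None not in row]

    # Column (wave) effects: each wave mean over complete cases, relative to the grand mean.
    wave_means = [mean(row[j] for row in complete) for j in range(n_waves)]
    grand_mean = mean(wave_means)
    col = [m / grand_mean for m in wave_means]

    def row_effect(row):
        # Person-level effect: average of observed values with the wave effect removed.
        return mean(y / col[j] for j, y in enumerate(row) if y is not None)

    donors = [(row_effect(row), row) for row in complete]

    result = []
    for row in incomes:
        if None not in row:
            result.append(list(row))
            continue
        r_i = row_effect(row)
        # Donor: the complete case whose row effect is closest to the recipient's.
        r_d, donor = min(donors, key=lambda d: abs(d[0] - r_i))
        filled = []
        for j, y in enumerate(row):
            if y is None:
                residual = donor[j] / (r_d * col[j])
                y = r_i * col[j] * residual  # (row effect) x (column effect) x (residual)
            filled.append(y)
        result.append(filled)
    return result


# Hypothetical two-wave data: two complete cases and one recipient missing wave 2.
imputed = little_su_impute([[100, 110], [200, 220], [120, None]])
```

In this made-up example the recipient's row effect (126) is closest to the first donor's (105), so the imputed wave 2 value is the donor's wave 2 income scaled by the ratio of row effects, 110 × 126 / 105 = 132.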

For some cases, such as new entrants interviewed in the latest wave who did not respond to some income questions, the imputation method used was the nearest neighbour regression method adopted in Release 2 (Watson, 2004a).

For respondents with item non–response (that is, where some questions during their interview were not answered), the income components have been imputed and the totals are the sum of the relevant components. These components and totals are available on the responding person file.

The income components for non–respondents within responding households have also been imputed. Prior to using the Little and Su method for non–respondents, income components were determined to be zero or non–zero using a population carryover method.[15] The Little and Su method was used to determine the non–zero amounts. However, for some cases, the Little and Su method could not be used (such as a non-responding new entrant in the latest wave). For these cases, the income totals were imputed first using the nearest neighbour regression method and then the income components were taken from the same donor. The components and totals for non-respondents are available on the enumerated person file (along with the components and totals for responding persons).

Imputed components and totals are also available at the household level on the household file.

Table 30 shows the proportion of missing cases that were imputed by each imputation method.[16] The proportions are summarised across all income variables that have been imputed. Ideally, all records would be imputed by the Little and Su method and, although this is mostly the case for responding persons, sufficient information is not always available. The nearest neighbour regression method is a fall–back method that is used when the Little and Su method cannot be applied. It is used more among enumerated persons, as this group includes non–respondents within responding households. Non–responding cases are the only group that undergoes the additional population carryover imputation step.

Less information is available for non-responding persons within responding households than for responding persons and, as a result, the quality of the imputation is slightly poorer. However, the income components are still provided so that they are available at the household level.

Improvements to the income imputation methodology are ongoing, and further revisions are expected.[17]

Table 30: Proportion of missing cases imputed by imputation method (%)
Imputation Method     Wave
                         1     2     3     4     5     6
Responding Persons
 Nearest Neighbour    13.7   4.3   5.4   4.6   4.3   7.2
 Little & Su          86.3  95.7  94.6  95.4  95.7  92.8
Enumerated Persons
 Nearest Neighbour    54.7  37.4  40.3  39.6  41.2  49.1
 Little & Su          32.9  38.8  39.5  36.2  40.5  39.2
 Carryover            12.4  23.8  20.2  24.2  18.4  11.7

 

Imputed Income Variables

All income imputation was undertaken at the derived variable level, leaving the original data unchanged. In the main, both the pre–imputed and post–imputed variables are available in the datasets, along with an imputation flag, so that it is easy to choose between using the pre–imputed data or the post–imputed data.

An overview of the pre– and post–imputed income variables is provided in Table 31. We have deviated from the general style of presenting the derived variables in this manual in the hope that it is clearer from the following table how the post–imputed variables and flags relate to the pre–imputed variables.

Table 31: Person imputed income variables
  Pre-imputed Post-imputed Flag
Responding person file
Current income
 Wages and salaries – all jobs _wsce _wscei _wscef
 Wages and salaries – main job _wscme _wscmei _wscmef
 Wages and salaries – other jobs _wscoe _wscoei _wscoef
 Benefits _bncaup _bncaupi _bncaupf
Financial year income
Wages and salaries _wsfe _wsfei _wsfef
Australian govt pensions _bnfaup _bnfaupi _bnfaupf
Foreign govt pensions _bnffp _bnffpi _bnffpf
Business income _bifn, _bifp _bifin, _bifip _biff
Investments _oifinvn, _oifinvp _oifinin, _oifinip _oifinf
Private pensions _oifpp _oifppi _oifppf
Private transfers _oifpt _oifpti _oifptf
Total FY income (A) Not provided _tifefn, _tifefp _tifeff
Windfall income _oifwfl _oifwfli _oifwflf
Enumerated person file
Current income
Wages and salaries – all jobs Not provided _wscei _wscef
Wages and salaries – main job Not provided _wscmei _wscmef
Wages and salaries – other jobs Not provided _wscoei _wscoef
Benefits Not provided _bncaupi _bncaupf
Financial year income
Wages and salaries Not provided _wsfei _wsfef
Australian govt pensions Not provided _bnfaupi _bnfaupf
Foreign govt pensions Not provided _bnffpi _bnffpf
Business income Not provided _bifin, _bifip _biff
Investments Not provided _oifinin, _oifinip _oifinf
Private pensions Not provided _oifppi _oifppf
Private transfers Not provided _oifpti _oifptf
Total FY income (A) Not provided _tifefn, _tifefp _tifeff
Windfall income Not provided _oifwfli _oifwflf
Household file
Current income
Wages and salaries – all jobs Not provided _hiwscei _hifwscef
Wages and salaries – main job Not provided _hiwscmi _hifwscmf
Wages and salaries – other jobs Not provided _hiwscoi _hifwscof
Benefits Not provided _hicaupi _hicaupf
Financial year income
Wages and salaries Not provided _hiwsfei _hifwsfef
Australian govt pensions Not provided _hifaupi _hifaupf
Foreign govt pensions Not provided _hiffpi _hiffpf
Business income Not provided _hibifin, _hibifip _hifbiff
Investments Not provided _hifinin, _hifinip _hifinf
Private pensions Not provided _hifppi _hifppf
Private transfers Not provided _hifpti _hifptf
Total FY income Not provided _hifefn, _hifefp _hifeff
Windfall income Not provided _hifwfli _hifwflf
(A) The following variables use total person financial year income (_tifefn, _tifefp) in their calculations: income tax (_txinc) and Medicare (_txmed). Use _tifeff as the imputation flag for these variables.
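
For the single-variable rows of Table 31, the post-imputed variable and the flag are named by appending ‘i’ and ‘f’ to the pre-imputed name. The sketch below shows one way this pattern might be used when choosing between imputed and non-imputed values. Several things here are assumptions rather than documented facts: the helper names are hypothetical, the leading underscore is replaced by a wave letter (‘a’ for wave 1, as in awsly, with ‘b’ onwards assumed for later waves), and a flag value of 0 is assumed to mean "not imputed". Check the flag coding in the data before relying on it. Business income and investments, which are split into positive and negative parts, do not follow this simple pattern.

```python
import string


def imputed_names(stem, wave):
    """Build pre-imputed, post-imputed and flag names, e.g. 'wsfe' in wave 1."""
    prefix = string.ascii_lowercase[wave - 1]  # assumed wave letter: 1 -> 'a', 2 -> 'b', ...
    return {
        "pre": prefix + stem,
        "post": prefix + stem + "i",
        "flag": prefix + stem + "f",
    }


def income_value(record, stem, wave, allow_imputed=True):
    """Return the post-imputed value, or None if imputed values are not wanted."""
    names = imputed_names(stem, wave)
    # Assumption: a flag of 0 marks a value that was reported rather than imputed.
    if not allow_imputed and record[names["flag"]] != 0:
        return None
    return record[names["post"]]


# Hypothetical wave 1 record with an imputed financial year wage.
record = {"awsfei": 52000, "awsfef": 1}
with_imputation = income_value(record, "wsfe", 1)
without_imputation = income_value(record, "wsfe", 1, allow_imputed=False)
```

Here `with_imputation` is the post-imputed amount, while `without_imputation` is None because the hypothetical flag marks the value as imputed.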

 


Endnotes:

12 The Maternity Allowance is allocated to all families with newborn children and included in the Australian pensions and benefits.
13 This change has been in place since Release 4.0. In earlier Releases, the Child Care Benefit was included in the total financial year income and the Maternity Allowance was only recorded if the respondent reported it.
14 Age groups were used to create the imputation classes because it is a simple characteristic and it is known for almost all donors and recipients. For a few cases, age was missing and was therefore imputed from a person with a similar relationship structure to the missing case. Not all income variables were imputed using imputation classes. The variables where donors and recipients were matched with imputation classes were current wages and salaries, current benefits, financial year wages and salaries, Australian Government pensions and rent income.
15 Zeros are carried forward or backward from the surrounding waves with the same probability as that observed in the complete cases.
16 For the proportion of cases which are missing, see Table 39.
17 A detailed review of the performance of the imputation methods is reported in Starick and Watson (2007).

 
