Income Variables and Income Imputation
Income, Tax and Family Benefits Model
Figure 19, Figure 20 and Figure 21 show how the numerous income questions in the Person Questionnaire are combined to form several financial year income components and a windfall income component on the household file, enumerated person file and responding person file respectively. The Family Tax Benefit and Maternity Allowance are calculated on the interim income to produce a total financial year income [12]. The Child Care Benefit is also calculated but not included in total financial year income (as it is considered a social transfer in kind rather than a cash benefit) [13].
Current wages and salaries and current benefits are asked about separately from the financial year questions.
Since Release 4, the income components have been imputed for both respondents and non-respondents within responding households. The enumerated file, as a result, contains component level data (rather than just total financial year income and windfall income as occurred in earlier releases). This has also permitted the calculation of these components at the household level as detailed in Figure 19. Market income, private income and Australian public transfers have also been calculated.
The HILDA income tax model calculates the financial year tax typically payable for an Australian taxpayer in circumstances akin to those of the respondent. It does not attempt to calculate every individual variation in tax available under the Australian taxation system. Only the major components (income tax, business income tax, Medicare Levy, private pensions tax, deductions and offsets) contributing to income tax are estimated for the individual. When aggregated, these variables compare favourably with the national aggregates. The following key points should be noted about the income tax model:
- The input data are the imputed income variables and the data collected in the Person Questionnaire. The components which the Australian Taxation Office (ATO) treats as taxable income are summed: wages and salaries, business income, investment income and Australian pensions and benefits.
- Deductions are calculated as a percentage of income within each of 20 income ranges; the average deduction ranges from 6% for low incomes to 4% for the highest incomes (Taxation Statistics 1999-2000, ATO, 2002, CD Table s3.8). Gross income is reduced by these deductions.
- Business income is separated from general income and then business tax is calculated. Business incomes up to $50,000 are taxed at the same rate as labour incomes. For business income exceeding $50,000 the rates applied are 15 percent up to $100,000, 10 percent up to $500,000 and 6 percent beyond $500,000. These rates reflect what is actually paid on business incomes (Taxation Statistics 1999-2000, ATO, 2002, CD Table s3.10).
- The four standard marginal tax rates are applied for non-retired people who earn just labour incomes (Table 28). A low income offset is incorporated into the rates for those earning up to $20,000.
- Low tax rates are applied to retired people. The rates we impute reflect what is actually paid by retired people on different levels of income (Taxation Statistics 1999-2000, ATO, 2002, CD Table s3.11). Non–respondents are presumed to be retired if aged over 65.
- The Medicare Levy is estimated as a flat 1.5%. HILDA does not collect private health insurance status, so the Medicare surcharge cannot be applied. An adjustment is made for seniors.
- As an approximation, low income pension and benefit recipients (taxable income less than $20,000) are deemed to pay no income tax.
- The largest offsets are dividend imputation and eligible termination payments, but these are not collected in HILDA. As an approximation, an average national offset of 2% of taxable income is applied as a flat rate to all taxpayers.
- As an approximation, private pensions are taxed at a flat rate of 5%. The same rate is applied to Workers’ Compensation.
- Total income tax is calculated as the sum of income tax, business tax, Medicare Levy and private pensions tax less offsets.
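The final summation in the key points above can be sketched as follows. This is a minimal illustration only, not the HILDA implementation: the function and parameter names are hypothetical, the Medicare Levy and the average offset are both assumed to apply to taxable income after deductions, and the seniors adjustment and the nil-tax rule for low income pension and benefit recipients are left out.

```python
# Flat rates quoted in the key points above.
MEDICARE_LEVY_RATE = 0.015    # Medicare Levy, flat 1.5%
AVERAGE_OFFSET_RATE = 0.02    # average national offset, 2% of taxable income
PRIVATE_PENSION_RATE = 0.05   # private pensions and Workers' Compensation


def total_income_tax(income_tax, business_tax, taxable_income,
                     private_pensions=0.0, workers_compensation=0.0):
    """Sum the tax components and subtract offsets (hypothetical helper).

    income_tax and business_tax are assumed to have been calculated already.
    Assumption: the levy and the offset are both taken on taxable income.
    """
    medicare_levy = MEDICARE_LEVY_RATE * taxable_income
    pension_tax = PRIVATE_PENSION_RATE * (private_pensions + workers_compensation)
    offsets = AVERAGE_OFFSET_RATE * taxable_income
    return income_tax + business_tax + medicare_levy + pension_tax - offsets
```

For example, under these assumptions a person with $20,000 of taxable income, $2,380 of income tax and no business or private pension income would have $300 of levy and $400 of offsets applied.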
Table 28: Australian Resident Income Tax Rates, Waves 1-6
| Wave | Income | Tax Rate |
| 1, 2, 3 (Financial Years 2000–01, 2001–02, 2002–03) | $0 – $6000 | Nil |
| | $6001 – $20000 | Nil plus 17c for each $ over $6000 |
| | $20001 – $50000 | $2380 plus 30c for each $ over $20000 |
| | $50001 – $60000 | $11380 plus 42c for each $ over $50000 |
| | $60001 and over | $15580 plus 47c for each $ over $60000 |
| 4 (Financial Year 2003-04) | $0 – $6000 | Nil |
| | $6001 – $21600 | Nil plus 17c for each $ over $6000 |
| | $21601 – $52000 | $2652 plus 30c for each $ over $21600 |
| | $52001 – $62500 | $11772 plus 42c for each $ over $52000 |
| | $62501 and over | $16182 plus 47c for each $ over $62500 |
| 5 (Financial Year 2004-05) | $0 – $6000 | Nil |
| | $6001 – $21600 | Nil plus 17c for each $ over $6000 |
| | $21601 – $58000 | $2652 plus 30c for each $ over $21600 |
| | $58001 – $70000 | $13572 plus 42c for each $ over $58000 |
| | $70001 and over | $18612 plus 47c for each $ over $70000 |
| 6 (Financial Year 2005-06) | $0 – $6000 | Nil |
| | $6001 – $21600 | Nil plus 15c for each $ over $6000 |
| | $21601 – $63000 | $2340 plus 30c for each $ over $21600 |
| | $63001 – $95000 | $14760 plus 42c for each $ over $63000 |
| | $95001 and over | $28200 plus 47c for each $ over $95000 |
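The schedule in Table 28 can be applied mechanically, as in the sketch below. The thresholds and rates are transcribed from the table; the names `TAX_SCALES` and `resident_income_tax` are hypothetical, and the sketch covers only the standard marginal rates, not the low income offset, the retiree rates or any other part of the tax model. Rates are held as whole cents per dollar so that whole-dollar incomes reproduce the cumulative amounts in the table exactly.

```python
# (lower threshold in $, marginal rate in cents per $) for each wave, from Table 28.
_WAVES_1_TO_3 = [(6000, 17), (20000, 30), (50000, 42), (60000, 47)]
TAX_SCALES = {
    1: _WAVES_1_TO_3,
    2: _WAVES_1_TO_3,
    3: _WAVES_1_TO_3,
    4: [(6000, 17), (21600, 30), (52000, 42), (62500, 47)],
    5: [(6000, 17), (21600, 30), (58000, 42), (70000, 47)],
    6: [(6000, 15), (21600, 30), (63000, 42), (95000, 47)],
}


def resident_income_tax(taxable_income, wave):
    """Tax in dollars on taxable_income under the Table 28 schedule for a wave."""
    scale = TAX_SCALES[wave]
    tax_cents = 0
    for k, (lower, rate_cents) in enumerate(scale):
        # Each bracket ends where the next one starts; the top bracket is open-ended.
        upper = scale[k + 1][0] if k + 1 < len(scale) else float("inf")
        if taxable_income > lower:
            tax_cents += (min(taxable_income, upper) - lower) * rate_cents
    return tax_cents / 100
```

Evaluating the function at a bracket boundary, for instance `resident_income_tax(50000, 1)`, gives the cumulative amount printed in the next row of the table ($11380).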
Figure 19: Financial Year Income Model: Household

Figure 20: Financial Year Income Model: Enumerated Person

Figure 21: Financial Year Income Model: Responding Person 

A list of additional derived income variables is provided in Table 29 (those that are directly related to the income imputation are provided later in Table 31). There are several issues to take note of in this table:
- Wages and salaries were asked of respondents for their main job, then for all their other jobs combined. The suffixes ‘g’ and ‘e’ refer to gross and estimated gross incomes: where the respondent did not know their gross income, their after tax income was asked for and this was translated back into an estimated gross income. The ‘e’ variables will have fewer cases with missing wages and salaries than the ‘g’ variables, as the ‘e’ variables include all the known ‘g’ values.
- The variable labels indicate when top-coding has occurred. The actual value replacing the top–coded value will be the weighted mean of the top–coded units (see section on Confidentialisation).
- Child support is calculated from the questions asked about the children in the family formation grid, rather than from the single category listed in the ‘other income’ question in the income section. This is because it is more likely the respondent would provide a more accurate response to the detailed questions rather than the broad ‘catch all’ question.
- The components feeding into the ‘windfall’ income are those thought irregular (such as inheritances, redundancies, payments from parents).
- In wave 1, respondents were asked how different their current wage and salary income was from one year ago. This has been provided in dollar terms in awsly.
The imputation method and derived variables are discussed in the following sections.
Table 29: Other derived income variables
| Variable | Description |
| Current wages and salaries and current benefits | |
| _WSCG | DV: All jobs, current gross wages per week ($). Weighted topcode.* |
| _WSCMG | DV: Main job, current weekly gross wages & salary ($). Weighted topcode.* |
| _WSCOG | DV: Other jobs, current weekly gross wages & salary ($). Weighted topcode.* |
| Financial year income – Unimputed variables | |
| AWSLY | DV: Gross weekly current wages & salary (from all jobs) one year ago ($) |
| _WSFG | DV: Financial year gross wages & salary ($). Weighted topcode.* |
| _OIINT | DV: Financial year interest including nil ($) |
| _OIRNTN | DV: Financial year rental income including nil ($) Negative value |
| _OIRNTP | DV: Financial year rental income including nil ($) Positive value |
| _OIDIV | DV: Financial year dividends including nil ($) |
| _OIROY | DV: Financial year royalties including nil ($) |
| _OIDVRY | DV: Financial year dividends plus royalties including nil ($) |
| _TIFMKTP | DV: Financial year market (factor) income ($) Positive values. Weighted topcode.* |
| _TIFMKTN | DV: Financial year market (factor) income ($) Negative values |
| _TIFPRIP | DV: Financial year private income ($). Positive values. Weighted topcode.* |
| _TIFPRIN | DV: Financial year private income ($). Negative values |
| Financial year income – Estimated CCB, FTB A, FTB B, income tax and medicare levy | |
| _HIFCCB | DV: Household Child Care Benefit ($) financial year |
| _BNCCBF1 | DV: Family number 1 Child Care Benefit ($) financial year |
| _BNCCBF2 | DV: Family number 2 Child Care Benefit ($) financial year |
| _BNCCBF3 | DV: Family number 3 Child Care Benefit ($) financial year |
| _BNFTAF1 | DV: Family number 1 Family Tax Benefit A ($) financial year |
| _BNFTAF2 | DV: Family number 2 Family Tax Benefit A ($) financial year |
| _BNFTAF3 | DV: Family number 3 Family Tax Benefit A ($) financial year |
| _BNFTBF1 | DV: Family number 1 Family Tax Benefit B ($) financial year |
| _BNFTBF2 | DV: Family number 2 Family Tax Benefit B ($) financial year |
| _BNFTBF3 | DV: Family number 3 Family Tax Benefit B ($) financial year |
| _BNMATF1 | DV: Family number 1 Maternity Allowance ($) financial year |
| _BNMATF2 | DV: Family number 2 Maternity Allowance ($) financial year |
| _BNMATF3 | DV: Family number 3 Maternity Allowance ($) financial year |
| _TXINC | DV: Financial year taxes – income tax - estimate ($). Weighted topcode.* |
| _TXMED | DV: Financial year taxes - medicare - estimate ($). Weighted topcode.* |
Imputation Method
Since Release 3, the primary method for imputing income has been based on a method developed by Little and Su (1989). This longitudinal imputation method incorporates trend and individual level information into the imputed amounts by using a multiplicative model based on row (person) and column (wave) effects. The model is of the form:
imputation = (row effect) x (column effect) x (residual).
Ideally, the record with missing information (called the recipient) should be imputed using information from a record with complete information (called the donor) that has similar characteristics for the variable of interest. The Little and Su methodology was improved by extending it to take into account additional characteristics of the donors and recipients. Donors and recipients are matched within imputation classes which have similar characteristics. The imputation classes used were age groups defined by the following ranges: 15-19, 20-24, 25-34, 35-44, 45-54, 55-64 and 65+ [14]. The formulae for the Little and Su method are provided in Appendix 3, together with a worked example.
For some cases, such as new entrants interviewed in the latest wave who did not respond to some income questions, the imputation method used was the nearest neighbour regression method adopted in Release 2 (Watson, 2004a).
For respondents with item non–response (that is, where some questions during their interview were not answered), the income components have been imputed and the totals are the sum of the relevant components. These components and totals are available on the responding person file.
The income components for non–respondents within responding households have also been imputed. Prior to using the Little and Su method for non–respondents, income components were determined to be zero or non–zero using a population carryover method [15]. The Little and Su method was then used to determine the non–zero amounts. However, for some cases, the Little and Su method could not be used (such as a non-responding new entrant in the latest wave). For these cases, the income totals were imputed first using the nearest neighbour regression method and then the income components were taken from the same donor. The components and totals for non-respondents are available on the enumerated person file (along with the components and totals for responding persons).
Imputed components and totals are also available at the household level on the household file.
Table 30 shows the proportion of missing cases that were imputed by each imputation method [16]. The proportions are summarised across all income variables that have been imputed. Ideally, all records would be imputed by the Little and Su method. Although this is mostly the case for responding persons, sufficient information is not always available. The nearest neighbour regression method is a fall–back method used when the Little and Su method cannot be applied; it is used more among enumerated persons because this group includes non–respondents within responding households. Non–responding cases are the only group that undergoes the additional population carryover imputation step.
Less information is available for non-responding persons within responding households when compared to responding persons and as a result the quality of the imputation is slightly poorer. However, the income components are still provided to enable these components to be available at the household level.
Improvements to the income imputation methodology are ongoing [17], and further revisions are expected.
Table 30: Proportion of missing cases imputed by imputation method (%)
| Imputation Method | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 | Wave 6 |
| Responding Persons | | | | | | |
| Nearest Neighbour | 13.7 | 4.3 | 5.4 | 4.6 | 4.3 | 7.2 |
| Little & Su | 86.3 | 95.7 | 94.6 | 95.4 | 95.7 | 92.8 |
| Enumerated Persons | | | | | | |
| Nearest Neighbour | 54.7 | 37.4 | 40.3 | 39.6 | 41.2 | 49.1 |
| Little & Su | 32.9 | 38.8 | 39.5 | 36.2 | 40.5 | 39.2 |
| Carryover | 12.4 | 23.8 | 20.2 | 24.2 | 18.4 | 11.7 |
Imputed Income Variables
All income imputation was undertaken at the derived variable level, leaving the original data unchanged. In the main, both the pre–imputed and post–imputed variables are available in the datasets, along with an imputation flag, so that it is easy to choose between using the pre–imputed data or the post–imputed data.
An overview of the pre– and post–imputed income variables is provided in Table 31. We have deviated from the general style of presenting the derived variables in this manual in the hope that it is clearer from the following table how the post–imputed variables and flags relate to the pre–imputed variables.
Table 31: Person imputed income variables
| | Pre-imputed | Post-imputed | Flag |
| Responding person file | | | |
| Current income | | | |
| Wages and salaries – all jobs | _wsce | _wscei | _wscef |
| Wages and salaries – main job | _wscme | _wscmei | _wscmef |
| Wages and salaries – other jobs | _wscoe | _wscoei | _wscoef |
| Benefits | _bncaup | _bncaupi | _bncaupf |
| Financial year income | | | |
| Wages and salaries | _wsfe | _wsfei | _wsfef |
| Australian govt pensions | _bnfaup | _bnfaupi | _bnfaupf |
| Foreign govt pensions | _bnffp | _bnffpi | _bnffpf |
| Business income | _bifn, _bifp | _bifin, _bifip | _biff |
| Investments | _oifinvn, _oifinvp | _oifinin, _oifinip | _oifinf |
| Private pensions | _oifpp | _oifppi | _oifppf |
| Private transfers | _oifpt | _oifpti | _oifptf |
| Total FY income [A] | Not provided | _tifefn, _tifefp | _tifeff |
| Windfall income | _oifwfl | _oifwfli | _oifwflf |
| Enumerated person file | | | |
| Current income | | | |
| Wages and salaries – all jobs | Not provided | _wscei | _wscef |
| Wages and salaries – main job | Not provided | _wscmei | _wscmef |
| Wages and salaries – other jobs | Not provided | _wscoei | _wscoef |
| Benefits | Not provided | _bncaupi | _bncaupf |
| Financial year income | | | |
| Wages and salaries | Not provided | _wsfei | _wsfef |
| Australian govt pensions | Not provided | _bnfaupi | _bnfaupf |
| Foreign govt pensions | Not provided | _bnffpi | _bnffpf |
| Business income | Not provided | _bifin, _bifip | _biff |
| Investments | Not provided | _oifinin, _oifinip | _oifinf |
| Private pensions | Not provided | _oifppi | _oifppf |
| Private transfers | Not provided | _oifpti | _oifptf |
| Total FY income [A] | Not provided | _tifefn, _tifefp | _tifeff |
| Windfall income | Not provided | _oifwfli | _oifwflf |
| Household file | | | |
| Current income | | | |
| Wages and salaries – all jobs | Not provided | _hiwscei | _hifwscef |
| Wages and salaries – main job | Not provided | _hiwscmi | _hifwscmf |
| Wages and salaries – other jobs | Not provided | _hiwscoi | _hifwscof |
| Benefits | Not provided | _hicaupi | _hicaupf |
| Financial year income | | | |
| Wages and salaries | Not provided | _hiwsfei | _hifwsfef |
| Australian govt pensions | Not provided | _hifaupi | _hifaupf |
| Foreign govt pensions | Not provided | _hiffpi | _hiffpf |
| Business income | Not provided | _hibifin, _hibifip | _hifbiff |
| Investments | Not provided | _hifinin, _hifinip | _hifinf |
| Private pensions | Not provided | _hifppi | _hifppf |
| Private transfers | Not provided | _hifpti | _hifptf |
| Total FY income | Not provided | _hifefn, _hifefp | _hifeff |
| Windfall income | Not provided | _hifwfli | _hifwflf |
[A] The following variables use total person financial year income (_tifefn, _tifefp) in their calculations: income tax (_txinc), and medicare (_txmed). Use _tifeff as imputation flag for these variables.
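As an illustration of how the post–imputed variables and flags in Table 31 might be used together, the sketch below compares a mean with and without imputed cases. It rests on assumptions that are not stated in this section and should be checked against the data dictionary: that the leading underscore stands for a wave letter (as in awsly for wave 1), and that a flag value of 0 marks a value that was not imputed. The function names are hypothetical, and the simple stem + `i` / stem + `f` pattern fits rows such as wages and salaries but not business income or investments, whose names follow a different pattern.

```python
def variable_names(stem, wave_letter="a"):
    """Build pre-imputed, post-imputed and flag names for a simple Table 31 row."""
    return {
        "pre": wave_letter + stem,
        "post": wave_letter + stem + "i",
        "flag": wave_letter + stem + "f",
    }


def mean_income(records, stem, wave_letter="a", include_imputed=True):
    """Mean of the post-imputed variable, optionally dropping imputed cases.

    records is a list of dicts keyed by variable name.
    Assumption: flag == 0 means the value was reported rather than imputed.
    """
    names = variable_names(stem, wave_letter)
    values = [rec[names["post"]] for rec in records
              if include_imputed or rec[names["flag"]] == 0]
    return sum(values) / len(values) if values else None
```

Calling `mean_income(records, "wsfe", include_imputed=False)` would then restrict the estimate to cases whose financial year wages and salaries were reported.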
Endnotes:
[12] The Maternity Allowance is allocated to all families with newborn children and included in the Australian pensions and benefits.
[13] This change has been in place since Release 4.0. In earlier Releases, the Child Care Benefit was included in the total financial year income and the Maternity Allowance was only recorded if the respondent reported it.
[14] Age groups were used to create the imputation classes because it is a simple characteristic and it is known for almost all donors and recipients. For a few cases, age was missing and was therefore imputed from a person with a similar relationship structure to the missing case. Not all income variables were imputed using imputation classes. The variables where donors and recipients were matched with imputation classes were current wages and salaries, current benefits, financial year wages and salaries, Australian Government pensions and rent income.
[15] Zeros are carried forward or backward from the surrounding waves with the same probability as that observed in the complete cases.
[16] For the proportion of cases which are missing, see Table 39.
[17] A detailed review of the performance of the imputation methods is reported in Starick and Watson (2007).