Data Aggregation and Re-aggregation
Overview
Information is increasingly available at very detailed geographic levels, often for individual households. Such information
is extremely useful for analysis, but can breach confidentiality requirements if published at such a granular level. Our
cost-effective data aggregation service helps organisations to publish their information to
the geographic boundaries that they require, and has proved particularly useful to clients when combined with our
geocoding services.
In addition, specific local areas or neighbourhoods often require detailed monitoring over time, particularly for regeneration.
The boundaries of these areas frequently do not match any existing geographic boundaries, so information held at one
geographical level has to be converted to another. Our statistical re-aggregation service helps organisations to achieve this,
and the algorithms it uses also underlie the custom geography creation and analysis in our GEDI software.
Data Aggregation
Our service can aggregate information held at any geography, at a related geography or as point-based records to the geographic
boundaries that you require. This is particularly relevant if you need to publish information that adheres to specific data
confidentiality limits.
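As an illustration of aggregating point-based records while respecting a confidentiality limit, the following Python sketch assigns geocoded points to target polygons with a simple ray-casting test and withholds any area count below a threshold. The function names, the polygon representation and the threshold of 5 are hypothetical choices made for this example, not details of the Manchester Geomatics service; real work would normally use a GIS library and the publisher's own disclosure rules.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: inside if a ray to the right crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points(
    points: List[Point],
    areas: Dict[str, Polygon],
    min_count: int = 5,  # hypothetical confidentiality threshold
) -> Dict[str, Optional[int]]:
    """Count geocoded records per target area, withholding counts below min_count."""
    counts: Counter = Counter()
    for pt in points:
        for code, poly in areas.items():
            if point_in_polygon(pt, poly):
                counts[code] += 1
                break  # assign each record to at most one area
    # None marks a suppressed cell; counts below the threshold (including zero) are withheld.
    return {code: (counts[code] if counts[code] >= min_count else None) for code in areas}


# Two hypothetical adjacent square areas and eight geocoded records.
areas = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (15, 5), (16, 5)]
print(aggregate_points(points, areas))  # {'A': 6, 'B': None}
```

Here area B holds only two records, so its count is suppressed rather than published.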
Much data available on the web is located using its postcode alone. This can introduce inaccuracies into its subsequent
aggregation, because the information (e.g. household statistics) is not placed at its true location and so may be counted
in a neighbouring area. This is a particular issue when presenting information at detailed
geographic levels, such as Output Areas. The Manchester Geomatics
geocoding service can be used to generate a much more accurate location for
your addresses, population or household information.
Statistical Re-aggregation
Statistical data in information systems can be captured at various geographical scales, ranging from the
very detailed, such as Output Areas (containing on average about 250 people), to larger areas such as
wards (containing on average about 30,000 people) or local authority areas.
Our algorithms can re-aggregate statistical data from smaller areas into larger areas
that are either specifically user defined (custom geographies) or are other areas of interest, such as
active areas of housing market renewal. Often the boundaries of such areas do not coincide with those of other geographies,
so the information held at the source geography needs to be re-aggregated to fit the desired geography.
The algorithm that accomplishes this is encapsulated in our
GEDI report wizard tool.
The re-aggregation process uses a combination of population estimates from the 2001 census and
a detailed address base to estimate the distribution of population within each of the statistical areas
that overlap with the user's chosen geography or geographies. For each overlapping statistical area, the ratio of the estimated
population in the overlapping portion to the total population of that small geographical unit is used to apportion its statistic,
and the apportioned values are combined to re-calculate the statistic for the target area.
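The population-weighted apportionment described above can be sketched in Python as follows. This is a minimal sketch assuming count-type statistics, and assuming that each area's census population is spread evenly across its residential addresses (one simple way an address base might be used; the weighting inside GEDI may differ). The class, function names and figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SourceArea:
    code: str
    statistic: float           # count published for the whole source area (e.g. households)
    population: float          # total population estimate for the source area
    overlap_population: float  # estimated population in the part that overlaps the target area


def overlap_population_from_addresses(population: float, addresses_total: int,
                                      addresses_in_overlap: int) -> float:
    """Hypothetical helper: assume residents are spread evenly across the area's addresses."""
    if addresses_total == 0:
        return 0.0
    return population * addresses_in_overlap / addresses_total


def reaggregate(sources: Iterable[SourceArea]) -> float:
    """Sum each source statistic scaled by the share of its population inside the target."""
    total = 0.0
    for s in sources:
        if s.population <= 0:
            continue  # no population to apportion by
        share = s.overlap_population / s.population
        total += s.statistic * share
    return total


# Hypothetical Output Areas overlapping one custom target area.
sources = [
    SourceArea("OA1", statistic=120, population=300,
               overlap_population=overlap_population_from_addresses(300, 120, 60)),  # half the addresses
    SourceArea("OA2", statistic=80, population=250, overlap_population=250),  # wholly inside
    SourceArea("OA3", statistic=100, population=200, overlap_population=0),   # no addresses in overlap
]
print(reaggregate(sources))  # 140.0
```

This form suits counts such as households or people. A rate or percentage should not be apportioned directly; instead, its numerator and denominator can be apportioned separately and the ratio recomputed for the target area.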
Test Results
Comparison of the results of this technique with known values shows
a very close match, especially when data for small geographies
(such as Output Areas) is used for the re-aggregation. Larger
geographical units such as Middle Super Output Areas (MSOAs) tend to
have lower accuracy. Typical margins of error generated by the
re-aggregation when using this tool are summarised in the following table:
| Percentage difference (%) | Ward | Super Output Area | Output Area |
|---|---|---|---|
| Mean | 15.5 | 3.7 | 1.1 |
| Standard deviation | 15.3 | 3.4 | 1.4 |
In general, the smaller the geography for which the statistic is available, the more accurate the re-aggregation. For example,
Output Areas proved to be very accurate, with 95% of statistical results within 4% of their true value.
Super Output Areas are still reasonably accurate, with 95% of results within 10.5% of their true value. By comparison,
ward-level information was significantly less accurate when re-aggregated, with 95% of results only within 45% of their true values.
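One plausible way to produce summary figures like those in the table and the 95% statements is sketched below in Python. The page does not state exactly how the percentage differences were defined, so this sketch assumes absolute differences relative to the known value and a nearest-rank 95th percentile; the function names and data are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def percentage_differences(estimates: Sequence[float], truths: Sequence[float]) -> List[float]:
    """Absolute difference between each estimate and its known value, as a percentage of the known value."""
    return [abs(e - t) * 100 / t for e, t in zip(estimates, truths)]


def summarise(diffs: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation and the level within which 95% of results fall."""
    ordered = sorted(diffs)
    # Nearest-rank 95th percentile: smallest value with at least 95% of results at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),  # needs at least two results
        "p95": ordered[rank - 1],
    }


# Hypothetical re-aggregated estimates compared with known values.
diffs = percentage_differences([101, 98, 105, 100], [100, 100, 100, 100])
print(summarise(diffs))
```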
Summary
This tool will only allow the re-aggregation of units equal to or smaller in size than Lower Super Output Areas, as the re-aggregation of
such smaller areas has been shown to be reasonably accurate, with a margin of error of between 4% and 11%.
Nevertheless, even the figures produced in the report wizard by re-aggregating Output Areas and Super Output Areas should not
be treated as definitive values, but as an indication of the likely value of a statistic.
Finally, the accuracy will be affected by the type of variable under consideration. Total population counts, total households and total
dwellings will all be reasonably accurate because they are closely related to the underlying distribution of population. However,
variables such as ethnicity, age and household composition will be less accurate, because they are not necessarily
(and often are definitely not) evenly distributed.
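A minimal worked example with hypothetical figures shows why population-weighted re-aggregation suits totals better than unevenly distributed variables. The `apportion` function is an illustrative name for scaling a source-area statistic by the share of its population that falls in the target area.

```python
def apportion(statistic: float, population: float, overlap_population: float) -> float:
    """Population-weighted share of a source-area statistic falling in the target area."""
    return statistic * overlap_population / population


# Hypothetical source area: 200 residents, 100 of whom live in the part overlapping the target.
population, overlap_population = 200, 100

# Total population is distributed exactly like the population, so the estimate matches the truth.
total_estimate = apportion(200, population, overlap_population)  # 100.0 (true value 100)

# A subgroup of 40 residents who, in this example, all live outside the overlap.
subgroup_estimate = apportion(40, population, overlap_population)  # 20.0 (true value 0)
print(total_estimate, subgroup_estimate)
```

The method can only assume that the subgroup is spread like the population as a whole, so any concentration within the source area translates directly into error.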
Contact Us
Please contact Ed Scrase for more information.