Data Aggregation and Re-aggregation

Overview

Information is increasingly available at very detailed geographic levels, often for individual households. Such information is extremely useful for analysis, but it can breach confidentiality requirements if published at such a granular level. Our cost-effective data aggregation service helps organisations to publish their information to the geographic boundaries that they require, and it has proved particularly useful to clients when combined with our geocoding services.


In addition, specific local areas or neighbourhoods often require detailed monitoring over time, particularly for regeneration. The boundaries of these areas frequently do not match any existing geographic boundaries, so information held at one geographical level has to be converted to another. Our statistical re-aggregation service helps organisations to achieve this, and the algorithms it uses also underlie the custom geography creation and analysis in our GEDI (Geographic Enquiry and Display Interface) software.



Data Aggregation

Our service can aggregate any geography-related or point-based information to the geographic boundaries that you require. This is particularly relevant if you need to publish information that adheres to specific data confidentiality limits.


Much of the data available on the web is located using its postcode alone. This can introduce inaccuracies into any subsequent aggregation, because the information (e.g. household statistics) is not placed at its true location and may therefore be counted in the results of a neighbouring area. This is a particular issue when presenting information at detailed geographic levels, such as Output Areas. The Manchester Geomatics geocoding service can be used to generate a much more accurate location for your address, population or household information.
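The effect described above can be illustrated with a minimal Python sketch. It is not the Manchester Geomatics implementation: the area shapes, the point coordinates, the function names and the suppression threshold of 5 are all hypothetical, chosen only to show how locating households at a postcode centroid can move them into a neighbouring area and push a count below a confidentiality limit.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test. polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points(points, areas, min_count=5):
    """Count points per area; suppress non-zero counts below min_count.

    min_count is a hypothetical confidentiality limit. Suppressed
    counts are returned as None.
    """
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # each point is assigned to one area only
    return {
        name: (None if 0 < c < min_count else c)
        for name, c in counts.items()
    }


# Two hypothetical neighbouring areas sharing the boundary x = 10.
areas = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}

# Households at their true (address-level) locations.
true_points = [(1, 1), (2, 2), (3, 3), (4, 4), (9.5, 5), (9.8, 5),
               (11, 1), (12, 2), (13, 3), (14, 4), (15, 5)]

# The same households, but the two near the boundary are placed at a
# postcode centroid that happens to fall just inside area B.
postcode_points = [(1, 1), (2, 2), (3, 3), (4, 4), (10.5, 5), (10.5, 5),
                   (11, 1), (12, 2), (13, 3), (14, 4), (15, 5)]

by_address = aggregate_points(true_points, areas)
by_postcode = aggregate_points(postcode_points, areas)
```

In this made-up case the address-level locations give 6 households in area A and 5 in area B, while the postcode-based locations give 7 in area B and leave area A with 4, which falls below the assumed limit and is suppressed.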



Statistical Re-aggregation

Statistical data in information systems can be captured at various geographical scales, ranging from the very detailed, such as Output Areas (containing on average about 250 people), to larger areas such as wards (containing on average about 30,000 people) or local authority boundaries.


Our algorithms can re-aggregate statistical data from smaller areas into larger areas that are either specifically user defined (custom geographies) or are other areas of interest, such as active areas of housing market renewal. Often, the boundaries of such areas do not relate to those of other geographies, so the information held for the source geography needs to be re-aggregated to fit the desired geography. The algorithm that accomplishes this is encapsulated in our GEDI report wizard tool.



The re-aggregation process uses a combination of population estimates from the 2001 census and a detailed address base to estimate the distribution of population within each of the statistical areas that overlap with the user's chosen geography or geographies. For each relevant statistical area, the ratio of the estimated population in the overlapping portion to the total population of that small-scale geographical unit is used to re-calculate the statistic for the chosen area.
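The ratio-based calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GEDI algorithm itself: the data layout, the function name `reaggregate`, and the simplification that population is spread evenly across the address points of each statistical area are all hypothetical.

```python
def reaggregate(source_areas, in_custom_area):
    """Estimate a statistic for a custom area from overlapping source areas.

    source_areas: list of dicts with keys
        'population' - census population estimate for the source area
        'statistic'  - the value to be apportioned (a count)
        'addresses'  - list of (x, y) address points in the source area
    in_custom_area: function (x, y) -> bool for the user's chosen geography
    """
    estimate = 0.0
    for area in source_areas:
        addresses = area["addresses"]
        population = area["population"]
        if not addresses or population == 0:
            continue  # nothing to apportion from this area
        # Assumption: each address carries an equal share of the population.
        inside = sum(1 for x, y in addresses if in_custom_area(x, y))
        population_in_overlap = population * inside / len(addresses)
        # Ratio of overlapping population to total population of the area.
        ratio = population_in_overlap / population
        estimate += area["statistic"] * ratio
    return estimate


# Hypothetical custom geography: a 10 x 10 square.
def in_square(x, y):
    return 0 <= x <= 10 and 0 <= y <= 10


# Two hypothetical output areas that partly overlap the square.
output_areas = [
    {"population": 250, "statistic": 40,
     "addresses": [(2, 2), (5, 5), (8, 8), (12, 5)]},           # 3 of 4 inside
    {"population": 300, "statistic": 60,
     "addresses": [(9, 1), (11, 1), (13, 2), (15, 3), (17, 4)]},  # 1 of 5 inside
]

custom_estimate = reaggregate(output_areas, in_square)
```

With these made-up figures the first area contributes three quarters of its statistic and the second one fifth, giving an estimate of about 42 for the custom geography.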


Test Results

Comparison of the results of this technique with known values shows a very close match, especially when data for small-scale geographies (such as Output Areas) is used for the aggregation. Larger-scale geographical units, such as Middle Super Output Areas (MSOAs), tend to have lower accuracy. Typical margins of error generated by the re-aggregation when using this tool are summarised in the following table:

Percentage Difference    Ward    Super Output Area    Output Area
Mean                     15.5    3.7                  1.1
Standard Deviation       15.3    3.4                  1.4

In general, the smaller the geography for which the statistic is available, the more accurate the re-aggregation. For example, Output Areas proved to be very accurate, with 95% of statistical results within 4% of their true value. Super Output Areas are still reasonably accurate, with 95% of results within 10.5% of their true value. By comparison, ward-level information was significantly less accurate when re-aggregated, with 95% of results only within 45% of their true value.
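The source does not state how the 95% figures were derived. As a rough cross-check only, the short Python sketch below computes the mean plus two standard deviations from the table values, on the assumption (ours, not the authors') that this is approximately how such a bound might be obtained. The variable and function names are hypothetical.

```python
# Mean and standard deviation of the percentage difference, from the table.
table = {
    "Ward": (15.5, 15.3),
    "Super Output Area": (3.7, 3.4),
    "Output Area": (1.1, 1.4),
}


def upper_bound(mean, sd, k=2.0):
    """Mean plus k standard deviations (k = 2 is an assumption)."""
    return mean + k * sd


bounds = {name: round(upper_bound(mean, sd), 1)
          for name, (mean, sd) in table.items()}

for name, bound in bounds.items():
    print(f"{name}: about {bound}%")
```

Under that assumption the bounds come out at roughly 3.9% for Output Areas, 10.5% for Super Output Areas and 46.1% for wards, which is close to the 4%, 10.5% and 45% quoted in the text.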


Summary

This tool will only allow the re-aggregation of units equal to or smaller in size than Lower Super Output Areas, because the re-aggregation of such small areas has been shown to be reasonably accurate, with a margin of error between 4% and 11%.


Nevertheless, even the figures produced by the re-aggregation of Output Areas and Super Output Areas in the report wizard should not be treated as definitive values, but as an indication of the likely value of a statistic.


Finally, the accuracy will be affected by the type of variable under consideration. Total population counts, total households and total dwellings will all be reasonably accurate, because they are closely related to the underlying distribution of population. However, variables such as ethnicity, age and household composition will be less accurate, because they are not necessarily (and often definitely not) distributed evenly across the population.
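A small worked example in Python shows why this happens. All of the figures are invented for illustration, and the function name is hypothetical: a single output area has half of its residents inside the custom geography, but most of its older residents live in that half.

```python
def weighted_estimate(area_total, pop_inside, pop_total):
    """Apportion an area total by the share of population inside."""
    return area_total * pop_inside / pop_total


# Hypothetical output area: 200 residents, 100 of them inside the custom area.
pop_total, pop_inside = 200, 100

# Total population follows the population distribution by definition.
estimated_population = weighted_estimate(200, pop_inside, pop_total)
true_population = 100

# Hypothetical age variable: 40 residents aged 65+, of whom 35 actually
# live in the half of the area that falls inside the custom geography.
estimated_over_65 = weighted_estimate(40, pop_inside, pop_total)
true_over_65 = 35

population_error = abs(estimated_population - true_population)
over_65_error = abs(estimated_over_65 - true_over_65)
```

Here the population estimate matches the true figure, while the age-based estimate is 20 against a true value of 35, because the population weighting cannot see how the older residents are clustered within the area.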



Contact Us

Please contact Ed Scrase for more information.



© manchester.geomatics 2000