# Overview

Extract, clean and analyse UK Gov Country and Regional Analyses (CRA).

This is being distributed as a python package. To get going:

# make sure you are in this directory
cd ....
# set up a virtualenv for any dependencies
virtualenv pyenv
# install using pip
pip -E pyenv install -e .
# now try it out
python data.py -h


NB: the database stuff (db.py) requires SQLAlchemy.

## Data Details

### cache/cra_2009_db.csv

We do not store the main CRA file in the repo but retrieve it on demand (it is over 5Mb).

### cofog_map.json

This is a mapping from the "function" and "subfunction" values in the published CRA spreadsheet to the most appropriate COFOG code. In theory CRA should be entirely classified by COFOG, but in practice this data is corrupted in various ways. This mapping partially corrects for the corruption.

Originally constructed by William Waites based on the official COFOG list and the CRA data. Cross-checked with a similar mapping constructed by Dave Boyce. Further changes and fixes based on HMT's recommendations here:

http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-RA-07-022/EN/KS-RA-07-022-EN.PDF

#### More Gory Details

[This section is largely trnscribed from e-mail: http://lists.okfn.org/pipermail/wdmmg-discuss/2010-April/000165.html ]

You will recall that the "function" and "subfunction" columns of the published CRA data contain broken data, and that we have been using a mapping designed by William Waites to fix them. Dave Boyce has also produced such a mapping, and we thought it would be a good idea to cross-check them.

It was! I have been able to fix several errors and omissions. However, there are still some hard ones that I would like some help with. There is some partially helpful advice from HM Treasury here:

http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-RA-07-022/EN/KS-RA-07-022-EN.PDF

Here are the noteworthy ones:

• Anything with a subfunction of "LA data subfunction" is a problem. I don't know what this means. This occurs at least once for every function. I have mapped these to relatively coarse COFOG codes: 1, 2, 3, 4.1, 4.2, 5, 6, 7, 8, 9 and 10.
• Function "EU entries", subfunction "EC receipts" is left "unclassified" by HMT. I've mapped this to 1.8 (Transfers of a general character between different levels of government).
• Function "EU entries", subfunction "GNI-based contribution (net of abatement and collection costs)" is also left unmapped. I have mapped it to 1.2 (Foreign economic aid).
• [APS updating for 2010] Rows with function "EU entries", subfunction also "EU entries": I've attributed to 1.8 for now, but will ask others to review.
• Many of the sub-sub-functions of COFOG function 1 (General public services) all say "of which: public and common services". This is presumably a cut-and-paste error.
• Function "3. Public order and safety", subfunction "of which: immigration and citizenship" is a tricky one. HMT recommends mapping it to 3.1.2.
• Function "of which: agriculture, fisheries and forestry", sub-function "of which: other agriculture, food and fisheries policy" maps to two COFOG codes: 4.2.1 and 4.2.3. I have mapped it all to 4.2.3, on the feeble grounds that nothing else is mapped there.
• We previous little level-3 data for code 4 (economic affairs) except for 4.5 (transport). Even there, nothing maps onto 4.5.2 (water transport) or 4.5.4 (air transport).
• Function "of which: transport", subfunction "of which: local public transport" is mapped to 4.5.1 (road transport).
• Many of the sub-sub-functions of COFOG function 4 (economic affairs) all say "of which: enterprise and economic development". This is presumably a cut-and-paste error.
• Function "7. Health" and subfunction "Central and other health services" maps to both 7.4 and 7.6. I have mapped it all to 7.4.
• Function "7. Health", subfunction "Medical services" maps to 7.1, 7.2 and 7.3. I have mapped it all to 7.1.
• Code 10 means "Social protection", and inside it are codes 10.7 "Social exclusion" and 10.9 "Social protection". Therefore, where the published data says function is "Social protection" and subfunction is "blah, blah (social exclusion)" I assume I should use 10.7. However, when both function and subfunction say "Social protection" is it valid to use 10.9?

### nuts1_population_2006.csv

Population data for NUTS regions, from 2006. Only NUTS1 region totals are included, because the CRA spending data is only localised down to NUTS1 regions.

Constructed by Alistair Turnbull based on the data here: