Public consultation analysis: Fairer Council Tax

Alma Economics was recently commissioned by the Scottish Government to analyse responses to a public consultation on proposed changes to Council Tax charges. For context, in 2023 Council Tax in Scotland was projected to contribute approximately £2.9 billion towards funding local authority services such as schools, social care, roads, transport and waste services. The tax is paid by the occupiers of domestic properties, and the amount an individual pays depends on several factors, including the valuation band (A to H) to which their property is assigned. However, the system is perceived to be imbalanced. As noted in the 2015 report of the Commission on Local Tax Reform, the effective Council Tax rate (the charge expressed as a percentage of the estimated property value) is currently higher for lower-value properties. To address this, the Scottish Government proposed increasing Council Tax charges by 7.5%, 12.5%, 17.5%, and 22.5% for Bands E to H, respectively.
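
To make the proposed uplifts concrete, the short Python sketch below applies them to an annual charge. The percentages are taken from the proposal; the function name `proposed_charge` and the current charges are hypothetical placeholders rather than actual Council Tax figures, and bands outside E to H are simply left unchanged in this sketch.

```python
# Proposed percentage increases for Bands E to H, as set out in the consultation.
PROPOSED_UPLIFT = {"E": 0.075, "F": 0.125, "G": 0.175, "H": 0.225}


def proposed_charge(band: str, current_charge: float) -> float:
    """Return the annual charge after applying the proposed uplift for the band.

    Hypothetical helper: bands not covered by the proposal (A to D) are
    returned unchanged.
    """
    uplift = PROPOSED_UPLIFT.get(band.upper(), 0.0)
    return round(current_charge * (1 + uplift), 2)


# Hypothetical current annual charges, for illustration only (not real figures).
hypothetical_charges = {"D": 1400.00, "E": 1800.00, "H": 3400.00}

for band, charge in hypothetical_charges.items():
    new_charge = proposed_charge(band, charge)
    print(f"Band {band}: £{charge:,.2f} -> £{new_charge:,.2f}")
```

Under these assumptions, a hypothetical Band E charge of £1,800.00 would rise to £1,935.00, while the Band D charge is unaffected.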

The consultation on the proposed change received over 15,600 responses (submitted online, by email or by post) to six closed-format questions and six open-format questions with free-text fields. The free-text answers varied considerably in length, level of detail and topics covered, with some individual responses exceeding 2,000 words.

To analyse these responses, we combined manual coding (to develop an initial codebook of themes) with automated text analysis, using Natural Language Processing (NLP) models to assign themes to the responses that the research team had not reviewed manually. This approach allowed us to consider every consultation response while (i) drawing on the insights of experienced qualitative researchers and (ii) meeting the consultation timelines. The models we selected are known as Large Language Models (LLMs), which are among the most recent NLP innovations in text classification and generation and can act as highly capable “categorisation machines”. The transformer architecture on which these models are built was introduced in 2017, and LLMs are now widely used across industry and academia for NLP tasks such as multilabel text classification, in which a single response can be assigned several themes at once.
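
In multilabel classification, each response's set of themes is commonly represented as a binary (“multi-hot”) vector with one position per codebook theme, which is the form a classifier is trained to predict. The sketch below illustrates this encoding assuming a small, hypothetical four-theme codebook; the real codebook and the exact encoding the team used are not described here.

```python
from collections.abc import Iterable

# Hypothetical codebook of themes; the real one was developed by the research team.
CODEBOOK = ["affordability", "fairness", "property_valuation", "public_services"]


def to_multi_hot(themes: Iterable[str], codebook: list[str] = CODEBOOK) -> list[int]:
    """Encode a response's themes as a 0/1 vector, one position per codebook theme."""
    assigned = set(themes)
    unknown = assigned - set(codebook)
    if unknown:
        raise ValueError(f"Themes not in codebook: {sorted(unknown)}")
    return [1 if theme in assigned else 0 for theme in codebook]


def from_multi_hot(vector: list[int], codebook: list[str] = CODEBOOK) -> list[str]:
    """Decode a 0/1 vector back into the list of theme names."""
    return [theme for theme, flag in zip(codebook, vector) if flag]


# One response can carry several themes at once (multilabel), or none at all.
print(to_multi_hot(["fairness", "affordability"]))  # [1, 1, 0, 0]
```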

Once our team had manually coded an initial sample of free-text responses, our automated approach consisted of five main steps:

  1. Increase the size of the training dataset by using the large language model Llama to generate 100 synthetic responses for each manually coded consultation response. This strategy allowed our team to increase the accuracy of the NLP models used for automated text analysis, since fine-tuned models generally achieve higher accuracy when trained on larger datasets.

  2. Fine-tune three different open-source models (BERT, GPT-2, and a few-shot learning model) on the augmented training dataset, which included both the manually coded and the synthetically generated responses.

  3. Use the fine-tuned models to estimate, for each response, the probability that it belongs to each theme, then apply a procedure known as maximum cut to select the threshold that determines which themes are assigned to that response.

  4. Combine the themes assigned by the three models using majority voting. This “wisdom of crowds” approach reduced the influence of any single model's biases: the final set of themes for each response was based on combining the themes assigned by all three models.

  5. Manually review a random sample of responses to check that the themes assigned by the models were consistent with those that researchers would have assigned manually.
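
Step 1 can be sketched in Python as follows. The post does not describe how Llama was prompted, so this sketch assumes a generic `generate` callable that returns one synthetic variant of a response; `augment` and `placeholder_generate` are hypothetical names, and the placeholder merely shuffles words so the example runs without a model. The property it shows is that each synthetic response inherits the themes of the manually coded response it was derived from.

```python
import random
from typing import Callable

# A coded response: (free-text answer, set of themes assigned by researchers).
Response = tuple[str, frozenset[str]]


def augment(
    coded: list[Response],
    generate: Callable[[str], str],
    n_per_response: int = 100,
) -> list[Response]:
    """Return the coded responses plus n synthetic variants of each.

    `generate` is a hypothetical stand-in for an LLM call (e.g. a prompt asking
    a model such as Llama to rewrite the response); each variant inherits the
    themes of the response it was generated from.
    """
    augmented = list(coded)
    for text, themes in coded:
        for _ in range(n_per_response):
            augmented.append((generate(text), themes))
    return augmented


# Placeholder generator so the sketch runs without a model: it only shuffles
# word order, which a real paraphrasing model would not do.
_rng = random.Random(0)


def placeholder_generate(text: str) -> str:
    words = text.split()
    _rng.shuffle(words)
    return " ".join(words)


# Hypothetical example response (not taken from the consultation).
seed = [("The increase is unfair on pensioners in larger homes",
         frozenset({"fairness", "affordability"}))]
training_data = augment(seed, placeholder_generate)
print(len(training_data))  # 101: one original plus 100 synthetic variants
```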
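
The thresholding in step 3 can be illustrated with a minimal sketch, assuming that “maximum cut” refers to the MCut rule from the multilabel classification literature: sort a response's theme probabilities, find the largest gap between consecutive values, and place the threshold at the midpoint of that gap. The function names and example probabilities are hypothetical.

```python
def mcut_threshold(probabilities: list[float]) -> float:
    """MCut (assumed rule): midpoint of the largest gap between sorted scores."""
    if len(probabilities) < 2:
        raise ValueError("MCut needs at least two theme probabilities")
    ranked = sorted(probabilities, reverse=True)
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    # Position of the largest drop from one score to the next (first one on ties).
    cut = max(range(len(gaps)), key=lambda i: gaps[i])
    return (ranked[cut] + ranked[cut + 1]) / 2


def assign_themes(probabilities: dict[str, float]) -> set[str]:
    """Assign every theme whose probability lies above the response's MCut threshold."""
    threshold = mcut_threshold(list(probabilities.values()))
    return {theme for theme, p in probabilities.items() if p > threshold}


# Hypothetical model output for one response.
example = {"fairness": 0.91, "affordability": 0.84,
           "property_valuation": 0.22, "public_services": 0.05}
print(sorted(assign_themes(example)))  # ['affordability', 'fairness']
```

With these hypothetical probabilities, the largest gap lies between 0.84 and 0.22, so the threshold is 0.53 and the response is assigned “fairness” and “affordability”. A per-response threshold of this kind adapts to how confident the model is on each answer, instead of relying on one fixed cut-off such as 0.5.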
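
Step 4 can be sketched as per-theme majority voting, assuming that a theme is kept when more than half of the models (two out of three here) assign it to a response. The model labels and themes in the example are hypothetical.

```python
from collections import Counter


def majority_vote(model_outputs: dict[str, set[str]]) -> set[str]:
    """Keep each theme assigned by more than half of the models (2 of 3 here)."""
    votes = Counter(theme for themes in model_outputs.values() for theme in themes)
    quorum = len(model_outputs) // 2 + 1
    return {theme for theme, count in votes.items() if count >= quorum}


# Hypothetical themes assigned to one response by each fine-tuned model.
outputs = {
    "bert": {"fairness", "affordability"},
    "gpt2": {"fairness", "public_services"},
    "few_shot": {"affordability"},
}
print(sorted(majority_vote(outputs)))  # ['affordability', 'fairness']
```

Here “fairness” and “affordability” each receive two votes and are kept, while “public_services” receives only one and is dropped.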
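
The post does not specify how consistency was measured in step 5. One hypothetical way to quantify it is sketched below: draw a reproducible simple random sample of response IDs for review, then compare the model-assigned themes with the researchers' themes using the exact-match rate and the mean Jaccard similarity (intersection over union of the two theme sets). All names and data here are illustrative assumptions.

```python
import random


def draw_review_sample(response_ids: list[str], sample_size: int, seed: int = 0) -> list[str]:
    """Simple random sample without replacement, reproducible via the seed."""
    return random.Random(seed).sample(response_ids, sample_size)


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection over union of two theme sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def agreement(model: dict[str, set[str]], researcher: dict[str, set[str]]) -> tuple[float, float]:
    """Return (exact-match rate, mean Jaccard similarity) over the reviewed responses."""
    ids = list(researcher)
    exact = sum(model[i] == researcher[i] for i in ids) / len(ids)
    mean_jaccard = sum(jaccard(model[i], researcher[i]) for i in ids) / len(ids)
    return exact, mean_jaccard


# Hypothetical data for two reviewed responses.
model_themes = {"r1": {"fairness"}, "r2": {"fairness", "affordability"}}
researcher_themes = {"r1": {"fairness"}, "r2": {"affordability"}}
print(agreement(model_themes, researcher_themes))  # (0.5, 0.75)
```

Reporting the Jaccard score alongside the exact-match rate gives partial credit when the models recover some, but not all, of the researchers' themes.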

To read our full report for the Scottish Government summarising our research findings, click here.