Consultation on proposed changes to the assessment of GCSEs, AS and A levels in 2022: Analysis of responses

While the UK government has committed to exams going ahead in summer 2022, in light of the uncertainty around the continuing impact of the pandemic, it has also proposed guidance on contingency arrangements for students, exam centres and exam boards in case exams cannot take place safely or fairly due to public health restrictions. Alma Economics was commissioned by Ofqual to analyse the responses to a consultation launched in September 2021 to collect public opinions on the proposed guidance.

To carry out thematic analysis of the open-ended questions, we developed a machine-assisted approach, with codes generated by machine learning models in Python used as a quality assurance check on manually applied codes. First, we create an initial set of themes and ideas (a codebook) based on our prior experience and understanding of the consultation context. Working from this initial codebook, we manually review responses, labelling the key themes and ideas identified in each response and adding to the codebook as needed. While many themes occur across multiple responses, we also take note of standout individual comments, especially those that share a personal experience or propose a new policy idea.
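
As a rough illustration, a codebook and the manually applied labels could be represented along the following lines; the code names and example responses below are invented for illustration, not drawn from the actual consultation data.

```python
# A minimal sketch of how a codebook and manually applied labels could
# be stored; code names and responses are illustrative only.
codebook = {
    "keep_exams": "Support for exams going ahead in summer 2022",
    "advance_info": "Views on advance information about exam content",
    "contingency_fairness": "Fairness of contingency grading arrangements",
}

# Each response carries the (possibly multiple) codes a reviewer
# assigned to it; new codes are added to the codebook as they emerge
# during the manual review.
labelled_responses = [
    {"text": "Exams should go ahead, with advance notice of topics.",
     "codes": ["keep_exams", "advance_info"]},
    {"text": "Teacher-assessed grades must be moderated consistently.",
     "codes": ["contingency_fairness"]},
]
```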

As a quality assurance check, we run two classes of machine learning models against the labels manually assigned by our team:

  • Latent Dirichlet Allocation (LDA), an unsupervised algorithm designed to discover “topics” in a collection of documents (in this project, paragraphs or sets of paragraphs from consultation responses). One advantage of this algorithm is that we do not have to label responses manually: it groups responses into topics based on which words frequently appear together. The output is a set of keywords that best represent each topic (see the LDA sketch after this list).

  • Support Vector Machines (SVM), a supervised algorithm widely used for text classification. In practice, responses are first converted into numerical vectors (for example, word-frequency representations), and an SVM then draws the best “boundary” between vectors that belong to a given codebook label and vectors that do not. Implementing SVM for this consultation is challenging because labels are not mutually exclusive: responses can touch on multiple themes and thus meet the criteria for multiple labels (a task known as multilabel classification). Since multilabel classification cannot be natively implemented with SVM, we break the task down so that Scikit-learn generates multiple classifiers, one per label, that together predict labels for responses (using the MultiOutputClassifier object; see the sketch after this list).
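
As a sketch of the LDA step, the snippet below applies Scikit-learn's LatentDirichletAllocation to a handful of invented responses; the sample texts, number of topics and other parameters are illustrative, not the data or tuned settings from the analysis.

```python
# A minimal sketch of topic discovery with LDA; texts are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

responses = [
    "exams should go ahead with adaptations to content",
    "teacher assessed grades were fairer for my child",
    "contingency plans must be announced early for centres",
]

# Convert responses to a bag-of-words matrix, dropping common stopwords.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(responses)

# Fit LDA; n_components (the number of topics) is a tuning parameter.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# For each topic, print the highest-weighted keywords -- the
# representative keywords described above.
terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")
```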

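Below is a minimal sketch of the multilabel SVM setup described above, assuming TF-IDF features and a LinearSVC base classifier wrapped in MultiOutputClassifier; the responses and label names are invented for illustration.

```python
# A minimal sketch of multilabel classification with SVMs in Scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

responses = [
    "exams must go ahead, cancelling again would be unfair",
    "please release advance information about exam content",
    "teacher assessed grades and advance information both helped",
    "cancelling exams harms students applying to university",
]
# Each response can carry several codebook labels at once.
labels = [["keep_exams"], ["advance_info"],
          ["teacher_grades", "advance_info"], ["keep_exams"]]

# Turn label sets into a binary indicator matrix (one column per label).
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(labels)

# Vectorise the text, then fit one SVM per label via MultiOutputClassifier.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(responses)
clf = MultiOutputClassifier(LinearSVC()).fit(X, y)

# Predict codebook labels for a new response.
new = vectorizer.transform(["exams should not be cancelled this year"])
print(binarizer.inverse_transform(clf.predict(new)))
```
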
A number of factors influence the quality of machine learning model outputs: the quality of text processing before running the model, the distribution of key themes across responses, and the choice of algorithm and tuning parameters. Different algorithms therefore need to be compared across metrics, such as perplexity (for unsupervised algorithms) and accuracy scores or confusion matrices (for supervised algorithms). While each project requires a different approach based on the specific research question and type of data involved, a machine-assisted approach ultimately allows us to maximise the efficiency and breadth of our analysis while also applying widely-used, robust qualitative research techniques.
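
As an illustration of these comparisons, the snippet below computes perplexity for a small LDA model and exact-match accuracy plus per-label confusion matrices for multilabel predictions; all arrays are synthetic, not results from the consultation analysis.

```python
# A minimal sketch of the comparison metrics described above,
# on synthetic data rather than the actual consultation responses.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics import accuracy_score, multilabel_confusion_matrix

# Perplexity of an LDA model on (held-out) documents: lower is better.
doc_term = np.array([[2, 1, 0, 0], [0, 0, 3, 1], [1, 0, 2, 0]])
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(doc_term)
print("LDA perplexity:", lda.perplexity(doc_term))

# Manual labels vs model predictions as binary indicator matrices
# (rows = responses, columns = codebook labels).
y_true = np.array([[1, 0], [0, 1], [1, 1]])
y_pred = np.array([[1, 0], [0, 0], [1, 1]])

# Subset accuracy: share of responses with every label predicted correctly.
print("Exact-match accuracy:", accuracy_score(y_true, y_pred))

# One 2x2 confusion matrix per label shows where each classifier errs.
print(multilabel_confusion_matrix(y_true, y_pred))
```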

Find out more about the consultation here.

The report is available here.