Elina Halonen

All that glitters is not gold - 8 ways behaviour change can fail

Failed interventions are more common than most people think, yet the behaviour change movement has largely focused on successes - understandably, because talking about failures might undermine the credibility of our field and profession. Still, failing forward is essential if we want to make progress, so this article takes a look at how behaviour change can go wrong.


Although some attention has been given to nudging failures, a systematic evaluation has been missing. Like many aspects of behavioural science, the incentives typically lie in discovering New Shiny Things that boost researchers' careers, and to a lesser extent in conducting research reviews, which will never form the basis of accessible, entertaining pop science books or a blockbuster TED talk.

Maybe it's my pessimistic Nordic nature, but this kind of sceptical pre-mortem thinking really appeals to me, which is why I was fascinated by this 2020 paper by Magda Osman and her colleagues, titled 'Learning from behavioural changes that fail', which aims to identify the characteristics of failed interventions.


This kind of taxonomy helps put behaviour change on a more robust theoretical foundation, and it's hugely useful for practitioners because it helps us analyse behaviour change interventions and map out the factors that might affect their effectiveness.


Why we need a structured approach

Despite what many people think, "nudge" (or nudging) is not a specific framework or even one cohesive theory. It's better seen as a collection of techniques and approaches with shared characteristics for designing choice environments in a way that changes behaviour - think of a power tool with different bits for different jobs!


As such, it doesn't help us understand the more detailed dynamics of nudges - for that, we need something more structured. With a causal explanatory approach, we could construct scenarios of potential outcomes and determine which features of an intervention could influence them.


The paper suggests these helpful questions to ask when planning an intervention:

  • What factors could be causally relevant to the success of the intervention?

  • How could the intervention influence these factors?

  • What precautionary measures should be taken to avoid failure?

Skipping this kind of pre-mortem analysis risks trialling interventions that may not scale or preserve the effect over time, because without fully understanding the underlying mechanisms you can't anticipate the forces that might compete with or undo the successful behaviour change.


Part of the promise of nudging is its low cost, which sometimes leads (mostly commercial sector) practitioners to think it doesn't matter much if we don't know which barriers we're trying to address, or how certain we are that a particular 'nudge' will work. In reality, however, failures don't come cheap - there is always at least a time investment, plus the opportunity cost of not doing something else that might have been more effective.


How to fail fabulously with nudging

The goal of the reviewed article was to:

  1. highlight that reports of failure and backfiring are common in the literature

  2. identify characteristic regularities and causal pathways underlying the failures

  3. present a taxonomy derived from the commonalities

The taxonomy is based on an analysis of 65 studies of field trials and experiments across a range of domains. The documented failures included the predicted outcome not happening, the intervention generating the opposite outcome, and unintended side effects.


Before we dive in, here is a quick summary of the proposed taxonomy of behaviour change failures:

  1. No effect

  2. Backfiring

  3. Intervention is effective but it's offset by a negative side effect

  4. Intervention isn't effective but there's a positive side effect

  5. A proxy measure changes but not the ultimate target behaviour

  6. Successful treatment effect offset by later (bad) behaviour

  7. Environment doesn't support the desired behaviour change

  8. Intervention triggers counteracting forces

The images below illustrate the causal models of behaviour change failures with three basic elements:

  1. nodes (domain variables with two or more states)

  2. arrows (probabilistic causal relationships between variables)

  3. probabilities (not shown here - see original paper)

The original paper's round nodes have been reproduced here as head icons with additional imagery to make it easier to grasp the types quickly.
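To make this concrete, here is a minimal sketch in Python of how such a causal model can be represented and queried. It is not from the paper - the variable names, states and probabilities are illustrative assumptions:

# Minimal causal-model sketch: nodes are variables with discrete states,
# arrows are encoded as conditional probability tables. All numbers below
# are invented for illustration, not values from Osman et al. (2020).

# Node: whether the nudge is applied (the intervention variable)
P_intervention = {"nudge": 0.5, "no_nudge": 0.5}

# Arrow: intervention -> behaviour, as P(behaviour | intervention).
# With these numbers the intervention backfires (failure type 2): the
# desired behaviour is *less* likely under the nudge than without it.
P_behaviour_given_intervention = {
    "nudge":    {"desired": 0.40, "undesired": 0.60},
    "no_nudge": {"desired": 0.55, "undesired": 0.45},
}

def p_behaviour(state):
    # Marginal probability of a behaviour state, summing over the intervention node
    return sum(
        P_intervention[i] * P_behaviour_given_intervention[i][state]
        for i in P_intervention
    )

print(p_behaviour("desired"))  # 0.475 with the illustrative numbers above

Swapping in different conditional probabilities lets you play out the scenarios behind each failure type before running a trial - exactly the kind of pre-mortem reasoning the paper advocates.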

 

1. No effect

The most basic fail is simply that there is no treatment effect - i.e. no change in behaviour whatsoever but also no harm done (except wasted effort).


Examples:

  • A social comparison nudge to reduce water consumption might fail overall and even lead certain subgroups to increase their water consumption.

  • Using financial incentives to increase people’s physical activity might fail to change behaviour.

2. Backfiring

Next up, interventions can backfire when they change the target behaviour but in the opposite direction to what was intended.

Examples:

  • Providing information about the negative consequences of unhealthy food can result in increased consumption of those foods by some people, which is an example of reactance.

  • Using social norms can also easily backfire - using descriptive norms that communicate typical behaviours instead of injunctive norms that communicate (dis)approved behaviours can accidentally increase undesirable behaviours.

  • For example, news stories of people breaking COVID-19 regulations might make good TV, but they also unintentionally showcase and normalise undesirable behaviour!

3. Intervention is effective but it's counterbalanced by a negative side effect

The third kind of failure is one where the intervention is successful, but the benefits are offset by an unintended negative consequence that largely negates the positive change (e.g. through compensatory choices).


Examples:

  • An environmental campaign might reduce water consumption but increase electricity consumption

  • A green energy default nudge can decrease support for more comprehensive but also more cumbersome policies like a carbon tax

  • Information on calories increases the choice of healthier options, but the overall benefit is diminished by higher-calorie sides and drinks

4. Intervention isn't effective but there's a positive side effect

Sometimes the intervention doesn't change the target behaviour, but produces unexpected positive consequences. Assessing these positive side effects is important because it's often assumed that changing specific behaviours will generalise to other behaviours.


For example:

Countries with different default policies for organ donation (opt-in vs opt-out systems) show no differences in overall transplant rates, but a more fine-grained analysis of living and deceased donor rates reveals that opt-out countries have a higher number of deceased donors and a lower number of living donors. This is a positive side effect because fewer living donors are subject to the immediate risk of harm from organ harvesting.


5. A proxy measure changes but not the ultimate target behaviour

When trying to influence behaviour for a large population, true behavioural data is sometimes difficult to obtain. In those cases, we might need to settle for using proxy measures - behavioural changes that are pragmatic substitutes for the target behaviour. The problem is that a change in the proxy isn't always a reliable indicator of success.


Examples:

  • Becoming a potential organ donor is a proxy for actual organs donated

  • Providing information may increase healthy food choices in a simulated supermarket but have no long-term impact on body mass index and lifestyle

6. Intervention is successful but offset by later (bad) behaviour

Sometimes the intervention is successful in changing the desired behaviour, but some kind of undesirable behaviour cancels out the effect. A common psychological process that needs to be considered is moral licensing - when a virtuous decision leads to indulgent behaviour later on.


Examples:

  • Promoting dietary supplements can increase their consumption but also reduce people's desire to engage in physical exercise

  • Reminders sent by charitable organisations to potential donors might increase donations in the short term, but result in more people unsubscribing from mailing lists, which leads to a decrease in donations in the long term

7. Environment doesn't support the desired behaviour change

Real life is more complex than an experimental set-up, which means it sometimes becomes clear only with hindsight that the environment the behaviour is embedded in is critical to the change but doesn't actually support it.


Examples:

  • Introducing bike-sharing systems to increase cycling and reduce traffic congestion may fail if users are mostly concerned with road safety

  • Similarly, introducing an opt-out system for organ donations would have little impact if the necessary medical infrastructure isn't in place

8. Intervention triggers counteracting forces

Sometimes the environment isn't just lacking support for the behaviour - the intervention's positive effects are actively counteracted by forces in the broader environment.


Examples:

  • Consumers may struggle to reduce their consumption of unhealthy sugary drinks in response to regulators’ methods to limit choice (e.g., sugar tax) because drinks companies challenge the regulation

  • Banks might introduce defaults to limit access to overdrafts of bank accounts, but also include structures that enable the defaults to be overridden easily

Understanding what works, for whom

What struck me as interesting was that one of the most common interventions resulting in failure is social norming or social comparison. The authors suggest one reason might be that these interventions are cheap, simple to implement and therefore highly scalable - in other words, more appealing than other kinds of interventions.


Another possibility is that social norms are not all equal: large-scale field studies have found that subgroups often respond differently to social norming messages. For example, the effectiveness of a social comparison nudge based on a household's energy consumption relative to others can depend on their overall consumption level but also on their political ideology!


This was a recurring theme across many studies and types of failures: subgroups matter. Depending on the situation, they can result in the intervention not working at all, backfiring, or producing unexpected side effects. As obvious as this might seem, it hasn't been a big part of the discourse in behaviour change or behavioural insights so far, and in some cases the issue has even been brushed under the carpet because it doesn't quite fit the promise of small changes producing big effects that nudging was saddled with. A small sketch of what a subgroup breakdown can reveal follows below.
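To illustrate why, here is a small hypothetical example in Python (the data and column names are invented) showing how an apparently null overall effect can hide opposite reactions in two subgroups:

import pandas as pd

# Hypothetical trial of a social-comparison nudge on household energy use.
# All numbers are invented: low users "boomerang" upwards while high users
# cut back, so the overall effect looks like failure type 1 (no effect).
df = pd.DataFrame({
    "group":     ["treatment"] * 4 + ["control"] * 4,
    "subgroup":  ["low_use", "low_use", "high_use", "high_use"] * 2,
    "usage_kwh": [310, 330, 480, 460,   # treatment households
                  290, 300, 520, 510],  # control households
})

# Overall treatment effect: close to zero
overall = (df[df["group"] == "treatment"]["usage_kwh"].mean()
           - df[df["group"] == "control"]["usage_kwh"].mean())
print(f"Overall effect: {overall:+.1f} kWh")  # -10.0 kWh, looks negligible

# Per-subgroup effects: a boomerang in one subgroup, a real benefit in the other
means = df.pivot_table(values="usage_kwh", index="subgroup",
                       columns="group", aggfunc="mean")
print(means["treatment"] - means["control"])  # low_use: +25, high_use: -45

An analysis that stopped at the overall effect would file this under 'no effect', when in fact the nudge backfired for one subgroup and worked for the other.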


Where do we go from here?

It feels like we are on the verge of a new, more honest era for nudging and behaviour change after a decade of unbridled optimism and enthusiastic marketing.


The first step is to take a sober look at how common intervention failures truly are, to ascertain whether there is a publication bias. I was reminded of David Halpern's remark at the BX2019 conference that we should expect an 80% failure rate, talk about that more, and relate the failures back to the literature to gain a deeper understanding. The two evidence reviews summarised in a previous post certainly hint at some kind of file-drawer problem!


One way forward could be to create some sort of database that documents interventions in detail and scores them on a scale of success/failure - this might give us a more comprehensive understanding of the behavioural outcomes we can expect. A rough sketch of what one record in such a database might look like follows below.
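For illustration only, here is a minimal Python sketch of such a record - the fields, scoring scale and enum are my own assumptions, not an existing schema:

from dataclasses import dataclass, field
from enum import Enum

class FailureType(Enum):
    # The eight failure modes from the taxonomy above
    NO_EFFECT = 1
    BACKFIRE = 2
    OFFSET_BY_NEGATIVE_SIDE_EFFECT = 3
    NO_EFFECT_BUT_POSITIVE_SIDE_EFFECT = 4
    PROXY_CHANGED_NOT_TARGET = 5
    OFFSET_BY_LATER_BEHAVIOUR = 6
    ENVIRONMENT_UNSUPPORTIVE = 7
    COUNTERACTING_FORCES = 8

@dataclass
class InterventionRecord:
    domain: str                   # e.g. "pro-environmental behaviour"
    technique: str                # e.g. "social comparison"
    target_behaviour: str
    success_score: float          # 0.0 = clear failure .. 1.0 = clear success
    failure_types: list = field(default_factory=list)
    subgroup_effects: dict = field(default_factory=dict)

# An invented example entry:
record = InterventionRecord(
    domain="pro-environmental behaviour",
    technique="social comparison",
    target_behaviour="household energy use",
    success_score=0.3,
    failure_types=[FailureType.BACKFIRE],
    subgroup_effects={"low users": +0.05, "high users": -0.10},
)

Scoring on a continuous scale rather than a success/failure binary, and recording subgroup effects explicitly, would make the kinds of patterns discussed above much easier to spot across studies.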


There are also other questions I think we should explore.


First, we should look at the common cognitive and behavioural characteristics of failed interventions - what can we relate back to cognitive and social theories of behaviour? Second, how can we best advance the theoretical and methodological foundations of behaviour change research?


Most of the commonly used behaviour change "frameworks" (e.g. EAST, MINDSPACE, BASIC) lack a causal model - with the exception of the increasingly popular Behaviour Change Wheel. Its COM-B model is a causal explanatory approach to behaviour, and even if it doesn't support more formal causal analysis (as the authors suggest), in my opinion it's still the most holistic approach we currently have for diagnosing a behavioural challenge and indicating where the solutions might lie.

 


What kinds of studies were included in the analysis?

  • Of the 65 studies that were compiled, 58% included a field experiment, and 75% of all studies also included a control or baseline condition to compare the behavioural interventions against. Common domains in which the interventions were trialled included charitable donations (13%), tax compliance (8%), health (diet or exercise; 25%), and pro-environmental behaviour (28%).

  • The 65 studies utilised several types of interventions, notably defaults (15%), social comparisons and social norming (40%), labelling (12%), and provision of information delivered through letters or text messages (24%)

  • More details at: https://psyarxiv.com/ae756

 

You can find more details and references in:

Osman, M., McLachlan, S., Fenton, N., Neil, M., Löfstedt, R., & Meder, B. (2020). Learning from behavioural changes that fail. Trends in Cognitive Sciences.


