Finding Your Twin: How Propensity Score Matching (PSM) Works

By Cedric Mwesigwa · 5/12/2024

One of the biggest challenges in program evaluation is creating a fair comparison. If you simply compare people who participated in your program to those who didn't, you might get misleading results. The two groups could be different in many ways. Propensity Score Matching (PSM) is a statistical technique that helps solve this problem.

The Problem: Apples and Oranges

Imagine you run a job training program. The participants who sign up might be more motivated or have more work experience than those who don't. If you just compare the employment rates of the two groups, you won't know if a better outcome is due to your training or their pre-existing motivation. You're comparing apples and oranges.
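
To make the apples-and-oranges problem concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the `experience` score, the enrollment and employment probabilities, and the function name `simulate_naive_gap` are assumptions, not data from a real program. The simulated program has no effect at all, yet the naive comparison still shows participants doing better.

```python
import random

def simulate_naive_gap(n=20000, seed=0):
    """Simulate a program with NO true effect, where experienced people enroll more often."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        experience = rng.random()  # hypothetical 0-1 "work experience" score
        # More experience -> more likely to sign up (assumed relationship).
        enrolls = rng.random() < 0.2 + 0.6 * experience
        # Employment depends only on experience, never on the program.
        employed = rng.random() < 0.3 + 0.4 * experience
        (treated if enrolls else untreated).append(1 if employed else 0)
    rate_t = sum(treated) / len(treated)
    rate_u = sum(untreated) / len(untreated)
    return rate_t, rate_u

rate_t, rate_u = simulate_naive_gap()
print(f"participants: {rate_t:.3f}, non-participants: {rate_u:.3f}, "
      f"naive gap: {rate_t - rate_u:.3f}")
```

The gap printed here comes entirely from who chose to enroll, which is exactly the kind of difference a naive comparison would mistake for program impact.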

The Solution: Creating a Statistical "Twin"

PSM tries to find a "statistical twin" for each participant in your program from a larger group of non-participants. It works in two steps:

  1. Calculate the Propensity Score: First, we use a statistical model (usually a logistic regression) to calculate the "propensity score" for every single person in both the treatment and non-treatment groups. This score is the estimated probability (from 0 to 1) of a person participating in the program, based on their observable characteristics like age, education level, location, etc.
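  2. Match the Scores: For each person in the treatment group, we then find a person in the non-treatment group who has a very similar propensity score. Participants with no close match are usually dropped from the analysis. This creates a new, smaller control group that should be, on average, very similar to the treatment group across the characteristics we measured. It is worth checking that this is actually the case by comparing the two groups' averages after matching. If it is, we've effectively created a group of statistical twins.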
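
The two steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is simulated, the covariates `age` and `education` are hypothetical and scaled to 0-1, the logistic regression is fitted with a simple gradient descent loop rather than a statistics library, and the matching is greedy one-to-one nearest-neighbor matching with an assumed tolerance (`caliper`) of 0.05. In practice you would more likely fit the model with a library such as scikit-learn or statsmodels.

```python
import math
import random

def predict_proba(w, x):
    """Logistic model: estimated probability of participation for covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by plain batch gradient descent (approximate fit).
    Returns weights as [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def match_nearest(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement.
    Participants with no non-participant within `caliper` stay unmatched."""
    treated_idx = [i for i, t in enumerate(treated) if t == 1]
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i in treated_idx:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)  # each non-participant is used at most once
    return pairs

# Simulated, hypothetical data: two covariates scaled to 0-1.
rng = random.Random(42)
X, treated = [], []
for _ in range(400):
    age, education = rng.random(), rng.random()
    p_join = 1.0 / (1.0 + math.exp(2.5 - 1.5 * age - 1.5 * education))
    X.append([age, education])
    treated.append(1 if rng.random() < p_join else 0)

weights = fit_logistic(X, treated)               # step 1: model participation
scores = [predict_proba(weights, x) for x in X]  # propensity score per person
pairs = match_nearest(scores, treated)           # step 2: match on the score
print(f"{sum(treated)} participants, {len(pairs)} matched pairs")
```

Each entry in `pairs` is a (participant, non-participant) pair of row indices. Matching without replacement and dropping unmatched participants are design choices made here for simplicity; other matching schemes exist.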

Measuring the Impact

Once we have our matched groups, the comparison is straightforward. We compare the average outcome (e.g., employment rate) of the treatment group to the average outcome of the newly created, matched control group. The difference is an estimate of the program's effect on the people who took part. Provided the matched groups really are balanced on their observable characteristics, that difference can be more confidently attributed to the program itself.
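
As a rough sketch of this comparison, the snippet below takes a set of matched pairs and computes two things: the average outcome difference between participants and their matches, and a simple balance check on one covariate. The toy data, the variable names, and the two helper functions are hypothetical and exist only for illustration.

```python
from statistics import mean, variance

def att_from_pairs(outcome, pairs):
    """Average participant-minus-match outcome difference across matched pairs."""
    return mean(outcome[i] - outcome[j] for i, j in pairs)

def standardized_mean_difference(values, pairs):
    """Balance check: difference in covariate means divided by the pooled standard deviation."""
    t = [values[i] for i, _ in pairs]
    c = [values[j] for _, j in pairs]
    pooled_sd = ((variance(t) + variance(c)) / 2) ** 0.5
    return (mean(t) - mean(c)) / pooled_sd if pooled_sd > 0 else 0.0

# Hypothetical toy data: rows 0-3 are participants, rows 4-7 their matched non-participants.
employed = [1, 1, 0, 1, 1, 0, 0, 0]
age = [25, 34, 41, 52, 26, 33, 43, 52]
pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]

att = att_from_pairs(employed, pairs)            # 0.5 for this toy data
smd = standardized_mean_difference(age, pairs)
print(f"estimated effect on participants: {att:.2f}, age balance (SMD): {smd:.3f}")
```

A common rule of thumb treats an absolute standardized mean difference below about 0.1 as acceptable balance, though that threshold is a convention rather than a guarantee. With only four pairs, the numbers here say nothing about a real program.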

PSM is a powerful tool, but it has a key limitation: it can only account for characteristics that you can observe and measure. It can't account for unobserved differences, such as the motivation in the job training example, unless you have some measure of them. For that reason, it's often considered less rigorous than a randomized controlled trial (RCT) or a strong quasi-experimental design. However, when those methods aren't an option, PSM is a valuable technique for getting a more accurate estimate of your program's impact from existing data.
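
The limitation can also be illustrated with a small simulation. In this sketch the only observed characteristic is a yes/no `experienced` flag, so matching on the propensity score amounts to comparing people within the same experience group. A second trait, `motivated`, affects both enrollment and employment but is hidden from the evaluator. All names and probabilities are invented for illustration, and the simulated program has no true effect.

```python
import random

def gaps_with_hidden_confounder(n=40000, seed=1):
    """Compare outcomes within each observed group while motivation stays hidden."""
    rng = random.Random(seed)
    # Outcomes grouped by (experienced, participated).
    cells = {(e, t): [] for e in (0, 1) for t in (0, 1)}
    for _ in range(n):
        experienced = rng.random() < 0.5  # observed by the evaluator
        motivated = rng.random() < 0.5    # NOT observed by the evaluator
        joins = rng.random() < 0.2 + 0.3 * experienced + 0.3 * motivated
        # The program itself adds nothing to the chance of employment.
        employed = rng.random() < 0.3 + 0.2 * experienced + 0.3 * motivated
        cells[(int(experienced), int(joins))].append(int(employed))
    gaps = []
    for e in (0, 1):
        t, c = cells[(e, 1)], cells[(e, 0)]
        gaps.append(sum(t) / len(t) - sum(c) / len(c))
    return gaps

gaps = gaps_with_hidden_confounder()
print("participant minus non-participant employment rate, by experience group:",
      [round(g, 3) for g in gaps])
```

Even after comparing like with like on experience, participants still come out ahead, because the more motivated people were the ones who enrolled. No amount of matching on observed data can remove that remaining gap.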