Jenny Aker on Rigor for the Rest of Us

 

In her post of June 21st, Kim highlighted the (sometimes) complex world of impact evaluations and the debate over using randomized controlled trials (RCTs) as a way to conduct such evaluations.  She concluded by giving us three options:  to abandon RCTs, to use them (if we have the time and money) or to incorporate their principles into “less expensive” forms of evaluation.

Yet the focus on RCTs is somewhat of a red herring.  Those who advocate RCTs aren’t advocating for randomization per se – they are (usually) advocating for impact evaluations of development programs (or evaluations that measure the change in a development outcome that can be attributed to the specific intervention or program).  So why do we spend so much time talking about RCTs?

RCTs are often at the center of the debate on impact evaluations for a simple reason:  they can be a powerful tool for measuring program impact.  Why?  Quite simply, they minimize bias – in other words, by using chance to assign participants and non-participants, they increase the likelihood that program participants are as similar as possible to non-participants.  This means that, if we observe differences in outcomes between the two groups, then it is (probably) due to the program, and not to something else (which is the point of impact evaluations).  Yet RCTs are one tool among many for measuring impact, and they aren’t always feasible or appropriate.

What do you do if you want to conduct an impact evaluation, but you can’t or don’t want to randomize?  There are plenty of options.  Here are a few key principles for those interested in impact evaluations – many of which NGOs are probably doing already.

  • Principle #1:  Collect data on both program participants and non-participants before and after the program. 

Suppose your organization collects data on program participants’ corn yields before and after an agricultural program that sought to increase yields by 20 percent.  Corn yields were 100 kg/ha before the program, but dropped to 75 kg/ha after the program.  Did the program fail?  Maybe, maybe not.  Maybe there was a drought during this period, and participants would have been even worse off without the program.  The point is, we don’t know, because we didn’t observe what happened to non-participants. 

Now suppose you collect data on corn yields for participants and non-participants after the program, and find that yields are higher for participants.  Did the program succeed? Maybe, maybe not. It’s possible that the participant farmers were the most motivated or the richest – and so the higher yields among participants are due to those factors and not to the program. We don’t know where each group started, so we don’t know if the participant farmers were better off to start with.

By collecting data on participants and non-participants before and after the program, we can control for two important issues in impact evaluations:  1) different starting points (levels) for each group; and 2) general trends over time (which tell us what might have happened without the program), which are captured by information from the comparison group.  
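
To make the arithmetic behind Principle #1 concrete, here is a minimal sketch (in Python, not from the original post) of the “difference-in-differences” logic applied to the corn-yield example; the non-participant figures are hypothetical, chosen to mimic a drought year.

```python
# Difference-in-differences sketch for the corn-yield example.
# Participant figures (100 -> 75 kg/ha) come from the post above;
# non-participant figures are hypothetical.

participants_before, participants_after = 100.0, 75.0            # kg/ha
non_participants_before, non_participants_after = 100.0, 50.0    # kg/ha (hypothetical)

change_participants = participants_after - participants_before              # -25 kg/ha
change_non_participants = non_participants_after - non_participants_before  # -50 kg/ha

# The estimated impact is the difference in those two changes: participants'
# yields fell less than non-participants', suggesting the program helped
# despite the overall decline (e.g. a drought).
estimated_impact = change_participants - change_non_participants
print(f"Estimated program impact: {estimated_impact:+.0f} kg/ha")  # +25 kg/ha
```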

  •  Principle #2:  Select the program and non-program villages *before* the baseline. 

Seems simple, right? If you want to follow program participants and non-participants over time, you need to know who the participants are.  In practice, though, it isn’t so simple.  Sometimes NGOs want to do a baseline first to decide who to target.  Or, perhaps the NGO will offer the program to beneficiaries, but can’t be sure that someone will accept the offer (a common issue in microfinance or savings programs).  In these cases, try to identify the treatment group at a “higher” geographic level first – such as the village or neighborhood – and collect data from individuals or households within participating and non-participating villages.

  • Principle #3:  Use clear-cut targeting criteria to choose the program participants.

At first glance, this principle seems to contradict the whole point of RCTs – where we randomly assign villages, households or individuals to treatment and comparison groups, increasing the likelihood that the two groups will be as similar as possible before the program. 

In the absence of an RCT, how can these criteria help us?  Suppose that your organization decides to offer savings accounts to individuals with a per capita income below USD 50.  This means that if an individual earns less than USD 50, they are a program participant – but if they earn USD 50 or above, they aren’t.  But how different is someone earning USD 51 (a non-participant) from someone earning USD 49 (a participant)?  Probably not very.  From an evaluation perspective, we could potentially compare those individuals right below the threshold (the treatment group) with those right above the threshold (the comparison group), assuming that they aren’t too different.
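
To illustrate how such a cutoff could be used, here is a rough sketch (with invented numbers, not from the original post) comparing average savings among people just below and just above the USD 50 eligibility line:

```python
# Sketch of a threshold comparison around the USD 50 eligibility cutoff.
# The records are invented; in practice they would come from survey data.

records = [
    # (per-capita income in USD, savings after the program in USD)
    (46, 30), (48, 28), (49, 35),   # below the cutoff -> offered accounts
    (51, 22), (52, 25), (54, 20),   # above the cutoff -> not offered
]

CUTOFF = 50     # eligibility threshold: earn less than USD 50 -> participant
BANDWIDTH = 5   # only compare people within USD 5 of the cutoff

just_below = [s for income, s in records if CUTOFF - BANDWIDTH <= income < CUTOFF]
just_above = [s for income, s in records if CUTOFF <= income <= CUTOFF + BANDWIDTH]

avg_below = sum(just_below) / len(just_below)   # "treatment" group
avg_above = sum(just_above) / len(just_above)   # "comparison" group
print(f"Avg savings just below the cutoff: {avg_below:.1f} USD")
print(f"Avg savings just above the cutoff: {avg_above:.1f} USD")
print(f"Difference (rough impact estimate): {avg_below - avg_above:+.1f} USD")
```

The comparison is only credible for people close to the cutoff; someone earning USD 20 and someone earning USD 80 are likely too different for this to work.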

  • Principle #4:  Collect data on the what, how and why.

One of the main criticisms levied against impact evaluations – and RCTs in particular – is that they provide us with the “what” (did the program have an impact?) but not the why (if it did have an impact, through what channels?).  Yet there is nothing inherent in impact evaluations that prevents us from learning about the channels of impact or from using qualitative techniques.  At the end of the day, impact evaluations should not only tell us whether the program worked, but also why it worked (or didn’t).

Suppose you want to pilot a new savings group model in Mali where group members receive SMS reminders to save, as compared with groups that don’t receive reminders.  You think that the groups that receive reminders will remember to save and save more, hopefully allowing them to invest or build their assets.  So you would want to collect data on household (or individual) investments and assets (the “what”), as well as their savings and whether the SMS reminders prompted them to save (the “why”). You could also ask individuals whether they liked the reminders, or why they were unable to save.  Combining data on outcomes from multiple levels, along with a mix of qualitative and quantitative techniques, can help us better understand both the impact of the pilot program and the reasons behind it.
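
As a small sketch of what combining the “what” and the “why” could look like in the data (all records below are invented for illustration):

```python
# Sketch: combining outcome data (the "what") with mechanism data (the "why")
# for the SMS-reminder pilot. All figures are invented.
from collections import Counter

households = [
    # (received SMS reminders?, savings in USD, reason given if they did not save)
    (True,  40, None),
    (True,  35, None),
    (True,   0, "unexpected medical expense"),
    (False, 20, None),
    (False,  0, "forgot to set money aside"),
    (False,  0, "forgot to set money aside"),
]

def avg_savings(got_sms):
    amounts = [savings for sms, savings, _ in households if sms == got_sms]
    return sum(amounts) / len(amounts)

# The quantitative "what": did reminder groups save more?
print(f"Avg savings with reminders:    {avg_savings(True):.1f} USD")
print(f"Avg savings without reminders: {avg_savings(False):.1f} USD")

# The qualitative "why": tally the reasons non-savers gave.
reasons = Counter(reason for _, _, reason in households if reason is not None)
print("Reported reasons for not saving:", dict(reasons))
```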

  • Principle #5:  Share successes and failures.

It’s human nature: We want to share our successes and perhaps hide our failures.  But by only sharing our success stories (programs that worked) and hiding our failures, we are losing an opportunity to learn.  At best, this means that another NGO repeats the same program somewhere else, wasting time and resources.  At worst, this “waste” prevents scarce resources from being used in another context or program that deserved it more, or encourages clients or poor households to waste their scarce time or resources on something that doesn’t work.

Bottom line:  If we’re going to do impact evaluations, we all need to do a better job of sharing our results – successes and failures alike – with clients, communities, NGOs, donors and governments. Of course this might be easier said than done – but it should be a principle nonetheless.

Jenny Aker is an Assistant Professor of Development Economics at the Fletcher School, Tufts University. She was previously Deputy Regional Director, Programming, for CRS in West Africa where she oversaw CRS’s microfinance programming.

Reader Comments (4) 

Thanks for that. I love the idea of sending SMS reminders to save. (I know that wasn't the point of your post, but what a cool idea!)

Many years ago - it seems like another lifetime - Beth Rhyne, when she was with USAID, said something like, "We don't think it is incumbent on every MFI to prove the positive impact of microcredit." I thought, "Great! Now we can just get out there and lend!" 

The problem was, NO ONE really proved the positive impact of microcredit. There WERE lots of studies - I worked on a couple myself - but they were flawed in various ways. I don't think that the present round of RCTs will answer all questions, and I am aware of a lot of their potential shortcomings - but I am very glad to see the standards of proof getting so much higher.

Better (I think) one very good study than two pretty good studies, or four not-too-bad studies, or eight quick-and-dirty studies, and so on.

Tue, June 28, 2011 | Paul Rippey

Dear Jenny -

This is a very thoughtful post. I am always struck by how even simple principles are not followed. I would underscore principle #1 and expand on it: do the baseline before starting implementation. Seems obvious, but I often see programs where the baseline is done well into the program - hardly a baseline - which then leaves the evaluators in the difficult position of pretending the mid-term evaluation is a baseline.

Thu, June 30, 2011 | Kim Wilson (Orientrow@yahoo.com)

Great summary - I think it's really important to emphasize that RCTs and qualitative methods are not mutually exclusive. I would love to see more links to RCT studies that have a strong qualitative component.

Fri, July 8, 2011 | Helen Lindley (helen.lindley@gmail.com)

I like the last principle of sharing successes and failures. Maybe the next question, in the same spirit of seeking answers, is why there is a tendency among players to hide their learning experiences. What incentives exist for organizations that value learning from experience in project implementation?

Wed, July 13, 2011 | wj

 
