Evaluation Framework

Kirkpatrick's four levels of evaluation, and how to actually collect the data

Everyone knows the model. Most organizations still only measure Level 1. The problem isn't awareness of the framework. It's that collecting Level 2 and Level 3 data has always been impractical. It doesn't have to be.

The four levels, briefly

Kirkpatrick's model has been the standard framework for training evaluation since the 1950s. You've seen it before. Here's the short version.

Level 1: Reaction
Did participants find the program useful, engaging, and relevant? This is your post-program satisfaction survey.

Level 2: Learning
Did participants gain the intended knowledge, skills, or attitudes? Measures what changed in their heads.

Level 3: Behavior
Are participants applying what they learned back on the job? Measures transfer from classroom to workplace.

Level 4: Results
Did the program produce business outcomes? Revenue, retention, productivity. The hardest to attribute directly.

The gap between knowing the model and doing it

ATD's State of the Industry report consistently shows the same pattern: over 90% of organizations measure Level 1. Fewer than 40% measure Level 2. Level 3 drops to under 20%.

It's not that people don't value deeper evaluation. The problem is logistics. Level 1 is easy: send a survey after the program, ask if they liked it. Level 2 and 3 traditionally require pre-post testing, follow-up surveys weeks later, matched responses, and enough statistical rigor to be credible. For a consultancy running 20 programs a year, that's a significant operational burden.

So most organizations default to what's easy. They collect satisfaction data, put it in a report, and move on. The Kirkpatrick model stays in the slide deck but not in the data.

The real barrier isn't methodology. It's tooling.

L&D professionals know what good evaluation looks like. They've read the books, attended the sessions. What they lack is a practical way to collect the right data without doubling their survey administration workload.

How ImpactCheck maps to each level

ImpactCheck isn't a full evaluation management system. It's a focused survey tool that makes Level 1-3 data collection practical for consultancies and L&D teams.

Level 1: Reaction
Standard post-program surveys with scale questions and open-ended feedback. Rate the facilitator, the content, the relevance. This is the baseline most teams already do.

Survey type: Standard post-program

Level 2: Learning
Scale questions that measure perceived change in knowledge and skills. Using retrospective (post-then-pre) surveys, participants rate their skill level before and after the program in a single sitting. The shift score, the gap between the "now" and "before" ratings, tells you what they learned.

Survey type: Retrospective

Level 3: Behavior
Retrospective surveys sent weeks after the program, asking participants to rate behavior change. "I apply active listening techniques in one-on-ones," measured before vs. now. The same mechanism as Level 2, but focused on workplace application rather than knowledge.

Survey type: Retrospective (delayed administration)

Level 4: Results
ImpactCheck doesn't measure business results directly. That requires organizational data (retention rates, revenue figures, performance metrics) that sits outside a survey tool. But strong Level 2 and 3 data builds the case. If participants demonstrate measurable learning and behavior change, you have a credible argument that business results will follow.

Survey type: Supported indirectly through Level 2-3 evidence

Why retrospective surveys solve the Level 2-3 problem

The traditional approach to Level 2 evaluation is a pre-test/post-test design. Survey before the program, survey after, compare scores. It's sound in theory but breaks down in practice: it requires two survey rounds and matched responses, and you lose participants to attrition between rounds.

Retrospective surveys collapse that into one survey, administered after the program. Participants rate both "before" and "now" at the same time. Research shows this approach is more accurate for self-report measures because it eliminates response-shift bias, where participants recalibrate what "good" means after a learning experience.

The practical effect: you can collect Level 2 and Level 3 data with the same operational effort as a Level 1 satisfaction survey. One survey link, one administration, one set of results.
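Under the hood, the Level 2 "shift" from a retrospective survey is simple arithmetic on the paired ratings. Here is a minimal Python sketch of that calculation; the participant names and ratings are invented for illustration, and this is not ImpactCheck's actual implementation:

```python
# Each participant rates the same statement twice in one sitting:
# "before" (their skill level before the program) and "now".
# Illustrative 1-5 ratings; the data below is made up.
responses = [
    {"participant": "A", "before": 2, "now": 4},
    {"participant": "B", "before": 3, "now": 4},
    {"participant": "C", "before": 2, "now": 5},
]

def shift_score(responses):
    """Average 'now' rating minus average 'before' rating."""
    avg_before = sum(r["before"] for r in responses) / len(responses)
    avg_now = sum(r["now"] for r in responses) / len(responses)
    return avg_now - avg_before

print(round(shift_score(responses), 2))  # prints 2.0
```

Because both ratings come from a single administration, every response is automatically matched: there are no unpaired pre-tests to discard.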

Read the full methodology behind retrospective surveys, including the research on response-shift bias and when to use each survey type.

Sample evaluation questions by level

Here's what a multi-level evaluation looks like in practice, using a leadership development program as an example.

Level 1: Reaction (standard post-program survey)

"The program content was relevant to my role"
Scale: 1 to 5, where 5 = strongly agree

"I would recommend this program to a colleague"
Scale: 1 to 5, where 5 = strongly agree

Level 2: Learning (retrospective survey)
Each statement is rated twice on a 1 to 5 scale: once for before the program, once for now.

"I understand how to adapt my leadership style to different situations"

"I can identify the key drivers of team psychological safety"

Level 3: Behavior (retrospective survey, sent 4-8 weeks later)
Each statement is rated twice on a 1 to 5 scale: once for before the program, once for now.

"I regularly give constructive feedback to my direct reports"

"I use coaching conversations instead of directive management in my team meetings"

Start measuring beyond Level 1

ImpactCheck makes Level 2 and Level 3 evaluation as easy as sending a satisfaction survey. Retrospective mode is built in. Your clients get real impact data, not just happy sheets.

Create your free account

Free to use. No credit card required.