Employee Flight Risk Intelligence System

CAN North Financial loses an estimated 252 employees a year. At $78,000 per departure¹ that is $19.7 million leaving through the door — most of it invisible until someone hands in their notice. The CEO’s question was simple: tell me who is about to leave before they do. This is the system that answers it.

Demonstration project. All data is synthetically generated and does not represent real employees or organizations. CAN North Financial is a fictional company created for demonstration purposes. The model, methodology, and analytical framework are genuine and production-ready. Methodology and source code are proprietary.

¹ $78,000 replacement cost methodology: Calculated as roughly 0.72× average annual salary at CAN North Financial ($107,991), reflecting direct costs: recruiting fees (15–20% of salary), onboarding, productivity loss during the 3–6 month ramp period, and manager time. Consistent with SHRM’s published range of 50–200% of annual salary for mid-level professional roles. Applied uniformly across all departures as a conservative estimate near the low end of that range.

Employees in dataset: 1,400 (3 years of longitudinal data per employee)
Model recall (test set): 98% (49 of 50 real quitters caught on held-out data)
At-risk cost identified: $19.7M (252 departures × $78,000 replacement cost)
Models raced: 5 (Logistic Regression, Random Forest, SVM, Decision Tree, KNN)

Python · scikit-learn · Logistic Regression · 5-Fold Cross Validation · Streamlit · Longitudinal Data · Feature Engineering · StandardScaler · People Analytics · RTO Analysis

What makes this project different

Longitudinal design

Direction of change, not just snapshot

An employee at 50 engagement today who was at 80 two years ago is more at risk than someone who has always been at 50. The trend is the signal.

Novel feature

RTO Risk Index

Commute shock — not commute distance — drives attrition. An employee forced from remote to in-person with a 60-minute commute added is fundamentally different from one who always commuted 60 minutes.

Disciplined selection

Model race, not model guess

Five models. Same data. Same metric. 5-fold cross validation. The data picks the winner. Logistic Regression won at 96.6% cross-validated recall, which suggests that on this dataset feature engineering mattered more than algorithm complexity.

The headline finding: Employees pushed more into office left at 23.6%. Employees given flexibility left at 7.5%. A 3× gap driven by one policy decision. The data said so before any model ran.

The full pipeline

Annual · Generate Data · generate_data.py
Annual · Train Model · train_pipeline.py
Quarterly · Score Employees · score_pipeline.py
Quarterly · Diana Acts · app.py

↗ Open Live Demo

Section 01 — The Business Problem

Before any model — what is Diana trying to solve?

Every technical decision traces back to one business conversation. Getting the problem framing right is more important than any algorithm choice.

Organization

CAN North Financial

1,400 employees · Toronto, Calgary, Vancouver
Pension · Wealth Management · Retail Banking

The problem in numbers

Annual attrition rate: 18%
Departures per year: 252
Cost per departure: $78K
Annual cost: $19.7M

RTO finding — before any model

More office 23.6%

More flexible 7.5%

3× higher attrition from one policy decision. Found in exploration — before any model ran.

CAN North Financial is losing people it cannot afford to lose

Diana is the Chief People Officer at CAN North Financial. Her company loses roughly 18% of its workforce every year. Each departure costs an estimated $78,000 in recruiting, onboarding, and lost productivity. Across 252 annual departures that is $19.7 million leaving through the door — most of it invisible until someone hands in their notice.

The CEO's ask was direct: "Tell me who is about to leave before they do. I want to call them."

The strategic framing: This is a classification problem — predict quit (1) or stay (0). The metric that matters is not accuracy. It is recall. Missing a quitter costs $78,000. A false alarm costs one manager conversation. The model must be optimised to catch real quitters, not to look good on paper.

Why accuracy is the wrong metric

CAN North's dataset is 82% stayers and 18% leavers. A model that predicts "stays" for every single person scores 82% accuracy — and catches zero real quitters. This is the class imbalance trap. Every subsequent decision — model choice, threshold setting, evaluation — flows from understanding this.
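The trap is easy to verify with a few lines of arithmetic. The sketch below is illustrative only: it rebuilds the 82% / 18% split of the 1,400-employee dataset (1,148 stayers, 252 leavers) and scores a hypothetical baseline that predicts "stays" for everyone.

```python
# Illustrative only: rebuild the class split (1,148 stayers, 252 leavers).
y_true = [0] * 1148 + [1] * 252          # 0 = stayed, 1 = left

# Hypothetical baseline: predict "stays" for every employee.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

accuracy = correct / len(y_true)         # share of all predictions that are right
recall = caught / sum(y_true)            # share of real leavers flagged in advance

print(f"accuracy = {accuracy:.1%}, recall = {recall:.1%}")
```

The baseline looks respectable on accuracy and is useless on recall, which is why recall is the metric carried through the rest of the project.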

The right metric

Recall — catching real quitters is Diana's priority

Of all employees who actually left — what percentage did the model identify in advance? A recall of 98% means the model caught 49 of 50 real quitters on data it had never seen. One slipped through. Ten unnecessary conversations were had. That is the trade-off Diana accepts.

Section 02 — The Data

Why longitudinal data — and why these specific features

The most important design decision was not which model to use. It was how to structure the data. A snapshot tells you where someone stands today. A longitudinal dataset tells you where they are going.

"An employee at 50 engagement today who was at 80 two years ago is more at risk than someone who has always been at 50. The direction of travel is the signal."

Three years. One row per employee. 68 columns.

The dataset covers 1,400 CAN North employees across 2023, 2024, and 2025. Each employee has one row with 68 columns covering engagement survey scores, performance ratings, compensation benchmarks, commute data, manager changes, and organizational disruption flags — all tracked year over year.

The target variable — left — indicates whether this employee left by end of 2025. 252 left (18%). 1,148 stayed (82%).

Why commute shock — not commute distance: An employee who always commuted 60 minutes has adapted. An employee whose commute went from 0 to 60 minutes because of an RTO mandate has not. The delta is the signal. This insight drove the design of the RTO Risk Index — a novel composite feature not found in standard people analytics literature.

All scores on a 0–100 scale

A deliberate design choice: every survey score, every index, every composite feature lives on the same 0–100 scale. When Diana presents to the board, there is no unit conversion. An engagement score of 34 means the same thing as an RTO risk score of 34.

── Dataset preview — first 3 employees ──────────

employee_id  division           role       location
CNF0001      Wealth Management  Analyst    Calgary
CNF0002      Retail Banking     Analyst    Vancouver
CNF0003      Wealth Management  Analyst    Vancouver

eng_2023  eng_2024  eng_2025  trend
67.2      58.4      51.1      −16.1
81.3      75.6      75.1       −6.2
68.4      61.2      52.1      −16.3

rto_index  persona_direction  left
37.1       More office        0
25.3       No change          0
8.7        More flexible      0

──────────────────────────────────────────────────
Shape:     1,400 rows × 68 columns
Period:    2023 → 2025
Attrition: 18.0% (252 of 1,400)
Missing:   0 — clean dataset

Engagement survey — longitudinal (0–100)

engagement_2023/24/25, satisfaction_2023/24/25, career_growth_2023/24/25, mgr_effectiveness_2023/24/25, wellbeing_2023/24/25

RTO and commute — the differentiator

persona_direction, commute_time_change_min, rto_risk_index, transit_dependent, commute_km

Engineered trend features (calculated)

engagement_trend, satisfaction_trend, career_growth_trend, org_disruption_score, manager_stability

Section 03 — Data Exploration

What the data told us before any model ran

The most actionable insights came from exploration — not modelling. Before a single line of model code, the data produced findings Diana could take to the CEO immediately.

30.5-point satisfaction gap. Leavers: 28.4/100. Stayers: 58.8/100. Largest single-feature gap found.

3× RTO attrition multiplier. More office: 23.6%. More flexible: 7.5%. One policy. Three times the loss.

20.5-point engagement drop. Leavers dropped 20.5 pts over 3 years. Stayers only 6.4. Trend beats snapshot.

12 critical zone employees. Eng <40 + career <40 + RTO >60 = 100% historical attrition. Called immediately.
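The critical zone rule is a plain three-way filter. A minimal sketch, assuming a pandas DataFrame with the column names used in the dataset description; the four rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows for illustration; column names follow the dataset description.
df = pd.DataFrame({
    "employee_id": ["A", "B", "C", "D"],
    "engagement_2025": [35.0, 38.0, 62.0, 39.0],
    "career_growth_2025": [30.0, 45.0, 25.0, 39.5],
    "rto_risk_index": [72.0, 80.0, 90.0, 60.0],
})

# Critical zone: all three conditions must hold at once.
critical = (
    (df["engagement_2025"] < 40)
    & (df["career_growth_2025"] < 40)
    & (df["rto_risk_index"] > 60)
)
critical_ids = df.loc[critical, "employee_id"].tolist()
print(critical_ids)
```

Only employee A meets all three conditions. D misses because an RTO risk of exactly 60 does not satisfy the strict ">60" test.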

── Top signals — leavers vs stayers ─────────────

Signal                       Stayed   Left    Gap
Satisfaction 2025             58.8     28.4   +30.5
Engagement 2025               63.2     36.4   +26.8
Satisfaction trend (3yr)      -9.9    -29.8   +19.9
Engagement trend (3yr)        -6.3    -22.0   +15.7
Career growth 2025            55.2     42.8   +12.4
RTO risk index                27.9     44.4   -16.6
Org disruption score          30.7     43.2   -12.5
──────────────────────────────────────────────────
Salary vs market              49.5     47.0    +2.5
→ Salary barely registers. Not a pay problem.

Correlation with attrition (top 5):
engagement_2025        −0.558  Higher = STAY
satisfaction_2025      −0.548  Higher = STAY
engagement_trend       −0.502  Higher = STAY
satisfaction_trend     −0.400  Higher = STAY
rto_risk_index         +0.352  Higher = LEAVE
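A ranking like this can be produced by correlating each feature with the 0/1 `left` column. The sketch below is a minimal illustration on a made-up six-row sample (the values are invented, not taken from the dataset) using pandas' `corrwith`.

```python
import pandas as pd

# Invented mini-sample; the real table has 1,400 rows.
df = pd.DataFrame({
    "engagement_2025": [70, 65, 30, 25, 60, 35],
    "rto_risk_index": [20, 25, 60, 70, 30, 55],
    "left": [0, 0, 1, 1, 0, 1],
})

# Pearson correlation of every feature with the binary target.
corr_with_left = df.drop(columns="left").corrwith(df["left"]).sort_values()
print(corr_with_left.round(3))
```

Negative values mean higher scores go with staying; positive values mean higher scores go with leaving.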

The finding that changed the conversation

Salary vs market correlated with attrition at −0.07. Nearly nothing. This is not a compensation problem. The data said so clearly before any model was built.

When Diana presented to the CEO, the first slide was not a model output. It was the RTO chart: More office at 23.6%, More flexible at 7.5%. No statistical literacy required. Just policy action.
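The group comparison behind that chart is a one-line aggregation, because the mean of a 0/1 target within a group is that group's attrition rate. A minimal sketch on an invented ten-row sample, assuming the `persona_direction` and `left` columns described earlier:

```python
import pandas as pd

# Hypothetical mini-sample; the real dataset has 1,400 rows.
df = pd.DataFrame({
    "persona_direction": ["More office"] * 4 + ["More flexible"] * 4 + ["No change"] * 2,
    "left": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Mean of the 0/1 target per group = attrition rate per group.
attrition_by_group = df.groupby("persona_direction")["left"].mean()
print((attrition_by_group * 100).round(1))
```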

The exploration finding that validated the entire design: Engagement trend correlated at −0.502. Current engagement score correlated at −0.558. A gap of only 0.056. The direction of change over three years is almost as powerful as where someone stands today. This confirmed that building longitudinal trend features was the right architectural choice.

What this means for Diana

Do not wait for the score to drop to 20. Act when it starts dropping.

An employee trending from 70 → 55 → 40 is more urgent than one who has always been at 40. The trajectory is the early warning. The model knows this.

Section 04 — Feature Engineering

The work that happened before the model

Feature engineering is where data science actually lives. The 98% recall result was not a function of algorithm choice. It was a function of what we fed the algorithm.

1

From 68 columns to 31 meaningful features

The raw dataset had 68 columns. Feeding all 68 creates noise: highly correlated features dilute the signal and make the coefficients harder to interpret. We selected 31 features based on two criteria: absolute correlation with attrition above 0.15, and domain logic about what actually drives someone to leave.

Key decision: We kept current-year snapshots and engineered trends — but dropped the intermediate years. If we have engagement_2023 and engagement_2025, we do not need engagement_2024. The trend captures the history. Keeping all three gives correlated information three times over.

2

Engineering trend features — the longitudinal power

The most important step: calculating year-over-year trends. Not collected — calculated.

# Direction of change is more predictive than snapshot
engagement_trend    = engagement_2025    - engagement_2023
satisfaction_trend  = satisfaction_2025  - satisfaction_2023
career_growth_trend = career_growth_2025 - career_growth_2023
absence_trend       = absence_days_2025  - absence_days_2023

# engagement_trend of -32 means this person dropped
# 32 points over 3 years — the model reads freefall
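As a runnable illustration, the same subtraction can be applied column-wise in pandas. The sketch below uses the engagement values of the three employees shown in the dataset preview; using a DataFrame here is an assumption about how the table is held in memory.

```python
import pandas as pd

# Engagement values for the three employees in the dataset preview.
df = pd.DataFrame({
    "employee_id": ["CNF0001", "CNF0002", "CNF0003"],
    "engagement_2023": [67.2, 81.3, 68.4],
    "engagement_2025": [51.1, 75.1, 52.1],
})

# Trend = latest year minus first year; negative means declining engagement.
df["engagement_trend"] = (df["engagement_2025"] - df["engagement_2023"]).round(1)
print(df[["employee_id", "engagement_trend"]])
```

The results match the `trend` column in the preview: −16.1, −6.2, and −16.3.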

3

Building the RTO Risk Index — the novel composite

No single column captured the full return-to-office impact. We combined three signals.

RTO Risk Index =
  commute time added (0-100)     × 0.40
  persona change direction        × 0.30
  satisfaction drop since RTO     × 0.30

Persona direction scores:
  More office    → 80   (highest risk)
  No change      → 20   (baseline)
  More flexible  → 10   (protective factor)

Final correlation with attrition: +0.352
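The weighting can be sketched as a small function. This is an illustration under stated assumptions, not the project's code: the function name is hypothetical, and both numeric inputs are assumed to be already scaled to 0–100, since the scaling used in the original pipeline is not published.

```python
# Persona direction scores as listed above.
PERSONA_SCORE = {"More office": 80, "No change": 20, "More flexible": 10}


def rto_risk_index(commute_added_score: float,
                   persona_direction: str,
                   satisfaction_drop_score: float) -> float:
    """Weighted composite on a 0-100 scale.

    Assumption: both numeric inputs are already scaled to 0-100.
    """
    persona = PERSONA_SCORE[persona_direction]
    index = (0.40 * commute_added_score
             + 0.30 * persona
             + 0.30 * satisfaction_drop_score)
    return round(index, 1)


# Hypothetical employee: large commute shock, pushed into the office,
# moderate satisfaction drop.
print(rto_risk_index(60, "More office", 30))
```

Because the three weights sum to 1.0 and every component is on a 0–100 scale, the index stays on the same 0–100 scale as the survey scores.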

4

Encoding and scaling — making features compete fairly

Categorical columns (division, role level, work persona direction) were encoded as numbers. Then StandardScaler converted every feature to mean=0, std=1. Without this, engagement (0–100) would overpower tenure (0–25) purely because its numbers are larger.

Critical rule: The scaler is fitted on training data only — then applied to test data using the same parameters. Fitting on test data leaks future information into the scaling process and invalidates the evaluation. This rule is non-negotiable.
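In scikit-learn terms, the rule means calling `fit_transform` on the training rows and only `transform` on the test rows. A minimal sketch on synthetic stand-in data (two invented columns mimicking tenure and engagement; the 80/20 stratified split and seed are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: tenure in years (0-25) and engagement (0-100).
X = np.column_stack([rng.uniform(0, 25, 200), rng.uniform(0, 100, 200)])
y = rng.integers(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std from training rows only
X_test_scaled = scaler.transform(X_test)         # reuse those parameters; never refit here

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

The training columns come out with mean 0 and std 1 exactly; the test columns will be close but not exact, which is expected and correct.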

Section 05 — Model Selection

Five models. Same data. The data picks the winner.

Model selection is a controlled experiment. Same training data in. Same evaluation metric out. The only variable is the model. 5-fold cross validation removes luck from the evaluation.
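A race of this kind can be sketched with scikit-learn's `cross_val_score`. Everything below is an illustration: the data is a synthetic stand-in from `make_classification`, and the hyperparameters are library defaults chosen for the sketch (apart from the 100 trees mentioned for Random Forest), so the numbers it prints will not match the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 1,120 rows, 31 features, roughly 18% positives.
X, y = make_classification(n_samples=1120, n_features=31, n_informative=8,
                           weights=[0.82, 0.18], random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # Scaling lives inside the pipeline so each fold fits its own scaler.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="recall")
    results[name] = (scores.mean(), scores.std())

for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<20} recall={mean:.3f}  std={std:.3f}")
```

Putting the scaler inside the pipeline keeps the "fit on training data only" rule intact within every fold.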

The model race

5-Fold Cross Validation · Primary metric: Recall · 1,120 training employees

Logistic Regression: 96.6% (Winner)
SVM: 91.1%
Random Forest: 82.2%
Decision Tree: 79.2%
KNN: 39.5%

Model                  CV Recall  AUC    Std Dev
Logistic Regression ✓  96.6%      0.997  0.033
SVM                    91.1%      0.991  0.039
Random Forest          82.2%      0.976  0.046
Decision Tree          79.2%      0.853  0.064
KNN                    39.5%      0.938  0.075

── Fold-by-fold recall (consistency check) ──────

Model                Fold 1  Fold 2  Fold 3  Fold 4  Fold 5
Logistic Regression  97.5%   100.0%  100.0%  92.7%   92.7%
SVM                  95.0%   95.0%   92.5%   87.8%   85.4%
Random Forest        85.0%   87.5%   85.0%   75.6%   78.0%
Decision Tree        85.0%   85.0%   70.0%   73.2%   82.9%
KNN                  35.0%   27.5%   45.0%   48.8%   41.5%

Low std = consistent = trustworthy for production use

Why the simplest model won

The most sophisticated model did not win. The simplest one did. Random Forest, which builds 100 decision trees and takes a vote, came third at 82.2% recall. Logistic Regression, which fits a single linear decision boundary, hit 96.6%.

This is the most important finding of the model race. It suggests that feature engineering created clean, close to linearly separable signals. When features are well built, a simple model can beat a complex one. Here, complexity was unnecessary.

Why Random Forest underperformed

Random Forest's strength is finding complex non-linear patterns when features are messy. But engagement trend, satisfaction trend, and the RTO risk index are clean, powerful, linear signals. The forest added noise, not value. Feature engineering was the higher-leverage decision.

Why KNN failed completely

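KNN's low recall is most plausibly explained by the curse of dimensionality combined with class imbalance. In 31-dimensional feature space, distances between employees become far less informative, and with leavers making up only 18% of the data, a majority vote among nearest neighbours rarely favours "left". Its AUC of 0.938 shows that KNN still ranks employees reasonably well, so the weakness lies in the vote, not the ranking. KNN needs very few features or very large data to work in high dimensions. We had neither.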

Consistency is reliability

Logistic Regression's fold scores: 97.5%, 100%, 100%, 92.7%, 92.7%

Standard deviation of 0.033 — lowest of all five models. Diana needs a model that performs reliably every quarter, not one that is sometimes great and sometimes mediocre. Consistency is reliability. Reliability is trust.
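Both summary numbers can be reproduced from the five fold scores. The sketch below uses only the standard library; the published 0.033 is consistent with the population standard deviation (dividing by 5), which is also NumPy's default, so that convention is assumed here.

```python
from statistics import mean, pstdev

# Logistic Regression recall for folds 1-5, from the table above.
fold_recall = [0.975, 1.000, 1.000, 0.927, 0.927]

cv_mean = mean(fold_recall)
cv_std = pstdev(fold_recall)    # population std (divide by n), NumPy's default

print(f"mean recall = {cv_mean:.3f}, std = {cv_std:.3f}")
```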

Section 06 — Model Results

What the model learned — and what it means for Diana

The held-out test set is the only honest number. 280 employees the model has never seen. Training scores do not matter. Cross-validation averages do not matter. This is the real-world result.

── Final test results ───────────────────────────
Winner: Logistic Regression
Test set: 280 held-out employees

Recall:    98.0%
Precision: 83.1%
F1 Score:  0.899
AUC:       0.999

Confusion Matrix:
                 Pred Stay  Pred Quit
Actually Stayed     220        10
Actually Left         1        49

In plain English:
→ 49 quitters CAUGHT — Diana intervenes
→  1 quitter  MISSED — walked out
→ 10 false alarms — extra conversations

Business impact ($78k per departure):
→ Potential saves: $3,822,000
→ Missed cost:     $78,000
→ False alarms:    10 conversations
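Every headline number in this block follows from the four confusion matrix cells. The sketch below recomputes them with plain arithmetic so the figures can be checked by hand.

```python
# Confusion matrix counts from the held-out test set above.
tn, fp = 220, 10   # actually stayed: predicted stay / predicted quit
fn, tp = 1, 49     # actually left:   predicted stay / predicted quit

recall = tp / (tp + fn)                       # share of real leavers caught
precision = tp / (tp + fp)                    # share of flags that were real leavers
f1 = 2 * precision * recall / (precision + recall)

COST_PER_DEPARTURE = 78_000
potential_saves = tp * COST_PER_DEPARTURE
missed_cost = fn * COST_PER_DEPARTURE

print(f"recall={recall:.1%} precision={precision:.1%} f1={f1:.3f}")
print(f"potential saves=${potential_saves:,} missed=${missed_cost:,}")
```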

Intervention ROI

$3.0M: cost if Q2 at-risk employees leave
38×: return on intervention

What the model learned — coefficients

Logistic Regression produces one coefficient per feature. Negative = higher value pushes toward staying. Positive = higher value pushes toward leaving. These are what Diana shows the CEO to explain the model.

satisfaction_2025      −4.00
engagement_2025        −3.68
career_growth_2025     −3.35
engagement_trend       −1.68
org_disruption_score   +1.58
rto_risk_index         +1.00

Green bars = push toward staying · Red bars = push toward leaving

Individual explanation — CNF0011: Toronto · Pension Administration · Analyst. Score: 100%. Engagement dropped from 59 to 15 over 3 years (−44 points). RTO risk: 65/100. The engagement freefall alone contributed +8.6 to the model score. This person was not going to stay.
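An explanation like this comes from the way logistic regression builds its score: each feature contributes coefficient × standardized value to the log-odds, and the sigmoid turns the total into a probability. The sketch below uses the six coefficients reported above; the intercept and the employee's standardized values are hypothetical, chosen only to show the mechanics, and they do not reproduce CNF0011.

```python
import math

# Coefficients as reported above (a subset of the 31 features).
coef = {
    "satisfaction_2025": -4.00,
    "engagement_2025": -3.68,
    "career_growth_2025": -3.35,
    "engagement_trend": -1.68,
    "org_disruption_score": 1.58,
    "rto_risk_index": 1.00,
}
intercept = -2.0   # hypothetical; the fitted intercept is not published

# Hypothetical standardized (z-score) values for one employee.
z = {
    "satisfaction_2025": -1.2,
    "engagement_2025": -1.5,
    "career_growth_2025": -0.4,
    "engagement_trend": -2.0,
    "org_disruption_score": 0.8,
    "rto_risk_index": 1.1,
}

# Contribution to the log-odds = coefficient x standardized value.
contributions = {name: coef[name] * z[name] for name in coef}
log_odds = intercept + sum(contributions.values())
probability = 1 / (1 + math.exp(-log_odds))

for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name:<22} {value:+.2f}")
print(f"quit probability = {probability:.1%}")
```

A negative coefficient multiplied by a below-average (negative) standardized value gives a positive contribution, which is how a falling engagement score pushes the prediction toward leaving.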

Scoring 200 current employees — Q2 2026

Critical: 36 (Immediate action)
High: 3 (This quarter)
Medium: 5 (Monitor monthly)
Stable: 156 (No action needed)

ID       Division  Role      Loc      Score  Action
CNF1401  Pension   Manager   Calgary  100%   Urgent 1:1 HR BP
CNF1408  Wealth    Sr Anlst  Calgary  100%   Flexibility + career
CNF1413  Wealth    Director  Toronto  100%   Flexibility + career
CNF1460  Wealth    Manager   Toronto  100%   Comp review
CNF1446  Wealth    VP        Vanc.    100%   Flexibility + career

39 at risk · Est. cost if all leave: $3,042,000
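The cost figure is the at-risk headcount multiplied by the $78,000 replacement cost; the 39 shown in the app matches the Critical and High tiers combined (36 + 3). The sketch below checks that arithmetic and shows one way scores could be bucketed into tiers. The cut-offs in `risk_tier` are hypothetical, since the real thresholds are not published.

```python
COST_PER_DEPARTURE = 78_000


def risk_tier(score: float) -> str:
    """Map a predicted quit probability (0-1) to a tier.

    The cut-offs are hypothetical; the real thresholds are not published.
    """
    if score >= 0.80:
        return "Critical"
    if score >= 0.60:
        return "High"
    if score >= 0.40:
        return "Medium"
    return "Stable"


# Tier counts from the Q2 2026 scoring run above.
tier_counts = {"Critical": 36, "High": 3, "Medium": 5, "Stable": 156}
at_risk = tier_counts["Critical"] + tier_counts["High"]
cost_if_all_leave = at_risk * COST_PER_DEPARTURE

print(f"{at_risk} at risk, estimated cost ${cost_if_all_leave:,}")
```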

Section 07 — Deployment

From model to tool Diana actually uses

A model living in a notebook is not a product. A tool that Diana opens on Monday morning, feeds her quarterly data, and uses to hand a ranked list to HR business partners is a product.

The production pipeline

Three scripts. Three jobs. The data scientist's work is entirely separate from the HR team's work. Diana never sees the training code. The data scientist never touches Diana's quarterly upload.

Annual · Generate Data · generate_data.py
Annual · Train Model · train_pipeline.py
Quarterly · Score Employees · score_pipeline.py
Quarterly · Diana Acts · app.py

Mode 1 — Annual

Training

Data scientist runs train_pipeline.py on confirmed historical data. Three files saved: model.pkl, scaler.pkl, features.pkl. Model is frozen.

Mode 2 — Quarterly

Scoring

Diana's team uploads fresh employee data. Frozen model scores everyone. Returns the ranked list. No retraining. No data scientist needed.
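The two modes can be sketched end to end: train once, pickle the three artifacts, then load them later and score without refitting. This is a minimal illustration, not the proprietary pipeline. The data is synthetic, the three-feature list is an invented subset, and only the artifact file names (model.pkl, scaler.pkl, features.pkl) come from the description above.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

features = ["engagement_2025", "engagement_trend", "rto_risk_index"]  # illustrative subset

# Mode 1 (annual): train on synthetic stand-in data and freeze the artifacts.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, len(features)))
y_train = (X_train[:, 2] - X_train[:, 0] + rng.normal(scale=0.5, size=300) > 1).astype(int)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)

artifact_dir = Path(tempfile.mkdtemp())
for name, obj in [("model", model), ("scaler", scaler), ("features", features)]:
    with open(artifact_dir / f"{name}.pkl", "wb") as fh:
        pickle.dump(obj, fh)


# Mode 2 (quarterly): load the frozen artifacts and score, with no refitting.
def load(name):
    with open(artifact_dir / f"{name}.pkl", "rb") as fh:
        return pickle.load(fh)


frozen_model, frozen_scaler, frozen_features = load("model"), load("scaler"), load("features")
X_new = rng.normal(size=(5, len(frozen_features)))
scores = frozen_model.predict_proba(frozen_scaler.transform(X_new))[:, 1]
ranked = np.argsort(-scores)   # row positions, highest risk first

print(ranked, scores[ranked].round(3))
```

Saving the feature list alongside the model lets the scoring step check that the quarterly upload has the same columns, in the same order, as the training data.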

Data privacy and proprietary methodology

This demonstration runs on synthetic data only. The training pipeline, feature engineering logic, and source code are not publicly available. The live demo shows the methodology in action. Deployment for a real organization requires a separate engagement to retrain on actual HRIS data within the client's environment. Your data never leaves your infrastructure.

Try the demo: Download the sample dataset from the app. Open it in Excel. Change engagement scores, RTO risk, or satisfaction trends. Upload your modified version and watch the predictions change in real time. The model is responding to your inputs.

↗ Open Live Demo

can-north-flight-risk.streamlit.app

🏦 CAN North Financial

Employee Flight Risk Intelligence System

DEMONSTRATION VERSION — Synthetic data. Download the sample dataset, tweak values in Excel, upload and see predictions change in real time.

✓ Model loaded — ready to score employees

Critical: 36 · High: 3 · Medium: 5 · Stable: 156

39 employees at risk — estimated cost if all leave: $3,042,000
