Pay Equity Intelligence System
A female Director at CAN North earns $11,636 less than her male equivalent. Same role. Same grade. She does not know. Her manager does not know. The Pay Equity Commissioner will eventually know — and the organization will be accountable. This system finds the gap, measures it precisely, and produces a legally defensible plan to close it before the Commissioner arrives.
Demonstration only. CAN North Financial is a fictional organization. All employee data is synthetically generated — no real people, no real salaries, no data retained. Legislation references are real and sourced directly from official government websites as cited throughout.

What this system does

Pay Equity Intelligence identifies, measures, and remediates gender-based pay gaps in a legally defensible, audit-ready format. It separates the gap that can be explained by legitimate factors — role, tenure, performance — from the gap that cannot. The unexplained portion is the legal exposure.

The system produces a formal pay equity report, a phased remediation plan, and an AI agent that answers questions against both the analysis results and approved legislation. It is the tool Diana Chen, Head of People Analytics at CAN North Financial, takes to the Pay Equity Commissioner.

"Pay equity analysis is not about whether you intend to discriminate. It is about whether the numbers show that you do. This system makes the numbers impossible to ignore — and gives you a plan to fix them."
Overall Gap
$5,807
5.2% of male average pay · statistically significant
Employees Requiring Adjustment
234
Total remediation cost: $1,684,100
Legal minimum (1% payroll): $1,513,371
Net business benefit: $4,616,200

Where this fits

Pillar 1 — Pay Equity ✓

This project. Measures the unexplained gender gap. Produces the remediation plan. Agent 1 answers questions under the Act.

Pillar 2 — Job Evaluation

Evaluates the intrinsic worth of each job using O*NET point factor data. Agent 2 answers: is this role graded correctly?

In Development

Pillar 3 — Market Benchmarking

Connects job evaluation to ESDC wage data via NOC codes. Agent 3 answers: are we paying competitively?

In Development

How it connects to Flight Risk

This is the same 1,400-employee universe as the Flight Risk Intelligence system: same CAN North Financial, same Diana Chen, same divisions. The two platforms are designed to be read together. Flight Risk shows who will leave. Pay Equity shows where pay is unfair. The Master Orchestrator Agent will eventually connect both.

pandas · scipy · OLS regression · Oaxaca-Blinder decomposition · linear programming · Anthropic Claude API · Streamlit · Pay Equity Act S.C. 2018 · Ontario Pay Equity Act R.S.O. 1990
The problem Diana cannot ignore
Pay equity is not optional. It is a legal obligation with a named regulator, a posted notice requirement, and enforcement powers. The question is not whether to comply — it is whether you discover the problem before the Commissioner does.

What the law requires

Who it covers (federal Pay Equity Act)

All federally regulated employers with 10 or more employees. CAN North Financial, as a financial services firm, falls under federal jurisdiction.

What it requires (federal Pay Equity Act)

Employers must establish a pay equity plan, identify gaps, and post a notice of the plan. For employers with 100+ employees, remediation may be phased over a maximum of three years. Each year, the minimum spend is 1% of the previous year's total payroll.

Who it covers (Ontario Pay Equity Act)

All public sector employers in Ontario, and private sector employers with 10 or more employees. Ontario's regime was one of the first in North America and remains one of the most detailed.

What it requires (Ontario Pay Equity Act)

Employers must achieve and maintain pay equity between job classes where female-dominated classes are compared to male-dominated classes. Section 13(4) contains the same 1% minimum annual spend rule as the federal Act.

The business case — why this is an investment, not a cost

Legal Risk Avoided
The model estimates legal exposure at 3× the remediation cost — Commissioner enforcement orders, financial penalties, and reputational damage.
Retention Value
35%
Senior women (Director/VP) currently underpaid. Closing gaps reduces senior female attrition by an estimated 35%. Replacing a VP costs $78,000.
Net Benefit
$4.6M
Total benefit of retention value plus legal risk avoided, net of the $1.68M remediation cost.
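
The net benefit figure can be checked with a few lines of arithmetic. This is a minimal sketch that uses only numbers quoted on this page; the formula (retention value plus legal risk avoided, minus remediation cost) is taken from the description above, and the retention value is backed out from it rather than reported directly by the system.

```python
# Figures quoted in this document.
REMEDIATION_COST = 1_684_100   # total remediation cost
NET_BENEFIT = 4_616_200        # net business benefit
LEGAL_MULTIPLIER = 3           # legal exposure modelled at 3x remediation cost
VP_REPLACEMENT_COST = 78_000   # cost to replace one VP

legal_risk_avoided = LEGAL_MULTIPLIER * REMEDIATION_COST

# Assumption: net benefit = retention value + legal risk avoided - remediation,
# so the retention value is implied by the other three figures.
implied_retention_value = NET_BENEFIT + REMEDIATION_COST - legal_risk_avoided

print(f"Legal risk avoided:      ${legal_risk_avoided:,}")
print(f"Implied retention value: ${implied_retention_value:,}")
print(f"Equivalent VP replacements: {implied_retention_value / VP_REPLACEMENT_COST:.1f}")
```

Under that assumption the implied retention value is $1,248,000, which equals the cost of replacing 16 VPs at $78,000 each.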

Why this is hard without the right tool

The raw gap is easy to calculate. The difficult question is: how much of that gap is explained by legitimate factors — role level, tenure, performance, location — and how much is not? The unexplained portion is what the law requires you to fix. Without a regression model, you cannot separate the two.

Most organizations stop at the raw gap. They look at average male salary versus average female salary and conclude the gap is explained by the fact that more men are in senior roles. This is incomplete. Pay equity law requires comparison within comparable job classes, controlling for legitimate factors. The within-class, controlled gap is the legally actionable number.

The data model — built for legal defensibility
1,400 CAN North Financial employees · synthetic data · 10 features · gender gap deliberately embedded for detection validation

Why synthetic data

Real salary data is among the most sensitive personal data an organization holds. For a demonstration portfolio, synthetic data is the only responsible choice. The data is mathematically generated to mirror the structure of real salary data at a Canadian financial services firm — right-skewed distribution, role-driven compensation hierarchy, division and location premiums — with a deliberately embedded gender gap to validate that the analysis can find what it is designed to find.

A model that cannot detect a gap we deliberately planted is not trustworthy. The embedded gap is the test. Detection proves the methodology works.

Data structure

Column | Type | Notes
employee_id | String | Masked (CNF0001 format). Never a model feature.
gender | Categorical | Male / Female. Excluded from the salary model by design.
job_class | String | Human-readable title, e.g. Wealth Analyst, VP Pension.
job_grade | Ordinal | Grade 3 to Grade 7. Primary grouping for analysis.
role_level | Categorical | Analyst / Sr Analyst / Manager / Director / VP
role_level_encoded | Integer | 1–5. Ordinal encoding preserving hierarchy.
division | Categorical | Pension Admin / Wealth Mgmt / Retail Banking
location | Categorical | Toronto / Calgary / Vancouver
tenure_years | Float | 0–25. More tenure = higher pay, diminishing returns.
performance | Integer | 0–100. Centered at 70 average; low variance.
salary | Integer | Annual base CAD. Target variable for the model.
Base Gender Gap
−$6,500
Applied to all female employees regardless of role. This is the baseline discriminatory component.
Senior Grade Premium Gap (Grade 6–7)
−$4,000
Additional gap at senior grades. Reflects real-world pattern where gaps widen at higher levels.
Why this structure? Real pay gaps are not uniform. They are larger at senior levels and not explained by role alone. The synthetic structure mimics this pattern so the analysis must work harder to find it — exactly as it would in a real audit.
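
The idea of planting a gap and then checking that it can be recovered can be sketched as follows. This is an illustration only, not the contents of generate_data.py: the two gap sizes come from the figures above, while the per-grade base pay, the square-root tenure curve, and the noise level are hypothetical values chosen for the example.

```python
import random
from statistics import mean

BASE_GAP = 6_500     # applied to every female employee (from the text)
SENIOR_GAP = 4_000   # additional gap at grades 6-7 (from the text)

# Hypothetical base pay per grade; the real generator's values are not published.
GRADE_BASE = {3: 70_000, 4: 90_000, 5: 115_000, 6: 150_000, 7: 200_000}


def make_employee(rng, idx):
    grade = rng.choice(list(GRADE_BASE))
    gender = rng.choice(["Male", "Female"])
    tenure = round(rng.uniform(0, 25), 1)
    # Diminishing returns on tenure: the square-root shape is an assumption.
    salary = GRADE_BASE[grade] + 3_000 * tenure ** 0.5 + rng.gauss(0, 4_000)
    if gender == "Female":
        salary -= BASE_GAP
        if grade >= 6:
            salary -= SENIOR_GAP
    return {
        "employee_id": f"CNF{idx:04d}",
        "gender": gender,
        "job_grade": grade,
        "tenure_years": tenure,
        "salary": round(salary),
    }


def generate(n=1_400, seed=42):
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    return [make_employee(rng, i) for i in range(1, n + 1)]


def gap_by_grade(rows):
    """Average male salary minus average female salary within each grade."""
    gaps = {}
    for grade in GRADE_BASE:
        men = [r["salary"] for r in rows if r["job_grade"] == grade and r["gender"] == "Male"]
        women = [r["salary"] for r in rows if r["job_grade"] == grade and r["gender"] == "Female"]
        gaps[grade] = mean(men) - mean(women)
    return gaps


employees = generate()
for grade, gap in gap_by_grade(employees).items():
    print(f"Grade {grade}: ${gap:,.0f}")
```

Because the gap is planted inside each grade, comparing men and women within a grade should recover a positive gap everywhere, and a larger one at grades 6 and 7.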

Salary drivers — the three tiers

Tier 1 — Dominant drivers

Role level (Analyst vs VP is a 3× salary difference). Division (Wealth Management pays more than Retail Banking). These two factors explain most of the variation.

Tier 2 — Moderate drivers

Tenure (more years = higher pay, but diminishing returns). Performance (weak effect — scores cluster around 70, low variance). Location (Toronto premium).

Tier 3 — Should NOT drive salary

Gender. Zero legitimate influence on pay. Any measurable effect after controlling for Tier 1 and 2 is the unexplained gap. That is the equity problem.

What the analysis found
Exploratory analysis confirmed the gap exists across every role level and division. Then the regression model measured how much of it is unexplained.

The raw gap — by role level

The raw gap is the simple difference in average salaries. It exists at every level — which matters. A gap only at VP level could be explained by seniority mix. A gap at Analyst level, where men and women are equally new to the organization, has no structural explanation.

Analyst
$5,817
Significant
Senior Analyst
$7,956
Significant
Manager
$5,925
Significant
Director
$11,636
Significant
VP
$12,208
Significant
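
A "Significant" label of this kind typically comes from a two-sample test on the salaries within one role level. The sketch below computes the raw gap and a Welch t statistic with the standard library only. The p-value uses a normal approximation, which is an assumption that holds only for large groups; in the stack listed on this page, scipy.stats.ttest_ind with equal_var=False would be the usual choice. The salaries are hypothetical, not CAN North data, and the groups are kept small only for brevity.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_gap(male, female):
    """Return (raw gap, Welch t statistic, approximate two-sided p-value)."""
    gap = mean(male) - mean(female)
    se = sqrt(variance(male) / len(male) + variance(female) / len(female))
    t = gap / se
    # Normal approximation to the t distribution: reasonable for large groups,
    # optimistic for the tiny illustrative groups used below.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return gap, t, p


# Hypothetical salaries for a single role level.
male = [72_000, 75_500, 71_000, 78_000, 74_500, 76_000, 73_000, 77_500]
female = [66_000, 70_000, 65_500, 71_500, 68_000, 69_500, 67_000, 72_000]

gap, t, p = welch_gap(male, female)
print(f"gap=${gap:,.0f}  t={t:.2f}  p~{p:.6f}")
```

Running the same function once per role level would give one gap and one significance flag per level.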

The regression model — predicting fair pay

The model predicts what each employee should earn, based only on legitimate factors: role level, tenure, performance, division, and location. Gender is deliberately excluded. This is the architecture of a legally defensible pay equity model — we predict fair pay without gender, then check whether actual pay differs by gender after controlling for everything.

Features used:
  role_level_encoded ← strongest signal (r=0.94)
  tenure_years ← moderate
  performance ← weak (low variance)
  div_Retail_Banking ← vs Pension Admin baseline
  div_Wealth_Management ← vs Pension Admin baseline
  loc_Toronto ← vs Calgary baseline
  loc_Vancouver ← vs Calgary baseline
Excluded: gender (not a legitimate pay factor)
Excluded: years_in_role (multicollinear with tenure)
Excluded: employee_id (identifier only)
Algorithm: Ordinary Least Squares (OLS)
Train/test: 80/20 split · 1,120 training employees
R² — Test Set
0.940
Legitimate features explain 94% of salary variation. 6% is unexplained — and some of that unexplained portion is the equity gap.
Mean Absolute Error
$7,510
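On average, the model predicts salary within $7,510. That error is larger than the $5,807 gap for any single employee, but the gap is an average across hundreds of employees, so it remains measurable and statistically significant.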
No overfitting — training R² (0.933) and test R² (0.940) are nearly identical. The model generalizes well.
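
The "predict fair pay without gender, then compare residuals by gender" pattern can be sketched with NumPy. This is not the analyse.py pipeline: the data is simulated, only two of the features are used, and the coefficients and the planted 6,000 gap are hypothetical values chosen so the check has something to find.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_400

role = rng.integers(1, 6, n)      # role_level_encoded, 1-5
tenure = rng.uniform(0, 25, n)    # tenure_years
female = rng.integers(0, 2, n)    # used only AFTER the model is fitted

# Hypothetical salary process with a planted gap of 6,000.
salary = (40_000 + 30_000 * role + 800 * tenure
          - 6_000 * female + rng.normal(0, 5_000, n))

# 80/20 split: 1,120 training rows, 280 test rows.
order = rng.permutation(n)
train, test = order[:1_120], order[1_120:]

# Design matrix with an intercept and legitimate factors only (no gender).
X = np.column_stack([np.ones(n), role, tenure])
beta, *_ = np.linalg.lstsq(X[train], salary[train], rcond=None)

predicted = X @ beta
residual = salary - predicted

# R-squared on the held-out rows.
ss_res = np.sum(residual[test] ** 2)
ss_tot = np.sum((salary[test] - salary[test].mean()) ** 2)
r2_test = 1 - ss_res / ss_tot

# If pay were fair, average residuals would be the same for both groups.
residual_gap = residual[female == 0].mean() - residual[female == 1].mean()

print(f"test R2 = {r2_test:.3f}")
print(f"residual gap (male - female) = ${residual_gap:,.0f}")
```

The design choice worth noting is that gender never enters the design matrix. It is used only to group the residuals afterwards, so the model itself cannot learn to reproduce the gap.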
Separating explained from unexplained
The Oaxaca-Blinder decomposition is the statistical method used by economists, governments, and courts worldwide to measure discrimination in wages. It splits the gap into two parts: the part we can explain and the part we cannot.

Why this matters legally

The raw gap includes both legitimate differences (men have slightly longer average tenure at CAN North; more men are in Wealth Management, which pays more) and discriminatory differences (same role, same tenure, different pay because of gender). The law only requires you to fix the unexplained part — but you must prove you have identified it correctly.

Imagine two Wealth Analysts. Both joined in 2021. Both score 74 on performance. One is male, one is female. The model predicts they should each earn the same amount. One earns $7,955 more. That difference — with identical characteristics — is the Oaxaca-Blinder unexplained gap. That is what this system is designed to find and fix.

The decomposition result

Raw Gap (Overall)
$5,807
Simple difference in average male vs female salary. Includes both explained and unexplained portions.
Explained Portion
~$0
Role mix, tenure, and location differences between genders. This is legitimate — not legally actionable.
Unexplained Gap
$5,807
The gap that remains after controlling for every legitimate factor. This is the legally actionable number Diana defends to the Commissioner.
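
A two-fold Oaxaca-Blinder decomposition can be written in a few lines. The sketch below uses the male coefficients as the reference, which is one common convention and an assumption here, since the source does not say which reference the system uses. The data is simulated: both groups share the same role mix, men are given one extra year of tenure, and a gap of 6,000 is planted.

```python
import numpy as np


def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def oaxaca_blinder(X_m, y_m, X_f, y_f):
    """Two-fold decomposition with male coefficients as the reference.

    explained   = (mean X_m - mean X_f) . beta_m   (differences in characteristics)
    unexplained = mean X_f . (beta_m - beta_f)     (differences in how they are paid)
    """
    b_m, b_f = ols(X_m, y_m), ols(X_f, y_f)
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    explained = (xbar_m - xbar_f) @ b_m
    unexplained = xbar_f @ (b_m - b_f)
    return explained, unexplained


rng = np.random.default_rng(1)
n = 700  # employees per group (hypothetical)

role = rng.integers(1, 6, n).astype(float)
tenure_f = rng.uniform(0, 24, n)
tenure_m = tenure_f + 1.0  # hypothetical: men have one more year of tenure

X_f = np.column_stack([np.ones(n), role, tenure_f])
X_m = np.column_stack([np.ones(n), role, tenure_m])

true_beta = np.array([40_000.0, 30_000.0, 800.0])  # hypothetical pay structure
y_m = X_m @ true_beta + rng.normal(0, 5_000, n)
y_f = X_f @ true_beta - 6_000 + rng.normal(0, 5_000, n)  # planted gap

explained, unexplained = oaxaca_blinder(X_m, y_m, X_f, y_f)
raw_gap = y_m.mean() - y_f.mean()

print(f"raw gap     = ${raw_gap:,.0f}")
print(f"explained   = ${explained:,.0f}")
print(f"unexplained = ${unexplained:,.0f}")
```

Because each regression includes an intercept, the two parts add up to the raw gap. In this simulated example the explained part reflects the extra year of tenure, and the unexplained part recovers the planted gap.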
From gap to remediation plan
The analysis identifies who needs an adjustment and by how much. The plan optimizes which employees to adjust first, subject to the legislated budget constraint.

The legislated budget

The Pay Equity Act does not ask how much the employer wants to spend. It sets a minimum floor: 1% of total payroll per year. Employers may spend more. They cannot spend less. For CAN North Financial with a total annual payroll of approximately $151 million, the annual minimum is $1,513,371.

Total Payroll
$151.3M
Annual base pay, all 1,400 employees
Legal Minimum (1%)
$1.51M
Minimum annual spend required by the Act
Total Remediation
$1.68M
Exceeds 1% threshold — phased plan permitted
Employees Affected
234
Underpaid female employees below 95% compa-ratio
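
The 95% compa-ratio screen can be sketched as below. Two things are assumptions, because the source does not define them: the compa-ratio is taken to be actual salary divided by the model's predicted fair pay, and the adjustment is taken to be the amount needed to lift pay to the 95% threshold. The employees listed are hypothetical.

```python
THRESHOLD = 0.95  # compa-ratio threshold quoted in the text


def compa_ratio(actual, predicted):
    # Assumption: ratio of actual pay to model-predicted fair pay.
    return actual / predicted


def required_adjustment(actual, predicted, target=THRESHOLD):
    """Dollars needed to lift pay to target x predicted (0 if already there)."""
    return max(0.0, target * predicted - actual)


# Hypothetical rows: (employee_id, gender, actual salary, predicted fair pay)
staff = [
    ("CNF0001", "Female", 88_000, 100_000),
    ("CNF0002", "Female", 97_000, 100_000),
    ("CNF0003", "Male", 90_000, 100_000),
]

flagged = {
    emp_id: required_adjustment(actual, predicted)
    for emp_id, gender, actual, predicted in staff
    if gender == "Female" and compa_ratio(actual, predicted) < THRESHOLD
}

for emp_id, amount in flagged.items():
    print(f"{emp_id}: adjust by ${amount:,.0f}")
```

Only the first employee is flagged here: the second is above the threshold, and the third is outside the group being remediated.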

Risk-prioritized optimization

When total remediation exceeds the 1% floor, a phased plan is legally permitted. But the order of adjustments matters. The system assigns each affected employee a risk score weighted across three components, then uses linear programming to allocate the annual budget to maximize risk closed per dollar spent.

Legal risk
50%
Flight risk
30%
Reputational risk
20%

Legal carries the most weight because the Commissioner focuses enforcement on the largest gaps. Flight risk is second — a departing senior woman costs $78,000 to replace.
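
The prioritization can be illustrated with a simplified stand-in. The source names linear programming but does not publish its formulation, so the sketch below swaps in a greedy risk-per-dollar heuristic: rank employees by weighted risk divided by adjustment cost, fund whole adjustments until the annual budget runs out, and defer the rest to the next year. The weights come from the text; the component scores (on a 0 to 1 scale), the employee IDs, and the budget in the example are hypothetical.

```python
from dataclasses import dataclass

WEIGHTS = {"legal": 0.5, "flight": 0.3, "reputational": 0.2}  # from the text


@dataclass
class Case:
    employee_id: str
    adjustment: float     # dollars needed to close this employee's gap
    legal: float          # component scores, hypothetical 0-1 scale
    flight: float
    reputational: float

    @property
    def risk(self):
        return (WEIGHTS["legal"] * self.legal
                + WEIGHTS["flight"] * self.flight
                + WEIGHTS["reputational"] * self.reputational)


def plan_phases(cases, annual_budget):
    """Greedy heuristic: highest risk closed per dollar first, whole adjustments only.

    Returns one list of employee IDs per year.
    """
    queue = sorted(cases, key=lambda c: c.risk / c.adjustment, reverse=True)
    phases = []
    while queue:
        remaining, this_year, deferred = annual_budget, [], []
        for case in queue:
            if case.adjustment <= remaining:
                this_year.append(case.employee_id)
                remaining -= case.adjustment
            else:
                deferred.append(case)
        if not this_year:
            raise ValueError("an adjustment exceeds the annual budget")
        phases.append(this_year)
        queue = deferred
    return phases


cases = [
    Case("CNF0101", 10_000, legal=0.9, flight=0.8, reputational=0.5),
    Case("CNF0102", 6_000, legal=0.4, flight=0.2, reputational=0.1),
    Case("CNF0103", 8_000, legal=0.7, flight=0.9, reputational=0.6),
]

phases = plan_phases(cases, annual_budget=20_000)
for year, ids in enumerate(phases, start=1):
    print(f"Year {year}: {ids}")
```

A real linear program would also let other constraints be added, such as the three-year phase-in limit; the greedy rule is shown only because it makes the "risk closed per dollar" ordering easy to see.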

Year | Employees | Spend | Status
Year 1 | 225 | $1,510,900 | Legal minimum met
Year 2 | 9 | $173,200 | Gap closed
Legislative reference: Pay Equity Act, S.C. 2018, c. 27, s. 416 — Section 61(2). Phased implementation for 100+ employee organizations. Maximum phase-in: 3 years. Minimum annual spend: 1% of previous year payroll.
An agent that answers questions under the Act
Diana does not want to re-read the analysis every time a board member asks a question. The Pay Equity Agent reads the analysis results and the approved legislation — and answers in plain English.

What the agent knows

The agent is deliberately constrained. It has exactly two knowledge sources and nothing else. This is not a limitation — it is a legal architecture decision. When Diana presents this analysis to the Pay Equity Commissioner, every answer must be traceable to either the data or the statute. An agent that searches the internet or invents answers is a liability, not an asset.

Knowledge Source 1

The analysis results from the current session — the gap findings, the decomposition, the remediation plan, the list of affected employees. This resets with each new analysis. The agent never confuses one organization's data with another's.

Knowledge Source 2

Three approved legislation URLs only: the federal Pay Equity Act, the Ontario Pay Equity Act, and the Ontario Pay Equity Commission guidance. The agent fetches and reads these directly — no cached summaries that could be out of date.

Format — every answer follows the Whizlink framework

WHAT

The direct answer to the question. One or two sentences.

WHY

The legislative basis or analytical methodology behind the answer.

SO WHAT

The business or legal implication for Diana and CAN North.

HOW

The next action — what Diana or the organization should do with this information.

⚠ Every agent response includes a declaration: this is not legal advice. The agent provides information based on publicly available legislation and synthetic analysis data. Diana should consult qualified legal counsel for decisions with legal consequences.
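
The constraints described above (an allow-list of legislation URLs, a fixed four-part answer format, and a mandatory disclaimer) can be enforced in plain code around whatever model is called. The sketch below is not agent_pay_equity.py and makes no API call. The URLs are placeholders, since the three approved addresses are not listed on this page, and the function names are hypothetical.

```python
# Placeholder URLs: the three approved sources are not listed in this document.
APPROVED_URLS = (
    "https://example.org/federal-pay-equity-act",
    "https://example.org/ontario-pay-equity-act",
    "https://example.org/ontario-pay-equity-commission-guidance",
)
SECTIONS = ("WHAT", "WHY", "SO WHAT", "HOW")
DISCLAIMER = "This is not legal advice."


def is_approved(url):
    """Only exact matches against the allow-list may be fetched."""
    return url in APPROVED_URLS


def build_system_prompt(analysis_summary):
    """Combine the two knowledge sources and the format rules into one prompt."""
    sources = "\n".join(f"- {url}" for url in APPROVED_URLS)
    return (
        "Answer only from the analysis results and the approved legislation below.\n"
        f"Analysis results:\n{analysis_summary}\n"
        f"Approved legislation URLs:\n{sources}\n"
        f"Structure every answer as: {', '.join(SECTIONS)}.\n"
        f"End every answer with: {DISCLAIMER}"
    )


def is_well_formed(answer):
    """Check the four headings appear once each, in order, plus the disclaimer."""
    headings = [
        line.split(":", 1)[0].strip()
        for line in answer.splitlines()
        if line.split(":", 1)[0].strip() in SECTIONS
    ]
    return headings == list(SECTIONS) and DISCLAIMER in answer


sample = "\n".join([
    "WHAT: The overall gap is $5,807.",
    "WHY: It is the portion left after controlling for legitimate factors.",
    "SO WHAT: This is the figure the Commissioner will examine.",
    "HOW: Begin the phased remediation plan.",
    DISCLAIMER,
])

print(is_approved("https://example.org/federal-pay-equity-act"))
print(is_well_formed(sample))
```

Checking the format after the model responds, rather than trusting the prompt alone, keeps every answer traceable to the structure described above.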

Sample questions the agent handles

Question | What the agent draws on
"What is our overall pay gap and is it significant?" | Analysis results: raw gap, t-test p-value
"Which job grade has the largest gap?" | Analysis results: grade-level gap table
"What does Section 61(2) say about phased implementation?" | Federal Pay Equity Act (live URL fetch)
"Do we have to post a pay equity notice?" | Federal Act + Ontario Act (legislation)
"How many employees need adjustments in Year 1?" | Analysis results: phased plan
"What is the 1% rule and does it apply to us?" | Both Acts: Section 61(2) and Section 13(4)
From script to tool Diana actually uses
The analysis pipeline runs on a local machine. The Streamlit application wraps it in a three-path interface that works for a demo, for tweaking, and for real data.

The pipeline architecture

Data
generate_data.py
Synthetic 1,400-employee dataset with embedded gap
Analysis
analyse.py
EDA → regression → decomposition → remediation plan
Intelligence
agent_pay_equity.py
Claude API + legislation URLs + session results
Interface
app.py
Streamlit — three paths + agent tab

Three paths through the app

1

See the Demo

One click runs the full analysis on CAN North's 1,400-employee dataset. The complete pay equity report appears immediately — gap findings, statistical tests, decomposition, and a phased remediation plan. The Agent tab becomes available to ask questions against the results.

2

Tweak the Demo Data

Download the synthetic CSV. Change values in Excel — widen the gap, change the grade distribution, adjust tenure. Re-upload and re-run. This path shows how the analysis responds to different data structures and helps users understand what drives the findings.

3

Upload Your Own Data

Download the template CSV with column headers and instructions. Fill it with masked employee data. Upload. Choose a remediation budget. Run the analysis. Download the report, the remediation list, and the phased plan — ready to present to the Pay Equity Commissioner.
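
Before an uploaded file reaches the analysis, it helps to check that it matches the template. The sketch below is one way to do that with the standard library. The required columns are assumed to be those in the data structure table on this page, and the function name and the specific checks are hypothetical, not taken from app.py.

```python
import csv
import io

# Assumption: the template uses the columns listed in the data structure table.
REQUIRED_COLUMNS = [
    "employee_id", "gender", "job_class", "job_grade", "role_level",
    "role_level_encoded", "division", "location", "tenure_years",
    "performance", "salary",
]


def validate_upload(csv_text):
    """Return a list of readable problems; an empty list means the file is usable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["gender"] not in {"Male", "Female"}:
            problems.append(f"line {line_no}: gender must be Male or Female")
        try:
            if float(row["salary"] or "") <= 0:
                problems.append(f"line {line_no}: salary must be positive")
        except ValueError:
            problems.append(f"line {line_no}: salary is not a number")
    return problems


good = (
    ",".join(REQUIRED_COLUMNS) + "\n"
    "CNF0001,Female,Wealth Analyst,3,Analyst,1,Wealth Mgmt,Toronto,2.5,74,68000\n"
)
bad = (
    ",".join(REQUIRED_COLUMNS) + "\n"
    "CNF0002,F,Wealth Analyst,3,Analyst,1,Wealth Mgmt,Toronto,2.5,74,abc\n"
)

print(validate_upload(good))
print(validate_upload(bad))
```

Returning every problem at once, rather than stopping at the first, lets the person fixing the file correct it in one pass.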

What is not pushed to the repo

The same philosophy as the Flight Risk system: the methodology stays on the developer's machine. Only the app interface and the pre-generated analysis outputs go to Streamlit Cloud. The analysis pipeline, the data generation logic, and the regression notebooks are proprietary.

Pushed to Streamlit Cloud:
  app.py ← the interface
  requirements.txt ← dependencies
  outputs/ ← pre-run analysis results
Stays on local machine:
  generate_data.py ← synthetic data methodology
  analyse.py ← the full regression pipeline
  agent_pay_equity.py ← agent architecture
  data/ ← raw employee data