Conference Programme

Two days
Three tracks

4 Keynotes · 24 Talks · 6 Lightning Sessions · 2 Panels · 400 Attendees

Day 01

Saturday, 11 July 2026

Research · Applied · Ethics

Research & Foundations

Applied ML & Engineering

Ethics, Society & Policy

8:30 am · Registration & Morning Coffee · 30 min

9:00 am · Keynote · Plenary

20 min

Opening Remarks

Dr. Leah Ramirez

9:20 am · Keynote · Plenary

45 min

The Bitter Lesson Revisited: What Scaling Actually Taught Us

Prof. Bernard Okafor

10:05 am · Short Break · 10 min

10:15 am

Talk · Research

Toward Mechanistic Interpretability at Scale: Sparse Autoencoders Beyond Toy Models

Dr. Jason Lee

Talk · Applied

Shipping LLM Products Without Breaking Everything: Lessons from 18 Months in Production

Samantha Patel

Talk · Ethics

Auditing Foundation Models: Methodologies, Gaps, and Who Should Pay for It

Malik Mensah

10:50 am

Talk · Research

Data Mixture Laws: How Corpus Composition Predicts Downstream Capability

Dr. Silvia Rossi

Talk · Applied

Evaluation Pipelines That Don't Lie to You: Building Ground Truth at Scale

Jonathan Kim

Talk · Ethics

Environmental Accounting in ML: Why Current Carbon Estimates Are Almost Certainly Wrong

Dr. Emma Park

11:25 am

Lightning Talks · Research

Lightning Talks - Research Track

RoPE Embeddings Don't Generalise the Way You Think
Michael Chen
Emergent Structure in Latent Space: A Geometric Perspective
Priyanka Das
Forgetting as a Feature: Controlled Unlearning in Fine-Tuned Models
Dr. Tomas Alvarez

Lightning Talks · Applied

Lightning Talks - Applied Track

Context Window Management for Long-Running Agents
Grace Liu
When RAG Goes Wrong: A Postmortem Anthology
Daniel Gomez
Prompt Versioning in the Real World
Maya Kaur

Lightning Talks · Ethics

Lightning Talks - Ethics Track

Consent at Scale: Can Users Actually Opt Out?
Dr. Lucia Bianchi
Regulatory Divergence and What It Means for Model Deployment
Kevin O'Neil
Red-Teaming as an Ethics Practice, Not Just a Safety One
Sofia Nordin

12:00 pm · Lunch · 75 min

1:15 pm · Keynote · Plenary

45 min

Agents in the Wild: What Six Months of Real Deployments Actually Looked Like

Angela Rivera

2:00 pm

Talk · Research

Superposition and Polysemanticity: New Evidence from Activation Patching at Depth

Dr. Marcus Evans

Talk · Applied

The Latency Tax: Real Costs of Chain-of-Thought in Customer-Facing Products

Lauren Schneider

Talk · Ethics

Differential Privacy for Fine-Tuning: What Practitioners Actually Need to Know

Dr. Aisha Rahman

2:35 pm

Talk · Research

Long-Context Faithfulness: Measuring How Well Models Actually Use What You Give Them

Noah Bekele

Talk · Applied

Structured Generation Without the Footguns: A Practitioner's Honest Assessment

Viktor Petrov

Talk · Ethics

Disparate Impact Across Language: Benchmarking Multilingual Models for Fairness

Dr. Heidi Sørensen

3:10 pm · Afternoon Break · 25 min

3:35 pm · Panel · Plenary

60 min

Is the Benchmark Era Over? How We Know Whether Models Are Actually Getting Better

Dr. Natalie Brooks · Prof. David Ibeh · Dr. Lila Ahmed · Marcus Valdez · Caroline Dubois

4:35 pm · Keynote · Plenary

25 min

Day One Close & Community Announcements

Dr. Brian Choi

5:00 pm · Evening Reception - Foyer & Rooftop Terrace · 90 min

Day 02

Sunday, 12 July 2026

8:45 am · Morning Coffee · 15 min

9:00 am · Keynote · Plenary

45 min

Reasoning Without Shortcuts: Building Models That Fail Gracefully

Dr. Hannah McAllister

9:45 am · Short Break · 10 min

9:55 am

Talk · Research

Reward Hacking at Deployment: Characterising Sycophancy in RLHF-Trained Systems

Dr. Amir Khan

Talk · Applied

Multi-Modal Pipelines in Production: The Failure Modes Nobody Blogs About

Ken Tanaka

Talk · Ethics

Model Transparency Reports: What They Should Contain and Why They Don't

Astrid Larsen

10:30 am

Talk · Research

Chain-of-Thought Is Not Reasoning: A Philosophical and Empirical Challenge

Prof. Julia Kostova

Talk · Applied

Observability for LLM Applications: Tracing What Actually Matters

Sebastian Ortega

Talk · Ethics

Labor, Annotation, and the Workers Behind Every "Clean" Dataset

Dr. Zainab Musa

11:05 am

Talk · Research

Speculative Decoding at the Edge: Accuracy, Latency, and the Real Tradeoffs

Tanveer Shah

Talk · Applied

Fine-Tuning vs. Prompting vs. RAG: A Framework for Actually Deciding

Bridget Weiss

Talk · Ethics

Concentration Risk in AI Infrastructure: What Happens When Three Providers Go Down

Dr. Felix Mwangi

11:40 am · Lunch · 80 min

1:00 pm · Keynote · Plenary

45 min

The Quiet Infrastructure: Who Owns the Compute That Shapes What Gets Built

Prof. Chong Park

1:45 pm

Talk · Research

Measuring Calibration in Instruction-Tuned Models Under Distribution Shift

Dr. Pablo Serrano

Talk · Applied

Cost-Optimal Inference: A Practical Guide to Routing, Batching, and Caching

Mei Chan

Talk · Ethics

Meaningful Human Oversight: What It Requires in Practice

Dr. Rachel Mensah

2:20 pm

Talk · Research

Towards Formal Verification of Safety Properties in Neural Networks

Dr. Artem Volkov

Talk · Applied

The Prompt Injection Problem Is Not Solved: A Survey of What Works and What Doesn't

Yosef Azulay

Talk · Ethics

Child Safety and Foundation Models: Gaps in Current Practice

Dr. Claire Reid

2:55 pm · Afternoon Break · 20 min

3:15 pm · Panel · Plenary

60 min

Where Is This All Going? A Disagreement About the Next Five Years

Prof. Olivia Grant · Dr. Eric Wu · Prof. Maria Delgado · Dr. Deepak Mehta · Grace Zhu · Thomas Winkler

4:15 pm · Keynote · Plenary

35 min

Closing Keynote: Staying Serious in a Hype Cycle

Dr. Elise Nguyen

4:50 pm · Conference Close · 10 min
