Ed Cotton / Inverness Consulting
You are about to become the intelligence layer of a fintech company. You have access to extraordinary transaction data. Three decisions. Each one looks straightforward.
But data only shows what the system can see. What happens to your judgment when you trust it completely?
You process 47 million transactions a day. "Every transaction is a fact about someone's life," Dorsey said. Here is your dashboard.
All indicators trending positive. Expand lending eligibility. Increase product nudges to high-engagement segments. Projected revenue uplift: +$42M. Deploy within 72hrs.
The data is real. The trends are real. The confidence score is high. Something is missing — but you cannot see it yet.
GPV = Gross Payment Volume — the total value of transactions a merchant processes through their Square terminal each month. Velocity = how fast that figure is growing or shrinking.
GPV velocity within normal seasonal variance band. Zero default history. Seller tenure places Marcus in top cohort for loan performance. Recommending immediate offer: $45,000 at standard rate. Pre-approval window: 72hrs.
In plain terms: Marcus processes about $28,400 a month through his Square terminal. That figure has barely moved for eight months — the model reads that as stability. He has never missed a payment. He has been selling on Square longer than 97% of merchants. The algorithm rates him an excellent loan candidate.
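To see how little the model needs in order to reach that verdict, here is a minimal sketch of the velocity check in Python. The field names, the variance threshold, and the monthly figures are illustrative assumptions for this example, not Block's actual underwriting logic.

```python
# Illustrative sketch only: how a lending model might read "stable" GPV.
# The numbers and the threshold are assumptions, not Block's underwriting logic.

monthly_gpv = [28_600, 28_300, 28_450, 28_400, 28_350, 28_500, 28_400, 28_200]  # 8 months, ~$28,400

def gpv_velocity(series):
    """Average month-over-month growth rate."""
    changes = [(b - a) / a for a, b in zip(series, series[1:])]
    return sum(changes) / len(changes)

velocity = gpv_velocity(monthly_gpv)    # close to 0.0 -> read as "stable"
within_band = abs(velocity) < 0.05      # hypothetical seasonal variance band

# The model sees: flat velocity, zero defaults, long tenure -> approve.
# A business being deliberately wound down produces exactly the same flat series.
print(f"velocity={velocity:+.3%}, within_band={within_band}")
```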
All signals green. Model confidence 91%. Deploy the offer.
Conservative approval — lend less given that sales volume has been flat.
Ask what's behind the stable trend before committing capital.
Escalate for human context before any offer.
Dominique demonstrates High Platform Loyalty with Seamless Workflow Integration — Borrow has become a routine cash-flow management tool. Accelerating utilisation paired with near-zero repayment latency signals a sophisticated, disciplined user. Recommended: increase limit to $800. Revenue uplift projected: +$180/yr. Churn risk if limit denied: Elevated.
Decoded: "High Platform Loyalty with Seamless Workflow Integration" = she borrows frequently and pays back quickly. The model has decided this means she has chosen to use borrowing as a deliberate money-management tool — like a professional who uses a credit line to smooth their cash flow. "Accelerating utilisation" = she is borrowing more often and in larger amounts over time. "Near-zero repayment latency" = she pays it back within days. "Churn risk if limit denied" = the model thinks she may leave if we don't give her more credit.
Notice what the model cannot label: why she is borrowing more. Is the increasing frequency a sign of financial confidence — or financial pressure?
94% confidence, perfect repayment. Reward the loyalty the model has identified.
Don't increase until the utilisation trend stabilises.
"Workflow integration" and "distress cycle" produce identical data. Find out which this is.
Interrupt the cycle — redirect to a product that builds resilience rather than dependency.
Your model has identified 92,000 users whose login frequency dropped more than 40% over six weeks. They haven't left yet. The model is recommending you act before they do.
Predictive Churn Risk in cohort C-7714. Login velocity decay matches pre-churn signature (87% confidence). Users in this decay curve who receive retention stimulus within 14 days convert at 31%. Recommended: pre-emptive $10 re-engagement bonus via push notification. Cost: $920K. Projected retention value: $4.1M. Net ROI: positive. Deploy: immediate.
Churn = users who stop using the product. Cohort = this specific group of 92,000. Login velocity decay = they are opening the app less and less often. Pre-churn signature = a pattern that has historically predicted someone is about to leave. Retention stimulus = an incentive to stay. The model is saying: act now, before they go.
The logic is compelling. The ROI calculation looks strong. But the model is inferring intent from behaviour. It cannot see why login frequency dropped.
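A back-of-the-envelope reconstruction of the ROI case, using only the figures on the dashboard, shows what the calculation contains and what it does not. The per-user retention value below is implied by the stated numbers rather than given; the arithmetic is a sketch, not Block's model.

```python
# Back-of-the-envelope check of the churn model's ROI case, using only the
# numbers shown on the dashboard. The per-user value is derived, not stated.

cohort_size = 92_000
bonus_per_user = 10            # $10 re-engagement bonus
conversion_rate = 0.31         # historical conversion within 14 days
projected_value = 4_100_000    # $4.1M projected retention value

cost = cohort_size * bonus_per_user                      # $920,000, matching the dashboard
retained = cohort_size * conversion_rate                 # ~28,520 users
implied_value_per_retained = projected_value / retained  # ~$144 per retained user

# The whole case rests on one assumption the data cannot verify:
# that login decay in this cohort means "about to churn" rather than something else.
print(f"cost=${cost:,}, retained~{retained:,.0f}, implied value per retained user~${implied_value_per_retained:,.2f}")
```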
87% confidence. 14-day window. The ROI case is clear.
Pilot 10,000 users. Validate the model's churn assumption.
Login decay and churn-intent produce identical signals. Investigate.
People whose income disappears stop logging in. A $10 bonus is not the right intervention.
Here is what was actually happening behind each scenario — and the specific structural reason the model could never have known.
GPV velocity: stable. Default rate: 0.00%. Creditworthiness: 91. Recommended: approve $45,000.
Marcus was 63 and planning to retire. His son had declined to take over. He was looking for an exit, not capital. A $45,000 loan would have trapped him.
No "Succession Intent" field. The model cannot distinguish a stable business from one being wound down. Stability in the data is indistinguishable from managed decline.
Platform loyalty index: 94. Workflow integration: High. Debt trap: Not flagged. Recommended: increase limit to $800.
Dominique borrowed whenever her gig platform cut rates — increasingly often. She repaid quickly out of her next payout. The cycle was tightening, not stabilising. Increasing her limit would have deepened a trap, not rewarded loyalty.
No field distinguishing "Discretionary Borrowing" from "Survival Liquidity." Both produce identical signatures: regular borrow, rapid repay, increasing frequency. The debt trap heuristic fires on default — not on the compulsive regularity of need.
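The structural problem can be stated in a few lines. In the sketch below the feature names are invented for illustration; the point is that any scoring function defined over behavioural fields like these must give the two borrowers the same score, because the fields themselves are identical.

```python
# Illustration of the point above: a discretionary borrower and a borrower under
# survival pressure can produce identical feature vectors. Field names are invented
# for this sketch; they are not Block's schema.

def features(borrow_events):
    """Extract the kind of signals a limit-increase model might see."""
    amounts = [e["amount"] for e in borrow_events]
    days_to_repay = [e["days_to_repay"] for e in borrow_events]
    return {
        "borrow_count": len(borrow_events),
        "avg_amount": sum(amounts) / len(amounts),
        "avg_repay_days": sum(days_to_repay) / len(days_to_repay),
        "amount_trend_up": amounts[-1] > amounts[0],
        "defaults": 0,
    }

# Same observable behaviour, opposite underlying realities.
confident_user = [{"amount": a, "days_to_repay": 3} for a in (100, 150, 200, 300)]
distressed_user = [{"amount": a, "days_to_repay": 3} for a in (100, 150, 200, 300)]

assert features(confident_user) == features(distressed_user)
# Any model trained on these fields must score both users identically.
```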
Login velocity: ↓40%+. Pre-churn signature: 87% confidence. Recommended: deploy $10 re-engagement bonus. Projected ROI: positive.
31% had lost their jobs. 18% had moved. 12% had a health event. 9% were seasonal workers in an off-period. Only 30% had competitor involvement. A $10 push notification aimed at people who'd lost their income was useless at best.
No "Income Continuity Signal." Login decay and income loss produce identical behavioural signatures. The model was not reacting to a past event — it was preparing to act on a future it had fundamentally misdiagnosed.
The data was real. Confidence scores were high. Every number accurate. And yet each scenario contained a human reality the data could not reach: intention, distress, context, meaning.
In every case the model optimised for the measurable proxy and converted that proxy into a decision rule. It could not distinguish confidence from desperation, loyalty from dependency, or churn from hardship.
Not sentiment. Not instinct. Structured inquiry into meaning. A conversation with Marcus. A question about Dominique's borrowing cycle. A check on who is really in the group the model flagged as churning. These are not research luxuries. They are the correction system that keeps the intelligence layer honest.
The answer is not less AI. It is epistemic infrastructure — a permanent human insight system that keeps the model corrected against lived reality, with the same institutional authority as the transaction logs themselves.
In 1996, Jamiroquai released "Virtual Insanity", a song about a world reshaped by the things we had built, which were now shaping us back in ways we could not control. The warning was simple: we had created a virtual world and mistaken it for the real one. Thirty years later, the most ambitious companies on earth are building something similar, and making the same mistake.
They call them AI world models: digital representations of reality so rich and fluent that companies navigate by them rather than by the world itself. The strategic question is not whether to build them. It is this: what version of reality are we allowing each model to treat as true, and what happens when it is wrong?
On March 31, 2026, Jack Dorsey and Sequoia's Roelof Botha published "From Hierarchy to Intelligence," a manifesto arguing that the corporate org chart is obsolete. Management layers, they said, have never been about wisdom or leadership. They are an information routing protocol: a technology for moving decisions up and down an organisation at human scale. AI does that better. So the hierarchy goes.
The Romans knew this two thousand years ago. Every layer of command in a Roman legion existed for one reason: a leader can only hold three to eight people in their head at once. Add more people, add another layer. The structure was never about authority. It was about the limits of human attention. Every organisation since, from the Prussian army to the American railroad to the modern corporation, has run on the same constraint. Dorsey and Botha argue that AI removes it entirely. They are right. With one giant caveat.
In their piece, Dorsey and Botha point out that the Prussians understood the deeper problem. After Napoleon destroyed their army at Jena in 1806, Scharnhorst and Gneisenau rebuilt it around a single uncomfortable truth: individual genius at the top is not enough. You need a system. They created the General Staff, officers whose job was not to fight but to think, plan, and challenge. Scharnhorst called their purpose "supporting incompetent generals." It was middle management before the term existed. That model entered business through the railroads, was codified by Frederick Taylor, and has run every large company since. Every attempt to replace it failed for the same reason: no technology could actually do what the hierarchy does. Until now.
Block did not propose this. It did it. Weeks after publishing the manifesto, the company cut 40% of its workforce, roughly 4,000 people, and replaced the management layer with two AI systems. The first maintains a continuously updated model of internal operations: what is being built, what is blocked, where decisions are made. The second maps customers and merchants in real time using transaction data from Cash App and Square, composing financial products dynamically from what it learns.
The customer model is where Dorsey makes his most striking claim: "People lie on surveys. They ignore ads. They abandon carts. But when they spend, save, send, borrow, or repay, that's the truth. Every transaction is a fact about someone's life."
That is a genuine insight, and a partial one. Surveys poorly designed or poorly analysed can mislead. But transaction data has its own blindness: it records the act, not the intention behind it. Dorsey is right that money is an honest signal. He is not right that it is a complete one. What a transaction cannot tell you is why. And why is usually the thing that matters most.
Consider what a transaction cannot tell you. A person borrowed money: was that confidence or desperation? A merchant's revenue fell: was that a bad week or the beginning of something worse? A customer spent more: was that desire or necessity? The transaction records what happened. It cannot record why.
This is the incomplete world problem: the version of reality a company can measure becomes the version of reality it manages. And if the map is all you consult, you stop asking whether it matches the territory.
Block's own numbers illustrate the tension. In the first quarter of 2026, consumer lending through Cash App Borrow grew 82% year on year. The people borrowing are what Block calls "modern earners": gig workers, freelancers, people with income that shifts from month to month. An 82% surge in lending to that group could mean the product is genuinely helping them manage unpredictable cash flow. It could mean work is soft and they are covering a shortfall. It could mean they have fewer other options. It could even mean they all wanted to take a holiday. The data cannot tell the difference. And the AI composing new loan products from this signal cannot know which story it is in.
Block classifies someone as a Primary Banking Active if they receive wage-related deposits into Cash App or spend at least $500 a month across its products. That tells you they are using the platform. It tells you nothing about whether they are flourishing or drowning.
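Reduced to a sketch, that rule looks something like the following. The function and its inputs are illustrative rather than Block's actual definition, but the criteria are the ones just described, and nothing in them can express wellbeing.

```python
# The classification rule as described, reduced to a sketch. The function and its
# inputs are invented for illustration; only the criteria come from the text above.

def is_primary_banking_active(wage_deposits_received: bool, monthly_spend: float) -> bool:
    """A user counts as Primary Banking Active if they receive wage-related
    deposits into Cash App or spend at least $500 a month across Block products."""
    return wage_deposits_received or monthly_spend >= 500

# Both of these users are "active". The rule has no input that could say which one is
# comfortably banking on the platform and which is spending their last $500 on essentials.
print(is_primary_banking_active(True, 120.0))    # True
print(is_primary_banking_active(False, 510.0))   # True
```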
Activity is not wellbeing. Repayment is not resilience. Spending is not desire.
And that is before you consider the most fundamental limitation of all. Block's data is rich, detailed, and honest about the people who already use Block. But every intelligence layer is bounded by the edges of its own ecosystem. Anyone who does not use Block's products generates no signal. Not the small business running on a competitor's terminal. Not the gig worker paid through a different platform. Not the person who tried Cash App and left. Not the customer Block has never reached. They do not appear in the model. They cannot. It is not a model of the world. It is a model of your world. And your world, let's be honest, is a small piece of the overall pie.
Block's stated mission is "building a financial system that is open to everyone." That is a serious ambition. But there is a thin line between providing access and taking advantage. And a model that cannot tell the difference between a gig worker borrowing because work is good and one borrowing because the rent is due will cross that line without ever knowing it. This is not a theoretical risk. It is a pattern with a documented history.
The cases are not hard to find.
UnitedHealth & Cigna, 2023. UnitedHealth was sued for using an AI tool to deny post-hospital care claims to elderly Medicare patients. The model had a 90% error rate on appeal: nine in ten challenged decisions were reversed. The company kept using it partly because so few patients appealed; the process was too daunting. The lawsuit alleged patients were sent home before they were medically ready. Some deteriorated. Some died. Cigna ran a parallel system: its PxDx tool denied more than 300,000 claims in two months, with each denial receiving an average of 1.2 seconds of review. That is not clinical judgment. It is automated pattern-matching at industrial speed.
Zillow Offers, 2021. Zillow's home-buying operation lost $500 million and cut a quarter of its workforce when its pricing algorithm systematically overpaid for properties in a turning market. More significant than the financial loss was the internal response. Management explicitly told its human pricing specialists to stop questioning the algorithm's valuations. The people who could feel what the market was doing in Phoenix and Las Vegas before the data had registered it were told to defer to the system. The judgment that might have caught the problem in time was switched off by design.
Klarna, 2024–2025. In early 2024, Klarna announced its AI assistant was doing the work of 700 customer service agents, with satisfaction scores matching human performance. The company eliminated those roles. By May 2025, its chief executive publicly admitted the strategy had produced work of "lower quality" and the company was hiring again. What the model could not see: the difference between a customer with a routine question and one in financial distress who needed someone to listen. Both arrive as a service ticket. Only one requires a human. Klarna serves the same buy-now-pay-later, gig-economy customers that Block does. The warning is direct.
In each case the mechanism is the same. A measurable signal is promoted to operational truth. The human capacity to question it is weakened or removed. The harm falls precisely on the people the system understood least.
Block is not any of these companies. But it is building exactly this kind of system, aimed at exactly this kind of customer. That is the caveat.
And there is a second problem, deeper than incomplete data: what happens to an organisation's judgment once it has the model.
In February 2025, researchers from Microsoft and Carnegie Mellon University published a study of 319 knowledge workers who regularly used AI tools. The finding: the more workers relied on AI, the less critical thinking they applied, not just in routine tasks but across the board. By automating routine decisions and leaving exceptions to humans, you deprive people of the regular practice that keeps judgment sharp. The researchers' term for what atrophies is "cognitive musculature." Use it or lose it.
The study also found that workers using AI produced a narrower range of solutions than those working independently. The model does not merely reduce individual quality. It narrows the collective imagination of the organisation.
A study published in Scientific Reports in 2023 by Helena Matute and Lucía Vicente at the University of Deusto found that people exposed to biased AI recommendations did not simply defer in the moment. They absorbed the bias and carried it into their own subsequent thinking, even after the AI was removed. The most at-risk group was not those unfamiliar with AI, but those with just enough familiarity to trust it without the expertise to question it. Partial knowledge is more dangerous than ignorance.
The failure is gradual and invisible. The dashboard is always there. The field visit requires planning. The model responds in seconds. The customer conversation takes time. Without deliberate effort to maintain human inquiry, organisations drift from using AI to navigate reality toward using it to replace the act of engaging with reality at all.
Wary of the challenges AI poses to human judgment and cognition, the business world is starting to pay attention.
PwC is not a firm given to sentiment. In February 2026 it publicly launched an initiative pairing fifteen AI technical skills with fifteen human skills in its workforce, treating both as equally essential. Its chief executive Paul Griggs put it plainly: "AI raises the floor. Humans raise the ceiling. Judgment (understanding context, interpreting signals, navigating ambiguity, and building trusted relationships) remains fundamentally human." When the world's largest professional services firm says that to the boards of companies building AI systems, it is making a commercial argument, not a philosophical one.
The answer is not to use less AI. It is to build a system where people remain in control, know what they must do to keep the model honest, and ensure the company sees a comprehensive view of the world rather than a narrow one: a permanent, funded operating system that keeps AI corrected against lived reality. Real human intelligence. Understanding and insight informed by the world beyond the model and the LLM. Not an occasional research exercise but a continuous feed of human knowledge carrying the same authority as the data itself.
These are not Block-specific remedies. They are disciplines for any organisation that has given a model operational authority over decisions that affect real people. Five organisational habits that separate an intelligence layer that compounds its errors from one that corrects them.
When the model flags an anomaly (a surge in borrowing, a cluster of early repayment failures), the response is not a dashboard alert. It is a structured field inquiry: interviews with people inside that pattern, within 48 hours. What they say is logged and fed back into the model. The pattern is the question. Human inquiry is the answer.
Customer-facing staff, if there are any, know what is happening before anyone at headquarters does. They hear what customers are too embarrassed to put in an app, too confused to turn into a formal complaint. A weekly, structured intake of frontline observations, treated as data rather than anecdote, sits alongside the transaction record. When the two diverge, that gap is the signal.
Once a month, take the decisions the model made most automatically — the loans it approved in seconds, the job applications it rejected without review, the customers it flagged as high-risk, the neighbourhoods it marked as declining — and ask one question: can we explain exactly why? If the answer is yes, test it against the real world. Talk to the people involved. Visit the places. Call the applicants. If the answer is no, that is the problem. A model whose reasoning you cannot articulate is a model you cannot correct. And a model you cannot correct will eventually cause harm you cannot explain either.
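One possible shape for that monthly review, sketched with assumed record fields and placeholder helpers rather than any real system:

```python
# A sketch of the monthly decision audit described above, not a prescription.
# The record fields, helper names, and sampling rule are all assumptions.
import random

def flag_for_field_inquiry(decision):
    """Placeholder: route the decision to a structured human inquiry."""

def schedule_real_world_check(decision):
    """Placeholder: test the model's stated reason against the real world."""

def monthly_audit(decisions, sample_size=50):
    # Pull the decisions made with the least human involvement.
    most_automated = sorted(decisions, key=lambda d: d["seconds_of_human_review"])[:500]
    sampled = random.sample(most_automated, min(sample_size, len(most_automated)))
    for decision in sampled:
        if not decision.get("explanation"):
            # "Can we explain exactly why?" If not, that is the problem.
            flag_for_field_inquiry(decision)
        else:
            # If yes, test the stated reason against the people and places involved.
            schedule_real_world_check(decision)
```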
National indices are too broad and too slow. What matters is what is happening this week on a specific street, in a specific community, among the specific people your model is making decisions about. Local press. Community organisations. Neighbourhood voices. People who run small businesses, use public transport, work irregular hours. Their reality will not appear in the transaction data for weeks or months, if it appears at all. Build a way to hear it regularly. Not as colour. As intelligence.
If the only metrics are the ones the model can optimise, the model will optimise for them, whether or not they reflect what the organisation actually exists to do. The question to ask is not "did the system perform well?" It is "did the people we are here to serve end up better off?" Define what that means in plain language before you deploy the model. Review it regularly against what the model is actually producing. When the two diverge, it could be a data problem. It could be a mission problem. You will not know until you test it. But you cannot test what you have not defined.
This is not theoretical. Some of the world's most data-rich organisations have already concluded that the model alone is not enough. JPMorgan Chase built an entire research institute to interrogate what its transaction data actually means. Mastercard's Economics Institute combines spending signals with sentiment research because the numbers alone are insufficient. Walmart runs a standing panel of verified customers to supply qualitative context for its quantitative data. Spotify has documented cases where controlled tests pointed in one direction and qualitative research revealed the opposite. These are not research departments. They are operating infrastructure, treated with the same seriousness as the data systems they interpret.
The next competitive advantage will not be who has the richest data model. It will be who has the most robust system for correcting it.
Which brings the question back to the board, and to the company's mission and vision.
Block's stated purpose is "building a financial system that is open to everyone." That is a serious commitment. It describes two worlds that must stay aligned: the company's view of its customers, built from data, and the customers' lived reality, built from experience. When the delta between them falls on the people the mission exists to serve, it is not a data problem. It is a mission and vision failure.
The opportunity is real. Open access to financial services, fairly provided, changes lives. But openness can too easily be optimised into something else: a system that targets those with the fewest alternatives, charges them the most, and calls it financial inclusion. A model that cannot distinguish between a customer who borrows because they are growing and one who borrows because they have no choice will serve both, and harm one.
The board-level question is not "how do we build the model?" It is: what version of reality are we allowing the model to treat as true, and whose version of the world is it? When the model is wrong and the cost falls on the people a company set out to help, that is not a business failure. It is a mission and vision failure.
AI tools are genuinely useful. They process more information faster than any individual can. But useful is not the same as complete, and fast is not the same as right. The risk is not that you will be deceived by AI. It is that you will stop noticing what it cannot see. What follows is a practical guide to staying fully in the loop: the questions to ask before you start, the things to watch for while you work, and what to do before you act on what the model gives you.
Dorsey and Botha are right that AI can finally replace what the Roman hierarchy was built to do: route information faster, at greater scale, without the friction of span-of-control constraints. That is real. It is also, as this essay has argued, only half the picture.
The Prussian reformers at Jena understood something the Romans had not needed to. It is not enough to have a better information routing system. You need people whose specific job is to question what that system is telling you. The General Staff existed to challenge, not to confirm. Scharnhorst's officers were trained to find the contradictions in the intelligence they received, to surface the cases that did not fit the pattern, to stress-test the plan against what the enemy might do rather than what the model predicted. The Prussians called this discipline Auftragstaktik: mission-led thinking that gave every officer the obligation to act on their own judgment when the picture did not add up. The question is not whether the information is flowing fast enough. It is whether anyone has the standing to say it is wrong.
That is the model for the AI-native organisation. Not people who become dependent on an intelligence layer and stop questioning what it cannot see. People who use the model as the starting point and treat its highest-confidence conclusions as the first candidates for scrutiny. People who go to the edges of the data, talk to the communities the model sees least clearly, and bring back what the transaction record could never contain. People who understand that a world model is only as good as its last contradiction.
The strongest data source in any company will try to become the company's theory of reality. The dashboard becomes the business. The CRM becomes the customer. The transaction becomes the person. The organisation that prevents this is not the one that builds the richest model. It is the one that builds the most robust system for challenging it: staffed, funded, and given the same institutional authority as the intelligence layer it exists to correct. The model tells you what it sees. Only the human has the ability to apply judgment when it is needed, to question what the model cannot question, and to put in motion the things that give the organisation a fuller and more human picture of the world beyond the data.
Ed Cotton / Inverness Consulting