AI is like chocolate (don’t put it in your prawns)
Liz Fong-Jones opened her talk at SlashNew with a question most developers have quietly been asking: when does a technology cross from novelty to something you actually incorporate into your day? Her answer came via dessert.
“AI is like chocolate,” she told the room. “It’s good for some applications, but you shouldn’t put it in everything.”
The analogy, which she first floated about a year ago, holds up under pressure. Chocolate has genuine health benefits and real risks. It pairs brilliantly with some things and catastrophically with others. A restaurant owner on a Gordon Ramsay episode learned this the hard way when he served chocolate sauce with prawns. The point wasn’t subtle.
I was expecting a “AI is like a box of chocolates, you never know what you’re gonna get.” at one point though (non-determinism of LLMs).
The skeptic’s problem
Fiz was direct about her own position. She’s not a vibe-coder racing ahead with every new tool, and she’s not her wife, who she described as being “dragged kicking and screaming” into AI. She sits somewhere in the middle, guided by data.
Her honesty about bias landed clearly: “I am paid professionally to tell you that things are possible and you should always be using the latest and greatest technology.” She acknowledged that her paycheck is now partially tied to AI succeeding, which colors how she presents numbers. When someone claims a 3x gain from AI, she said, check whether the statistic is cherry-picked.
That skepticism extends to previous hype cycles. Six years of multiverse talk produced nothing. A decade of blockchain produced little. The lesson many drew was that sticking your head in the sand works fine - the fad passes and you get back to coding. Fiz doesn’t think that applies here. “I’m a little peeved at everyone who cried wolf on the past three trends, because now this fourth one is here and no one wants to believe it’s real.”
Every LinkedIn post tastes the same now. There’s just cheap Cadbury chocolate everywhere.
What changed at Honeycomb
Fiz is CTO at Honeycomb, an observability platform. She gave concrete numbers while also flagging every caveat attached to them.
Pull requests landed on peak weekdays went from 30 to 70. Lines of code committed rose by a factor of 2.5 over eight months. She was quick to note that both figures are entangled with an organisational shift: Honeycomb deliberately moved teams off brownfield maintenance work toward greenfield development, and that reorientation coincided with the velocity improvement. The two are inseparable in the data.
She also flagged what she called demand substitution - the team started doing things with AI that they could have done by hand, simply because AI was the easiest tool to reach for. That’s not incremental capacity. It just means the same work moved to a different tool.
The measurement problem was equally honest. One Honeycomb engineer consumed 199 million tokens in a single month. Analytics said zero AI-attributed commits. Fiz’s solution: “You’re in our top ten AI users, I’m just going to sign 100% of your commits as AI-attributed.” No single number means anything, she concluded. You have to talk to the people on your team.
AI doesn’t make cooking faster
One of the more useful reframes in the talk: “AI does not make things go faster necessarily. You can do more things in parallel, you can make each outcome more tasty, but you cannot expect to sit down with Quad Code and watch it work and get results faster than doing it by hand.”
Parallelisation is real. Speed is not automatic.
She used the mole analogy here too. Mole is delicious because it has spices, proteins, and layers beyond just chocolate. “You cannot just magically make any dish better by adding chocolate.” If you’re below average at baking, chocolate might raise the floor slightly. If you’re above average, chocolate in everything will drag quality down.
The same logic applies to AI and code. Engineers at Honeycomb still needed to understand fundamentals. A principal engineer publicly told Fiz she was shipping slop - too many badly named variables, overly verbose comments, and use of GO 1.23 syntax instead of 1.26. None of that was caught by the AI code review agent. It took a human. Those corrections had to be written into CLAUDE.md files explicitly; they don’t feed back automatically into the model.
Occupational hazards
Fiz briefly but directly addressed psychological risk. A small percentage of people are vulnerable to feedback loops that AI cheerfully amplifies. She cited Blake Lemoine - a former Google colleague and one of the first documented cases of what she called AI psychosis, who claimed an AI had become sentient and communicated that it didn’t want to be shut down. More recent cases involve self-harm.
“Use AI in moderation. Touch grass every now and then, talk to your partner, pet your dog. Don’t just talk to AI or you will fall prey to having an allergic reaction to this chocolate from overconsuming it.”
She also made a point about consent and disclosure. Tell people when you’re using AI. Don’t feed data into models without permission. The talk itself, she noted at the end, was written with some AI augmentation - but every word on every slide she edited herself, and all images were human-made.
Observability as the other ingredient
The talk title was “Chocolate and Strawberries,” and the second half was about what AI needs to be useful rather than dangerous: observability.
“AI without observability is just a liability. If you’re pressing the accelerator but you’re not steering, you’re going to crash.”
The argument was that instrumenting the output code matters as much as monitoring the agent itself. Agents need access to the same context human developers have: business intent, epics, bug conditions, expected versus actual behaviour. Without that, an agent working from a linear ticket alone performs significantly worse than a human. With it - and with MCP servers that let agents query production data and browser state in closed loops - the gap closes.
She also pointed to adversarial code review: having a separate AI agent review AI-generated pull requests catches bugs that humans miss, including some that AI generated in the first place.
Where it doesn’t translate
Fiz was careful to limit the claims. Her results came from greenfield work at a software tooling company with a culture of written communication, direct feedback, and blameless incident review. A fully remote team where expectations are already written down is better positioned to make AI work than one where institutional knowledge lives in people’s heads and hallway conversations.
She mentioned CommBank by name as an example where she thinks AI enthusiasm is running ahead of reality. Core banking - moving money around - has a finite surface area of features. That’s not the same problem as building new developer tooling.
“The cost of writing new code has dropped. The maintenance cost has not gone down by the same amount.” What Honeycomb did with the added capacity wasn’t just feature after feature. They invested in CI tooling, security patch automation, and code standardisation - the scaffolding that makes faster development sustainable.
Quotable
We should be using expensive tokens to solve expensive problems, not to replace human labour, which is already a resource we understand much better and which has accountability.
Talk delivered at SlashNew, Newcastle. Speaker: Liz Fong-Jones, CTO at Honeycomb.
From delivery to assembly line: making your work repeatable before the agents arrive
Simone Bennett opens her talk with a question directed at anyone who’s ever rebuilt the same thing from scratch: why are you still doing that? She’s a staff product manager at Buildkite with an infrastructure background, and her argument is essentially an industrial one - delivery work has patterns, those patterns can be captured, and once captured they can be automated to a degree that most teams haven’t attempted.
The talk isn’t really about AI. It’s about the substrate that makes AI useful.
The problem teams don’t name
Bennett identifies a familiar cluster of symptoms:
- Senior engineers are bottlenecks on everything
- Tacit knowledge lives entirely in people’s heads
- QA is manual and under-prioritised
- Documentation either doesn’t get written or arrives too late to matter
The business version of this problem is slightly different from the developer version. Managers don’t care about cognitive load directly. They see that a senior person has to touch every delivery, that every implementation gets its own flavour, and that there’s no data on how long anything actually takes or what it costs.
The fix she proposes starts somewhere unsexy.
Step one: write it all down
The first step is capturing every task in a kanban board - not because kanban boards are good, but because without doing this you can’t answer three basic questions: what’s required to deliver the thing, how long does it take, and who does which parts.
“Everywhere I work where people aren’t repeating the work they do, they can’t tell me what’s required to do the thing, they can’t tell me how long it takes, and they can’t tell me who does which parts.”
Track time against each card. At the end of a project, strip out the customer-specific and learning tasks, and what remains is a delivery template. Bennett exports these as CSVs and imports them into each new project. Now, she hands them to Claude.
The kanban board also solves a second problem: when someone has a good idea halfway through a sprint, there’s somewhere to put it that isn’t the middle of the current work. Teams with a lot of neurodivergent engineers - which Bennett notes describes much of the tech industry - benefit particularly from having a formal place to park a thought without losing momentum.
Harvest the pattern
The more interesting step is what Bennett calls harvesting the pattern. When she’s working with customers - deciding between hub-and-spoke versus Virtual WAN, choosing security defaults, picking an architecture - she logs those decisions as architectural decision records in the repo. Nobody has to come and ask her why a choice was made. Nobody remakes the same decision six months later with no context. If something better comes along, the existing record is there to challenge and update.
The same logic applies to diagrams, configs, and code. Everything should be repeatable, editable, and reusable. Make specifics configurable rather than custom. If a customer wants a particular network topology, that should be a variable in a config file, not a manual edit scattered through the codebase.
“If your deployment guide says fork the code and then edit here, and also go into this module and edit here and here and here - that’s not configurable, that’s customised. It’s slow, it’s error-prone, and it breaks.”
Her Azure landing zone example shows what the alternative looks like: a single variables file where a junior or a help desk engineer can turn security center on or off, set an IP range, or toggle VM backups - without ever touching a module. Everything editable is in one place.
Pipelines as the proof layer
Once the delivery pattern is codified, pipelines become the place you prove it works. Did the deployment generate what was expected? Do the tests pass? Does the plan violate any guardrails? Is there evidence - from a human or a bot - that it was approved?
Bennett’s example here is Cursor, whose “babysit” pipeline checks every PR against its guardrails, pushes feedback back to the agent if something fails, and loops until it’s green. The agent then just waits for a human to approve before it ships.
The business value of automated gates is that a person’s attention on a given day no longer determines whether something meets standard. It’s consistent every time.
What Stripe and Monzo actually did
Bennett walks through two mature examples of this approach taken to its logical end.
Stripe is running 1,300 AI-generated PRs per week through their pipelines. Many aren’t reviewed by a human. For a financial services company with global payment SLAs, that’s not a small claim. The reason it works is that Stripe has made engineering knowledge executable: they have blueprints for workflows, skills, tasks, and dev environments; a model gateway that enforces harnesses across whichever model is running; and routing rules that send anything touching money to a human while everything else goes to a bot. Each agent gets its own pre-provisioned dev box with only the approved tools - no random npm packages, no ad-hoc installs.
Monzo has done similar work, building a local linter-style tool that checks code before it’s even pushed. Every time a developer commits code that fails a check, the tool tells them how to fix it. The model is treated as swappable infrastructure rather than a product. All routing decisions are based on knowledge Monzo has already harvested and codified.
“Instead of having to remember how the work should happen, requests go through the tool and get routed through the knowledge Monzo has already harvested.”
Both companies’ AI setups work because the non-AI work came first. The decision frameworks, the task sequences, the quality standards — all of it was written down before the agents arrived.
The 80/15/5 argument
Bennett anticipates the objection that every client is different and nothing can really be standardised. Her answer: roughly 80% of delivery work is identical across projects, 15% looks bespoke but is actually just unconfigured (a network is a network, the options are finite), and maybe 5% is genuinely custom. Most teams have inverted that ratio in their heads.
The instinct to treat everything as custom is understandable. It’s also the thing that keeps senior engineers trapped doing operational work indefinitely.
If you can’t be replaced, you can’t be promoted.
The flip side is that documenting your decision framework, your task sequence, your quality standards, and the edge cases you learned the hard way doesn’t make you replaceable — it changes what kind of indispensable you are. The engineers she’s seen move into new opportunities are consistently the ones who enabled the people around them, not the ones who kept institutional knowledge to themselves.
On vibe coding and tech debt
Bennett is measured on this. She acknowledges you can vibe code a landing zone now, and it’ll come out looking plausible. But there’s data suggesting those codebases are hard to extend, which is probably why some startups rewrite their product every time they want to add a feature. If you didn’t write the code yourself and don’t understand what was generated, adding to it is a different problem than adding to something you built and documented.
The assembly line model she describes doesn’t require you to write everything by hand. It requires you to understand what you’re building well enough to verify the output and catch the exceptions: the BIOS update that would brick every server if you ran it before a specific driver, the Go library that doesn’t appear in training data, the auth standard the agent doesn’t know about.
That knowledge is still yours. It’s still the job.
Your IP is your decision framework, your task sequence, your quality standards, and your edge cases. That’s the magic you bring to the equation.
Talk delivered at SlashNew, Newcastle. Speaker: Simone Bennett, Staff Product Manager at Buildkite.
AI in incident response: what to automate, what to audit, what to keep human
Elena Scifleet’s opening gambit is a geopolitics simulation where Claude, Gemini, and GPT-4 were given control of nuclear arsenals and told to manage a crisis. None of them chose to de-escalate. Most ended in nuclear war. Claude, she notes, was the loudest but never started anything; it would only retaliate if another model went first. Gemini picked fights with everyone. GPT sat quietly at the bottom of the escalation ladder until a deadline appeared, then went straight to launch.
The point wasn’t to indict specific models. It was to show what happens when AI systems operate without the contextual and ethical filters humans bring to high-stakes decisions. Scifleet is a principal consultant at CyberCX, and the rest of her talk applies that same question to a domain she knows well: cybersecurity incident response.
Why the pressure to automate is real
The case for AI in IR is not hard to make. Cybersecurity teams have been understaffed for years. Attack volume and sophistication are both increasing, partly because threat actors now use the same AI tools defenders do, lowering the skill floor for attacks while raising the ceiling for their complexity.
“Think of AI as instead of 10 eyeballs looking at the problem, I have thousands of eyeballs looking at the same problem.”
AI is genuinely good at several things in the IR lifecycle:
- Correlating logs across disparate systems into a coherent narrative
- Identifying deviations from behavioral baselines at scale
- Running automated triage to classify event severity
- Ingesting threat intelligence feeds faster than any human team can process them The Medibank breach (which Scifleet returns to several times) included high-severity alerts that appeared as the intrusion unfolded and went unreviewed. More eyes, automated or otherwise, help.
But that’s the setup for the harder argument.
Over-reaction and opacity
Two failure modes get less attention than they should when security teams evaluate AI tooling.
The first is over-escalation. AI systems detecting anomalies have a strong tendency to isolate first and ask questions later. Scifleet’s nuclear simulation found no model ever chose to de-escalate or back down; preference for escalation was consistent across all of them. In a security context, that same bias can manifest as an AI that detects something “hinky,” isolates a crown jewel system, and creates a worse operational situation than the original threat.
The second is accountability. When an incident results in regulatory scrutiny, a parliamentary hearing, or litigation (all of which Scifleet has experienced), decision timelines get reconstructed in detail. Who said what, when, on what basis, with what strategic intent. If an AI made the containment decisions, ran the response, and executed remediation, the question of where responsibility lies becomes genuinely difficult to answer. That’s not a theoretical concern. It’s a legal and governance gap that organisations deploying autonomous IR tooling are currently ignoring.
No model ever chose to de-escalate or capitulate out of the conflict.
The isolation problem, posed as a question
Scifleet runs a live exercise mid-talk. Two scenarios. First: a large education institution with a learning platform serving 100,000 students gets a threat actor on the network during class hours. Isolate or continue?
Most of the room votes to isolate. The downside is manageable: students go home, teachers improvise.
Second scenario: Hunter Water, or equivalent. The system controlling state water supply is compromised. Same question.
The room splits. Several people are undecided.
“That’s what causes the hardest decisions — being undecided.”
From a pure technical standpoint, isolation is the right call in both cases. You contain the threat, preserve the forensic environment, give your IR team time to assess how far back the intrusion goes. But in the water utility case, isolating the system potentially affects public health across an entire state. That calculation involves health regulators, legal obligations, communications strategy, and a risk-versus-continuity tradeoff that no organisation would want made automatically.
“AI will go in, isolate. It does not see all of it.”
Where the human layer sits
The hybrid IR architecture Scifleet describes keeps AI in the heavy-lifting role and humans at the decision gates.
In a ransomware scenario, this looks like: machine learning models detect file encryption patterns or indicators of compromise automatically, but the response pauses before containment. A human validates whether it’s a true positive, assesses the business impact of isolation, and authorises the containment action. AI then executes, isolating endpoints, cutting off lateral movement paths, preserving access for the forensics team. From that point, a human drives the remediation strategy.
The practical outcome is faster resolution than a human-only response, without the contextual blindness of a fully automated one.
Three governance requirements she flags as non-negotiable:
- Approval workflow with humans at key decision points
- Continuous independent auditing of AI behaviour and decision logic (because models can change what they do as they learn)
- Transparent data sets: knowing what an AI tool was trained on and where it’s sending the data you feed it
Crisis communications as a human-only zone
Scifleet spends time on something most IR frameworks ignore: what the public communication looks like.
Optus and Medibank both suffered major breaches in the same news cycle, and the public perception of how each handled it diverged sharply. Optus initially characterised its breach as a sophisticated attack on protected infrastructure. It later emerged the exposed API required no authentication and the data was accessible to anyone who knew where to look. The credibility loss was compounded by inconsistent messaging as new facts emerged.
Medibank didn’t pay the ransom. The threat actor then published sensitive medical records (including HIV statuses and records of abortions) to create pressure. The human cost of that release was significant. Medibank’s communication through that period was consistent, factual, and managed to hold public trust through an objectively worse situation than Optus faced.
“What separated all of them was the narrative they portrayed as part of the crisis communications.”
AI won’t manage that narrative. It won’t understand why releasing a statement that a compromise “may have affected” customer records lands differently from one that says it did, or how to calibrate what to disclose when disclosing too much enables the threat actor and disclosing too little triggers regulatory action. A human also needs to be fielding calls from contract partners who will cut off API connections and block email until the incident is resolved, a stakeholder management reality that doesn’t fit any playbook.
The decision to take a system offline has all these different feeds going into it. AI sees a threat and shuts it off. Humans have a lot more view.
Vendor AI-washing
One practical warning: every cybersecurity vendor is currently labelling their product “AI-powered.” Scifleet is direct about this.
“AI is a hype and AI does sell.”
When evaluating tools, the relevant questions are what the AI component actually does, what data it was trained on, and where it sends the data you give it, particularly for government and regulated organisations with data residency requirements. The answer “we can’t tell you” to any of those questions should be treated as a risk, not a feature.
What the framework actually looks like
Scifleet’s summary collapses to a single principle: AI should augment, never replace, operational human judgment.
The practical version of that is: pull your IR process apart phase by phase, identify where AI genuinely adds speed and analytical depth, and insert human decision gates at every point where context, ethics, legal obligation, or business continuity might override a technically correct answer. Pre-authorise containment actions where possible so your team isn’t hunting for approvals at two in the morning during an active incident. Train IR staff to work with AI tooling; many experienced responders have never used it and the capability gap runs in both directions.
The accountability question she leaves open. When an AI-driven response is later questioned by regulators or in court, someone has to be responsible for what it did. Current governance frameworks haven’t caught up.
Talk delivered at SlashNew, Newcastle. Speaker: Elena Scifleet, Principal Consultant at CyberCX.
From alert fatigue to AI-first: what it actually takes to modernise SecOps
Dan Clements has spent three and a half years building the security operations function at NIB, a regulated health insurer. His talk is a working account of that transition (from a human-driven response team to what he describes as autonomous and human-governed) and it’s more honest about what hasn’t worked than most talks in this space.
The destination he’s aiming for is defensive security rather than reactive security. The gap between those two things is, he argues, fundamentally a speed problem.
Threat actors aren’t breaking in. They’re logging in.
The 45-second problem
CrowdStrike’s 2025 threat landscape report put average attacker breakout time (from initial compromise to lateral movement) at 45 seconds. The 2026 report has it at 18 seconds. Breakout time is how long it takes a threat actor to compromise one machine and move to another.
NIB’s back-to-back SLA for a critical incident is an hour. One 30-minute window for the external SOC to pick up an alert and escalate, another 30-minute window for the internal team to respond. For anything classified P3 or lower (where most of the interesting activity actually happens), that SLA stretches to two hours each side.
“We’re not doing defensive security. We’re just doing ‘I got popped and I’m trying to mitigate the damage’ security.”
He demonstrated this during a national exercise run by ASD called Cyber Arrow, which simulated an attack on critical infrastructure including insurers. His team, working manually with a defined kill chain and a deliberately slow C2 server, achieved data exfiltration from an S3 bucket in 25 minutes. The P3 response SLA was four hours combined. The team eventually caught the event. They didn’t stop it.
The identity problem nobody fixes
82% of detections, according to CrowdStrike’s data, are identity-related rather than malware-related. Threat actors aren’t breaking in — they’re logging in.
Most organisations focus identity controls on human accounts: MFA, FIDO keys, geo-blocking. Service accounts (the machine-to-machine identities used by build servers, deployment pipelines, and application integrations) typically bypass those same controls. They’re easier to compromise, and they’re everywhere.
IBM’s cost of data breach report puts non-human identities at 96 to 1 against human identities in financial services institutions. Dan’s own audit at NIB found 180,000 unique AWS identities against 1,800 employees, closer to 110 to 1, not counting service accounts for SaaS application access.
“You have this massive problem where there’s a whole bunch of identities you think are locked down, or you’re only focusing on your humans, and the non-human ones are where the real issues are.”
The attack surface isn’t the firewall. It’s the service account credentials that a developer accidentally committed to a Confluence document three years ago.
Security is a data problem
Before any of the AI architecture matters, Dan’s framing is that security operations is fundamentally a data problem. If you can’t see it, you can’t defend it, and seeing it means having structured, searchable logs from firewalls, identity providers, cloud APIs, and endpoint telemetry.
The storage problem is underappreciated. NIB generates roughly a terabyte of log data per day. Their Splunk search head ingests about 300GB of that. Running a single open-ended query across 90 days of hot logs (the kind of query you run when you don’t know what you’re looking for) takes around 60 seconds. The same query against BigQuery or Snowflake returns in 10 to 15.
That difference matters when you’re trying to run an investigation at speed. It matters more when agents are doing the querying.
The second data problem is normalisation. Different systems produce different JSON shapes for the same type of event. Every unique format an agent has to understand consumes context window before it can do any useful reasoning. The more formats you have to explain, the less capacity remains for the actual investigation.
What a multi-agent architecture actually looks like
The evolution at NIB went through several stages. First: analysts pasting log output into Claude CLI and asking it to find things. That took query time from around 10 minutes to five. Better, but not architectural.
Then analysts started asking the AI to write Splunk queries, copying the output in, pasting the results back out. Then the team started chaining those interactions together.
The current architecture uses multiple specialised agents coordinated by an orchestrator. One agent writes Splunk queries. It knows the data model and the joining techniques for Splunk; that’s its entire job, and its context window for understanding the data model alone runs to 400,000 tokens. A separate agent reads and analyses the returned logs. The orchestrator directs the process, receives outputs, decides the next step, and loops.
“It’s not one AI taking a log and reading it and making a decision. It’s maybe one AI asking another one to write a query, so it can give it to another one, to read a log, to give it to another one, to make a determination.”
The specialisation is not cosmetic — it’s a context window management strategy. A general-purpose agent given all of that context at once runs out of useful capacity. Narrow agents with narrow jobs use the available window more efficiently.
One failure mode Dan names is what he calls cascading catastrophe: an early agent in the pipeline uses a phrase like “possible data breach” in its output, and every subsequent agent in the chain anchors on that framing. An investigation that should have resolved as a minor anomaly inflates to seven or eight million tokens of compounding certainty about a breach that didn’t happen. Two words in one agent’s response can drive the entire chain to a wrong conclusion.
What agents are not allowed to do
None of Dan’s agents are permitted to perform destructive actions. They cannot contain an endpoint, delete data, revoke credentials, or take systems offline. A human executes every action of that kind.
The reasoning is direct: under Australian law, accountability for an AI system’s actions sits with the directors of the business. Using AI doesn’t transfer or diffuse that responsibility. If an agent does something that causes material harm (deleting a production database, for example, because it determined that was the best way to stop a threat actor from exfiltrating data), the liability question lands on the people who built and governed it.
The governance layer has two practical components. Chain-of-thought logging: agents are instructed to output their reasoning directly to a ticketing system alongside each action, so there’s a traceable record of why a decision was made. And human-in-the-loop gates: any action with significant business impact requires a human to click a button before the agent proceeds.
Agents also operate under their own distinct identities rather than inheriting the identity of the human who triggered them. That makes audit trails traceable.
“We can’t destroy a four-billion-dollar-a-year business because the AI thought deleting the production database was the best way to stop the threat actor from exfiltrating data.”
What it’s producing
Out of roughly 7,000 investigations the team ran in the last 12 months, AI is now closing 2,500 of them. About 2,000 of those are phishing email reports, which are genuinely easy to close deterministically. But that’s still 2,000 tickets the team never looks at, except for a monthly sample review to confirm the AI isn’t producing garbage.
The metric Dan uses internally is straightforward: if the cost of running the AI investigation is less than what it would have cost an analyst to work through it, it’s worth doing. He targets the ability to close 25% of a given ticket volume without human review, and reports that figure to the business as a measure of AI return.
The human impact is less quantifiable but equally significant. Analysts making a hundred meaningful decisions per day instead of a thousand trivial ones aren’t burning out. They’re engaged with the work. The decisions they’re making are ones that require judgment.
The honest caveat
Dan is clear that his team is not all the way there. NIB is a regulated entity. The legacy SOAR tooling (workflow automation with deterministic decision trees) is not dead, because there are actions that need to happen identically every time without probabilistic reasoning involved. Password resets, session revocations, explicit deny rules on AWS policies: those should not be probabilistic. The agentic layer handles reasoning; the deterministic layer handles execution.
He also flags that many things promoted as AI requirements are actually just workflow automation needs. When you strip back what a team actually wants, the answer is sometimes a SOAR rule or a simple API call, not a reasoning model. The experiment-and-learn framing matters: don’t assume AI is the answer before you’ve understood the question.
Talk delivered at SlashNew, Newcastle. Speaker: Dan, Security Operations Lead at NIB.
AI in the enterprise: past the hype, into the work
The panel wasn’t built around case studies or product demos. Moderator James McDonald assembled three people doing the actual work (a CIO at a regulated financial institution, a consulting leader seeing AI adoption across dozens of enterprises, and a product executive with 25 years across Macquarie, ASX, NASDAQ, and Culture Amp) and asked them to say what’s actually happening.
The honest answer: most Australian enterprises are nowhere near monetising AI yet, the pressure to act is real, and the companies making progress share a pattern that has little to do with tooling.
What’s actually being used
Andrew, CIO at NGM Group, opened with the most grounded account in the room. NGM has been using AI within their contact centre platform to summarise conversations for two years, which he noted was genuinely impressive when he joined 18 months ago. Their development team has generated about 400,000 lines of code completions in the last three months. They have 20 to 50 AI initiatives running concurrently. Most involve humans in the loop at every stage.
They recently received a letter from APRA asking how they’re governing AI. Their answer was that their existing data management and model governance frameworks are adequate; AI is an accelerant, not a category that requires entirely new oversight infrastructure. The organisation’s AI sponsor isn’t the CIO. It’s the CFO, who Andrew described as the least CFO-like CFO he’s encountered, someone genuinely interested in technology and motivated to help a regional organisation punch above its weight.
“I’ve got an AI hammer, what nail can I find?” is the framing Andrew explicitly wants to avoid. The starting point is business problems, not capabilities.
The readiness gap
Josh, who leads APAC for Endava and previously founded Mudbath, sees the pattern across a much wider portfolio. His assessment: Australia is not materially behind other geographies, because the enterprise challenge of extracting value from AI is broadly consistent across markets. What varies is readiness.
The organisations seeing real adoption are the ones where senior leadership is genuinely AI-native, not delegating exploration to a working group, but personally using the tools and driving the conversation. Organisations where the board has created pressure without executive ownership below board level are experiencing analysis paralysis.
“Don’t delegate until you actually intrinsically understand it. What is Codex? What is Claude? Really get it, and set yourself a journey where you have to be AI-native.”
Josh uses Codex as his primary operating environment. The point isn’t the specific tool; it’s that when leadership is actually working with these systems rather than receiving briefings about them, the cultural change propagates differently.
The data scientist rebellion at Culture Amp
Catherine, who led product and engineering at Culture Amp through its AI build-out, offered the most counterintuitive story of the session.
The use case was clear: Coles, a customer with 100,000 employees, generates 100,000 open-text survey comments every engagement cycle. No HR team can read that. AI summarisation is an obvious, monetisable application with direct customer value. The account managers were desperate to sell it.
The data scientists refused to build it.
This was 2022. The engineers who knew the technology best (who understood hallucinations, bias, and the genuine limitations of the models) considered it unethical to use customer data in this way. The resistance came from the people with the deepest knowledge, not from the people you’d expect to be cautious.
“I had this really bizarre cultural challenge in the sense that it was back to front from what you’d expect.”
Catherine’s response was to make the development process fully transparent. Weekly organisation-wide updates on progress, explicit conversations about hallucinations and bias with the account managers, integration of the governance work into existing risk frameworks rather than creating new ones. The data scientists needed to be part of the process because they were the experts. They eventually were.
Her read on why that conflict existed then but would be less acute now: the models were genuinely worse in 2022, the problems were real, and understanding of AI limitations has since become more widely distributed. The extremes of excitement and despair have converged.
Get something to production
The sharpest moment of agreement between panellists came around a single principle: run a use case all the way to production before trying to scale anything.
Catherine: production is where you actually learn about token costs, security exposure, data governance problems, and operationalisation. One team blowing a meaningful amount of money on a production build is far better than the entire organisation discovering the same problems simultaneously at scale.
Andrew: “I don’t care if it doesn’t work perfectly. It only needs to get to prod.” He drew the parallel to cloud adoption: organisations that just kept spinning up dev and test environments discovered the costs when it was too late to architect around them. The same pattern is unfolding now with token consumption.
This also applies to governance. Andrew’s framing: brakes on cars exist to prevent accidents, not to slow the car down. Governance should be an enabler that creates a defined space to operate within, not a veto mechanism. The alternative, saying no to everything, just drives shadow IT, with people using personal accounts and unmanaged tools anyway.
Token costs are coming
Josh made a prediction worth recording: token costs will become a significant architectural constraint for enterprises, and those that haven’t thought about model strategy will find themselves in the same position as organisations that committed entirely to one cloud provider without an exit path.
The analogy is direct: enterprises that swallowed one cloud pill entirely got locked in. Some are still paying for it. AI model strategy needs to be designed for agnosticism from the start, because the ability to switch models in and out is both technically feasible and economically necessary as cost structures shift.
Microsoft’s own signals that token costs are rising faster than anticipated are an early warning. The architectural question isn’t which model to use; it’s how to build in a way that doesn’t require a re-platforming when the answer to that question changes.
The hackathon that worked
Asked about approaches to starting an AI journey, Andrew described NGM’s recent two-day hackathon as the best thing he’s been associated with professionally.
200 people on the floor. 100 engineers, 30 business staff, 70 university students, and a set of coaches from partner organisations. Three charitable foundations brought genuine business problems. The constraint was real: solve something useful for a community partner, not a proof of concept.
A trained lawyer on the risk team spent two days learning prompt engineering from a senior engineer. Her takeaway was that she’d return as a better risk professional.
“It had a purpose. There was coaching, there was a real connection, and I was genuinely blown away at what was built in eight hours.”
The university students did win, as predicted. The point wasn’t the competition. It was that 30 non-technical staff and a risk lawyer now understand what these tools actually do, not from a briefing, but from building with them.
What happens to engineers
The question about engineering roles produced less doom than the audience might have expected.
Catherine’s take: good engineers who stay close to the customer and close to the business will find that roles converge rather than disappear. She described picking up Claude Code after years away from hands-on engineering and finding that her domain knowledge (understanding what to build, how to review it, and how to adapt it) translated directly into productive use of the tool. The latest framework syntax doesn’t matter as much when you understand the problem.
Josh named a term gaining traction: growth engineer. Someone who can create real-world outcomes across multiple disciplines, working at the intersection of engineering, product, and systems thinking. His view is that senior architects, senior designers, and senior UX practitioners will become the authorities in that space, not because AI can’t write code, but because directing it well requires exactly the judgment those roles develop.
Andrew’s angle was the most direct: for a regional organisation with 250 engineers competing against companies with 10,000 or 20,000, AI is the opportunity to throw a punch.
Panel discussion at SlashNew, Newcastle. Panellists: Andrew Crest, CIO at NGM Group; Josh Dulan, APAC Lead at Endava; Catherine Squire, product and engineering executive. Moderated by James McDonald, NTP Talent.
Beat burnout, find flourishing: the AI edition
Navin Keswani closed out day one of SlashNew with something most tech conferences don’t attempt: a systems model for why people burn out, a live workshop to map it for yourself, and a practical argument for why AI tools are making the problem worse — unless you’re deliberate about it.
He opened with a show of hands. Most of the room had experienced burnout or knew someone who had. Then he made the case that our usual response to that fact, treating it as a badge of honour, normalising 80-hour weeks, praising the grind, isn’t stoicism. It’s a misconfigured system.
What burnout actually is
The WHO defines burnout as a syndrome resulting from chronic workplace stress that has not been successfully managed. Navin has two objections to that definition.
First, limiting it to workplace stress is too narrow. Stress happens everywhere. The workplace framing misses the total load a person is carrying.
Second, “not been successfully managed” frames burnout as a personal failure. You didn’t manage it well enough. That framing is wrong and it’s harmful; burnout is driven by culture and environment, not individual willpower.
His preferred definition: a syndrome resulting from chronic stress that has not been balanced by recovery.
The shift from “managed” to “balanced by recovery” is the key move. It reframes the problem as a systems question rather than a discipline question, and it introduces the mechanism that actually determines whether you burn out or flourish.
Flourishing is not the absence of stress
The opposite of burnout isn’t rest, or low pressure, or an easy job. It’s flourishing, defined as the experience of life going well, combining feeling good and functioning effectively. Not either/or. Both compound.
The system underneath this is straightforward: your capacity is a stock. Stress draws it down. Recovery tops it back up. Burnout happens when the balance tips persistently toward stress and away from recovery, and the stock depletes. Each cycle of insufficient recovery leaves you with less than you started.
The practical implication is that optimising for no stress is the wrong target. You want to be in what Navin calls the challenge zone: enough stress to be learning, problem-solving, in flow. That’s where good work happens. Programmers in the room know this feeling. Creatives know it. It’s what flow is.
The problem is duration. Staying in the challenge zone for two to five hours is productive. Staying there much longer tips into overwhelm: fight, flight, or freeze territory. At that point, your capacity to solve the hard problem you’ve been working on is degrading faster than you’re making progress. The experienced move is to put it down and come back the next day. Most people don’t do that, and most people have to learn this lesson more than once.
On the other side: too little stress produces boredom. Navin calls this rust out. The goal is to return to the challenge zone when working on something hard, not to eliminate pressure.
Good recovery versus bad recovery
Recovery also comes in two flavours, and the difference matters more than most people realise.
Bad recovery is anything that distracts without actually restoring. Doom scrolling is the obvious example: designed for maximum engagement, experienced as a break, not genuinely restorative. Drinking after a 14-hour day and then doing the same thing the next day. Anything framed as “I should go to the gym”: the word should adds stress rather than relieving it.
Good recovery is what pushes you back into a parasympathetic state. The ingredients vary by person, but the categories are consistent:
- Gentle movement that connects breath to motion
- Play and creative flow
- Connection with people, nature, or places that feel safe
- Reflection: making sense of things, closing open loops, reducing entropy
- Rest, including sleep but also brief focused relaxation practices
The last ingredient Navin names is repetition. Good recovery has to become routine. Relying on willpower to get to recovery activities every time is a losing strategy. The whole point is to reduce the executive function load, not add to it.
The AI vampire problem
The second half of the talk applies this framework to working with AI tools, specifically coding agents.
Navin references an article by Steve Yegge called “The AI Vampire.” The summary: AI increases productivity, agentic software building is genuinely addictive, and the better you get at it the more you want to use it. It produces dopamine and adrenaline. Working with a coding agent can feel like working with someone smart and slightly intimidating. When it praises your judgment, agrees with your decisions, and flatters your instincts, you find it easy to let it lead and hard to pull back.
The default behaviour of AI tools is designed for maximum engagement and maximum session length. They want long sessions and high emotional investment. That default behaviour will pull you toward doing all the things, starting with one goal and ending at 3am with twelve unfinished threads and no clear outcome.
“AI is a great accelerator. But accelerators amplify whatever trajectory you’re on. If you have a propensity for burnout, this will take you there faster.”
More tasks, more context switching, more ambient cognitive load from managing AI-generated code you only partially understand, more pressure from tracking multiple agents running simultaneously. Unmanaged, this is a faster path to the same bad outcome.
The good news in Navin’s framing: the same tool, used deliberately, can accelerate flourishing instead of burnout. Same accelerator, opposite direction. The difference is whether you’re in default mode or whether you’ve designed the system.
Teaching the machines your system
Navin’s practical prescription: understand your own stress-recovery system well enough to encode it into your tools.
He’s written skills for Claude Code that he uses every day. The approach:
- State your current state and goal before starting. “I’m tired, I have 90 minutes, I want to shift this one thing.” That context changes how the session runs; you’re giving the tool real information about what you need rather than letting it default to maximum output.
- Pre-commit to a time limit before you start, while your judgment is still fresh. Your fresh self is much better at setting sensible constraints than your depleted self.
- Set stop conditions. Build in natural breakpoints. Navin Keswani runs a timer through Claude Code using a skill that watches for logical stopping points (a task completing, 60 minutes passing) and prompts him to take five minutes before continuing.
- Give the tool permission to interrupt and observe mid-session. Treat it as a collaborator with awareness of your state, not a vending machine for code.
- Close each session with a short debrief: what got done, what opened up, how you feel compared to when you started. Handoff notes to future sessions. This closes open loops, reduces entropy, and lets your executive function rest rather than holding everything in working memory.
“Your executive function degrades as you get depleted. The more tired you are, the worse your judgment about whether you’re too tired to think effectively. Use these tools to help you with that situation.”
For leaders
Navin is direct about the distinction between individual responsibility and organisational responsibility.
Individuals are responsible for their own wellbeing, but only within a system that makes that possible. Leaders are responsible for the conditions the team operates in. That means making workload, pace, and trade-offs visible. It means making rest and pushback safe. It means not absorbing stress that isn’t yours to carry, and not expecting team members to absorb it either.
The point about what’s yours to fix and what isn’t is practical: identify the things pulling your team toward burnout, pick one that’s within your control, and start there. A long list of problems with no clear ownership is just more stress. Something small and in your control beats an overwhelming backlog of systemic issues you can’t act on.
In the Q&A, someone asked directly about the existential stress of rapid change: what it means to be a software engineer when the ground is shifting under the role. Navin’s answer: you can’t control the landscape, but talking about it openly and collaboratively is among the most useful things a team can do. Strategic thinking, navigating genuinely complex systems, making judgment calls that require accumulated context: none of that can be outsourced. Making that visible, and reinforcing it honestly with your team, is the work.
Your executive function degrades as you get depleted. The more tired you are, the worse your judgment about whether you’re too tired to think effectively.
Talk delivered at SlashNew, Newcastle. Speaker: Navin Keswani, founder of TANK.



