Joe Rose, president at strategic technology provider JBS Dev, wants to cut through one of the myths of working with generative and agentic AI systems. “It’s a common misconception that your data has to be perfect before you do any of these types of workloads,” he explains.
As a recent article in AI Fieldbook outlines, vendors and consultants – not surprisingly – suggest you need huge data lakes and multi-year data transformation programmes respectively. Little wonder, then, that executives are left scratching their heads. The reality is slightly different. “The tooling has never been better than it is now to deal with poor quality data,” says Rose. “It’s almost remarkable what an LLM can understand on a half-written prompt.”
It makes sense. If you’ve got such a tool available, it’s worth using it to your advantage – with the correct guardrails in place. Because models are inherently unpredictable, bad output has to be handled, and that is where the human in the loop comes in. For textual or categorical data, the models show a degree of resilience. Ongoing attention is still required, however. “People are… used to ‘we build it, it works, we forget about it,’” says Rose. “That’s just not how these systems work.”
On imperfect data, Rose gives the example of a client in the medical sector that wanted to migrate to a different billing reconciliation system. The records were a mix: some were PDFs, others images; the procedure would sometimes appear in the doctor’s name field, the doctor’s name in the patient’s name field, and so on. From a simple prompt, the generative AI was able to pull out the clean data, using OCR for the images and text extraction for the PDFs. More agentic approaches were then layered on, such as comparing a customer record against an insurance contract to check whether the customer had been billed at the right rate.
“You start to layer different use cases on top of one another,” says Rose. “That’s not to say that it gets everything right – you still need a human in the loop. But what you want to do is say, ‘we started at 20% automated, and then 40%, and then 60, 80%’, and kind of grow that over time.”
Looking ahead, Rose expects discussion of these models to centre on cost and portability. “I think you’re going to see a shift away from these radical leaps in model capability, and more shift towards ‘how do we make the cost more sustainable [so] that we don’t have to build data centres at the rate we’re building data centres?’,” he says.