Probably’s first product is a data science tool that generates answers from large datasets. Each response includes source citations and an audit trail, features that have become increasingly common across AI-powered products as companies seek to improve transparency and trust.
To reduce errors, the startup built a validation layer that sits between the language model and the user. Initial outputs generated by the model are reviewed by a deterministic validator, which flags responses that do not match the underlying dataset. The company says the system is trained around that validation process and optimized for both speed and accuracy.
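The article does not say how Probably's validator works internally. As a rough illustration of the general idea only, the Python sketch below assumes the model returns a structured claim with its answer (which column it used, which aggregate it computed, and the value it reports). A deterministic check then recomputes that aggregate from the raw rows and flags any mismatch before the answer reaches the user. `Claim`, `validate_claim`, `OPERATIONS` and the sample rows are hypothetical names for this example, not Probably's API.

```python
import math
import statistics
from dataclasses import dataclass


# Hypothetical structured claim the language model is asked to emit with its answer.
@dataclass
class Claim:
    column: str
    operation: str  # assumed to be one of the keys in OPERATIONS below
    value: float


# Deterministic recomputation of each supported aggregate; no model is involved here.
OPERATIONS = {
    "sum": sum,
    "mean": statistics.fmean,
    "max": max,
    "min": min,
    "count": len,
}


def validate_claim(claim: Claim, rows: list[dict], rel_tol: float = 1e-9) -> tuple[bool, str]:
    """Recompute the claimed aggregate from the dataset and flag any mismatch."""
    if claim.operation not in OPERATIONS:
        return False, f"unsupported operation: {claim.operation}"
    values = [row[claim.column] for row in rows if claim.column in row]
    if not values:
        return False, f"column not found in dataset: {claim.column}"
    expected = OPERATIONS[claim.operation](values)
    if math.isclose(claim.value, expected, rel_tol=rel_tol):
        return True, "ok"
    return False, f"model reported {claim.value}, dataset gives {expected}"


# Usage with a tiny illustrative dataset.
rows = [{"region": "north", "revenue": 120.0}, {"region": "south", "revenue": 80.0}]
print(validate_claim(Claim("revenue", "sum", 200.0), rows))   # (True, 'ok')
print(validate_claim(Claim("revenue", "mean", 110.0), rows))  # (False, 'model reported 110.0, dataset gives 100.0')
```

Because the check only recomputes values from the data, it gives the same verdict every time for the same claim, which is the property a "deterministic" validator is meant to provide.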
“What we’ve learned in building this is that the better your context-management system, the weaker the model may become,” Elias said.
According to the company, that architecture makes it possible to rely on significantly smaller AI models rather than the latest frontier systems. Elias said the current version operates on a model several generations behind leading offerings, allowing it to run on local hardware while reducing token-related expenses.
Cost management has become an increasingly important consideration as organizations expand AI deployments. Elias argues that the validation framework could be applied beyond data science workflows to sectors where accuracy is critical, including accounting and medical services.
“It seems to me that it’s interesting that large AI research labs have not tried to do this yet,” Elias said. “They’re not incentivized to do so, because they make money when there is more need to fix the model.”
The company believes its approach could make AI systems more dependable in high-stakes environments by combining language models with deterministic verification. If it works as intended, the approach could help organizations cut operating costs while increasing confidence in AI-generated results for tasks where accuracy is essential.
This analysis is based on reporting from Mezha.
Image courtesy of Probably
This article was generated with AI assistance and reviewed for accuracy and quality.