BWI

Agentic AI for Finance: Towards Verifiable, Trustworthy & Reliable Swiss AI Agents using an Agentic RAG Architecture

Anna Barcikowska

09 June 2026 / 3 min read

Banks run on documents: regulatory texts, supervisory guidance and internal policies that are long, conditional and cross-referenced. Getting an answer wrong carries real consequences. AI can process these documents at speed, but standard language models generate confident answers even when the evidence does not support them. This thesis asked whether a more controlled agentic architecture could change that.

Context

Risk governance in banks depends on the correct interpretation of complex regulatory documents. Large language models can read and summarize such documents, but on their own they rely only on static training data, so their output is not always grounded in what the original document actually says. Retrieval-Augmented Generation (RAG) addresses part of this problem by retrieving external documents before producing an output. However, standard RAG architectures still generate answers without validating whether the retrieved documents actually support them. In regulated work, that gap is a problem.

Goal

The thesis examined whether an Agentic RAG architecture can improve the verifiability and trustworthiness of document-based AI decision support in risk governance compared to a standard RAG approach. The main research question was:

How can an Agentic RAG architecture improve verifiability and trustworthiness of document-based AI decision support for risk governance compared to a standard RAG approach?

Methods

Two systems were designed and tested under identical conditions: a baseline RAG pipeline that retrieves and then generates, and an Agentic RAG prototype with five separate roles (planner, retriever, drafter, verifier and explainer). Before the evaluation, six experts were interviewed to help improve the system and to gain insight into the banking industry and its use of AI agents. Both systems were tested on eight queries and assessed against five criteria: grounding, reference quality, strength of claim, failure handling and explainability.
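To make the difference between the two pipelines concrete, here is a minimal sketch of how the five roles could be chained. It is not the thesis prototype: the corpus, the keyword-overlap retriever, the stopword list and every function name are assumptions made for illustration, and the verifier is reduced to a simple containment check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc: str
    section: str
    page: int
    text: str


# Toy corpus: document name and contents are invented for illustration.
CORPUS = [
    Passage("Liquidity Policy", "4.2", 17,
            "the liquidity buffer must be reviewed quarterly"),
    Passage("Liquidity Policy", "5.1", 23,
            "breaches must be reported to the risk committee"),
]

STOPWORDS = {"the", "must", "be", "to", "how", "often", "what", "is", "for"}


def tokens(text):
    words = text.lower().replace("?", "").replace(".", "").split()
    return set(words) - STOPWORDS


def plan(query):
    # Planner: a real planner would split the query into sub-questions.
    return [query]


def retrieve(sub_question, min_overlap=2):
    # Retriever: keyword overlap stands in for a real search index.
    q = tokens(sub_question)
    return [p for p in CORPUS if len(q & tokens(p.text)) >= min_overlap]


def draft(passages):
    # Drafter: one claim per retrieved passage, tied to its source.
    return [(p.text, p) for p in passages]


def verify(claims):
    # Verifier: keep only claims whose wording is found in the cited passage.
    return [(claim, p) for claim, p in claims if claim in p.text]


def explain(query, verified):
    # Explainer: either a cited answer or a refusal that states what is missing.
    if not verified:
        return {"status": "declined", "answer": None,
                "reason": f"No supporting passage found for: {query}"}
    return {"status": "answered",
            "answer": "; ".join(claim for claim, _ in verified),
            "sources": [f"{p.doc}, section {p.section}, p. {p.page}"
                        for _, p in verified]}


def agentic_answer(query):
    verified = []
    for sub_question in plan(query):
        verified += verify(draft(retrieve(sub_question)))
    return explain(query, verified)


def baseline_answer(query):
    # Baseline: always answers from the best-matching passage, even at zero overlap.
    q = tokens(query)
    best = max(CORPUS, key=lambda p: len(q & tokens(p.text)))
    return {"status": "answered", "answer": best.text}
```

In this sketch, a query about the liquidity buffer is answered with a section and page reference, while a query about a topic the corpus does not cover is declined by `agentic_answer` but still answered by `baseline_answer`.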

Key Findings

When supporting documentation was not available, the Agentic system refused to generate an answer and explained which documents were missing for a complete answer. The baseline RAG system generated complete answers regardless of whether supporting evidence was actually available.

Across the five evaluation criteria, the Agentic system consistently outscored the baseline, with an average of 9.5 out of 10 compared to 7.5 out of 10. The most noticeable difference between the two systems was how they referenced their sources. The Agentic system cited the exact section and page number of the retrieved documents, which matters in regulated industries because they require a high level of traceability.
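The summary reports only the overall averages, so the following is just one plausible way such averages could be computed. The 0 to 10 scale per criterion, the equal weighting and the function names are assumptions, and the example values are placeholders rather than results from the thesis.

```python
from statistics import mean

CRITERIA = ("grounding", "reference_quality", "claim_strength",
            "failure_handling", "explainability")


def query_score(scores):
    """Average one query's per-criterion scores (assumed 0-10 scale)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion in CRITERIA:
        if not 0 <= scores[criterion] <= 10:
            raise ValueError(f"{criterion} score out of range")
    return mean(scores[c] for c in CRITERIA)


def system_score(per_query_scores):
    """Average the per-query scores across all evaluated queries."""
    return mean(query_score(s) for s in per_query_scores)


# Placeholder values for illustration only; these are NOT thesis results.
example = [dict.fromkeys(CRITERIA, 8), dict.fromkeys(CRITERIA, 10)]
print(system_score(example))
```

Equal weighting keeps the sketch simple; a real rubric might weight grounding or failure handling more heavily.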

Because the Agentic system is stricter when generating an answer, it declined to answer two of the eight questions even though the information existed in the documents. This is an acceptable error from a risk management perspective, where a wrong answer is worse than an incomplete one.

Every interviewed expert agreed that a human should always be included in the process. None of them supported making decisions based only on the Agentic system, and they believe that a human should verify all decisions made by the AI before those decisions are communicated to other people. They also saw great value in using the system for regulatory research and document analysis.

Conclusion

In regulated contexts, agentic systems can help build more trust in AI-based decision support by providing verification, traceable citations and transparent reasoning. The findings show that a stricter system will sometimes decline to answer when a less cautious one would not, but in risk governance a missing answer is more acceptable than an unsupported one.

The system matters most as a support tool: one that helps professionals work through complex regulatory documents faster and with a clear evidence trail, while keeping the final judgment with a person.