Can AI agents or RAG replace the service desk? ── Why it gets stuck, and real solutions
Isn't it safe to say that the idea of "replacing first-line inquiry response, i.e. the service desk, with AI" is by now commonplace? What we want is clear: reduce waiting time, reduce operator burden, cut headcount, and level out response quality. There is no objection to that goal.
A common mistake is choosing "feed everything to RAG" as the method. I actually tested a configuration that ingests past logs, internal wikis, and chat history all at once, and what I got was not the expected efficiency but mass-produced, plausible-looking garbage answers.
The conclusion is clear. Partial replacement is possible, but replacing the service desk without designing knowledge operations has a high probability of failure. And the majority of failures are not model-selection errors; they come from poor input-data quality and undefined operational responsibility.
A service desk requires correct answers and auditability at the same time. The only way to achieve both is to ground answers in official knowledge: knowledge that the organization has approved and vouches for.
Conclusion first
- Even if you ingest every past log, low-quality input just means the system searches garbage at high speed and returns garbage-flavored answers.
- Even if you ingest everything in the organization, an ambiguous boundary between official and informal turns the system into a device that justifies wrong answers, detached from the organization's official context and rules.
- A system that can only answer in generalities conflicts with organizational rules and cannot be used in the field.
- The real solution is to run a KCS (Knowledge-Centered Service) loop with AI assistance, making the quality of each reference source, and who is responsible for it, explicit.
- AI that cannot refer to approved official knowledge cannot be used in service-desk production.
Why does “put everything in for now” fail?
1. Past response logs are not knowledge in the first place
Many inquiry logs are written to close tickets. That is correct for the business purpose, but it is usually insufficient as reusable knowledge.
- Prerequisite information is missing (device used, permissions, connection route, environment differences).
- The goal is to close the ticket; in extreme cases the log just says "handled." It was never written for reuse.
- Abbreviations and colloquialisms are mixed in, relying on the tacit knowledge of whoever handled the case.
- Root cause and temporary workaround are not separated.
Feed logs in this state to RAG and, even when retrieval succeeds, the answer will not be accurate. AI plausibly fills in documents that are ambiguous or incomprehensible even to a human reader, producing confident answers detached from reality. The result is a troublesome state where the answer comes fast but the problem stays unsolved. Suspecting the prompt and tuning it as hard as you can will probably end in vain. At least it did for me.
2. Ingesting all internal information dilutes what is official
A configuration that reads "all internal wikis," "all file servers," and "all chat logs" looks comprehensive at first glance.
In reality, however, it lines up information of very different reliability on the same search screen.
Typically the following occurs:
- Official procedures (approved) and personal notes (unapproved) appear in the same result list
- Obsolete procedures remain and conflict with the latest ones
- Temporary know-how is quoted as if it were permanent procedure
- A reversal occurs: an article's update timestamp is new, but its content is old
RAG is good at finding documents. Guaranteeing that a document is correct for the organization requires a separate mechanism.
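As a concrete illustration, separating reliability tiers can be as simple as attaching approval metadata to every document and filtering retrieval on it. A minimal Python sketch, with a hypothetical corpus and flag names (not from any specific RAG stack):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    approved: bool   # passed formal review and sign-off
    obsolete: bool   # superseded by a newer procedure

# Hypothetical mini-corpus mixing official and informal sources
corpus = [
    Doc("VPN setup (approved v3)", approved=True,  obsolete=False),
    Doc("VPN setup (old v2)",      approved=True,  obsolete=True),
    Doc("my personal VPN memo",    approved=False, obsolete=False),
]

def retrievable(doc: Doc) -> bool:
    """Only approved, non-obsolete documents may ground an answer."""
    return doc.approved and not doc.obsolete

sources = [d.title for d in corpus if retrievable(d)]
print(sources)  # → ['VPN setup (approved v3)']
```

The point is not the three lines of filtering logic but that the metadata must exist at all: if nothing records approval status, no retriever can tell the official procedure from the personal memo.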
3. Generalizations that ignore organizational rules are useless even when correct
AI with insufficient official knowledge often returns answers that are technically correct but operationally infeasible:
- Grant local administrator privileges
- Temporary relaxation of security settings
- Use of external cloud at individual discretion
- Direct access to cross-functional data
Even if the AI returns a generally valid solution, it cannot be implemented if it violates organizational rules. At that point, service-desk AI can even become a device that justifies rule-breaking. That is no solution to a business problem.
Examples of failures that occur on site
Failure example 1: Misdirection due to VPN connection failure
- Symptom: “Unable to connect to VPN from home”
- AI answer: “Please initialize the OS network settings and restart.”
- Actual: The cause is certificate revocation on the authentication infrastructure side, and it cannot be resolved on the user side.
Why did this happen?
Because the past logs contained many cases of "temporary malfunction of a personal device," and the AI was pulled toward them.
Furthermore, "the procedure for isolating service-side failures" was weak as official knowledge, so it did not rank as a top candidate. And the organizational context of "may users reset network settings on their own authority?" was not factored into the answer.
Failure example 2: Proposing a rule violation in a software request
- Symptom: “I want to use analysis tools”
- AI answer: “Download the trial version yourself and start using it.”
- In practice: procurement, license management, and software intake screening are mandatory in the organization.
Why did this happen?
Because the official procurement-flow document was buried among the external general articles and personal memos that were also referenced.
This is a classic case of failing to separate "what is technically possible" from "what is permitted."
Failure example 3: Same question, answer changes every time
- Symptom: “What are the steps to set up email forwarding?”
- AI answer: varies from day to day
- Actual: the old procedure page and the new procedure page coexist
Why did this happen? Because there was no metadata indicating the "valid version" (effective date, retirement date, approver), so the answer swung with the search ranking. When personal emails and chat snippets are mixed in, the probability that an informal explanation happens to rank higher also increases.
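The missing metadata can be modeled directly. A minimal sketch, assuming hypothetical field names (`effective_from`, `retired_on`, `approver`) rather than any standard schema: once each version carries a validity window, selecting the one article valid on a given day no longer depends on search-ranking luck.

```python
from datetime import date

# Hypothetical metadata for two versions of the same procedure
articles = [
    {"title": "Mail forwarding v1", "effective_from": date(2021, 4, 1),
     "retired_on": date(2023, 3, 31), "approver": "IT Ops"},
    {"title": "Mail forwarding v2", "effective_from": date(2023, 4, 1),
     "retired_on": None, "approver": "IT Ops"},
]

def valid_on(article: dict, day: date) -> bool:
    """An article is citable only inside its approved validity window."""
    started = article["effective_from"] <= day
    not_retired = article["retired_on"] is None or day <= article["retired_on"]
    return started and not_retired

current = [a["title"] for a in articles if valid_on(a, date(2024, 6, 1))]
print(current)  # → ['Mail forwarding v2']
```

With this in place, the old page can stay in the archive for audit purposes without ever being cited as the current procedure.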
So what to do: Run the KCS loop with AI assistance
The real solution is not to replace KCS with AI, but to accelerate KCS with AI.
What is KCS?
KCS (Knowledge-Centered Service) is an operational discipline in which the knowledge generated while answering inquiries is recorded, reused, and continuously improved. The important point is not to "write a document after solving the problem" but to embed knowledge updating into the act of solving the problem itself. The reason KCS is being re-evaluated in the RAG era is simple: the quality of the search target largely determines the quality of the answers.
A side note: it is an old idea, but it works
KCS dates back to 1992, long before today's AI. It remains effective because the problems on the ground have not essentially changed:
- Logs exist, but nobody can read them
- Documents exist, but nobody knows which version is official
- Knowledge is not updated and rots
Add generative AI and these three problems tend to be amplified, not eliminated. That is why it makes sense to return to the plain but unbreakable KCS mold before chasing flashy new features.
Minimum configuration for KCS operation
- Capture: structure and record symptoms, causes, countermeasures, and applicable conditions while handling the inquiry
- Structure: create a template and make environmental conditions, scope, and prohibitions mandatory fields
- Reuse: on the next inquiry, refer to the knowledge and present the answer together with the supporting URL
- Improve: reflect differences found in actual operation, and keep correcting ambiguous wording and outdated procedures
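The Structure step is easy to enforce mechanically: reject a draft record until its mandatory fields are filled. A minimal Python sketch (the field names are illustrative, not an official KCS schema):

```python
# Mandatory fields for a captured knowledge record (names are
# illustrative assumptions, not a standard KCS schema)
REQUIRED = ["symptom", "cause", "resolution",
            "environment", "scope", "prohibitions"]

def missing_fields(record: dict) -> list[str]:
    """Return mandatory fields that are absent or left blank."""
    return [f for f in REQUIRED if not record.get(f)]

draft = {
    "symptom": "Cannot connect to VPN from home",
    "cause": "Client certificate revoked",
    "resolution": "Reissue the certificate via the IT portal",
    "environment": "",          # blank: which OS / client version?
    "scope": "Remote workers",
    # "prohibitions" not filled in at all
}

print(missing_fields(draft))  # → ['environment', 'prohibitions']
```

A check like this can run at ticket-close time, so the gap is fixed while the responder still remembers the case, rather than discovered months later by a confused AI.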
Production loop: Nurturing AI answers with official knowledge
In the field, it is realistic to expand the scope of application gradually through the following loop:
- User asks AI first
- If AI does not resolve the issue, escalate to Service Desk (SD)
- SD response records are converted into knowledge using templates suited to AI consumption
- Only approved, reviewed official knowledge is added to the AI's reference data sources
- The AI's first-response rate and accuracy on similar inquiries rise
The key to this loop is accumulating knowledge that can stand as an approved, official answer, not accumulating volume. Make only "approved information," not "everything ever recorded," the AI's data source. Break this rule and the range of answers expands, but accuracy does not.
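The "approved information only" rule can be enforced as a simple gate in the knowledge pipeline. A minimal sketch, with hypothetical status names rather than those of any particular ticketing tool:

```python
def promote(record: dict, index: list) -> bool:
    """Add a record to the AI's reference index only if it is approved."""
    if record.get("status") != "approved":
        return False  # drafts and raw ticket logs never enter the index
    index.append(record)
    return True

index: list = []
promote({"id": 101, "status": "draft"}, index)      # rejected
promote({"id": 102, "status": "approved"}, index)   # accepted
print([r["id"] for r in index])  # → [102]
```

The gate is trivial as code; the hard part, as the rest of this article argues, is the human process that decides who may set `status` to approved.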
Areas that can be left to AI
- Clustering of similar queries
- Generation of draft answers (with evidence)
- Point out missing items (“Target OS is unknown”, “Authority prerequisites are not stated”, etc.)
- Presentation of candidates for integrating duplicate knowledge
Areas where humans should be responsible
- Official/informal judgment
- Approval of procedures and decision to abolish them
- Allow exception handling
- Knowledge expiration management
Try to hand these over to AI as well, and the AI will not be able to handle them properly.
Practical points
The following points should be kept in mind in practice.
Knowledge operation side
- Define who will create and publish what kind of official knowledge and when, and decide on the creation and quality assurance scheme.
- Define who can view the knowledge and decide on access controls.
AI side
- In the prompt, explicitly instruct the model to state the grounds for its answers.
- Do not allow unofficial/unauthorized sources to be used in answer generation.
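These two prompt-side rules can be combined into a single system-prompt template: require cited grounds, and forbid answering from anything outside the approved knowledge set. An illustrative sketch (the article ID, URL, and refusal wording are made up for the example):

```python
# Illustrative system-prompt template; wording and placeholders are
# assumptions, not taken from any specific product.
PROMPT = """You are a service-desk assistant.
Answer ONLY from the approved knowledge articles provided below.
For every instruction you give, cite the article ID and URL.
If no approved article covers the question, reply exactly:
"No approved procedure found - escalating to the service desk."

Approved articles:
{articles}

Question:
{question}
"""

filled = PROMPT.format(
    articles="[KB-0042] https://kb.example.internal/0042 "
             "VPN certificate reissue",
    question="I cannot connect to the VPN from home.",
)
```

Note that the prompt alone is not the control: the retrieval layer must also refuse to pass unapproved sources into `{articles}`, otherwise the instruction has nothing to bite on.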
Improved operation
- Determine the means to collect AI’s incorrect answers and incorporate knowledge generation for improvement into normal operations.
These three points may look abstract, but they make a real difference once implemented. Teams that first decide "who is responsible" improve quickly; teams that leave it undecided repeat the same mistakes.
The answer to “Can it be replaced?”
Asked whether an AI agent can replace a service desk 100%, my current answer is "no." However, if you break inquiries down by type and establish operations that make knowledge responsibility explicit, then first-line answers, routine guidance, and standard procedures can be fully substituted.
In other words, the point at issue is not the intelligence of AI itself.
- Which knowledge to adopt as official?
- Who guarantees quality?
- Who stops the AI when it is wrong?
Only organizations that can design these three points will turn AI agents from "just another chat tool" into a genuine business force.
Conversely, an organization without an approved official-knowledge operation cannot achieve service-desk quality, no matter how sophisticated the model.
Summary
The idea that "ingest all the logs and it gets smarter" is an illusion. The real reason service-desk AI gets stuck is not the amount of knowledge but the lack of knowledge governance.
What we need now is not grand, total automation. It is the low-key operation of running the KCS loop with AI assistance and cultivating official knowledge.
As organizations come to accept this modesty, the quality of their responses changes visibly. When using AI at a service desk, the top priority is always the same: **continuously develop approved, organizationally correct official knowledge.**