"Hey, You Made a Mistake!": Coaching AI Agents with Verbal Feedback

"Hey, You Made a Mistake!": Coaching AI Agents with Verbal Feedback

AI Agentic Design Pattern: Reflexion

Think of a large language model (LLM) such as ChatGPT, Gemini, or Claude being trained to learn a task through trial and error. The traditional approach, reinforcement learning, relies on a reward-and-punishment signal and on updating the model's weights as data is processed. This process can be slow and inefficient.

The Reflexion framework reinforces language models not by updating their weights but through verbal feedback. After each attempt, the feedback explains what went wrong and how to improve. The LLM stores this feedback in a memory buffer and draws on it when it encounters a similar situation later, so it can make a better decision.

The Reflexion process

It utilizes three distinct models: an Actor, which generates text and actions; an Evaluator, which scores the outputs produced by the Actor; and a Self-Reflection model, which generates verbal reinforcement cues to help the Actor improve itself. A minimal sketch of how the three interact is shown below.
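The sketch below is one way the loop could look in Python; the `llm()` helper, the function names, and the prompt wording are assumptions made for illustration, not the paper's actual implementation.

```python
# A minimal sketch of one Reflexion run, assuming a hypothetical llm() helper
# that wraps whatever chat model is being used. The Actor, Evaluator, and
# Self-Reflection roles are all played here by separate prompts to that helper.

def llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice (ChatGPT, Gemini, Claude, ...)."""
    raise NotImplementedError

def reflexion_loop(task: str, max_trials: int = 3) -> str:
    memory: list[str] = []               # long-term memory of verbal feedback
    answer = ""
    for _ in range(max_trials):
        # Actor: attempt the task, conditioned on lessons from earlier trials.
        lessons = "\n".join(memory)
        answer = llm(f"Task: {task}\nLessons from earlier attempts:\n{lessons}\nAnswer:")

        # Evaluator: score the attempt (here, an LLM judge returning PASS/FAIL).
        verdict = llm(f"Task: {task}\nAnswer: {answer}\nReply PASS or FAIL.")
        if "PASS" in verdict.upper():
            return answer

        # Self-Reflection: turn the failure into verbal feedback for next time.
        reflection = llm(
            f"Task: {task}\nFailed answer: {answer}\n"
            "Explain what went wrong and what to do differently next time."
        )
        memory.append(reflection)        # stored as text; no weights are updated
    return answer
```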

  1. Actor

An LLM that is specifically prompted to generate the necessary text and actions.

  2. Evaluator

The Evaluator assesses the output generated by the Actor and computes a reward score that reflects the Actor's performance within the given task context. This can be done in several ways:

  • Reward functions based on exact match (EM) grading, ensuring that the generated output aligns closely with the expected solution.

  • Pre-defined heuristic functions tailored to specific evaluation criteria, used in decision-making tasks.

  • A different instantiation of an LLM used as the Evaluator, generating rewards for decision-making and programming tasks.

This multi-faceted approach to Evaluator design makes it possible to examine different strategies for scoring generated outputs, offering insight into their effectiveness and suitability across a range of tasks. A rough sketch of these scoring strategies follows.
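The snippet below illustrates what each of those strategies might look like; the function names and the reuse of the hypothetical `llm()` stub from the earlier sketch are assumptions, not part of the Reflexion paper.

```python
# Sketch of three possible Evaluator strategies, reusing the hypothetical
# llm() stub defined in the earlier loop example.

def exact_match_reward(output: str, expected: str) -> float:
    """Exact-match (EM) grading: reward 1.0 only if the output matches the solution."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def heuristic_reward(trajectory: list[str], max_steps: int = 30) -> float:
    """Pre-defined heuristic for decision-making tasks, e.g. penalise
    overly long or repetitive action sequences."""
    too_long = len(trajectory) > max_steps
    repeated = len(trajectory) != len(set(trajectory))
    return 0.0 if (too_long or repeated) else 1.0

def llm_judge_reward(task: str, output: str) -> float:
    """A second LLM instance acting as the judge of the Actor's output."""
    verdict = llm(f"Task: {task}\nCandidate answer: {output}\nReply PASS or FAIL.")
    return 1.0 if "PASS" in verdict.upper() else 0.0
```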

  3. Self-reflection

Generates verbal self-reflections that provide valuable feedback for future trials. It takes the reward signal (success or failure) from the Evaluator, the full trajectory the Actor took, and the lessons already stored in memory, and produces constructive feedback that tells the Actor what to do differently. This verbal feedback is then stored in the Actor's memory, as sketched below.
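One plausible way to implement that step is a single prompt like the following; the `self_reflect()` name and the prompt wording are illustrative assumptions, again reusing the hypothetical `llm()` stub.

```python
def self_reflect(task: str, trajectory: list[str], reward: float,
                 past_lessons: list[str]) -> str:
    """Turn a scored trajectory into verbal feedback for the next trial."""
    prompt = (
        f"Task: {task}\n"
        f"Outcome: {'success' if reward > 0 else 'failure'}\n"
        "Actions taken:\n" + "\n".join(trajectory) + "\n"
        "Lessons from previous attempts:\n" + "\n".join(past_lessons) + "\n"
        "In a few sentences, explain what went wrong and what the agent "
        "should do differently next time."
    )
    return llm(prompt)

# The returned reflection is appended to long-term memory so the Actor
# can condition on it in the next trial.
```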

Memory

The memory has two components: short-term memory and long-term memory.

The short-term memory stores the recent trajectory of actions the Actor took, while the long-term memory stores the lessons learned from past attempts in the form of verbal feedback generated by the Self-Reflection model.

The importance of memory in Reflexion

  • Provides context: The short-term memory gives the Actor all the details about the current situation. This helps it understand what's happening right now.

  • Informs future actions: The long-term memory (verbal feedback) helps the Actor make better decisions in the future. It reminds the Actor of past mistakes and suggests better choices based on learned experience; a sketch of how both memories feed the Actor's prompt follows this list.
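A minimal sketch of how the two memories might be combined into the Actor's prompt, assuming a simple dataclass and the same hypothetical `llm()` helper; none of these names come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ReflexionMemory:
    short_term: list[str] = field(default_factory=list)  # current trajectory
    long_term: list[str] = field(default_factory=list)   # verbal reflections

    def actor_prompt(self, task: str) -> str:
        """Compose the Actor prompt from the task, recent actions, and lessons."""
        return (
            f"Task: {task}\n"
            "Lessons from earlier attempts:\n" + "\n".join(self.long_term) + "\n"
            "Actions so far in this attempt:\n" + "\n".join(self.short_term) + "\n"
            "Next action:"
        )

# Usage: append each action to short_term during a trial, append each
# reflection to long_term between trials, then call llm(memory.actor_prompt(task)).
```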

Benefits of Reflexion

  • By getting specific advice after each try, the Actor can improve its decision-making much faster than with just a win/lose signal.

  • The Actor can use both its recent experience (trajectory) and past lessons (memory) to make better choices in the future.

Conclusion

By using both short-term memory (the current situation) and long-term memory (past lessons), Reflexion agents can make more informed decisions than AI approaches that rely only on the current situation. It's like having both the immediate details and the wisdom of experience to guide your actions.

Overall, the Self-reflection model is like a super-coach that helps the Actor learn from its mistakes and become a better decision-maker.

Source

Reflexion: Language Agents with Verbal Reinforcement Learning