31 Oct. 2024 - Michael Simonetti, BSc BE MTE
One of the most misunderstood aspects of enterprise AI is how quickly and completely it forgets. Tools like ChatGPT, Claude, or Gemini may sound like they understand and recall information, but under the hood, these models are stateless. Each call to the API is a blank slate. There’s no memory of previous chats, no knowledge of user preferences, and no continuity unless you explicitly build it yourself. This is a major issue when building AI for customer service, sales assistance, coaching tools, or anything requiring follow-up or context.
At a technical level, the AI model only sees what you send in the current prompt. If you want it to remember something, you have to feed that memory back into the prompt each time. That means memory management is not a model feature — it’s an application architecture responsibility. So how do we solve this?
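To make the point concrete, here is a minimal sketch using the OpenAI Python SDK (the model name is illustrative): two separate API calls share nothing, so the second call cannot see what the first was told unless you re-send it yourself.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# First call: the model is told a fact.
client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "My name is Dana and I prefer metric units."}],
)

# Second call: a brand-new request with no shared state.
# Unless the earlier messages are re-sent, the model has no idea who Dana is.
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What units do I prefer?"}],
)
print(reply.choices[0].message.content)  # will not recall the earlier preference
```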
Step 1: Identify and Persist User Identity
Start by assigning every user a unique ID (UUID, email, or account number). On each interaction, log the conversation using this ID.
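A minimal sketch of such a log, assuming a local SQLite database; the table layout and the `log_message` helper are illustrative, not a prescribed schema.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect("memory.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS messages (
           user_id TEXT, role TEXT, content TEXT, created_at TEXT
       )"""
)

def log_message(user_id: str, role: str, content: str) -> None:
    """Append one conversation turn under the user's stable ID."""
    conn.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        (user_id, role, content, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

# Assign a stable ID once (a UUID here; an email or account number also works).
user_id = str(uuid.uuid4())
log_message(user_id, "user", "I'd like help choosing a CRM.")
log_message(user_id, "assistant", "Happy to help. What's your team size?")
```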
Step 2: Store and Retrieve Memory
Use a SQL database or a vector store (e.g. Pinecone, Weaviate) to keep chat history, user preferences, or extracted facts. For scalability, embed and store text as vectors, allowing retrieval by semantic similarity.
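A sketch of how semantic retrieval can work, using OpenAI's embeddings endpoint; the in-memory list stands in for a real vector store such as Pinecone or Weaviate, and the `remember`/`recall` helpers are illustrative names rather than a library API.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Turn text into a vector so it can be searched by meaning, not keywords."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# In production this would live in Pinecone, Weaviate or pgvector;
# a plain list stands in for the index here.
memory_index: list[tuple[str, np.ndarray]] = []

def remember(fact: str) -> None:
    memory_index.append((fact, embed(fact)))

def recall(query: str, top_k: int = 3) -> list[str]:
    """Return the stored facts most semantically similar to the query."""
    q = embed(query)
    scored = [(float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), fact)
              for fact, v in memory_index]
    return [fact for _, fact in sorted(scored, reverse=True)[:top_k]]

remember("Customer prefers email over phone contact.")
remember("Customer's renewal date is in March.")
print(recall("How should we reach out to this customer?"))
```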
Step 3: Summarise if Needed
If token limits are an issue, summarise the history before sending it to the model. Use a separate LLM call or a dedicated summarisation prompt to condense it.
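One way this can look, again assuming the OpenAI SDK; the word limit and the system instruction are arbitrary choices, not a fixed recipe.

```python
from openai import OpenAI

client = OpenAI()

def summarise_history(turns: list[dict]) -> str:
    """Condense older conversation turns into a short factual summary to save tokens."""
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable model works; the name is illustrative
        messages=[
            {"role": "system",
             "content": "Summarise this conversation in under 150 words, "
                        "keeping names, preferences and open questions."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content
```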
Step 4: Inject Memory into Prompt
Now you can construct the prompt dynamically, layering in both the current user input and the summarised memory.
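A minimal sketch, reusing the illustrative `client`, `recall` and `summarise_history` helpers from the earlier steps (so those names are assumptions, not a fixed API):

```python
# `client`, `recall()` and `summarise_history()` come from the earlier sketches.

def build_messages(user_input: str, summary: str, facts: list[str]) -> list[dict]:
    """Layer the summarised history and retrieved facts under the new user input."""
    memory_block = (
        f"Conversation so far (summarised): {summary}\n"
        f"Relevant facts about this user: {'; '.join(facts) or 'none on record'}"
    )
    return [
        {"role": "system",
         "content": "You are a helpful assistant. Use the memory notes below "
                    "when answering.\n" + memory_block},
        {"role": "user", "content": user_input},
    ]

older_turns = [  # normally loaded from the Step 1 message log
    {"role": "user", "content": "Our renewal is in March and we prefer email."},
    {"role": "assistant", "content": "Got it, I've noted both."},
]

user_input = "When is our renewal due again?"
messages = build_messages(user_input,
                          summary=summarise_history(older_turns),
                          facts=recall(user_input))
reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(reply.choices[0].message.content)
```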
Additional Tips:
Use embedding models (e.g. OpenAI’s text-embedding-3-small) to vectorise long-term memories
For real-time apps, keep only the most recent 2–3 interactions inline and archive older history (see the sketch after this list)
Use tools like LangChain or LlamaIndex if you want to manage memory pipelines more easily
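As a small sketch of that sliding-window idea, assuming a hypothetical `split_history` helper and an arbitrary three-turn cutoff:

```python
def split_history(turns: list[dict], keep_recent: int = 3) -> tuple[list[dict], list[dict]]:
    """Split the log: the last few turns stay inline, everything older is archived."""
    if keep_recent <= 0:
        return turns, []
    return turns[:-keep_recent], turns[-keep_recent:]

all_turns = [
    {"role": "user", "content": "Hi, I'm setting up a loyalty program."},
    {"role": "assistant", "content": "Great, what platform are you on?"},
    {"role": "user", "content": "Shopify."},
    {"role": "assistant", "content": "Noted. Do you want points or tiers?"},
    {"role": "user", "content": "Tiers, please."},
]

older, recent = split_history(all_turns)
# `recent` is sent to the model verbatim; `older` is summarised (Step 3)
# or embedded into the vector store (Step 2) for later recall.
```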
If you’re serious about building context-aware AI tools, memory is not optional; it’s foundational, and it’s what separates production-grade systems from flashy demos. As Robert Cialdini teaches in Pre-Suasion, success comes from what happens before the action. This architecture is your pre-suasion: the groundwork that ensures your AI delivers value users remember. Want to fast-track your build with a team that’s solved this? Contact AndMine.
Go on, see if you can challenge us on "Why Does AI keep “Forgetting” Things, like Code Segments or Parts of the Discussion?" - Part of our 184 services at AndMine. We are quick to respond but if you want to go direct, test us during office hours.