Quick introduction to GenAI apps

Summary

Generative Artificial Intelligence (GenAI) applications are transforming our world, making it possible to create text, images, music, and videos automatically.

Using advanced algorithms and neural networks, these technologies are revolutionizing areas such as art, writing, design, and video games.

In this series, I’m going to share the path I am taking to learn these technologies: how I am approaching the study, how I am setting up my development environment, which applications I intend to study, what I intend to develop, and how I am developing it. I will discuss both open-source technologies and their equivalents in Microsoft’s Modern Work stack (Microsoft Copilot for Microsoft 365), as the latter is part of my daily job.

Generative AI is evolving at the speed of light. There are numerous concepts to learn, with standards, tools, and frameworks springing up like mushrooms. What I write today could become obsolete by tomorrow morning. Therefore, I'm going to write this series following three guiding principles:
1. I will focus on the grand scheme of things, identifying the fundamental concepts and technologies to learn and concentrating on those without digressing. However, I will leave references in the articles for those who wish to delve deeper.
2. I will condense in my articles the results of my tests, projects, and experiments, explaining the path I followed to implement certain solutions or solve certain problems, while also mentioning possible alternative methods.
3. I will strive to keep these posts up-to-date with regular reviews.

More than ChatGPT…

At the heart of ChatGPT is an LLM (Large Language Model), an artificial intelligence model specialized in natural language processing. ChatGPT, as well as Claude, Gemini, Llama, and many others, are trained on large textual datasets and use deep learning techniques to ‘understand’ and generate text much as a human would. LLMs can perform a variety of language-related tasks, including machine translation, sentence completion, creative writing, writing code, answering questions, and more.

With GenAI apps, we take all of this a step further, and LLMs become the core of our applications. These applications can do things that were unthinkable until recently, such as looking up information in our books in near real-time, translating into another language using our talking avatar, holding a phone conversation on our behalf, suggesting a recipe for dinner with ingredients we have at home, helping our child with homework, or finding us the cheapest tickets to our next destination (and buying them for us too!).

But how many types of GenAI apps exist? What do they comprise? What is their architecture? As mentioned, when it comes to AI, what is true today may no longer be true tomorrow morning. For now, we can imagine a pyramid in which, at the base, we find LLMs pure and simple. On one side are open models, such as those released by Meta, Mistral, Google, and others, with which we can interact by loading them into specialised servers such as Ollama, GPT4All, or AnythingLLM, to name the most famous open-source options. On the other side are closed systems that we use every day, such as OpenAI’s ChatGPT, Google Gemini, and Microsoft Copilot, each running the LLMs its maker produces (in the case of Microsoft, the LLMs used are those of OpenAI).
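
To make this concrete, here is a minimal sketch of interacting with a locally loaded model through Ollama’s REST API. It assumes Ollama is running on its default port and that the model has been pulled beforehand (llama3 here, but any pulled model works):

```python
# Minimal call to a local LLM server (Ollama listens on port 11434 by default).
# Pull the model first with: ollama pull llama3
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain in one sentence what a Large Language Model is.",
        "stream": False,  # return the whole answer as a single JSON object
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```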

RAGs

Immediately above this, we find RAG (Retrieval-Augmented Generation) apps: essentially LLM servers to which we can provide additional context on which to base their answers, citing the sources. For example, we can give the RAG the address of a website, or one or more documents, and then ask it questions about that website or those documents. The usual analogy here is that of a court judge who has a deep knowledge of the law but sometimes faces a case so peculiar that it requires the advice of an expert witness. The judge is the LLM, and the expert witness is the retrieval component, whose expertise is the additional context on which the judge relies to decide the case.

In some respects, ChatGPT and similar products have already evolved to the RAG stage. In their latest versions, they no longer limit themselves to being generalist LLMs: we can pass them files to be analyzed, summarized, and queried, and they can also search the web themselves and cite sources in their answers.

We mere mortals cannot develop new LLMs, as training them takes enormous computing resources. But we can exploit open-source LLMs to develop RAGs.
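
As a taste of what that looks like, here is a deliberately tiny RAG flow in Python: it embeds a handful of invented ‘documents’, retrieves the one most similar to the question, and hands it to a local LLM as context. The embedding model is one common choice among many, and the Ollama setup is the same as in the earlier snippet:

```python
# Toy RAG: embed documents, find the best match for the question via cosine
# similarity, then ask the LLM to answer using only that retrieved context.
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

documents = [
    "Our return policy allows refunds within 30 days of purchase.",
    "The warehouse in Milan ships orders every weekday at 9 AM.",
    "Premium subscribers get free shipping on all orders.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

question = "When does the Milan warehouse ship?"
query_vector = embedder.encode([question], normalize_embeddings=True)[0]

# With normalized vectors, cosine similarity is a plain dot product.
scores = doc_vectors @ query_vector
best_doc = documents[int(np.argmax(scores))]

prompt = (
    "Answer the question using only the context below, citing it.\n"
    f"Context: {best_doc}\n"
    f"Question: {question}"
)
answer = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": prompt, "stream": False},
    timeout=120,
).json()["response"]
print(answer)
```

A real RAG app would store the embeddings in a Vector Database rather than in memory, a component we will meet again in the architecture section below.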

AI Agents

Above the RAGs, we find the Agents, and here things start to get interesting. Agents still have an LLM and execution contexts, but they also have tools, which enable them to decide for themselves how to react to a given situation using the tools at their disposal. An AI Agent is, for instance, an app that we ask to book us the cheapest trip to a certain destination based on our preferences (preferences that the Agent already knows). An AI Agent is a fraud-detection system used in banks, an automated customer-service operator, Tesla’s autonomous driving system, and so on. New use cases appear almost daily.

Microsoft Copilot itself is evolving very rapidly from RAG to AI Agent (I will discuss Microsoft Copilot in more detail in other posts).

But can we mere mortals, with only a small computer with an overpriced Nvidia GPU and Visual Studio Code, create an AI Agent? Yes, we can!
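
To show the core idea at toy scale, here is a deliberately naive agent loop: the LLM is told which tools exist and is asked to reply in JSON, either invoking a tool or giving a final answer. Both tools are made-up stand-ins, and a production agent would need robust output parsing and error handling (which is what frameworks such as LangChain or Semantic Kernel provide):

```python
# A tiny decide-act-observe loop. The LLM picks a tool (or answers) by
# emitting JSON; we execute the tool and feed the result back as context.
import json
import requests

def get_weather(city: str) -> str:  # stand-in for a real weather API
    return f"It is sunny and 24 °C in {city}."

def find_flight(destination: str) -> str:  # stand-in for a booking API
    return f"Cheapest flight to {destination}: 89 EUR, departing 07:15."

TOOLS = {"get_weather": get_weather, "find_flight": find_flight}

SYSTEM = (
    "You can use these tools: get_weather(city), find_flight(destination). "
    'Reply ONLY with JSON: {"tool": "...", "input": "..."} to use a tool, '
    'or {"answer": "..."} when you are done.'
)

def ask_llm(prompt: str) -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return response.json()["response"]

history = SYSTEM + "\nUser: Find me a cheap flight to Lisbon.\n"
for _ in range(5):  # hard cap on iterations to avoid endless loops
    reply = ask_llm(history)
    decision = json.loads(reply)  # a real agent must handle malformed JSON
    if "answer" in decision:
        print(decision["answer"])
        break
    result = TOOLS[decision["tool"]](decision["input"])
    history += f"Assistant: {reply}\nTool result: {result}\n"
```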

Above and beyond the Agents…

And how does the pyramid end? Is there anything above the AI Agents? Above them, I would put the LLM OS, first theorised by Andrej Karpathy: something that may not exist yet but should be coming soon (who said GPT-5?), a kind of Operating System where AI Agents will be its apps…

At the very top, I can only imagine AGI (Artificial General Intelligence), the final stage: the artificial brain, something far more intelligent than human beings.

<IMHO>Something that, frankly, once in the hands of the people who govern us, will only create disasters.</IMHO>

Architecture of GenAI apps

An app is usually made of at least three things: business logic, data, and user interface. GenAI apps are no exception.

Business Layer

We have said that a GenAI app is practically an LLM “on steroids”: at the heart of the business layer is, therefore, the LLM server. The server specifies the LLM to be used and exposes the primitives for interacting with it, usually through REST APIs, to the rest of the business layer. The business layer of a GenAI app also includes a second component: the Orchestrator. The Orchestrator is fundamental in managing the multiple interactions with the LLM and the data layer, as well as the interfaces with external APIs and the handling of multiple prompts.
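
As a minimal sketch of this division of labour, here is a toy Orchestrator, assuming a local Ollama instance as the LLM server; the retrieve callable stands in for whatever connector or vector search the data layer provides:

```python
# Toy Orchestrator: fills a prompt template with context fetched from the
# data layer, then calls the LLM server's REST primitive. Everything here
# (template, retriever, model name) is illustrative, not a real framework.
from typing import Callable
import requests

class Orchestrator:
    TEMPLATE = "Context:\n{context}\n\nUser question:\n{question}\n\nAnswer:"

    def __init__(self, llm_url: str, model: str, retrieve: Callable[[str], str]):
        self.llm_url = llm_url
        self.model = model
        self.retrieve = retrieve  # hook into the data layer

    def run(self, question: str) -> str:
        context = self.retrieve(question)  # gather additional context
        prompt = self.TEMPLATE.format(context=context, question=question)
        response = requests.post(  # interact with the LLM server
            self.llm_url,
            json={"model": self.model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        return response.json()["response"]

# Usage with a dummy retriever that returns no real context:
orchestrator = Orchestrator(
    "http://localhost:11434/api/generate",
    "llama3",
    retrieve=lambda q: "No extra context available.",
)
print(orchestrator.run("What does an Orchestrator do in a GenAI app?"))
```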

Data Layer

The data layer comprises the additional contexts (files, external data sources) made available to the system via various types of connectors, the implementation of which varies from case to case. For these external sources to be usable by the LLM, however, a further component must be introduced: the Vector Database.

Vector Databases

In simple terms, Vector Databases allow for the storage and retrieval of vector representations of data, enabling efficient similarity searches and data retrieval for the LLM. These representations are called Vector Embeddings: pieces of information (a block of text, an image) translated into vectors of numbers in such a way that they are comprehensible to an LLM. Vector embeddings are a way to convert words, sentences, and other data into numbers that capture their meaning and relationships. They represent different data types as points in a multidimensional space, where similar data points are clustered closer together.
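
Here is a quick illustration of “similar data points are clustered closer together”, using the same embedding library as in the RAG sketch above (the model name is one common choice among many, and the exact similarity values will vary by model):

```python
# Embed three sentences and compare them pairwise: the two sentences about
# cats should land much closer together than either does to the finance one.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The cat sleeps on the sofa.",
    "A kitten is napping on the couch.",
    "Quarterly revenue grew by 12%.",
]
vectors = model.encode(sentences)

sims = util.cos_sim(vectors, vectors)  # matrix of pairwise cosine similarities
print(float(sims[0][1]))  # high: cat vs. kitten
print(float(sims[0][2]))  # low: cat vs. finance
```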

Presentation Layer

The user interface of a GenAI app consists, at the very least, of a chat window where we can write prompts and receive output (answers to our questions, the outcome of actions taken by the agent, etc.).

With the advent of GPT-4o, however, we are also starting to see multimodal user interfaces, i.e. interfaces capable of handling not only textual input/output but also audio, video, and images.

Conceptual Architectures

In summary, the conceptual architecture of a RAG app can be schematised as a chain: the user’s prompt reaches the Orchestrator, which retrieves the relevant context from the Vector Database (populated through the connectors) and passes the augmented prompt to the LLM server, which generates the grounded answer.

The Agent’s architecture is very similar; in addition, there is the logic for selecting and operating the tools, as well as a more elaborate handling of prompt templates.

Of course, what I’ve just described is the basic architecture: depending on the type of app and the problem to be solved, it may be necessary to introduce new concepts and new elements, and things may become exponentially more complex.

As I said at the beginning, however, these topics are new to everyone and constantly evolving, so I think it is better to start with the fundamentals and what is certain, and then gradually add complexity and test new paths.

Recap

In this article, we have seen what GenAI apps are, their types (RAG, Agent, LLM OS), and then outlined the conceptual architecture of GenAI apps, mentioning their main components and interactions.

Coming up next

The next articles in this series will cover:

  • My open source setup for the development environment.
  • Tools for the development of GenAI apps.
  • Which GenAI apps I am trying to develop.
  • Copilot for Microsoft 365 and GenAI applied to Modern Work.
  • Advanced GenAI (in-depth prompt engineering, LLMOps, etc.).
  • Review of the main Low-Code platforms for building GenAI apps.