The Evolution of Web Search: From Retrieval-Augmented Generation to Knowledge Graphs

Max Carter

January 14, 2025 · 5 min read

The knowledge graph, a structured, machine-readable network of entities and relationships, marks a significant milestone in the evolution of web search engines. In moving from simple keyword matching to sophisticated entity recognition, search engines laid the groundwork for advanced reasoning and AI tasks. That shift has been instrumental in tackling hard problems in AI, particularly in question answering (QA) systems.

Early search engines, such as AltaVista, relied on simple keyword matching and treated web pages as isolated documents. However, web pages are interconnected through hyperlinks, and Google's recognition of this led to a significant improvement in search quality. Google's introduction of its knowledge graph in 2012, encapsulated by the phrase "things not strings," aimed to connect entities rather than just words. This shift in perspective made it possible to support more advanced functions like reasoning, which require a structured, machine-readable framework.

The graph of knowledge (GoK) is a broader, more conceptual idea focusing on interconnected information, without necessarily being highly structured. In contrast, the knowledge graph (KG) refers to a formal, structured, machine-readable network of entities and relationships, designed for advanced reasoning and AI tasks. The creation of knowledge graphs has transformed how information is retrieved, organized, and connected, moving from simple keyword matching to sophisticated entity recognition.

QA systems are among the most powerful applications in the generative AI space, and they must extract precise information from both structured and unstructured data. Questions fall into three common types, each with a different level of complexity and a different need for structured data: single-point access questions, multi-point access questions, and advanced reasoning questions. While the first two types can often be answered using a graph of knowledge, the third demands a more structured approach: a true knowledge graph.

Retrieval-augmented generation (RAG) has emerged as the state-of-the-art approach for question answering in the generative AI era. However, RAG treats documents as independent entities, indexing each document segment separately. To address this limitation, Microsoft introduced the concept of GraphRAG in early 2024, which organizes information into a graph of knowledge, enabling it to leverage relationships between pieces of information.
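
To make the limitation concrete, here is a minimal sketch of retrieval that scores every chunk on its own. It is an illustration under stated assumptions, not how any particular RAG system works: the chunks, the query, and the `retrieve` helper are hypothetical, and plain word overlap stands in for the embedding similarity a real retriever would typically use.

```python
import re

# Hypothetical document segments, indexed separately as in plain RAG.
CHUNKS = [
    "Google introduced its knowledge graph in 2012.",
    "The project was summarized by the phrase things not strings.",
    "AltaVista relied on simple keyword matching.",
]

def tokens(text):
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, chunks, k=1):
    """Score each chunk independently by word overlap and return the top k."""
    q = tokens(query)
    scored = [(len(q & tokens(chunk)), i) for i, chunk in enumerate(chunks)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for score, i in ranked[:k] if score > 0]

QUERY = "Which phrase describes Google's knowledge graph?"
TOP = retrieve(QUERY, CHUNKS, k=1)
```

With `k=1` the retriever returns only the first chunk, because it shares the most words with the query. The second chunk, which actually contains the phrase, is scored in isolation and ranks lower, since nothing tells the retriever that "the project" refers to the knowledge graph mentioned in the first chunk.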

GraphRAG helps build a graph of knowledge by connecting fragmented text into a graph-like structure, providing large language models (LLMs) with more relevant, interconnected input to improve question answering performance. By treating text passages as nodes in a graph, GraphRAG enables graph operations like community detection, pattern extraction, and graph traversal. These operations allow for the synthesis of multiple pieces of information, which can then be fed into RAG models to generate richer, more accurate answers to multi-point questions.
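
The idea of passages as nodes can be sketched in a few lines. This is a toy illustration, not Microsoft's GraphRAG implementation: the passages and their entity tags are written by hand here (a real pipeline would extract them, for example with an LLM), connected components serve as a crude stand-in for community detection, and all function names are hypothetical.

```python
from collections import deque

# Hypothetical passages, each tagged by hand with the entities it mentions.
PASSAGES = {
    "p1": {"text": "AltaVista matched keywords against web pages.",
           "entities": {"AltaVista", "keyword matching"}},
    "p2": {"text": "Google used hyperlinks between pages to rank results.",
           "entities": {"Google", "hyperlinks"}},
    "p3": {"text": "Google introduced its knowledge graph in 2012.",
           "entities": {"Google", "knowledge graph"}},
    "p4": {"text": "A knowledge graph links entities through relationships.",
           "entities": {"knowledge graph", "entities"}},
}

def build_graph(passages):
    """Link two passages whenever they mention at least one common entity."""
    graph = {pid: set() for pid in passages}
    ids = sorted(passages)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if passages[a]["entities"] & passages[b]["entities"]:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def neighborhood(graph, start, max_hops):
    """Breadth-first traversal: all passages within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for nxt in sorted(graph[node]):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return sorted(hops)

def components(graph):
    """Connected components, a crude stand-in for community detection."""
    remaining = set(graph)
    groups = []
    while remaining:
        group = neighborhood(graph, min(remaining), max_hops=len(graph))
        groups.append(group)
        remaining -= set(group)
    return groups

GRAPH = build_graph(PASSAGES)
```

Here `neighborhood(GRAPH, "p2", 2)` gathers `p2`, `p3`, and `p4`, so a multi-point question about Google can pull in the passage that defines a knowledge graph even though that passage never mentions Google. The texts of the gathered passages would then be handed to the LLM as context.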

While GraphRAG relies on the reasoning capabilities of LLMs to connect text-based data in a graph of knowledge, the third type of question, the kind that requires deep reasoning, needs more than a GoK. It requires a fully structured knowledge graph, where facts, entities, and relationships are organized into a formal ontology. In these scenarios, LLMs are still important, but their role shifts from generating or synthesizing content to querying the structured KG.

A recent publication, QirK: Question Answering via Intermediate Representation on Knowledge Graphs, outlines a framework for combining LLM capabilities with the logical power of knowledge graphs to answer complex queries. The framework supports question answering on top of the popular Wikidata knowledge graph, enabling the answering of complex queries like "Name a movie directed by Quentin Tarantino or Martin Scorsese that has Robert De Niro as a cast member."
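
The example question combines a disjunction (either director) with a conjunction (the cast member), which is straightforward set logic once the facts are structured. The sketch below shows that logic over a handful of hand-written triples. It is an assumption-laden illustration only: it does not use Wikidata, it does not reproduce QirK's intermediate representation, and the predicate names and helper functions are hypothetical.

```python
# Toy (subject, predicate, object) triples; a stand-in for a real knowledge graph.
TRIPLES = {
    ("Jackie Brown", "director", "Quentin Tarantino"),
    ("Jackie Brown", "cast_member", "Robert De Niro"),
    ("Pulp Fiction", "director", "Quentin Tarantino"),
    ("Pulp Fiction", "cast_member", "John Travolta"),
    ("Taxi Driver", "director", "Martin Scorsese"),
    ("Taxi Driver", "cast_member", "Robert De Niro"),
    ("Heat", "director", "Michael Mann"),
    ("Heat", "cast_member", "Robert De Niro"),
}

def subjects(predicate, obj):
    """All subjects that have the given predicate pointing at obj."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}

def movies_by_either_director_with_actor(director_a, director_b, actor):
    """(directed by A OR directed by B) AND has actor as a cast member."""
    directed = subjects("director", director_a) | subjects("director", director_b)
    return sorted(directed & subjects("cast_member", actor))

ANSWER = movies_by_either_director_with_actor(
    "Quentin Tarantino", "Martin Scorsese", "Robert De Niro"
)
```

On this toy data the query returns "Jackie Brown" and "Taxi Driver"; "Pulp Fiction" fails the cast condition and "Heat" fails the director condition. In a setup like the one the paper describes, the LLM's job would be to turn the natural-language question into a structured query of this kind, while the knowledge graph supplies the facts.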

The road to advanced AI reasoning is complex and challenging, but the rewards are immense. Intermediate steps, such as the graph of knowledge, offer practical solutions that advance AI applications like question answering, even as we work toward the more ambitious goal of fully realized knowledge graphs. By bridging the gap between unstructured text and structured knowledge, tools like GraphRAG pave the way for AI systems capable of answering increasingly complex questions with greater accuracy, making the vision of advanced, reasoning-powered QA systems a reality.

Nikolaos Vasiloglou, VP of Research-ML for RelationalAI, notes that generative AI provides an unprecedented opportunity to reshape the way we organize and retrieve knowledge. As we continue to advance AI applications, the journey from unstructured data to a fully structured knowledge graph will be instrumental in unlocking the full potential of AI systems.

Copyright © 2024 Starfolk. All rights reserved.