🏗️ System Design

Architecture

A 4-layer AI Factory model inspired by the GraphRAG and LightRAG papers, built on TigerGraph for graph storage and traversal.

🔶

Layer 01

Graph Layer

TigerGraph Cloud

🔀

Layer 02

Orchestration Layer

Dual Pipeline Router

🤖

Layer 03

LLM Layer

12 Providers via Universal API

📊

Layer 04

Evaluation Layer

RAGAS + F1/EM + Cost Tracking

Query → Graph → Orchestration → LLM → Evaluation → Answer

Layer 01

Graph Layer

TigerGraph Cloud

The foundation of the system. TigerGraph stores entities, relationships, and their properties as a native graph. GSQL queries follow stored edges directly for multi-hop traversal, where a relational database would need one self-join per hop.

Entity storage with typed vertices (PERSON, LOCATION, WORK, etc.)

Relationship edges with properties (BORN_IN, DIRECTED, etc.)

GSQL queries for 1-hop, 2-hop, and multi-hop traversal

Schema-bounded extraction — only valid vertex types accepted

Real-time graph updates via ingestion pipeline
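Schema-bounded extraction can be pictured as a filter that runs before anything is written to the graph. The snippet below is a minimal sketch, not the project's implementation: `ALLOWED_VERTEX_TYPES`, `ALLOWED_EDGE_TYPES`, `Triple`, and `filter_triples` are hypothetical names, and the type sets contain only the examples listed above.

```python
from typing import NamedTuple

# Assumed schema: only the example types named in this section.
ALLOWED_VERTEX_TYPES = {"PERSON", "LOCATION", "WORK"}
ALLOWED_EDGE_TYPES = {"BORN_IN", "DIRECTED"}


class Triple(NamedTuple):
    src: str
    src_type: str
    relation: str
    dst: str
    dst_type: str


def filter_triples(triples):
    """Split extracted triples into (accepted, rejected) by schema membership."""
    accepted, rejected = [], []
    for t in triples:
        ok = (
            t.src_type in ALLOWED_VERTEX_TYPES
            and t.dst_type in ALLOWED_VERTEX_TYPES
            and t.relation in ALLOWED_EDGE_TYPES
        )
        (accepted if ok else rejected).append(t)
    return accepted, rejected


extracted = [
    Triple("Christopher Nolan", "PERSON", "DIRECTED", "Inception", "WORK"),
    Triple("Christopher Nolan", "PERSON", "BORN_IN", "London", "LOCATION"),
    Triple("Inception", "WORK", "HAS_MOOD", "tense", "MOOD"),  # outside the schema
]
accepted, rejected = filter_triples(extracted)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Rejected triples are returned rather than silently dropped, so an ingestion pipeline could log them and reveal gaps in the schema.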

graph_layer.py

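# GSQL Multi-Hop Query
CREATE QUERY find_connections(VERTEX<Entity> start, INT hops) {
  # Accumulators must be declared before use; each entry is one traversed edge
  TYPEDEF TUPLE<VERTEX src, STRING relation, VERTEX dst> Path;
  ListAccum<Path> @@paths;

  Start = {start};
  # Expand the frontier one hop per iteration
  FOREACH i IN RANGE[1, hops] DO
    Start = SELECT t
            FROM Start:s -(HAS_RELATION:e)-> Entity:t
            ACCUM @@paths += Path(s, e.relation, t);
  END;
  PRINT @@paths;
}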
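To make the traversal logic concrete, the same frontier expansion can be mirrored in plain Python over an in-memory edge list. This is only an illustrative sketch: the `EDGES` data and the Python `find_connections` function are made up for the example, and nothing here calls TigerGraph.

```python
from collections import defaultdict

# Toy edge list: (source, relation, target). Illustrative data only.
EDGES = [
    ("Christopher Nolan", "DIRECTED", "Inception"),
    ("Inception", "STARS", "Leonardo DiCaprio"),
    ("Leonardo DiCaprio", "BORN_IN", "Los Angeles"),
]


def find_connections(start, hops, edges=EDGES):
    """Expand the frontier `hops` times, recording every edge that is crossed."""
    outgoing = defaultdict(list)
    for src, rel, dst in edges:
        outgoing[src].append((rel, dst))

    frontier = {start}
    paths = []
    for _ in range(hops):
        next_frontier = set()
        for src in sorted(frontier):  # sorted for deterministic output
            for rel, dst in outgoing[src]:
                paths.append((src, rel, dst))
                next_frontier.add(dst)
        frontier = next_frontier
    return paths


two_hop = find_connections("Christopher Nolan", hops=2)
print(two_hop)
```

Each loop iteration corresponds to one pass of the `FOREACH` block: the targets reached in one hop become the starting set for the next.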

Layer 02

Orchestration Layer

Dual Pipeline Router

The brain of the system. Analyzes incoming queries, classifies their complexity, and routes them through the appropriate pipeline — Baseline RAG for simple queries, GraphRAG for complex multi-hop questions.

Adaptive Query Router — complexity scoring (0.0–1.0)

Query type classification (bridge, comparison, factoid)

Dual-Level Keyword extraction (high-level concepts + low-level entities)

Pipeline A: Query → Vector Search → LLM (fast, cheap)

Pipeline B: Query → Entity Extraction → Graph Traversal → LLM (precise)

orchestration_layer.py

# Adaptive Query Router
class AdaptiveRouter:
  def classify(self, query: str) -> Route:
    complexity = self.score_complexity(query)  # 0.0–1.0
    query_type = self.detect_type(query)  # bridge/comparison/factoid

    # Bridge questions need multi-hop evidence, so they always use GraphRAG
    if complexity > 0.6 or query_type == "bridge":
      return Route.GRAPHRAG
    return Route.BASELINE
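The router above leaves `score_complexity` and `detect_type` unspecified. One possible shape for them is sketched below using simple keyword heuristics. Everything here is an assumption made for illustration: the cue lists, the regular expressions, the scoring weights, and the `Route` enum values are hypothetical, and a production router might use a trained classifier or an LLM call instead.

```python
import re
from enum import Enum


class Route(Enum):
    BASELINE = "baseline"   # Pipeline A: vector search
    GRAPHRAG = "graphrag"   # Pipeline B: graph traversal


# Hypothetical cue lists; a real router would tune or learn these.
COMPARISON_CUES = ("compare", "versus", " vs ", "older than", "more than", "both")
BRIDGE_PATTERN = re.compile(r"\b(of|by|in) the \w+ (who|that|which|where)\b")


def detect_type(query: str) -> str:
    q = f" {query.lower()} "
    if any(cue in q for cue in COMPARISON_CUES):
        return "comparison"
    if BRIDGE_PATTERN.search(q):
        return "bridge"
    return "factoid"


def score_complexity(query: str) -> float:
    """Crude 0.0-1.0 score from query length and count of relational clauses."""
    q = query.lower()
    clauses = len(re.findall(r"\b(who|that|which|where|whose)\b", q))
    length = min(len(q.split()) / 30.0, 1.0)
    return min(0.5 * length + 0.25 * clauses, 1.0)


def route(query: str) -> Route:
    # Same decision rule as the router: threshold 0.6, bridge always uses the graph
    if score_complexity(query) > 0.6 or detect_type(query) == "bridge":
        return Route.GRAPHRAG
    return Route.BASELINE


for q in (
    "What is the capital of France?",
    "Where was the director of the film that won Best Picture in 1998 born?",
):
    print(route(q).value, "<-", q)
```

The first query is short and has no nested clause, so it stays on the baseline pipeline; the second chains two lookups (film, then director, then birthplace) and is sent to GraphRAG.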

Layer 03

LLM Layer

12 Providers via Universal API

A universal LLM abstraction that exposes 12 providers through a single API. Swap between Claude, GPT-4o, Gemini, Llama, and more by changing the provider and model arguments; the calling code stays the same.

Anthropic Claude (Sonnet 4, Haiku 4)

OpenAI (GPT-4o, GPT-4o-mini)

Google Gemini (2.0 Flash, Pro)

Meta Llama via Groq / Together / HuggingFace

Mistral, DeepSeek, Cohere, xAI Grok, OpenRouter

Local: Ollama for fully offline inference

llm_layer.py

# Universal LLM — one interface, 12 providers
llm = UniversalLLM(provider="anthropic", model="claude-sonnet-4")
response = llm.generate(
  context=graph_evidence,
  query=user_question,
  max_tokens=500
)
# Switch provider with one line:
llm = UniversalLLM(provider="groq", model="llama-3.3-70b")
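One common way to put many providers behind a single interface is a registry that maps provider names to adapter classes. The sketch below shows that pattern with stub adapters that make no network calls. It is not the actual `UniversalLLM` code: `UniversalLLMSketch`, `register`, and the stub classes are hypothetical names, and a real adapter would wrap the provider's own SDK.

```python
from typing import Callable, Dict, Protocol


class Provider(Protocol):
    def complete(self, prompt: str, max_tokens: int) -> str: ...


_REGISTRY: Dict[str, Callable[[str], Provider]] = {}


def register(name: str):
    """Class decorator that adds an adapter to the provider registry."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap


class _StubAdapter:
    """Stand-in for a real SDK call; reports which backend would be used."""
    label = "stub"

    def __init__(self, model: str):
        self.model = model

    def complete(self, prompt: str, max_tokens: int) -> str:
        # max_tokens is accepted but ignored by the stub
        return f"[{self.label}:{self.model}] {prompt}"


@register("anthropic")
class AnthropicStub(_StubAdapter):
    label = "anthropic"


@register("groq")
class GroqStub(_StubAdapter):
    label = "groq"


class UniversalLLMSketch:
    def __init__(self, provider: str, model: str):
        if provider not in _REGISTRY:
            raise ValueError(f"unknown provider: {provider!r}")
        self._backend = _REGISTRY[provider](model)

    def generate(self, context: str, query: str, max_tokens: int = 500) -> str:
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return self._backend.complete(prompt, max_tokens)


llm = UniversalLLMSketch(provider="groq", model="llama-3.3-70b")
reply = llm.generate(context="Nolan directed Inception.", query="Who directed Inception?")
print(reply)
```

Because callers only ever touch `generate`, adding a provider means registering one more adapter class rather than editing the pipeline code.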

Layer 04

Evaluation Layer

RAGAS + F1/EM + Cost Tracking

Automated evaluation that measures every query. Computes F1 score, Exact Match, RAGAS metrics, token usage, latency, and USD cost for both pipelines. Powers the benchmark dashboard and cost projections.

F1 Score — token-level overlap with ground truth

Exact Match — binary correctness metric

RAGAS integration — faithfulness, relevancy, context metrics

Token counting — input/output per provider

Cost tracking — USD per query based on provider pricing

Latency measurement — end-to-end milliseconds
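Token-level F1 and Exact Match are usually computed the way the SQuAD evaluation script does it: normalize both strings, then compare token multisets. The sketch below follows that convention. Whether this project applies exactly the same normalization is an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall against the ground truth."""
    pred, gold = normalize(prediction), normalize(truth)
    if not pred or not gold:
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Beatles", "beatles"))
print(round(f1_score("London, England", "London"), 3))
```

In the second call the prediction contains the correct token plus one extra, so precision is 0.5, recall is 1.0, and F1 is 2/3, while Exact Match would be 0.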

evaluation_layer.py

# Evaluation Layer
evaluator = RAGASEvaluator()
metrics = evaluator.evaluate(
  query=question,
  answer=llm_response,
  ground_truth=reference_answer,
  context=retrieved_context
)
# Returns: { f1: 0.89, em: 1.0, tokens: 2400,
#            cost_usd: 0.0096, latency_ms: 1800 }
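The `cost_usd` and `latency_ms` fields can be derived from token counts and a wall-clock timer. The following is a rough sketch under stated assumptions: `Pricing`, `query_cost`, and `timed` are hypothetical helpers, and the per-million-token prices are placeholders rather than any provider's actual rates.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens; supply real values from the provider's price list."""
    input_per_mtok: float
    output_per_mtok: float


def query_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Input and output tokens are billed at separate rates."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    started = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - started) * 1000.0


placeholder = Pricing(input_per_mtok=1.0, output_per_mtok=5.0)  # placeholder rates
cost = query_cost(input_tokens=2000, output_tokens=400, pricing=placeholder)
total, latency_ms = timed(sum, [1, 2, 3])
print(f"cost_usd={cost:.4f} latency_ms={latency_ms:.3f}")
```

Keeping prices in a per-provider table makes it straightforward to compare the two pipelines on cost, since graph retrieval typically changes how many context tokens reach the model.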

Tech Stack

Built with modern tools

TigerGraph: Graph Database

Anthropic: LLM Provider

Google Gemini: Generation Model

Groq: Independent Judge

Python: Backend

Next.js: Frontend

HuggingFace: Embeddings

Vercel: Hosting