🏗️ System Design

Architecture

A 4-layer AI Factory model inspired by the GraphRAG and LightRAG papers, built on TigerGraph for graph storage and traversal.

🔶 Layer 01: Graph Layer (TigerGraph Cloud)
🔀 Layer 02: Orchestration Layer (Dual Pipeline Router)
🤖 Layer 03: LLM Layer (12 Providers via Universal API)
📊 Layer 04: Evaluation Layer (RAGAS + F1/EM + Cost Tracking)

Query → Graph → Orchestration → LLM → Evaluation → Answer
Layer 01

Graph Layer

TigerGraph Cloud

The foundation of the system. TigerGraph stores entities, relationships, and their properties as a native graph. GSQL queries follow edges directly, which makes multi-hop traversal practical; in a relational database the same question typically needs one self-join per hop and gets expensive as the hop count grows.

Entity storage with typed vertices (PERSON, LOCATION, WORK, etc.)
Relationship edges with properties (BORN_IN, DIRECTED, etc.)
GSQL queries for 1-hop, 2-hop, and multi-hop traversal
Schema-bounded extraction: only valid vertex types are accepted
Real-time graph updates via ingestion pipeline
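As a rough illustration of schema-bounded extraction, the sketch below drops any extracted triple whose vertex or edge type is not in the schema. The type sets are taken from the examples in the list above; the `Triple` class and `filter_to_schema` function are hypothetical names, not part of the project's code.

```python
from dataclasses import dataclass

# Assumed schema: only the types named in the list above.
# The real schema lives in TigerGraph; these sets are illustrative.
VALID_VERTEX_TYPES = {"PERSON", "LOCATION", "WORK"}
VALID_EDGE_TYPES = {"BORN_IN", "DIRECTED"}


@dataclass(frozen=True)
class Triple:
    """One extracted fact: (source vertex) -[edge]-> (target vertex)."""
    src: str
    src_type: str
    edge: str
    dst: str
    dst_type: str


def filter_to_schema(triples):
    """Split extracted triples into schema-valid and rejected lists."""
    accepted, rejected = [], []
    for t in triples:
        ok = (
            t.src_type in VALID_VERTEX_TYPES
            and t.dst_type in VALID_VERTEX_TYPES
            and t.edge in VALID_EDGE_TYPES
        )
        (accepted if ok else rejected).append(t)
    return accepted, rejected


candidates = [
    Triple("Christopher Nolan", "PERSON", "DIRECTED", "Inception", "WORK"),
    Triple("Christopher Nolan", "PERSON", "BORN_IN", "London", "LOCATION"),
    # Neither HAS_MOOD nor ADJECTIVE is in the schema, so this one is rejected.
    Triple("Inception", "WORK", "HAS_MOOD", "tense", "ADJECTIVE"),
]
accepted, rejected = filter_to_schema(candidates)
```

Keeping the rejected triples around, rather than silently discarding them, makes it easier to spot when the extractor drifts away from the schema.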
graph_layer.py
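# GSQL Multi-Hop Query
CREATE QUERY find_connections(VERTEX<Entity> start, INT hops) {
  # One record per traversed edge: source, relation label, target
  TYPEDEF TUPLE<VERTEX src, STRING relation, VERTEX tgt> Hop;
  ListAccum<Hop> @@paths;

  Start = {start};
  # Advance the frontier one hop per iteration
  FOREACH i IN RANGE[1, hops] DO
    Start = SELECT t
            FROM Start:s -(HAS_RELATION:e)-> Entity:t
            ACCUM @@paths += Hop(s, e.relation, t);
  END;
  PRINT @@paths;
}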
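For readers who don't know GSQL, the same frontier expansion can be sketched in plain Python over an in-memory edge list. This is only an illustration of the traversal logic, not how the system talks to TigerGraph; the edge data and the `LOCATED_IN` relation are made up for the example.

```python
from collections import defaultdict


def find_connections(edges, start, hops):
    """Expand a frontier `hops` times and collect every edge crossed.

    `edges` is an iterable of (source, relation, target) tuples.
    """
    adjacency = defaultdict(list)
    for src, relation, dst in edges:
        adjacency[src].append((relation, dst))

    frontier = {start}
    paths = []
    for _ in range(hops):
        next_frontier = set()
        for src in sorted(frontier):  # sorted only to keep output deterministic
            for relation, dst in adjacency[src]:
                paths.append((src, relation, dst))
                next_frontier.add(dst)
        frontier = next_frontier
    return paths


edges = [
    ("Christopher Nolan", "DIRECTED", "Inception"),
    ("Christopher Nolan", "BORN_IN", "London"),
    ("London", "LOCATED_IN", "England"),
]
one_hop = find_connections(edges, "Christopher Nolan", 1)
two_hop = find_connections(edges, "Christopher Nolan", 2)
```

As in the GSQL query, each iteration replaces the frontier with the vertices reached in that hop, so the second hop starts from `Inception` and `London` rather than from the original entity.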
Layer 02

Orchestration Layer

Dual Pipeline Router

The brain of the system. It analyzes incoming queries, classifies their complexity, and routes each one through the appropriate pipeline: Baseline RAG for simple queries, GraphRAG for complex multi-hop questions.

Adaptive Query Router: complexity scoring (0.0–1.0)
Query type classification (bridge, comparison, factoid)
Dual-level keyword extraction (high-level concepts + low-level entities)
Pipeline A: Query → Vector Search → LLM (fast, cheap)
Pipeline B: Query → Entity Extraction → Graph Traversal → LLM (precise)
orchestration_layer.py
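# Adaptive Query Router
from enum import Enum

class Route(Enum):
    BASELINE = "baseline"  # Pipeline A: vector search
    GRAPHRAG = "graphrag"  # Pipeline B: graph traversal

class AdaptiveRouter:
    def classify(self, query: str) -> Route:
        # score_complexity and detect_type are defined elsewhere in the class
        complexity = self.score_complexity(query)  # 0.0-1.0
        query_type = self.detect_type(query)  # bridge/comparison/factoid

        # Complex or bridge (multi-hop) queries take the graph pipeline
        if complexity > 0.6 or query_type == "bridge":
            return Route.GRAPHRAG
        return Route.BASELINE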
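The page doesn't say how `score_complexity` and `detect_type` work internally. One possible shape, purely as an assumption, is a surface-cue heuristic like the sketch below; the cue lists and weights are invented for illustration and would need tuning against real queries.

```python
import re

# Hypothetical cue lists; the actual router may use an LLM or a trained model.
COMPARISON_CUES = ("compare", "versus", " vs ", "older than", "more than")
BRIDGE_CUES = (" of the ", " who ", " whose ", " that ", " which ")


def detect_type(query: str) -> str:
    """Classify a query as 'comparison', 'bridge', or 'factoid' from surface cues."""
    q = f" {query.lower()} "
    if any(cue in q for cue in COMPARISON_CUES):
        return "comparison"
    # A bridge question usually chains entities through nested clauses.
    if sum(cue in q for cue in BRIDGE_CUES) >= 2:
        return "bridge"
    return "factoid"


def score_complexity(query: str) -> float:
    """Return a score in [0.0, 1.0] from length and clause-count heuristics."""
    words = re.findall(r"\w+", query)
    length_score = min(len(words) / 30.0, 1.0)
    q = f" {query.lower()} "
    clause_score = min(sum(q.count(cue) for cue in BRIDGE_CUES) / 3.0, 1.0)
    return round(0.5 * length_score + 0.5 * clause_score, 3)


simple = "Who directed Inception?"
multi_hop = "Where was the director of the film that won Best Picture in 1998 born?"
print(detect_type(simple), score_complexity(simple))
print(detect_type(multi_hop), score_complexity(multi_hop))
```

Under this heuristic the second query is classified as a bridge question, so the router above would send it to the GraphRAG pipeline regardless of its complexity score.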
Layer 03

LLM Layer

12 Providers via Universal API

Universal LLM abstraction that supports 12 providers through a single API. Swap between Claude, GPT-4, Gemini, Llama, and more by changing the provider and model arguments; the calling code stays the same.

Anthropic Claude (Sonnet 4, Haiku 4)
OpenAI (GPT-4o, GPT-4o-mini)
Google Gemini (2.0 Flash, Pro)
Meta Llama via Groq / Together / HuggingFace
Mistral, DeepSeek, Cohere, xAI Grok, OpenRouter
Local: Ollama for fully offline inference
llm_layer.py
# Universal LLM: one interface, 12 providers
llm = UniversalLLM(provider="anthropic", model="claude-sonnet-4")
response = llm.generate(
  context=graph_evidence,
  query=user_question,
  max_tokens=500
)
# Switch provider with one line:
llm = UniversalLLM(provider="groq", model="llama-3.3-70b")
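A common way to get this kind of one-line provider switch is a registry that maps provider names to backend factories. The sketch below is an assumption about how such an abstraction could be built, not the project's actual `UniversalLLM`; `SketchLLM`, `register`, and the `echo` backend are hypothetical, and the backend makes no network calls.

```python
from typing import Callable, Dict

# A backend takes (prompt, max_tokens) and returns generated text.
Backend = Callable[[str, int], str]
_BACKENDS: Dict[str, Callable[[str], Backend]] = {}


def register(provider: str):
    """Decorator that registers a backend factory under a provider name."""
    def wrap(factory):
        _BACKENDS[provider] = factory
        return factory
    return wrap


@register("echo")
def _echo_factory(model: str) -> Backend:
    # Stand-in backend for illustration; a real one would call the provider SDK
    # and pass max_tokens through. Here max_tokens is accepted but unused.
    return lambda prompt, max_tokens: f"[{model}] {prompt}"


class SketchLLM:
    """Hypothetical stand-in for UniversalLLM."""

    def __init__(self, provider: str, model: str):
        if provider not in _BACKENDS:
            raise ValueError(f"unknown provider: {provider!r}")
        self._backend = _BACKENDS[provider](model)

    def generate(self, context: str, query: str, max_tokens: int = 500) -> str:
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return self._backend(prompt, max_tokens)


llm = SketchLLM(provider="echo", model="demo")
reply = llm.generate(context="Nolan directed Inception.", query="Who directed Inception?")
```

With this layout, adding a provider means registering one more factory; callers only ever see `generate`.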
Layer 04

Evaluation Layer

RAGAS + F1/EM + Cost Tracking

Automated evaluation that measures every query. Computes F1 score, Exact Match, RAGAS metrics, token usage, latency, and USD cost for both pipelines. Powers the benchmark dashboard and cost projections.

F1 Score: token-level overlap with ground truth
Exact Match: binary correctness metric
RAGAS integration: faithfulness, relevancy, context metrics
Token counting: input/output per provider
Cost tracking: USD per query based on provider pricing
Latency measurement: end-to-end milliseconds
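Token-level F1 and Exact Match are usually computed the way extractive QA benchmarks do it: normalize both strings, then compare token multisets. The sketch below assumes that SQuAD-style normalization (lowercasing, stripping punctuation and English articles); the page does not say which normalization the project uses.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall against the ground truth."""
    pred, gold = normalize(prediction), normalize(ground_truth)
    if not pred or not gold:
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))
print(f1_score("London, England", "London"))
```

F1 gives partial credit when an answer contains the right entity plus extra words, which is why it is reported alongside the stricter Exact Match.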
evaluation_layer.py
# Evaluation Layer
evaluator = RAGASEvaluator()
metrics = evaluator.evaluate(
  query=question,
  answer=llm_response,
  ground_truth=reference_answer,
  context=retrieved_context
)
# Returns: { f1: 0.89, em: 1.0, tokens: 2400,
#            cost_usd: 0.0096, latency_ms: 1800 }
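The cost and latency fields can be derived from token counts and a wall-clock timer. The sketch below is a minimal version under stated assumptions: `Price`, `query_cost_usd`, and `timed` are hypothetical helpers, and the prices are placeholders rather than any provider's real rates.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens (placeholder values, not real pricing)."""
    input_per_mtok: float
    output_per_mtok: float


def query_cost_usd(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Cost of one query given its token counts and a per-million-token price."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


placeholder_price = Price(input_per_mtok=1.0, output_per_mtok=2.0)
cost, latency_ms = timed(query_cost_usd, 2000, 400, placeholder_price)
```

Because input and output tokens are usually priced differently, tracking them separately per provider is what makes the per-pipeline cost comparison meaningful.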
Tech Stack

Built with modern tools

TigerGraph: Graph Database
Anthropic: LLM Provider
Google Gemini: Generation Model
Groq: Independent Judge
Python: Backend
Next.js: Frontend
HuggingFace: Embeddings
Vercel: Hosting