IEAM GraphRAG - Neo4j Knowledge Graph RAG System

A GraphRAG (Graph Retrieval-Augmented Generation) application for IBM Edge Application Manager (IEAM) documentation using Neo4j as the knowledge graph database.

Overview

This project implements a RAG system that:

Parses HTML documentation from local files
Constructs a knowledge graph in Neo4j with entities, relationships, and semantic connections
Supports multiple embedding providers (Ollama, OpenAI, etc.)
Provides a REST API for querying documentation using natural language
Leverages graph traversal for context-aware responses

Architecture

┌─────────────────┐
│  HTML Documents │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  HTML Parser    │
│  & Processor    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│  Entity         │────▶│  Embeddings  │
│  Extraction     │     │  (Ollama/    │
└────────┬────────┘     │   OpenAI)    │
         │              └──────────────┘
         ▼
┌─────────────────┐
│  Neo4j Graph    │
│  Construction   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GraphRAG       │
│  Query Engine   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Express API    │
│  Server         │
└─────────────────┘

Features

1. Knowledge Graph Construction

Automatic entity extraction from HTML documentation
Relationship detection between concepts
Hierarchical document structure preservation
Semantic similarity connections
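
To illustrate what "hierarchical document structure preservation" can mean in practice, here is a minimal sketch that turns a flat list of headings (as a parser might emit for h1/h2/h3 tags) into a nested section tree. The Heading and SectionNode types and the buildSectionTree function are hypothetical names for this example; the actual parser in src/parsers/html-parser.ts may work differently.

```typescript
// Hypothetical types for illustration; not taken from the project source.
interface Heading {
  level: number; // 1 for <h1>, 2 for <h2>, ...
  title: string;
}

interface SectionNode {
  level: number;
  title: string;
  children: SectionNode[];
}

// Nest each heading under the closest preceding heading with a lower level.
function buildSectionTree(headings: Heading[]): SectionNode[] {
  const roots: SectionNode[] = [];
  const stack: SectionNode[] = []; // current chain of open ancestors

  for (const h of headings) {
    const node: SectionNode = { level: h.level, title: h.title, children: [] };

    // Close sections that are at the same level or deeper than the new one.
    while (
      stack.length > 0 &&
      (stack[stack.length - 1] as SectionNode).level >= h.level
    ) {
      stack.pop();
    }

    const parent = stack[stack.length - 1];
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
    stack.push(node);
  }
  return roots;
}
```

Each parent/child pair in the resulting tree could then be written to the graph as a relationship, and the order of siblings gives the sequential links between sections.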

2. Multi-Provider Embedding Support

Ollama: Local embedding generation (nomic-embed-text, mxbai-embed-large)
OpenAI: Cloud-based embeddings (text-embedding-3-small, text-embedding-3-large)
Configurable: Easy to add new providers
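
One way a pluggable provider layer could look is sketched below: a small interface plus a registry keyed by the EMBEDDING_PROVIDER value. The names EmbeddingProvider, registerProvider, createProvider and FakeProvider are assumptions made for this example; the real interface in src/embeddings/base.ts is not shown in this README and may differ. FakeProvider does not call any model and exists only so the sketch runs on its own.

```typescript
// Hypothetical interface; the project's actual base interface may differ.
interface EmbeddingProvider {
  readonly name: string;
  embed(texts: string[]): Promise<number[][]>;
}

type ProviderFactory = () => EmbeddingProvider;

// Registry keyed by the value of EMBEDDING_PROVIDER (case-insensitive).
const providerRegistry = new Map<string, ProviderFactory>();

function registerProvider(key: string, factory: ProviderFactory): void {
  providerRegistry.set(key.toLowerCase(), factory);
}

function createProvider(key: string): EmbeddingProvider {
  const factory = providerRegistry.get(key.toLowerCase());
  if (!factory) {
    const known = Array.from(providerRegistry.keys()).join(", ");
    throw new Error(`Unknown embedding provider "${key}". Registered: ${known}`);
  }
  return factory();
}

// Stand-in provider for illustration only: it derives a tiny deterministic
// vector from character codes instead of calling Ollama or OpenAI.
class FakeProvider implements EmbeddingProvider {
  readonly name = "fake";

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vector = [0, 0, 0, 0];
      for (let i = 0; i < text.length; i++) {
        vector[i % 4] = (vector[i % 4] ?? 0) + text.charCodeAt(i);
      }
      return vector;
    });
  }
}

registerProvider("fake", () => new FakeProvider());
```

With this shape, adding a provider means implementing embed() and registering one factory; the rest of the pipeline only depends on the interface.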

3. GraphRAG Query Processing

Semantic search using vector similarity
Graph traversal for contextual information
Community detection for topic clustering
Multi-hop reasoning across related concepts

4. REST API

/api/query - Natural language queries
/api/graph/stats - Graph statistics
/api/graph/search - Entity search
/api/health - Health check

Prerequisites

Neo4j Aura or Local Instance
- Neo4j Aura: https://console.neo4j.io/
- Local: Docker or Neo4j Desktop
Embedding Provider (choose one):
- Ollama (local): https://ollama.ai/
- OpenAI API key
Node.js >= 18.x

Installation

# Clone or navigate to the project
cd ieam-graphrag

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Edit .env with your configuration
nano .env

Configuration

Create a .env file with the following:

# Server Configuration
PORT=3000
HOST=localhost

# Neo4j Configuration
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j

# Embedding Provider (ollama or openai)
EMBEDDING_PROVIDER=ollama

# Ollama Configuration (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_LLM_MODEL=llama3.2:3b

# OpenAI Configuration (if using OpenAI)
OPENAI_API_KEY=your-api-key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_LLM_MODEL=gpt-4

# Data Paths
HTML_DOCS_PATH=./data/ieam-html
PROCESSED_DATA_PATH=./data/processed

# Graph Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
SIMILARITY_THRESHOLD=0.75
MAX_GRAPH_DEPTH=3
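
To show how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window chunker. It assumes both values are measured in characters, which this README does not state (they could equally be tokens), and chunkText is a hypothetical name rather than a function from the project.

```typescript
// Split text into windows of `chunkSize` characters, where each window
// starts `chunkSize - overlap` characters after the previous one.
// Assumption: sizes are in characters, not tokens.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be an integer in [0, chunkSize)");
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap;

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window already reaches the end of the text,
    // so no redundant tail chunk is emitted.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults above (1000 and 200), consecutive chunks share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk if it is shorter than the overlap.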

Usage

1. Import HTML Documentation to Neo4j

# Build the project
npm run build

# Parse HTML and import to Neo4j
npm run import

# This will:
# - Parse all HTML files in data/ieam-html
# - Extract entities and relationships
# - Generate embeddings
# - Create knowledge graph in Neo4j

2. Start the API Server

# Development mode with auto-reload
npm run dev

# Production mode
npm start

3. Query the Documentation

# Using curl
curl -X POST http://localhost:3000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How to register an edge node in IEAM?"}'

# Using the provided test script
npm run test:query

4. Explore the Graph

# Get graph statistics
curl http://localhost:3000/api/graph/stats

# Search for entities
curl "http://localhost:3000/api/graph/search?q=edge+node"

Project Structure

ieam-graphrag/
├── src/
│   ├── config/
│   │   └── index.ts              # Configuration management
│   ├── parsers/
│   │   ├── html-parser.ts        # HTML document parser
│   │   └── entity-extractor.ts   # Entity extraction
│   ├── embeddings/
│   │   ├── base.ts               # Base embedding interface
│   │   ├── ollama.ts             # Ollama provider
│   │   └── openai.ts             # OpenAI provider
│   ├── graph/
│   │   ├── neo4j-client.ts       # Neo4j connection
│   │   ├── graph-builder.ts      # Graph construction
│   │   └── graph-query.ts        # Graph queries
│   ├── graphrag/
│   │   ├── query-processor.ts    # Query processing
│   │   └── context-builder.ts    # Context aggregation
│   ├── api/
│   │   ├── server.ts             # Express server
│   │   └── routes.ts             # API routes
│   ├── utils/
│   │   ├── logger.ts             # Logging utility
│   │   └── helpers.ts            # Helper functions
│   └── index.ts                  # Main entry point
├── data/
│   ├── ieam-html/                # HTML documentation
│   └── processed/                # Processed data
├── docs/
│   ├── ARCHITECTURE.md           # Architecture details
│   ├── API.md                    # API documentation
│   └── NEO4J_SETUP.md           # Neo4j setup guide
├── tests/
│   └── integration/              # Integration tests
├── .env.example                  # Environment template
├── package.json
├── tsconfig.json
└── README.md

Neo4j Graph Schema

Node Types

Document
- Properties: id, title, url, content, embedding
- Represents a documentation page
Section
- Properties: id, title, content, level, embedding
- Represents a section within a document
Entity
- Properties: id, name, type, description, embedding
- Types: Concept, Component, Command, API, Configuration
Topic
- Properties: id, name, description
- Represents high-level topics/categories

Relationship Types

HAS_SECTION: Document → Section
MENTIONS: Section → Entity
RELATES_TO: Entity → Entity (semantic similarity)
BELONGS_TO: Entity → Topic
SIMILAR_TO: Document → Document (vector similarity)
NEXT: Section → Section (sequential order)
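
The schema above can be mirrored in TypeScript so that invalid relationships are caught before they reach Neo4j. The following is a sketch only: the type and function names (NodeLabel, RelType, RELATIONSHIP_SCHEMA, isValidRelationship) are made up for this example and are not taken from the project source.

```typescript
// Node labels and relationship types as listed in the schema section.
type NodeLabel = "Document" | "Section" | "Entity" | "Topic";

type RelType =
  | "HAS_SECTION"
  | "MENTIONS"
  | "RELATES_TO"
  | "BELONGS_TO"
  | "SIMILAR_TO"
  | "NEXT";

// Allowed start and end labels for each relationship type.
const RELATIONSHIP_SCHEMA: Record<RelType, { from: NodeLabel; to: NodeLabel }> = {
  HAS_SECTION: { from: "Document", to: "Section" },
  MENTIONS: { from: "Section", to: "Entity" },
  RELATES_TO: { from: "Entity", to: "Entity" },
  BELONGS_TO: { from: "Entity", to: "Topic" },
  SIMILAR_TO: { from: "Document", to: "Document" },
  NEXT: { from: "Section", to: "Section" },
};

// Returns true when the endpoints match the direction defined in the schema.
function isValidRelationship(type: RelType, from: NodeLabel, to: NodeLabel): boolean {
  const rule = RELATIONSHIP_SCHEMA[type];
  return rule.from === from && rule.to === to;
}
```

A graph builder could call isValidRelationship before each write and log or reject anything that does not fit the schema.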

GraphRAG Query Process

Query Embedding: Convert user query to vector
Semantic Search: Find relevant nodes using vector similarity
Graph Traversal: Expand context through relationships
Community Detection: Identify related concept clusters
Context Aggregation: Combine information from multiple paths
Response Generation: Use LLM with enriched context
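
The semantic search, graph traversal and context aggregation stages can be sketched over a small in-memory graph as shown below. This is an illustration under stated assumptions, not the project's query processor: it takes an already embedded query vector, uses cosine similarity for seed selection, expands with a breadth-first search bounded by a maximum depth (compare MAX_GRAPH_DEPTH), and skips community detection and the LLM call entirely. All names (GraphNode, InMemoryGraph, retrieveContext) are hypothetical.

```typescript
interface GraphNode {
  id: string;
  text: string;
  vector: number[];
}

interface InMemoryGraph {
  nodes: Map<string, GraphNode>;
  edges: Map<string, string[]>; // adjacency list: node id -> neighbour ids
}

interface RetrievalOptions {
  seedCount: number; // how many best-matching nodes to start from
  threshold: number; // minimum cosine similarity for a seed
  maxDepth: number; // maximum number of hops from a seed
}

function similarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function retrieveContext(
  graph: InMemoryGraph,
  queryVector: number[],
  options: RetrievalOptions
): string[] {
  // Semantic search: pick the best-scoring nodes above the threshold.
  const seeds = Array.from(graph.nodes.values())
    .map((node) => ({ id: node.id, score: similarity(queryVector, node.vector) }))
    .filter((s) => s.score >= options.threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, options.seedCount);

  // Graph traversal: breadth-first expansion, bounded by maxDepth.
  const depthOf = new Map<string, number>();
  const queue: string[] = [];
  for (const seed of seeds) {
    depthOf.set(seed.id, 0);
    queue.push(seed.id);
  }
  while (queue.length > 0) {
    const id = queue.shift() as string;
    const depth = depthOf.get(id) ?? 0;
    if (depth >= options.maxDepth) continue;
    for (const next of graph.edges.get(id) ?? []) {
      if (!depthOf.has(next)) {
        depthOf.set(next, depth + 1);
        queue.push(next);
      }
    }
  }

  // Context aggregation: collect text in visit order (seeds first).
  const context: string[] = [];
  depthOf.forEach((_depth, id) => {
    const node = graph.nodes.get(id);
    if (node) context.push(node.text);
  });
  return context;
}
```

The returned strings would then be placed into the LLM prompt. In a deployment backed by Neo4j, the seed selection and expansion would more likely run inside the database than in application memory.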

API Examples

Query Documentation

// POST /api/query
{
  "query": "How to register an edge node in IEAM?",
  "maxResults": 5,
  "includeGraph": true
}

// Response
{
  "answer": "To register an edge node in IEAM...",
  "sources": [
    {
      "title": "Registering Edge Nodes",
      "url": "...",
      "relevance": 0.95
    }
  ],
  "graph": {
    "nodes": [...],
    "relationships": [...]
  }
}
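
For TypeScript clients, the request and response shapes above can be captured as types. The sketch below infers the field types from the example payloads; treating maxResults, includeGraph and the graph field as optional is an assumption, and buildQueryRequest and isQueryResponse are hypothetical helpers, not part of the project. The builder only assembles the request so it can be handed to any HTTP client.

```typescript
interface QueryRequest {
  query: string;
  maxResults?: number; // assumed optional
  includeGraph?: boolean; // assumed optional
}

interface QuerySource {
  title: string;
  url: string;
  relevance: number;
}

interface QueryResponse {
  answer: string;
  sources: QuerySource[];
  // Assumed to be present only when includeGraph is true.
  graph?: { nodes: unknown[]; relationships: unknown[] };
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assemble a POST /api/query request for the given server base URL.
function buildQueryRequest(baseUrl: string, request: QueryRequest): HttpRequest {
  if (request.query.trim().length === 0) {
    throw new Error("query must not be empty");
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/query`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  };
}

// Minimal runtime check before trusting a parsed JSON response.
function isQueryResponse(value: unknown): value is QueryResponse {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { answer?: unknown; sources?: unknown };
  return typeof candidate.answer === "string" && Array.isArray(candidate.sources);
}
```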

Get Graph Statistics

// GET /api/graph/stats
{
  "nodes": {
    "Document": 150,
    "Section": 450,
    "Entity": 320,
    "Topic": 25
  },
  "relationships": {
    "HAS_SECTION": 450,
    "MENTIONS": 1200,
    "RELATES_TO": 850
  },
  "totalNodes": 945,
  "totalRelationships": 2500
}

Advanced Features

1. Community Detection

Automatically groups related concepts:

// Assumes a GDS graph projection named 'myGraph' already exists
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN nodeId, communityId

2. PageRank for Important Concepts

Identifies key concepts:

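// Assumes a GDS graph projection named 'myGraph' already exists
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN nodeId, score ORDER BY score DESC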
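
To make the ranking idea concrete, here is a small power-iteration PageRank over an edge list. It is an educational sketch of the general algorithm, not the Graph Data Science implementation and not project code; the damping factor of 0.85 and the fixed iteration count are conventional illustrative defaults.

```typescript
// Compute PageRank scores for a directed graph given as [from, to] pairs.
function pageRank(
  edges: Array<[string, string]>,
  damping = 0.85,
  iterations = 50
): Map<string, number> {
  // Build the adjacency list and make sure every node has an entry.
  const outLinks = new Map<string, string[]>();
  for (const [from, to] of edges) {
    if (!outLinks.has(from)) outLinks.set(from, []);
    if (!outLinks.has(to)) outLinks.set(to, []);
    (outLinks.get(from) as string[]).push(to);
  }

  const nodes = Array.from(outLinks.keys());
  const n = nodes.length;
  let rank = new Map<string, number>();
  nodes.forEach((id) => rank.set(id, 1 / n));

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    nodes.forEach((id) => next.set(id, 0));

    // Each node splits its current score evenly across its out-links.
    // Nodes without out-links spread their score over all nodes.
    let danglingMass = 0;
    nodes.forEach((id) => {
      const targets = outLinks.get(id) ?? [];
      const score = rank.get(id) ?? 0;
      if (targets.length === 0) {
        danglingMass += score;
        return;
      }
      const share = score / targets.length;
      targets.forEach((t) => next.set(t, (next.get(t) ?? 0) + share));
    });

    // Apply damping: a fixed base score plus the damped incoming score.
    nodes.forEach((id) => {
      const incoming = (next.get(id) ?? 0) + danglingMass / n;
      next.set(id, (1 - damping) / n + damping * incoming);
    });
    rank = next;
  }
  return rank;
}
```

Nodes that many other nodes point to end up with the highest scores, which is why the technique surfaces the concepts that the rest of the documentation refers to most.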

3. Shortest Path Queries

Finds connections between concepts:

MATCH path = shortestPath(
  (a:Entity {name: 'Edge Node'})-[*]-(b:Entity {name: 'Agent'})
)
RETURN path
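
When this query is issued from application code, the entity names are better passed as query parameters than concatenated into the query text. The sketch below builds such a query and also bounds the path length, since an unbounded [*] pattern can be expensive on a large graph. buildShortestPathQuery and the CypherQuery type are hypothetical names for this example; the hop bound is validated as a positive integer and then interpolated into the text, while the names travel as parameters.

```typescript
interface CypherQuery {
  text: string;
  params: Record<string, string>;
}

// Build a bounded shortest-path query between two Entity nodes.
function buildShortestPathQuery(
  fromName: string,
  toName: string,
  maxHops = 3
): CypherQuery {
  if (!Number.isInteger(maxHops) || maxHops < 1) {
    throw new Error("maxHops must be a positive integer");
  }
  const text = [
    "MATCH path = shortestPath(",
    `  (a:Entity {name: $fromName})-[*..${maxHops}]-(b:Entity {name: $toName})`,
    ")",
    "RETURN path",
  ].join("\n");
  return { text, params: { fromName, toName } };
}
```

The resulting text and params can be handed to whichever Neo4j client the application uses.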

Performance Optimization

Vector Indexes: Create vector indexes for fast similarity search
Batch Processing: Import documents in batches
Connection Pooling: Reuse Neo4j connections
Caching: Cache frequent queries
Parallel Processing: Process documents concurrently
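
Two of these techniques, batch processing and caching, are easy to sketch without any dependencies. The helpers below are illustrative assumptions rather than project code: toBatches splits a list of items into fixed-size groups for import, and LruCache is a small least-recently-used cache that could hold answers to frequent queries.

```typescript
// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Least-recently-used cache built on Map, which iterates in insertion order.
class LruCache<K, V> {
  private readonly capacity: number;
  private readonly entries = new Map<K, V>();

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new Error("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}
```

A query handler could key the cache on the normalized query string and fall back to the full retrieval path on a miss.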

Troubleshooting

Neo4j Connection Issues

# Test connection
npm run test:neo4j

# Check Neo4j logs in Aura console

Embedding Generation Slow

# For Ollama, ensure model is pulled
ollama pull nomic-embed-text

# Check Ollama is running
curl http://localhost:11434/api/tags

Import Fails

# Check HTML files exist
ls -la data/ieam-html

# Verify Neo4j credentials
npm run test:config

Development

# Run tests
npm test

# Run specific test
npm test -- graph-builder

# Lint code
npm run lint

# Format code
npm run format

# Type check
npm run type-check

Deployment

Docker Deployment

# Build image
docker build -t ieam-graphrag .

# Run container
docker run -p 3000:3000 --env-file .env ieam-graphrag

Cloud Deployment

See docs/DEPLOYMENT.md for cloud deployment guides.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

License

ISC

Support

For issues and questions:

GitHub Issues: [Create an issue]
Documentation: See docs/ folder
Neo4j Community: https://community.neo4j.com/

Made with ❤️ for IEAM Documentation