CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. The project consists of:
- Backend (Python): FastAPI/Flask-based API server with RAG pipeline, document processing, LLM integrations
- Frontend (React/TypeScript): Modern React application built with Umi framework, supporting micro-frontend architecture
- Agent System: Conversational AI agents with component-based workflow engine
- Document Processing: Deep document understanding with OCR, layout recognition, and chunking
- Vector Storage: Elasticsearch/Infinity for full-text and vector search
Common Development Commands
Backend Development
bash
# Install dependencies (requires Python 3.10-3.12)
uv sync --python 3.10 --all-extras
uv run download_deps.py
pre-commit install
# Start dependent services (MySQL, Redis, MinIO, Elasticsearch)
docker compose -f docker/docker-compose-base.yml up -d
# Start backend services (the dependent services above must already be running)
source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh
# Run tests
uv run pytest
uv run python -m pytest test/ -m p1 # High priority tests
Frontend Development
bash
cd web
# Install dependencies
npm install
# Development server
npm run dev # Standard development
npm run dev:micro # Micro-frontend mode
npm run start:static # Static development mode
# Build
npm run build # Production build
npm run build:micro # Micro-frontend build
npm run build:copy # Build with PDF.js assets
# Testing and Linting
npm run test # Jest tests
npm run lint # ESLint check
Docker Development
bash
# Full stack with Docker
cd docker
docker compose -f docker-compose.yml up -d # CPU mode
docker compose -f docker-compose-gpu.yml up -d # GPU mode
# Build custom images
docker build -f Dockerfile -t ragflow:custom . # Full image (~9GB)
docker build --build-arg LIGHTEN=1 -f Dockerfile -t ragflow:slim . # Slim image (~2GB)
Architecture Overview
Backend Architecture
- API Layer (api/): REST endpoints organized by domain (chat, documents, datasets, agents)
- RAG Engine (rag/): Core retrieval-augmented generation pipeline
- Document Processing (deepdoc/): PDF/DOCX/Excel parsing with vision models
- Agent System (agent/): Component-based conversational AI workflows
- Graph RAG (graphrag/): Knowledge graph construction and reasoning
- LLM Integrations (rag/llm/): Adapters for OpenAI, Anthropic, local models
Frontend Architecture
- Framework: Umi 4 with React 18, TypeScript
- UI Libraries: Ant Design + Radix UI components with Tailwind CSS
- State Management: Zustand stores with React Query for server state
- Routing: File-based routing with nested layouts
- Micro-Frontend: Qiankun support for integration with main applications
Key frontend patterns:
- Custom hooks in hooks/ for API interactions and business logic
- Reusable UI components in components/ui/ following Radix patterns
- Service layer in services/ for API abstractions
- Utils in utils/ for common functionality (auth, validation, formatting)
Data Flow
- Document Ingestion: Files → DeepDoc parsing → Chunking → Vector embedding → Storage
- RAG Pipeline: Query → Retrieval → LLM augmentation → Response with citations
- Agent Workflows: Components → Canvas execution → Tool calling → Response generation
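The ingestion and retrieval flows above can be sketched end to end in a few lines. This is a toy illustration only: `chunk_text`, `embed`, `ingest`, `retrieve`, and `build_prompt` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, and an in-memory list stands in for Elasticsearch/Infinity. None of it reflects the actual code in rag/ or deepdoc/.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (stand-in for real chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(doc_id: str, text: str, store: list[dict]) -> None:
    """Files -> chunking -> embedding -> storage (parsing is assumed already done)."""
    for n, chunk in enumerate(chunk_text(text)):
        store.append({"doc_id": doc_id, "chunk_id": n, "text": chunk, "vector": embed(chunk)})


def retrieve(query: str, store: list[dict], top_k: int = 2) -> list[dict]:
    """Query -> retrieval: rank stored chunks by similarity to the query."""
    q = embed(query)
    return sorted(store, key=lambda c: cosine(q, c["vector"]), reverse=True)[:top_k]


def build_prompt(query: str, chunks: list[dict]) -> str:
    """Retrieval -> LLM augmentation: number the sources so the answer can cite them."""
    context = "\n".join(
        f"[{i + 1}] ({c['doc_id']}#{c['chunk_id']}) {c['text']}" for i, c in enumerate(chunks)
    )
    return f"Answer using the numbered sources and cite them.\n{context}\nQuestion: {query}"
```

Calling `ingest` for each document, then `retrieve` and `build_prompt` for a query, yields the text that would be sent to an LLM. The numbered source labels are what make citations in the response possible.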
Important Configuration Files
Backend Configuration
- pyproject.toml: Python dependencies and project metadata
- docker/service_conf.yaml.template: Service configuration template
- conf/service_conf.yaml: Runtime service configuration
- rag/settings.py: RAG pipeline settings
Frontend Configuration
- web/.umirc.ts: Umi framework configuration with micro-frontend support
- web/package.json: Dependencies and build scripts
- web/tailwind.config.js: Tailwind CSS configuration
- web/src/routes.ts: Application routing configuration
Environment Setup
- docker/.env: Docker environment variables
- docker/docker-compose*.yml: Container orchestration for different environments
Testing Strategy
Backend Tests
- Unit tests in test/ directory using pytest
- API tests for all endpoints with authentication
- Integration tests for RAG pipeline components
- Priority markers: @pytest.mark.p1 (high), @pytest.mark.p2 (medium), @pytest.mark.p3 (low)
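As a minimal sketch of how the priority markers might be applied, the two tests below are hypothetical placeholders. Real tests would exercise the API rather than literal dictionaries and strings.

```python
import pytest


@pytest.mark.p1
def test_create_dataset_returns_id():
    # Hypothetical high-priority check; a real test would call the API.
    dataset = {"id": "ds-1", "name": "demo"}
    assert dataset["id"]


@pytest.mark.p3
def test_dataset_name_is_trimmed():
    # Hypothetical low-priority check.
    assert "  demo ".strip() == "demo"
```

With markers applied this way, `uv run python -m pytest test/ -m p1` selects only the first test, and an expression such as `-m "p1 or p2"` widens the selection. Custom markers generally need to be registered in the pytest configuration to avoid unknown-marker warnings.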
Frontend Tests
- Jest with React Testing Library in web/src/
- Component tests for UI elements
- Integration tests for API interactions
- Coverage reporting enabled
Key Development Workflows
Adding New Document Parsers
- Implement parser in deepdoc/parser/
- Add parser registration in rag/app/
- Update frontend document type handling
- Add tests for new document types
Creating Agent Components
- Define component in agent/component/
- Register in component factory
- Add UI representation in web/src/pages/flow/
- Implement component logic and tests
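The "define, then register in a factory" steps follow a common registry pattern, sketched below. Everything here is an assumption for illustration: `COMPONENT_REGISTRY`, `register_component`, `create_component`, and `UppercaseComponent` are hypothetical names, and the actual factory in agent/component/ may be structured quite differently.

```python
from typing import Callable

# Hypothetical registry mapping a component name to its class.
COMPONENT_REGISTRY: dict[str, type] = {}


def register_component(name: str) -> Callable[[type], type]:
    """Class decorator that records a component class under a unique name."""
    def decorator(cls: type) -> type:
        if name in COMPONENT_REGISTRY:
            raise ValueError(f"component {name!r} is already registered")
        COMPONENT_REGISTRY[name] = cls
        return cls
    return decorator


@register_component("uppercase")
class UppercaseComponent:
    """Toy component: transforms its input text."""

    def run(self, text: str) -> str:
        return text.upper()


def create_component(name: str):
    """Factory: look up a registered class by name and instantiate it."""
    try:
        return COMPONENT_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown component {name!r}") from None
```

A workflow engine built this way only needs the component name from the canvas definition, for example `create_component("uppercase").run("hi")`. Adding a component then requires no change to the engine itself.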
LLM Model Integration
- Add model adapter in rag/llm/
- Update conf/llm_factories.json
- Add frontend model selection UI
- Configure API key handling
Frontend Page Development
- Create page component in web/src/pages/
- Add route in web/src/routes.ts
- Implement API hooks in web/src/hooks/
- Add navigation and breadcrumbs
Micro-Frontend Integration
The frontend supports deployment as a micro-application:
- Qiankun Integration: Configured in web/src/app.tsx with lifecycle hooks
- Path Handling: Dynamic publicPath based on MICRO_APP environment variable
- Token Sharing: Automatic token propagation from main application
- API Prefix: Configurable API prefixes for different deployment contexts
Build for micro-frontend: npm run build:micro
Security Considerations
- API endpoints require authentication via JWT tokens
- File uploads validated for type and size
- SQL injection protection in database queries
- XSS protection in frontend components
- Secrets managed via environment variables
- CORS properly configured for development/production
Performance Optimization
Backend
- Vector search with Elasticsearch/Infinity
- Async processing for document parsing
- Connection pooling for databases
- Caching with Redis for frequently accessed data
Frontend
- Code splitting with dynamic imports
- Image optimization and lazy loading
- React Query for efficient data fetching
- Bundle optimization with Terser
Development Environment Setup
- Prerequisites: Python 3.10-3.12, Node.js 18+, Docker 24+
- Backend: Follow "Backend Development" commands above
- Frontend: Follow "Frontend Development" commands above
- Database: Use Docker Compose for development dependencies
- Environment: Copy docker/.env and configure as needed
Ensure vm.max_map_count >= 262144 for Elasticsearch:
bash
sudo sysctl -w vm.max_map_count=262144
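Before starting Elasticsearch, the current setting can be checked programmatically. The sketch below reads the standard Linux procfs entry; the function names are hypothetical and not part of the repository.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # minimum stated above for Elasticsearch


def parse_max_map_count(raw: str) -> int:
    """Parse the contents of the procfs entry (a single integer plus newline)."""
    return int(raw.strip())


def is_sufficient(value: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True when the kernel setting meets the required minimum."""
    return value >= required


def check_host(path: str = "/proc/sys/vm/max_map_count"):
    """Return True/False, or None when the file is unavailable (e.g. non-Linux hosts)."""
    p = Path(path)
    if not p.exists():
        return None
    return is_sufficient(parse_max_map_count(p.read_text()))


if __name__ == "__main__":
    result = check_host()
    if result is None:
        print("vm.max_map_count not readable on this host")
    elif result:
        print("vm.max_map_count is sufficient")
    else:
        print(f"vm.max_map_count is below {REQUIRED_MAX_MAP_COUNT}")
```

Note that `sysctl -w` changes the value only until the next reboot. Persisting it typically means adding `vm.max_map_count=262144` to /etc/sysctl.conf or a file under /etc/sysctl.d/.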