graphcap Architecture Overview

Audience and Architectural Principles

This document targets system architects and software engineers maintaining or extending the graphcap system.

graphcap is designed for small- to medium-sized on-premises or single-user deployments and follows a local-first architecture. The system prioritizes:

Offline-first Operation: Capable of functioning with intermittent or no internet connectivity.
Local Data Sovereignty: Data persists locally and synchronizes opportunistically.
Modular and Extensible Components: Clear service boundaries and stateless design patterns for maintainability.

System Components

graphcap consists of specialized services working together to provide image captioning capabilities:

React Client: Web-based user interface and system orchestrator.
Data Service: Manages database operations and data persistence.
Inference Bridge: Performs AI-based image captioning.
Media Server: Manages image storage, retrieval, and processing.

Communication Flows

REST API Communication

The React Client orchestrates all services through direct REST API calls:

Client ↔ Data Service: Caption storage and retrieval, plus other database operations.
Client ↔ Inference Bridge: Caption generation requests and responses.
Client ↔ Media Server: Image upload, retrieval, and processing, plus file system operations.
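Because the client calls each service directly, it has to know every service's base URL. The sketch below shows one way to centralize that lookup in TypeScript. The `ServiceName` keys, the `serviceUrl` helper, and the localhost ports are hypothetical placeholders, not graphcap's actual configuration.

```typescript
// Hypothetical base URLs: real hostnames and ports depend on the deployment.
type ServiceName = "dataService" | "inferenceBridge" | "mediaServer";

const SERVICE_URLS: Record<ServiceName, string> = {
  dataService: "http://localhost:8001",
  inferenceBridge: "http://localhost:8002",
  mediaServer: "http://localhost:8003",
};

// Build a full endpoint URL for one service. Since services never call each
// other, every REST request the client makes resolves its target here.
function serviceUrl(service: ServiceName, path: string): string {
  const base = SERVICE_URLS[service].replace(/\/+$/, ""); // drop trailing slashes
  const suffix = path.startsWith("/") ? path : `/${path}`; // ensure one leading slash
  return `${base}${suffix}`;
}

console.log(serviceUrl("mediaServer", "images/upload"));
// prints "http://localhost:8003/images/upload"
```

Keeping the map in one place means a deployment change (for example, moving the Media Server to another host) touches a single configuration entry rather than every call site.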

Independent Service Operation

Each service operates independently without direct communication with other services:

Data Service: Persists caption data and handles database operations.
Inference Bridge: Processes image captioning requests when invoked.
Media Server: Manages file operations and image processing.

All services directly access the shared workspace volume for file operations.
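With a shared volume, services can exchange workspace-relative paths instead of image bytes, but each service must then guard against paths that escape the volume. A minimal sketch follows, assuming POSIX-style relative paths and a hypothetical mount point of `/workspace`; `resolveInWorkspace` is an illustrative helper, not part of graphcap.

```typescript
// Assumed mount point of the shared workspace volume inside each container.
const WORKSPACE_ROOT = "/workspace";

// Resolve a workspace-relative path, rejecting absolute paths, empty
// segments, "." / ".." segments, and backslashes so a request cannot
// reach files outside the shared volume. Returns null when rejected.
function resolveInWorkspace(relativePath: string): string | null {
  if (relativePath.length === 0 || relativePath.startsWith("/")) {
    return null;
  }
  const segments = relativePath.split("/");
  for (const segment of segments) {
    if (segment === "" || segment === "." || segment === ".." || segment.includes("\\")) {
      return null;
    }
  }
  return `${WORKSPACE_ROOT}/${segments.join("/")}`;
}
```

Rejecting suspicious segments outright, rather than normalizing them, keeps the rule simple to audit and identical across services.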

WebSocket Communication

Real-time updates may be implemented through direct WebSocket connections:

Data Service → Client: Database updates and events.
Inference Bridge → Client: Caption processing status.
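If both services push events over WebSockets, the client needs to tell the message kinds apart and ignore anything malformed. The TypeScript sketch below models events as a discriminated union. The event names (`captionSaved`, `captionProgress`), their fields, and the status values are assumptions made for illustration, since this overview does not define a message format.

```typescript
// Hypothetical job states reported by the Inference Bridge.
const STATUSES = ["queued", "running", "done", "failed"] as const;
type JobStatus = (typeof STATUSES)[number];

// Hypothetical event shapes, discriminated by `source` and `type`.
type ServiceEvent =
  | { source: "dataService"; type: "captionSaved"; captionId: string }
  | { source: "inferenceBridge"; type: "captionProgress"; jobId: string; status: JobStatus };

function isJobStatus(value: unknown): value is JobStatus {
  return typeof value === "string" && (STATUSES as readonly string[]).includes(value);
}

// Parse one WebSocket message payload; return null for anything that does
// not match a known event so the UI can safely ignore it.
function parseServiceEvent(raw: string): ServiceEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;

  const captionId = msg.captionId;
  if (msg.source === "dataService" && msg.type === "captionSaved" && typeof captionId === "string") {
    return { source: "dataService", type: "captionSaved", captionId };
  }

  const jobId = msg.jobId;
  const status = msg.status;
  if (msg.source === "inferenceBridge" && msg.type === "captionProgress" && typeof jobId === "string" && isJobStatus(status)) {
    return { source: "inferenceBridge", type: "captionProgress", jobId, status };
  }
  return null;
}
```

In the browser, a handler such as `socket.onmessage = (e) => { const evt = parseServiceEvent(e.data); ... }` could then switch on `evt.type` with full type narrowing.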

Component Deep Dives

Data Service

Manages data persistence and database operations:

Serves as the single source of truth, backed by PostgreSQL.
Stores caption data, metadata, and relationships.
Provides REST APIs for data retrieval and modification.
Is the only service with direct PostgreSQL access.
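Because only the Data Service touches PostgreSQL, the client expresses every data change as a REST request. The sketch below shows that boundary as plain request descriptions. The `CaptionRecord` fields and the `/captions` routes are hypothetical; the real schema and endpoints are not specified in this overview.

```typescript
// Hypothetical caption record; field names are illustrative only.
interface CaptionRecord {
  imagePath: string; // workspace-relative path of the captioned image
  provider: string; // AI provider that produced the caption
  text: string; // the generated caption
  metadata: Record<string, string>;
}

// A transport-agnostic description of one REST call.
interface RequestSpec {
  method: "GET" | "POST" | "PUT" | "DELETE";
  path: string;
  body?: string;
}

// Persisting a caption: the Data Service performs the actual SQL write.
function saveCaptionRequest(record: CaptionRecord): RequestSpec {
  return { method: "POST", path: "/captions", body: JSON.stringify(record) };
}

// Looking up captions for one image; the path is URL-encoded as a query value.
function captionsForImageRequest(imagePath: string): RequestSpec {
  return { method: "GET", path: `/captions?image=${encodeURIComponent(imagePath)}` };
}
```

Describing requests as data keeps the client free of database concerns and makes each call easy to test without a running service.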

Inference Bridge

Stateless AI caption processing:

Receives caption requests directly from the client.
Communicates with AI providers (Gemini, Ollama, OpenAI).
Returns results directly to the client.
Keeps no state between requests.
Reads images from the shared workspace volume.
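Statelessness here means each response depends only on the incoming request, so any bridge instance can serve any request. A minimal TypeScript sketch of that shape follows. The `CaptionRequest`/`CaptionResponse` types and the `ProviderAdapter` signature are assumptions; real adapters would call the Gemini, Ollama, or OpenAI APIs, whose client libraries are not shown.

```typescript
type Provider = "gemini" | "ollama" | "openai";

// Hypothetical request: the client sends a workspace-relative path and the
// bridge reads the image from the shared volume itself.
interface CaptionRequest {
  provider: Provider;
  imagePath: string;
  prompt: string;
}

interface CaptionResponse {
  provider: Provider;
  imagePath: string;
  caption: string;
}

// One adapter per provider (hypothetical signature).
type ProviderAdapter = (imagePath: string, prompt: string) => Promise<string>;

// Stateless handler: no module-level caches or counters. The result depends
// only on the request and the injected adapters, and nothing is stored.
async function handleCaptionRequest(
  req: CaptionRequest,
  adapters: Record<Provider, ProviderAdapter>,
): Promise<CaptionResponse> {
  const caption = await adapters[req.provider](req.imagePath, req.prompt);
  return { provider: req.provider, imagePath: req.imagePath, caption };
}
```

Injecting the adapters keeps provider selection a pure lookup, and returning the result to the caller (rather than saving it) leaves persistence to the client and the Data Service.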

Media Server

Responsible for media asset management:

Provides file upload, retrieval, and processing APIs.
Manages workspace directory structure.
Generates thumbnails and extracts image metadata.
Handles the client's file system operations on the workspace.
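Thumbnail generation usually starts by computing target dimensions that fit a bounding box without distorting the image. The sketch below shows that calculation only; the 256×256 default box and the `thumbnailSize` name are assumptions, and the actual resizing would be done by whatever image library the Media Server uses.

```typescript
// Fit (width, height) inside maxWidth x maxHeight, preserving aspect ratio
// and never upscaling images that are already small enough.
function thumbnailSize(
  width: number,
  height: number,
  maxWidth = 256, // assumed default, not a documented graphcap setting
  maxHeight = 256,
): { width: number; height: number } {
  if (width <= 0 || height <= 0) {
    throw new RangeError("image dimensions must be positive");
  }
  // The tighter of the two constraints wins; capping at 1 prevents upscaling.
  const scale = Math.min(maxWidth / width, maxHeight / height, 1);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

console.log(thumbnailSize(1920, 1080));
// prints { width: 256, height: 144 }
```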

React Client

Interactive front-end interface and system orchestrator:

Orchestrates workflow between services.
Directly communicates with all services.
Manages UI state and user experience.
Coordinates business logic and process flow.
Uses TanStack Query to fetch, cache, and synchronize server state.
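Since services never call one another, a multi-step workflow such as "upload, caption, save" is sequenced entirely in the client. The TypeScript sketch below shows that orchestration with injected service clients. The `MediaClient`, `InferenceClient`, and `DataClient` interfaces and the `uploadAndCaption` function are hypothetical; in the real client each method would wrap a REST call to the corresponding service.

```typescript
// Hypothetical thin wrappers around each service's REST API.
interface MediaClient {
  upload(fileName: string, bytes: Uint8Array): Promise<{ imagePath: string }>;
}
interface InferenceClient {
  caption(imagePath: string, provider: string): Promise<{ caption: string }>;
}
interface DataClient {
  saveCaption(imagePath: string, provider: string, caption: string): Promise<{ id: string }>;
}

// The client owns the workflow; each step's output feeds the next call.
async function uploadAndCaption(
  clients: { media: MediaClient; inference: InferenceClient; data: DataClient },
  fileName: string,
  bytes: Uint8Array,
  provider: string,
): Promise<{ id: string; imagePath: string; caption: string }> {
  // 1. Media Server writes the file into the shared workspace.
  const { imagePath } = await clients.media.upload(fileName, bytes);
  // 2. Inference Bridge reads the image from the workspace by path.
  const { caption } = await clients.inference.caption(imagePath, provider);
  // 3. Data Service persists the caption in PostgreSQL.
  const { id } = await clients.data.saveCaption(imagePath, provider, caption);
  return { id, imagePath, caption };
}
```

In a React component, a function like this could serve as the `mutationFn` of a TanStack Query `useMutation`, with `queryClient.invalidateQueries` refreshing any cached caption lists once it succeeds.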