How Folki Works: The Architecture of Truth
From raw catalogs to verified Knowledge Graphs: The 4-step enrichment pipeline.
Last Updated: January 26, 2026
Level: Technical
Reading Time: 8 Minutes
Folki is not a simple database tool or a manual input form. It is an autonomous research and structuring engine. While most PIM (Product Information Management) systems are empty boxes waiting for you to type in data, Folki is an active worker that goes out and finds the data for you.
This document outlines the technical architecture of the Folki pipeline, explaining exactly how we transform a barren "Product ID" into a rich, verified "Knowledge Graph."
Phase 1: Ingestion & Gap Analysis
The process begins when you connect Folki to your store (Shopify, WooCommerce, or custom feed).
1. The Shadow Catalog
Folki creates a read-only shadow copy of your catalog.
- Safety First: We never write directly to your live database during the research phase. All operations happen in our isolated sandbox.
- Sync State: We maintain a hash of your product data to detect when prices or titles change in your main store, triggering re-evaluation.
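The sync-state check can be pictured as a fingerprint comparison. The following is a minimal sketch, not Folki's actual implementation: the choice of SHA-256, the `TRACKED_FIELDS` tuple, and both function names are assumptions made for illustration.

```python
import hashlib
import json

# Assumption: only these fields feed the hash. The text names prices and titles.
TRACKED_FIELDS = ("title", "price")


def product_fingerprint(product: dict) -> str:
    """Return a stable SHA-256 hex digest of the tracked fields."""
    tracked = {field: product.get(field) for field in TRACKED_FIELDS}
    # sort_keys and fixed separators keep the serialization deterministic.
    canonical = json.dumps(tracked, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_reevaluation(stored_hash: str, live_product: dict) -> bool:
    """True when the live store record no longer matches the stored hash."""
    return product_fingerprint(live_product) != stored_hash


# Illustrative values only.
shadow = {"title": "Sony A7 IV Camera", "price": "100.00", "tags": ["camera"]}
stored = product_fingerprint(shadow)
live = dict(shadow, price="90.00")

print(needs_reevaluation(stored, shadow))  # False
print(needs_reevaluation(stored, live))    # True
```

Hashing a canonical serialization rather than comparing field by field means only one short string per product has to be stored.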
2. The "Thin Data" Audit
Before we research, we must know what is missing.
- Schema Mapping: We map your products to their Schema.org type (e.g., `Product` vs `Vehicle` vs `SoftwareApplication`).
- Density Scoring: We scan for "Attribute Gaps."
  - *Input:* "Sony A7 IV Camera"
  - *Analysis:* "Missing: Sensor Resolution, ISO Range, Battery Type, Video Codecs."
  - *Verdict:* High Priority for Enrichment.
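One way to read the density score is as the share of expected attributes that are already filled in. The sketch below assumes a hypothetical `EXPECTED_BY_CATEGORY` lookup and a hypothetical 0.5 threshold; neither is documented Folki behavior.

```python
from dataclasses import dataclass

# Hypothetical lookup: attributes a complete listing should carry, per category.
EXPECTED_BY_CATEGORY = {
    "camera": ("Sensor Resolution", "ISO Range", "Battery Type", "Video Codecs"),
}


@dataclass
class AuditResult:
    missing: list[str]
    density: float  # 0.0 = nothing filled in, 1.0 = complete
    verdict: str


def audit(attributes: dict, category: str, threshold: float = 0.5) -> AuditResult:
    """Compare a product's attributes against the expected list for its category."""
    expected = EXPECTED_BY_CATEGORY[category]
    missing = [name for name in expected if not attributes.get(name)]
    density = 1 - len(missing) / len(expected)
    verdict = "High Priority for Enrichment" if density < threshold else "Sufficient"
    return AuditResult(missing, density, verdict)


# A listing that has a title but no technical attributes at all.
result = audit({}, "camera")
print(result.missing)
print(result.verdict)  # High Priority for Enrichment
```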
Phase 2: Autonomous Research (The Agentic Layer)
This is the core differentiator. We employ a swarm of specialized AI agents (built on LLMs such as GPT-4o and Gemini 1.5 Pro) that act as digital librarians.
1. Source Discovery Agent
This agent's sole job is to find the Truth.
- Strategy: It constructs complex search queries (`"Model Number + datasheet filetype:pdf"`, `"official specifications site:manufacturer.com"`).
- Filtering: It evaluates domain authority in real time. It treats `sony.com` as authoritative and `gadget-rumors.xyz` as not.
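The query construction and filtering steps above might look roughly like this. The static allowlist is a deliberate simplification of the real-time authority evaluation described in the text, and the function names and `MODEL-123` placeholder are assumptions.

```python
from urllib.parse import urlparse

# Simplifying assumption: a fixed allowlist stands in for live authority scoring.
AUTHORITATIVE_DOMAINS = {"sony.com"}


def build_queries(model_number: str, manufacturer_domain: str) -> list[str]:
    """Build the two search-query shapes mentioned in the text."""
    return [
        f'"{model_number}" datasheet filetype:pdf',
        f'"{model_number}" official specifications site:{manufacturer_domain}',
    ]


def is_authoritative(url: str) -> bool:
    """Accept a URL only if its host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in AUTHORITATIVE_DOMAINS
    )


queries = build_queries("MODEL-123", "sony.com")  # placeholder model number
print(queries[0])
print(is_authoritative("https://www.sony.com/specs"))  # True
print(is_authoritative("https://gadget-rumors.xyz/a"))  # False
```

Matching on the parsed hostname rather than on a substring keeps look-alike domains such as `notsony.com` from passing the filter.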
2. Extraction Agent (OCR & NLP)
Once a source is found, this agent reads it.
- Vision Capabilities: It can "read" charts, tables, and infographics in PDF brochures using Computer Vision.
- Context Awareness: It distinguishes between "Package Weight" (shipping) and "Item Weight" (product spec).
- Normalization: It converts unstructured text (`"approx 2.5 lbs"`) into structured values (`value: 1.13, unit: "kg"`).
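The normalization step can be illustrated with a small parser. This is a sketch under stated assumptions: a regular expression stands in for whatever NLP the extraction agent really uses, and the unit table and `normalize_weight` name are hypothetical.

```python
import re
from typing import Optional

# Conversion factors to kilograms (the pound and ounce values are exact by definition).
TO_KG = {
    "kg": 1.0,
    "g": 0.001,
    "lb": 0.45359237,
    "lbs": 0.45359237,
    "oz": 0.028349523125,
}

# Longer unit spellings come first so "lbs" is not matched as "lb".
WEIGHT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g|lbs|lb|oz)\b", re.IGNORECASE)


def normalize_weight(text: str) -> Optional[dict]:
    """Extract the first weight in free text and return it in kilograms."""
    match = WEIGHT_PATTERN.search(text)
    if match is None:
        return None
    amount = float(match.group(1))
    unit = match.group(2).lower()
    return {"value": round(amount * TO_KG[unit], 2), "unit": "kg"}


print(normalize_weight("approx 2.5 lbs"))  # {'value': 1.13, 'unit': 'kg'}
```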
3. Verification Agent
This agent acts as the auditor.
- Cross-Referencing: It compares the extracted value against other sources.
- Hallucination Check: It runs semantic consistency checks (e.g., verifying that a "55-inch TV" doesn't have a screen size of "12 inches").
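Both checks can be approximated with simple rules. The sketch below is illustrative only: the median-based agreement test, the tolerances, and the regex on the title are assumptions standing in for the semantic checks the verification agent performs.

```python
import re
from statistics import median


def cross_reference(values: list[float], rel_tol: float = 0.02) -> bool:
    """True when every source value lies within rel_tol of the median value."""
    if len(values) < 2:
        return False  # a single source cannot corroborate itself
    mid = median(values)
    return all(abs(value - mid) <= rel_tol * abs(mid) for value in values)


def title_matches_screen_size(title: str, screen_size_in: float, tol: float = 1.0) -> bool:
    """Reject an extracted screen size that contradicts the size in the title."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*-?\s*inch", title, re.IGNORECASE)
    if match is None:
        return True  # the title makes no size claim to contradict
    return abs(float(match.group(1)) - screen_size_in) <= tol


print(cross_reference([1.13, 1.13, 1.14]))            # True
print(title_matches_screen_size("55-inch TV", 12.0))  # False
```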
Phase 3: The Knowledge Graph Construction
Raw data is just a list. To make it "AI Ready," we must structure it into a Graph.
1. Entity Resolution
Folki understands relationships.
- *Raw:* `Lens Mount: E-mount`
- *Graph:* `[Subject: Camera] --(compatibleWith)--> [Object: E-mount System]`
This allows AI search engines to answer complex queries like *"Which cameras fit my existing lenses?"*
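As a rough illustration, the graph edge above can be modeled as a subject-predicate-object triple. The `PREDICATES` mapping and helper names below are hypothetical, and the second camera is a made-up entry for contrast.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical mapping from raw attribute names to graph predicates.
PREDICATES = {"Lens Mount": "compatibleWith"}


def to_triple(subject: str, attribute: str, value: str) -> Triple:
    """Turn a raw attribute/value pair into a graph edge."""
    return Triple(subject, PREDICATES[attribute], f"{value} System")


def compatible_subjects(triples: list[Triple], system: str) -> list[str]:
    """Answer 'which products fit this system?' by scanning the edges."""
    return [
        t.subject
        for t in triples
        if t.predicate == "compatibleWith" and t.obj == system
    ]


graph = [
    to_triple("Camera A", "Lens Mount", "E-mount"),
    to_triple("Camera B", "Lens Mount", "Other-mount"),  # made-up entry
]
print(compatible_subjects(graph, "E-mount System"))  # ['Camera A']
```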
2. Unit Standardization
All measurements are converted to International System of Units (SI) standards for backend storage, while preserving user-friendly display formats (Imperial/Metric) for the frontend.
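A sketch of the storage/display split, assuming weights are stored in kilograms and formatted at read time. The function name and two-decimal format are illustrative choices, not Folki's documented behavior.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound


def format_weight(value_kg: float, system: str = "metric") -> str:
    """Render a stored SI weight in the shopper's preferred unit system."""
    if system == "metric":
        return f"{value_kg:.2f} kg"
    if system == "imperial":
        return f"{value_kg / KG_PER_LB:.2f} lb"
    raise ValueError(f"unknown unit system: {system!r}")


stored_kg = 2.5 * KG_PER_LB  # stored unrounded so display conversions stay accurate
print(format_weight(stored_kg, "metric"))    # 1.13 kg
print(format_weight(stored_kg, "imperial"))  # 2.50 lb
```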
3. Citation Linking
We attach the Proof to the Fact. Every node in the graph carries metadata about its source URL and retrieval timestamp. This creates the "Trust Signal" required by modern search algorithms.
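A minimal sketch of a graph node that carries its own citation. The `Fact` field names and the `example.com` URL are assumptions; the text only states that a source URL and retrieval timestamp are attached.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Fact:
    name: str
    value: float
    unit: str
    source_url: str    # where the value was found
    retrieved_at: str  # ISO 8601 timestamp, UTC


def make_fact(name: str, value: float, unit: str, source_url: str,
              now: Optional[datetime] = None) -> Fact:
    """Stamp a value with its source and retrieval time."""
    now = now or datetime.now(timezone.utc)
    return Fact(name, value, unit, source_url, now.isoformat())


fact = make_fact(
    "Item Weight", 1.13, "kg",
    "https://example.com/datasheet.pdf",  # placeholder URL
    now=datetime(2026, 1, 26, tzinfo=timezone.utc),
)
print(asdict(fact)["retrieved_at"])  # 2026-01-26T00:00:00+00:00
```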
Phase 4: Delivery & Injection
The final phase is delivering this structured intelligence to the "eyes" of Search Engines and AI bots.
1. The JSON-LD Payload
We generate a sophisticated, nested JSON-LD script.
- Standard Properties: `name`, `description`, and `price` (nested under `offers`, as Schema.org expects).
- Rich Attributes: `additionalProperty` array for all the technical specs.
- Merchant Data: `shippingDetails` and `hasMerchantReturnPolicy`, both attached to the `Offer`.
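To make the payload shape concrete, here is a minimal sketch that assembles a Schema.org `Product` with an `additionalProperty` array. The `build_jsonld` helper and the sample values are illustrative. The merchant fields are omitted for brevity, and the real Folki payload may differ.

```python
import json


def build_jsonld(name: str, description: str, price: str,
                 currency: str, specs: dict) -> dict:
    """Assemble a nested JSON-LD Product document from flat inputs."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        # In Schema.org, price lives on the Offer rather than on the Product.
        "offers": {"@type": "Offer", "price": price, "priceCurrency": currency},
        # Each technical spec becomes one PropertyValue entry.
        "additionalProperty": [
            {"@type": "PropertyValue", "name": key, "value": value}
            for key, value in specs.items()
        ],
    }


# Illustrative values only.
payload = build_jsonld(
    "Sony A7 IV Camera", "Full-frame mirrorless camera.", "100.00", "USD",
    {"Lens Mount": "E-mount"},
)
print(json.dumps(payload, indent=2))
```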
2. Headless Sync / App Blocks
- Shopify: We use Metaobjects and App Blocks to inject the code without slowing down your theme.
- Headless: You can fetch the pre-computed schema via our API (`GET /api/v1/products/{id}/schema`) and render it server-side.
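For the headless route, a server-side fetch-and-render step could look like the sketch below. The `BASE_URL` host is a placeholder and no authentication is shown, since the text documents only the endpoint path.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host, not the real API domain


def schema_url(product_id: str) -> str:
    """Build the endpoint URL for one product's pre-computed schema."""
    return f"{BASE_URL}/api/v1/products/{quote(product_id, safe='')}/schema"


def fetch_schema(product_id: str, timeout: float = 5.0) -> dict:
    """GET the schema and parse the JSON body (performs network I/O)."""
    with urllib.request.urlopen(schema_url(product_id), timeout=timeout) as response:
        return json.load(response)


def render_script_tag(schema: dict) -> str:
    """Embed the schema in a script tag for server-side rendering."""
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    payload = json.dumps(schema).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


print(schema_url("123"))
print(render_script_tag({"@type": "Product", "name": "Sony A7 IV Camera"}))
```

Rendering the tag on the server keeps the structured data in the initial HTML, so crawlers that do not execute JavaScript can still read it.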
The Result: "Citable" Products
By the end of this pipeline, your product has graduated from a generic "Web Page" to a structured "Knowledge Entity."
- Before: A string of text that requires a human brain to interpret.
- After: A machine-readable entity with verified attributes, ready for the semantic web.
This architecture ensures that when the "Machine Customer" of 2026 comes looking, your product isn't just found; it's understood.