Incremental Knowledge Base Updates¶
Overview¶
Fellow supports incremental updates to the knowledge base, allowing you to efficiently update only the changed parts of your codebase instead of re-analyzing everything.
How It Works¶
Initial Build (Full Extraction)¶
Creates: - .fellow-data/semantic/factual_knowledge.json - .fellow-data/semantic/procedural_knowledge.json - .fellow-data/semantic/conceptual_knowledge.json - .fellow-data/semantic/SEMANTIC_KNOWLEDGE_SUMMARY.md - .fellow-data/semantic/extraction_metadata.json (tracks extraction state)
Incremental Update¶
If knowledge base already exists, automatically performs incremental update: 1. Detects changed files (via git or file modification time) 2. Re-extracts knowledge only for changed files 3. Merges new knowledge with existing knowledge base 4. Updates metadata and summary
Force Full Rebuild¶
Forces complete re-extraction even if knowledge base exists.
Extraction Metadata Format¶
File: .fellow-data/semantic/extraction_metadata.json
{
"version": "2.0.0",
"project_path": "/path/to/project",
"last_full_extraction": "2026-01-05T10:00:00Z",
"last_update": "2026-01-05T15:30:00Z",
"extraction_method": "incremental",
"git_info": {
"commit_hash": "abc123def456",
"branch": "main",
"has_uncommitted_changes": false
},
"file_registry": {
"src/main.py": {
"last_analyzed": "2026-01-05T10:00:00Z",
"hash": "sha256:abc123...",
"size": 1234,
"entities_extracted": ["MainClass", "ConfigLoader"],
"workflows_extracted": ["startup_workflow"],
"status": "analyzed"
},
"src/services/auth.py": {
"last_analyzed": "2026-01-05T15:30:00Z",
"hash": "sha256:def456...",
"size": 2345,
"entities_extracted": ["AuthService", "TokenValidator"],
"workflows_extracted": ["authentication_flow"],
"status": "analyzed"
}
},
"statistics": {
"total_files_analyzed": 25,
"total_entities": 45,
"total_workflows": 12,
"last_update_duration_seconds": 15,
"files_changed_since_last_update": 3
}
}
Change Detection Strategies¶
1. Git-Based (Preferred)¶
If project is a git repository:
# Detect changed files since last extraction
git diff --name-only <last_commit_hash> HEAD
# Detect uncommitted changes
git diff --name-only
git ls-files --others --exclude-standard
Advantages: - Precise change tracking - Handles renames and moves - Integrates with version control - Tracks deletions
2. File Modification Time¶
If not a git repository:
# Compare file modification times with last_analyzed timestamp
find . -name "*.py" -newer .fellow-data/semantic/extraction_metadata.json
Advantages: - Works without git - Simple and reliable - Fast detection
3. File Hash Comparison¶
Calculate SHA-256 hash and compare with stored hashes:
import hashlib
def file_hash(filepath):
sha256 = hashlib.sha256()
with open(filepath, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b""):
sha256.update(chunk)
return sha256.hexdigest()
Advantages: - Most accurate - Detects content changes even if mtime unchanged - Cross-platform
Incremental Extraction Process¶
Phase 1: Change Detection¶
- Load metadata: Read
extraction_metadata.json - Detect changes:
- Use git if available:
git diff --name-only <commit> - Fall back to file modification time
- Calculate hashes if needed
- Categorize changes:
- Modified files: Re-extract knowledge
- New files: Extract knowledge
- Deleted files: Remove from knowledge base
- Renamed files: Update references
Phase 2: Targeted Extraction¶
For Modified/New Files:
- Factual Knowledge:
- Extract entities from changed files
- Update relationships involving these entities
- Remove old entities from these files
-
Add new entities
-
Procedural Knowledge:
- Re-analyze workflows starting from changed files
- Update affected workflows
-
Remove workflows that no longer exist
-
Conceptual Knowledge:
- Typically requires full re-analysis (architecture overview)
- Or: Only update if architectural changes detected
Phase 3: Knowledge Base Merge¶
Merge Strategy:
# Pseudo-code for merge logic
def merge_factual_knowledge(existing, new_extraction, changed_files):
# Remove entities from changed files
existing_entities = [
e for e in existing["entities"]
if e["grounding"]["file"] not in changed_files
]
# Add newly extracted entities
updated_entities = existing_entities + new_extraction["entities"]
# Update relationships
updated_relationships = update_relationships(
existing["entity_relationships"],
new_extraction["entity_relationships"],
changed_files
)
return {
"metadata": update_metadata(existing["metadata"]),
"entities": updated_entities,
"entity_relationships": updated_relationships,
"summary": recalculate_summary(updated_entities)
}
Conflict Resolution: - Entity name conflicts: Use file path + line number as unique identifier - Relationship updates: Remove old, add new - Metadata merging: Keep historical info, update current stats
Phase 4: Metadata Update¶
{
"last_update": "2026-01-05T15:30:00Z",
"extraction_method": "incremental",
"git_info": {
"commit_hash": "new_commit_hash"
},
"file_registry": {
// Update entries for changed files
"src/services/auth.py": {
"last_analyzed": "2026-01-05T15:30:00Z",
"hash": "new_hash",
"status": "analyzed"
}
},
"statistics": {
"files_changed_since_last_update": 3,
"entities_added": 2,
"entities_removed": 1,
"entities_updated": 1,
"workflows_updated": 1
}
}
Performance Comparison¶
Full Extraction¶
- Time: 2-5 minutes for medium project (10K-50K LOC)
- Analyzes: All files
- CPU Usage: High
- Use Case: Initial build, major refactoring
Incremental Update¶
- Time: 5-30 seconds for typical changes (1-10 files)
- Analyzes: Only changed files
- CPU Usage: Low
- Use Case: Regular development, minor changes
Speedup Example¶
Project: 100 files, 20K LOC
Change: Modified 3 files (~300 LOC)
Full extraction: 180 seconds
Incremental update: 15 seconds
Speedup: 12x faster
Usage Patterns¶
Continuous Development¶
# Day 1: Initial extraction
/build-kb
# Day 2: After changing 5 files
/build-kb # Auto-detects changes, incremental update (10 seconds)
# Day 3: After refactoring module
/build-kb # Incremental (15 seconds)
# Week 2: Major restructuring
/build-kb --full # Full rebuild (3 minutes)
CI/CD Integration¶
# .github/workflows/update-kb.yml
name: Update Knowledge Base
on:
push:
branches: [main]
jobs:
update-kb:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0 # Full history for git-based detection
- name: Update Knowledge Base
run: |
# Incremental update
/build-kb
# Commit changes if knowledge base updated
git add .fellow-data/semantic/
git commit -m "Update knowledge base [skip ci]"
git push
Team Workflow¶
# Team member A: Makes changes and updates KB
git pull
# ... make changes ...
/build-kb # Incremental update
git add .fellow-data/semantic/
git commit -m "feat: Add user service + update KB"
git push
# Team member B: Pulls changes
git pull
# KB is already up-to-date from team member A
# Can immediately use /fellow command with current context
Limitations¶
When Incremental Updates May Not Suffice¶
- Major Refactoring: Renaming entities, moving files
-
Solution: Force full rebuild with
--full -
Relationship Changes: When unchanged files now depend on changed files
-
Solution: Enhanced relationship tracking (future improvement)
-
Architectural Changes: Layer restructuring, pattern changes
-
Solution: Conceptual knowledge always does targeted re-analysis
-
Git History Lost: After rebasing, squashing commits
- Solution: Fall back to hash-based detection
Handling Edge Cases¶
Deleted Files:
# Detected by: git diff shows deletions
# Action: Remove entities with grounding to deleted files
# Update: Relationships involving deleted entities
Renamed Files:
# Detected by: git diff --find-renames
# Action: Update grounding locations
# Preserve: Entity knowledge (same content, new location)
Moved Code Within File:
# Detected by: File hash changed
# Action: Re-extract entire file
# Update: Line numbers in grounding
Future Enhancements¶
Smart Relationship Inference¶
Track entity usage across files:
{
"entity": "AuthService",
"defined_in": "src/services/auth.py",
"used_in": [
"src/api/routes.py",
"src/middleware/auth.py",
"src/services/user.py"
]
}
When AuthService changes, re-analyze usage files too.
Parallel Incremental Extraction¶
# Extract changed files in parallel
changed_files = detect_changes()
parallel_extract(changed_files, num_workers=4)
Watch Mode (Future)¶
/build-kb --watch
# Continuously monitors file changes
# Auto-updates knowledge base in background
# Useful for active development
Best Practices¶
- Commit KB to Git: Track knowledge base evolution with code
- Regular Full Rebuilds: Monthly or after major refactoring
- Use Git Tags: Tag commits after full extraction for easy reference
- Review Incremental Changes: Check what changed in KB after extraction
- CI/CD Integration: Auto-update KB on main branch commits
Troubleshooting¶
KB Out of Sync¶
# Symptoms: /fellow command returns outdated context
# Solution: Force full rebuild
/build-kb --full
Metadata Corrupted¶
# Symptoms: extraction_metadata.json invalid or missing
# Solution: Delete and rebuild
rm .fellow-data/semantic/extraction_metadata.json
/build-kb
Large Incremental Updates¶
# If incremental update takes > 1 minute
# Consider forcing full rebuild for consistency
/build-kb --full
Conclusion¶
Incremental updates make Fellow practical for active development by reducing update time from minutes to seconds. The knowledge base stays synchronized with your code without the overhead of full re-extraction.