Tracking progress on the SHM content archiving tools. Updated as work progresses.
🎉 Phase 1 Complete!
The web page archiver is working! Key features:
✅ HTML-to-SHM conversion with hierarchy inference
✅ Image upload to IPFS
✅ Dedicated archive keys (one per source)
✅ Archive profile creation with metadata
✅ Provenance tracking (source URL, timestamp, tool)
Live Demo Archives
Mozilla Blog Archive
Dedicated archive account with its own cryptographic identity:
• Mozilla Monitor Plus article - First archived article with images
IonBobcat Archive Demo
• GitHub Markdown Guide - 160 blocks, proper hierarchy
CLI Usage
# Create a dedicated archive identity
shm-archive create-archive web "example-site" \
--description "Archived articles from Example" \
--source-url "https://example.com"
# Archive a URL to that identity
shm-archive url "https://example.com/article" \
--key archive-web-example-site
# Or test parsing without publishing
shm-archive test-parse "https://example.com/article"
Technical Achievement: Hierarchy Inference
The hardest problem: HTML expresses section structure only through flat heading levels, while SHM requires an explicit tree. Solved by tracking heading levels with a stack algorithm.
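To make the stack idea concrete, here is a minimal sketch. It is not the code in src/html-to-shm.js; the function name inferHierarchy and the block shape ({ type, level, text }) are assumptions made up for illustration. It assumes the HTML has already been parsed into a flat list of blocks in document order, with headings carrying a level from 1 to 6.

```javascript
// Hypothetical sketch of heading-stack hierarchy inference.
// Assumed input: flat blocks in document order, e.g.
//   { type: "heading", level: 2, text: "Setup" }
//   { type: "paragraph", text: "Install it" }
function inferHierarchy(flatBlocks) {
  // Synthetic root at level 0, so every real heading (level >= 1) nests under it.
  const root = { level: 0, text: null, children: [] };
  // Stack of currently open headings, outermost first. The root is never popped.
  const stack = [root];

  for (const block of flatBlocks) {
    const node = { ...block, children: [] };

    if (block.type === "heading") {
      // Close every open heading at the same or a deeper level.
      while (stack[stack.length - 1].level >= block.level) {
        stack.pop();
      }
      // Attach under the nearest shallower heading, then open this one.
      stack[stack.length - 1].children.push(node);
      stack.push(node);
    } else {
      // Non-heading content nests under the nearest open heading.
      stack[stack.length - 1].children.push(node);
    }
  }
  return root.children;
}

// Example: two top-level sections, the first with a nested subsection.
const tree = inferHierarchy([
  { type: "heading", level: 1, text: "Guide" },
  { type: "paragraph", text: "Intro" },
  { type: "heading", level: 2, text: "Setup" },
  { type: "paragraph", text: "Install it" },
  { type: "heading", level: 1, text: "Appendix" },
]);
console.log(JSON.stringify(tree, null, 2));
```

In this sketch a skipped level (an h3 directly after an h1) simply nests under the nearest shallower heading, and content that appears before the first heading ends up at the top level. The real converter may handle those cases differently.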
Next: Phase 2 - More Sources
⬜ Wikipedia archiver (MediaWiki API)
⬜ Social media (Twitter, Mastodon)
⬜ Web UI for triggering archives
⬜ Batch archiving support
Source Code
Tool: ~/shm-web-archiver/
Key files:
• src/html-to-shm.js - HTML parsing and hierarchy inference
• src/image-uploader.js - IPFS image upload
• src/archive-key.js - Dedicated key management
• bin/archive.js - CLI interface