How I Saved 17 Years of Blog Posts from a Dead Hosting Account
I started blogging in 2009. Back then the blog ran on WordPress on shared hosting, and I wrote in Romanian about PR, advertising events, and whatever was happening in the Bucharest social media scene. Over the years the blog evolved: I switched to English, moved to London, San Francisco, Austin, and Denver, and the blog followed me through all of it. Marketing, startups, crypto, identity, AI, politics, food.
At some point my hosting disappeared. I don’t remember exactly when or why — either I forgot to renew, or the provider shut down, or some combination of both. The domain still pointed somewhere but the content was gone. No backup. Just gone.
Or so I thought.
The Archivarix Rescue
A few months after I discovered the blog was offline and realised there was no backup, I found Archivarix — a CMS that creates a complete offline archive of a website. It crawls every URL it can reach — the HTML, images, CSS, JavaScript — and saves everything into a local SQLite database paired with a folder of hashed files. Think of it as a personal Wayback Machine, but one that runs on your own server.
I pointed it at the Wayback Machine’s cached copies of my domain, let it run, and ended up with a compressed .tar of the archive sitting in /opt/homebrew/var/www/tituscapilnean_ro doing nothing useful — until now.
Inside were 1,364 HTML files, each a fully rendered WordPress page, plus 1,247 images. Everything was cross-referenced in a structure.db SQLite file that mapped URLs to filenames.
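Since the files themselves have hashed names, that URL-to-filename mapping is the key to everything else. The sketch below shows one way to read it. It is an illustration, not the actual migrate.py: it assumes a table named `structure` with `url` and `filename` columns, which is a guess at the layout, so check the real schema (for example with `.schema` in the `sqlite3` shell) first. `url_to_file_map` is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path


def url_to_file_map(conn: sqlite3.Connection, html_dir: str) -> dict[str, Path]:
    """Map every archived URL to the file that holds its content.

    ASSUMPTION: the archive's SQLite file has a table `structure` with
    `url` and `filename` columns. Verify the real schema before using this.
    """
    rows = conn.execute("SELECT url, filename FROM structure").fetchall()
    base = Path(html_dir)
    # Content is stored under hashed filenames, so this lookup is how you get
    # from a URL back to its saved HTML or image file.
    return {url: base / filename for url, filename in rows}
```

Against the real archive you would open it with `sqlite3.connect("path/to/structure.db")` and pass the archive's `html/` directory. Taking a connection rather than a path also makes the helper easy to try against an in-memory database.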
The New Setup
Meanwhile, I had been slowly rebuilding the blog on GitHub Pages using Jekyll with the Chirpy theme. I had manually migrated about 100 articles — mostly the ones I remembered writing and considered worth keeping.
But there were 17 years of posts sitting in that Archivarix archive. Around 1,300 unique articles. Most of them from the early years, in Romanian, covering things I barely remembered writing.
The Migration
Rather than migrating by hand (which would have taken weeks), I wrote a Python migration script with Claude Code. The workflow:
- Query the SQLite DB for all clean article URLs matching the pattern `YYYY/MM/slug`
- Parse the WordPress HTML with BeautifulSoup to extract title, date, categories, tags, and the `.entry-content` div (with a fallback to `div.entry` for older theme versions)
- Copy images from the Archivarix `html/` directory to `assets/img/posts/YEAR/SLUG/`, rewriting `src` paths in the content
- Flag broken images and embeds as YAML comments at the top of each post, so I can find and fix them later
- Convert HTML to Markdown with html2text
- Write Jekyll-format `.md` files with proper front matter
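The parsing and image steps in that workflow can be sketched with BeautifulSoup roughly like this. It is a simplified illustration rather than the real migrate.py. `extract_post` and its parameters are hypothetical. The title, category, and tag selectors are assumptions about typical WordPress theme markup (category links usually carry `rel="category tag"`, tag links `rel="tag"`). `archived_images` stands in for whatever lookup tells you which image files the archive actually captured. The html2text conversion and file writing are left out.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlparse

from bs4 import BeautifulSoup


def extract_post(html: str, year: str, slug: str, archived_images: set[str]) -> dict | None:
    """Pull title, categories, tags and body out of one archived WordPress page.

    ASSUMED markup (varies by theme): the title sits in `.entry-title` (else
    <title>), category links carry rel="category tag", tag links rel="tag".
    """
    soup = BeautifulSoup(html, "html.parser")

    # Newer theme: div.entry-content. The 2009-2010 theme used div.entry instead.
    body = soup.select_one("div.entry-content")
    if body is None:
        body = soup.select_one("div.entry")
    if body is None:
        return None  # no recognisable content div (e.g. a password form page)
    has_text = bool(body.get_text(strip=True))
    has_image = body.find("img") is not None
    if not (has_text or has_image):
        return None  # content div exists but is empty

    title_tag = soup.select_one(".entry-title")
    if title_tag is None:
        title_tag = soup.title
    title = title_tag.get_text(strip=True) if title_tag is not None else slug

    categories, tags = [], []
    for link in soup.find_all("a", rel=True):
        rel = link.get("rel")  # BeautifulSoup splits rel into a list of tokens
        if "category" in rel:
            categories.append(link.get_text(strip=True))
        elif rel == ["tag"]:
            tags.append(link.get_text(strip=True))

    # Point images at the Jekyll assets folder; record the ones the archive lacks.
    broken_images = []
    for img in body.find_all("img"):
        src = img.get("src", "")
        name = PurePosixPath(urlparse(src).path).name
        if name and name in archived_images:
            img["src"] = f"/assets/img/posts/{year}/{slug}/{name}"
        else:
            broken_images.append(src)

    return {
        "title": title,
        "categories": categories,
        "tags": tags,
        "body_html": str(body),
        "broken_images": broken_images,
    }
```

Images whose files are missing keep their old URLs and come back in `broken_images`, which is the list you would turn into the flag comments.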
The trickiest parts:
- URL encoding: Romanian has diacritics (ș, ț, ă, â, î) that WordPress URL-encodes as `%c8%99`, `%c8%9b`, and so on. These had to be decoded before being used as filenames.
- Two theme generations: My early 2009–2010 posts used an older WordPress theme that wrapped content in a `div.entry` instead of the HTML5 `<article>` tag. The script needed to handle both.
- Attachment sub-pages: WordPress creates a separate attachment page for each image, at URLs like `/2009/09/post-slug/image-filename/`. These slipped through my initial filter, so I fixed the SQL query to include only three-segment paths (year/month/slug).
- Duplicates: Some articles appeared twice in the DB, once with a trailing slash and once without. Deduplicating by effective slug solved this.
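The URL cleanup described in those points (three-segment filter, diacritic decoding, trailing-slash deduplication) might look like the following. This sketch filters with a Python regex rather than in SQL as the original script did, and keys deduplication on year, month, and decoded slug. `article_urls` and `ARTICLE_PATH` are hypothetical names.

```python
import re
from collections.abc import Iterable
from urllib.parse import unquote, urlparse

# Exactly /YYYY/MM/slug with an optional trailing slash. Attachment pages such as
# /2009/09/post-slug/image-filename/ have a fourth segment and do not match.
ARTICLE_PATH = re.compile(r"^/(\d{4})/(\d{2})/([^/]+)/?$")


def article_urls(urls: Iterable[str]) -> dict[tuple[str, str, str], str]:
    """Return {(year, month, decoded_slug): first_url_seen} for article URLs only."""
    found: dict[tuple[str, str, str], str] = {}
    for url in urls:
        match = ARTICLE_PATH.match(urlparse(url).path)
        if not match:
            continue  # category, tag, page, attachment or feed URL
        year, month, raw_slug = match.groups()
        # %c8%99 -> ș, %c8%9b -> ț: decode before using the slug as a filename.
        slug = unquote(raw_slug)
        # Variants with and without a trailing slash collapse to the same key.
        found.setdefault((year, month, slug), url)
    return found
```

`urllib.parse.unquote` decodes percent-escapes as UTF-8 by default, so `%c8%99` becomes `ș` and the decoded slug can be used directly as a filename.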
The final result: 1,401 unique articles migrated, 716 of which have flagged issues (mostly broken images whose files weren’t captured in the archive, or old YouTube/Vimeo embeds that no longer resolve).
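Flagging problems as YAML comments works because YAML parsers ignore `#` comment lines, so Jekyll still builds the post while the note stays searchable. A minimal sketch, assuming a hypothetical `MIGRATION-ISSUE:` marker and hypothetical helper names `front_matter` and `is_flagged`. Values are written with `json.dumps`, whose double-quoted strings and bracketed lists YAML also accepts, so titles containing colons do not break parsing.

```python
import json

ISSUE_MARKER = "# MIGRATION-ISSUE:"  # hypothetical marker; any unique string works


def front_matter(title: str, date: str, categories: list[str],
                 tags: list[str], issues: list[str]) -> str:
    """Build Jekyll front matter with one YAML comment line per migration issue."""
    lines = ["---"]
    for issue in issues:
        # Collapse whitespace so each note stays a single-line YAML comment.
        lines.append(f"{ISSUE_MARKER} {' '.join(issue.split())}")
    # json.dumps quotes values in a form YAML parses as well.
    lines.append(f"title: {json.dumps(title, ensure_ascii=False)}")
    lines.append(f"date: {date}")
    lines.append(f"categories: {json.dumps(categories, ensure_ascii=False)}")
    lines.append(f"tags: {json.dumps(tags, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def is_flagged(post_text: str) -> bool:
    """True if the post's front matter still contains a migration-issue comment."""
    if not post_text.startswith("---\n"):
        return False
    end = post_text.find("\n---", 4)
    header = post_text[4:end] if end != -1 else ""
    return ISSUE_MARKER in header
```

Running `is_flagged` over every file in `_posts/` gives the to-do list of posts that still need a fix; deleting the comment line once a post is repaired takes it off the list.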
What Was Lost
Not everything survived. Some articles were saved in a password-protected state: the Archivarix archive captured the password form, not the content. A handful of very early posts returned empty `entry-content` divs, likely due to caching or server errors at crawl time.
The broken images and embeds are things like old screenshots of Romanian ad campaigns from 2009, event photos, and embedded YouTube videos of things that have probably been taken down. The text is intact, which is what matters most.
The Weird Part
Reading through articles I wrote 15 years ago is strange. The 2009–2012 posts are in Romanian and read like a very enthusiastic student discovering that the internet was changing media and advertising. I was right about some things (mobile, social ads, email marketing) and spectacularly wrong about others. There are event recaps for conferences that no longer exist, companies that went bankrupt, campaigns that won awards nobody remembers.
But it’s all there. The through-line from “what is Twitter?” in 2009 to AI agents in 2026 is now visible. The blog didn’t just survive — it became a complete record.
The Tools
If you ever need to do something similar:
- Archivarix — offline website archiver, creates a browsable CMS from static files
- BeautifulSoup — Python HTML parsing
- html2text — HTML to Markdown conversion
- Jekyll Chirpy theme — clean, fast static blog
- Claude Code — the AI coding assistant that helped write the migration script, debug the edge cases, and write this article
The migration script lives in the repo at migrate.py if you want to see how it works or adapt it for your own WordPress rescue operation.
716 posts still have broken images or embed issues. I’ll fix them gradually. If you’re reading a post with a broken image — now you know why.