HackerNews Profile Archivist Agent
In an effort to archive and sanitize my HackerNews profile, which contained over 15 years of saved articles, I built a multi-step agentic pipeline to a) archive my entire profile history and b) scrub it entirely. HackerNews does offer a public read-only API, but it doesn't cover the saved-stories list, which sits behind a login, so this had to be done the old-fashioned way: with a bit of AI.

The first step was a parser for the HTML of each saved-stories page that identified the entries tied to my account. On top of it I built an agentic archiver that tracked the last successfully archived item, allowing seamless recovery from failures, and that chunked and merged saved items in batches to avoid excessive URL visits. To verify data integrity, the archiver also re-checked a random sample of article IDs after each run.

Once the archival pass was complete, a second agent removed each saved item by mimicking browser interactions, which meant automating a login with my credentials. During testing the agents were eventually blocked, so I integrated VPN switching along with randomized delays and actions to simulate human behavior, ensuring I could access and remove my own data without triggering anti-bot defenses by mistake.

Finally, a conversion pipeline transformed the archived JSON into CSV, and then into a SQLite3 database for local querying and exploration. In the end, I successfully archived and removed thousands of links, leaving me with a clean HackerNews profile and a tidy, fully searchable local SQLite3 archive of my saved reading history.

Rough sketches of each stage follow.
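First, the page parser. A minimal sketch, assuming HN's current markup: each story row is a `<tr class="athing">` whose `id` attribute is the item ID, the outbound link lives inside `span.titleline`, and pagination happens via the `More` link. The selectors are assumptions about the live site rather than excerpts from the final scripts.

```python
from bs4 import BeautifulSoup

BASE = "https://news.ycombinator.com"

def parse_saved_page(html: str) -> list[dict]:
    """Extract id/title/url for every story on one saved-stories page.

    Assumes each story is a <tr class="athing"> whose id attribute is
    the HN item ID, with the outbound link in <span class="titleline">.
    """
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for row in soup.select("tr.athing"):
        link = row.select_one("span.titleline a")
        if link:  # skip rows that are not story entries
            items.append({"id": int(row["id"]),
                          "title": link.get_text(strip=True),
                          "url": link["href"]})
    return items

def next_page_url(html: str) -> str | None:
    """Return the absolute URL of the 'More' link, or None on the last page."""
    more = BeautifulSoup(html, "html.parser").select_one("a.morelink")
    return f"{BASE}/{more['href']}" if more else None
```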
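Recovery and batching in the archiver reduced to a checkpoint file advanced after each successfully written batch, plus a merge step that deduplicates by item ID. A sketch assuming batches land in a JSON Lines file; the filenames are illustrative.

```python
import json
import pathlib

ARCHIVE = pathlib.Path("saved_stories.jsonl")  # illustrative filename
CHECKPOINT = pathlib.Path("checkpoint.json")   # illustrative filename

def next_page_to_fetch() -> int:
    # Resume point: the first page not yet safely written to the archive.
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())["next_page"]
    return 1

def archive_batch(items: list[dict], page: int) -> None:
    # Append the batch first, then advance the checkpoint, so a crash
    # between the two writes re-archives at most one page.
    with ARCHIVE.open("a", encoding="utf-8") as fh:
        for item in items:
            fh.write(json.dumps(item) + "\n")
    CHECKPOINT.write_text(json.dumps({"next_page": page + 1}))

def merge_archive() -> list[dict]:
    # Merge all batches into one list, deduplicating by item ID, since
    # a recovered run may have re-archived the checkpointed page.
    seen, merged = set(), []
    for line in ARCHIVE.read_text(encoding="utf-8").splitlines():
        item = json.loads(line)
        if item["id"] not in seen:
            seen.add(item["id"])
            merged.append(item)
    return merged
```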
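For the randomized integrity checks, the public item API is handy even though the saved list itself isn't exposed: a random sample of archived IDs can be re-fetched and compared against the archive. A sketch; the sample size and the title-only comparison are illustrative choices.

```python
import random
import requests

ITEM_API = "https://hacker-news.firebaseio.com/v0/item/{}.json"

def spot_check(archived: list[dict], sample_size: int = 25) -> list[int]:
    """Re-fetch a random sample of archived items and flag mismatches.

    Returns the IDs whose live title no longer matches the archive,
    marking those batches for re-archiving.
    """
    sample = random.sample(archived, min(sample_size, len(archived)))
    mismatched = []
    for item in sample:
        live = requests.get(ITEM_API.format(item["id"]), timeout=10).json()
        if not live or live.get("title") != item["title"]:
            mismatched.append(item["id"])
    return mismatched
```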
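The removal agent logged in and then walked the saved pages, following each item's removal link. A sketch using plain HTTP, assuming HN's login form posts `acct` and `pw` to `/login` and that each saved item exposes an `unvote` link carrying a per-item auth token, which is why the links must be scraped from the logged-in page rather than constructed by hand. The VPN switching isn't shown; one way to drive it is to shell out to the VPN client between batches.

```python
import random
import time
import requests
from bs4 import BeautifulSoup

BASE = "https://news.ycombinator.com"

def login(username: str, password: str) -> requests.Session:
    # HN's login form posts the 'acct' and 'pw' fields to /login;
    # the session cookie it sets authenticates all later requests.
    session = requests.Session()
    session.post(f"{BASE}/login",
                 data={"acct": username, "pw": password, "goto": "news"})
    return session

def remove_saved_on_page(session: requests.Session, page_url: str) -> int:
    # Follow every 'unvote' link on one saved-stories page. The href
    # embeds a per-item auth token, so it is scraped, never built.
    soup = BeautifulSoup(session.get(page_url).text, "html.parser")
    removed = 0
    for link in soup.find_all("a", string="unvote"):
        session.get(f"{BASE}/{link['href']}")
        removed += 1
        # Randomized pause between actions to look less like a bot.
        time.sleep(random.uniform(2.0, 8.0))
    return removed
```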
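The conversion pipeline is the most conventional part: flatten the JSON Lines archive to CSV, then load the CSV into SQLite3. A sketch assuming the three fields captured above; the column and table names are illustrative.

```python
import csv
import json
import sqlite3

FIELDS = ["id", "title", "url"]

def jsonl_to_csv(jsonl_path: str, csv_path: str) -> None:
    # Flatten the archived JSON Lines into a CSV with a fixed header.
    with open(jsonl_path, encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for line in src:
            writer.writerow(json.loads(line))

def csv_to_sqlite(csv_path: str, db_path: str) -> None:
    # Load the CSV into a queryable table; INSERT OR REPLACE makes
    # the load idempotent across re-runs.
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS saved (
                        id    INTEGER PRIMARY KEY,
                        title TEXT,
                        url   TEXT)""")
    with open(csv_path, newline="", encoding="utf-8") as fh:
        conn.executemany(
            "INSERT OR REPLACE INTO saved VALUES (:id, :title, :url)",
            csv.DictReader(fh))
    conn.commit()
    conn.close()
```

With the database built, exploration is a one-liner, e.g. `sqlite3 saved.db "SELECT url FROM saved WHERE title LIKE '%database%';"`.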