The Hidden Blog

As it turns out, I do have a hostname.

WIP: Added authentication, archiving and monitoring

Published on Aug 01, 2024

This is the second update in my "WIP" post series. Previous post:

Since last time, I've mostly worked on building the scheduling and archiving feature with Sidekiq and on finding a good strategy for rate limiting. The archive.org endpoint has very strict rate limits, and I'm trying to be a good platform citizen by minimizing requests to the Internet Archive.

To that end, I'll soon do some more work: checking the "Last-Modified" header of sites to skip archiving pages that haven't changed, adding a hard limit on how many entries you can have in your sitemap, and making it really clear that the use case is personal or very niche sites, not popular pages with millions of sitemap entries.
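The header check and entry limit described above could look roughly like the following. This is only a minimal sketch of the idea, not the code running in the app: the method names `skip_archival?` and `sitemap_within_limit?` and the `MAX_SITEMAP_ENTRIES` value are made up for illustration, and it assumes the timestamp of the last successful archive is stored somewhere.

```ruby
require "time"

# Hypothetical cap; the real limit has not been decided in this post.
MAX_SITEMAP_ENTRIES = 500

# Returns true when the page reports no change since the last archive.
# last_modified_header: raw value of the HTTP Last-Modified header, or nil.
# last_archived_at: Time of the previous successful archive, or nil.
def skip_archival?(last_modified_header, last_archived_at)
  # Without both values there is nothing to compare, so archive as usual.
  return false if last_modified_header.nil? || last_archived_at.nil?

  Time.httpdate(last_modified_header) <= last_archived_at
rescue ArgumentError
  # Unparseable header: archive anyway rather than skip a changed page.
  false
end

# Returns true when the sitemap stays within the entry limit.
def sitemap_within_limit?(urls, limit = MAX_SITEMAP_ENTRIES)
  urls.size <= limit
end
```

In a job, the header value would typically come from a HEAD request to the page, so the check itself costs one cheap request to the site and saves one request to the Internet Archive whenever the page is unchanged. Servers that send no "Last-Modified" header simply fall through to normal archiving.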

The interface to see the status of each sitemap looks like this now:

I also found a well-maintained gem for throttling jobs called sidekiq-throttled, which even hooks into the Sidekiq web interface by adding a "Throttled" tab.

Over the next few days I'll probably focus on the following areas:

  • Build "Edit Sitemap" modal

  • Rework "Add Sitemap" button behavior; right now it just triggers the onboarding flow again

  • Add "About" and other default pages

  • Add caching header check to skip archival

Let me know if you have any feedback or feature ideas.