Sometimes, website authors change the URL of a feed without adding a redirect. This causes a problem for people subscribed to the feed: without the redirect, the software used to subscribe to the feed will have an out-of-date URL. After seeing Artemis, the calm web reader that I run, try to retrieve several dozen feeds that now return 404s, I thought to myself: is there a way to recover them?
I had a few ideas. I could go to the home page of the feed website, then perform RSS auto-discovery. This involves searching for HTTP headers or HTML link tags that indicate a feed is available for a page. If a feed is found, that feed could be used as the new feed. This, however, only works if the home page adds rel=alternate tags. Some blogs I analysed only had <a> links to RSS feeds with anchors like “rss” or “feed” or “rss feed”. Thus, I added a case to search for such links and consider them as candidate feeds.
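The two discovery steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the code Artemis runs: the names FeedLinkParser and discover_feeds, the list of feed MIME types, and the exact anchor texts matched are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of MIME types that mark a <link rel="alternate"> as a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/feed+json"}
# Anchor texts treated as hints that an <a> tag points to a feed.
ANCHOR_HINTS = {"rss", "feed", "rss feed"}


class FeedLinkParser(HTMLParser):
    """Collect candidate feed URLs from <link rel="alternate"> tags and <a> anchors."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that <a> tag

    def _add(self, href):
        # Resolve relative URLs against the page URL and skip duplicates.
        url = urljoin(self.base_url, href)
        if url not in self.candidates:
            self.candidates.append(url)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if tag == "link" and href:
            rels = (attrs.get("rel") or "").lower().split()
            mime = (attrs.get("type") or "").lower()
            if "alternate" in rels and mime in FEED_TYPES:
                self._add(href)
        elif tag == "a" and href:
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace and case before comparing the anchor text.
            text = " ".join("".join(self._text).lower().split())
            if text in ANCHOR_HINTS:
                self._add(self._href)
            self._href = None


def discover_feeds(html, base_url):
    """Return candidate feed URLs found in the HTML of a home page."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Calling discover_feeds with the HTML of a home page and its URL returns a list of absolute candidate URLs, with rel=alternate links and matching anchors in the order they appear in the page. Each candidate would still need to be fetched and checked before being shown to a user.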
I tried this logic out on ~150 feeds users are subscribed to in Artemis that return 404s. I was able to reconcile ~60% using the logic above. With such a success rate, I then thought: how do I make this available to users?
This process cannot be fully automated due to the ambiguity and potential for error in reconciling feeds. For example, an author may offer two feeds: one for blog posts and another for bookmarks. To reconcile this, it is best to ask the user which feed meets their preferences. In addition, not all feeds are available from home pages: for news sites or YouTube, for example, you cannot retrieve a feed by going to the home page.
For the reasons above, I decided that feed recovery would not be a feature that runs exclusively in the background. Instead, users would be able to run a wizard when a feed breaks and choose from all the feeds that are discovered.
There were three discrete areas of development:
- Keep track of 404s when feeds are polled.
- Indicate to users when a feed has returned 404s over the last three days or more.
- Create a recovery tool that uses the aforementioned logic to try to recover a feed.
First, I added a system that logs all 404 requests. If a feed returns a 404 three or more days in a row, a visual indicator is added to a user’s Authors page. This page lists all feeds to which the user is subscribed. The visual indicator turns the background colour of the record for the blog from its default to a red colour. A [Fix] button also appears.
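The "three or more days in a row" rule can be expressed as a small check over the days on which a 404 was logged. The sketch below assumes the log can be reduced to a collection of dates per feed; the function name is_broken and its parameters are hypothetical and do not reflect how Artemis stores its error log.

```python
from datetime import date, timedelta


def is_broken(error_days, today, threshold=3):
    """Return True if a 404 was logged on each of the last `threshold` days, ending today."""
    days = set(error_days)
    return all(today - timedelta(days=i) in days for i in range(threshold))


# Three consecutive days of 404s: the feed should be flagged.
print(is_broken([date(2025, 1, 1), date(2025, 1, 2), date(2025, 1, 3)], date(2025, 1, 3)))

# A gap on 2 January breaks the streak, so the feed is not flagged.
print(is_broken([date(2025, 1, 1), date(2025, 1, 3)], date(2025, 1, 3)))
```

Requiring a consecutive streak, rather than any three errors, avoids flagging a feed whose server was only briefly unavailable.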
A user can see a list of errors by clicking “Edit” and scrolling to the Retrieval Errors section:
When clicked, the [Fix] button runs the aforementioned logic to try to find feeds. If feeds are found, they are listed on the page:
A user can then click on a feed to preview it:
This preview is essential to give a user confidence that the feed that was found contains the content they are expecting.
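One way to build such a preview is to pull the most recent entry titles out of the candidate feed. The sketch below is an assumption about how this could be done with the Python standard library, not a description of the Artemis preview; preview_titles is a hypothetical name, and the code only handles Atom and RSS 2.0 documents that have already been downloaded as text.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def preview_titles(feed_xml, limit=5):
    """Return up to `limit` entry titles from an Atom or RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    if root.tag == ATOM + "feed":
        # Atom: entries are <entry> children of the root <feed> element.
        titles = [e.findtext(ATOM + "title", default="") for e in root.findall(ATOM + "entry")]
    else:
        # Otherwise assume RSS 2.0, where posts are <item> elements inside <channel>.
        titles = [i.findtext("title", default="") for i in root.iter("item")]
    return [t.strip() for t in titles[:limit]]


rss = (
    '<rss version="2.0"><channel><title>Blog</title>'
    "<item><title>First post</title></item>"
    "<item><title>Second post</title></item>"
    "</channel></rss>"
)
print(preview_titles(rss))
```

Showing titles lets a user tell a blog feed apart from, say, a bookmarks feed at a glance. Feeds from untrusted sites can be malformed, so a real implementation would need error handling around the parsing step.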
If the preview meets the user’s expectations, they can click “Accept” to change the feed URL; otherwise, the user can go back to see the other feeds that are available. If a user accepts the new URL, the warning that indicates the feed is broken goes away:
In the worst-case scenario, this feature informs a user that a feed is broken but cannot identify an appropriate new feed. This is acceptable because knowing that a feed is broken lets the user look for a replacement themselves. In the best-case scenario, Artemis can help someone find a replacement for a broken feed without having to manually look for a new feed.
This feature is now available for all users in Artemis. For this feature to work on any author to whom you are subscribed, you will need to be subscribed to a URL for at least three days. This is because errors are only surfaced if a feed was unretrievable for three or more days. If you have any questions, feel free to send me an email at [email protected].
This feature is an example of a forgiving interface. The application knows that feeds break, and offers the user both insight into when a feed is broken and a means by which the user can find a new feed.
A variation of the feed recovery code is open source on GitHub.