Feedsearch provides a simple API for searching websites for RSS, Atom, and JSON feeds. It's the library that runs the feed search function at Auctorial.

API Usage

Make a GET request to https://feedsearch.auctorial.com/search with a "url" value in the querystring containing the URL you'd like to search:

curl -X GET "https://feedsearch.auctorial.com/search?url=arstechnica.com"

If the scheme (e.g. "https://") is not provided in the "url" value then the scheme will default to "http://".
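
The scheme default described above can be sketched on the client side as follows. This is only an illustration of the documented behaviour, not the service's own code: the helper name `with_default_scheme` is hypothetical, and the check for "://" is an assumption about how a missing scheme is detected.

```python
def with_default_scheme(url: str) -> str:
    """Return the URL unchanged if it has a scheme, else prefix "http://".

    Hypothetical helper: mirrors the documented default, and assumes that
    a missing scheme can be detected by the absence of "://".
    """
    if "://" in url:
        return url
    return "http://" + url


print(with_default_scheme("arstechnica.com"))
print(with_default_scheme("https://arstechnica.com"))
```

Passing an explicit "https://" scheme in the "url" value avoids relying on the default.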

To avoid crawling a site every time a request is made to this API, all results for a given site (e.g. "example.com") are saved.

A request URL that does not contain a path (e.g. "http://example.com" or "example.com") will always return all saved feeds for that site.

If a URL that contains a path (e.g. "https://example.com/test") is requested from the API, the path will always be crawled and the results saved along with other feeds previously found for that site. However, only the results from that particular crawl are returned. This is done for the following reasons:

  • To reduce the chances of returning feeds that are irrelevant to the requested path on sites that may have many feeds.
  • To increase the chances of discovering feeds that may not be easily discoverable from the site's homepage.

For sites that have only a few feeds, or that have well-formed feed paths and feed discovery, the difference between the results returned from a crawl of the path and the list of all feeds saved for the site should be nil or negligible.
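
To anticipate which kind of result a request will produce, a client can check whether the URL carries a path. The sketch below is a rough illustration using Python's standard library. The function names are hypothetical, and treating a bare trailing "/" as "no path" is an assumption, since the text above does not say how the service handles that case.

```python
from urllib.parse import urlsplit


def has_path(url: str) -> bool:
    """Report whether the URL contains a path beyond the site root.

    Assumption: a lone "/" counts as no path.
    """
    if "://" not in url:
        # Apply the documented default scheme so urlsplit sees a host.
        url = "http://" + url
    return urlsplit(url).path not in ("", "/")


def expected_result_scope(url: str) -> str:
    """Describe the result set the documentation leads us to expect."""
    if has_path(url):
        return "feeds found by crawling this path"
    return "all saved feeds for the site"


for candidate in ("example.com", "http://example.com", "https://example.com/test"):
    print(candidate, "->", expected_result_scope(candidate))
```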

The API returns a list of found feeds in JSON format, with metadata attached to each feed.

[
  {
    "bozo": 0,
    "content_length": 82139,
    "content_type": "text/xml; charset=UTF-8",
    "description": "Serving the Technologist for more than a decade. IT news, reviews, and analysis.",
    "favicon": "https://cdn.arstechnica.net/favicon.ico",
    "favicon_data_uri": "",
    "hubs": [
      "http://pubsubhubbub.appspot.com/"
    ],
    "is_push": true,
    "last_updated": "2019-07-05T16:00:30+00:00",
    "score": 27,
    "self_url": "http://feeds.arstechnica.com/arstechnica/index",
    "site_name": "Ars Technica",
    "site_url": "https://arstechnica.com/",
    "title": "Ars Technica",
    "url": "http://feeds.arstechnica.com/arstechnica/index",
    "version": "rss20"
  }
]
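
A response of this shape can be read with Python's standard `json` module. The following is a minimal sketch: the function name `summarize_feeds` is hypothetical, and the sample payload is a shortened copy of the response above, keeping only a few of its fields.

```python
import json

# Shortened copy of the example response above.
SAMPLE_RESPONSE = """
[
  {
    "is_push": true,
    "score": 27,
    "title": "Ars Technica",
    "url": "http://feeds.arstechnica.com/arstechnica/index",
    "version": "rss20"
  }
]
"""


def summarize_feeds(payload: str) -> list:
    """Return (title, url, version) tuples for each feed in the payload."""
    feeds = json.loads(payload)
    # .get() is used because values may be empty or default (e.g. info=false).
    return [(feed.get("title"), feed.get("url"), feed.get("version")) for feed in feeds]


for title, url, version in summarize_feeds(SAMPLE_RESPONSE):
    print(f"{title}: {url} ({version})")
```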

The API accepts the following query parameters:

  • url: The URL to search. Will return 400 Bad Request if not set.
  • info: Returns all feed metadata as above. Defaults to True. If False, only found URLs are returned, and all other values will be empty or default.
  • favicon: Returns the favicon as a data URI. Defaults to True.
  • checkall: Attempts to search a range of possible feed paths at the URL. Defaults to False.
  • opml: Returns the feeds as an OPML XML string. Defaults to False.

curl "https://feedsearch.auctorial.com/search?url=arstechnica.com&info=true&favicon=true&checkall=false&opml=false"
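
The same request can be assembled in Python with only the standard library. This is a sketch rather than an official client: the function names `build_search_url` and `search` are hypothetical, and sending booleans as lowercase "true"/"false" follows the curl example above. The `search` helper assumes a JSON response, so it does not expose the opml option.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://feedsearch.auctorial.com/search"


def build_search_url(url, info=True, favicon=True, checkall=False, opml=False):
    """Build the request URL, encoding booleans as lowercase strings."""
    params = {
        "url": url,
        "info": str(info).lower(),
        "favicon": str(favicon).lower(),
        "checkall": str(checkall).lower(),
        "opml": str(opml).lower(),
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def search(url, info=True, favicon=True, checkall=False):
    """Call the API and decode the JSON list of feeds (performs a network request)."""
    request_url = build_search_url(url, info=info, favicon=favicon, checkall=checkall)
    with urlopen(request_url, timeout=30) as response:
        return json.load(response)


print(build_search_url("arstechnica.com"))
```

Calling `search("arstechnica.com")` would then return the decoded list of feeds, provided the service is reachable.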

Documentation

Feedsearch is written as a Python library, and is available as a Python package on PyPI.

Further documentation and source code can be found at the Feedsearch-Crawler GitHub repository.