Feedsearch provides a simple API for searching websites for RSS, Atom, and JSON feeds. The feed search function at Auctorial is powered by Feedsearch.

API Usage

Make a GET request to https://feedsearch.auctorial.com/api/v1/search, passing the URL you'd like to search as the "url" value in the query string:

curl -X GET "https://feedsearch.auctorial.com/api/v1/search?url=arstechnica.com"
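
The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names build_search_url and search_feeds are hypothetical, and it assumes the endpoint returns a JSON array as described under API Response.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://feedsearch.auctorial.com/api/v1/search"


def build_search_url(url: str) -> str:
    """Return the API request URL with `url` percent-encoded into the query string."""
    return API_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def search_feeds(url: str, timeout: float = 30.0) -> list:
    """Hypothetical helper: GET the search endpoint and decode the JSON body."""
    with urllib.request.urlopen(build_search_url(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, search_feeds("arstechnica.com") requests the same URL as the curl command and should return a list of dictionaries, one per feed.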

If the "url" value does not include a scheme (e.g. "https://"), the scheme defaults to "http://".

To avoid crawling each site on every request to this API, all results found for a given site (e.g. "example.com") are saved.

A request URL that does not contain a path (e.g. "http://example.com" or "example.com") will always return all saved feeds for that site.
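
The scheme default and the path/no-path distinction can be mirrored client-side, for example to predict whether a request will return all saved feeds for a site. This is a sketch under two assumptions: the helper names are hypothetical, and a URL whose only path is "/" is treated like one with no path, which the documentation does not state.

```python
import re
from urllib.parse import urlsplit

# Matches a URL scheme such as "http://" or "https://" at the start of the string.
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def apply_default_scheme(url: str) -> str:
    """Prefix "http://" when no scheme is given, like the API's default."""
    return url if _SCHEME_RE.match(url) else "http://" + url


def returns_all_saved_feeds(url: str) -> bool:
    """True if the URL has no path, so the API would return every saved
    feed for the site rather than only the results of a path crawl.
    Assumption: a bare "/" path is treated the same as no path."""
    path = urlsplit(apply_default_scheme(url)).path
    return path in ("", "/")
```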

URL Paths

If a URL that contains a path (e.g. "https://example.com/test") is requested from the API, the path will always be crawled and the results saved along with other feeds previously found on that site. However, only the results from that particular crawl are returned. This is done for the following reasons:

  • To reduce the chances of returning feeds that are irrelevant to the requested path on sites that may have many feeds.
  • To increase the chances of discovering feeds that may not be easily discoverable from the site's homepage.
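
The saving behaviour described in this section can be modelled as a store keyed by site: a path crawl adds its feeds to the site's saved set but returns only its own results, while a no-path request returns everything saved. This is a hypothetical in-memory model meant only to illustrate the documented behaviour, not Feedsearch's actual storage; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SiteFeedStore:
    """Hypothetical model of the documented per-site saving behaviour."""

    # site (e.g. "example.com") -> feed URLs found on that site so far
    saved: Dict[str, Set[str]] = field(default_factory=dict)

    def record_crawl(self, site: str, found: List[str]) -> List[str]:
        """Path request: merge this crawl's feeds into the site's saved set,
        but return only the feeds found by this crawl."""
        self.saved.setdefault(site, set()).update(found)
        return sorted(set(found))

    def all_feeds(self, site: str) -> List[str]:
        """No-path request: return every feed saved for the site."""
        return sorted(self.saved.get(site, set()))
```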

For sites that have only a few feeds, or that use well-formed feed paths and feed discovery, there should be little or no difference between the results of a path crawl and the full list of feeds saved for the site.

Query Parameters

The API accepts the following query parameters:

  • url: The URL to search. Required; the API returns 400 Bad Request if it is not set.
  • info: Return all feed metadata, as described under API Response. Defaults to true. If false, only the found feed URLs are returned, and all other values will be empty or default.
  • favicon: Return the favicon as a data URI. Defaults to true.
  • checkall: Attempt to search a range of possible feed paths at the URL. Defaults to false.
  • opml: Return the feeds as an OPML XML string. Defaults to false.

curl "https://feedsearch.auctorial.com/api/v1/search?url=arstechnica.com&info=true&favicon=true&checkall=false&opml=false"
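
A minimal sketch for building a request URL with the optional parameters using the Python standard library. The function name build_query is hypothetical; booleans are encoded as lowercase "true"/"false" to match the curl example, and the empty-url check only anticipates the 400 Bad Request the API itself would return.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://feedsearch.auctorial.com/api/v1/search"


def build_query(url: str, info: bool = True, favicon: bool = True,
                checkall: bool = False, opml: bool = False) -> str:
    """Build a search request URL with all optional parameters set explicitly."""
    if not url:
        # The API would answer 400 Bad Request; fail before sending the request.
        raise ValueError("url is required")
    params = {
        "url": url,
        "info": str(info).lower(),          # True -> "true", False -> "false"
        "favicon": str(favicon).lower(),
        "checkall": str(checkall).lower(),
        "opml": str(opml).lower(),
    }
    return API_ENDPOINT + "?" + urlencode(params)
```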

API Response

The API returns a JSON array of the feeds it found, each with attached metadata about the feed.

  • bozo: Set to 1 when the feed data is not well formed or may not be a feed. Defaults to 0.
  • content_length: Length of the feed in bytes.
  • content_type: Content-Type/Media-Type value of the returned feed.
  • description: Feed description.
  • favicon: URL of the feed or site favicon.
  • favicon_data_uri: Data URI of the favicon.
  • hubs: List of WebSub hubs for the feed, if available.
  • is_push: True if the feed contains valid WebSub data.
  • last_seen: Date that the feed was last seen by the crawler.
  • last_updated: Date of the latest entry in the feed, at the time the feed was last crawled.
  • score: Computed relevance of the feed URL to the requested search URL. May be safely ignored.
  • self_url: The rel="self" value returned from the feed's links. In some cases this may differ from the feed url.
  • site_name: Name of the feed's website.
  • site_url: URL of the feed's website.
  • title: Feed title.
  • url: URL of the feed.
  • velocity: Mean number of entries per day, calculated at the time the feed was fetched.
  • version: Detected feed type version (e.g. "rss20", "atom10", "https://jsonfeed.org/version/1").

[
  {
    "bozo": 0,
    "content_length": 82139,
    "content_type": "text/xml; charset=UTF-8",
    "description": "Serving the Technologist for more than a decade. IT news, reviews, and analysis.",
    "favicon": "https://cdn.arstechnica.net/favicon.ico",
    "favicon_data_uri": "data:image/png;base64,AAABAAMAIC...",
    "hubs": [
      "http://pubsubhubbub.appspot.com/"
    ],
    "is_push": true,
    "last_seen": "2019-07-05T19:00:00+00:00",
    "last_updated": "2019-07-05T16:00:30+00:00",
    "score": 27,
    "self_url": "http://feeds.arstechnica.com/arstechnica/index",
    "site_name": "Ars Technica",
    "site_url": "https://arstechnica.com/",
    "title": "Ars Technica",
    "url": "http://feeds.arstechnica.com/arstechnica/index",
    "velocity": 7.827,
    "version": "rss20"
  }
]
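
A sketch of consuming this response in Python: it decodes the JSON array, keeps a few of the documented fields, optionally skips feeds flagged with bozo, and decodes favicon_data_uri back into raw bytes. The Feed class and both function names are hypothetical, and the code tolerates missing or null fields, which is an assumption about how empty values are encoded when info is false.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feed:
    """Subset of the documented response fields."""
    url: str
    title: str
    version: str
    score: int
    velocity: float


def parse_feeds(body: str, skip_bozo: bool = True) -> List[Feed]:
    """Decode the JSON array into Feed objects, optionally skipping bozo feeds."""
    feeds = []
    for item in json.loads(body):
        if skip_bozo and item.get("bozo"):
            continue  # not well formed, or possibly not a feed at all
        feeds.append(Feed(
            url=item["url"],
            title=item.get("title") or "",
            version=item.get("version") or "",
            score=item.get("score") or 0,
            velocity=float(item.get("velocity") or 0.0),
        ))
    return feeds


def decode_favicon(data_uri: Optional[str]) -> Optional[bytes]:
    """Return raw favicon bytes from a base64 data URI, or None if unusable."""
    if not data_uri or not data_uri.startswith("data:") or ";base64," not in data_uri:
        return None
    return base64.b64decode(data_uri.split(";base64,", 1)[1])
```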

Documentation

Feedsearch is written as a Python library and is available as a Python package on PyPI.

Further documentation and source code can be found at the Feedsearch-Crawler GitHub repository.