Articles are the raw material ContentGems works with. It finds freely available Articles online, analyzes them, recommends the most relevant ones to you, and gives you tools to curate and share them.

Technical details

An Article is a web page with text content that was included in a Feed and subsequently analyzed and indexed by ContentGems, so that it can be found based on your Filter's query terms and filter settings.

An Article is stored in the ContentGems Articles Index. The following fields are available for filtering in your Filters (see the separate document about Filters for how to do this):

  • Title
  • First paragraph
  • Body text
  • Classifications (e-commerce, job postings, sexually explicit)
  • Character count (body text)
  • Word count (body text)
  • Word count (title)
  • Feeds containing this Article
  • Web Domain
  • Web Domain suffixes
  • Found at date
  • Images
  • Popularity
  • Hashtags (if shared on Twitter)
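To make the field list above more concrete, here is a minimal sketch of how an Article record could be modeled. This is not ContentGems' actual schema: the class name, the attribute names, and the choice to compute the counts from the text are all assumptions made for illustration, and only a subset of the fields is shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Article:
    # Hypothetical attribute names that mirror the documented fields.
    title: str
    first_paragraph: str
    body_text: str
    web_domain: str
    found_at: datetime
    classifications: set[str] = field(default_factory=set)  # e.g. {"job postings"}
    feeds: set[str] = field(default_factory=set)            # Feeds containing this Article
    hashtags: set[str] = field(default_factory=set)         # only if shared on Twitter
    popularity: float = 0.0

    @property
    def body_char_count(self) -> int:
        """Character count (body text)."""
        return len(self.body_text)

    @property
    def body_word_count(self) -> int:
        """Word count (body text), splitting on whitespace."""
        return len(self.body_text.split())

    @property
    def title_word_count(self) -> int:
        """Word count (title), splitting on whitespace."""
        return len(self.title.split())


example = Article(
    title="Hello big world",
    first_paragraph="Intro.",
    body_text="one two three four",
    web_domain="example.com",
    found_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
```

Deriving the counts from the text keeps them consistent with the body and title, which is one reasonable design choice. A real index would more likely store them as precomputed values.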

ContentGems also creates a content-based fingerprint for each Article. It uses this fingerprint to detect Articles with very similar content and to deduplicate them before it recommends Articles to you.
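The exact fingerprinting method ContentGems uses is not described here. As an illustration of the general idea, the sketch below uses a common stand-in technique: word shingles compared with Jaccard similarity. The function names, the shingle size, and the 0.8 threshold are assumptions, not ContentGems settings.

```python
import re


def shingles(text: str, size: int = 3) -> frozenset:
    """Return the set of overlapping word n-grams, used here as a simple fingerprint."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts: treat the whole text as a single shingle.
        return frozenset([tuple(words)]) if words else frozenset()
    return frozenset(tuple(words[i:i + size]) for i in range(len(words) - size + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Similarity between two fingerprints, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(texts: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first text of each group of near-duplicates (assumed threshold)."""
    kept = []  # list of (text, fingerprint) pairs
    for text in texts:
        fp = shingles(text)
        if all(jaccard(fp, kept_fp) < threshold for _, kept_fp in kept):
            kept.append((text, fp))
    return [text for text, _ in kept]
```

For example, two copies of the same story that differ only in punctuation produce identical shingle sets, so only the first copy would be kept.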

ContentGems discovers Articles when they are included in one or more Feeds. It indexes most Articles in a Feed. However, there are a few reasons why ContentGems will reject an Article:

  • If the content is not written in English.
  • If the Article belongs to a Web Domain that we consider unsuitable for the purposes of ContentGems. This could be because we consider the Web Domain's content to be of low quality, or because the Web Domain contains primarily content that is not text-based.
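The two rejection reasons above can be sketched as a simple check. Everything named here is hypothetical: the blocklist entries, the function name, and the assumption that the Article's language has already been detected elsewhere and is passed in as a two-letter code.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical blocklist; the real list of unsuitable Web Domains is not public.
BLOCKED_DOMAINS = {"low-quality.example", "video-only.example"}


def rejection_reason(url: str, language: str) -> Optional[str]:
    """Return why an Article would be rejected, or None if it would be indexed."""
    if language.lower() != "en":
        return "not English"
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Match the blocked domain itself and any of its subdomains.
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return "unsuitable Web Domain"
    return None
```

A call such as rejection_reason("https://news.example/story", "en") returns None under these assumptions, meaning the Article would go on to be indexed.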

Each Article gets a "Found at" timestamp when ContentGems first indexes it. In most cases this time matches the publishing time of the Article. However, an older Article can occasionally appear in the recommendations. This can happen when someone links to an Article from social media some time after it was first published. If ContentGems hasn't encountered the Article before, it treats it as new content.
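The "Found at" behavior described above amounts to recording the first time an Article is seen and never overwriting it. Here is a minimal sketch, assuming Articles are identified by URL and kept in an in-memory dictionary; the class and method names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


class FoundAtIndex:
    """Remembers when each Article URL was first encountered."""

    def __init__(self) -> None:
        self._found_at: Dict[str, datetime] = {}

    def record(self, url: str, now: Optional[datetime] = None) -> datetime:
        """Return the Found at time, setting it only on the first encounter."""
        now = now or datetime.now(timezone.utc)
        return self._found_at.setdefault(url, now)


index = FoundAtIndex()
first_seen = datetime(2024, 3, 1, tzinfo=timezone.utc)
seen_again = datetime(2024, 6, 1, tzinfo=timezone.utc)
index.record("https://news.example/story", now=first_seen)
# A later encounter does not change the original timestamp.
later_result = index.record("https://news.example/story", now=seen_again)
```

Note that an Article published years ago but seen for the first time today would still receive today's timestamp, which is why older content can show up as new.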