To get a river-style feed (in Dave Winer's sense) of book reviews from a few great newspapers (NYT, The Economist, Le Monde, Japan Times), I created an RSSMix feed combining the four and "borrowed" a script from: http://www.idiotinside.com/2017/06/08/parse-rss-feed-with-python/

Here is the script:

# coding: utf-8

import os
import sys
import feedparser
import console

# Source: http://www.idiotinside.com/2017/06/08/parse-rss-feed-with-python/

feedparser._HTMLSanitizer.acceptable_elements.update(['iframe'])

feed = feedparser.parse("http://www.rssmix.com/u/8265752/rss.xml")
# RSSMix of book reviews from: NYT, The Economist, Le Monde, Japan Times

feed_title = feed['feed']['title']
feed_entries = feed.entries

for entry in feed.entries:
    article_title = entry.title
    article_link = entry.link
    article_published_at = entry.published # Unicode string
    article_published_at_parsed = entry.published_parsed # Time object
    article_description = entry.description
    article_summary = entry.summary
    # article_tags = entry.tags.label    # <--- problem

    console.set_color(0,1,0)
    print ("{}".format(article_title))
    console.set_color(1,1,1)
    print ("{}".format(article_published_at))
    console.set_color(0,0.75,1)
    print ("{}".format(article_link))
    console.set_color(1,1,1)
    print ("{}".format(article_summary))
    # print ("{}".format(article_tags))   # <--- problem
    print (" ")
    print ("....................")
    print (" ")

file_name = os.path.basename(sys.argv[0])
print(file_name)

All in all, it works.
However, I have run into a few problems:

  • I would like the output to be positioned at the most recent entry (top of the output), whereas the script leaves it at the bottom
  • I cannot figure out how to read the entries' tags, which would let me filter some entries
  • The output seems to keep growing. How do I drop entries older than, say, 30 days?
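
For reference, here is a rough sketch of the direction I have in mind for the three points above. It has not been tested against the real feed. The helper names (tag_terms, is_recent, select_entries) and the MAX_AGE_DAYS constant are made up for illustration. It assumes that feedparser exposes each entry's tags as a list of dicts with 'term', 'scheme' and 'label' keys, and published_parsed as a UTC time.struct_time. Dropping undated entries is my own choice.

```python
import calendar
import time

MAX_AGE_DAYS = 30  # hypothetical constant: the cutoff mentioned in the question


def tag_terms(entry):
    """Return the tag names of one entry, or [] if it has none."""
    # Assumption: tags is a list of dicts; 'label' is often None,
    # so fall back to 'term'.
    names = []
    for tag in entry.get('tags', []):
        name = tag.get('label') or tag.get('term')
        if name:
            names.append(name)
    return names


def is_recent(entry, max_age_days=MAX_AGE_DAYS, now=None):
    """True if the entry was published within the last max_age_days."""
    published = entry.get('published_parsed')  # UTC struct_time, or None
    if published is None:
        return False  # assumption: entries without a date are dropped
    if now is None:
        now = time.time()
    age_seconds = now - calendar.timegm(published)
    return age_seconds <= max_age_days * 86400


def select_entries(entries, wanted_tags=None, max_age_days=MAX_AGE_DAYS, now=None):
    """Keep recent entries (optionally matching a tag), sorted oldest first."""
    kept = [e for e in entries if is_recent(e, max_age_days, now)]
    if wanted_tags:
        wanted = {w.lower() for w in wanted_tags}
        kept = [e for e in kept
                if wanted & {name.lower() for name in tag_terms(e)}]
    # Oldest first: the newest entry is printed last, which is where
    # the console ends up after the script has run.
    kept.sort(key=lambda e: calendar.timegm(e.get('published_parsed')))
    return kept
```

In the script, the loop would then become something like "for entry in select_entries(feed.entries, wanted_tags=['Books']):". Printing oldest first only works around the scrolling rather than changing it. If the growth comes from earlier runs piling up in the console, calling console.clear() at the top of the script might also help.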

Thanks in advance for your help.