Hello, I'm working through a collective intelligence book (Programming Collective Intelligence). This script is supposed to read the feed URLs in `feedlist.txt`, process each feed, and write the results to a text file called `blogdata.txt`. However, `blogdata.txt` is never created. The code doesn't give me an error or anything: when I hit Run, it runs for a while and then nothing happens. Here is the code:
```python
import feedparser
import re

# Returns title and dictionary of word counts for an RSS feed
def getwordcounts(url):
    # Parse the feed
    d=feedparser.parse(url)
    wc={}
    # Loop over all the entries
    for e in d.entries:
        if 'summary' in e: summary=e.summary
        else: summary=e.description
        # Extract a list of words
        words=getwords(e.title+' '+summary)
        for word in words:
            wc.setdefault(word,0)
            wc[word]+=1
    return d.feed.title,wc
```
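`getwordcounts` assumes every feed downloads and parses successfully. For a feed that no longer exists, `d.feed` will typically have no `title`, and `d.feed.title` then raises an `AttributeError` that ends the whole run before `blogdata.txt` is ever opened. Below is a minimal sketch of a more forgiving version. It assumes the parsed feed supports dict-style `.get()` (feedparser's result objects are dict subclasses), but it only uses plain dicts so it runs offline. The name `wordcounts_from_parsed` is hypothetical, and `str.split()` stands in for the question's `getwords`:

```python
from collections import Counter


def wordcounts_from_parsed(d, url):
    """Return (title, Counter of words) from an already-parsed feed.

    Hypothetical helper: `d` only needs dict-style .get(), so plain
    dicts work here in place of a real feedparser result.
    """
    feed = d.get('feed', {})
    # A feed that failed to download may have no title at all;
    # fall back to the URL instead of raising
    title = feed.get('title', url)
    wc = Counter()
    for e in d.get('entries', []):
        # Prefer 'summary', then 'description', then an empty string
        summary = e.get('summary', e.get('description', ''))
        # Stand-in tokenizer; the question's getwords() would go here
        wc.update((e.get('title', '') + ' ' + summary).lower().split())
    return title, wc


parsed = {
    'feed': {'title': 'Example blog'},
    'entries': [
        {'title': 'Hello world', 'summary': 'hello again'},
        {'title': 'Second post', 'description': 'world news'},
    ],
}
title, wc = wordcounts_from_parsed(parsed, 'http://example.com/feed')
print(title, wc['hello'], wc['world'])  # Example blog 2 2
```

With feedparser installed, the call would presumably be `wordcounts_from_parsed(feedparser.parse(url), url)`.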
```python
def getwords(html):
    # Remove all the HTML tags
    txt=re.compile(r'<[^>]+>').sub('',html)
    # Split words by all non-alpha characters
    words=re.compile(r'[^A-Z^a-z]+').split(txt)
    # Convert to lowercase
    return [word.lower() for word in words if word!='']
```
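To see what `getwords` produces, here is the same function run on a small HTML snippet (Python 3). One small deviation from the posted code: the character class `[^A-Z^a-z]` contains a stray `^`, so a literal caret counts as a letter and never splits words. This version uses `[^A-Za-z]` instead:

```python
import re


def getwords(html):
    # Remove all the HTML tags
    txt = re.compile(r'<[^>]+>').sub('', html)
    # Split on runs of anything that is not an ASCII letter
    words = re.compile(r'[^A-Za-z]+').split(txt)
    # Lowercase and drop the empty strings split() can leave at the edges
    return [word.lower() for word in words if word != '']


print(getwords('<p>Hello, <b>World</b>! 2nd try</p>'))  # ['hello', 'world', 'nd', 'try']

# The original class keeps '^' inside words; the corrected one splits on it
print(re.split(r'[^A-Z^a-z]+', 'a^b c'))  # ['a^b', 'c']
print(getwords('a^b c'))                  # ['a', 'b', 'c']
```

Digits are treated as separators too, which is why "2nd" becomes "nd".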
```python
apcount={}
wordcounts={}
for feedurl in file('feedlist.txt'):
    title,wc=getwordcounts(feedurl)
    wordcounts[title]=wc
    for word,count in wc.items():
        apcount.setdefault(word,0)
        if count>1:
            apcount[word]+=1
```
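This loop iterates over the file directly and never assigns a variable named `feedlist`, yet the next step divides by `len(feedlist)`, which raises a `NameError` as soon as that line runs. Below is a sketch in Python 3 that reads the list once into `feedlist`, stripping the newline from each URL and skipping blank lines, and that skips feeds that fail instead of aborting. `read_feedlist`, `collect`, and `fake_getwordcounts` are hypothetical names; the fake stands in for the network-bound `getwordcounts` so the example runs offline:

```python
import os
import tempfile


def read_feedlist(path):
    """Read one feed URL per line, stripping whitespace and skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def collect(feedlist, getwordcounts):
    """Build apcount and wordcounts as in the question, skipping feeds that fail."""
    apcount = {}
    wordcounts = {}
    for feedurl in feedlist:
        try:
            title, wc = getwordcounts(feedurl)
        except Exception as exc:
            # Report the failure instead of letting one dead feed end the run
            print('Failed to parse feed %s: %r' % (feedurl, exc))
            continue
        wordcounts[title] = wc
        for word, count in wc.items():
            apcount.setdefault(word, 0)
            if count > 1:
                apcount[word] += 1
    return apcount, wordcounts


def fake_getwordcounts(url):
    # Hypothetical stand-in for the network-bound getwordcounts()
    if 'dead' in url:
        raise AttributeError('feed has no title')
    return url, {'python': 2, 'feed': 1}


# Write a small sample feed list to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('http://a.example/feed\n\nhttp://dead.example/feed\n'
              'http://b.example/feed\n')
feedlist = read_feedlist(tmp.name)
os.remove(tmp.name)

apcount, wordcounts = collect(feedlist, fake_getwordcounts)
print(len(feedlist), apcount)
```

Printing each failure also makes it visible which of the URLs in `feedlist.txt` are no longer reachable.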
```python
wordlist=[]
for w,bc in apcount.items():
    frac=float(bc)/len(feedlist)
    if frac>0.1 and frac<0.5:
        wordlist.append(w)
```
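This filter keeps words that occur more than once in strictly between 10% and 50% of the feeds. Here is a sketch that takes the number of feeds as an argument rather than relying on a global `feedlist`, and that guards against dividing by zero when no feeds were read. `select_words` and its `lower`/`upper` parameters are hypothetical:

```python
def select_words(apcount, nfeeds, lower=0.1, upper=0.5):
    """Keep words whose feed fraction lies strictly between lower and upper."""
    if nfeeds == 0:
        # Nothing was read, so there is nothing meaningful to select
        return []
    wordlist = []
    for w, bc in apcount.items():
        frac = float(bc) / nfeeds
        if lower < frac < upper:
            wordlist.append(w)
    return wordlist


apcount = {'the': 9, 'python': 3, 'rare': 0, 'feed': 1}
print(sorted(select_words(apcount, 10)))  # ['python']
```

Note that `feed` (exactly 0.1) is excluded because both bounds are exclusive. Passing `len(feedlist)` matches the posted formula. Passing `len(wordcounts)` would count only the feeds that parsed successfully, which may be the more meaningful denominator when many feeds are dead.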
```python
out=file('blogdata.txt','w')
out.write('Blog')
for word in wordlist: out.write('\t%s' % word)
out.write('\n')
for blog,wc in wordcounts.items():
    out.write(blog)
    for word in wordlist:
        if word in wc: out.write('\t%d' % wc[word])
        else: out.write('\t0')
    out.write('\n')
```
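Two more things can make the output seem to vanish even once the script runs to completion. First, the file is never explicitly closed, so flushing is left to the interpreter at exit. Second, the relative name `blogdata.txt` is resolved against the current working directory, which in some IDEs is not the folder containing the script. The sketch below uses `with` so the file is closed when the block exits, and returns the absolute path so it is obvious where the file was written. `write_blogdata` is a hypothetical name and the data is sample input:

```python
import os
import tempfile


def write_blogdata(path, wordlist, wordcounts):
    """Write the tab-separated blog/word matrix and return its absolute path."""
    # 'with' closes (and flushes) the file even if an exception is raised
    with open(path, 'w') as out:
        out.write('Blog')
        for word in wordlist:
            out.write('\t%s' % word)
        out.write('\n')
        for blog, wc in wordcounts.items():
            out.write(blog)
            for word in wordlist:
                out.write('\t%d' % wc.get(word, 0))
            out.write('\n')
    # Relative paths resolve against the current working directory,
    # which is not necessarily the folder the script lives in
    return os.path.abspath(path)


tmpdir = tempfile.mkdtemp()
path = write_blogdata(os.path.join(tmpdir, 'blogdata.txt'),
                      ['python', 'feed'],
                      {'Blog A': {'python': 3}, 'Blog B': {'feed': 2, 'python': 1}})
print(path)
with open(path) as f:
    print(f.read())
```

In the original script, adding `print os.getcwd()` (Python 2) just before opening the file would show which directory `blogdata.txt` is being written to.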
And here is `feedlist.txt`:
http://feeds.feedburner.com/37signals/beMH
http://feeds.feedburner.com/blogspot/bRuz
http://battellemedia.com/index.xml
http://blog.guykawasaki.com/index.rdf
http://blog.outer-court.com/rss.xml
http://feeds.searchenginewatch.com/sewblog
http://blog.topix.net/index.rdf
http://blogs.abcnews.com/theblotter/index.rdf
http://feeds.feedburner.com/ConsumingExperienceFull
http://flagrantdisregard.com/index.php/feed/
http://featured.gigaom.com/feed/
http://gizmodo.com/index.xml
http://gofugyourself.typepad.com/go_fug_yourself/index.rdf
http://googleblog.blogspot.com/rss.xml
http://feeds.feedburner.com/GoogleOperatingSystem
http://headrush.typepad.com/creating_passionate_users/index.rdf
http://feeds.feedburner.com/instapundit/main
http://jeremy.zawodny.com/blog/rss2.xml
http://joi.ito.com/index.rdf
http://feeds.feedburner.com/Mashable
http://michellemalkin.com/index.rdf
http://moblogsmoproblems.blogspot.com/rss.xml
http://newsbusters.org/node/feed
http://beta.blogger.com/feeds/27154654/posts/full?alt=rss
http://feeds.feedburner.com/paulstamatiou
http://powerlineblog.com/index.rdf
http://feeds.feedburner.com/Publishing20
http://radar.oreilly.com/index.rdf
http://scienceblogs.com/pharyngula/index.xml
http://scobleizer.wordpress.com/feed/
http://sethgodin.typepad.com/seths_blog/index.rdf
http://rss.slashdot.org/Slashdot/slashdot
http://thinkprogress.org/feed/
http://feeds.feedburner.com/andrewsullivan/rApM
http://wilwheaton.typepad.com/wwdnbackup/index.rdf
http://www.43folders.com/feed/
http://www.456bereastreet.com/feed.xml
http://www.autoblog.com/rss.xml
http://www.bloggersblog.com/rss.xml
http://www.bloglines.com/rss/about/news
http://www.blogmaverick.com/rss.xml
http://www.boingboing.net/index.rdf
http://www.buzzmachine.com/index.xml
http://www.captainsquartersblog.com/mt/index.rdf
http://www.coolhunting.com/index.rdf
http://feeds.copyblogger.com/Copyblogger
http://feeds.feedburner.com/crooksandliars/YaCP
http://feeds.dailykos.com/dailykos/index.xml
http://www.deadspin.com/index.xml
http://www.downloadsquad.com/rss.xml
http://www.engadget.com/rss.xml
http://www.gapingvoid.com/index.rdf
http://www.gawker.com/index.xml
http://www.gothamist.com/index.rdf
http://www.huffingtonpost.com/raw_feed_index.rdf
http://www.hyperorg.com/blogger/index.rdf
http://www.joelonsoftware.com/rss.xml
http://www.joystiq.com/rss.xml
http://www.kotaku.com/index.xml
http://feeds.kottke.org/main
http://www.lifehack.org/feed/
http://www.lifehacker.com/index.xml
http://littlegreenfootballs.com/weblog/lgf-rss.php
http://www.makezine.com/blog/index.xml
http://www.mattcutts.com/blog/feed/
http://xml.metafilter.com/rss.xml
http://www.mezzoblue.com/rss/index.xml
http://www.micropersuasion.com/index.rdf
http://www.neilgaiman.com/journal/feed/rss.xml
http://www.oilman.ca/feed/
http://www.perezhilton.com/index.xml
http://www.plasticbag.org/index.rdf
http://www.powazek.com/rss.xml
http://www.problogger.net/feed/
http://feeds.feedburner.com/QuickOnlineTips
http://www.readwriteweb.com/rss.xml
http://www.schneier.com/blog/index.rdf
http://scienceblogs.com/sample/combined.xml
http://www.seroundtable.com/index.rdf
http://www.shoemoney.com/feed/
http://www.sifry.com/alerts/index.rdf
http://www.simplebits.com/xml/rss.xml
http://feeds.feedburner.com/Spikedhumor
http://www.stevepavlina.com/blog/feed
http://www.talkingpointsmemo.com/index.xml
http://www.tbray.org/ongoing/ongoing.rss
http://feeds.feedburner.com/TechCrunch
http://www.techdirt.com/techdirt_rss.xml
http://www.techeblog.com/index.php/feed/
http://www.thesuperficial.com/index.xml
http://www.tmz.com/rss.xml
http://www.treehugger.com/index.rdf
http://www.tuaw.com/rss.xml
http://www.valleywag.com/index.xml
http://www.we-make-money-not-art.com/index.rdf
http://www.wired.com/rss/index.xml
http://www.wonkette.com/index.xml
Can anyone tell me what's wrong? Thank you so much for your help.