Nerdy Sermons: Find RSS Feed URL of a Webpage

Friday, March 02, 2012

Find RSS Feed URL of a Webpage

Given the URL of a web page, one can programmatically search the <link> tags in the page's markup for alternate links (such as the Atom or RSS feed links for that page) and then use those feed URLs to parse and process the page's content. This is typically how feed readers such as Google Reader work. Here I present a very simple implementation of this in Pharo Smalltalk.


Object subclass: #RSSReader
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'VamsiExperiments'





getURLContent: url
    "Answer the content of the web page at url as a String.
     url is the URL String of the web page,
     for example: http://nerdysermons.blogspot.in"

    | urlContent |
    urlContent := url asUrl retrieveContents contents asString.
    ^ urlContent
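On newer Pharo images that ship the Zinc HTTP components, the same fetch can probably be done with ZnClient instead. The snippet below is only a Workspace sketch under that assumption, not part of the implementation above; pageUrl is a variable name I introduce here for illustration.

```smalltalk
"Workspace sketch: assumes the Zinc HTTP components (ZnClient) are loaded
 and that the machine has network access."
| pageUrl urlContent |
pageUrl := 'http://nerdysermons.blogspot.in'.

"ZnClient>>get: performs an HTTP GET and answers the contents of the response."
urlContent := ZnClient new get: pageUrl.

"Show the number of characters fetched, as a quick sanity check."
Transcript show: urlContent size printString; cr.
```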

 

 



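findAlternateLinksInUrlContent: urlContent
    "Answer an OrderedCollection of the alternate link URLs found in
     urlContent, the page content fetched by getURLContent:"

    | links |
    links := OrderedCollection new.
    urlContent linesDo: [:line |
        "Only lines that contain an alternate link tag are of interest."
        (line findString: '<link rel="alternate"') > 0
            ifTrue: [
                "Split the line at double quotes and take the piece that
                 contains http://, skipping lines where no piece matches."
                (line findTokens: '"' includes: 'http://')
                    ifNotNil: [:link | links add: link]]].
    ^ links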
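To try the link extraction without touching the network, something like the following can be evaluated in a Workspace. It is a minimal sketch that assumes the RSSReader class and the method above are already in the image; sampleHtml and the example.com feed URL are made up for illustration.

```smalltalk
"Workspace sketch: feeds a small hand-written page to the method above."
| reader sampleHtml links |
reader := RSSReader new.

"A made-up page with a single alternate link, on a line of its own."
sampleHtml := '<html><head>
<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml" />
</head></html>'.

links := reader findAlternateLinksInUrlContent: sampleHtml.

"Print every link that was found, one per line."
links do: [:each | Transcript show: each printString; cr].
```

For a real page, the String answered by getURLContent: would be passed in place of sampleHtml. Note that the method works line by line and only looks for http://, so a link tag split across lines, or one whose href uses https:// or a relative path, will not be picked up.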
