Wednesday, April 5, 2017

How to parse an OPML file.

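OPML (Outline Processor Markup Language) is an XML-based format commonly used to exchange lists of RSS feed subscriptions. For example, Feedly (stylized as feedly), a news aggregator application for various web browsers and mobile devices, lets you export and import OPML files.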

What is XML?
The Extensible Markup Language (XML) is a markup language much like HTML or SGML. It is recommended by the World Wide Web Consortium (W3C) and is available as an open standard.
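
To make the structure concrete, here is a minimal sketch of what an OPML document looks like and how the xml module sees it. The sample document, feed names and example.com / example.org URLs are made up for illustration; a real export may carry more attributes. The code is written to run on both Python 2.7 and Python 3.

```python
from xml.etree import ElementTree

# Hypothetical sample data: a tiny OPML document with one folder and two feeds.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>My subscriptions</title></head>
  <body>
    <outline text="Tech" title="Tech">
      <outline type="rss" text="Example Blog" title="Example Blog"
               xmlUrl="http://example.com/feed.xml" htmlUrl="http://example.com/"/>
      <outline type="rss" text="Example News" title="Example News"
               xmlUrl="http://example.org/rss" htmlUrl="http://example.org/"/>
    </outline>
  </body>
</opml>"""

# fromstring() parses XML held in a string and returns the root element.
root = ElementTree.fromstring(SAMPLE_OPML)
print(root.tag)                      # opml
print(root.find('head/title').text)  # My subscriptions

# The first outline under body is a folder; the feeds are its children.
folder = root.find('body/outline')
print(folder.get('text'))            # Tech
print(len(folder.findall('outline')))  # 2
```

Each feed is an outline element, and the feed address is stored in its xmlUrl attribute. Folders are outline elements too, but they have no xmlUrl.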

Today I will show you how to parse an OPML file with Python 2.7 and the xml module from the standard library.
This is the source script; it takes the path of the OPML file as its first command-line argument:
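from xml.etree import ElementTree
import sys


def extract_rss_urls_from_opml(filename):
    """Return the xmlUrl of every outline element in the OPML file."""
    urls = []
    # Binary mode lets the XML parser handle the file's declared encoding.
    with open(filename, 'rb') as f:
        tree = ElementTree.parse(f)
    # './/outline' matches outline elements at any depth (feeds inside folders too).
    for node in tree.findall('.//outline'):
        url = node.attrib.get('xmlUrl')
        if url:  # folder outlines have no xmlUrl, so they are skipped
            urls.append(url)
    return urls


file_opml = sys.argv[1]  # path to the OPML file
urls = extract_rss_urls_from_opml(file_opml)
print(urls)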
The result is a list of all your RSS feed URLs.
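
If you also want the feed names, a small variation can return the title together with each URL. This is only a sketch under a few assumptions: the function name extract_feeds_from_opml_text and the sample data are hypothetical, the input is a string instead of a file, and the title falls back to the text attribute (and then to the URL) when title is missing. It runs on both Python 2.7 and Python 3.

```python
from xml.etree import ElementTree


def extract_feeds_from_opml_text(opml_text):
    """Return (title, url) pairs for every outline that has an xmlUrl."""
    root = ElementTree.fromstring(opml_text)
    feeds = []
    # iter() walks the whole tree in document order, including nested folders.
    for node in root.iter('outline'):
        url = node.get('xmlUrl')
        if url:
            # Assumption: prefer title, then text, then the URL itself.
            title = node.get('title') or node.get('text') or url
            feeds.append((title, url))
    return feeds


# Hypothetical sample data: one feed inside a folder, one at the top level.
SAMPLE = """<opml version="1.0">
  <body>
    <outline text="News">
      <outline text="Example News" title="Example News" xmlUrl="http://example.org/rss"/>
    </outline>
    <outline text="Example Blog" xmlUrl="http://example.com/feed.xml"/>
  </body>
</opml>"""

for title, url in extract_feeds_from_opml_text(SAMPLE):
    print('%s -> %s' % (title, url))
```

The "News" folder is skipped because it has no xmlUrl, and the feed inside it is still found because the search covers every level of the outline.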
