
Wednesday, June 2, 2010

The HTMLParser module - just a simple example

This module parses text formatted in HTML (HyperText Markup Language) and XHTML.
The HTMLParser docs show a similar example, but with no explanation.
The example is:

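from HTMLParser import HTMLParser
from urllib2 import urlopen

class webancors(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        # download the page and feed its HTML to the parser
        r = urlopen(url)
        self.feed(r.read())

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == 'a' and attrs:
            # prints the value of the first attribute, which is not always href
            print "Link: %s" % attrs[0][1]
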
I saved this code in a file named spiderweb.py.
Then I imported it from the Python interpreter:

>>> import spiderweb
>>> spiderweb.webancors('http://www.yahoo.com')
Link: y-mast-sprite y-mast-txt web
Link: y-mast-link images
Link: y-mast-link video
Link: y-mast-link local
Link: y-mast-link shopping
Link: y-mast-link more
Link: p_13838465-sa-drawer
Link: y-hdr-link

>>> 
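HTMLParser calls the method handle_starttag for every opening tag it finds and passes it two arguments.
The first, tag, is the tag name in lowercase; the second, attrs, is a list of (name, value) pairs for the tag's attributes.
The example prints the value of the first attribute of each a tag, so the output above shows whatever attribute comes first (these values look like class names) and not necessarily the href.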
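To make the role of attrs concrete, here is a minimal sketch that looks up the href attribute by name instead of taking the first attribute. It assumes Python 3, where the module is called html.parser. The class name LinkCollector and the sample HTML string are made up for illustration, and the HTML is fed from a string so no network access is needed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs,
        # e.g. [('class', 'nav'), ('href', '/mail')]
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.links.append(href)


# Sample HTML invented for this sketch.
sample_html = (
    '<p><a class="nav" href="/mail">Mail</a> '
    '<a name="top">Top</a> '
    '<a href="http://example.com/">Example</a></p>'
)

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

The second a tag has no href, so it is skipped. Collecting the values in a list rather than printing them makes the result easier to reuse.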
Note:
The HTMLParser module has been renamed to html.parser in Python 3.0. The 2to3 tool will automatically adapt imports when converting your sources to 3.0.
Use the full URL starting with "http://", not just "www". If you leave out "http://" you will see an error.
It seems urllib2 cannot work out the URL type without it:

  File "/usr/lib/python2.5/urllib2.py", line 241, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
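One way to avoid this error is to add the scheme before opening the URL. The sketch below assumes Python 3, where urllib2 became urllib.request. The helper ensure_scheme is a hypothetical name, not part of the standard library. Building a Request object is enough to trigger the check, so nothing is downloaded.

```python
from urllib.request import Request


def ensure_scheme(url, default_scheme="http"):
    """Hypothetical helper: prepend a scheme when the URL has none."""
    if "://" in url:
        return url
    return "%s://%s" % (default_scheme, url)


# A URL without a scheme is rejected before any network access happens.
try:
    Request("www.yahoo.com")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(ensure_scheme("www.yahoo.com"))
print(ensure_scheme("http://www.yahoo.com"))
```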

You can use all the other methods of the HTMLParser class in the same way.

Wednesday, May 19, 2010

The beauty of Python: Simple functions - part 1

Check a condition and return the result as 'Yes' or 'No'.

>>> def valid(x,y):
...     return ('Yes' if x==y else 'No') 
... 
>>> valid(2,3)
'No'
>>> valid(2,2)
'Yes'
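
The conditional expression used in valid is shorthand for an ordinary if/else statement. A small sketch of the two forms side by side (valid_long is a made-up name used only for comparison):

```python
def valid(x, y):
    # conditional expression: value_if_true if condition else value_if_false
    return 'Yes' if x == y else 'No'


def valid_long(x, y):
    # the same logic written as an if/else statement
    if x == y:
        return 'Yes'
    else:
        return 'No'


for pair in [(2, 3), (2, 2)]:
    print(pair, valid(*pair), valid_long(*pair))
```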


Some useful functions from the string module.

>>> import string 
>>> var_template_string=string.Template(" The $subj $vb $something")
>>> var_template_string.substitute(subj="PC", vb="working", something="now")
' The PC working now'
>>> some_string_dictionary={'subj':'car', 'vb':'is', 'something':'blue'}
>>> var_template_string.substitute(some_string_dictionary)
' The car is blue'
>>> some_string_dictionary
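
substitute needs a value for every placeholder. As a small sketch of what happens when one is missing, the code below leaves out 'something' on purpose and compares substitute with safe_substitute, which is also a method of string.Template. The variable names are made up for this example.

```python
import string

template = string.Template(" The $subj $vb $something")
partial = {'subj': 'car', 'vb': 'is'}  # 'something' is deliberately missing

# substitute() raises KeyError when a placeholder has no value
try:
    template.substitute(partial)
    missing = None
except KeyError as exc:
    missing = exc.args[0]

# safe_substitute() leaves unknown placeholders in the text
result = template.safe_substitute(partial)

print(missing)
print(repr(result))
```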


An example with the re module and an HTML tag.

>>> import re
>>> t='<p>'
>>> some_tag = re.compile(r'<(.*?)(\s|>)')
>>> m = some_tag.match(t)
>>> print m
<_sre.SRE_Match object at 0xb7f79da0>
>>> dir(m)
['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
>>> print m.start()
0
>>> print m.groups()
('p', '>')
>>> print m.group()
<p>
>>> print m.span()
(0, 3)
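
The pattern has two groups: the first captures the tag name with the shortest possible match, and the second captures the character that ends the name, which is either whitespace or '>'. The sketch below runs the same pattern on a tag with an attribute and on a string that is not a tag. The sample strings are made up for illustration, and the print calls are written so they run on Python 3.

```python
import re

# group 1: the tag name (shortest possible match)
# group 2: the character that ends the name, whitespace or '>'
some_tag = re.compile(r'<(.*?)(\s|>)')

for text in ['<p>', '<a href="index.html">']:
    m = some_tag.match(text)
    print(text, m.groups(), m.span())

# match() only looks at the start of the string, so this gives None
no_match = some_tag.match('plain text')
print(no_match)
```

For a tag with attributes the match stops at the first space, so only the tag name is captured and the attributes are left out.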

The re module has many useful functions.
These are just a few examples to show the simplicity of the Python language.