analitics

Wednesday, June 2, 2010

The HTMLParser module - just a simple example

This module parses text files formatted in HTML (HyperText Markup Language) and XHTML.
The HTMLParser docs show the same example, but with no explanation. The example is:

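from HTMLParser import HTMLParser
from urllib2 import urlopen

class webancors(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        # Download the page and feed its HTML to the parser.
        r = urlopen(url)
        self.feed(r.read())

    # Called by the parser for every opening tag.
    def handle_starttag(self, tag, attrs):
        if tag == 'a' and attrs:
            # attrs[0][1] is the value of the first attribute,
            # which is not necessarily href.
            print "Link: %s" % attrs[0][1]
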
I named the Python file spiderweb.py.
I imported this file from the Python interpreter:

>>> import spiderweb
>>> spiderweb.webancors('http://www.yahoo.com')
Link: y-mast-sprite y-mast-txt web
Link: y-mast-link images
Link: y-mast-link video
Link: y-mast-link local
Link: y-mast-link shopping
Link: y-mast-link more
Link: p_13838465-sa-drawer
Link: y-hdr-link

>>> 
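HTMLParser calls the method handle_starttag for every opening tag and passes it two arguments.
The first, tag, is the lowercased tag name. The second, attrs, is a list of (name, value) pairs for the attributes of that tag.
The example prints the value of the first attribute of each "a" tag, so the output above shows whatever attribute comes first in the page source (here they look like class names), not necessarily the href.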
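
To print the real link targets, the href attribute can be looked up by name. The following is only a minimal sketch, written for Python 3 (html.parser); the class name HrefCollector and the sample markup are my own assumptions and are not part of the original example.

```python
# Python 3 sketch; in Python 2 the import would be
# "from HTMLParser import HTMLParser".
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look up href by name
        # instead of relying on its position in the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical sample markup, so the sketch runs without a network.
sample = ('<p><a class="nav" href="http://example.com/">Home</a> '
          '<a name="top">anchor without href</a></p>')
collector = HrefCollector()
collector.feed(sample)
print(collector.links)
```

The second "a" tag has no href, so it is skipped. The same class could be fed the bytes read from urlopen after decoding them to text.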
Note:
The HTMLParser module has been renamed to html.parser in Python 3.0. The 2to3 tool will automatically adapt imports when converting your sources to 3.0.
Use "http://" and not just "www". If you leave out "http://", urllib2 raises an error:

  File "/usr/lib/python2.5/urllib2.py", line 241, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
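
One way to avoid this error is to check the URL for a scheme before opening it. This is just a sketch for Python 3 (urllib.request and urllib.parse instead of urllib2); the helper name with_scheme and its default argument are assumptions of mine.

```python
from urllib.parse import urlparse
from urllib.request import Request


def with_scheme(url, default="http"):
    """Return url unchanged if it has a scheme, else prefix default://."""
    if urlparse(url).scheme:
        return url
    return "%s://%s" % (default, url)


# Building a Request is enough to trigger the check; nothing is fetched.
try:
    Request("www.yahoo.com")
    rejected = False
except ValueError:
    rejected = True

print(rejected)
print(with_scheme("www.yahoo.com"))
```

The helper only looks for a scheme; it does not check that the rest of the address is valid.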

You can use all the methods of the HTMLParser class.

Saturday, May 22, 2010

The beauty of Python: subprocess module - part 3

This is just a simple example of the subprocess module.
We need to install "espeak". On Fedora, use this command:
# yum install espeak

Now the example is:

>>> import subprocess   
>>> subprocess.call(["espeak", "-s 120","-p 100","This is a test"])
0
>>>   

This will speak "This is a test". The 0 is the exit code of espeak, returned by subprocess.call.
That is all.
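
The same call can be wrapped in a small function. This is only a sketch for Python 3; the names espeak_command and speak are hypothetical, and the speed and pitch defaults are the values from the example above. Each option and its value is passed as a separate list item, which is the usual way to build an argument list for subprocess.

```python
import shutil
import subprocess


def espeak_command(text, speed=120, pitch=100):
    """Build the argument list for espeak, one item per option or value."""
    return ["espeak", "-s", str(speed), "-p", str(pitch), text]


def speak(text):
    """Run espeak and return its exit code, or None if it is not installed."""
    if shutil.which("espeak") is None:
        return None
    return subprocess.call(espeak_command(text))


print(espeak_command("This is a test"))
```

Calling speak("This is a test") should return the exit code of espeak when it is installed, and None otherwise.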

The beauty of Python: Some examples with os and sys modules - part 2

Today I will talk about the sys and os modules.
The os module provides operating system routines for Mac, NT, or POSIX, depending on which system we are on.
The sys module provides access to some objects used or maintained by the interpreter and to functions that interact strongly with the interpreter.

Some useful attributes of the sys module:
>>> import sys
>>> print sys.version
2.5.2 (r252:60911, Oct  5 2008, 19:24:49) 
[GCC 4.3.2]
>>> print sys.version_info
(2, 5, 2, 'final', 0)
>>> print sys.subversion
('CPython', 'tags/r252', '60911')
>>> print sys.platform 
linux2
>>> print sys.ps1
>>> 
>>> print sys.ps2
... 
>>> print sys.prefix
/usr
>>> print sys.path
['', '/usr/lib/python2.5', '/usr/lib/python2.5/plat-linux2', '/usr/lib/python2.5/lib-tk',
'/usr/lib/python2.5/lib-dynload', '/usr/local/lib/python2.5/site-packages',
 '/usr/lib/python2.5/site-packages', '/usr/lib/python2.5/site-packages/Numeric',
'/usr/lib/python2.5/site-packages/PIL', '/usr/lib/python2.5/site-packages/gst-0.10',
 '/var/lib/python-support/python2.5', '/usr/lib/python2.5/site-packages/gtk-2.0', 
'/var/lib/python-support/python2.5/gtk-2.0']
>>> print sys.modules.keys()
['apt.os', 'email.iterators', 'apport.sys', 'random', 'apport.atexit', 'subprocess',
 'email.MIMEImage', 'gc', 'apport.pwd', 'os.path', 'encodings.encodings', 'email.mime',
 'email.MIMEText', 'xml', 'email.time', 'struct', 'tempfile', 'base64', 'apt.cache',
 'pyexpat.errors', 'apt_pkg', 'apport', 'email.binascii', 'email.Parser', 'zipimport',
 'apport.xml', 'xml.dom.copy', 'encodings.utf_8', 'apt.apt_pkg', 'email.quoprimime',
 'email.mime.text', 'email.urllib', 'email.FeedParser', 'signal', 'email.encoders',
 'pyexpat.model', 'apport.packaging_impl', 'apport.cStringIO', 'quopri',
 'email.Message', 'cStringIO', 'zlib', 'locale', 'email.charset', 'apport.fileutils',
 'xml.parsers.expat', 'atexit', 'email.quopriMIME', 'encodings', 'email.Generator',
 'apport.warnings', 'apport.problem_report', 'apt.fcntl', 'email.MIMEAudio', 'urllib',
 're', 'apt.select', 'email.quopri', 'apport.report', 'email.mime.base', 'email.errors',
 'email', 'math', 'fcntl', 'apport.os', 'apt.progress', 'UserDict', 'exceptions',
 'apport.grp', 'apport.shutil', 'codecs', 'xml.dom.domreg', 'email.Header', '_locale',
 'email.Iterators', 'socket', 'thread', 'traceback', 'apt.apt', 'email.Charset',
 'xml.dom.xmlbuilder', 'os', 'marshal', 'apport.stat', 'apport.re',
 'apt.gettext', 'email.uu', '_sre', 'unittest', '__builtin__', 'apport.apport',
 'xml.parsers', 'apport.fnmatch', 'apport.urllib', 'operator', 'xml.parsers.pyexpat',
 'email.Errors', 'select', 'apt.string', 'apport.glob', 'apt.warnings', 'email.socket',
 'posixpath', 'email.base64MIME', 'errno', '_socket', 'binascii', 'email.Utils',
 'sre_constants', 'email.MIMEMessage', 'email._parseaddr', 'email.sys',
 'apport.traceback', 'apt.package', 'apt.random', 'xml.dom.NodeFilter',
 'email.MIMENonMultipart', '_codecs', 'apport.unittest', 'apport.apt', 'email.os',
 'email.utils', 'pwd', 'apport.time', 'copy', '_struct', '_types', 'email.email',
 'apt.cdrom', 'uu', 'xml.dom.minidom', 'apport_python_hook', 'apt', 'email.random',
 'posix', 'encodings.aliases', 'apt.sys', 'fnmatch', 'sre_parse', 'pickle', 'copy_reg',
 'sre_compile', '_random', 'site', 'email.base64', 'apt.errno', '__main__', 'problem_report',
 'pyexpat', 'email.MIMEBase', 'email.message', 'string', 'email.mime.nonmultipart',
 'apport.subprocess', 'shutil', 'strop', 'grp', 'encodings.codecs', 'gettext',
 'email.warnings', 'xml.dom.minicompat', 'email.MIMEMultipart', 'types', 'apport.tempfile',
 'stat', '_ssl', 'warnings', 'encodings.types', 'glob', 'email.re', 'sys', 'email.Encoders',
 'readline', 'email.cStringIO', 'xml.dom', 'xml.dom.xml', 'apport.signal', 'sitecustomize',
 'email.mime.email', 'email.base64mime', 'email.mime.multipart', 'apport.packaging',
 'urlparse', 'linecache', 'email.string', 'apt.re', 'time', 'gzip']
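
These attributes are mostly useful inside a script. As a small sketch (Python 3 syntax; the function name interpreter_summary and the variable parser_module are my own), a program can branch on sys.version_info instead of parsing the sys.version string:

```python
import sys


def interpreter_summary():
    """Return a short description built from sys attributes."""
    major, minor = sys.version_info[0], sys.version_info[1]
    return "Python %d.%d on %s" % (major, minor, sys.platform)


# Compare the version tuple rather than parsing the sys.version string.
if sys.version_info >= (3, 0):
    parser_module = "html.parser"
else:
    parser_module = "HTMLParser"

print(interpreter_summary())
print(parser_module)
# sys.modules is a dict of the modules imported so far.
print("sys" in sys.modules)
```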


And now, some useful functions of the os module:
>>> import os
>>> print os.uname()
('Linux', 'mint', '2.6.27-7-generic', '#1 SMP Fri Oct 24 06:42:44 UTC 2008', 'i686')
>>> print os.ttyname(1)
/dev/pts/0
>>> print os.times()
(0.050000000000000003, 0.02, 0.0, 0.0, 17186002.649999999)
>>> print os.environ
{'USERNAME': 'root', 'LANG': 'en_US.UTF-8', 'TERM': 'xterm', 'SHELL': '/bin/bash',
 'XDG_SESSION_COOKIE': '842d38513df1a6bb7490c8a14bf69489-1274456064.963733-1686354756',
 'SUDO_COMMAND': '/bin/su', 'SHLVL': '1', 'RUNNING_UNDER_GDM': 'yes', 'SUDO_UID': '999',
 'SUDO_GID': '999', 'PWD': '/home/mint/Desktop', 'LOGNAME': 'root', 'USER': 'root',
 'COLORTERM': 'gnome-terminal',
 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games',
 'MAIL': '/var/mail/root', 'SUDO_USER': 'mint', 'HOME': '/root', 'DISPLAY': ':0.0',
 '_': '/usr/bin/python', 'XAUTHORITY': '/home/mint/.Xauthority'}
>>> print os.mkdir('aaa')
None
>>> print os.mkdir('aaa')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 17] File exists: 'aaa'
>>> print os.listdir('/')
['media', 'root', 'sbin', 'usr', 'lib', 'tmp', 'home', 'var', 'cdrom', 'etc',
 'rofs', 'bin', 'boot', 'dev', 'initrd.img', 'mnt', 'opt', 'proc', 'srv',
 'sys', 'vmlinuz']
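
The second os.mkdir('aaa') call above fails because the directory already exists. A script can catch that case instead of stopping. This is a minimal sketch in Python 3; the function name make_dir is hypothetical, and a temporary directory is used so nothing is created in the current directory.

```python
import errno
import os
import tempfile


def make_dir(path):
    """Create path; return True if created, False if it already existed."""
    try:
        os.mkdir(path)
        return True
    except OSError as exc:
        # Errno 17 is errno.EEXIST on Linux; re-raise anything else.
        if exc.errno != errno.EEXIST:
            raise
        return False


# A temporary directory keeps the sketch away from the current directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "aaa")
first = make_dir(target)
second = make_dir(target)
print(first, second)
print(os.listdir(base))
```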

This is just a brief tutorial about the sys and os modules.