In the HTMLParser docs you will see the same example, but with no explanation. The example is:
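```python
from HTMLParser import HTMLParser   # Python 2; the module is html.parser in Python 3
from urllib2 import urlopen


class webancors(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        r = urlopen(url)       # fetch the page
        self.feed(r.read())    # parse it; handle_starttag fires for each start tag
        self.close()           # flush any data still buffered by the parser

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attrs[0][1] is the value
        # of the FIRST attribute, which is not necessarily href
        if tag == 'a' and attrs:
            print "Link: %s" % attrs[0][1]
```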
I named the Python file spiderweb.py and imported it from the Python interpreter:
```
>>> import spiderweb
>>> spiderweb.webancors('http://www.yahoo.com')
Link: y-mast-sprite y-mast-txt web
Link: y-mast-link images
Link: y-mast-link video
Link: y-mast-link local
Link: y-mast-link shopping
Link: y-mast-link more
Link: p_13838465-sa-drawer
Link: y-hdr-link
>>>
```
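HTMLParser calls the handle_starttag method with two arguments: tag, the tag name converted to lowercase, and attrs, a list of (name, value) pairs for that tag's attributes. Because the example prints attrs[0][1], the value of the first attribute, the output above shows values such as y-mast-link that look like CSS class names rather than URLs whenever href is not the first attribute of an <a> tag.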
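To see exactly what handle_starttag is given, here is a minimal sketch in Python 3 (html.parser) that records each start tag and its attributes from a hard-coded snippet rather than a live page. The class name TagRecorder and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: remembers every start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag is the lowercased tag name; attrs is a list of (name, value) tuples
        self.seen.append((tag, attrs))


parser = TagRecorder()
parser.feed('<A CLASS="y-mast-link" HREF="http://example.com/video">Video</A>')
parser.close()

tag, attrs = parser.seen[0]
print(tag)                      # a
print(attrs)                    # [('class', 'y-mast-link'), ('href', 'http://example.com/video')]
print(attrs[0][1])              # y-mast-link  (the first attribute, not the link)
print(dict(attrs).get("href"))  # http://example.com/video
```

Looking the attribute up by name with dict(attrs).get("href") avoids depending on attribute order.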
Note:
The HTMLParser module has been renamed to html.parser in Python 3.0, and urllib2 has been split into urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to 3.0.
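A possible Python 3 port of the example is sketched below, assuming html.parser and urllib.request as the replacements for HTMLParser and urllib2. Unlike the original, it separates fetching from parsing and looks up the href attribute by name. The names WebAnchors and get_links are hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class WebAnchors(HTMLParser):
    """Python 3 sketch of the webancors example; collects href values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Look href up by name instead of taking the first attribute.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def get_links(url):
    """Fetch url and return the href of every <a> tag (needs network access)."""
    with urlopen(url) as response:
        # Assumption: use the declared charset, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    parser = WebAnchors()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling get_links("http://www.yahoo.com") needs network access and returns a list of href strings, which you can print in the same "Link: %s" format as the original.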
Pass the full URL including the scheme, "http://www.yahoo.com", not just "www.yahoo.com". Without "http://", urllib2 cannot tell what kind of URL it is and fails with:
```
File "/usr/lib/python2.5/urllib2.py", line 241, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
```
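In Python 3 the same mistake surfaces as a ValueError from urllib.request, raised before any network access. One way to guard against it, sketched here under the assumption that plain http is an acceptable default, is to add a scheme when urlsplit finds none. ensure_scheme is a hypothetical helper, not part of urllib.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def ensure_scheme(url, default="http"):
    """Hypothetical helper: prefix a scheme when the user typed only 'www...'."""
    # Assumption: default to http when no scheme is present.
    if not urlsplit(url).scheme:
        return "%s://%s" % (default, url)
    return url


# Without a scheme, urllib rejects the URL as soon as the Request is built.
try:
    Request("www.yahoo.com")
except ValueError as exc:
    print(exc)  # unknown url type: 'www.yahoo.com'

print(ensure_scheme("www.yahoo.com"))         # http://www.yahoo.com
print(ensure_scheme("http://www.yahoo.com"))  # http://www.yahoo.com
```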
You can override the other handler methods of the HTMLParser class, such as handle_endtag and handle_data, in the same way.
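As an illustration of combining handlers, this Python 3 sketch uses handle_starttag, handle_data and handle_endtag together to pair each link's href with its visible text. LinkTextParser and the sample HTML are hypothetical, and nested or unclosed <a> tags are not handled.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical example: pair each link's href with its visible text."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Called for text between tags; keep it only while inside an <a>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = LinkTextParser()
parser.feed('<p>See <a href="/video">Yahoo <b>Video</b></a> and <a href="/local">Local</a>.</p>')
parser.close()
print(parser.links)  # [('/video', 'Yahoo Video'), ('/local', 'Local')]
```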
