analitics

Monday, March 6, 2017

The pattern python module - part 001.

This is a very short presentation of the pattern Python module.
The module is full of options and features.
I will try to show you some parts that are useful for most Python users.
About the pattern Python module:
Pattern is a web mining module for the Python programming language.
It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and visualization.
Pattern developer documentation
Module Functionality
pattern.web Asynchronous requests, web services, web crawler, HTML DOM parser.
pattern.db Wrappers for databases (MySQL, SQLite) and CSV-files.
pattern.text Base classes for parsers, parse trees and sentiment analysis.
pattern.search Pattern matching algorithm for parsed text (syntax & semantics).
pattern.vector Vector space model, clustering, classification.
pattern.graph Graph analysis & visualization.

I used it on Fedora Linux; here is the installation of this Python module:
[root@localhost ~]# pip install pattern
Collecting pattern
  Downloading pattern-2.6.zip (24.6MB)
    100% |████████████████████████████████| 24.6MB 61kB/s 
Installing collected packages: pattern
  Running setup.py install for pattern ... done
Successfully installed pattern-2.6

Frequently used single character variable names:
Variable Meaning Example
a array, all a = [normalize(w) for w in words]
b boolean while b is False:
d distance, document d = distance(v1, v2)
e element e = html.find('#nav')
f file, filter, function f = open('data.csv', 'r')
i index for i in range(len(matrix)):
j index for j in range(len(matrix[i])):
k key for k in vector.keys():
n list length n = len(a)
p parser, pattern p = pattern.search.compile('NN')
q query for r in twitter.search(q):
r result, row for r in csv('data.csv'):
s string s = s.decode('utf-8').strip()
t time t = time.time() - t0
v value, vector for k, v in vector.items():
w word for i, w in enumerate(sentence.words):
x horizontal position node.x = 0
y vertical position node.y = 0
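To make the naming conventions above concrete, here is a minimal sketch that uses only the Python standard library. The helpers `count` and `distance` are hypothetical stand-ins written for this post; they are not the functions shipped with Pattern.

```python
import math

def count(words):
    """Build a word-count vector (a dict) from a list of words."""
    v = {}
    for w in words:                      # w = word
        v[w] = v.get(w, 0) + 1
    return v

def distance(v1, v2):
    """Cosine distance between two sparse vectors stored as dicts."""
    # k = key, v = value
    dot = sum(v * v2.get(k, 0) for k, v in v1.items())
    n1 = math.sqrt(sum(v * v for v in v1.values()))
    n2 = math.sqrt(sum(v * v for v in v2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0                       # treat an empty vector as maximally distant
    return 1.0 - dot / (n1 * n2)

s = "The cat sat on the mat"             # s = string
a = [w.lower() for w in s.split()]       # a = array
n = len(a)                               # n = list length
v1 = count(a)
v2 = count("the cat sat on the sofa".split())
d = distance(v1, v2)                     # d = distance
print(d)
```

The short names keep the arithmetic compact, which is presumably why the Pattern sources favour them in loops and comprehensions.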
Pattern contains part-of-speech taggers for a number of languages (including English, Spanish, German, French and Dutch). Part-of-speech tagging is useful in many data mining tasks. A part-of-speech tagger takes a string of text and identifies the sentences and the words in the text along with their word type. 


Language Code Speakers Example countries
Spanish es 350M Argentina (40), Colombia (40), Mexico (100), Spain (45)
English en 340M Canada (30), United Kingdom (60), United States (300)
German de 100M Austria (10), Germany (80), Switzerland (7)
French fr 70M France (65), Côte d'Ivoire (20)
Italian it 60M Italy (60)
Dutch nl 27M The Netherlands (25), Belgium (6), Suriname (1)
Each language has its own submodule, named after the language code in the table above:
import pattern.en
import pattern.es
import pattern.nl
import pattern.de
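To illustrate what a part-of-speech tagger produces, here is a toy sketch that uses only the standard library. It is not Pattern's tagger: the function `toy_tag`, the tiny `LEXICON` and the default tag are assumptions made for this example, and a real tagger relies on a large lexicon plus contextual rules.

```python
import re

# Hypothetical toy lexicon with Penn Treebank style tags (not Pattern's data).
LEXICON = {
    "the": "DT", "a": "DT",
    "cat": "NN", "mat": "NN",
    "black": "JJ",
    "sat": "VBD",
    "on": "IN",
}

def toy_tag(s, default="NN"):
    """Split text into sentences and words, and pair each word with a tag.

    Returns a list of sentences; each sentence is a list of (word, tag) tuples.
    Punctuation is dropped and unknown words get the default tag.
    """
    sentences = []
    # Split after sentence-final punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", s.strip()):
        words = re.findall(r"\w+", sentence)
        if words:
            sentences.append([(w, LEXICON.get(w.lower(), default)) for w in words])
    return sentences

for sentence in toy_tag("The black cat sat on the mat. A cat sat."):
    print(" ".join("%s/%s" % (w, t) for w, t in sentence))
# The/DT black/JJ cat/NN sat/VBD on/IN the/DT mat/NN
# A/DT cat/NN sat/VBD
```

The word/TAG notation is the usual way tagged text is written; the language submodules imported above provide real taggers for their languages.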
You can work with many web services, for example:
from pattern.web import Wikipedia
from pattern.web import Yahoo
from pattern.web import Twitter
from pattern.web import Facebook
from pattern.web import Flickr
from pattern.web import Mail, GMAIL
from pattern.web import Google
Now, about pattern.db.
The pattern.db module contains wrappers for databases (SQLite, MySQL), Unicode CSV files and Python's datetime. It offers a convenient way to work with tabular data, for example retrieved with the pattern.web module.
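from pattern.db import Database, field, pk, STRING, BOOLEAN, DATE, NOW
# Open the 'people' database (it is created if it does not exist yet).
db = Database('people')
# Create a table with a primary key and four fields.
db.create('area_people', fields=(
    pk(),
    field('name', STRING(80), index=True),
    field('type', STRING(20)),
    field('date_birth', DATE, default=None),
    field('date_created', DATE, default=NOW)
))
# append() returns the id of the new row, here 1.
db.area_people.append(name=u'George', type='male')
# Print the first row; date_created was filled in by default=NOW.
print db.area_people.rows()[0]
# Output:
(1, u'George', u'male', None, Date('2017-03-06 22:38:13'))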
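For comparison, here is roughly the same table built directly with the standard library's sqlite3 module. This is only a sketch of what such a wrapper saves you from writing: the SQL below is my assumption and not necessarily the statements that pattern.db generates, and the database lives in memory so nothing is written to disk.

```python
import sqlite3

# In-memory database, used here instead of a 'people' file on disk.
con = sqlite3.connect(":memory:")

# Same columns as the area_people table above (column types are assumptions).
con.execute("""
    CREATE TABLE area_people (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(80),
        type VARCHAR(20),
        date_birth TIMESTAMP DEFAULT NULL,
        date_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")
# Counterpart of index=True on the name field.
con.execute("CREATE INDEX area_people_name ON area_people (name)")

# Insert one row; lastrowid plays the role of append()'s return value.
cur = con.execute(
    "INSERT INTO area_people (name, type) VALUES (?, ?)",
    (u"George", u"male"),
)
new_id = cur.lastrowid
con.commit()

# Fetch the first row as a plain tuple.
row = con.execute("SELECT * FROM area_people").fetchone()
print(new_id)
print(row)
```

The pattern.db version expresses the same schema with field() calls and hides the SQL, which is convenient when the rows come straight from pattern.web.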