The Pattern module for Python comes with a lot of options and features.
I will try to show you the parts that are useful for most Python users.
About the Pattern module:
Pattern is a web mining module for the Python programming language.
It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, a HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and visualization.
Pattern developer documentation
Module | Functionality |
pattern.web | Asynchronous requests, web services, web crawler, HTML DOM parser. |
pattern.db | Wrappers for databases (MySQL, SQLite) and CSV-files. |
pattern.text | Base classes for parsers, parse trees and sentiment analysis. |
pattern.search | Pattern matching algorithm for parsed text (syntax & semantics). |
pattern.vector | Vector space model, clustering, classification. |
pattern.graph | Graph analysis & visualization. |
I used it on Fedora Linux; here is the installation of this Python module:
[root@localhost ~]# pip install pattern
Collecting pattern
Downloading pattern-2.6.zip (24.6MB)
100% |████████████████████████████████| 24.6MB 61kB/s
Installing collected packages: pattern
Running setup.py install for pattern ... done
Successfully installed pattern-2.6
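After the install finishes, it can be handy to confirm that the module is visible to the interpreter you actually run. The snippet below is only a small sketch that uses the standard library; the helper name is_installed is my own and is not part of Pattern.

```python
import importlib


def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # A missing module raises ImportError; a broken or incompatible
        # install can raise other errors, which also count as "not usable".
        return False
    return True


if __name__ == "__main__":
    print(is_installed("pattern"))
```

If this prints False right after a successful pip run, pip most likely installed the package for a different Python interpreter.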
Frequently used single character variable names:
Variable | Meaning | Example |
a | array, all | a = [normalize(w) for w in words] |
b | boolean | while b is False: |
d | distance, document | d = distance(v1, v2) |
e | element | e = html.find('#nav') |
f | file, filter, function | f = open('data.csv', 'r') |
i | index | for i in range(len(matrix)): |
j | index | for j in range(len(matrix[i])): |
k | key | for k in vector.keys(): |
n | list length | n = len(a) |
p | parser, pattern | p = pattern.search.compile('NN') |
q | query | for r in twitter.search(q): |
r | result, row | for r in csv('data.csv'): |
s | string | s = s.decode('utf-8').strip() |
t | time | t = time.time() - t0 |
v | value, vector | for k, v in vector.items(): |
w | word | for i, w in enumerate(sentence.words): |
x | horizontal position | node.x = 0 |
y | vertical position | node.y = 0 |
Languages with their own Pattern module:
Language | Code | Speakers | Example countries |
Spanish | es | 350M | Argentina (40), Colombia (40), Mexico (100), Spain (45) |
English | en | 340M | Canada (30), United Kingdom (60), United States (300) |
German | de | 100M | Austria (10), Germany (80), Switzerland (7) |
French | fr | 70M | France (65), Côte d'Ivoire (20) |
Italian | it | 60M | Italy (60) |
Dutch | nl | 27M | The Netherlands (25), Belgium (6), Suriname (1) |
Each language module is named after its language code, for example:
import pattern.en
import pattern.es
import pattern.nl
import pattern.de
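Since the module name is just "pattern." plus the language code, the import can also be built from the code at run time. This is a rough sketch and not something Pattern provides: LANGUAGES, language_module_name and load_language are names I made up, and the codes are copied from the table above.

```python
import importlib

# Language codes copied from the table above.
LANGUAGES = {
    "es": "Spanish",
    "en": "English",
    "de": "German",
    "fr": "French",
    "it": "Italian",
    "nl": "Dutch",
}


def language_module_name(code):
    """Return the dotted module name for a language code, e.g. 'pattern.nl'."""
    if code not in LANGUAGES:
        raise ValueError("unsupported language code: %r" % (code,))
    return "pattern." + code


def load_language(code):
    """Import the language submodule, or return None if it is not importable."""
    try:
        return importlib.import_module(language_module_name(code))
    except ImportError:
        return None
```

For example, load_language("nl") gives the same module object as import pattern.nl when Pattern is installed, and None when it is not.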
The pattern.web module can work with many websites and services, see these examples:
from pattern.web import Wikipedia
from pattern.web import Yahoo
from pattern.web import Twitter
from pattern.web import Facebook
from pattern.web import Flickr
from pattern.web import Mail, GMAIL
from pattern.web import Google
Now, about pattern.db.
The pattern.db module contains wrappers for databases (SQLite, MySQL), Unicode CSV files and Python's datetime. It offers a convenient way to work with tabular data, for example data retrieved with the pattern.web module.
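import pattern
from pattern.db import Database, field, pk, STRING, BOOLEAN, DATE, NOW

# Open (or create) the database named 'people'.
db = Database('people')

# Create a table with a primary key, an indexed name and two date fields.
db.create('area_people', fields=(
    pk(),
    field('name', STRING(80), index=True),
    field('type', STRING(20)),
    field('date_birth', DATE, default=None),
    field('date_created', DATE, default=NOW)
))

# Insert a row; the call returns the id of the new row.
db.area_people.append(name=u'George', type='male')
# 1

# Python 2 print statement, as in the rest of this post.
print db.area_people.rows()[0]
# (1, u'George', u'male', None, Date('2017-03-06 22:38:13'))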
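For comparison, here is roughly the same table built with the standard sqlite3 module instead of the wrapper. This is only my sketch of an equivalent, not what pattern.db does internally: the SQL column types, the index name area_people_name and the in-memory database are assumptions made for the example.

```python
import sqlite3
from datetime import datetime

# In-memory database, so nothing is written to disk (assumption for the demo).
conn = sqlite3.connect(":memory:")

# Same columns as the area_people table above, written as plain SQL.
conn.execute("""
    CREATE TABLE area_people (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(80),
        type VARCHAR(20),
        date_birth TIMESTAMP DEFAULT NULL,
        date_created TIMESTAMP
    )
""")
# Counterpart of index=True on the name field.
conn.execute("CREATE INDEX area_people_name ON area_people (name)")

# Insert one row; date_created is filled in by hand with the current time.
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
cursor = conn.execute(
    "INSERT INTO area_people (name, type, date_created) VALUES (?, ?, ?)",
    ("George", "male", now),
)
new_id = cursor.lastrowid  # id of the row that was just inserted

# Read the first row back as a plain tuple.
row = conn.execute("SELECT * FROM area_people").fetchone()
print(new_id)
print(row)
```

The wrapper version is shorter mainly because pk(), field() and default=NOW take care of the SQL and the timestamp for you.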