Wednesday, June 28, 2017

The pyquery python module.

This tutorial covers the pyquery Python module, using Python 2.7.13.
First, I used the pip command to install it:
C:\Python27>cd Scripts

C:\Python27\Scripts>pip install pyquery
Collecting pyquery
  Downloading pyquery-1.2.17-py2.py3-none-any.whl
Requirement already satisfied: lxml>=2.1 in c:\python27\lib\site-packages (from pyquery)
Requirement already satisfied: cssselect>0.7.9 in c:\python27\lib\site-packages (from pyquery)
Installing collected packages: pyquery
Successfully installed pyquery-1.2.17
I also tried to install it with pip under Python 3.4, but I got errors.
The development team describes this Python module as follows:
pyquery allows you to make jquery queries on xml documents. The API is as much as possible the similar to jquery. pyquery uses lxml for fast xml and html manipulation.
Let's try a simple example of this Python module on real web pages.
The example finds the links on each page by their HTML tag (a).
from pyquery import PyQuery

# Start pages (seeds) for the crawl.
seeds = [
    'https://twitter.com',
    'http://google.com'
]

def start_crawler():
    # Collect the links of every seed and print them.
    crawl_frontiers = crawler_seeds()
    print(crawl_frontiers)

def crawler_seeds():
    # Build one {index: links} dictionary per seed.
    frontiers = []
    for index, seed in enumerate(seeds):
        frontier = {index: read_links(seed)}
        frontiers.append(frontier)
    return frontiers

def read_links(seed):
    # Download the page and return the href of every <a> tag.
    crawler = PyQuery(url=seed)
    return [crawler(tag_a).attr("href") for tag_a in crawler("a")]

start_crawler()
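The read_links function takes one seed URL and returns the links found on that page.
The crawler_seeds function calls it for every entry in the seeds array and collects the results in the frontiers array, which start_crawler stores in crawl_frontiers and prints.
The frontiers array is used only inside the crawling process.
This simple example is also a good way to get more familiar with arrays.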
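The printed result is a list of dictionaries, one per seed, and the href values in it may be relative, empty or missing. A possible next step, sketched below, is to flatten that structure into one list of absolute URLs. The flatten_frontiers helper and the sample data are assumptions made for this illustration and are not part of pyquery; urljoin comes from the Python standard library.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

seeds = ['https://twitter.com', 'http://google.com']

# Hypothetical sample with the same shape as the crawler_seeds() result.
sample_frontiers = [
    {0: ['/login', 'https://twitter.com/signup', None]},
    {1: ['/intl/en/about', '/login', '#']},
]

def flatten_frontiers(frontiers, seeds):
    """Return the unique absolute URLs, in the order they were found."""
    urls = []
    seen = set()
    for frontier in frontiers:
        for index, links in frontier.items():
            base = seeds[index]
            for href in links:
                # Skip missing hrefs and in-page anchors.
                if not href or href.startswith('#'):
                    continue
                absolute = urljoin(base, href)
                if absolute not in seen:
                    seen.add(absolute)
                    urls.append(absolute)
    return urls

print(flatten_frontiers(sample_frontiers, seeds))
```

Resolving each link against its own seed keeps identical relative paths from different sites apart, for example /login on the two seeds above.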
You can read more about this python module here.

Saturday, June 17, 2017

Translate with the goslate python module.

This Python module comes with many features, which is the main reason I wrote this tutorial.
You can read about this Python module here, where the authors posted the following notice:
Google has updated its translation service recently with a ticket mechanism to prevent simple crawler program like goslate from accessing.
Though a more sophisticated crawler may still work technically, however it would have crossed the fine line between using the service and breaking the service. goslate will not be updated to break google’s ticket mechanism. Free lunch is over. Thanks for using.


Let's install this Python module with Python 2.7 and pip:

C:\Python27>cd Scripts

C:\Python27\Scripts>pip install goslate
Collecting goslate
  Downloading goslate-1.5.1.tar.gz
Requirement already satisfied: futures in c:\python27\lib\site-packages (from goslate)
Installing collected packages: goslate
  Running setup.py install for goslate ... done
Successfully installed goslate-1.5.1
Let's test a simple example from English to Romanian:
C:\Python27>python.exe
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import goslate
>>> gs = goslate.Goslate()
>>> print(gs.translate('I\'m not here','ro'))
Eu nu sunt aici
You can also look up a detailed dictionary explanation for a single word or phrase:
>>> gs.lookup_dictionary('internet', 'ro')
[[[u'Internet', u'internet', None, None, 2]], [[u'noun', [u'Internet'], [[u'Internet', 
[u'Internet'], None, 0.43686765]], u'Internet', 1]], u'en', None, None, None, 0.73151749,
 None, [[u'en'], None, [0.73151749], [u'en']]]
I am not sure where this module might be used; perhaps in chat applications, specific translations, and text detection.