python-catalin

Thursday, July 11, 2019

Python 3.7.3 : Three examples with BeautifulSoup.

Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree (see the PyPI webpage).
This Python module was created by Leonard Richardson.
A broad definition of web scraping is this:
Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique employed to extract large amounts of data from websites whereby the data is extracted and saved to a local file in your computer or to a database in table (spreadsheet) format.
This Python module can do that, but the input and output formats are up to you.
The input can be a URL or a saved web page, and the output depends on the page content and on what the user chooses to extract.
Let's see some examples.
The first example shows how to take the link titles from the first matching table of a Wikipedia page.

# get table from wikipedia 
import requests
from bs4 import BeautifulSoup
website_url = requests.get('https://en.wikipedia.org/w/index.php?title=Table_of_food_nutrients').text
soup = BeautifulSoup(website_url,'lxml')

my_table = soup.find('table',{'class':'wikitable collapsible collapsed'})
links = my_table.findAll('a')
Food = []
for link in links:
    Food.append(link.get('title'))

print(Food)

The next example prints all the links found on a page:


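# get links using the url
import urllib.request
from bs4 import BeautifulSoup
page = urllib.request.urlopen('http://____share.net/filmes/').read()
# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(page, 'lxml')
# print the target of every anchor that has an href attribute
for anchor in soup.find_all('a', href=True):
    print(anchor['href'])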

The last example takes all images from the search query of imgur website:

# get images from imgur search query
import urllib.request
from bs4 import BeautifulSoup
url = 'https://imgur.com/search/score?q=cyborg'
with urllib.request.urlopen(url) as f:
    soup = BeautifulSoup(f.read(),'lxml')

a_tags = soup.findAll("a",{"class":"image-list-link"})
img_tags = [a.find("img") for a in a_tags]
print(img_tags)
srcs = []
for s in img_tags:
    src_tags=('http:'+s['src'])
    srcs.append(src_tags)

print(srcs)

In conclusion, this module will pose problems for those who do not know how to read the source code of a web page or what arguments such as 'lxml' and 'page' mean.
The developer tools in Chrome (F12 key) help a lot when you need to locate parts of the web content.

Python 3.7.3 : Testing the Bokeh python module.

This python module has a beautiful website:
Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
Let's install this python module with the pip tool:

C:\Python373>cd Scripts

C:\Python373\Scripts>pip install bokeh
Collecting bokeh
...
Successfully built bokeh
Installing collected packages: PyYAML, tornado, bokeh
Successfully installed PyYAML-5.1.1 bokeh-1.2.0 tornado-6.0.3

Let's test it with a simple example:

from bokeh.plotting import figure, output_file, show

output_file("test.html")
plot = figure()
plot.line([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 7, 6, 1, 5, 6, 7, 9, 1], line_width=2)
show(plot)

This creates the file test.html in my Python folder: C:/Python373/test.html.
The HTML page shows the graph together with tools such as Pan, Box Zoom, Wheel Zoom, Save, Reset, and Help.
If you need the sample data, which is not included in the Bokeh GitHub repository or in the released packages, you have to download it:

>>> import bokeh.sampledata
>>> bokeh.sampledata.download()
Creating C:\Users\catafest\.bokeh directory
Creating C:\Users\catafest\.bokeh\data directory
Using data directory: C:\Users\catafest\.bokeh\data
Downloading: CGM.csv (1589982 bytes)
   1589982 [100.00%]
Downloading: US_Counties.zip (3171836 bytes)
    229376 [  7.23%]

Bokeh comes with support for geographical data: Mercator tiles, Google Maps, and GeoJSON data.
You can see all the examples in the gallery on the Bokeh website.

Wednesday, July 10, 2019

Python 3.7.3 : About python version 3.7.3.

Every released version of Python comes with new features and changes.
A full list of these changes can be found on the official PEP webpage and on the documentation webpage.
The goal of this tutorial is to review what changed across Python versions and to get a good picture of these features.
Let's start with the first step: Python modules.
Several of the standard library packages have been reorganized or moved, with a few notable changes:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> import profile
>>> import urllib
>>> import urllib3

In Python 3 the / operator always performs true division and returns a float, even for two integers, while // performs floor division:

>>> one = 1
>>> two = 2
>>> three = 3
>>> float(three)
3.0
>>> one/two
0.5
>>> one//three
0
>>> one//two
0
>>> float(one)/three
0.3333333333333333
>>>
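As a small sketch of the difference between the two operators (the variable names are only for illustration), note that floor division rounds toward negative infinity, so the result for negative numbers can be surprising:

```python
one, two, three = 1, 2, 3

true_quotient = one / two               # true division, always a float
floor_quotient = one // two             # floor division on integers
negative_floor = -one // two            # floors toward negative infinity
quotient, remainder = divmod(7, three)  # quotient and remainder in one call

print(true_quotient, floor_quotient, negative_floor, quotient, remainder)
```

If you need both the quotient and the remainder, divmod() saves one operation.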

You can use the pathlib library which provides the Path() object to fulfill all your path manipulation needs.

>>> from pathlib import Path
>>> folder1 = Path('/folder1')
>>> config_path = folder1 / 'subfolder1'
>>> config_path
WindowsPath('/folder1/subfolder1')
>>> str(config_path)
'\\folder1\\subfolder1'
>>> config_path.name
'subfolder1'
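A few more pathlib attributes, as a minimal sketch. I use PurePosixPath here only so that the results are the same on every operating system; the file name settings.ini is invented for the example:

```python
from pathlib import PurePosixPath

# PurePosixPath never touches the file system and always uses forward slashes.
config_file = PurePosixPath('/folder1') / 'subfolder1' / 'settings.ini'

path_parts = (config_file.name, config_file.stem, config_file.suffix)
parent_dir = str(config_file.parent)
renamed = str(config_file.with_suffix('.bak'))

print(path_parts)
print(parent_dir)
print(renamed)
```

On Windows, Path() gives you the same attributes with backslashes in the string form.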

You can use the @ operator for matrix multiplication, here with NumPy arrays:

>>> import numpy as np
>>> x = np.array([[11, 33], [22, 55]])
>>> y = np.array([[1, 3], [2, 5]])
>>> x @ y
array([[ 77, 198],
       [132, 341]])
>>> x * y
array([[ 11,  99],
       [ 44, 275]])

Lists and dictionaries can easily be emptied with the .clear() method:

>>> my_list = ['a','b','c']
>>> my_list.clear()

print is now a function and accepts keyword arguments such as file:

>>> print('hello')
hello
>>> a = ''
>>> f = open('my_file.txt', 'w')
>>> print(a, file=f)
>>> f.close()
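The other keyword arguments of print() can be sketched like this. I write into an io.StringIO buffer instead of a real file, which is my own choice to keep the example self-contained:

```python
import io

buffer = io.StringIO()   # in-memory stand-in for an open text file

# sep is placed between the values, end replaces the default newline
print('a', 'b', 'c', sep='-', end='!\n', file=buffer)
print('done', file=buffer)

written = buffer.getvalue()
print(repr(written))
```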

The function annotations can provide information on inputs/outputs:

>>> def a_to_b(x: str) -> str:
...     return x.replace('a','b')
...
>>> a_to_b("abcdcba!?!")
'bbcdcbb!?!'

Comparisons between unrelated types are no longer allowed and raise a TypeError:

>>> 'True' > True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'str' and 'bool'

Python 3.6 added a new kind of string literal, the f-string, for string interpolation:

>>> var = 76/3
>>> f'The value is {var}.'
'The value is 25.333333333333332.'
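An f-string also accepts a format specification after a colon. A short sketch with a few common ones (the variable names are mine):

```python
var = 76 / 3
name = 'value'

rounded = f'The {name} is {var:.2f}.'   # two digits after the decimal point
padded = f'[{42:>6}]'                   # right-aligned in a field of width 6
as_hex = f'{255:#x}'                    # hexadecimal with the 0x prefix

print(rounded, padded, as_hex)
```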

You can use underscores in numbers:

>>> int_a = 1_000_000_000
>>> bin_b = 0b_0011_1111_0100_1110
>>> print(int_a,bin_b)
1000000000 16206

Strings are Unicode by default, and variable names can contain non-ASCII letters (emoji, however, are not valid in identifiers).
The functools.lru_cache decorator adds an LRU cache to your functions.
The standard library has an enumerated type: Enum.
The standard library also has the ipaddress module:

>>> import ipaddress
>>> ipaddress.ip_address('192.168.0.1')
IPv4Address('192.168.0.1')
>>> ipaddress.ip_address('2001:db8::')
IPv6Address('2001:db8::')
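The lru_cache decorator and the Enum type mentioned above have no example in this post, so here is a minimal sketch. The fib function and the Color class are invented for the illustration:

```python
from enum import Enum
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number; repeated arguments hit the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


class Color(Enum):
    RED = 1
    GREEN = 2


fib_30 = fib(30)
cache_stats = fib.cache_info()   # named tuple with hits, misses, maxsize, currsize
print(fib_30, cache_stats.hits, cache_stats.misses)

# members can be looked up by value or by name
print(Color.RED, Color(2), Color['GREEN'].value)
```

Without the cache the recursive function would recompute the same values many times; with it, each argument is computed only once.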

In Python 3, round() rounds halfway cases (.5) to the nearest even number.
The input() function in Python 3 always returns the user input as a str object.
In Python 3, range() behaves like xrange() did in Python 2: it produces values lazily instead of building a list.
The range object also got a __contains__ method in Python 3, which makes membership tests fast.
You can convert a range object into a list with the list() function.

>>> print(range(3))
range(0, 3)
>>> print(type(range(3)))
<class 'range'>
>>> print(list(range(3)))
[0, 1, 2]
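The rounding rule and the range membership test can be checked with a short sketch (the size of the range is arbitrary):

```python
# round() sends exact halves to the nearest even integer
rounded_halves = [round(0.5), round(1.5), round(2.5), round(3.5)]

# a range supports membership tests, indexing and len() without building a list
big_range = range(0, 10**9, 2)
has_even = 123456 in big_range
has_odd = 123457 in big_range
item, length = big_range[10], len(big_range)

print(rounded_halves, has_even, has_odd, item, length)
```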

With advanced unpacking and range you can do this:

>>> a, b = range(2)
>>> print(a,b)
0 1
>>> a, b, *rest = range(6)
>>> print(a,b)
0 1
>>> print(rest)
[2, 3, 4, 5]
>>> a, *rest, b = range(6)
>>> print( a,b, rest)
0 5 [1, 2, 3, 4]

Get the first and the last line of an open file with:

first, *_, last = f.readlines()

Keyword-only arguments are declared after *args:

def f(a, b, *args, option=True):

The only way to set option is to pass it by name, for example f(a, b, option=True).
If you don't want to collect *args, then use a bare * instead:

def f(a, b, *, option=True):
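A runnable sketch of the bare * form. The connect function and its parameters are hypothetical and exist only to show what happens when a keyword-only argument is passed by position:

```python
def connect(host, port, *, timeout=10, retries=3):
    """Hypothetical function: timeout and retries must be passed by name."""
    return (host, port, timeout, retries)


by_keyword = connect('localhost', 8000, timeout=5)

try:
    connect('localhost', 8000, 5)   # a third positional argument is rejected
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

print(by_keyword)
print(positional_error)
```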

You can use os.stat(file, follow_symlinks=False) instead of os.lstat.
Values from a generator are read with the built-in next() function, because the .next() method from Python 2 was removed:

>>> my_generator = (letter for letter in 'abcd')
>>> next(my_generator)
'a'
>>> next(my_generator)
'b'

The loop variables of comprehensions don't leak into the surrounding namespace anymore:

>>> i = 1
>>> print('comprehension:', [i for i in range(6)])
comprehension: [0, 1, 2, 3, 4, 5]
>>> print('i is ', i)
i is  1

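async and await are reserved keywords starting with Python 3.7.
Exceptions are more useful, and the comma in the except clause was replaced by the keyword as:

>>> import errno
>>> try:
...     f = open('my_file.txt')
... except OSError as e:
...     if e.errno == errno.ENOENT:
...         # the file does not exist
...         print('my_file.txt was not found')
...     else:
...         # any other OS error is passed on to the caller
...         raise
...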

In Python 3, the .keys() method returns a view object instead of a list, so it cannot be indexed:

>>> my_dict = {'a': 11, 'b': 12, 'c': 13, 'd': 14}
>>> my_dict.keys()
dict_keys(['a', 'b', 'c', 'd'])
>>> my_dict_keys=list(my_dict.keys())
>>> my_dict_keys[3]
'd'
>>> my_dict.keys()[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_keys' object is not subscriptable
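A view is more than a list replacement: it follows later changes of the dictionary and supports set operations. A minimal sketch, with a smaller dictionary than the one above:

```python
my_dict = {'a': 11, 'b': 12}
keys_view = my_dict.keys()

my_dict['c'] = 13                            # the existing view sees the new key
view_after_insert = list(keys_view)
shared_keys = keys_view & {'a', 'c', 'z'}    # views support set operations
first_key = next(iter(my_dict))              # first key without building a list

print(view_after_insert, shared_keys, first_key)
```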

Keyword-only arguments are supported in Python 3.7.3, next to the usual positional parameters.
Chained exceptions in Python 3.7.3 carry more information: the original exception is printed out, along with the original traceback.

>>> raise exception from e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'exception' is not defined
>>> raise NotImplementedError from OSError
OSError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError
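In real code the chain is usually built inside an except block. A small sketch, where load_settings is a hypothetical helper and the original exception stays available in the __cause__ attribute:

```python
def load_settings(text):
    """Hypothetical helper that turns a low-level error into a RuntimeError."""
    try:
        return int(text)
    except ValueError as exc:
        # "from exc" stores the original exception in __cause__
        raise RuntimeError('settings are not valid') from exc


try:
    load_settings('abc')
except RuntimeError as err:
    cause = err.__cause__

print(type(cause).__name__, cause)
```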

This was a brief overview of Python 3.7.3 features that might be of use to you.

Tuesday, July 9, 2019

Python 3.7.3 : Testing Wagtail.

Wagtail is a beautiful CMS project built on Django.
In this tutorial, I will show you the steps for a first install of a basic website.

C:\Python373>python -m virtualenv venv_wagtail
Using base prefix 'C:\\Python373'
New python executable in C:\Python373\venv_wagtail\Scripts\python.exe
Installing setuptools, pip, wheel...
done.

C:\Python373>venv_wagtail\Scripts\activate.bat

(venv_wagtail) C:\Python373>pip install wagtail
Collecting wagtail
...
Successfully built django-treebeard draftjs-exporter
Installing collected packages: sqlparse, pytz, Django, django-taggit, chardet, i
dna, urllib3, certifi, requests, django-treebeard, draftjs-exporter, beautifulso
up4, django-modelcluster, djangorestframework, Willow, six, Pillow, webencodings
, html5lib, Unidecode, wagtail
Successfully installed Django-2.2.3 Pillow-5.4.1 Unidecode-1.1.1 Willow-1.1 beau
tifulsoup4-4.6.0 certifi-2019.6.16 chardet-3.0.4 django-modelcluster-4.4 django-
taggit-0.24.0 django-treebeard-4.3 djangorestframework-3.9.4 draftjs-exporter-2.
1.6 html5lib-1.0.1 idna-2.8 pytz-2019.1 requests-2.22.0 six-1.12.0 sqlparse-0.3.
0 urllib3-1.25.3 wagtail-2.5.1 webencodings-0.5.1

(venv_wagtail) C:\Python373>wagtail start mysite
Creating a Wagtail project called mysite

(venv_wagtail) C:\Python373>cd mysite

(venv_wagtail) C:\Python373\mysite>python manage.py migrate

(venv_wagtail) C:\Python373\mysite>python manage.py createsuperuser
Username (leave blank to use 'catafest'):
Email address: catafest@yahoo.com
Password:
Password (again):
This password is too short. It must contain at least 8 characters.
Bypass password validation and create user anyway? [y/N]: n
Password:
Password (again):
Superuser created successfully.

(venv_wagtail) C:\Python373\mysite>manage.py runserver
Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).
July 09, 2019 - 21:20:46
Django version 2.2.3, using settings 'mysite.settings.dev'
Starting development server at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.

Success! mysite has been created

Let's see the result of the first Wagtail test on my YouTube channel:

Python 3.7.3 : The python-slugify python module.

The python-slugify module creates slugs from text and can handle Unicode.
You can see the source code and examples of this Python module on its GitHub webpage.
The install step with the pip tool is easy:

C:\Python373>cd Scripts
C:\Python373\Scripts>pip install python-slugify
Collecting python-slugify
...
Successfully built python-slugify
Installing collected packages: text-unidecode, python-slugify
Successfully installed python-slugify-3.0.2 text-unidecode-1.2

Let's see a simple example:

>>> from slugify import slugify
>>> txt = "___This is a test___"
>>> regex_pattern = r'[^-a-z0-9_]+'
>>> r = slugify(txt, regex_pattern=regex_pattern)
>>> print(r)
___this-is-a-test___

Remove an email address from a string:

>>> txt_email = "___My mail is catafest@yahoo.com *&@!@#$$76"
>>> regex_pattern =r'[\w\.]+\@[\w]+(?:\.[\w]{3}|\.[\w]{2}\.[\w]{2})\b'
>>> r = slugify(txt_email, regex_pattern=regex_pattern)
>>> print(r)
___my mail is - *&@!@#$$76

You can write your data into one text file:

>>> ftxt = open('C:\\Python373\\soup.txt','w')
>>> ftxt.write(soup)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: write() argument must be str, not BeautifulSoup
>>> ftxt.write(str(soup))

If you get errors, then use slugify to fix them:

ftxt.write(slugify(str(soup)))

Monday, July 8, 2019

Python 3.7.3 : Using attrgetter from operator python module.

The attrgetter function returns a callable object that fetches attr from its operand. If more than one attribute is requested, it returns a tuple of attributes. The attribute names can also contain dots (see the documentation).
attrgetter works much like itemgetter, except that it looks up an attribute instead of an index.
To test it, we need a class whose attributes we can access.
I created a class named Person that is initialized with a name and a surname.
I will get the attributes from an instance of this class and show them.

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import operator
>>> class Person:
...     def __init__(self, name, surname):
...             self.n = name
...             self.s = surname
...
>>> my_per = Person('Festila', 'Catalin George')
>>> getter = operator.attrgetter("n")
>>> print(getter)
operator.attrgetter('n')
>>> print(getter(my_per))
Festila
>>> another_getter = operator.attrgetter("s")
>>> print(another_getter)
operator.attrgetter('s')
>>> print(another_getter(my_per))
Catalin George
>>> all = operator.attrgetter("n","s")
>>> print(all)
operator.attrgetter('n', 's')
>>> print(all(my_per))
('Festila', 'Catalin George')
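A common use of attrgetter is as a sort key. The sketch below uses its own Person class with an extra age attribute and invented sample data:

```python
from operator import attrgetter


class Person:
    def __init__(self, name, surname, age):
        self.name = name
        self.surname = surname
        self.age = age


people = [
    Person('Ana', 'Pop', 31),        # sample data, invented for the example
    Person('Dan', 'Ionescu', 25),
    Person('Bob', 'Pop', 25),
]

# with two attributes the key is a tuple, so ties on age are broken by name
by_age_then_name = sorted(people, key=attrgetter('age', 'name'))
sorted_names = [p.name for p in by_age_then_name]
oldest = max(people, key=attrgetter('age')).name

print(sorted_names, oldest)
```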

To prevent an attribute from being set outside the constructor, store it in a private attribute starting with an underscore and expose it through a read-only property.
See the new class with the attribute s restricted:

class Person:
    def __init__(self, name, surname):
        self.n = name
        # keep the real value in the private attribute _s
        self._s = surname
    # read-only property: there is no setter, so assigning to s fails
    s = property(operator.attrgetter("_s"))
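A quick check of the read-only behavior, as a sketch that assumes the surname is stored in the private attribute _s inside the constructor:

```python
import operator


class Person:
    def __init__(self, name, surname):
        self.n = name
        self._s = surname            # the value lives in the private attribute

    # only a getter is given, so the property cannot be assigned
    s = property(operator.attrgetter('_s'))


my_per = Person('Festila', 'Catalin George')
surname = my_per.s

try:
    my_per.s = 'changed'
    set_error = None
except AttributeError as exc:
    set_error = exc

print(surname)
print(repr(set_error))
```

The underscore is only a convention: my_per._s can still be changed directly, the property just protects the public name.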

Let's test it in another way:

>>> from operator import attrgetter
>>> test=attrgetter('something')
>>> print(test)
operator.attrgetter('something')
>>> print(type(test))
<class 'operator.attrgetter'>
>>> print(test(something))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'something' is not defined
>>> print(type(test(something)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'something' is not defined

Because something is not defined yet, the error is expected for both test(something) and type(test(something)):

>>> something = 'all'
>>> print(type(test))
<class 'operator.attrgetter'>
>>> print(type(test(something)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'something'
>>> something = 1
>>> print(type(test(something)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'something'

Friday, July 5, 2019

Python 3.7.3 : Use python with MySQL and MariaDB.

If you want to use MariaDB databases with Python, you need a MySQL connector.
Use the pip tool to install the mysql-connector-python module:

C:\Python373\Scripts>pip install mysql-connector-python
Collecting mysql-connector-python
...
Installing collected packages: mysql-connector-python
Successfully installed mysql-connector-python-8.0.16

Information about the mysql-connector-python module, version 8.0.16, can be found on this web page.
For the test, I installed MariaDB from the official website, with the user root and my password.
Let's test the Python module for MariaDB:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mysql
>>> import mysql.connector as mariadb

After the install, start the MariaDB command prompt and test it:

C:\Windows\System32>mysql -u root -p
Enter password: ****
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 9
Server version: 10.4.6-MariaDB mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.001 sec)
MariaDB [information_schema]> USE information_schema;
Database changed
MariaDB [information_schema]> SHOW TABLES;
...
| INNODB_TABLESPACES_SCRUBBING          |
| INNODB_SYS_SEMAPHORE_WAITS            |
+---------------------------------------+
77 rows in set (0.001 sec)

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mysql.connector as mariadb
>>> mariadb_connection = mariadb.connect(user='root', password='test', database=
'information_schema')
>>> cursor = mariadb_connection.cursor()
>>> cursor.execute("USE information_schema")
>>> cursor.execute("SHOW TABLES")
>>> result = cursor.fetchall()
>>> print(result)
[('ALL_PLUGINS',), ('APPLICABLE_ROLES',), ('CHARACTER_SETS',), ('CHECK_CONSTRAIN
TS',), ('COLLATIONS',), ('COLLATION_CHARACTER_SET_APPLICABILITY',), ('COLUMNS',)
, ('COLUMN_PRIVILEGES',), ('ENABLED_ROLES',), ('ENGINES',), ('EVENTS',), ('FILES
',), ('GLOBAL_STATUS',), ('GLOBAL_VARIABLES',), ('KEY_CACHES',), ('KEY_COLUMN_US
AGE',), ('OPTIMIZER_TRACE',), ('PARAMETERS',), ('PARTITIONS',), ('PLUGINS',), ('
PROCESSLIST',), ('PROFILING',), ('REFERENTIAL_CONSTRAINTS',), ('ROUTINES',), ('S
CHEMATA',), ('SCHEMA_PRIVILEGES',), ('SESSION_STATUS',), ('SESSION_VARIABLES',),
 ('STATISTICS',), ('SYSTEM_VARIABLES',), ('TABLES',), ('TABLESPACES',), ('TABLE_
CONSTRAINTS',), ('TABLE_PRIVILEGES',), ('TRIGGERS',), ('USER_PRIVILEGES',), ('VI
EWS',), ('GEOMETRY_COLUMNS',), ('SPATIAL_REF_SYS',), ('CLIENT_STATISTICS',), ('I
NDEX_STATISTICS',), ('INNODB_SYS_DATAFILES',), ('USER_STATISTICS',), ('INNODB_SY
S_TABLESTATS',), ('INNODB_LOCKS',), ('INNODB_MUTEXES',), ('INNODB_CMPMEM',), ('I
NNODB_CMP_PER_INDEX',), ('INNODB_CMP',), ('INNODB_FT_DELETED',), ('INNODB_CMP_RE
SET',), ('INNODB_LOCK_WAITS',), ('TABLE_STATISTICS',), ('INNODB_TABLESPACES_ENCR
YPTION',), ('INNODB_BUFFER_PAGE_LRU',), ('INNODB_SYS_FIELDS',), ('INNODB_CMPMEM_
RESET',), ('INNODB_SYS_COLUMNS',), ('INNODB_FT_INDEX_TABLE',), ('INNODB_CMP_PER_
INDEX_RESET',), ('user_variables',), ('INNODB_FT_INDEX_CACHE',), ('INNODB_SYS_FO
REIGN_COLS',), ('INNODB_FT_BEING_DELETED',), ('INNODB_BUFFER_POOL_STATS',), ('IN
NODB_TRX',), ('INNODB_SYS_FOREIGN',), ('INNODB_SYS_TABLES',), ('INNODB_FT_DEFAUL
T_STOPWORD',), ('INNODB_FT_CONFIG',), ('INNODB_BUFFER_PAGE',), ('INNODB_SYS_TABL
ESPACES',), ('INNODB_METRICS',), ('INNODB_SYS_INDEXES',), ('INNODB_SYS_VIRTUAL',
), ('INNODB_TABLESPACES_SCRUBBING',), ('INNODB_SYS_SEMAPHORE_WAITS',)]

Thursday, July 4, 2019

Python 3.7.3 : Using itemgetter from operator python module.

The operator module exports a set of efficient functions corresponding to the intrinsic operators of Python (see the documentation).
Today I will show you how to use itemgetter from this module with Python 3.7.3.
Let's see how to sort my two dictionaries, named my_dict and my_dict2, using the classical lambda function:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> my_dict = {'a': 11, 'b': 12, 'c': 13, 'd': 14}
>>> sorted(my_dict.items(), key = lambda x:x[1])
[('a', 11), ('b', 12), ('c', 13), ('d', 14)]
>>> my_dict2 = {'a': 31, 'b': 12, 'c': 33, 'd': 14}
>>> sorted(my_dict2.items(), key = lambda x:x[1])
[('b', 12), ('d', 14), ('a', 31), ('c', 33)]

Using the operator python module with my examples for both dictionaries:

>>> import operator
>>> sorted(my_dict.items(), key = operator.itemgetter(1))
[('a', 11), ('b', 12), ('c', 13), ('d', 14)]
>>> sorted(my_dict2.items(), key = operator.itemgetter(1))
[('b', 12), ('d', 14), ('a', 31), ('c', 33)]
>>> sorted(my_dict2.items(), key = operator.itemgetter(1), reverse=True)
[('c', 33), ('a', 31), ('d', 14), ('b', 12)]

The operator.itemgetter returns a callable object that fetches the item from its operand using the operand’s __getitem__() method.
If multiple items are specified, it returns a tuple of lookup values.
Let's see another example:

>>> my_arr = []
>>> my_arr.append(["A","Z",77])
>>> my_arr.append(["bA","Zc",11])
>>> my_arr.append(["d","c",111])
>>> print(my_arr)
[['A', 'Z', 77], ['bA', 'Zc', 11], ['d', 'c', 111]]
>>> my_arr.sort(key=operator.itemgetter(1))
>>> my_arr
[['A', 'Z', 77], ['bA', 'Zc', 11], ['d', 'c', 111]]
>>> my_arr.sort(key=operator.itemgetter(1,2))
>>> my_arr
[['A', 'Z', 77], ['bA', 'Zc', 11], ['d', 'c', 111]]
>>> my_arr.sort(key=operator.itemgetter(2))
>>> my_arr
[['bA', 'Zc', 11], ['A', 'Z', 77], ['d', 'c', 111]]
>>> my_arr.sort(key=operator.itemgetter(2,1))
>>> my_arr
[['bA', 'Zc', 11], ['A', 'Z', 77], ['d', 'c', 111]]

After sorting with itemgetter, the order of the array is changed:

>>> i0 = operator.itemgetter(0)
>>> i0(my_arr)
['bA', 'Zc', 11]
>>> i1 = operator.itemgetter(1)
>>> i1(my_arr)
['A', 'Z', 77]
>>> i2 = operator.itemgetter(2)
>>> i2(my_arr)
['d', 'c', 111]
>>> print(my_arr)
[['bA', 'Zc', 11], ['A', 'Z', 77], ['d', 'c', 111]]
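The tuple result for multiple items is easier to see on a single row. A short sketch, where the record dictionary is invented to show that itemgetter also works with keys:

```python
from operator import itemgetter

row = ['bA', 'Zc', 11]
record = {'name': 'bA', 'code': 'Zc', 'value': 11}

single = itemgetter(2)(row)                        # one index gives one item
several = itemgetter(0, 2)(row)                    # several indexes give a tuple
from_dict = itemgetter('name', 'value')(record)    # keys work like indexes

print(single, several, from_dict)
```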

Monday, July 1, 2019

Python 3.7.3 : Using the Pony python module.

The development team of the Pony project introduces it like this:
Using Pony object-relational mapper you can concentrate on writing the business logic of your application and use Python syntax for interacting with the database. Pony translates such queries into SQL and executes them in the database in the most efficient way.
The Pony module can be installed on Python 2.7 or Python 3.
Today I tested with the Python 3.7.3 version:

C:\Python373>cd Scripts
C:\Python373\Scripts>pip install pony
Collecting pony
...
Successfully built pony
Installing collected packages: pony
Successfully installed pony-0.7.10

Let's start with one simple example:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pony
>>> from pony import orm
>>> from pony.orm import *
>>> mydb = Database()
>>> class Person(mydb.Entity):
...     name = Required(str)
...     surname = Required(str)
...     age = Required(int)
...     house = Set('Home')
...
>>> class Home(mydb.Entity):
...     owner = Required(Person)
...     address = Required(str)
...     floors = Required(int)
...     perimeter = Required(int)
...
>>> show(Person)
class Person(Entity):
    id = PrimaryKey(int, auto=True)
    name = Required(str)
    surname = Required(str)
    age = Required(int)
    house = Set(Home)
>>> show(Home)
class Home(Entity):
    id = PrimaryKey(int, auto=True)
    owner = Required(Person)
    address = Required(str)
    floors = Required(int)
    perimeter = Required(int)

Let's see the relationships in this example.
The house attribute of Person is a set of Home objects.
The owner attribute of Home has the type Person.
Pony supports four database types: SQLite, MySQL, PostgreSQL, and Oracle.
Use an absolute path; see the error generated by the bind function when a relative file name is used in interactive mode:

>>> mydb.bind(provider='sqlite', filename='err_my.sqlite', create_db=True)
...
ValueError: When in interactive mode, please provide absolute file path. Got: 'err_my.sqlite'

The correct way to bind the database is:

mydb.bind(provider='sqlite', filename='C:\\Python373\\mydb.sqlite', create_db=True)

Binding the Pony module to a database is simple; see the documentation webpage:

# SQLite
db.bind(provider='sqlite', filename=':memory:')
# or
db.bind(provider='sqlite', filename='database.sqlite', create_db=True)
# PostgreSQL
db.bind(provider='postgres', user='', password='', host='', database='')
# MySQL
db.bind(provider='mysql', host='', user='', passwd='', db='')
# Oracle
db.bind(provider='oracle', user='', password='', dsn='')

The next step maps the declared entities to the corresponding tables in the database; it creates tables, foreign key references, and indexes if necessary:

>>> mydb.generate_mapping(create_tables=True)

Let's fill the database using these objects:

data001 = Person(name='John',surname = 'Unknown', age=18)
data002 = Person(name='Unknown',surname = 'Unknown', age=81)
add001 = Home(owner = data001,address = '18', floors = 1, perimeter=0)
add002 = Home(owner = data002,address = '11', floors = 0, perimeter=1)

These objects will be saved only after the commit function is called.

>>> commit()

Let's see the next steps:
- Writing queries:

>>> select (var_p for var_p in Person)

>>> select (var_p for var_p in Person)[:]
[Person[1], Person[2]]
>>> select (var_p for var_p in Person)[:].show()
id|name   |surname|age
--+-------+-------+---
1 |John   |Unknown|18
2 |Unknown|Unknown|81

- Getting objects:

>>> p1 = Person[1]
>>> print(p1)
Person[1]
>>> print(p1.name)
John

- Updating an object:

>>> m = Person.get(name='John')
>>> print(m.age)
18

Even though I used an SQLite database here, you can also write raw SQL queries that follow the syntax of the database in use:

>>> x = 25
>>> Person.select_by_sql('SELECT * FROM Person p WHERE p.age < $x')
[Person[1]]

Instead of creating models manually, you can use the examples from the Pony distribution package:

>>> from pony.orm.examples.estore import *
...
SELECT "Product"."id", "Product"."name", "Product"."description", "Product"."pic
ture", "Product"."price", "Product"."quantity"
FROM "Product" "Product"
WHERE 0 = 1

COMMIT
COMMIT
PRAGMA foreign_keys = true
CLOSE CONNECTION
RELEASE CONNECTION

You can see the database diagram at the eStore webpage.
This tutorial can be continued with additional pieces of information and examples.
You can read more at Pony ORM webpage.

Thursday, June 20, 2019

Python 3.7.3 : Read and save RSS data from goodreads website.

Today I will show you how to parse data from www.goodreads.com using feedparser and save it all into a CSV file.
The Goodreads website comes with hundreds of great book recommendations from fellow readers and beloved authors, and lets you add your favorite books.
The main goal was to keep the data from the RSS feed and the rows of the CSV file in the same structure.
This issue was solved with one array for each type of data.
First, let's install the feedparser python module with the pip tool:

C:\Python373\Scripts>pip install feedparser
Collecting feedparser
...
Successfully built feedparser
Installing collected packages: feedparser
Successfully installed feedparser-5.2.1

You need to get the RSS link for your books from your account.
The example is simple and has comments that make it easy to understand how I solved this issue.
This is the source code that reads all the RSS data and puts it into the CSV file:

import feedparser
import csv
bookread_rss = "your RSS with data account"
feeds = feedparser.parse(bookread_rss)
print ("aditional RSS data")
print (feeds['feed']['title'])
print (feeds['feed']['link'])
print (feeds.feed.subtitle)
print (len(feeds['entries']))
print (feeds.version)
print (feeds.headers)
print (feeds.headers.get('content-type'))
print ("read RSS items")
# empty arrays for values by type
dates = []
titles = []
authors = []
links = []
pages =[]

# create the name of the CSV file
file_csv = 'my_goodreads_books.csv'

# prepare the CSV file; encoding="utf-8" fixes this error:
# UnicodeEncodeError: 'charmap' codec can't encode character '\u0435' in position
# 30: character maps to <undefined>
cvs_out = csv.writer(open(file_csv, 'w',newline='',encoding="utf-8"))

#print(feeds)
for post in feeds.entries:
    date = "%d/%02d/%02d" % (post.published_parsed.tm_year,\
        post.published_parsed.tm_mon, \
        post.published_parsed.tm_mday,)
    # uncomment and will print on console
    #print("___")
    #print("post date: " + date)
    #print("post title: " + post.title)
    #print("post author: " + post.author_name)
    #print("post link: " + post.link)
    #print("post pages: " + post.num_pages)

    dates.append(date)
    titles.append(post.title)
    authors.append(post.author_name)
    links.append(post.link)
    pages.append(post.num_pages)

for d,t,a,l,p in zip(dates,titles,authors,links,pages):
    cvs_out.writerow((d,t,a,l,p))
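The script above never closes the CSV file explicitly. A with block does that automatically; here is a minimal sketch of the same writing step. The header row, the sample book row, and the temporary folder are my assumptions, not part of the original script:

```python
import csv
import os
import tempfile

header = ('date', 'title', 'author', 'link', 'pages')
# invented sample row with a Cyrillic title, to exercise the utf-8 encoding
rows = [('2019/06/20', 'Мастер и Маргарита', 'Mikhail Bulgakov',
         'https://example.com/book/1', '412')]

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'books_sketch.csv')

    # the file is flushed and closed when the with block ends
    with open(csv_path, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

    # read the file back to check what was written
    with open(csv_path, newline='', encoding='utf-8') as handle:
        read_back = list(csv.reader(handle))

print(read_back)
```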

The script prints some information; see my example:

C:\Python373>python bookreader_rss_001.py
aditional RSS data
Catalin's bookshelf: all
https://www.goodreads.com/review/list_rss/52019632?key=pyfTLqvJXpg-_ghi4a6ZTZfJV
gLVXC8TcWyaBSyoiScgfXq3&shelf=%23ALL%23
Catalin's bookshelf: all
100
rss20
{'Server': 'Server', 'Date': 'Thu, 20 Jun 2019 12:22:55 GMT', 'Content-Type': 'a
pplication/xml; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'c
lose', 'Status': '200 OK', 'X-Frame-Options': 'ALLOWALL', 'X-XSS-Protection': '1
...

All the data will be put into the my_goodreads_books.csv file.

Tuesday, June 18, 2019

Python 3.7.3 : Stemming with nltk.

Today I will start another tutorial about the nltk Python module and stemming.
Stemming is the process of reducing the morphological variants of a word to their root/base form.
Stemming programs are commonly referred to as stemming algorithms or stemmers.
Errors in stemming can be overstemming and understemming.
Overstemming happens when two words with different stems are reduced to the same root.
Understemming happens when two words that share a stem are not reduced to the same root.
Stemming is used in information retrieval systems such as search engines, and to determine domain vocabularies in domain analysis.
Let's install this Python module named nltk with the pip tool:

C:\Python373\Scripts>pip install nltk
Collecting nltk
...
Successfully installed nltk-3.4.1 six-1.12.0

The nltk Python module works with human language data and is applied in statistical natural language processing (NLP).
It contains text processing libraries for tokenization, parsing, classification, stemming, tagging, graphical demonstrations, sample data sets, and semantic reasoning.
The next step is to download the models and data; see more on the official webpage.
First, run these lines of code to download the nltk data:

import nltk
nltk.download()

Let's test a simple implementation of stemming words using nltk python module:

from nltk.stem import PorterStemmer 
from nltk.tokenize import word_tokenize 
   
my_porter = PorterStemmer() 
   
quote = "Deep in the human unconscious is a pervasive need for a logical universe that makes sense."

words = word_tokenize(quote) 
   
for w in words: 
    print(w, " : ", my_porter.stem(w))

The result is something like this:

C:\Users\catafest>python stemming_001.py
Deep  :  deep
in  :  in
the  :  the
human  :  human
unconscious  :  unconsci
is  :  is
a  :  a
pervasive  :  pervas
need  :  need
for  :  for
a  :  a
logical  :  logic
universe  :  univers
that  :  that
makes  :  make
sense  :  sens
.  :  .

C:\Users\catafest>

You can read more about stemming at Wikipedia.

Python 3.7.3 : Using getters and setters in object-oriented programming.

The main purpose of using getters and setters in object-oriented programs is to ensure data encapsulation.
Let's start with a simple example.
I created a class named my_class whose __init__ method stores one variable named my_variable:

self._my_variable = my_variable

A new, initialized instance can be obtained by this line of code:

test_it = my_class()

The example uses getter and setter methods to access this variable.

C:\Users\catafest>python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class my_class:
...     def __init__(self, my_variable = 0):
...          self._my_variable = my_variable
...
...     # getter method
...     def get_my_variable(self):
...         return self._my_variable
...
...     # setter method
...     def set_my_variable(self, x):
...         self._my_variable = x
...
>>> test_it = my_class()
>>> test_it.set_my_variable(1976)
>>> print(test_it.get_my_variable())
1976
>>> print(test_it._my_variable)
1976

In Python, property() is a built-in function that creates and returns a property object.
The property() function takes four optional arguments: fget, fset, fdel, doc.

fget is a function for retrieving an attribute value;
fset is a function for setting an attribute value;
fdel is a function for deleting an attribute value;
doc is a string used as the docstring of the attribute.
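
The four arguments can be seen together in a short sketch. The class name Sample and the helper names _get, _set and _del are assumptions made only for this illustration; property() itself and its keyword names are the built-in ones.

```python
class Sample:  # hypothetical class name, used only for this sketch
    def __init__(self):
        self._my_variable = 0

    def _get(self):
        return self._my_variable

    def _set(self, value):
        self._my_variable = value

    def _del(self):
        del self._my_variable

    # all four arguments of property(), passed by keyword
    my_variable = property(fget=_get, fset=_set, fdel=_del,
                           doc="Example attribute managed by property().")


item = Sample()
item.my_variable = 1976              # calls _set
print(item.my_variable)              # calls _get
print(Sample.my_variable.__doc__)    # the text given as doc
del item.my_variable                 # calls _del
print(hasattr(item, "my_variable"))  # the getter now fails, so hasattr is False
```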

This line of code sets my_variable using the setter:

test_it.set_my_variable(1976)

This line of code retrieves my_variable using the getter:

print(test_it.get_my_variable())

A property object has three methods, getter(), setter(), and deleter(), to specify fget, fset and fdel individually.
Here is how my example changes when it uses property():

>>> class my_class:
...      def __init__(self):
...           self._my_variable = 0
...
...      # function to get value of _my_variable
...      def get_my_variable(self):
...          print("getter method called")
...          return self._my_variable
...
...      # function to set value of _my_variable
...      def set_my_variable(self, a):
...          print("setter method called")
...          self._my_variable = a
...
...      # function to delete _my_variable attribute
...      def del_my_variable(self):
...          del self._my_variable
...
...      my_variable = property(get_my_variable, set_my_variable, del_my_variable)
...
>>> test_it = my_class()
>>> test_it.my_variable = 1976
setter method called
>>> print(test_it.my_variable)
getter method called
1976
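
The methods of a property object can also be used to attach the functions one at a time instead of passing them all to property() at once. The following is a minimal sketch of that idea; the class name Incremental and the helper names are hypothetical and not part of the example above.

```python
class Incremental:  # hypothetical class name, used only for this sketch
    def __init__(self):
        self._my_variable = 0

    def _get(self):
        return self._my_variable

    def _set(self, value):
        self._my_variable = value

    def _del(self):
        del self._my_variable

    # start with an empty property, then attach each function;
    # every call returns a new property object with one more function set
    my_variable = property()
    my_variable = my_variable.getter(_get)
    my_variable = my_variable.setter(_set)
    my_variable = my_variable.deleter(_del)


item = Incremental()
item.my_variable = 1976
print(item.my_variable)
del item.my_variable
print(hasattr(item, "my_variable"))
```

This is the same mechanism that the decorator form relies on when it is written as a name followed by .setter.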

The @property decorator is one of the built-in decorators of Python.
A decorator changes the behavior of your class methods or attributes without changing the way they are called.
The users of your class do not need to make any change in their code.
This is the final result:

>>> class my_class:
...      def __init__(self):
...           self._my_variable = 0
...
...      # using property decorator
...      # a getter function
...      @property
...      def my_variable(self):
...          print("getter method called")
...          return self._my_variable
...
...      # a setter function
...      @my_variable.setter
...      def my_variable(self, my_out):
...          if(my_out < 1976):
...             raise ValueError("... this my_variable has a criteria!!")
...          print("setter method called")
...          self._my_variable = my_out
...
>>> test_it = my_class()
>>> test_it.my_variable = 1979
setter method called
>>> print(test_it.my_variable)
getter method called
1979
>>> test_it.my_variable = 1975
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 16, in my_variable
ValueError: ... this my_variable has a criteria!!

This last example shows you how to use the @property decorator to create getters and setters in a pythonic way.
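
The point that users of a class do not have to change their code can be sketched as follows. The names PlainVersion, PropertyVersion and caller_code are assumptions invented for this illustration: the first class exposes my_variable as an ordinary attribute, the second one replaces it with a validated property, and the same caller code works with both.

```python
class PlainVersion:
    """First version: my_variable is an ordinary attribute."""

    def __init__(self):
        self.my_variable = 1976


class PropertyVersion:
    """Later version: same public name, now checked by a property."""

    def __init__(self):
        self._my_variable = 1976

    @property
    def my_variable(self):
        return self._my_variable

    @my_variable.setter
    def my_variable(self, value):
        if value < 1976:
            raise ValueError("my_variable must be at least 1976")
        self._my_variable = value


def caller_code(obj):
    """Hypothetical client code, written once and never modified."""
    obj.my_variable = 2019
    return obj.my_variable


print(caller_code(PlainVersion()))
print(caller_code(PropertyVersion()))
```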

Monday, June 17, 2019

Python Qt5 : the most simple QTreeWidget - part 002.

This tutorial uses PyQt5 and Python version 3.7.3.
Let's install the PyQt5 python module with the pip tool:

C:\Python373\Scripts>pip install PyQt5
Collecting PyQt5
...
Successfully installed PyQt5-5.12.2 PyQt5-sip-4.19.17

Let's see one simple example with comments about how to use QTreeWidget.

import sys
from PyQt5.QtWidgets import QTreeWidget, QTreeWidgetItem, QApplication, QWidget

if __name__ == '__main__':
    # reuse the existing QApplication instance if there is one, else create it
    my_app = QApplication.instance()
    if my_app is None:
        my_app = QApplication(sys.argv)
    # create a QTreeWidgetItem with three columns
    my_tree = QTreeWidgetItem(["Column A", "Column B", "Column C"])
    # add data using a for loop
    for i in range(6):
        list_item_row = QTreeWidgetItem(["Child A-" + str(i), "Child B-" + str(i), "Child C-" + str(i)])
        my_tree.addChild(list_item_row)
    # create my_widget widget
    my_widget = QWidget()
    my_widget.resize(640, 180)
    # create a QTreeWidget named my_tree_widget 
    my_tree_widget = QTreeWidget(my_widget)
    # set the size
    my_tree_widget.resize(640, 180)
    # set the number of columns 
    my_tree_widget.setColumnCount(3)
    # add labels for each column 
    my_tree_widget.setHeaderLabels(["Column A label", "Column B label", "Column C label"])
    # add my_tree using addTopLevelItem
    my_tree_widget.addTopLevelItem(my_tree)
    # show the widget
    my_widget.show()
    # the exit of my_app
    sys.exit(my_app.exec_())
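
The items stored in the tree can also be read back. This is a minimal sketch that only uses QTreeWidgetItem methods (childCount, child, columnCount, text); the helper name rows is hypothetical, and the sketch assumes that items can be built without a running QApplication because QTreeWidgetItem is a data item and not a widget.

```python
from PyQt5.QtWidgets import QTreeWidgetItem

# build the same kind of item tree as in the example above
root = QTreeWidgetItem(["Column A", "Column B", "Column C"])
for i in range(6):
    root.addChild(QTreeWidgetItem(
        ["Child A-" + str(i), "Child B-" + str(i), "Child C-" + str(i)]))


def rows(item):
    """Return the text of every column for each direct child of item."""
    return [
        [item.child(r).text(c) for c in range(item.columnCount())]
        for r in range(item.childCount())
    ]


for row in rows(root):
    print(row)
```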

This is another simple example, written in a simple way to show how versatile Python and PyQt5 are.

import sys
from PyQt5.QtWidgets import QApplication, QWidget, QVBoxLayout, QTreeWidget, QTreeWidgetItem
 
my_app = QApplication(sys.argv)
my_window = QWidget()
my_layout = QVBoxLayout(my_window)
 
my_tree = QTreeWidget()
my_tree.setHeaderLabels(['Name', 'Area'])
my_item_root = QTreeWidgetItem(my_tree, ['Romania', '238,397 km²'])
my_item_raw = QTreeWidgetItem(my_item_root, ['Black Sea', '436,402 km²'])
 
my_layout.addWidget(my_tree)
my_window.show()
sys.exit(my_app.exec_())

If you like my simple tutorials, you can subscribe or search my other websites too.