python-catalin

Monday, July 22, 2019

Python 3.7.3 : The sip python module.

The official webpage pypi.org comes with this intro:
One of the features of Python that makes it so powerful is the ability to take existing libraries, written in C or C++, and make them available as Python extension modules. Such extension modules are often called bindings for the library.

SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library.

SIP comprises a code generator and a Python module. The code generator processes a set of specification files and generates C or C++ code which is then compiled to create the bindings extension module. The sip Python module provides support functions to the automatically generated code.
SIP is copyright (c) Riverbank Computing Limited and its homepage is this webpage.
Support may be obtained from the PyQt mailing list here.
About the code generator:
First, the install:

C:\Python373\Scripts>pip3 install sip
Collecting sip
...
Installing collected packages: sip
Successfully installed sip-4.19.8

If you are using PyQt5, it already includes a private copy of the sip module:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyQt5 import sip

For backward compatibility the module can still be imported as plain sip, but this only works if another PyQt5 module is imported first.

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyQt5 import QtCore
>>> import sip

A good example of how you can build with SIP can be seen here.
About the SIP Python module:
This provides support functions to the automatically generated code.
Importing the sip python module lets you use these support functions:

>>> dir(sip)
['SIP_VERSION', 'SIP_VERSION_STR', '_C_API', '__doc__', '__file__', '__loader__'
, '__name__', '__package__', '__spec__', '_unpickle_enum', '_unpickle_type', 'as
sign', 'cast', 'delete', 'dump', 'enableautoconversion', 'enableoverflowchecking
', 'getapi', 'isdeleted', 'ispycreated', 'ispyowned', 'setapi', 'setdeleted', 's
etdestroyonexit', 'settracemask', 'simplewrapper', 'transferback', 'transferto',
 'unwrapinstance', 'voidptr', 'wrapinstance', 'wrapper', 'wrappertype']

If you want to improve or create python modules with C or C++, then this tool can help.

Sunday, July 21, 2019

Python 3.7.3 : The IMDbPY python module version 6.8.

The GitHub official webpage comes with this intro:
IMDbPY is a Python package for retrieving and managing the data of the IMDb movie database about movies, people and companies.
The latest release, version 6.8, came out on 2019 Jul 20.
The official webpage tells us:
In the release 6.8 (codename "Apollo 11") of IMDbPY, multiple parsers were added and fixed; the new search_movie_advanced method allows advanced movie searches...
The changes in version 6.8 can be found on the GitHub webpage and include these new features:
#224: introduce the search_movie_advanced(title, adult=None, results=None, sort=None, sort_dir=None) method
#145: names are stored in normal format (Name Surname)
#225: remove the obsolete cookie
#182: box office information
#168: parse series and episode number searching for movies
#217: grab poster from search
#218: extract MPAA rating
#220: extract actor headshot from full credits
Installing it on Python 3.7.3 is easy with the pip3 tool:

C:\Python373\Scripts>pip3 install imdbpy
Collecting imdbpy
...
Installing collected packages: imdbpy
Successfully installed imdbpy-6.8

Let's test the new features:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import imdb
>>> ia = imdb.IMDb()
>>> movies = ia.search_movie_advanced('debby does dallas', adult=True)
>>> print(movies)
...
>>> people = ia.search_person('Clark Gregg')
>>> print(people)

Let's test it with a full script that saves the results to a file:

import imdb

ia = imdb.IMDb()

# create a file to put the output 
file1 = open("_imdb_data.txt","w", encoding='utf-8') 

# get movies by movie 
# example: Alien 

def get_by_movie():
 my_movie = str(input('Type the movie name: '))
 movies = ia.search_movie_advanced(my_movie, adult=True)
 print(type(movies))
 return movies

# get filmography by id 
def get_filmography_by_id(person_id):
 # use a local list so repeated calls do not accumulate results
 filmography_list = []
 actor_results = ia.get_person_filmography(person_id)
 for item in actor_results['data']['filmography']:
  filmography_list.append(str(item))
 return filmography_list

# the main function 
def main():
 a = get_by_movie()
 for i in a:
  print("________________________")
  print("i: ",i)
  # you can uncomment these lines to test the Movie class functions
  #print("Type:",type(i))
  #print("Summary:",i.summary())
  #print("ID: ",i.getID())
  #print("Smart canonical title: ",i.smartCanonicalTitle())
  #print("characters refs: ",i.get_charactersRefs())
  #print("current info: ",i.get_current_info())
  #print("cinematographic process: ",i.get('cinematographic process'))
  #print(i["title"])
  # print the information items of the Movie class
  print("~~~~~~~~~~~~~~~~~~~~~~~~")
  for k, v in i.items():
   print(k, v)
   # write to the file the value of a
   txt = str(k)+":"+str(v)+"\n" 
   file1.write(txt)
  print("------------------------")
  # add a new line on each movie
  file1.write('-----^-----\n')
 #get filmography by id 
 id_filmography=get_filmography_by_id('0078748')
 # print the filmography
 for item in id_filmography:
  print(item)

 #after write, close the file 
 file1.close()

if __name__ == '__main__':
    main()

This is the first part of the output file named _imdb_data.txt:

title:Alien
certificates:['R']
runtimes:['117']
genres:['Horror', 'Sci-Fi']
rating:8.5
votes:719508
metascore:89
gross:78900000
plot:After a space merchant vessel perceives an unknown transmission as a distress call, its landing on the source moon finds one of the crew attacked by a mysterious lifeform, and they soon realize that its life cycle has merely begun.
directors:[]
cast:[, , , ]
cover url:https://m.media-amazon.com/images/M/MV5BMmQ2MmU3NzktZjAxOC00ZDZhLTk4YzEtMDMyMzcxY2IwMDAyXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UX67_CR0,0,67,98_AL_.jpg
year:1979
kind:movie
canonical title:Alien
long imdb title:Alien (1979)
long imdb canonical title:Alien (1979)
smart canonical title:Alien
smart long imdb canonical title:Alien (1979)
full-size cover url:https://m.media-amazon.com/images/M/MV5BMmQ2MmU3NzktZjAxOC00ZDZhLTk4YzEtMDMyMzcxY2IwMDAyXkEyXkFqcGdeQXVyNzkwMjQ5NzM@.jpg
-----^-----
title:Aliens
certificates:['R']
runtimes:['137']
...
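
Because the script writes one key:value pair per line and ends each movie with a -----^----- line, the file can be read back into dictionaries. This is only a minimal sketch under that assumption: the helper name parse_imdb_dump and the sample text (including the shortened cover URL) are mine and not part of IMDbPY, and values that contain line breaks would need extra handling.

```python
# Hypothetical helper: rebuild one dict per movie from the text layout
# written by the script above (key:value lines, '-----^-----' separators).
SEPARATOR = '-----^-----'

def parse_imdb_dump(text):
    movies, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == SEPARATOR:
            # a separator closes the current movie
            if current:
                movies.append(current)
            current = {}
        elif ':' in line:
            # split only on the first ':' so URLs in the value stay intact
            key, value = line.split(':', 1)
            current[key] = value
    if current:
        movies.append(current)
    return movies

# Made-up sample in the same layout as _imdb_data.txt
sample = """title:Alien
year:1979
cover url:https://m.media-amazon.com/images/x.jpg
-----^-----
title:Aliens
runtimes:['137']
-----^-----
"""

movies = parse_imdb_dump(sample)
print(len(movies))
print(movies[0]['title'], movies[0]['year'])
```

All values come back as strings, so lists like ['137'] would still need to be converted if you want real Python objects.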

Saturday, July 20, 2019

Python 3.7.3 : Use BeautifulSoup to parse Instagram account.

This example is a bit more complex because the way it parses the page source depends on what the page contains.
The basic idea of this script is to fetch the content of an Instagram account in the same way a web browser does.
For my account I got a parsing error; I guess the reason is the dots in the account name, see festila.george.catalin.

    scripts_content = json.loads(scripts[0].text.strip())
IndexError: list index out of range

In this case, comment out that line of code (it is already commented out in the script below) and it will work.
For the other accounts I've tried, it works very well with the default script.
This is the script I used:

import requests
from bs4 import BeautifulSoup
import json
import re

from pprint import pprint

instagram_url = 'https://instagram.com'
#example user instagram profile_url = sherwoodseries
profile_url=str(input("name of the instagram user: "))


#UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f48b' in position 5022: character maps to <undefined>
#fix: write the text file with encoding='utf-8'
file1 = open("_shared_data.txt","w", encoding='utf-8') 

#profile_url = 'festila.george.catalin'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(f"{instagram_url}/{profile_url}", headers = headers)

if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html, "html.parser")
    print(bs_html)
    # get info from ... type="application/ld+json">{"@context":"http:\/\/schema.org","@type":"Person","name":
    scripts = bs_html.select('script[type="application/ld+json"]')
    #scripts_content = json.loads(scripts[0].text.strip())
    #pprint(scripts_content)

    #print scripts_content like json 
    #print(json.dumps(scripts_content,indent = 4,sort_keys = True))

    #print just part of source code get by 'script' (0 .. n), see n = 6 
    #print(bs_html.find_all('script')[6])
    # raw strings keep the regex escapes intact; string= is the current name of the text= argument
    script_tag = bs_html.find('script', string=re.compile(r'window\._sharedData'))
    shared_data = script_tag.string.partition('=')[-1].strip(' ;')

    #get item from shared data, see "language_code":"en"
    rex_item  = re.compile(r'(?<="language_code":")[a-zA-Z_\- ]+(?=")')
    rex_get_item = rex_item.findall(shared_data)  
    print(rex_get_item)
    #get url image from shared data
    rex_url  = re.compile(r'(?<="display_url":")[^\s"]+(?=")')
    rex_get_url = rex_url.findall(shared_data)  
    print(rex_get_url)
 
    # load like a json 
    result_json = json.loads(shared_data)
    pprint(result_json)
    
    data = bs_html.find_all('meta', attrs={'property': 'og:description'})
    bb = data[0].get('content').split()
    user = '%s %s %s' % (bb[-3], bb[-2], bb[-1])
    # get from bb parts 
    posts = bb[4]
    print('all string: ',bb)
    print('number of posts: ',posts)
    print('name and the user: ',user)

    # write any output show by print into _a.txt file, see example
    #file1.write(str(bs_html.find_all('script')[4]))
    #example: write to _shared_data.txt file the shared_data
    #file1.write(str(shared_data))
#after write, close the file 
#file1.close()

This is a part of the output for sherwoodseries account:

...
all string:  ['95', 'Followers,', '24', 'Following,', '56', 'Posts', '-', 'See',
 'Instagram', 'photos', 'and', 'videos', 'from', 'Sherwood', 'Series', '(@sherwo
odseries)']
number of posts:  56
name and the user:  Sherwood Series (@sherwoodseries)
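
The script picks the values out of the og:description text by position (bb[4], bb[-3] and so on), which breaks if the display name has more or fewer words. As a rough alternative, here is a sketch that uses a regular expression with named groups. The pattern and the function name parse_description are my own assumptions, and it only fits the English wording seen in the output above.

```python
import re

# Sample rebuilt from the sherwoodseries output above, joined into one string.
description = ("95 Followers, 24 Following, 56 Posts - See Instagram photos "
               "and videos from Sherwood Series (@sherwoodseries)")

# Hypothetical pattern; it assumes the English wording of the description.
pattern = re.compile(
    r'(?P<followers>\S+) Followers, '
    r'(?P<following>\S+) Following, '
    r'(?P<posts>\S+) Posts - '
    r'See Instagram photos and videos from (?P<name>.*) \(@(?P<user>[^)]+)\)'
)

def parse_description(text):
    """Return a dict with the named parts, or None if the text does not match."""
    match = pattern.search(text)
    return match.groupdict() if match else None

info = parse_description(description)
print(info)
```

Returning None for text that does not match lets the caller decide what to do, instead of failing with an IndexError.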

Thursday, July 18, 2019

Python 3.7.3 : The pandas python module.

Since I started learning the python programming language, I have not found a more complete module for viewing and working with complex data.
The official documentation of this python module tells us:
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
The official webpage can be found here.
This python module is one of the most popular Python libraries for Data Science and Analytics.
You can install this python module with the pip tool:

C:\Python373\Scripts>pip install pandas
Requirement already satisfied: pandas in c:\python373\lib\site-packages (0.24.2)

You can find many tutorials on the web about this python module.
Today I will show you a short tutorial about it.
Most users import these two python modules together:

import numpy as np
import pandas as pd

Most areas of the pandas python module fall into this list:
Window Functions, Aggregations, Missing Data, GroupBy, Merging/Joining, Concatenation, Date Functionality, Timedelta, Categorical Data, Visualization, IO Tools, Sparse Data, Caveats & Gotchas, Comparison with SQL
There are two main types of data structures in pandas: Series and DataFrame.
The pandas Series is a one-dimensional data structure.
The pandas DataFrame is a two-dimensional data structure, like a table.
Pandas provides a few variants for window statistics: rolling, expanding and exponentially weighted moving windows.
These support sum, mean, median, variance, covariance, correlation, etc.
Several methods are available to perform aggregations on data.
Pandas provides functions for missing data like isnull() and notnull().
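
To make these ideas concrete, here is a small sketch with made-up values (the numbers and column names are only my assumptions for illustration). It builds a Series and a DataFrame, checks for missing data, and computes an aggregation plus rolling and expanding window statistics.

```python
import numpy as np
import pandas as pd

# A Series is one-dimensional; NaN marks a missing value.
s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

# A DataFrame is a table of labeled columns.
df = pd.DataFrame({'value': s, 'label': ['a', 'b', 'c', 'd', 'e']})

missing = s.isnull()                       # True where the value is missing
total = s.sum()                            # aggregations skip NaN by default
rolling_mean = s.rolling(window=2).mean()  # moving window of two values
running_sum = s.expanding().sum()          # window grows from the start

print(missing.tolist())
print(total)
print(rolling_mean.tolist())
print(running_sum.tolist())
print(df.shape)
```

Note that a rolling window that contains a missing value gives NaN here, because by default it needs a full window of valid values.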
Let's test DataFrames with pandas and the Wikipedia example from my tutorial:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> # get table from wikipedia
... import requests
>>> from bs4 import BeautifulSoup
>>> website_url = requests.get('https://en.wikipedia.org/w/index.php?title=Table
_of_food_nutrients').text
>>> soup = BeautifulSoup(website_url,'lxml')
>>>
>>> my_table = soup.find('table',{'class':'wikitable collapsible collapsed'})
>>> links = my_table.findAll('a')
>>> Food = []
>>> for link in links:
...     Food.append(link.get('title'))
...
>>> print(Food)
["Cows' milk (page does not exist)", 'Buttermilk', 'Fortified milk (page does no
t exist)', 'Powdered milk', "Goats' milk", 'Malted milk', 'Hot chocolate', 'Yogu
rt', 'Milk pudding (page does not exist)', 'Custard', 'Ice cream', 'Ice milk', '
Cream', 'Cheese', 'Cheddar cheese', 'American cheese', 'Processed cheese', 'Egg
(food)', 'Scrambled', 'Omelet', 'Yolk']
>>> import pandas
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['Foods'] = Food
>>> print(df)
                                   Foods
0       Cows' milk (page does not exist)
1                             Buttermilk
2   Fortified milk (page does not exist)
3                          Powdered milk
4                            Goats' milk
5                            Malted milk
6                          Hot chocolate
7                                 Yogurt
8     Milk pudding (page does not exist)
9                                Custard
10                             Ice cream
11                              Ice milk
12                                 Cream
13                                Cheese
14                        Cheddar cheese
15                       American cheese
16                      Processed cheese
17                            Egg (food)
18                             Scrambled
19                                Omelet
20                                  Yolk
>>> df.describe()
              Foods
count            21
unique           21
top     Malted milk
freq              1
>>> df.apply(pd.Series.value_counts)
                                      Foods
Malted milk                               1
Ice milk                                  1
Omelet                                    1
Goats' milk                               1
Custard                                   1
Cheddar cheese                            1
American cheese                           1
Ice cream                                 1
Yolk                                      1
Cream                                     1
Cows' milk (page does not exist)          1
Yogurt                                    1
Fortified milk (page does not exist)      1
Egg (food)                                1
Powdered milk                             1
Milk pudding (page does not exist)        1
Cheese                                    1
Hot chocolate                             1
Buttermilk                                1
Processed cheese                          1
Scrambled                                 1

The last example shows how to plot data with matplotlib:

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> ts = pd.Series(np.random.randn(76), index=pd.date_range('1/1/76', periods=76
))
>>> ts.plot()

>>> plt.show()

Wednesday, July 17, 2019

Python 3.7.3 : Using the pipenv tool.

The goal of this tutorial is to show how to use pipenv to manage dependencies and development environments on a collaborative Python project.
The documentation area can be found here.

C:\Python373>cd Scripts

C:\Python373\Scripts>pip install --user pipenv
Collecting pipenv
...
Successfully installed pipenv-2018.11.26 virtualenv-clone-0.5.3

If you see a warning like this:

WARNING: The scripts pipenv-resolver.exe and pipenv.exe are installed in 'C:\Users\....' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

You need to add the path shown in the WARNING to the Windows path, see this command:

setx path "%path%;C:\Users\..\Python37\Scripts"

Let's use this tool to install the requests package:

C:\Python373\Scripts>cd ..

C:\Python373>cd my_flask

C:\Python373\my_flask>pipenv install requests
Creating a virtualenv for this project.
Pipfile: C:\Python373\my_flask\Pipfile
Using c:\python373\python.exe (3.7.3) to create virtualenv.
[  ==] Creating virtual environment...Already using interpreter c:\python373\pyt
hon.exe
Using base prefix 'c:\\python373'
New python executable in C:\Users\catafest\.virtualenvs\my_flask-j9hGDZgP\Script
s\python.exe
Installing setuptools, pip, wheel...
done.
Successfully created virtual environment!
Virtualenv location: C:\Users\catafest\.virtualenvs\my_flask-j9hGDZgP
Creating a Pipfile for this project.
Installing requests.
Adding requests to Pipfile's [packages].
Installation Succeeded
Pipfile.lock not found, creating.
Locking [dev-packages] dependencies.
Locking [packages] dependencies.
Success!
Updated Pipfile.lock (444a6d)!
Installing dependencies from Pipfile.lock (444a6d).
  ================================ 5/5 - 00:00:04
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.

The syntax for the Pipfile is TOML, and the file is separated into sections: [dev-packages] for development-only packages, [packages] for minimally required packages, and [requires] for other requirements like a specific version of Python.
If you get this error, you need to restart the command prompt:

'pipenv' is not recognized as an internal or external command,
operable program or batch file.
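
If restarting the command prompt does not help, you can check from python whether the executable is visible on PATH. This is just a small sketch with the standard library; the helper name find_on_path is my own.

```python
import os
import shutil

def find_on_path(program):
    """Return the full path of program if PATH contains it, else None."""
    return shutil.which(program)

location = find_on_path('pipenv')
if location:
    print('pipenv found at', location)
else:
    # show the folders that were searched, to compare with the WARNING path
    print('pipenv is not on PATH; directories searched:')
    for folder in os.environ.get('PATH', '').split(os.pathsep):
        print(' ', folder)
```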

The next command outputs diagnostic information for use in GitHub issues (piping it to clip copies it to the clipboard):

C:\Python373\my_flask>pipenv --support | clip
C:\Python373\my_flask>pipenv --support

Check for security vulnerabilities (and PEP 508 requirements) in your environment with:

C:\Python373\my_flask>pipenv check
Checking PEP 508 requirements.
Passed!
Checking installed package safety.
All good!

Let's use the example from the documentation webpage:

import requests

response = requests.get('https://httpbin.org/ip')

print('Your IP is {0}'.format(response.json()['origin']))

The result works great.

C:\Python373\my_flask>pipenv run python my_ip.py
Your IP is ...

You can print out a tree-like structure showing your dependencies.

C:\Python373\my_flask>pipenv graph
requests==2.22.0
  - certifi [required: >=2017.4.17, installed: 2019.6.16]
...

This command shows a reversed tree, which may be more useful when you are trying to figure out conflicting sub-dependencies.

C:\Python373\my_flask>pipenv graph --reverse
certifi==2019.6.16
  - requests==2.22.0 [requires: certifi>=2017.4.17]
chardet==3.0.4
...

This command installs all the dependencies needed for development:

C:\Python373\my_flask>pipenv install --dev
Installing dependencies from Pipfile.lock (444a6d).
  ================================ 5/5 - 00:00:10
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.

You can uninstall a python module:

pipenv uninstall numpy

You can also use --all to remove all packages, or --all-dev to just remove dev packages.
Let’s start a shell in a virtual environment to isolate the development of this app:

C:\Python373\my_flask>pipenv shell
Launching subshell in virtual environment.
Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

(my_flask-j9hGDZgP) C:\Python373\my_flask>

The package name, together with its version and a list of its own dependencies, can be frozen by updating the Pipfile.lock with the lock command.

(my_flask-j9hGDZgP) C:\Python373\my_flask>pipenv lock
Locking [dev-packages] dependencies.
Locking [packages] dependencies.
Success!
Updated Pipfile.lock (444a6d)!
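
The Pipfile.lock is a JSON file: the pins for [packages] are stored under "default" and the pins for [dev-packages] under "develop". As a rough illustration, this sketch reads the pinned versions from a trimmed, hypothetical lock text (a real file also carries hashes, sources and more entries); the helper name pinned_versions is my own.

```python
import json

# Trimmed, hypothetical lock content for illustration only.
lock_text = """
{
    "_meta": {"requires": {"python_version": "3.7"}},
    "default": {
        "certifi": {"version": "==2019.6.16"},
        "requests": {"version": "==2.22.0"}
    },
    "develop": {}
}
"""

def pinned_versions(lock, section='default'):
    """Return {package: version} for one section of a parsed Pipfile.lock."""
    return {name: entry.get('version', '')
            for name, entry in lock.get(section, {}).items()}

lock = json.loads(lock_text)
print(pinned_versions(lock))
print(pinned_versions(lock, 'develop'))
```

With a real project you would load the file with json.load(open('Pipfile.lock')) instead of the inline text.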

Clear the caches (pipenv, pip, and pip-tools) with:

pipenv --clear

Remove the virtualenv created:

(my_flask-j9hGDZgP) C:\Python373\my_flask>pipenv --rm

Close the pipenv shell:

(my_flask-j9hGDZgP) C:\Python373>exit

C:\Python373\my_flask>

This tool deserves special attention when developing with Python; here we have only covered some of its real possibilities.

Tuesday, July 16, 2019

Python 3.7.3 : Using the werkzeug.

From the official webpage, you can see all the features of this python module:
Werkzeug is a comprehensive WSGI web application library. It began as a simple collection of various utilities for WSGI applications and has become one of the most advanced WSGI utility libraries.
It includes:
An interactive debugger that allows inspecting stack traces and source code in the browser with an interactive interpreter for any frame in the stack.
A full-featured request object with objects to interact with headers, query args, form data, files, and cookies.
A response object that can wrap other WSGI applications and handle streaming data.
A routing system for matching URLs to endpoints and generating URLs for endpoints, with an extensible system for capturing variables from URLs.
HTTP utilities to handle entity tags, cache control, dates, user agents, cookies, files, and more.
A threaded WSGI server for use while developing applications locally.
A test client for simulating HTTP requests during testing without requiring running a server.
The documentation of this python module can be found here.
You can install this python module with the pip tool:

pip install -U Werkzeug

This module is also installed automatically as a dependency when you install the flask python module with python 3.7.3.
Let's test it:

C:\Python373>python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from werkzeug.wrappers import Request, Response
>>>
>>> @Request.application
... def application(request):
...     return Response('Hello, World!')
...
>>> if __name__ == '__main__':
...     from werkzeug.serving import run_simple
...     run_simple('localhost', 4000, application)
...
 * Running on http://localhost:4000/ (Press CTRL+C to quit)
127.0.0.1 - - [16/Jul/2019 13:29:44] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [16/Jul/2019 13:29:46] "GET /favicon.ico HTTP/1.1" 200 -

If you run this source code and open http://localhost:4000/ in your browser, you will see the text Hello, World!.
The common uses of this python package are:

>>> from werkzeug.wrappers import Request, Response
>>> from werkzeug.routing import Map, Rule
>>> from werkzeug.exceptions import HTTPException, NotFound
>>> from werkzeug.middleware.shared_data import SharedDataMiddleware
>>> from werkzeug.formparser import parse_form_data
>>> from werkzeug.utils import redirect
>>> from werkzeug.utils import escape
>>> from werkzeug.serving import run_simple

Let's test another simple application with werkzeug python module:

from werkzeug.serving import run_simple
from werkzeug.wrappers import Request
from werkzeug.wrappers import Response

class Application(object):
    def __init__(self):
        print('init Application class')

    def dispatch_request(self, request):
        return Response("Logged in as %s" % request.authorization)

    def __call__(self, environ, start_response):
        request = Request(environ)
        auth = request.authorization
        response = self.dispatch_request(request)
        return response(environ, start_response)

if __name__ == "__main__":
    application = Application()
    run_simple("localhost", 5000, application)

The result will be this:

C:\Python373>python.exe werkzeug_001.py
init Application class
 * Running on http://localhost:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [16/Jul/2019 14:05:31] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [16/Jul/2019 14:05:32] "GET /favicon.ico HTTP/1.1" 200 -

You can add more features to this source code and explore the Web Server Gateway Interface (WSGI) features of werkzeug.
A full list with examples can be found here.
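
To see what the werkzeug Request and Response objects wrap, it can help to look at the bare WSGI protocol itself. The sketch below uses only the standard library (wsgiref), not werkzeug, and calls the application directly without starting a server. The helper call_app is a hypothetical name of mine.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    # A WSGI app receives the request environ and a start_response callable.
    body = b'Hello, World!'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]

def call_app(app, path='/'):
    # Hypothetical test helper: call the app directly, no server needed.
    environ = {}
    setup_testing_defaults(environ)   # fills in a minimal valid environ
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], body

status, body = call_app(application)
print(status, body)
```

The Application class above follows the same contract in its __call__ method; werkzeug only makes the environ and the response easier to work with.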

Monday, July 15, 2019

Python 3.7.3 : Programming Krita.

Today I wrote a python tutorial about programming the Krita software with python.
The Krita software uses python version 3.6.2.

==== Warning: Script not saved! ====
3.6.2 (v3.6.2:5fd33b5, Jul  8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]

The full tutorial can be found at my Blogspot (a Blogspot about the graphics area).

Sunday, July 14, 2019

Python 3.7.3 : Simple tests with zipfile python module.

You can read about this python module here.
The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module will require an understanding of the format, as defined in PKZIP Application Note.

This module does not currently handle multi-disk ZIP files. It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size). It supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file. Decryption is extremely slow as it is implemented in native Python rather than C.

C:\Python373> python -m zipfile -c test.zip test.html  textalongpath.pdf
C:\Python373> python -m zipfile -c test_folder.zip test.html  temp
C:\Python373>python -m zipfile -l  test.zip
File Name                                             Modified             Size
test.html                                      2019-07-11 10:46:58         6115
textalongpath.pdf                              2019-06-08 22:55:50           84

C:\Python373>python -m zipfile -l  test_folder.zip
File Name                                             Modified             Size
test.html                                      2019-07-11 10:46:58         6115
temp/                                          2019-07-07 21:36:42            0
temp/wlop/                                     2019-07-07 21:36:42            0

These commands create two archives named test.zip and test_folder.zip with the files shown in each command.
For extraction, you need to use the -e argument:

C:\Python373>python -m zipfile -e test.zip zipfiles/

C:\Python373>python -m zipfile -e test_folder.zip zipfiles/

C:\Python373>cd zipfiles

C:\Python373\zipfiles>dir
...

Let's use this python module inside python:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> import datetime
>>> my_zip = zipfile.ZipFile('test.zip','r')
>>> print (my_zip.namelist())
['test.html', 'textalongpath.pdf']
>>> def print_info(archive_name):
...     my_zip = zipfile.ZipFile(archive_name)
...     for info in my_zip.infolist():
...             print (info.filename)
...             print ('Comment: ', info.comment)
...             print ('Modified: ', datetime.datetime(*info.date_time))
...             print ('System: ', info.create_system, '(0 = Windows, 3 = Unix)'
)
...             print ('ZIP version: ', info.create_version)
...             print ('Compressed: ', info.compress_size, 'bytes')
...             print ('Uncompressed: ', info.file_size, 'bytes')
...
>>> print_info('test_folder.zip')
test.html
Comment:  b''
Modified:  2019-07-11 10:46:58
System:  0 (0 = Windows, 3 = Unix)
ZIP version:  20
Compressed:  1679 bytes
Uncompressed:  6115 bytes
temp/
Comment:  b''
Modified:  2019-07-07 21:36:42
...

Extract all files from an archive:

>>> from zipfile import ZipFile
>>> with ZipFile('test.zip','r') as zipObj:
...     zipObj.extractall()

Extract files by extension:

>>> with ZipFile('test.zip', 'r') as zipObj:
...    listOfFileNames = zipObj.namelist()
...    for fileName in listOfFileNames:
...        if fileName.endswith('.html'):
...            zipObj.extract(fileName, 'new.html')
...
'new.html\\test.html'

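Create a new archive named The_new.zip and add new.html to it. Note that the second argument of extract() is the target directory, so new.html is actually a folder that holds test.html; this is why the archive lists new.html/ with 0 bytes: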

>>> zipObj = ZipFile('The_new.zip','w')
>>> zipObj.write('new.html')
>>> zipObj.close()
>>> print_info('The_new.zip')
new.html/
Comment:  b''
Modified:  2019-07-14 22:15:58
System:  0 (0 = Windows, 3 = Unix)
ZIP version:  20
Compressed:  0 bytes
Uncompressed:  0 bytes
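
To avoid ending up with a folder entry like new.html/, here is a self-contained sketch of the same steps. It works in a temporary directory with two small sample files that I create only for this example (the file names and contents are my assumptions), stores them with arcname, and extracts only the .html member into a target folder.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical sample files, created only for this sketch.
    html_path = os.path.join(workdir, 'test.html')
    txt_path = os.path.join(workdir, 'notes.txt')
    with open(html_path, 'w', encoding='utf-8') as f:
        f.write('<html><body>Hello</body></html>')
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write('plain text')

    # arcname stores the file under a short name instead of the full path.
    archive_path = os.path.join(workdir, 'The_new.zip')
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        archive.write(html_path, arcname='test.html')
        archive.write(txt_path, arcname='notes.txt')

    # The second argument of extract() is the target directory.
    out_dir = os.path.join(workdir, 'extracted')
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        for name in names:
            if name.endswith('.html'):
                archive.extract(name, out_dir)
    extracted = sorted(os.listdir(out_dir))

print(names)
print(extracted)
```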

Thursday, July 11, 2019

Python 3.7.3 : Three examples with BeautifulSoup.

Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree (see the pypi webpage).
This python module was created by Leonard Richardson.
A broad definition can be this:
Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique employed to extract large amounts of data from websites whereby the data is extracted and saved to a local file in your computer or to a database in table (spreadsheet) format.
This python module can do that, but the input format and output format are different.
The input can be a webpage URL or the webpage content itself, and the output depends on this input and the user's choices.
Let's see some examples:
The first example shows you how to take content from the first matching table of a wikipedia webpage.

# get table from wikipedia 
import requests
from bs4 import BeautifulSoup
website_url = requests.get('https://en.wikipedia.org/w/index.php?title=Table_of_food_nutrients').text
soup = BeautifulSoup(website_url,'lxml')

my_table = soup.find('table',{'class':'wikitable collapsible collapsed'})
links = my_table.findAll('a')
Food = []
for link in links:
    Food.append(link.get('title'))

print(Food)

The next example takes all links from a page:

# get links using the url
# urllib.request must be imported explicitly; a plain "import urllib" is not enough
import urllib.request
from bs4 import BeautifulSoup
page = urllib.request.urlopen('http://____share.net/filmes/').read()
# name the parser explicitly to avoid the bs4 warning
soup = BeautifulSoup(page, 'html.parser')
for anchor in soup.findAll('a', href=True):
    print (anchor['href'])

The last example takes all images from the search query of imgur website:

# get images from imgur search query
import urllib.request
from bs4 import BeautifulSoup
url = 'https://imgur.com/search/score?q=cyborg'
with urllib.request.urlopen(url) as f:
    soup = BeautifulSoup(f.read(),'lxml')

a_tags = soup.findAll("a",{"class":"image-list-link"})
img_tags = [a.find("img") for a in a_tags]
print(img_tags)
srcs = []
for s in img_tags:
    src_tags=('http:'+s['src'])
    srcs.append(src_tags)

print(srcs)
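
The script above builds the image addresses by putting 'http:' in front of each src, which only works for protocol-relative values like //host/file.jpg. A slightly more general sketch is to let urljoin from the standard library resolve each src against the page URL. The src values below are made up for illustration; only the page URL comes from the example.

```python
from urllib.parse import urljoin

# Only page_url comes from the example above; the src values are hypothetical.
page_url = 'https://imgur.com/search/score?q=cyborg'
raw_srcs = [
    '//i.imgur.com/abc123b.jpg',        # protocol-relative
    '/images/logo.png',                 # relative to the site root
    'https://i.imgur.com/def456b.jpg',  # already absolute
]

# urljoin keeps absolute URLs and completes the relative ones.
absolute = [urljoin(page_url, src) for src in raw_srcs]
for url in absolute:
    print(url)
```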

In conclusion, this module will pose problems for those who do not understand how to navigate the source code and content of web pages, or what arguments like 'lxml' and 'page' mean.
The F12 key in Chrome (developer tools) will greatly help you inspect parts of the web content.

Python 3.7.3 : Testing the Bokeh python module.

This python module has a beautiful website:
Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
Let's install this python module with the pip tool:

C:\Python373>cd Scripts

C:\Python373\Scripts>pip install bokeh
Collecting bokeh
...
Successfully built bokeh
Installing collected packages: PyYAML, tornado, bokeh
Successfully installed PyYAML-5.1.1 bokeh-1.2.0 tornado-6.0.3

Let's test it with a simple example:

from bokeh.plotting import figure, output_file, show

output_file("test.html")
plot = figure()
plot.line([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 7, 6, 1, 5, 6, 7, 9, 1], line_width=2)
show(plot)

This creates the file test.html in my Python folder (C:/Python373/test.html) and opens it in the browser.
The HTML page shows the graph together with tools such as Pan, Box Zoom, Wheel Zoom, Save, Reset and Help.
If you need the sample data, which is not included in the Bokeh GitHub repository or in the released packages, then you have to download it:

>>> import bokeh.sampledata
>>> bokeh.sampledata.download()
Creating C:\Users\catafest\.bokeh directory
Creating C:\Users\catafest\.bokeh\data directory
Using data directory: C:\Users\catafest\.bokeh\data
Downloading: CGM.csv (1589982 bytes)
   1589982 [100.00%]
Downloading: US_Counties.zip (3171836 bytes)
    229376 [  7.23%]

Bokeh comes with support for working with geographical data: Mercator, Google Maps, GeoJSON data.
You can see all the examples in the gallery on the website.

Wednesday, July 10, 2019

Python 3.7.3 : About python version 3.7.3.

Every released version of Python comes with new features and changes.
A full list of these changes can be found on the official PEP webpage and on the documentation webpage.
The goal of this tutorial is to collect the features worth learning in this Python version and to get a good picture of them.
Let's start with the first step: Python modules.
Several standard library packages were reorganized or moved in Python 3. Note that urllib3 in the session below is a third-party package, not part of the standard library:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> import profile
>>> import urllib
>>> import urllib3

In Python 3 the / operator always performs true division and returns a float, even for two integers, while // performs floor division. An explicit float() conversion is no longer needed:

>>> one = 1
>>> two = 2
>>> three = 3
>>> float(three)
3.0
>>> one/two
0.5
>>> one//three
0
>>> one//two
0
>>> float(one)/three
0.3333333333333333
>>>
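
As a small sketch of the two division operators, the script below also shows how floor division behaves with a negative operand and how divmod() returns quotient and remainder together. The variable names are only illustrative.

```python
# True division always returns a float; floor division rounds toward negative infinity.
one, two, three = 1, 2, 3

true_div = one / two                    # float result, even for two ints
floor_div = one // two                  # int result, rounded down
neg_floor = -one // two                 # rounds down to -1, not toward zero
quotient, remainder = divmod(7, three)  # floor quotient and remainder in one call

print(true_div, floor_div, neg_floor, quotient, remainder)
```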

You can use the pathlib library which provides the Path() object to fulfill all your path manipulation needs.

>>> from pathlib import Path
>>> folder1 = Path('/folder1')
>>> config_path = folder1 / 'subfolder1'
>>> config_path
WindowsPath('/folder1/subfolder1')
>>> str(config_path)
'\\folder1\\subfolder1'
>>> config_path.name
'subfolder1'
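
A few more path operations can be sketched with PurePosixPath, which I use here only so that the result is the same on every operating system. The file name settings.ini is a hypothetical example.

```python
from pathlib import PurePosixPath

# Build a path with the / operator; no file system access is needed for pure paths.
config_file = PurePosixPath('/folder1') / 'subfolder1' / 'settings.ini'

print(config_file.parts)                      # the individual components
print(config_file.parent)                     # the containing folder
print(config_file.stem, config_file.suffix)   # name without and with extension

# with_suffix() returns a new path, the original is unchanged.
backup_file = config_file.with_suffix('.bak')
print(backup_file)
```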

You can use the @ operator for matrix multiplication (added in Python 3.5), while * stays element-wise for NumPy arrays:

>>> import numpy as np
>>> x = np.array([[11, 33], [22, 55]])
>>> y = np.array([[1, 3], [2, 5]])
>>> x @ y
array([[ 77, 198],
       [132, 341]])
>>> x * y
array([[ 11,  99],
       [ 44, 275]])
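
The @ operator works for any class that defines __matmul__, not only for NumPy arrays. The sketch below uses a hypothetical minimal Matrix class (not part of any library) with the same numbers as the session above.

```python
class Matrix:
    """Hypothetical minimal matrix type, used only to show the @ protocol."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # Row-by-column products: the classic matrix multiplication.
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns]
             for row in self.rows]
        )

    def __mul__(self, other):
        # Element-wise product, mirroring what * does for NumPy arrays.
        return Matrix(
            [[a * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(self.rows, other.rows)]
        )

x = Matrix([[11, 33], [22, 55]])
y = Matrix([[1, 3], [2, 5]])
print((x @ y).rows)
print((x * y).rows)
```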

Lists and dictionaries can easily be emptied using the .clear() method:

>>> my_list = ['a','b','c']
>>> my_list.clear()

print is now a function and accepts keyword arguments such as file:

>>> print('hello')
hello
>>> a = ''
>>> f = open('my_file.txt', 'w')
>>> print(a, file=f)
>>> f.close()
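
The other keyword arguments of print() can be sketched the same way. Here an in-memory io.StringIO buffer stands in for the open file, which is my assumption to keep the example free of files on disk.

```python
import io

buffer = io.StringIO()   # in-memory stand-in for an open text file

# sep joins the values, end replaces the default newline, file picks the target.
print('a', 'b', 'c', sep='-', end='!\n', file=buffer)
print('second line', file=buffer)

captured = buffer.getvalue()
print(captured, end='')
```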

The function annotations can provide information on inputs/outputs:

>>> def a_to_b(x: str) -> str:
...     return x.replace('a','b')
...
>>> a_to_b("abcdcba!?!")
'bbcdcbb!?!'
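
Annotations are stored on the function but are not checked when the function runs. A minimal sketch, with a hypothetical function named double:

```python
def double(x: str) -> str:
    # The annotations say "str", but nothing enforces them at run time.
    return x * 2

hints = double.__annotations__
print(hints)
print(double('ab'))   # works as annotated
print(double(21))     # an int is accepted as well
```

If you want the annotations to be enforced, you need an external type checker or your own validation code.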

Comparisons between unrelated types no longer return an arbitrary result; they raise a TypeError:

>>> 'True' > True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'str' and 'bool'

Python 3.6 added f-strings, a new kind of string literal with built-in interpolation:

>>> var = 76/3
>>> f'The value is {var}.'
'The value is 25.333333333333332.'
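
An f-string also accepts a format specification after a colon and any expression inside the braces. A short sketch with the same value:

```python
var = 76 / 3

two_decimals = f'The value is {var:.2f}.'   # fixed-point with two decimals
padded = f'[{var:10.3f}]'                   # right-aligned in a field of width 10
expression = f'{2 + 3} items'               # any expression is evaluated in place

print(two_decimals)
print(padded)
print(expression)
```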

You can use underscores in numeric literals (here a decimal and a binary one):

>>> int_a = 1_000_000_000
>>> bin_b = 0b_0011_1111_0100_1110
>>> print(int_a,bin_b)
1000000000 16206
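
Underscores are also accepted by int() when it parses a string, and the format mini-language can put them back on output. A small sketch:

```python
int_a = 1_000_000_000

parsed = int('1_000')               # underscores are allowed in the string too
grouped_decimal = f'{int_a:_}'      # one underscore every three digits
grouped_binary = f'{16206:_b}'      # one underscore every four binary digits

print(parsed, grouped_decimal, grouped_binary)
```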

Strings are Unicode by default and names can contain non-ASCII letters (emoji are allowed in strings, but not in variable names).
An LRU cache decorator for your functions: functools.lru_cache.
An enumerated type in the standard library: Enum.
Use the standard ipaddress module to work with IP addresses:

>>> import ipaddress
>>> ipaddress.ip_address('192.168.0.1')
IPv4Address('192.168.0.1')
>>> ipaddress.ip_address('2001:db8::')
IPv6Address('2001:db8::')
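
The functools.lru_cache decorator and the Enum type mentioned above can be sketched in a few lines. The function square, the list calls and the class Color are hypothetical names for this illustration.

```python
from enum import Enum
from functools import lru_cache

calls = []   # records which arguments actually reached the function body

@lru_cache(maxsize=None)
def square(n):
    calls.append(n)
    return n * n

square(4)
square(4)    # served from the cache, the body does not run again
info = square.cache_info()

class Color(Enum):
    RED = 1
    GREEN = 2

print(info.hits, info.misses, calls)
print(Color.RED, Color(2), Color['GREEN'].value)
```

Members of an Enum can be looked up by value with Color(2) or by name with Color['GREEN'].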

In Python 3, round() rounds values that lie exactly halfway (.5) to the nearest even number.
The input() function was fixed in Python 3 so that it always returns the user input as a str object.
In Python 3, range() is implemented like xrange() in the older versions: it is a lazy sequence, not a list.
The range object got its own __contains__ method in Python 3.x, so a membership test on integers does not need to walk through every value.
You can simply convert a range, or any other iterable object, into a list via the list() function.

>>> print(range(3))
range(0, 3)
>>> print(type(range(3)))
<class 'range'>
>>> print(list(range(3)))
[0, 1, 2]
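
Both the rounding rule and the membership test on a range are easy to check. In this sketch the huge range is only my example to show that no list is built in memory.

```python
# Halfway values go to the nearest even integer.
halves = [round(0.5), round(1.5), round(2.5), round(3.5)]
print(halves)

# A range stores only start, stop and step, so this takes no noticeable memory.
big_range = range(0, 10**15, 2)
contains_even = 10**12 in big_range   # computed arithmetically, no iteration
contains_odd = 7 in big_range
print(contains_even, contains_odd)
```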

With advanced unpacking and range you can do this:

>>> a, b = range(2)
>>> print(a,b)
0 1
>>> a, b, *rest = range(6)
>>> print(a,b)
0 1
>>> print(rest)
[2, 3, 4, 5]
>>> a, *rest, b = range(6)
>>> print( a,b, rest)
0 5 [1, 2, 3, 4]

Get the first and the last line of an open file with:

first, *_, last = f.readlines()
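
A self-contained sketch of this unpacking, where an io.StringIO object stands in for the open file (my assumption, to avoid a file on disk). Note that the unpacking needs at least two lines, otherwise it raises a ValueError.

```python
import io

# In-memory stand-in for an open text file with three lines.
f = io.StringIO('first line\nmiddle line\nlast line\n')

# *_ swallows everything between the first and the last element.
first, *_, last = f.readlines()
print(first.strip(), '|', last.strip())
```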

Keyword-only arguments can be declared after *args:

def f(a, b, *args, option=True):
    ...

The only way to set option is to pass it explicitly by name: f(a, b, option=True).
If you don't want to collect *args, then use a bare * instead:

def f(a, b, *, option=True):
    ...
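
A runnable sketch of the bare * form, with a function body that I added only for the illustration. It shows that a third positional argument is rejected.

```python
def f(a, b, *, option=True):
    # Everything after the bare * can only be passed by keyword.
    return (a, b, option)

ok = f(1, 2, option=False)

try:
    f(1, 2, False)            # a third positional argument is not accepted
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(ok)
print(error_message)
```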

You can just use os.stat(file, follow_symlinks=False) instead of os.lstat.
The .next() method of iterators is gone; the built-in next() function is the only way to advance a generator by hand:

>>> my_generator = (letter for letter in 'abcd')
>>> next(my_generator)
'a'
>>> next(my_generator)
'b'
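
The built-in next() also takes a default value that is returned when the generator is exhausted, instead of raising StopIteration. A minimal sketch:

```python
my_generator = (letter for letter in 'ab')

first = next(my_generator)
second = next(my_generator)
fallback = next(my_generator, 'done')   # default instead of StopIteration

# Python 3 generators have __next__, but no .next() method.
has_next_method = hasattr(my_generator, 'next')

print(first, second, fallback, has_next_method)
```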

The loop variable of a comprehension doesn't leak into the surrounding namespace anymore:

>>> i = 1
>>> print('comprehension:', [i for i in range(6)])
comprehension: [0, 1, 2, 3, 4, 5]
>>> print('i is ', i)
i is  1
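
The difference from a plain for statement can be sketched side by side: the comprehension keeps its own i, while the for loop rebinds the outer one.

```python
i = 1
squares = [i * i for i in range(6)]
after_comprehension = i     # the comprehension had its own i

for i in range(6):
    pass
after_loop = i              # a regular for loop does rebind the outer name

print(after_comprehension, after_loop)
```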

The async and await are now reserved keywords.
Exceptions are more useful, and the comma in the except clause was replaced by the keyword as:

>>> import errno
>>> try:
...     f = open('my_file.txt')
... except OSError as e:
...     if e.errno == errno.ENOENT:
...         pass  # the file does not exist, handle it here
...     else:
...         raise
...
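
Since Python 3.3 the common errno values also have their own OSError subclasses, so the errno check can often be replaced by a more specific except clause. A sketch, assuming a hypothetical file name inside a fresh temporary folder so that the file is guaranteed to be missing:

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; nothing was created in the new folder.
    missing_path = os.path.join(tmp_dir, 'no_such_file.txt')
    try:
        open(missing_path)
    except FileNotFoundError as e:
        caught = e   # keep a reference, e is removed after the except block

print(type(caught).__name__, caught.errno == errno.ENOENT)
```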

In Python 3, the .keys() method returns a view object instead of a list.

>>> my_dict = {'a': 11, 'b': 12, 'c': 13, 'd': 14}
>>> my_dict.keys()
dict_keys(['a', 'b', 'c', 'd'])
>>> my_dict_keys=list(my_dict.keys())
>>> my_dict_keys[3]
'd'
>>> my_dict.keys()[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_keys' object is not subscriptable
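
The object returned by .keys() follows later changes to the dictionary and supports set operations. A short sketch:

```python
my_dict = {'a': 11, 'b': 12}
keys_view = my_dict.keys()

my_dict['c'] = 13                       # the existing view reflects the new key
live_keys = list(keys_view)

common = keys_view & {'a', 'c', 'z'}    # intersection with an ordinary set
first_key = next(iter(my_dict))         # first key without building a list

print(live_keys, sorted(common), first_key)
```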

Keyword-only arguments and positional parameters are valid in Python 3.7.3.
Chained exceptions in Python 3.7.3 carry more information: the original exception is printed out along with the original traceback. In the session below, the first attempt fails only because the names exception and e are not defined.

>>> raise exception from e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'exception' is not defined
>>> raise NotImplementedError from OSError
OSError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError
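
In real code the raise ... from form is normally used inside an except block, and the original exception stays available as __cause__. A minimal sketch with a hypothetical function load_setting:

```python
def load_setting():
    try:
        int('not a number')
    except ValueError as e:
        # Attach the low-level error as the direct cause of the new one.
        raise RuntimeError('could not load setting') from e

try:
    load_setting()
except RuntimeError as err:
    cause = err.__cause__
    message = str(err)

print(message, '<-', type(cause).__name__)
```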

This was a brief overview of Python 3.7.3 features that might be of use to you.

Tuesday, July 9, 2019

Python 3.7.3 : Testing Wagtail.

Wagtail is a beautiful project: a content management system built on Django.
In this tutorial, I will show you the steps for a first install of a basic website.

C:\Python373>python -m virtualenv venv_wagtail
Using base prefix 'C:\\Python373'
New python executable in C:\Python373\venv_wagtail\Scripts\python.exe
Installing setuptools, pip, wheel...
done.

C:\Python373>venv_wagtail\Scripts\activate.bat

(venv_wagtail) C:\Python373>pip install wagtail
Collecting wagtail
...
Successfully built django-treebeard draftjs-exporter
Installing collected packages: sqlparse, pytz, Django, django-taggit, chardet, i
dna, urllib3, certifi, requests, django-treebeard, draftjs-exporter, beautifulso
up4, django-modelcluster, djangorestframework, Willow, six, Pillow, webencodings
, html5lib, Unidecode, wagtail
Successfully installed Django-2.2.3 Pillow-5.4.1 Unidecode-1.1.1 Willow-1.1 beau
tifulsoup4-4.6.0 certifi-2019.6.16 chardet-3.0.4 django-modelcluster-4.4 django-
taggit-0.24.0 django-treebeard-4.3 djangorestframework-3.9.4 draftjs-exporter-2.
1.6 html5lib-1.0.1 idna-2.8 pytz-2019.1 requests-2.22.0 six-1.12.0 sqlparse-0.3.
0 urllib3-1.25.3 wagtail-2.5.1 webencodings-0.5.1

(venv_wagtail) C:\Python373>wagtail start mysite
Creating a Wagtail project called mysite

(venv_wagtail) C:\Python373>cd mysite

(venv_wagtail) C:\Python373\mysite>python manage.py migrate

(venv_wagtail) C:\Python373\mysite>python manage.py createsuperuser
Username (leave blank to use 'catafest'):
Email address: catafest@yahoo.com
Password:
Password (again):
This password is too short. It must contain at least 8 characters.
Bypass password validation and create user anyway? [y/N]: n
Password:
Password (again):
Superuser created successfully.

(venv_wagtail) C:\Python373\mysite>manage.py runserver
Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).
July 09, 2019 - 21:20:46
Django version 2.2.3, using settings 'mysite.settings.dev'
Starting development server at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.

Success! mysite has been created

Let's see the result of the first Wagtail test on my YouTube channel:

Python 3.7.3 : The python-slugify python module.

The Python module named python-slugify turns text into slugs and can handle Unicode.
You can see the source code and examples of this module on its GitHub page.
The install step with the pip tool is easy:

C:\Python373>cd Scripts
C:\Python373\Scripts>pip install python-slugify
Collecting python-slugify
...
Successfully built python-slugify
Installing collected packages: text-unidecode, python-slugify
Successfully installed python-slugify-3.0.2 text-unidecode-1.2

Let's see a simple example:

>>> from slugify import slugify
>>> txt = "___This is a test___"
>>> regex_pattern = r'[^-a-z0-9_]+'
>>> r = slugify(txt, regex_pattern=regex_pattern)
>>> print(r)
___this-is-a-test___
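
To get a feeling for what a slug function does, here is a rough sketch that uses only the standard library. The helper simple_slug is hypothetical and is not the python-slugify implementation; it only approximates the default behavior.

```python
import re
import unicodedata

def simple_slug(text):
    """Hypothetical helper, not the python-slugify implementation."""
    # Decompose accented characters and drop what cannot be expressed in ASCII.
    ascii_text = (unicodedata.normalize('NFKD', text)
                  .encode('ascii', 'ignore')
                  .decode('ascii'))
    # Replace every run of non-alphanumeric characters with one hyphen.
    return re.sub(r'[^a-z0-9]+', '-', ascii_text.lower()).strip('-')

print(simple_slug('___This is a test___'))
print(simple_slug('Caf\u00e9 d\u00e9j\u00e0 vu!'))
```

Unlike the regex_pattern call above, this sketch treats the underscore as a separator, so the leading and trailing underscores are removed.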

Remove an email address from a string:

>>> txt_email = "___My mail is catafest@yahoo.com *&@!@#$$76"
>>> regex_pattern =r'[\w\.]+\@[\w]+(?:\.[\w]{3}|\.[\w]{2}\.[\w]{2})\b'
>>> r = slugify(txt_email, regex_pattern=regex_pattern)
>>> print(r)
___my mail is - *&@!@#$$76

You can write your data into a text file:

>>> ftxt = open('C:\\Python373\\soup.txt','w')
>>> ftxt.write(soup)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: write() argument must be str, not BeautifulSoup
>>> ftxt.write(str(soup))

If you get errors, then use slugify to clean the text before writing it:

ftxt.write(slugify(str(soup)))