python-catalin

Tuesday, November 5, 2019

Python 3.7.5 : About PEP 3107.

PEP 3107 introduces a syntax for adding arbitrary metadata annotations to Python functions.
A function annotation attaches an expression to a parameter, and the -> syntax attaches one to the return value.

def my_function(x: expression, y: expression = 5):
    ...

For example:

>>> import numpy as np
>>> def show(myvar: np.float64):
...     print(type(myvar))
...     print(myvar)
... 
>>> show(1.1)
<class 'float'>
1.1
>>> def files(filename: str, dot='.') -> list:
...     print(filename)
...     print(type(filename))
... 
>>> files('file.txt')
file.txt
<class 'str'>
>>> print(files.__annotations__)
{'filename': <class 'str'>, 'return': <class 'list'>}
>>> print(show.__annotations__)
{'myvar': <class 'numpy.float64'>}

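The annotations are stored in a dictionary, available as the __annotations__ attribute of the function.
The annotation syntax itself exists only in Python 3, but because the result is an ordinary dictionary, code that must also run on Python 2 can fill in __annotations__ by other means (for example, with a decorator) instead of using the syntax.
Type hints (PEP 484) are a specialization of function annotations, and they can also work side by side with other function annotations.
Annotations have no standard meaning or semantics: Python evaluates and stores them but does not enforce them, which is why show(1.1) above accepts a float even though the parameter is annotated as np.float64.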
Annotations have several benefits:

if you rename an argument, a docstring that describes it may go out of date, while an annotation stays attached to the parameter, and it is easier to see when an argument is not documented;
there is no need to come up with a special format for documenting arguments, because the __annotations__ attribute provides a direct, standard mechanism of access.
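Since type hints are built on the same mechanism, here is a small sketch of the difference between reading __annotations__ directly and using typing.get_type_hints. The greet function is hypothetical; it uses string annotations, which __annotations__ keeps as plain strings while get_type_hints evaluates them into real objects.

```python
import typing

def greet(name: 'str', times: int = 1) -> 'str':
    """Annotations may be any expression, including strings."""
    return ' '.join(['Hello, ' + name] * times)

# __annotations__ stores exactly what was written, strings included
raw = greet.__annotations__
# typing.get_type_hints evaluates the string annotations into real classes
resolved = typing.get_type_hints(greet)

print(raw)       # {'name': 'str', 'times': <class 'int'>, 'return': 'str'}
print(resolved)  # {'name': <class 'str'>, 'times': <class 'int'>, 'return': <class 'str'>}
```

Note that the return annotation is stored under the key 'return', which cannot clash with a parameter name because return is a keyword.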

Let's see one example using type aliases:


>>> Temperature = float
>>> def forecast(local_temperature: Temperature) -> str:
...     return str(local_temperature)
... 
>>> forecast(13.1)
'13.1'
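A type alias is only a second name for the same object, so it can also be combined with generic types from the typing module. A minimal sketch, where the Readings alias and the average function are hypothetical:

```python
from typing import List

# a type alias is just another name bound to the same object
Temperature = float
# aliases can be built from other aliases and generic types
Readings = List[Temperature]

def average(readings: Readings) -> Temperature:
    """Return the mean of a list of temperature readings."""
    return sum(readings) / len(readings)

print(Temperature is float)               # True
print(average([12.5, 13.1, 14.0]))
print(average.__annotations__['return'])  # <class 'float'>
```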

An annotation can also be a richer expression, such as a dictionary holding several pieces of metadata:


>>> def div(a: dict(type=float, help='the dividend'),
...         b: dict(type=float, help='the divisor, must be != 0')
...         ) -> dict(type=float, help='the result of dividing a by b'):
...     return a / b
... 
>>> div(3, 4)
0.75
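Because these dictionaries are ordinary values in __annotations__, a tool can read them back. A minimal sketch with a hypothetical describe helper that builds a help text from the 'type' and 'help' keys used above:

```python
def div(a: dict(type=float, help='the dividend'),
        b: dict(type=float, help='the divisor, must be != 0')
        ) -> dict(type=float, help='the result of dividing a by b'):
    return a / b

def describe(func):
    """Build a help text from dict-style annotations (hypothetical helper)."""
    lines = []
    for name, note in func.__annotations__.items():
        # the return annotation is stored under the key 'return'
        label = 'returns' if name == 'return' else name
        lines.append('{}: {} ({})'.format(label, note['help'], note['type'].__name__))
    return '\n'.join(lines)

print(describe(div))
# a: the dividend (float)
# b: the divisor, must be != 0 (float)
# returns: the result of dividing a by b (float)
```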

Annotations can also be added to excess parameters such as *args and **kwargs, which allow an arbitrary number of arguments to be passed in a function call.
See the example with my_func:

def my_func(*args: expression, **kwargs: expression):
    ...
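A concrete version, using a hypothetical total function. By the type-hint convention of PEP 484, the annotation on *args describes each extra positional argument (not the tuple), and the one on **kwargs describes each keyword value; the keys in __annotations__ are the bare names args and kwargs.

```python
def total(*args: float, **kwargs: str) -> float:
    """Sum the positional arguments and print any keyword labels."""
    for key, value in kwargs.items():
        print(key, '=', value)
    return sum(args)

print(total(1.5, 2.5, unit='kg'))  # prints "unit = kg", then 4.0
print(total.__annotations__)
# {'args': <class 'float'>, 'kwargs': <class 'str'>, 'return': <class 'float'>}
```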

Annotations combine well with decorators: they can provide input to a decorator, and decorator-generated wrappers are a good place to put code that gives meaning to annotations, but that is a topic for another post.
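As a small taste of that idea, here is a sketch of a hypothetical check_types decorator whose wrapper gives the annotations a meaning: it raises TypeError when an argument does not match an annotation that is a plain class. This is only an illustration, not a standard mechanism; annotations that are not classes, and *args/**kwargs, are simply skipped.

```python
import functools
import inspect

def check_types(func):
    """Hypothetical decorator: enforce annotations that are plain classes."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # map the call arguments to parameter names, filling in defaults
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            param = signature.parameters[name]
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue  # *args and **kwargs are not checked in this sketch
            expected = func.__annotations__.get(name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError('{} must be {}, got {}'.format(
                    name, expected.__name__, type(value).__name__))
        return func(*args, **kwargs)
    return wrapper

@check_types
def files(filename: str, dot='.') -> list:
    return filename.split(dot)

print(files('file.txt'))  # ['file', 'txt']
try:
    files(42)
except TypeError as error:
    print(error)          # filename must be str, got int
```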

Monday, November 4, 2019

Python 3.7.5 : About PEP 506.

Today I did a Python evaluation and saw that there are many new aspects a programmer should keep in mind.
So I decided to recall some important PEPs.
First, PEP stands for Python Enhancement Proposal.
A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment.
My list will not follow a particular order and I will start with PEP 506.
PEP 506 proposes adding to the Python standard library a module for common security-related functions, such as generating tokens.
Python 3.6 added a new module called secrets that is designed to provide an obvious way to reliably generate cryptographically strong pseudo-random values suitable for managing secrets, such as account authentication, tokens, and similar.
Python's random module was never designed for cryptographic use; before secrets, a common approach was the os.urandom function, which returns random bytes from the operating system:

[mythcat@desk ~]$ python3
Python 3.7.5 (default, Oct 17 2019, 12:09:47) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.urandom(8)
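Two standard-library ways of using that operating system source, as a short sketch: converting the raw bytes to hexadecimal text, and random.SystemRandom, a class in the random module that draws from os.urandom instead of the default Mersenne Twister generator.

```python
import os
import random

# os.urandom returns n random bytes from the operating system's source
raw = os.urandom(8)
print(raw.hex())          # 16 hexadecimal digits

# random.SystemRandom offers the usual random API on top of os.urandom
rng = random.SystemRandom()
print(rng.randint(1, 6))  # a dice roll from the OS source
```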

The secrets module contains a set of ready-to-use functions for dealing with anything that should remain secret (passwords, tokens, etc.). For example, this generates a 20-character alphanumeric password:

>>> import secrets
>>> import string
>>> alphabet = string.ascii_letters + string.digits
>>> password = ''.join(secrets.choice(alphabet) for i in range(20)) 
>>> print(alphabet, password)
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 mwTKhSxGGBMU3voOV1Kf
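If the password must satisfy some rules, one simple approach, adapted from the recipes in the secrets documentation, is to draw candidates until one passes. In this sketch the make_password name and its rules (at least one lowercase letter, one uppercase letter and three digits) are only an example:

```python
import secrets
import string

def make_password(length=10):
    """Return a random alphanumeric password with at least one lowercase
    letter, one uppercase letter and three digits."""
    if length < 5:
        raise ValueError('length must be at least 5 to satisfy the rules')
    alphabet = string.ascii_letters + string.digits
    while True:
        password = ''.join(secrets.choice(alphabet) for _ in range(length))
        if (any(c.islower() for c in password)
                and any(c.isupper() for c in password)
                and sum(c.isdigit() for c in password) >= 3):
            return password

print(make_password(12))
```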

The secrets module also provides several functions for generating tokens.
As bytes, with secrets.token_bytes:

>>> secrets.token_bytes()
b'I\xf9a\xd1j\xc6\xc9\xa0qV\x82\x07x\xc6\xe9\xbb\xd7<\xfb\xb2?\xe1\x94\xe9\xce\xbc\xaaF\xfc7\xfc='
>>> secrets.token_bytes(8)
b'\rl\xb1\xb9\x04i]d'
>>> secrets.token_bytes(16)
b'B!:G\x1c\xdd.\xacC\x7f\x95)\x1f^\xec\xb2'
>>> secrets.token_bytes(32)
b'\xfa\xa9\xff\x91y\x9e+z9\x88K\x95\xa8\xb0\x06\xc2b:\xf5]\xcf^%~\x0cJ\xdd\x80\xa2\xa0\xdc\xaa'
>>> secrets.token_bytes(64)
b"\xe4(\x80d7c6\\\xb2\xd5\xcb\x92\x8a'\x82\xcb\xfd\xcc\x9a\x8a\xd9jt\x84s\xb0\x8f]\x8cS\xdcP\n\xef\x14\xf6\xe0+0\xaf\xcfL\xd3\xd0\xfe\x04\x98k\xc38\xf6\xad.~\xd1\xca\xd6\xc9\xf9\xbf\xff8O\xad"

As text of hexadecimal digits, with secrets.token_hex:

>>> secrets.token_hex()
'5a2eb8a0a89ecaf5a64e57215f359012eaaf8a3db51bd1ea171e922a24935183'
>>> secrets.token_hex(8)
'79e7582b72711af7'
>>> secrets.token_hex(16)
'9b274380935ae169ebd41159f7b85cf6'
>>> secrets.token_hex(32)
'0a2e5fde42c6578c3ba36501b69a9339e838d44c3240999a83d349d266bcb164'
>>> secrets.token_hex(64)
'fbd9ab627e9fe6c2b6d715b1438205321ac9139f5089fe6ca4ffece79aa0c08aa84a26fdbb984dc48a0489e1692b19d3f5fe40116be60f1a1d7d61739718befe'

As URL-safe Base64-encoded text, with secrets.token_urlsafe:

>>> secrets.token_urlsafe()
'L06rX6fIk1n-gpcLbsHq_w5SgkqgGcvnkjBRcOZqgXs'
>>> secrets.token_urlsafe(8)
'lhOw5llcgsQ'
>>> secrets.token_urlsafe(16)
'A493DgcDMiNx8WjlRswxBA'
>>> secrets.token_urlsafe(32)
'HSb5dqkaPrqFcdsQFYW5N_Fxb_Hxn0ESsT4VMfJcLYY'
>>> secrets.token_urlsafe(64)
'FKPC0LU7Sc_dsxm7m-VMA-vTEKgJeNcD2zpjKBEg0oLZlPBVVM0O5Vztp0ySLifyifok5009LByQUc5z8thCWQ'
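The argument of all three functions is a number of random bytes, not a number of characters, which explains the lengths above: two hex digits per byte, and about 1.3 Base64 characters per byte once the '=' padding is stripped (in the sessions above, the calls without an argument used 32 bytes). A short sketch that checks this and shows secrets.compare_digest, the module's function for comparing secret values in a way that reduces the risk of timing attacks:

```python
import secrets

nbytes = 16
print(len(secrets.token_bytes(nbytes)))    # 16 bytes
print(len(secrets.token_hex(nbytes)))      # 32 hexadecimal digits
print(len(secrets.token_urlsafe(nbytes)))  # 22 Base64 characters

# compare a stored token with a received one without short-circuiting
stored = secrets.token_urlsafe(16)
received = stored
print(secrets.compare_digest(stored, received))  # True
```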

Saturday, November 2, 2019

Python 3.7.5 : Intro about scikit-learn python module.

The Python module scikit-learn (imported as sklearn) is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy, and comes with various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN.
The official webpage can be found here.
Let's install it on my Fedora 30 distro:

[mythcat@desk proiecte_github]$ mkdir sklearn_examples
[mythcat@desk proiecte_github]$ cd sklearn_examples/
[mythcat@desk sklearn_examples]$ pip3 install scikit-learn --user
[mythcat@desk sklearn_examples]$ python3
Python 3.7.5 (default, Oct 17 2019, 12:09:47) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> print('sklearn: %s' % sklearn.__version__)
sklearn: 0.21.3

This is a complex Python module, with many examples on the web.
You can learn a lot about how to use it for data mining and data analysis, simply and efficiently.
The mathematical functions behind it range from simple to complex, and the existing examples can serve as learning points for both the mathematics and the Python programming.
I would start by exploring the input and output datasets and then continue with clear examples of everyday use.
Today I tested the SVC class from the sklearn module.
SVMs were first introduced in the 1960s and were later refined in the 1990s.
The algorithm is based on the decision boundary that maximizes the distance (the margin) to the nearest data points of all the classes.
The Wikipedia article on support-vector machines (SVM) covers them in detail.
Applications include the medical field (for example, cell counting or similar cell quantification), astronomy, etc.
This simple example uses multiple kernels and gamma values to classify the input data.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# load the iris dataset and keep only the first two features (sepal length and width)
iris = datasets.load_iris()
x = iris.data[:, :2]
y = iris.target

def plotSVC(svc, title):
  # create a mesh covering the dataset, with a margin of 1 on each side
  x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
  y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
  # step size: split the x range into 100 steps
  h = (x_max - x_min) / 100
  # create the meshgrid
  xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
  # create a single subplot in the current figure
  plt.subplot(1, 1, 1)
  # predict the class of every point of the mesh
  Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
  # reshape the predictions back to the shape of the mesh
  Z = Z.reshape(xx.shape)
  # draw the decision regions and the training points
  plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
  plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Paired)
  plt.xlabel('Sepal length')
  plt.ylabel('Sepal width')
  plt.xlim(xx.min(), xx.max())
  plt.title(title)
  plt.show()

# kernels for the SVC
kernels = ['linear', 'rbf', 'poly']
# for each kernel, fit a classifier and show the graph
for kernel in kernels:
  svc = svm.SVC(kernel=kernel).fit(x, y)
  plotSVC(svc, 'kernel=' + str(kernel))
# the gamma parameter defines how far the influence of a single training example reaches
gammas = [0.1, 1, 10, 100, 1000]
# for each gamma value with the rbf kernel, fit a classifier and show the graph
for gamma in gammas:
  svc = svm.SVC(kernel='rbf', gamma=gamma).fit(x, y)
  plotSVC(svc, 'gamma=' + str(gamma))

See the last result for kernel rbf and gamma 1000.
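The plots show the decision regions, but not how well each model generalizes. A possible follow-up, sketched under the assumption that a simple 70/30 train/test split is good enough for a first look: compare the training and test accuracy for each gamma. A very large gamma makes each training point's influence very local, so a gap between the two scores is worth watching for; no particular numbers are claimed here.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# same data as above: the first two iris features
iris = datasets.load_iris()
x = iris.data[:, :2]
y = iris.target

# hold out 30% of the samples to measure accuracy on unseen data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0, stratify=y)

scores = {}
for gamma in [0.1, 1, 10, 100, 1000]:
    svc = svm.SVC(kernel='rbf', gamma=gamma).fit(x_train, y_train)
    # score() returns the mean accuracy on the given data
    scores[gamma] = (svc.score(x_train, y_train), svc.score(x_test, y_test))
    print('gamma={}: train={:.2f} test={:.2f}'.format(gamma, *scores[gamma]))
```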

Python 3.7.5 : The ani script with ascii.

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices (see Wikipedia).
This is a simple script named ani.py that I created to show an ASCII animation:

import os, time

def clear():
    # 'cls' clears the terminal on Windows, 'clear' on Linux and macOS
    os.system('cls' if os.name == 'nt' else 'clear')

filenames = ["0.txt", "1.txt", "2.txt", "3.txt"]
frames = []
# read each frame file as a list of lines
for name in filenames:
    with open(name, "r", encoding="utf8") as f:
        frames.append(f.readlines())
# play the animation four times, showing one frame per second
for i in range(4):
    for frame in frames:
        clear()
        print("".join(frame))
        time.sleep(1)
clear()

You need four text files, each holding an 8x8 character matrix: 0.txt, 1.txt, 2.txt and 3.txt.
The content of these files:

$ cat *.txt
        
 ###### 
        
        
        
        
 ###### 
                 
        
 ###### 
        
        
 ###### 
        
                 
        
        
  ####  
  ####  
        
        
                 
        
        
   ##   
   ##

The end result is a square that shrinks down to four # characters.
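If you prefer not to type the frame files by hand, a small helper can generate them. This is a hypothetical variation, not the files listed above: it draws a filled square of side 8, 6, 4 and 2, centered in an 8x8 matrix, so the last frame also ends with four # characters.

```python
def make_frame(side, size=8):
    """Return a size x size text frame with a centered, filled square of '#'."""
    offset = (size - side) // 2
    rows = []
    for row in range(size):
        if offset <= row < offset + side:
            rows.append(" " * offset + "#" * side + " " * (size - offset - side))
        else:
            rows.append(" " * size)
    return "\n".join(rows) + "\n"

def write_frames(sides=(8, 6, 4, 2), size=8):
    """Write one frame per side length to 0.txt, 1.txt, ... (hypothetical helper)."""
    for index, side in enumerate(sides):
        with open("{}.txt".format(index), "w", encoding="utf8") as f:
            f.write(make_frame(side, size))
```

Calling write_frames() once in the script's folder creates the four files that ani.py reads.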

Saturday, October 26, 2019

Python 3.7.4 : About the PyOpenCL python module.

PyOpenCL lets you access GPUs and other massively parallel compute devices from Python.
It is important to note that OpenCL is not restricted to GPUs.
In fact, no special hardware is required to use OpenCL for computation: your existing CPU is enough, as long as an OpenCL driver for it is installed.
The documentation of this project can be found at this website.
Let's install the Python module for Python 3:

[mythcat@desk ~]$ pip3 install pyopencl --user
Collecting pyopencl
...
Successfully built pytools
Installing collected packages: pytools, pyopencl
Successfully installed pyopencl-2019.1.1 pytools-2019.1.1

The OpenCL driver can be installed with these commands:

# get OpenCL driver automated installer (installs kernel 4.7)
curl https://software.intel.com/sites/default/files/managed/f6/77/install_OCL_driver.sh_.txt > install_OCL_driver.sh
chmod +x install_OCL_driver.sh
# install OpenCL driver
sudo ./install_OCL_driver.sh install
# check
ls /boot/vmlinuz-*intel*

This is a simple Python script to test the OpenCL context:

import pyopencl as cl
# collect the devices of every OpenCL platform, split into GPUs and CPUs
gpus, cpus = [], []
for platform in cl.get_platforms():
    for device in platform.get_devices():
        if device.type & cl.device_type.GPU:
            gpus.append(device)
        elif device.type & cl.device_type.CPU:
            cpus.append(device)
# create a context on a GPU if one is available, otherwise on a CPU
device = gpus[0] if gpus else cpus[0]
ctx = cl.Context([device])
print(device.name)
# create a command queue for kernel execution
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

This is another simple python script:

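# -*- coding: utf-8 -*-
import pyopencl as cl
import numpy
a = numpy.random.rand(50000).astype(numpy.float32)
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
# allocate a device buffer of the same size as the array
a_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=a.nbytes)
# copy the host array to the device buffer
cl.enqueue_copy(queue, a_buf, a)

# build a kernel that doubles every element in place
prg = cl.Program(ctx,
"""
__kernel void twice(__global float *a)
{
 int gid = get_global_id(0);
 a[gid] *= 2;
}"""
).build()

# run one work-item per element and wait for completion
prg.twice(queue, a.shape, None, a_buf).wait()
# copy the result back to the host and check it
result = numpy.empty_like(a)
cl.enqueue_copy(queue, result, a_buf)
print(numpy.allclose(result, 2 * a))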

Sunday, October 20, 2019

Python 3.7.4 : Using pytesseract for text recognition.

You can read about this Python module, pytesseract, a wrapper for the Tesseract OCR engine, here.
I tested it with the tesseract tool installed on my Fedora 30 distro and the pytesseract Python module, version 0.3.0.

[root@desk mythcat]# dnf install tesseract
Last metadata expiration check: 0:24:18 ago on Sun 20 Oct 2019 10:56:23 AM EEST.
Package tesseract-4.1.0-1.fc30.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
[root@desk mythcat]# whereis tesseract
tesseract: /usr/bin/tesseract /usr/share/tesseract
[mythcat@desk ~]$ pip3 install pytesseract --user
Collecting pytesseract
...
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.0

I tested it with many images and texts, and it works very well.
Images of text in a printed font are recognized very well.
The test with this handwriting image, however, does not have very good accuracy.

This is the result for the handwriting image:

[mythcat@desk ~]$ python3 ocr_image.py 001.png 
rake Yous mnislakes,
take you chances,
look silby,

bul hep. mv going
dont freeze up
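The ocr_image.py script itself is not listed in this post. A minimal sketch of what such a script could look like, assuming Pillow (a pytesseract dependency) is installed; the recognize function name is hypothetical:

```python
import sys

def recognize(path):
    """Return the text that Tesseract recognizes in the image at path."""
    # imported here so the module can be loaded even where OCR is not set up
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))

if __name__ == '__main__':
    # usage: python3 ocr_image.py 001.png [more images...]
    for path in sys.argv[1:]:
        print(recognize(path))
```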