python-catalin

Tuesday, July 23, 2019

Python 3.7.3 : Using the flask - part 001.

A short intro into this python module can be found at the PyPI website:
Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications. It began as a simple wrapper around Werkzeug and Jinja and has become one of the most popular Python web application frameworks.

Flask offers suggestions but doesn’t enforce any dependencies or project layout. It is up to the developer to choose the tools and libraries they want to use. There are many extensions provided by the community that makes adding new functionality easy.
The reason I used a series of tutorials with this python module is the complexity of the features of this python module.
Let's briefly outline some of the essential aspects of flask programming.

Flask is a simple, lightweight, and minimalist web framework;
Flask is developed based on the Jinja2 template engine;
Flask depends on the Jinja template engine and the Werkzeug WSGI toolkit;
Flask does not provide a built-in ORM system (Object Relation Mapping);
in Flask web applications to perform CRUD operations on a database can be tedious (see: ORM techniques of Flask-SQLAlchemy);
The Flask has a simple and customizable architecture;
the Flask to accelerate the development of simple websites that use static content;
has the option to extend and customize Flask according to precise project requirements;

A list of companies using the Flask framework - who is using Flask?
The install process is very simple using the pip tool:

C:\Python373>cd Scripts

C:\Python373\Scripts>pip install flask
Collecting flask
...
Installing collected packages: itsdangerous, Werkzeug, flask
Successfully installed Werkzeug-0.15.4 flask-1.1.1 itsdangerous-1.1.0

Let's start initializing the first flask application into the server.py file:

from flask import Flask 
app = Flask (__name__)
# create a wrarp of localhost
@app.route('/')
def home():
 return 'Hello world'
# the default name main 
if __name__ == '__main__':
 app.run()

About Routing and Variable Rules.
This can change it into the server.py script like this:

@app.route('/about')
def about():
 return 'The about page'
@app.route('/blog')
def blog():
 return 'This is the blog'

The next step is the variable rule issue:

# creaza a regula variabila ( variable rules) 
# the type of the variable can be set, see int 
# @app.route('/blog/')
@app.route('/blog/')
def blogpost(blog_id):
 return 'This is the post '+str(blog_id)

To run it, just use:

C:\Python373\my_flask>python server.py
 * Serving Flask app "server" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

The full source code can be found at my GitHub account.

Monday, July 22, 2019

Python 3.7.3 : The sip python module.

The official webpage pypi.org comes with this intro:
One of the features of Python that makes it so powerful is the ability to take existing libraries, written in C or C++, and make them available as Python extension modules. Such extension modules are often called bindings for the library.

SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library.

SIP comprises a code generator and a Python module. The code generator processes a set of specification files and generates C or C++ code which is then compiled to create the bindings extension module. The sip Python module provides support functions to the automatically generated code.
The SIP is copyright (c) Riverbank Computing Limited and its homepage is this webpage.
Support may be obtained from the PyQt mailing list at here.
The SIP is a tool for quickly writing Python modules that interface with C++ and C libraries.
The SIP comprises a code generator and a Python module.
About the code generator:
First, the install:

C:\Python373\Scripts>pip3 install sip
Collecting sip
...
Installing collected packages: sip
Successfully installed sip-4.19.8

If you using the PyQt5 then this version includes a private copy of the module.

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyQt5 import sip

If you want to see backward compatibility the module then needs to imported and will only work if another PyQt5 module is imported first.

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyQt5 import QtCore
>>> import sip

A good example of how can you build with SIP can be seen here.
About the SIP Python module:
This provides support functions to the automatically generated code.
The import python module sip let you use all functions build by the code generator:

>>> dir(sip)
['SIP_VERSION', 'SIP_VERSION_STR', '_C_API', '__doc__', '__file__', '__loader__'
, '__name__', '__package__', '__spec__', '_unpickle_enum', '_unpickle_type', 'as
sign', 'cast', 'delete', 'dump', 'enableautoconversion', 'enableoverflowchecking
', 'getapi', 'isdeleted', 'ispycreated', 'ispyowned', 'setapi', 'setdeleted', 's
etdestroyonexit', 'settracemask', 'simplewrapper', 'transferback', 'transferto',
 'unwrapinstance', 'voidptr', 'wrapinstance', 'wrapper', 'wrappertype']

If you want to improve or create python modules with C or C ++, then this tool can help.

Sunday, July 21, 2019

Python 3.7.3 : The IMDbPY python module version 6.8.

The GitHub official webpage comes with this intro:
IMDbPY is a Python package for retrieving and managing the data of the IMDb movie database about movies, people and companies.
The last release version 6.8 was at 2019 Jul 20.
The official webpage tells us:
In the release 6.8 (codename "Apollo 11") of IMDbPY, multiple parsers were added and fixes; the new search_movie_advanced method allows advanced movie searches...
The changes of the version 6.8 can be found at GitHub webpage and come with these new features:
#224: introduce the search_movie_advanced(title, adult=None, results=None, sort=None, sort_dir=None) method
#145: names are stored in normal format (Name Surname)
#225: remove the obsolete cookie
#182: box office information
#168: parse series and episode number searching for movies
#217: grab poster from search
#218: extract MPAA rating
#220: extract actor headshot from full credits
The install on Python 3.7.3 is easy with pip3 tool:

C:\Python373\Scripts>pip3 install imdbpy
Collecting imdbpy
...
Installing collected packages: imdbpy
Successfully installed imdbpy-6.8

Let's test the new features:

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import imdb
>>> ia = imdb.IMDb()
>>> movies = ia.search_movie_advanced('debby does dallas', adult=True)
>>> print(movies)
...
>>> people = ia.search_person('Clark Gregg')
>>> print(people)

Let's test it:

import imdb
from imdb import IMDb

ia = imdb.IMDb()

# create a file to put the output 
file1 = open("_imdb_data.txt","w", encoding='utf-8') 

# get movies by movie 
# example: Alien 

def get_by_movie():
 my_movie = str(input('Type the movie name: '))
 movies = ia.search_movie_advanced(my_movie, adult=True)
 print(type(movies))
 return movies

# get filmography by id 
filmography_list = []
def get_filmography_by_id(id):
 actor_results = ia.get_person_filmography(id)
 for item in actor_results['data']['filmography']:
  filmography_list.append(str(item))
 return filmography_list

# the main function 
def main():
 a = get_by_movie()
 for i in a:
  print("________________________")
  print("i: ",i)
  # you can uncomment this to test Movie class functions
  #print("Type:",type(i))
  #print("Summary:",i.summary())
  #print("ID: ",i.getID())
  #print("Smart cannonical title: ",i.smartCanonicalTitle())
  #print("caracters ref: ",i.get_charactersRefs())
  #print("current info: ",i.get_current_info())
  #print("cinematographic process: ",i.get('cinematographic process'))
  #print(i["title"])
  #print informations items from Movie class
  print("~~~~~~~~~~~~~~~~~~~~~~~~")
  for k, v in i.items():
   print(k, v)
   # write to the file the value of a
   txt = str(k)+":"+str(v)+"\n" 
   file1.write(txt)
  print("------------------------")
  # add a new line on each movie
  file1.write('-----^-----\n')
 #get filmography by id 
 id_filmography=get_filmography_by_id('0078748')
 # print the filmography
 for item in id_filmography:
  print(item)

 #after write, close the file 
 file1.close()

if __name__ == '__main__':
    main()

This is the first part of the output file named _imdb_data:

title:Alien
certificates:['R']
runtimes:['117']
genres:['Horror', 'Sci-Fi']
rating:8.5
votes:719508
metascore:89
gross:78900000
plot:After a space merchant vessel perceives an unknown transmission as a distress call, its landing on the source moon finds one of the crew attacked by a mysterious lifeform, and they soon realize that its life cycle has merely begun.
directors:[]
cast:[, , , ]
cover url:https://m.media-amazon.com/images/M/MV5BMmQ2MmU3NzktZjAxOC00ZDZhLTk4YzEtMDMyMzcxY2IwMDAyXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UX67_CR0,0,67,98_AL_.jpg
year:1979
kind:movie
canonical title:Alien
long imdb title:Alien (1979)
long imdb canonical title:Alien (1979)
smart canonical title:Alien
smart long imdb canonical title:Alien (1979)
full-size cover url:https://m.media-amazon.com/images/M/MV5BMmQ2MmU3NzktZjAxOC00ZDZhLTk4YzEtMDMyMzcxY2IwMDAyXkEyXkFqcGdeQXVyNzkwMjQ5NzM@.jpg
-----^-----
title:Aliens
certificates:['R']
runtimes:['137']
...

Saturday, July 20, 2019

Python 3.7.3 : Use BeautifulSoup to parse Instagram account.

This example is a bit more complex because it parses the source code in a more particular way depending on it.
The basic idea of this script is to take the content of an Instagram account in the same way as a web browser.
For my account I found a parsing error, I guess the reason is using the points, see festila.george.catalin.

    scripts_content = json.loads(scripts[0].text.strip())
IndexError: list index out of range

In this case comment this line of code and will work:
For the other accounts I've tried, it works very well with the default script.
This is the script I used:

import requests
from bs4 import BeautifulSoup
import json
import re

from pprint import pprint

instagram_url = 'https://instagram.com'
#example user instagram profile_url = sherwoodseries
profile_url=str(input("name of the instagram user: "))


#UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f48b' in posit ion 5022: character maps to 
#fix write text file with  encoding='utf-8'
file1 = open("_shared_data.txt","w", encoding='utf-8') 

#profile_url = 'festila.george.catalin'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(f"{instagram_url}/{profile_url}", headers = headers)

if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html, "html.parser")
    print(bs_html)
    # get info from ... type="application/ld+json">{"@context":"http:\/\/schema.org","@type":"Person","name":
    scripts = bs_html.select('script[type="application/ld+json"]')
    #scripts_content = json.loads(scripts[0].text.strip())
    #pprint(scripts_content)

    #print scripts_content like json 
    #print(json.dumps(scripts_content,indent = 4,sort_keys = True))

    #print just part of source code get by 'script' (0 .. n), see n = 6 
    #print(bs_html.find_all('script')[6])
    script_tag = bs_html.find('script', text=re.compile('window\._sharedData'))
    shared_data = script_tag.string.partition('=')[-1].strip(' ;')

    #get item from shared data, see "language_code":"en"
    rex_item  = re.compile('(?<=\"language_code\":\")[a-zA-Z_\- ]+(?=\")')
    rex_get_item = rex_item.findall(shared_data)  
    print(rex_get_item)
    #get url image from shared data
    rex_url  = re.compile('(?<=\"display_url\":\")[^\s\"]+(?=\")')
    rex_get_url = rex_url.findall(shared_data)  
    print(rex_get_url)
 
    # load like a json 
    result_json = json.loads(shared_data)
    pprint(result_json)
    
    data = bs_html.find_all('meta', attrs={'property': 'og:description'})
    bb = data[0].get('content').split()
    user = '%s %s %s' % (bb[-3], bb[-2], bb[-1])
    # get from bb parts 
    posts = bb[4]
    print('all string: ',bb)
    print('number of posts: ',posts)
    print('name and the user: ',user)

    # write any output show by print into _a.txt file, see example
    #file1.write(str(bs_html.find_all('script')[4]))
    #example: write to _shared_data.txt file the shared_data
    #file1.write(str(shared_data))
#after write, close the file 
#file1.close()

This is a part of the output for sherwoodseries account:

...
all string:  ['95', 'Followers,', '24', 'Following,', '56', 'Posts', '-', 'See',
 'Instagram', 'photos', 'and', 'videos', 'from', 'Sherwood', 'Series', '(@sherwo
odseries)']
number of posts:  56
name and the user:  Sherwood Series (@sherwoodseries)

Thursday, July 18, 2019

Python 3.7.3 : The pandas python module.

Since I started learning python programming language I have not found a more complex and complete module for viewing complex data.
The official documentation of this python module tells us:
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
The official webpage can be found here.
This python module is one of the most popular Python libraries for Data Science and Analytics.
You can install this python module with pip tool:

C:\Python373\Scripts>pip install pandas
Requirement already satisfied: pandas in c:\python373\lib\site-packages (0.24.2)

You can find many tutorials on web with this python module.
Today I will show you a short tutorial about this python module.
Most users use this both python modules:

import numpy as np
import pandas as pd

Most area of the pandas python module has a target into this list:
Window Functions, Aggregations, Missing Data, GroupBy, Merging/Joining, Concatenation, Date Functionality, Timedelta, Categorical Data,
Visualization, IO Tools, Sparse Data, Caveats & Gotchas, Comparison with SQL
There are two types of data structures in pandas: Series and DataFrames.
The pandas Series is a one-dimensional data structure.
The pandas DataFrame is a two (or more) dimensional data structure, like a table
Pandas provide few variants rolling, expanding and exponentially moving weights for window statistics.
Also have the sum, mean, median, variance, covariance, correlation, etc.
The several methods are available to perform aggregations on data.
Pandas provide functions for missing data like the isnull() and notnull().
Let's test the DataFrames with pandas and the Wikipedia example from my tutorial

C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> # get table from wikipedia
... import requests
>>> from bs4 import BeautifulSoup
>>> website_url = requests.get('https://en.wikipedia.org/w/index.php?title=Table
_of_food_nutrients').text
>>> soup = BeautifulSoup(website_url,'lxml')
>>>
>>> my_table = soup.find('table',{'class':'wikitable collapsible collapsed'})
>>> links = my_table.findAll('a')
>>> Food = []
>>> for link in links:
...     Food.append(link.get('title'))
...
>>> print(Food)
["Cows' milk (page does not exist)", 'Buttermilk', 'Fortified milk (page does no
t exist)', 'Powdered milk', "Goats' milk", 'Malted milk', 'Hot chocolate', 'Yogu
rt', 'Milk pudding (page does not exist)', 'Custard', 'Ice cream', 'Ice milk', '
Cream', 'Cheese', 'Cheddar cheese', 'American cheese', 'Processed cheese', 'Egg
(food)', 'Scrambled', 'Omelet', 'Yolk']
>>> import pandas
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['Foods'] = Food
>>> print(df)
                                   Foods
0       Cows' milk (page does not exist)
1                             Buttermilk
2   Fortified milk (page does not exist)
3                          Powdered milk
4                            Goats' milk
5                            Malted milk
6                          Hot chocolate
7                                 Yogurt
8     Milk pudding (page does not exist)
9                                Custard
10                             Ice cream
11                              Ice milk
12                                 Cream
13                                Cheese
14                        Cheddar cheese
15                       American cheese
16                      Processed cheese
17                            Egg (food)
18                             Scrambled
19                                Omelet
20                                  Yolk
>>> df.describe()
              Foods
count            21
unique           21
top     Malted milk
freq              1
>>> df.apply(pd.Series.value_counts)
                                      Foods
Malted milk                               1
Ice milk                                  1
Omelet                                    1
Goats' milk                               1
Custard                                   1
Cheddar cheese                            1
American cheese                           1
Ice cream                                 1
Yolk                                      1
Cream                                     1
Cows' milk (page does not exist)          1
Yogurt                                    1
Fortified milk (page does not exist)      1
Egg (food)                                1
Powdered milk                             1
Milk pudding (page does not exist)        1
Cheese                                    1
Hot chocolate                             1
Buttermilk                                1
Processed cheese                          1
Scrambled                                 1

The last example is to show data with

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> ts = pd.Series(np.random.randn(76), index=pd.date_range('1/1/76', periods=76
))
>>> ts.plot()

>>> plt.show()