Thursday, June 20, 2019

Python 3.7.3 : Read and save RSS data from goodreads website.

Today I will show you how to parse RSS data from www.goodreads.com using the feedparser module and save it all to a CSV file.
The Goodreads website comes with hundreds of great book recommendations from fellow readers and beloved authors, and lets you add your favorite books.
The main goal was to keep a structured mapping between the RSS feed and the CSV file.
I solved this with one list for each type of data.
First, let's install the feedparser python module with the pip tool:
C:\Python373\Scripts>pip install feedparser
Collecting feedparser
...
Successfully built feedparser
Installing collected packages: feedparser
Successfully installed feedparser-5.2.1
You need to get the RSS link for your books from your Goodreads account.
The example is simple and has comments that make it easy to follow how I solved this.
This is the source code that reads all the RSS data and writes it to the CSV file:
import feedparser
import csv
bookread_rss = "your RSS with data account"
feeds = feedparser.parse(bookread_rss)
print ("aditional RSS data")
print (feeds['feed']['title'])
print (feeds['feed']['link'])
print (feeds.feed.subtitle)
print (len(feeds['entries']))
print (feeds.version)
print (feeds.headers)
print (feeds.headers.get('content-type'))
print ("read RSS items")
# empty arrays for values by type
dates = []
titles = []
authors = []
links = []
pages =[]

# create the name of the CSV file
file_csv = 'my_goodreads_books.csv'

# collect the values from each RSS item
#print(feeds)
for post in feeds.entries:
    date = "%d/%02d/%02d" % (post.published_parsed.tm_year,
        post.published_parsed.tm_mon,
        post.published_parsed.tm_mday)
    # uncomment these lines to print each item on the console
    #print("___")
    #print("post date: " + date)
    #print("post title: " + post.title)
    #print("post author: " + post.author_name)
    #print("post link: " + post.link)
    #print("post pages: " + post.num_pages)

    dates.append(date)
    titles.append(post.title)
    authors.append(post.author_name)
    links.append(post.link)
    pages.append(post.num_pages)

# write the CSV file; encoding="utf-8" fixes this error:
# UnicodeEncodeError: 'charmap' codec can't encode character '\u0435' in position
# 30: character maps to <undefined>
# the with statement closes the file when the block ends
with open(file_csv, 'w', newline='', encoding="utf-8") as csv_file:
    csv_out = csv.writer(csv_file)
    for d,t,a,l,p in zip(dates,titles,authors,links,pages):
        csv_out.writerow((d,t,a,l,p))
The script prints some information about the feed, see my example:
C:\Python373>python bookreader_rss_001.py
aditional RSS data
Catalin's bookshelf: all
https://www.goodreads.com/review/list_rss/52019632?key=pyfTLqvJXpg-_ghi4a6ZTZfJV
gLVXC8TcWyaBSyoiScgfXq3&shelf=%23ALL%23
Catalin's bookshelf: all
100
rss20
{'Server': 'Server', 'Date': 'Thu, 20 Jun 2019 12:22:55 GMT', 'Content-Type': 'a
pplication/xml; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'c
lose', 'Status': '200 OK', 'X-Frame-Options': 'ALLOWALL', 'X-XSS-Protection': '1
...
All data will be saved to the my_goodreads_books.csv file.
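The same idea (one row per book, written as UTF-8) can also be expressed with named columns. This is only a minimal sketch and not the script from this post: the helper names save_books and load_books, the header names, and the sample row are my own assumptions, and only the standard csv module is used, so it runs without a Goodreads account.

```python
import csv
import os
import tempfile

# Hypothetical column names; the script in the post writes rows without a header.
FIELDS = ["date", "title", "author", "link", "pages"]

def save_books(rows, path):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

def load_books(path):
    """Read the CSV file back into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as csv_file:
        return list(csv.DictReader(csv_file))

# Made-up sample data with a non-ASCII title to exercise the encoding.
sample = [{"date": "2019/06/20", "title": "Война и мир", "author": "Leo Tolstoy",
           "link": "https://example.com/book/1", "pages": "1225"}]

csv_path = os.path.join(tempfile.mkdtemp(), "my_goodreads_books.csv")
save_books(sample, csv_path)
print(load_books(csv_path))
```

A header row makes the file easier to open in a spreadsheet, and csv.DictReader then returns each book as a dictionary keyed by column name.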

Tuesday, June 18, 2019

Python 3.7.3 : Stemming with nltk.

Today I will start another tutorial about the nltk python module and stemming.
Stemming is the process of reducing the morphological variants of a word to a common root/base form.
Stemming programs are commonly referred to as stemming algorithms or stemmers.
Stemming errors come in two kinds: overstemming and understemming.
Overstemming occurs when two words with different stems are reduced to the same root.
Understemming occurs when two words that share the same stem are reduced to different roots.
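To make the two error types concrete, here is a small sketch with a deliberately naive suffix-stripping stemmer. It is not the Porter algorithm and not part of nltk: the function naive_stem and its suffix list are hypothetical and exist only for illustration.

```python
# Hypothetical suffix list for illustration only; real stemmers such as
# Porter's use far more careful rules.
SUFFIXES = ("ity", "al", "es", "e", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

# Overstemming: three words with different meanings collapse to one root.
for w in ("universe", "university", "universal"):
    print(w, " : ", naive_stem(w))

# Understemming: two forms of the same word keep different roots.
for w in ("datum", "data"):
    print(w, " : ", naive_stem(w))
```

The first loop prints the root univers three times, and the second loop leaves datum and data unchanged, so a search for one form would miss the other.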
Stemming is used in information retrieval systems like search engines, and to determine domain vocabularies in domain analysis.
Let's install this python module named nltk with the pip tool:
C:\Python373\Scripts>pip install nltk
Collecting nltk
...
Successfully installed nltk-3.4.1 six-1.12.0
The nltk python module works with human language data for use in statistical natural language processing (NLP).
It contains text processing libraries for tokenization, parsing, classification, stemming, tagging, graphical demonstrations, sample data sets, and semantic reasoning.
The next step is to download the models and data, see more at the official webpage.
First, run these lines of code to open the nltk downloader and get the data:
import nltk
nltk.download()
Let's test a simple implementation of stemming words using nltk python module:
from nltk.stem import PorterStemmer 
from nltk.tokenize import word_tokenize 
   
my_porter = PorterStemmer() 
   
quote = "Deep in the human unconscious is a pervasive need for a logical universe that makes sense."

words = word_tokenize(quote) 
   
for w in words: 
    print(w, " : ", my_porter.stem(w))
The result is something like this:
C:\Users\catafest>python stemming_001.py
Deep  :  deep
in  :  in
the  :  the
human  :  human
unconscious  :  unconsci
is  :  is
a  :  a
pervasive  :  pervas
need  :  need
for  :  for
a  :  a
logical  :  logic
universe  :  univers
that  :  that
makes  :  make
sense  :  sens
.  :  .

C:\Users\catafest>
You can read more about stemming on Wikipedia.

Python 3.7.3 : Using getters and setters in object-oriented programming.

The main purpose of using getters and setters in object-oriented programs is to ensure data encapsulation.
Let's start with a simple example.
I created a class named my_class whose __init__ method stores one variable named my_variable:
self._my_variable = my_variable
A new, initialized instance can be obtained by this line of code:
test_it = my_class()
The example uses getter and setter methods to access this variable.
C:\Users\catafest>python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class my_class:
...     def __init__(self, my_variable = 0):
...          self._my_variable = my_variable
...
...     # getter method
...     def get_my_variable(self):
...         return self._my_variable
...
...     # setter method
...     def set_my_variable(self, x):
...         self._my_variable = x
...
>>> test_it = my_class()
>>> test_it.set_my_variable(1976)
>>> print(test_it.get_my_variable())
1976
>>> print(test_it._my_variable)
1976
In Python, property() is a built-in function that creates and returns a property object.
The property() function takes four optional arguments: fget, fset, fdel, and doc.
  • fget is a function for retrieving an attribute value;
  • fset is a function for setting an attribute value;
  • fdel is a function for deleting an attribute value;
  • doc is a docstring for the attribute.
This line of code sets my_variable using the setter:
test_it.set_my_variable(1976)
This line of code retrieves my_variable using the getter:
print(test_it.get_my_variable())
A property object has three methods, getter(), setter(), and deleter(), to specify fget, fset, and fdel individually.
Here is how my example changes when it uses property():
>>> class my_class:
...      def __init__(self):
...           self._my_variable = 0
...
...      # function to get value of _my_variable
...      def get_my_variable(self):
...          print("getter method called")
...          return self._my_variable
...
...      # function to set value of _my_variable
...      def set_my_variable(self, a):
...          print("setter method called")
...          self._my_variable = a
...
...      # function to delete _my_variable attribute
...      def del_my_variable(self):
...          del self._my_variable
...
...      my_variable = property(get_my_variable, set_my_variable, del_my_variable)
...
>>> test_it = my_class()
>>> test_it.my_variable = 1976
setter method called
>>> print(test_it.my_variable)
getter method called
1976
Python's @property is one of the built-in decorators.
The main purpose of a decorator is to change the behavior of your class methods or attributes.
Users of your class do not need to make any change in their code.
This is the final result:
>>> class my_class:
...      def __init__(self):
...           self._my_variable = 0
...
...      # using property decorator
...      # a getter function
...      @property
...      def my_variable(self):
...          print("getter method called")
...          return self._my_variable
...
...      # a setter function
...      @my_variable.setter
...      def my_variable(self, my_out):
...          if(my_out < 1976):
...             raise ValueError("... this my_variable has a criteria!!")
...          print("setter method called")
...          self._my_variable = my_out
...
>>> test_it = my_class()
>>> test_it.my_variable = 1979
setter method called
>>> print(test_it.my_variable)
getter method called
1979
>>> test_it.my_variable = 1975
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 16, in my_variable
ValueError: ... this my_variable has a criteria!!
This last example shows you how to use the @property decorator to create getters and setters in a pythonic way.
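The decorator form also covers the delete function and the docstring mentioned earlier. The sketch below is an assumption of mine and not part of the examples above: the class Book and its pages attribute are hypothetical names used only to show @property together with its setter and deleter.

```python
class Book:
    """Hypothetical example class, not from the examples above."""

    def __init__(self, pages=1):
        self.pages = pages  # this assignment goes through the setter below

    @property
    def pages(self):
        """Number of pages (this text becomes the property's docstring)."""
        return self._pages

    @pages.setter
    def pages(self, value):
        # reject values that make no sense for a page count
        if value < 1:
            raise ValueError("pages must be at least 1")
        self._pages = value

    @pages.deleter
    def pages(self):
        del self._pages

book = Book(120)
book.pages = 300
print(book.pages)
print(Book.pages.__doc__)
del book.pages
print(hasattr(book, "pages"))
```

Because __init__ assigns through the property, the validation in the setter also runs when an object is created.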

Monday, June 17, 2019

Python Qt5 : the most simple QTreeWidget - part 002.

This tutorial uses PyQt5 and Python version 3.7.3.
Let's install the PyQt5 python module with the pip tool:
C:\Python373\Scripts>pip install PyQt5
Collecting PyQt5
...
Successfully installed PyQt5-5.12.2 PyQt5-sip-4.19.17
Let's see one simple example with comments about how to use QTreeWidget.
import sys
from PyQt5.QtWidgets import QTreeWidget, QTreeWidgetItem, QApplication, QWidget

if __name__ == '__main__':
    # reuse the running QApplication instance if there is one,
    # otherwise create a new one
    if QApplication.instance() is not None:
        my_app = QApplication.instance()
    else:
        my_app = QApplication(sys.argv)
    # create a QTreeWidgetItem with three columns
    my_tree= QTreeWidgetItem(["Column A", "Column B", "Column C"])
    # add data using a for loop
    for i in range(6):
        list_item_row = QTreeWidgetItem(["Child A-" + str(i), "Child B-" + str(i), "Child C-" + str(i)])
        my_tree.addChild(list_item_row)
    # create my_widget widget
    my_widget = QWidget()
    my_widget.resize(640, 180)
    # create a QTreeWidget named my_tree_widget 
    my_tree_widget = QTreeWidget(my_widget)
    # set the size
    my_tree_widget.resize(640, 180)
    # set the number of columns 
    my_tree_widget.setColumnCount(3)
    # add labels for each column 
    my_tree_widget.setHeaderLabels(["Column A label", "Column B label", "Column C label"])
    # add my_tree using addTopLevelItem
    my_tree_widget.addTopLevelItem(my_tree)
    # show the widget
    my_widget.show()
    # the exit of my_app
    sys.exit(my_app.exec_())
This is another short example that shows how versatile Python and PyQt5 are.
import sys
from PyQt5.QtWidgets import QApplication, QWidget, QVBoxLayout, QTreeWidget, QTreeWidgetItem
 
my_app = QApplication(sys.argv)
my_window = QWidget()
my_layout = QVBoxLayout(my_window)
 
my_tree = QTreeWidget()
my_tree.setHeaderLabels(['Name', 'Area'])
my_item_root = QTreeWidgetItem(my_tree, ['Romania', '238,397 km²'])
my_item_raw = QTreeWidgetItem(my_item_root, ['Black Sea', '436,402 km²'])
 
my_layout.addWidget(my_tree)
my_window.show()
sys.exit(my_app.exec_())
If you like my simple tutorials, you can subscribe or browse my other websites too.

Sunday, June 16, 2019

Python 3.7.3 : Using the pycryptodome python module.

This python module can be used with python 3.
More information can be found here.
PyCryptodome is a self-contained Python package of low-level cryptographic primitives.
It supports Python 2.6 and 2.7, Python 3.4 and newer, and PyPy.

Installing this python module is easy with the pip tool:
C:\Python373\Scripts>pip install pycryptodome
Collecting pycryptodome
...
Installing collected packages: pycryptodome
Successfully installed pycryptodome-3.8.2
The packages in this python module are:
  • Cipher;
  • Signature;
  • Hash;
  • PublicKey;
  • Protocol;
  • IO;
  • Random;
  • Util.
Let's start with a few examples:
C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Crypto.Random import get_random_bytes
>>> key16 = get_random_bytes(16) # 16 bytes * 8 = 128 bits (1 byte = 8 bits)
The python module comes with features for encryption and decryption, such as RSA, AES, DES, and many other options.
Another good feature is key derivation (KDF) with the PBKDF2 function. In this example the variable named center holds the salt:
>>> from Crypto.Protocol.KDF import PBKDF2
>>> center = b'go'
>>> password = 'go'
>>> output = PBKDF2(password, center, dkLen=64)
>>> print(output)
b'E\x0b\x91\x87\xde\xb2E\xe5Gv\x03\x86fVe8\x1e%\xb5l\xa0\xdb\xbfI\x01\xb5\xdf\x8
1\xad\x82@\x00\xacr\xc7\xa26\xc6\xe92\x1e\xf8\xe9\x0b\x9e\x93\x1dj\x1c\xff\x1c4\
xc2\x0e6\xc2\x8eYc2N\x995\x87'
>>>
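For comparison, the same kind of key derivation can be sketched with Python's standard library. Note that this uses hashlib.pbkdf2_hmac and not PyCryptodome, and that the helper name derive_key, the SHA-256 hash, the iteration count, and the 16-byte random salt are my own assumptions for illustration.

```python
import hashlib
import os

def derive_key(password, salt, length=64, iterations=100_000):
    """Derive `length` bytes from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               iterations, dklen=length)

salt = os.urandom(16)          # random salt, kept together with the derived key
key = derive_key("go", salt)   # the password "go" matches the example above
print(len(key))
```

The same password and salt always give the same key, while a different salt gives a different key, which is why the salt has to be stored to reproduce the result.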
If you are a fan of the math behind encryption and decryption, this module lets you put it to use.
Studying this module can also give you a more in-depth view of what exists in this field and what does not.

Saturday, June 8, 2019

Python 3.7.3 : Testing the PyX python module.

This python module has good documentation with many examples, see the official webpage.
The development team introduces it like this:
Summary
PyX is a Python package for the creation of PostScript, PDF, and SVG files. It combines an abstraction of the PostScript drawing model with a TeX/LaTeX interface. Complex tasks like 2d and 3d plots in publication-ready quality are built out of these primitives.
Features
  • PostScript, PDF, and SVG output for device independent, freely scalable figures;
  • seamless TeX/LaTeX integration;
  • full access to PostScript features like paths, linestyles, fill patterns, transformations, clipping, bitmap inclusion, etc.;
  • advanced geometric operations on paths like intersections, transformations, splitting, smoothing, etc.;
  • sophisticated graph generation: modular design, pluggable axes, axes partitioning based on rational number arithmetics, flexible graph styles, etc.
Let's install this python module with pip tool:
C:\Python373\Scripts>pip install PyX
...
Installing collected packages: PyX
Successfully installed PyX-0.14.1
I tried a few examples from the official webpage and they worked well.
>>> from pyx import *
>>>
>>> c = canvas.canvas()
>>> c.stroke(path.line(0, 0, 3, 0))
>>> c.stroke(path.rect(0, 1, 1, 1))
>>> c.fill(path.circle(2.5, 1.5, 0.5))
>>> c.writeEPSfile("path")
>>> c.writePDFfile("path")
>>> c.writeSVGfile("path")
>>> from pyx import *
>>>
>>> c = canvas.canvas()
>>> c.stroke(path.curve(0, 0, 0, 4, 2, 4, 3, 3),
...          [style.linewidth.THICK, style.linestyle.dashed, color.rgb.blue,
...           deco.earrow([deco.stroked([color.rgb.red, style.linejoin.round]),
...                        deco.filled([color.rgb.green])], size=1)])
>>> c.writeEPSfile("arrow")
>>> c.writePDFfile("arrow")
>>> c.writeSVGfile("arrow")
I got an error from writePDFfile with the textalongpath example:
>>> c.writeEPSfile("textalongpath")
>>> c.writePDFfile("textalongpath")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python373\lib\site-packages\pyx\canvas.py", line 50, in wrappedindocument
    return method(d, file, **write_kwargs)
  File "C:\Python373\lib\site-packages\pyx\document.py", line 193, in writePDFfile
    pdfwriter.PDFwriter(self, f, **kwargs)
  File "C:\Python373\lib\site-packages\pyx\pdfwriter.py", line 321, in __init__
    registry.write(file, self, catalog)
  File "C:\Python373\lib\site-packages\pyx\pdfwriter.py", line 78, in write
    object.write(file, writer, self)
  File "C:\Python373\lib\site-packages\pyx\pdfwriter.py", line 248, in write
    file.write("/MediaBox [%f %f %f %f]\n" % self.PDFcontent.bbox.highrestuple_pt())
  File "C:\Python373\lib\site-packages\pyx\bbox.py", line 112, in highrestuple_pt
    raise ValueError("Cannot return high-res tuple for empty bbox")
ValueError: Cannot return high-res tuple for empty bbox
>>> c.writeSVGfile("textalongpath")
This python module has problems with some of the complex examples from the official webpage.

Monday, June 3, 2019

Python 3.7.3 : Working with wikipedia python module.

Wikipedia is a Python library that makes it easy to access and parse data from Wikipedia.
Let's install it:
C:\Python373\Scripts>pip install wikipedia
First, let's test the default example:
C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import wikipedia
>>> print(wikipedia.summary("Wikipedia"))
...
.Wikipedia has been criticized for exhibiting systemic bias, for presenting a mixture of "truths, half truths, and some falsehoods", and for being subject to manipulation and spin in controversial topics. But by 2017, Facebook announced that it would help readers detect fake news by suggesting links to related Wikipedia articles. YouTube announced a similar plan in 2018.
>>> wikipedia.search("Falticeni")
['Falticeni', 'Foresta Falticeni', 'Charles, Prince of Wales', 'Sofia Ionescu', 'Buciumeni River (Șomuzul Mare)', 'Șomuzul Mare River', 'Constantin Schumacher', 'Ionuț Atodiresei', '1967 Cupa României Final', 'J. J. Benjamin']
>>> wikipedia.page("Falticeni")
...
>>> city=wikipedia.page("Falticeni")
>>> city.title
...
>>> city.content
...
>>> wikipedia.set_lang("fr")
>>> page=wikipedia.page("Null")
>>> page.title
'Null' 
You can extract links:
>>> page = wikipedia.page("List_of_works_by_Leonardo_da_Vinci")
>>> print(page.links)
You can list everything the module provides with dir() and test each item:
>>> dir(wikipedia)
['API_URL', 'BeautifulSoup', 'Decimal', 'DisambiguationError', 'HTTPTimeoutError
', 'ODD_ERROR_MESSAGE', 'PageError', 'RATE_LIMIT', 'RATE_LIMIT_LAST_CALL', 'RATE
_LIMIT_MIN_WAIT', 'RedirectError', 'USER_AGENT', 'WikipediaException', 'Wikipedi
aPage', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__na
me__', '__package__', '__path__', '__spec__', '__version__', 'cache', 'datetime'
, 'debug', 'donate', 'exceptions', 'geosearch', 'languages', 'page', 'random', '
re', 'requests', 'search', 'set_lang', 'set_rate_limiting', 'set_user_agent', 's
tdout_encode', 'suggest', 'summary', 'sys', 'time', 'timedelta', 'unicode_litera
ls', 'util', 'wikipedia']
Read more on the PyPI website.