analitics

Thursday, February 14, 2019

Using Python with document files.

Today I tested two Python modules, python-docx and openpyxl, with Python 3.6.4.
These modules let us work with document files such as docx, xlsx, xlsm, xltx and xltm.
The first module, python-docx, is a Python library for creating and updating Microsoft Word (.docx) files.
The documentation for this module can be found here.
Let's start by installing it with pip:
C:\Python364>cd Scripts
C:\Python364\Scripts>pip3.6.exe install python-docx
Collecting python-docx
...
Successfully installed python-docx-0.8.10
Let's start with the import step:
C:\Python364>python.exe
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import docx
>>> dir(docx)
['Document', 'ImagePart', 'RT', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'api', 'blkcntnr', 'compat', 'dml', 'document', 'enum', 'exceptions', 'image', 'opc', 'oxml', 'package', 'parts', 'section', 'settings', 'shape', 'shared', 'styles', 'text']
Let's create a document with this module, then add some text and an image (the file icon.png must exist in the current directory):
C:\Python364>python.exe
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import docx
>>> mydoc = docx.Document()
>>> mydoc.add_paragraph('This is a text')
>>> mydoc.add_picture('icon.png',width=docx.shared.Inches(1),height=docx.shared.Inches(1))
>>> mydoc.save('test.docx')
The second Python module is openpyxl.
This module lets you read and write Excel 2010 xlsx/xlsm/xltx/xltm files.
C:\Python364\Scripts>pip3.6.exe install openpyxl
Collecting openpyxl
...
Successfully installed et-xmlfile-1.0.1 jdcal-1.4 openpyxl-2.6.0
Let's test it with an existing workbook named test.xlsx:
C:\Python364>python.exe
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from openpyxl import load_workbook
>>> w = load_workbook(filename='test.xlsx',read_only=True)
>>> print(w.sheetnames)
['Sheet1', 'TestSheet2']
>>> s = w['Sheet1']
>>> for row in s.rows:
...     for c in row:
...         print(c.value)
...
A1
None
ABC
None
None
NOP
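The session above needs a test.xlsx file that already exists. As a rough sketch of the whole round trip, the script below first creates a small workbook with openpyxl and then reads it back in read-only mode, the same way as above. The helper names write_rows and read_rows, the file name roundtrip_test.xlsx and the sample values are assumptions made for this example only.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_rows(path, sheet_name, rows):
    """Create a workbook with a single sheet and append each row to it."""
    workbook = Workbook()
    sheet = workbook.active
    sheet.title = sheet_name
    for row in rows:
        sheet.append(row)
    workbook.save(path)


def read_rows(path, sheet_name):
    """Return the cell values of one sheet as a list of lists."""
    workbook = load_workbook(filename=path, read_only=True)
    try:
        sheet = workbook[sheet_name]
        return [[cell.value for cell in row] for row in sheet.rows]
    finally:
        # A read-only workbook keeps the file open until it is closed.
        workbook.close()


if __name__ == "__main__":
    # Hypothetical output location, chosen only for this demonstration.
    target = os.path.join(tempfile.gettempdir(), "roundtrip_test.xlsx")
    # None leaves a cell empty, similar to the gaps seen in the session above.
    write_rows(target, "Sheet1", [["A1", None, "ABC"], [None, None, "NOP"]])
    for values in read_rows(target, "Sheet1"):
        print(values)
```

Returning plain values instead of cell objects keeps the result usable after the workbook is closed, which matters in read-only mode because the cells are loaded lazily from the file.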