python-catalin: The nltk python module

Tuesday, May 2, 2017

The nltk python module - part 001.

About the nltk Python module.
NLTK is a leading platform for building Python programs that work with human language data. This post covers the basics of using it to analyze text with Natural Language Processing techniques. The NLTK 3.0 documentation describes the module in detail.
How to install the nltk Python module under Windows 10 and the Fedora 26 distro.
Under Windows 10, install it with the pip command:

C:\Python27\Scripts>pip install --trusted-host pypi.python.org nltk
Collecting nltk
Downloading nltk-3.2.2.tar.gz (1.2MB)
100% |################################| 1.2MB 2.6MB/s
Requirement already satisfied: six in c:\python27\lib\site-packages (from nltk)
Building wheels for collected packages: nltk
...
Successfully built nltk
Installing collected packages: nltk
Successfully installed nltk-3.2.2

Then download the NLTK data packages on Windows 10 from the Python interpreter. Calling nltk.download() with no arguments opens the NLTK downloader:

C:\Python27>python
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
True

Under Linux you can also install it with the pip command. I used the Fedora 26 distro:

[root@localhost mythcat]# pip install nltk
WARNING: Running pip install with root privileges is generally not a good idea.
 Try `pip install --user` instead.
Collecting nltk
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken
 by 'ProtocolError('Connection aborted.', error(104, 'Connection reset by peer'))': /simple/nltk/
  Downloading nltk-3.2.2.tar.gz (1.2MB)
    100% |████████████████████████████████| 1.2MB 1.1MB/s 
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from nltk)
Installing collected packages: nltk
  Running setup.py install for nltk ... done
Successfully installed nltk-3.2.2

Download all the NLTK data packages on Fedora 26 from the Python interpreter, using the interactive downloader:

[mythcat@localhost ~]$ python 
Python 2.7.13 (default, Feb 21 2017, 12:00:39) 
[GCC 7.0.1 20170219 (Red Hat 7.0.1-0.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> l
Packages:
  [ ] abc................. Australian Broadcasting Commission 2006
  [ ] alpino.............. Alpino Dutch Treebank
...
Collections:
  [ ] all-corpora......... All the corpora
  [ ] all................. All packages
  [ ] book................ Everything used in the NLTK Book

([*] marks installed packages)

Download which package (l=list; x=cancel)?
  Identifier> all
    Downloading collection u'all'
       | 
       | Downloading package abc to /home/mythcat/nltk_data...
       |   Unzipping corpora/abc.zip.
       | Downloading package alpino to /home/mythcat/nltk_data...
       |   Unzipping corpora/alpino.zip.
       | Downloading package biocreative_ppi to
...
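
If you do not need every package, nltk.download() also accepts a package or collection identifier, so the interactive menu can be skipped. The sketch below fetches only the "book" collection shown in the list above. The helper name download_book_collection is hypothetical, and the code assumes nltk is installed and the network is reachable.

```python
def download_book_collection(download_dir=None):
    """Fetch only the 'book' collection instead of every package.

    Hypothetical helper for illustration. With download_dir=None,
    NLTK picks its default data directory.
    """
    # Imported inside the function so that defining the helper does not
    # fail on a machine where nltk is not installed yet.
    import nltk

    # quiet=True suppresses the per-package progress messages.
    return nltk.download("book", download_dir=download_dir, quiet=True)
```

Call download_book_collection() once from the interpreter before running the examples that use nltk.book.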

Let's start with a simple example that loads the sample books:


>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>> ...

The next example uses the sample texts loaded above:

# count() returns how many times a word occurs in the text
>>> print text1.count("white")
191
# concordance() shows every occurrence of a given word, together with some context.
# It prints its own output and returns None, which is why print adds a final None.
>>> print text3.concordance("white")
Displaying 5 of 5 matches:
potted , and every one that had some white in it , and all the brown among the 
 hazel and chesnut tree ; and pilled white strakes in them , and made the white
white strakes in them , and made the white appear which was in the rods . And h
y dream , and , behold , I had three white baskets on my he And in the uppermos
all be red with wine , and his teeth white with milk . Zebulun shall dwell at t
None
# similar() lists other words that appear in the same contexts as the given word
>>> print text3.similar("white")
None
>>> print text3.similar("got")
named set arrayed bound brought see embraced kissed slew unto curse
built shewed laid digged sent gave offer offered blessed
None
# common_contexts() shows the contexts shared by two or more words
>>> text3.common_contexts(["white","blue"])
(u'The following word(s) were not found:', u'white blue')
>>> text3.common_contexts(["man","men"])
old_of the_and the_said the_that the_took young_and the_s
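
To make it clearer what these three calls compute, here is a minimal pure-Python sketch of the ideas behind count, concordance and common_contexts. It is not NLTK's implementation: the function names, the window parameter, the case-insensitive match in concordance and the sample sentence are all assumptions made for illustration.

```python
# Hypothetical helpers that imitate the ideas, not NLTK's own code.

def count(tokens, word):
    """Number of tokens equal to word."""
    return sum(1 for token in tokens if token == word)


def concordance(tokens, word, window=3):
    """Return each occurrence of word with up to `window` tokens per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [token] + right))
    return lines


def contexts(tokens, word):
    """Set of (previous, next) token pairs around each occurrence of word."""
    found = set()
    for i in range(1, len(tokens) - 1):
        if tokens[i] == word:
            found.add((tokens[i - 1], tokens[i + 1]))
    return found


def common_contexts(tokens, words):
    """Contexts that every word in words shares."""
    shared = None
    for word in words:
        ctx = contexts(tokens, word)
        shared = ctx if shared is None else shared & ctx
    return shared or set()


# Made-up sample sentence, split on whitespace.
sample = "the old man said yes and the old men said no to the white rods".split()

print(count(sample, "old"))
for line in concordance(sample, "white", window=2):
    print(line)
print(sorted(common_contexts(sample, ["man", "men"])))
```

In this sample, "man" and "men" both occur between "old" and "said", so that pair is their only shared context. This is the same idea as the old_of and the_and entries printed by text3.common_contexts above.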

This is all for today.
