analitics

Pages

Thursday, July 11, 2019

Python 3.7.3 : Three examples with BeautifulSoup.

Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree., see the pypi webpage.
This python module was created by Leonard Richardson.
A large definition can be this:
Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique employed to extract large amounts of data from websites whereby the data is extracted and saved to a local file in your computer or to a database in table (spreadsheet) format.
This python module can do that but the input format and output format is different.
The input can be a webpage like an URL or webpage with all pieces of information and the output depends by the this and the user choices.
Les's see some examples:
First example show you how to take content of the first row table from a wikipedia webpage.
# get table from wikipedia 
import requests
from bs4 import BeautifulSoup
website_url = requests.get('https://en.wikipedia.org/w/index.php?title=Table_of_food_nutrients').text
soup = BeautifulSoup(website_url,'lxml')

my_table = soup.find('table',{'class':'wikitable collapsible collapsed'})
links = my_table.findAll('a')
Food = []
for link in links:
    Food.append(link.get('title'))

print(Food)
The next example takes all files from a page

# get links using the url
import urllib
from bs4 import BeautifulSoup
page = urllib.request.urlopen('http://____share.net/filmes/').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor in soup.findAll('a', href=True):
    print (anchor['href'])
The last example takes all images from the search query of imgur website:
# get images from imgur search query
import urllib
from bs4 import BeautifulSoup
url = 'https://imgur.com/search/score?q=cyborg'
with urllib.request.urlopen(url) as f:
    soup = BeautifulSoup(f.read(),'lxml')

a_tags = soup.findAll("a",{"class":"image-list-link"})
img_tags = [a.find("img") for a in a_tags]
print(img_tags)
srcs = []
for s in img_tags:
    src_tags=('http:'+s['src'])
    srcs.append(src_tags)

print(srcs)
As a conclusion, this module will pose problems for those who do not understand how to scroll through the source code, the content of web pages, how to read 'lxml', 'page', etc.
It will greatly help your Chrome F12 key to access parts of web content.

Python 3.7.3 : Testing the Bokeh python module.

This python module has a beautiful website:
Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
Let's install this python module with the pip tool:
C:\Python373>cd Scripts

C:\Python373\Scripts>pip install bokeh
Collecting bokeh
...
Successfully built bokeh
Installing collected packages: PyYAML, tornado, bokeh
Successfully installed PyYAML-5.1.1 bokeh-1.2.0 tornado-6.0.3
Let's test it with a simple example:
from bokeh.plotting import figure, output_file, show

output_file("test.html")
plot = figure()
plot.line([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 7, 6, 1, 5, 6, 7, 9, 1], line_width=2)
show(plot)
This will create a file into my python folder C:/Python373/test.html.
The HTML webpage comes with the graph and additional tool like Pan, Box Zoom, Wheel Zoom, Save, Reset and help.
If you need sample data that is not included in the Bokeh GitHub repository or released packages then you need to download it.
>>> import bokeh.sampledata
>>> bokeh.sampledata.download()
Creating C:\Users\catafest\.bokeh directory
Creating C:\Users\catafest\.bokeh\data directory
Using data directory: C:\Users\catafest\.bokeh\data
Downloading: CGM.csv (1589982 bytes)
   1589982 [100.00%]
Downloading: US_Counties.zip (3171836 bytes)
    229376 [  7.23%]
Bokeh comes with support for working with Geographical data: Mercator, Google Maps, GeoJSON Data.
You can see all example into the webpage gallery.