python-catalin: requests


Tuesday, January 3, 2023

Python 3.10.2 : testing the NASA A.P.I. features.

In this tutorial I will show you how to use the NASA A.P.I. with the Python programming language.

This source code was built and tested yesterday.

This is the source code:

import urllib.parse
from datetime import date

import requests

# date prefix for the saved file name, for example 03012023
today_data = date.today()
today = today_data.strftime("%d%m%Y")

# set your API key from nasa https://api.nasa.gov/#NHATS
api_key = "... your A.P.I. key ..."

# this is a simple example that gets the image of the day
base_url = "https://api.nasa.gov/planetary/apod"

# set the parameters for the API request
params = {
    "api_key": api_key
}

# the request to the API
response = requests.get(base_url, params=params)

# get data
if response.status_code == 200:
    # parse the response
    data = response.json()

    # print the image URL
    print(data["url"])
    # parse the URL
    parsed_url = urllib.parse.urlparse(data["url"])

    # extract the file name from the URL
    file_name = parsed_url.path.split("/")[-1]
    # save the image
    response_image = requests.get(data["url"])
    with open(today+'_'+file_name, "wb") as f:
        f.write(response_image.content)
else:
    # print the status code
    print(response.status_code)

I ran the source code and got these two images:

...
01/03/2023  01:06 AM            86,943 03012023_AllPlanets_Tezel_1080_annotated.jpg
01/03/2023  04:22 PM           553,426 03012023_KembleCascade_Lease_960.jpg
...
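The file-naming step of the script can be tried without calling the A.P.I. The following is a minimal sketch, not part of the original script: the helper names `apod_file_name` and `image_url_or_none` are hypothetical, the sample dictionary is made up, and it assumes the response carries a `media_type` field so that entries that are not images can be skipped.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse


def apod_file_name(image_url, day):
    """Build '<ddmmYYYY>_<file name from URL>' like the script above."""
    # Use only the path part of the URL, so a query string never ends up in the name.
    name = PurePosixPath(urlparse(image_url).path).name
    return f"{day.strftime('%d%m%Y')}_{name}"


def image_url_or_none(data):
    """Return the URL only when the entry is an image (assumed 'media_type' field)."""
    # A missing field is treated as an image, which matches the original behaviour.
    if data.get("media_type", "image") != "image":
        return None
    return data.get("url")


# Hypothetical response data, not real A.P.I. output.
sample = {
    "media_type": "image",
    "url": "https://example.com/image/KembleCascade_Lease_960.jpg",
}

url = image_url_or_none(sample)
if url is not None:
    print(apod_file_name(url, date(2023, 1, 3)))
```

With a check like this, the download step would only run when `image_url_or_none` returns a URL.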

Saturday, July 20, 2019

Python 3.7.3 : Use BeautifulSoup to parse Instagram account.

This example is a bit more complex because the parsing depends on the particular structure of the page source.
The basic idea of this script is to fetch the content of an Instagram account in the same way a web browser does.
For my account I got a parsing error; I guess the reason is the dots in the account name, see festila.george.catalin.

    scripts_content = json.loads(scripts[0].text.strip())
IndexError: list index out of range

In this case, comment out this line of code and the script will work.
For the other accounts I've tried, it works very well with the default script.
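Instead of commenting the line out, a guard could skip the JSON step when no matching script tag is found. This is only a sketch: `first_json_or_none` is a hypothetical helper, and it assumes the script bodies were already collected as plain strings, for example with `[s.text for s in scripts]`.

```python
import json


def first_json_or_none(script_texts):
    """Parse the first script body as JSON, or return None when that is not possible."""
    if not script_texts:
        # An empty list is the case that raised IndexError for scripts[0].
        return None
    try:
        return json.loads(script_texts[0].strip())
    except json.JSONDecodeError:
        # The tag exists but its content is not valid JSON.
        return None


# Hypothetical inputs: no script tag found, and one small JSON body.
print(first_json_or_none([]))
print(first_json_or_none([' {"@type": "Person"} ']))
```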
This is the script I used:

import requests
from bs4 import BeautifulSoup
import json
import re

from pprint import pprint

instagram_url = 'https://instagram.com'
#example user instagram profile_url = sherwoodseries
profile_url=str(input("name of the instagram user: "))


#UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f48b' in position 5022: character maps to <undefined>
#fix: write the text file with encoding='utf-8'
file1 = open("_shared_data.txt","w", encoding='utf-8') 

#profile_url = 'festila.george.catalin'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(f"{instagram_url}/{profile_url}", headers = headers)

if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html, "html.parser")
    print(bs_html)
    # get info from ... type="application/ld+json">{"@context":"http:\/\/schema.org","@type":"Person","name":
    scripts = bs_html.select('script[type="application/ld+json"]')
    #scripts_content = json.loads(scripts[0].text.strip())
    #pprint(scripts_content)

    #print scripts_content like json 
    #print(json.dumps(scripts_content,indent = 4,sort_keys = True))

    #print just part of source code get by 'script' (0 .. n), see n = 6 
    #print(bs_html.find_all('script')[6])
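    # raw string avoids the invalid '\.' escape warning; 'string' is the current name of the older 'text' argument
    script_tag = bs_html.find('script', string=re.compile(r'window\._sharedData'))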
    shared_data = script_tag.string.partition('=')[-1].strip(' ;')

    #get item from shared data, see "language_code":"en"
    rex_item  = re.compile(r'(?<="language_code":")[a-zA-Z_\- ]+(?=")')
    rex_get_item = rex_item.findall(shared_data)  
    print(rex_get_item)
    #get url image from shared data
    rex_url  = re.compile(r'(?<="display_url":")[^\s"]+(?=")')
    rex_get_url = rex_url.findall(shared_data)  
    print(rex_get_url)
 
    # load like a json 
    result_json = json.loads(shared_data)
    pprint(result_json)
    
    data = bs_html.find_all('meta', attrs={'property': 'og:description'})
    bb = data[0].get('content').split()
    user = '%s %s %s' % (bb[-3], bb[-2], bb[-1])
    # get from bb parts 
    posts = bb[4]
    print('all string: ',bb)
    print('number of posts: ',posts)
    print('name and the user: ',user)

    # any output shown by print can also be written to the _shared_data.txt file, see example
    #file1.write(str(bs_html.find_all('script')[4]))
    #example: write to _shared_data.txt file the shared_data
    #file1.write(str(shared_data))
# after writing, close the file
file1.close()

This is a part of the output for sherwoodseries account:

...
all string:  ['95', 'Followers,', '24', 'Following,', '56', 'Posts', '-', 'See', 'Instagram', 'photos', 'and', 'videos', 'from', 'Sherwood', 'Series', '(@sherwoodseries)']
number of posts:  56
name and the user:  Sherwood Series (@sherwoodseries)
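The word positions used in the script (bb[4] and the last three words) work for this account but depend on the display name having exactly two words. The sketch below shows one possible alternative that can be tried offline. Both sample strings are hypothetical: the first is rebuilt from the output above, the second only imitates the `window._sharedData = {...};` shape the script expects.

```python
import json
import re

# Hypothetical samples shaped like what the script above reads.
description = ("95 Followers, 24 Following, 56 Posts - See Instagram photos "
               "and videos from Sherwood Series (@sherwoodseries)")
script_text = 'window._sharedData = {"language_code":"en","config":{}};'

# Name the three counters instead of relying on word positions such as bb[4].
counts = re.match(r"(\S+) Followers, (\S+) Following, (\S+) Posts", description)
followers, following, posts = counts.groups()

# Everything after the last " from " is the display name plus the @handle.
user = description.rpartition(" from ")[-1]

# Same steps as the script: keep what follows the first '=', drop spaces and ';'.
shared_data = json.loads(script_text.partition("=")[-1].strip(" ;"))

print("number of posts: ", posts)
print("name and the user: ", user)
print("language: ", shared_data["language_code"])
```

Reading `language_code` from the parsed dictionary gives the same value the regular expression in the script looks for, without depending on the exact spacing of the JSON text.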

Thursday, July 11, 2019

Python 3.7.3 : Three examples with BeautifulSoup.

Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree; see the PyPI webpage.
This Python module was created by Leonard Richardson.
A broader definition can be this:
Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique employed to extract large amounts of data from websites whereby the data is extracted and saved to a local file in your computer or to a database in table (spreadsheet) format.
This Python module can do that, but the input and output formats can differ.
The input can be a URL or a web page with all its content, and the output depends on that input and on the user's choices.
Let's see some examples:
The first example shows how to take the link titles from a table on a Wikipedia page.

# get table from wikipedia 
import requests
from bs4 import BeautifulSoup
website_url = requests.get('https://en.wikipedia.org/w/index.php?title=Table_of_food_nutrients').text
soup = BeautifulSoup(website_url,'lxml')

my_table = soup.find('table',{'class':'wikitable collapsible collapsed'})
links = my_table.findAll('a')
Food = []
for link in links:
    Food.append(link.get('title'))

print(Food)

The next example takes all links from a page:


# get links using the url
import urllib.request
from bs4 import BeautifulSoup
page = urllib.request.urlopen('http://____share.net/filmes/').read()
# name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(page, 'html.parser')
for anchor in soup.findAll('a', href=True):
    print (anchor['href'])
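For comparison, the same link listing can be sketched with only the standard library. This swaps BeautifulSoup for `html.parser.HTMLParser`, so it is a different technique from the example above, and the page content here is a hypothetical inline string rather than a downloaded page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical page content; the second anchor has no href and is skipped.
page = ('<ul><li><a href="a.avi">a</a></li>'
        '<li><a name="x">no link</a></li>'
        '<li><a href="/b/">b</a></li></ul>')

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

Skipping anchors without an href mirrors the `href=True` filter in the BeautifulSoup version.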

The last example takes all images from the search query of imgur website:

# get images from imgur search query
import urllib.request
from bs4 import BeautifulSoup
url = 'https://imgur.com/search/score?q=cyborg'
with urllib.request.urlopen(url) as f:
    soup = BeautifulSoup(f.read(),'lxml')

a_tags = soup.findAll("a",{"class":"image-list-link"})
img_tags = [a.find("img") for a in a_tags]
print(img_tags)
srcs = []
for s in img_tags:
    src_tags=('http:'+s['src'])
    srcs.append(src_tags)

print(srcs)
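Adding 'http:' in front of each src only works when the value starts with '//'. A hedged alternative is `urllib.parse.urljoin`, which resolves each src against the page URL. The src values below are hypothetical, not taken from a real imgur page.

```python
from urllib.parse import urljoin

page_url = "https://imgur.com/search/score?q=cyborg"

# Hypothetical src values: protocol-relative, absolute, and site-relative.
raw_srcs = [
    "//i.imgur.com/abc123b.jpg",
    "https://i.imgur.com/def456b.jpg",
    "/images/logo.png",
]

# urljoin fills in whatever part of the URL is missing from the page URL.
srcs = [urljoin(page_url, src) for src in raw_srcs]
print(srcs)
```

Note that the protocol-relative value gets the scheme of the page (https here), not the fixed 'http:' prefix used in the example above.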

In conclusion, this module can be difficult for those who do not know how to navigate the source code and content of web pages, or how to use 'lxml', 'page', etc.
The Chrome developer tools (F12 key) will greatly help you to inspect parts of the web content.

Friday, September 28, 2018

Python 2.7 : Python geocoding without key.

Today I will show a simple example of geocoding.
I used the json and requests Python modules with Python version 2.7.
For geocoding I use the service provided by datasciencetoolkit.
You can use this service for free and you don't need to register to get a key.
Let's see the Python script:

import requests
import json

url = u'http://www.datasciencetoolkit.org/maps/api/geocode/json'
par = {
    u'sensor': False,
    u'address': u'London'
}

my = requests.get(
    url,
    par
)
json_out = json.loads(my.text)

if json_out['status'] == 'OK':
    print([r['geometry']['location'] for r in json_out['results']])

I ran this script and checked the result with Google Maps to see if it works well.
The geocoding service works well.
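Separately from the run described above, the response handling can be tried offline. The sketch below assumes the same JSON shape the script reads (`status`, `results`, `geometry`, `location`); the sample text and its coordinates are placeholders, not real service output, and `locations` is a hypothetical helper name.

```python
import json

# Placeholder response text in the shape the script above expects.
sample_text = (
    '{"status": "OK", "results": '
    '[{"geometry": {"location": {"lat": 1.0, "lng": 2.0}}}]}'
)


def locations(response_text):
    """Return the list of location dictionaries, or an empty list when status is not OK."""
    json_out = json.loads(response_text)
    if json_out.get("status") != "OK":
        return []
    return [r["geometry"]["location"] for r in json_out.get("results", [])]


print(locations(sample_text))
```

Returning an empty list for any other status keeps the calling code free of missing-key checks.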
