Saturday, December 21, 2019

Python 3.7.5 : Simple web search with the google Python package.

This post shows a simple web search using the google Python package. First, install it with pip3:
[mythcat@desk ~]$ pip3 install google --user
Collecting google
...
Installing collected packages: google
Successfully installed google-2.0.3
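The package is installed under the name google, but the help output in this post shows that the module to import is googlesearch. A minimal sketch for checking that the module is visible to the interpreter before running anything else; the helper name is_installed is my own and is not part of the package:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when the module cannot be located
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'googlesearch' is the import name used later in this post
    print("googlesearch available:", is_installed("googlesearch"))
```

If this prints False, the pip3 install step most likely ran against a different Python interpreter.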
This is a simple example that searches the web for these words: protv news 2019.
From the Python package, I only need to import the search function and use it.
The search function takes the words to search for as a string in its query parameter.
It also accepts several optional arguments; see the help:
Help on function search in module googlesearch:

search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0, stop=None, 
domains=None, pause=2.0, tpe='', country='', extra_params=None, user_agent=None)
    Search the given query string using Google.
    
    :param str query: Query string. Must NOT be url-encoded.
    :param str tld: Top level domain.
    :param str lang: Language.
    :param str tbs: Time limits (i.e "qdr:h" => last hour,
        "qdr:d" => last 24 hours, "qdr:m" => last month).
    :param str safe: Safe search.
    :param int num: Number of results per page.
    :param int start: First result to retrieve.
    :param int stop: Last result to retrieve.
        Use None to keep searching forever.
    :param list domains: A list of web domains to constrain
        the search.
    :param float pause: Lapse to wait between HTTP requests.
        A lapse too long will make the search slow, but a lapse too short may
        cause Google to block your IP. Your mileage may vary!
    :param str tpe: Search type (images, videos, news, shopping, books, apps)
        Use the following values {videos: 'vid', images: 'isch',
        news: 'nws', shopping: 'shop', books: 'bks', applications: 'app'}
    :param str country: Country or region to focus the search on. Similar to
        changing the TLD, but does not yield exactly the same results.
        Only Google knows why...
    :param dict extra_params: A dictionary of extra HTTP GET
        parameters, which must be URL encoded. For example if you don't want
        Google to filter similar results you can set the extra_params to
        {'filter': '0'} which will append '&filter=0' to every query.
    :param str user_agent: User agent for the HTTP requests.
        Use None for the default.
    
    :rtype: generator of str
    :return: Generator (iterator) that yields found URLs.
        If the stop parameter is None the iterator will loop forever.
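As a rough illustration of the parameters described in the help above, the sketch below asks for news results (tpe='nws') from the last 24 hours (tbs='qdr:d') and bounds the search with stop. The helper recent_news and its search_fn argument are hypothetical names added for this example so that it can be tried without network access; the exact number of URLs Google returns may vary.

```python
from itertools import islice


def recent_news(query, limit=10, search_fn=None):
    """Return up to `limit` news URLs from the last 24 hours (a sketch)."""
    if search_fn is None:
        # Imported lazily; requires the google package and network access
        from googlesearch import search as search_fn
    results = search_fn(
        query,
        tpe="nws",    # search type: news (value taken from the help text)
        tbs="qdr:d",  # time limit: last 24 hours (value taken from the help text)
        stop=limit,   # last result to retrieve, so the generator ends
        pause=2.0,    # wait between HTTP requests
    )
    # islice is an extra guard so the list never grows past `limit`
    return list(islice(results, limit))
```

Calling recent_news("protv news 2019", limit=5) would then use the real search function from googlesearch.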
This is the script. Because stop is None, the loop keeps fetching results until it is interrupted:
[mythcat@desk ~]$ python3
Python 3.7.5 (default, Dec 15 2019, 17:54:26) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from googlesearch import search
>>> query = "protv news 2019"
>>> my_results_list = []
>>> for url in search(query,        
...     tld = 'com',  
...     lang = 'en',       
...     num = 10,     
...     start = 0,    
...     stop = None,  
...     pause = 2.0,):
...     my_results_list.append(url)
...     print(url)
... 
https://stirileprotv.ro/protvnews/
https://stirileprotv.ro/
https://stirileprotv.ro/superbun/protv-news.html
https://www.facebook.com/ProTvNews/
https://www.youtube.com/playlist?list=PLCJaU-QvLGR_FSZw6yeqBJHDe9LgFDRdy
https://www.youtube.com/watch?v=HaiQtDlaNic
https://www.youtube.com/watch?v=hxMEgAANSl4
https://www.youtube.com/channel/UCbbDChpDluLkdnH8QMwN6qA
https://www.youtube.com/watch?v=5zuN9uWFcTE
https://protvplus.ro/tv-live/1-pro-tv
https://en.wikipedia.org/wiki/Pro_TV
https://pro-tv.com/news/page/2/
https://m.youtube.cat/channel/UCbbDChpDluLkdnH8QMwN6qA
...
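Several of the results above come from the same site. A small sketch, using only the standard library, that counts results per host and can cap how many URLs are read from the generator; the function name count_hosts is my own and is not part of the google package:

```python
from collections import Counter
from itertools import islice
from urllib.parse import urlparse


def count_hosts(urls, limit=None):
    """Count how many URLs belong to each host, reading at most `limit` URLs."""
    # islice stops after `limit` items, so an endless generator is safe here
    hosts = (urlparse(url).netloc for url in islice(urls, limit))
    return Counter(hosts)


if __name__ == "__main__":
    sample = [
        "https://stirileprotv.ro/protvnews/",
        "https://stirileprotv.ro/",
        "https://www.youtube.com/watch?v=HaiQtDlaNic",
        "https://en.wikipedia.org/wiki/Pro_TV",
    ]
    for host, count in count_hosts(sample).most_common():
        print(host, count)
```

With the real package, count_hosts(search(query, stop=None), limit=20) would read only the first 20 URLs even though the search itself is unbounded.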