python-catalin: kaggle


Sunday, December 8, 2019

Python 3.7.5 : Starting with kaggle platform.

Kaggle describes itself as the world's largest community of data scientists, and the platform is a fast way to get started on a new data science project.
A good reason to use Kaggle is that it provides free access to NVIDIA K80 GPUs in kernels.
Today's tutorial is about Kaggle, which is new to me: I only heard about it last year.
The platform hosts data science and machine learning competitions and gives people a good environment for development.
The official blog can tell you more about how the platform works.
The Kaggle API can be found on GitHub.
Let's start the tutorial by installing the kaggle package with pip3, then moving the kaggle.json API token (downloaded from your Kaggle account page) into ~/.kaggle/:

[mythcat@desk kaggle]$ pip3 install kaggle --upgrade --user
Collecting kaggle
...
[mythcat@desk kaggle]$ mkdir ~/.kaggle/
[mythcat@desk kaggle]$ mv kaggle.json ~/.kaggle/kaggle.json
[mythcat@desk kaggle]$ kaggle 
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/mythcat/.kaggle/kaggle.json'
usage: kaggle [-h] [-v] {competitions,c,datasets,d,kernels,k,config} ...
kaggle: error: the following arguments are required: command
[mythcat@desk kaggle]$ chmod 600 /home/mythcat/.kaggle/kaggle.json
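
The same permission fix can be done from Python. This is only a minimal sketch for POSIX systems, assuming the token is stored at ~/.kaggle/kaggle.json as in the session above; the function name secure_credentials is my own and is not part of the kaggle package.

```python
import stat
from pathlib import Path


def secure_credentials(path: Path) -> bool:
    """Restrict path to owner read/write (0o600).

    Returns True if the permissions were changed, False if no
    group/other permission bits were set in the first place.
    """
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):  # any group/other bit set
        path.chmod(0o600)
        return True
    return False


if __name__ == "__main__":
    # Location used in the session above; adjust it if yours differs.
    token = Path.home() / ".kaggle" / "kaggle.json"
    if token.exists():
        changed = secure_credentials(token)
        print("permissions tightened" if changed else "permissions already fine")
    else:
        print(f"{token} not found")
```

Running it a second time should report that the permissions are already fine, because only the owner bits remain set.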

You can use the kaggle command to get information from the Kaggle platform:

[mythcat@desk kaggle]$ kaggle competitions list 
ref                                            deadline             category            reward  teamCount  userHasEntered
---------------------------------------------  -------------------  ---------------  ---------  ---------  --------------
digit-recognizer                               2030-01-01 00:00:00  Getting Started  Knowledge       2305           False
titanic                                        2030-01-01 00:00:00  Getting Started  Knowledge      17135           False
house-prices-advanced-regression-techniques    2030-01-01 00:00:00  Getting Started  Knowledge       5532           False
imagenet-object-localization-challenge         2029-12-31 07:00:00  Research         Knowledge         57           False
google-quest-challenge                         2020-02-10 23:59:00  Featured           $25,000        410           False
tensorflow2-question-answering                 2020-01-22 23:59:00  Featured           $50,000        810           False
data-science-bowl-2019                         2020-01-22 23:59:00  Featured          $160,000       1762           False
pku-autonomous-driving   
...
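
If you want to work with this listing in a script, the table is easier to handle as CSV. The sketch below assumes that your installed kaggle version accepts the --csv flag for this command (check kaggle competitions list -h). The function names and the SAMPLE text are mine; SAMPLE is hand-written from two rows of the table above, not real CLI output.

```python
import csv
import io
import subprocess
from typing import Dict, List


def parse_competitions(csv_text: str) -> List[Dict[str, str]]:
    """Turn CSV text (header row plus data rows) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def list_competitions() -> List[Dict[str, str]]:
    """Run the CLI and parse its output.

    Needs the kaggle command installed and configured; the --csv
    flag is an assumption about the installed version.
    """
    result = subprocess.run(
        ["kaggle", "competitions", "list", "--csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_competitions(result.stdout)


# Hand-written sample based on the table above (not real CLI output).
SAMPLE = (
    "ref,deadline,category,reward,teamCount,userHasEntered\n"
    "titanic,2030-01-01 00:00:00,Getting Started,Knowledge,17135,False\n"
    'google-quest-challenge,2020-02-10 23:59:00,Featured,"$25,000",410,False\n'
)

if __name__ == "__main__":
    # Keep only the competitions in the Featured category.
    for row in parse_competitions(SAMPLE):
        if row["category"] == "Featured":
            print(row["ref"], row["deadline"], row["reward"])
```

Replace parse_competitions(SAMPLE) with list_competitions() to query the platform itself.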

The commands available for this platform are listed on GitHub:

kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}
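
When calling the tool from a script, it can help to check a command against this list before running it. Here is a small sketch of that idea; KAGGLE_COMMANDS and build_command are names I made up, and the table only mirrors the list above, so newer versions of the CLI may offer more commands.

```python
from typing import List

# Sub-commands copied from the list above; newer CLI versions may differ.
KAGGLE_COMMANDS = {
    "competitions": {"list", "files", "download", "submit", "submissions", "leaderboard"},
    "datasets": {"list", "files", "download", "create", "version", "init"},
    "kernels": {"list", "init", "push", "pull", "output", "status"},
    "config": {"view", "set", "unset"},
}


def build_command(group: str, action: str, *extra: str) -> List[str]:
    """Return an argument list such as ['kaggle', 'datasets', 'list'].

    Raises ValueError when the group/action pair is not in the table.
    """
    if action not in KAGGLE_COMMANDS.get(group, set()):
        raise ValueError(f"unknown command: kaggle {group} {action}")
    return ["kaggle", group, action, *extra]


if __name__ == "__main__":
    print(build_command("datasets", "list"))
    print(build_command("config", "view"))
```

The returned list can be passed directly to subprocess.run, which avoids building a shell string by hand.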

The Kaggle platform provides an online tool with notebooks, similar to Jupyter Notebook, for writing and running code in Kernels.
Working with models requires datasets, and you can use the Kaggle datasets.
You can use the New Notebook button on the datasets page to start using a dataset.
Datasets can be loaded from Kaggle or uploaded.
The next window lets you select the new notebook settings.
I chose Python with a notebook and got the default source code:

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

Now I can read the dataset, shown on the right under Data with input
(read-only data) and output, using the pandas module:

data = pd.read_csv("../input/lego-database/colors.csv")
data.head()
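
Instead of typing the path by hand, you can search the input directory for CSV files first. This is just a sketch built on the os.walk loop from the default notebook code; find_csv_files is a name I chose, and /kaggle/input is the directory that the default code walks.

```python
import os
from typing import List


def find_csv_files(root: str) -> List[str]:
    """Return the sorted paths of all .csv files below root."""
    found = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(".csv"):
                found.append(os.path.join(dirname, filename))
    return sorted(found)


if __name__ == "__main__":
    # In a Kaggle notebook the attached datasets live under this directory.
    for path in find_csv_files("/kaggle/input"):
        print(path)
```

Any of the printed paths can then be passed to pd.read_csv as in the example above. Outside a Kaggle notebook the directory does not exist and the function returns an empty list.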

The Commit button lets you save your work for later.
You can see the online test I created with the Lego dataset and Python on my Kaggle page.
