Kaggle bills itself as the world's largest community of data scientists, and the platform is a fast way to get started on a new data science project.
One good reason to use Kaggle is this feature: it provides free access to NVIDIA K80 GPUs in kernels.
Today's tutorial is about Kaggle, which is new to me because I only heard about this opportunity last year.
The platform hosts data science and machine learning competitions and gives people a good place to develop their skills.
The official blog can tell you more about how this platform works.
The Kaggle API can be found on GitHub.
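Let's start the tutorial by installing the tool with pip3 and then moving the kaggle.json API token (downloaded from your Kaggle account page) into ~/.kaggle/: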
[mythcat@desk kaggle]$ pip3 install kaggle --upgrade --user
Collecting kaggle
...
[mythcat@desk kaggle]$ mkdir ~/.kaggle/
[mythcat@desk kaggle]$ mv kaggle.json ~/.kaggle/kaggle.json
[mythcat@desk kaggle]$ kaggle
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/mythcat/.kaggle/kaggle.json'
usage: kaggle [-h] [-v] {competitions,c,datasets,d,kernels,k,config} ...
kaggle: error: the following arguments are required: command
[mythcat@desk kaggle]$ chmod 600 /home/mythcat/.kaggle/kaggle.json
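The same permission check can be done from Python. This is only a minimal sketch, assuming a POSIX system and the default token location ~/.kaggle/kaggle.json; the helper names is_private and make_private are my own and are not part of the kaggle package.

```python
import os
import stat
from pathlib import Path


def is_private(path):
    """Return True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def make_private(path):
    """Same effect as `chmod 600 <path>`: read and write for the owner only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Assumed default location of the API token (hypothetical on your system).
token = Path.home() / ".kaggle" / "kaggle.json"
if token.exists() and not is_private(token):
    make_private(token)
```

If the token file does not exist, the script does nothing, so it is safe to run before the token has been downloaded.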
You can use the kaggle command to get information from the Kaggle platform:
[mythcat@desk kaggle]$ kaggle competitions list
ref                                            deadline             category         reward     teamCount  userHasEntered
---------------------------------------------  -------------------  ---------------  ---------  ---------  --------------
digit-recognizer                               2030-01-01 00:00:00  Getting Started  Knowledge  2305       False
titanic                                        2030-01-01 00:00:00  Getting Started  Knowledge  17135      False
house-prices-advanced-regression-techniques    2030-01-01 00:00:00  Getting Started  Knowledge  5532       False
imagenet-object-localization-challenge         2029-12-31 07:00:00  Research         Knowledge  57         False
google-quest-challenge                         2020-02-10 23:59:00  Featured         $25,000    410        False
tensorflow2-question-answering                 2020-01-22 23:59:00  Featured         $50,000    810        False
data-science-bowl-2019                         2020-01-22 23:59:00  Featured         $160,000   1762       False
pku-autonomous-driving
...
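If you want to work with this list in a script, the CLI can print CSV instead of a table. The sketch below assumes that the --csv flag of kaggle competitions list is available in your version and that the CSV header uses the same column names as the table above; parse_competitions, list_competitions and with_cash_reward are names I made up for this example.

```python
import csv
import io
import subprocess


def parse_competitions(csv_text):
    """Turn the CSV text printed by the CLI into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def list_competitions():
    """Run `kaggle competitions list --csv` and parse what it prints.

    Needs the kaggle command on PATH and a valid ~/.kaggle/kaggle.json.
    """
    result = subprocess.run(
        ["kaggle", "competitions", "list", "--csv"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_competitions(result.stdout)


def with_cash_reward(rows):
    """Keep only competitions whose reward column is a dollar amount."""
    return [row for row in rows if row["reward"].startswith("$")]
```

Calling with_cash_reward(list_competitions()) should then return only the rows with a money prize, such as google-quest-challenge in the listing above. If the CLI prints a warning before the CSV header, the parser would need to skip that line first.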
The available commands for this platform are listed on GitHub:
kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}
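The command list above has a simple group/subcommand shape, so a small Python wrapper can check a command before running it. This is just a sketch: the COMMANDS table is copied from the list above (a newer CLI version may have more entries), and build_command is a hypothetical helper, not something the kaggle package provides.

```python
# Command groups and their subcommands, copied from the list above.
COMMANDS = {
    "competitions": {"list", "files", "download", "submit", "submissions", "leaderboard"},
    "datasets": {"list", "files", "download", "create", "version", "init"},
    "kernels": {"list", "init", "push", "pull", "output", "status"},
    "config": {"view", "set", "unset"},
}


def build_command(group, action, *args):
    """Return the argument list for a kaggle CLI call, or raise ValueError."""
    if group not in COMMANDS:
        raise ValueError(f"unknown command group: {group}")
    if action not in COMMANDS[group]:
        raise ValueError(f"unknown subcommand for {group}: {action}")
    return ["kaggle", group, action, *args]
```

The returned list can be passed to subprocess.run, for example subprocess.run(build_command("competitions", "list"), check=True). A typo such as build_command("datasets", "lst") fails early with a ValueError instead of a CLI usage error.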
The Kaggle platform provides an online tool with notebooks, similar to Jupyter Notebook, for writing code in Kernels.
Working with models requires datasets, and you can use the Kaggle datasets.
You can use the New Notebook button on that page to start using the datasets.
Datasets can be loaded from Kaggle or uploaded.
The next window lets you select the new notebook settings.
I chose Python with a notebook and got this default source code:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# Any results you write to the current directory are saved as output.
Now I can read the dataset, which is shown on the right under Data together with input (read-only data) and output, using the pandas module:
data = pd.read_csv("../input/lego-database/colors.csv")
data.head()
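Instead of typing the relative path by hand, the CSV files can be looked up by name. This is a minimal sketch, assuming the datasets are mounted under /kaggle/input as in the default code above; find_csv_files is a helper name I invented for this example.

```python
from pathlib import Path


def find_csv_files(input_dir="/kaggle/input"):
    """Map each CSV file name found under input_dir to its full path.

    If two datasets contain a file with the same name, the last one found wins.
    """
    return {path.name: path for path in sorted(Path(input_dir).rglob("*.csv"))}
```

In the notebook this could be used as data = pd.read_csv(find_csv_files()["colors.csv"]), which should load the same file as the line above as long as the Lego dataset is attached.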
The Commit button lets you save your work for later.
You can see the online test I created with the Lego dataset and Python on my Kaggle page.