The official documentation of this python module tells us:
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
The official webpage can be found here.
This python module is one of the most popular Python libraries for Data Science and Analytics.
You can install this python module with pip tool:
C:\Python373\Scripts>pip install pandas
Requirement already satisfied: pandas in c:\python373\lib\site-packages (0.24.2)
You can find many tutorials on web with this python module. Today I will show you a short tutorial about this python module.
Most users use this both python modules:
import numpy as np
import pandas as pd
Most area of the pandas python module has a target into this list: Window Functions, Aggregations, Missing Data, GroupBy, Merging/Joining, Concatenation, Date Functionality, Timedelta, Categorical Data,
Visualization, IO Tools, Sparse Data, Caveats & Gotchas, Comparison with SQL
There are two types of data structures in pandas: Series and DataFrames.
The pandas Series is a one-dimensional data structure.
The pandas DataFrame is a two (or more) dimensional data structure, like a table
Pandas provide few variants rolling, expanding and exponentially moving weights for window statistics.
Also have the sum, mean, median, variance, covariance, correlation, etc.
The several methods are available to perform aggregations on data.
Pandas provide functions for missing data like the isnull() and notnull().
Let's test the DataFrames with pandas and the Wikipedia example from my tutorial
C:\Python373>python.exe
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Inte
l)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> # get table from wikipedia
... import requests
>>> from bs4 import BeautifulSoup
>>> website_url = requests.get('https://en.wikipedia.org/w/index.php?title=Table
_of_food_nutrients').text
>>> soup = BeautifulSoup(website_url,'lxml')
>>>
>>> my_table = soup.find('table',{'class':'wikitable collapsible collapsed'})
>>> links = my_table.findAll('a')
>>> Food = []
>>> for link in links:
... Food.append(link.get('title'))
...
>>> print(Food)
["Cows' milk (page does not exist)", 'Buttermilk', 'Fortified milk (page does no
t exist)', 'Powdered milk', "Goats' milk", 'Malted milk', 'Hot chocolate', 'Yogu
rt', 'Milk pudding (page does not exist)', 'Custard', 'Ice cream', 'Ice milk', '
Cream', 'Cheese', 'Cheddar cheese', 'American cheese', 'Processed cheese', 'Egg
(food)', 'Scrambled', 'Omelet', 'Yolk']
>>> import pandas
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['Foods'] = Food
>>> print(df)
Foods
0 Cows' milk (page does not exist)
1 Buttermilk
2 Fortified milk (page does not exist)
3 Powdered milk
4 Goats' milk
5 Malted milk
6 Hot chocolate
7 Yogurt
8 Milk pudding (page does not exist)
9 Custard
10 Ice cream
11 Ice milk
12 Cream
13 Cheese
14 Cheddar cheese
15 American cheese
16 Processed cheese
17 Egg (food)
18 Scrambled
19 Omelet
20 Yolk
>>> df.describe()
Foods
count 21
unique 21
top Malted milk
freq 1
>>> df.apply(pd.Series.value_counts)
Foods
Malted milk 1
Ice milk 1
Omelet 1
Goats' milk 1
Custard 1
Cheddar cheese 1
American cheese 1
Ice cream 1
Yolk 1
Cream 1
Cows' milk (page does not exist) 1
Yogurt 1
Fortified milk (page does not exist) 1
Egg (food) 1
Powdered milk 1
Milk pudding (page does not exist) 1
Cheese 1
Hot chocolate 1
Buttermilk 1
Processed cheese 1
Scrambled 1
The last example is to show data with >>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> ts = pd.Series(np.random.randn(76), index=pd.date_range('1/1/76', periods=76
))
>>> ts.plot()
>>> plt.show()