What are the essential operations in Pandas?
Data analysts spend a significant amount of time cleaning and preparing data sets before they can work with them. They need the tools and skills to deal with messy data sets, missing values, inconsistencies and ambiguous data.
Pandas provides several essential operations that data analysts use to interact with the data stored in Series and DataFrame objects. These operations help get the data into a workable form before analysis.
Reindexing
A common operation on Pandas data structures is reindexing: creating a new object whose data is rearranged to conform to a new index.
If the new index contains a label that does not exist in the original data, a missing value (NaN) is inserted for that label.
Code:
import numpy as np
from pandas import Series
a = Series(np.random.randn(10), index=['a','b','c','d','e','f','g','h','i','j'])
a
Output:
a 0.591050
b -0.952670
c -0.948599
d 0.091596
e -1.096649
f 0.199346
g 0.856941
h -0.086180
i -2.623903
j 0.271230
dtype: float64
Code:
new_index = ['a','A1','b','B1','c','C1','d','e','f','g','h','i','j']
a_new = a.reindex(new_index)
a_new
Output:
a 0.591050
A1 NaN
b -0.952670
B1 NaN
c -0.948599
C1 NaN
d 0.091596
e -1.096649
f 0.199346
g 0.856941
h -0.086180
i -2.623903
j 0.271230
dtype: float64
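Two side effects of reindexing are easy to miss in the random example above: the original object is left unchanged, and introducing NaN into integer data upcasts it to float64. Here is a minimal sketch with small, deterministic illustrative values (the labels and numbers are made up):

```python
import pandas as pd

# Integer-valued Series with string labels (illustrative data)
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# Reindex to a new label order; 'w' does not exist in s, so it becomes NaN
s_new = s.reindex(['z', 'w', 'x'])
print(s_new)

print(s_new.dtype)  # float64: the NaN forces the integers to be upcast
print(s.dtype)      # still an integer dtype: the original Series is untouched
```

Note that 'y' simply disappears from the result because it is not in the new index.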
Handling missing values during reindexing
Imagine you are processing employee records, and many employees have supplied incomplete information. You need a way to handle these cases and highlight the gaps so you can follow up with them. For example, you could insert ‘Unknown’ into all the empty fields to make the missing values easy to identify.
There are various ways the missing values can be handled during reindexing. We can:
- either specify a particular value to be filled, by passing the parameter fill_value=<value to be filled> to the reindex method
For example:
Code:
a_fillvalue = a.reindex(new_index, fill_value=0)
a_fillvalue
Output:
a 0.591050
A1 0.000000
b -0.952670
B1 0.000000
c -0.948599
C1 0.000000
d 0.091596
e -1.096649
f 0.199346
g 0.856941
h -0.086180
i -2.623903
j 0.271230
dtype: float64
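The employee-records scenario described earlier can be sketched the same way. This is a minimal illustration with hypothetical employee IDs and phone numbers; it assumes the "empty fields" are employees missing from the data entirely. fill_value only applies to labels introduced by the reindex, so NaN values already present in the data would need fillna() instead.

```python
import pandas as pd

# Hypothetical employee records: only some employees supplied a phone number
phones = pd.Series({'E001': '555-0101', 'E003': '555-0103'})

# Hypothetical full list of employee IDs we expect to have records for
all_ids = ['E001', 'E002', 'E003', 'E004']

# Reindex against the full list; gaps are filled with 'Unknown' instead of NaN
phones_full = phones.reindex(all_ids, fill_value='Unknown')
print(phones_full)

# The 'Unknown' marker makes the follow-up list easy to extract
follow_up = phones_full[phones_full == 'Unknown'].index.tolist()
print(follow_up)
```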
Or, we can use one of the predefined fill methods by passing the parameter method=<predefined method value>, such as 'ffill', 'bfill' or 'nearest'. This is handy when we need forward fill, backward fill or nearest-value fill, for instance in time-series data analysis. Interpolation is not one of the reindex methods; it is done separately, for example with the interpolate() method.
For example:
Code:
a = Series(np.random.randn(10), index=[0,2,4,6,8,10,12,14,16,18])
a
Output:
0 1.036439
2 -0.841819
4 0.629621
6 -1.905720
8 1.673387
10 0.792506
12 0.267104
14 0.759571
16 -0.847925
18 -0.598402
dtype: float64
Code:
## Reindex so that indexes 1,3,5... are introduced in the series
a_new = a.reindex(range(20))
a_new
Output:
0 1.036439
1 NaN
2 -0.841819
3 NaN
4 0.629621
5 NaN
6 -1.905720
7 NaN
8 1.673387
9 NaN
10 0.792506
11 NaN
12 0.267104
13 NaN
14 0.759571
15 NaN
16 -0.847925
17 NaN
18 -0.598402
19 NaN
dtype: float64
Code:
## Same reindex, but forward fill: each new label takes the value of the previous label
a_ffill = a.reindex(range(20), method='ffill')
a_ffill
Output:
0 1.036439
1 1.036439
2 -0.841819
3 -0.841819
4 0.629621
5 0.629621
6 -1.905720
7 -1.905720
8 1.673387
9 1.673387
10 0.792506
11 0.792506
12 0.267104
13 0.267104
14 0.759571
15 0.759571
16 -0.847925
17 -0.847925
18 -0.598402
19 -0.598402
dtype: float64
Look at indexes 1, 3 and 5: each has been populated with the value from the previous index.
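The other fill methods behave analogously. A minimal sketch on small, deterministic illustrative data, showing backward fill, nearest-value fill, and interpolation done as a separate step after reindexing. The method option only works when the index is monotonically increasing or decreasing, which holds here.

```python
import pandas as pd

# Values at even labels only (deterministic, illustrative data)
s = pd.Series([1.0, 3.0, 5.0], index=[0, 2, 4])

# Backward fill: each new label takes the value of the NEXT existing label.
# Label 5 has no later label, so it stays NaN.
s_bfill = s.reindex(range(6), method='bfill')
print(s_bfill)

# Nearest: each new label takes the value of the closest existing label
t = pd.Series([100.0, 200.0], index=[0, 10])
t_nearest = t.reindex([0, 3, 8, 10], method='nearest')
print(t_nearest)

# Interpolation is not a reindex method: reindex first, then interpolate
s_interp = s.reindex(range(5)).interpolate()
print(s_interp)
```

Just as backward fill leaves trailing labels empty, forward fill leaves any labels before the first existing one as NaN.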
For the complete list of parameters of the reindex method, refer to the Pandas documentation:
Read: Pandas Document for Series reindexing [1]
Read: Pandas Document for Dataframe Reindexing [2]
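For a DataFrame, reindex can rearrange rows and columns in a single call through its index and columns parameters. A minimal sketch with a made-up two-row DataFrame:

```python
import pandas as pd

# Small illustrative DataFrame
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])

# Reindex rows and columns together; new labels ('r3', 'z') get fill_value
df_new = df.reindex(index=['r2', 'r1', 'r3'],
                    columns=['y', 'x', 'z'],
                    fill_value=0)
print(df_new)
```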
Deleting entries
We often need to delete entries from a Pandas Series or DataFrame. You can do this with the drop() method, which is available on both. It accepts a single index label, or a list of labels, to be dropped.
By default, drop() returns a new object containing only the remaining values. It does not drop in place (unless you pass inplace=True), so the original Series or DataFrame is preserved and still available after the drop. In practical terms, the method returns a selective copy of the data.
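The copy-versus-in-place behaviour can be checked directly. A minimal sketch on a small illustrative Series; inplace is a standard drop() parameter, while the data are made up:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

# Default behaviour: drop() returns a new Series and leaves s unchanged
dropped = s.drop('b')
print(dropped.index.tolist())  # ['a', 'c', 'd']
print(s.index.tolist())        # ['a', 'b', 'c', 'd']

# With inplace=True the original is modified and drop() returns None
result = s.drop('b', inplace=True)
print(result)                  # None
print(s.index.tolist())        # ['a', 'c', 'd']
```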
Deleting entries from Pandas Series
Let’s look at how to delete entries from a Pandas Series.
- Drop a single index.
Code:
b = Series(np.arange(10), index=['a','b','c','d','e','f','g','h','i','j'])
b
Output:
a 0
b 1
c 2
d 3
e 4
f 5
g 6
h 7
i 8
j 9
dtype: int32
Code:
#Dropping index b
new_series = b.drop('b')
new_series
Output:
a 0
c 2
d 3
e 4
f 5
g 6
h 7
i 8
j 9
dtype: int32
- Drop multiple indexes.
Code:
# Drop multiple indexes, e.g. 'a', 'g' and 'j'
new_series_1 = b.drop(['a','g','j'])
new_series_1
Output:
b 1
c 2
d 3
e 4
f 5
h 7
i 8
dtype: int32
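By default, drop() raises a KeyError if any of the requested labels is missing. Its errors parameter changes this. A minimal sketch on a small illustrative Series, where 'z' is a deliberately nonexistent label:

```python
import pandas as pd

s = pd.Series(range(3), index=['a', 'b', 'c'])

# Dropping a label that does not exist raises KeyError by default
raised = False
try:
    s.drop(['a', 'z'])
except KeyError as exc:
    raised = True
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and skips the rest
trimmed = s.drop(['a', 'z'], errors='ignore')
print(trimmed)
```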
Deleting entries from Pandas DataFrame
In the case of a DataFrame, we can drop labels along either axis: row labels (using the index parameter, or the default axis=0) and column names (using the columns parameter, or axis=1).
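The df_states DataFrame used in the snippets is not defined in this step, so here is a self-contained sketch with a hypothetical stand-in. The state names and capitals are real, but the area values are placeholders, not real figures. It shows the index and columns parameters and their equivalence to the positional and axis=1 forms.

```python
import pandas as pd

# Hypothetical stand-in for df_states; 'area' values are placeholders
df_states = pd.DataFrame(
    {'state': ['New South Wales', 'Victoria', 'Northern Territory'],
     'area': [1, 2, 3],
     'capital': ['Sydney', 'Melbourne', 'Darwin']},
    index=['NSW', 'VIC', 'NT'],
)

# Drop a row by label with the index parameter (same as df_states.drop('NT'))
no_nt = df_states.drop(index='NT')

# Drop columns by name with the columns parameter (same as axis=1)
no_cols = df_states.drop(columns=['state', 'area'])

# Both axes can be handled in one call
trimmed = df_states.drop(index='NT', columns='area')
print(trimmed)
```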
The following code snippets demonstrate this behaviour:
- Removing a row from a DataFrame.
Code:
df_states
Output:
Code:
df_states_noNT = df_states.drop('NT')
df_states_noNT
Output:
- Removing multiple columns from a DataFrame by passing a list of column names and axis=1.
Code:
~~~ python
df_states
~~~
Output:
Code:
df1 = df_states.drop(['state','area'], axis=1)
df1
Output:
Code:
# The original DataFrame is unchanged: drop() returned new objects above
df_states
Output:
References
- Pandas Document for Series reindexing [Internet]. Pandas; [date unknown]. Available from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html
- Pandas Document for Dataframe Reindexing [Internet]. Pandas; [date unknown]. Available from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html
Introduction to Data Analytics with Python