Skip to content

Pandas Cheat Sheet

Detailed guide for using the Pandas library.

Installing Pandas

Command Description
conda install pandas The best way to get pandas is via conda
python3 -m pip install --upgrade pandas Via PyPI
pip install pandas Standard pip installation

Key Abbreviations and Package Imports

Abbreviation Description
df Any Pandas DataFrame object
s Any Pandas Series object

Import Methods

Command Description
import pandas as pd Import pandas with the alias pd
import numpy as np Import numpy with the alias np

Data Selection

Command Description
df[col] Returns a column as a Series based on column name
df[[col1, col2]] Returns multiple columns as a DataFrame
s.iloc[0] Select data by position
s.loc['index_one'] Select data by index
df.iloc[0,:] Returns the first row
df.iloc[0,0] Returns the first element of the first column

Data Cleaning

Command Description
df.columns = ['a','b','c'] Rename columns
pd.isnull() Check for null values in a DataFrame; returns a Boolean array
pd.notnull() Check for non-null values in a DataFrame; returns a Boolean array
df.dropna() Delete all rows containing null values
df.dropna(axis=1) Delete all columns containing null values
df.dropna(axis=1, thresh=n) Delete all rows with fewer than n non-null values
df.fillna(x) Replace all null values in a DataFrame with x
s.astype(float) Convert Series data type to float
s.replace(1, 'one') Replace all values equal to 1 with ‘one’
s.replace([1,3], ['one', 'three']) Replace 1 with ‘one’ and 3 with ’three’
df.rename(columns=lambda x: x + 1) Batch rename columns
df.rename(columns={'old_name': 'new_name'}) Selectively rename columns
df.set_index('column_one') Change the index column
df.rename(index=lambda x: x + 1) Batch rename indices

Importing Data

Command Description
pd.read_csv(filename) Import data from a CSV file
pd.read_table(filename) Import data from a delimited text file
pd.read_excel(filename) Import data from an Excel file
pd.read_sql(query, connection_object) Import data from a SQL table/database
pd.read_json(json_string) Import data from a JSON string
pd.read_html(url) Extract tables from a URL, string, or HTML file
pd.read_clipboard() Get content from clipboard and pass to read_table()
pd.DataFrame(dict) Import data from a dictionary (Keys are column names, Values are data)

Exporting Data

Command Description
df.to_csv(filename) Export data to a CSV file
df.to_excel(filename) Export data to an Excel file
df.to_sql(table_name, connection_object) Export data to a SQL table
df.to_json(filename) Export data to a text file in JSON format

Data Processing: Filter, Sort, and GroupBy

Command Description
df[df[col] > 0.5] Select rows where the value in column ‘col’ is > 0.5
df.sort_values(col1) Sort data by column col1 (ascending by default)
df.sort_values(col2, ascending=False) Sort data by column col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]) Sort by col1 ascending, then col2 descending
df.groupby(col) Returns a GroupBy object grouped by column ‘col’
df.groupby([col1, col2]) Returns a GroupBy object grouped by multiple columns
df.groupby(col1)[col2].mean() Returns the mean of column col2 for each group in col1
df.pivot_table(index=col1, values=[col2, col3], aggfunc=max) Create a pivot table grouped by col1, calculating max for col2 and col3
df.groupby(col1).agg(np.mean) Returns the mean for all columns grouped by col1
df.apply(np.mean) Apply function np.mean across each column
df.apply(np.max, axis=1) Apply function np.max across each row

Creating Test Objects

Command Description
pd.DataFrame(np.random.rand(20,5)) Create a DataFrame with 20 rows and 5 columns of random numbers
pd.Series(my_list) Create a Series from an iterable object my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]) Add a date index

Viewing and Inspecting Data

Command Description
df.head(n) View the first n rows of a DataFrame
df.tail(n) View the last n rows of a DataFrame
df.shape View the number of rows and columns
df.info() View index, data types, and memory info
df.describe() View summary statistics for numerical columns
s.value_counts(dropna=False) View unique values and counts for a Series
df.apply(pd.Series.value_counts) View unique values and counts for each column in a DataFrame

Data Merging

Command Description
df1.append(df2) Append rows from df2 to the end of df1
pd.concat([df1, df2], axis=1) Append columns from df2 to the end of df1
df1.join(df2, on=col1, how='inner') SQL-style join between df1 and df2 columns

Data Statistics

Command Description
df.describe() View summary statistics for numerical columns
df.mean() Returns the mean of all columns
df.corr() Returns the correlation coefficient between columns
df.count() Returns the number of non-null values in each column
df.max() Returns the maximum value of each column
df.min() Returns the minimum value of each column
df.median() Returns the median of each column
df.std() Returns the standard deviation of each column