Pandas Cheat Sheet

Detailed guide for using the Pandas library.

Installing Pandas

Command	Description
`conda install pandas`	The best way to get pandas is via conda
`python3 -m pip install --upgrade pandas`	Via PyPI
`pip install pandas`	Standard pip installation

Key Abbreviations and Package Imports

Abbreviation	Description
`df`	Any Pandas DataFrame object
`s`	Any Pandas Series object

Import Methods

Command	Description
`import pandas as pd`	Import pandas with the alias pd
`import numpy as np`	Import numpy with the alias np

Data Selection

Command	Description
`df[col]`	Returns a column as a Series based on column name
`df[[col1, col2]]`	Returns multiple columns as a DataFrame
`s.iloc[0]`	Select data by position
`s.loc['index_one']`	Select data by index
`df.iloc[0,:]`	Returns the first row
`df.iloc[0,0]`	Returns the first element of the first column

Data Cleaning

Command	Description
`df.columns = ['a','b','c']`	Rename columns
`pd.isnull()`	Check for null values in a DataFrame; returns a Boolean array
`pd.notnull()`	Check for non-null values in a DataFrame; returns a Boolean array
`df.dropna()`	Delete all rows containing null values
`df.dropna(axis=1)`	Delete all columns containing null values
`df.dropna(axis=1, thresh=n)`	Delete all rows with fewer than n non-null values
`df.fillna(x)`	Replace all null values in a DataFrame with x
`s.astype(float)`	Convert Series data type to float
`s.replace(1, 'one')`	Replace all values equal to 1 with ‘one’
`s.replace([1,3], ['one', 'three'])`	Replace 1 with ‘one’ and 3 with ’three’
`df.rename(columns=lambda x: x + 1)`	Batch rename columns
`df.rename(columns={'old_name': 'new_name'})`	Selectively rename columns
`df.set_index('column_one')`	Change the index column
`df.rename(index=lambda x: x + 1)`	Batch rename indices

Importing Data

Command	Description
`pd.read_csv(filename)`	Import data from a CSV file
`pd.read_table(filename)`	Import data from a delimited text file
`pd.read_excel(filename)`	Import data from an Excel file
`pd.read_sql(query, connection_object)`	Import data from a SQL table/database
`pd.read_json(json_string)`	Import data from a JSON string
`pd.read_html(url)`	Extract tables from a URL, string, or HTML file
`pd.read_clipboard()`	Get content from clipboard and pass to read_table()
`pd.DataFrame(dict)`	Import data from a dictionary (Keys are column names, Values are data)

Exporting Data

Command	Description
`df.to_csv(filename)`	Export data to a CSV file
`df.to_excel(filename)`	Export data to an Excel file
`df.to_sql(table_name, connection_object)`	Export data to a SQL table
`df.to_json(filename)`	Export data to a text file in JSON format

Data Processing: Filter, Sort, and GroupBy

Command	Description
`df[df[col] > 0.5]`	Select rows where the value in column ‘col’ is > 0.5
`df.sort_values(col1)`	Sort data by column col1 (ascending by default)
`df.sort_values(col2, ascending=False)`	Sort data by column col2 in descending order
`df.sort_values([col1, col2], ascending=[True, False])`	Sort by col1 ascending, then col2 descending
`df.groupby(col)`	Returns a GroupBy object grouped by column ‘col’
`df.groupby([col1, col2])`	Returns a GroupBy object grouped by multiple columns
`df.groupby(col1)[col2].mean()`	Returns the mean of column col2 for each group in col1
`df.pivot_table(index=col1, values=[col2, col3], aggfunc=max)`	Create a pivot table grouped by col1, calculating max for col2 and col3
`df.groupby(col1).agg(np.mean)`	Returns the mean for all columns grouped by col1
`df.apply(np.mean)`	Apply function np.mean across each column
`df.apply(np.max, axis=1)`	Apply function np.max across each row

Creating Test Objects

Command	Description
`pd.DataFrame(np.random.rand(20,5))`	Create a DataFrame with 20 rows and 5 columns of random numbers
`pd.Series(my_list)`	Create a Series from an iterable object my_list
`df.index = pd.date_range('1900/1/30', periods=df.shape[0])`	Add a date index

Viewing and Inspecting Data

Command	Description
`df.head(n)`	View the first n rows of a DataFrame
`df.tail(n)`	View the last n rows of a DataFrame
`df.shape`	View the number of rows and columns
`df.info()`	View index, data types, and memory info
`df.describe()`	View summary statistics for numerical columns
`s.value_counts(dropna=False)`	View unique values and counts for a Series
`df.apply(pd.Series.value_counts)`	View unique values and counts for each column in a DataFrame

Data Merging

Command	Description
`df1.append(df2)`	Append rows from df2 to the end of df1
`pd.concat([df1, df2], axis=1)`	Append columns from df2 to the end of df1
`df1.join(df2, on=col1, how='inner')`	SQL-style join between df1 and df2 columns

Data Statistics

Command	Description
`df.describe()`	View summary statistics for numerical columns
`df.mean()`	Returns the mean of all columns
`df.corr()`	Returns the correlation coefficient between columns
`df.count()`	Returns the number of non-null values in each column
`df.max()`	Returns the maximum value of each column
`df.min()`	Returns the minimum value of each column
`df.median()`	Returns the median of each column
`df.std()`	Returns the standard deviation of each column