Intro to Pandas

Pandas has 2 main datatypes: Series (1-dimensional) and DataFrame (2-dimensional).
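
A minimal sketch of the two datatypes, Series and DataFrame. The car data and the variable names (cars, colours, car_data) are made up for illustration.

```python
import pandas as pd

# A Series is a 1-dimensional labelled column of values.
cars = pd.Series(["BMW", "Toyota", "Honda"])
colours = pd.Series(["Red", "Blue", "White"])

# A DataFrame is a 2-dimensional table; each column is a Series.
car_data = pd.DataFrame({"Car make": cars, "Colour": colours})
print(car_data)
```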

Anatomy of a DataFrame

Describing Data
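
A short sketch of common attributes and methods for describing a DataFrame. The sample DataFrame and its column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW", "Nissan"],
    "Odometer (KM)": [150043, 87899, 11179, 213095],
    "Doors": [4, 4, 5, 4],
})

print(df.dtypes)           # data type of each column
print(df.columns)          # column labels
print(df.index)            # row labels
print(df.describe())       # summary statistics for the numeric columns
df.info()                  # index range, non-null counts, dtypes, memory usage
print(df["Doors"].mean())  # mean of a single column
print(len(df))             # number of rows
```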

Viewing and Selecting Data

.head(x) & .tail(x) methods
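
A quick sketch of both methods on a made-up ten-row DataFrame. With no argument, each returns 5 rows.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

first_rows = df.head()    # first 5 rows by default
first_three = df.head(3)  # first 3 rows
last_two = df.tail(2)     # last 2 rows
```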

.loc[i] (by label) & .iloc[p] (by position) indexers

On a Series
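
A sketch of label-based versus position-based selection on a Series. Both indexers use square brackets. The animals Series and its deliberately unordered, repeated index are invented for illustration.

```python
import pandas as pd

animals = pd.Series(
    ["cat", "dog", "bird", "panda", "snake"],
    index=[0, 3, 9, 8, 3],
)

by_label = animals.loc[3]       # label 3 appears twice, so a Series is returned
by_position = animals.iloc[3]   # the element at position 3 (the fourth one)
first_three = animals.iloc[:3]  # slice by position
```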

On a DataFrame
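
The same indexers applied to a DataFrame, plus column selection and boolean filtering. The DataFrame and its row labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Make": ["Toyota", "Honda", "BMW"], "Doors": [4, 4, 5]},
    index=["a", "b", "c"],
)

row_b = df.loc["b"]               # row labelled "b", returned as a Series
cell = df.loc["c", "Make"]        # single value by row and column label
first_row = df.iloc[0]            # first row by position
makes = df["Make"]                # select one column
five_doors = df[df["Doors"] > 4]  # keep rows where the condition is True
```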

pd.crosstab(col1, col2)
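
A small sketch of pd.crosstab, which counts how often each pair of values from two columns occurs together. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW", "Honda"],
    "Doors": [4, 4, 4, 5, 4],
})

# Rows are the unique values of "Make", columns the unique values of "Doors".
counts = pd.crosstab(df["Make"], df["Doors"])
print(counts)
```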

Matplotlib Setup

%matplotlib inline
import matplotlib.pyplot as plt

Manipulating Data

Modifying a DataFrame Column

2 Methods

  1. Reassignment
    • e.g. df["col1"] = df["col1"].transformation()
  2. Inplace
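    • e.g. df["col1"].transformation(transformedValue, inplace=True)
    • Note: calling an inplace method on a selected column is chained assignment; recent pandas versions warn about it, and with Copy-on-Write enabled df is left unchanged, so reassignment is the safer choice.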
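
A sketch of both approaches, using str.lower() and fillna() as example transformations. The data is hypothetical, and the inplace call is made on the DataFrame itself rather than on a selected column.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Odometer": [150043.0, None, 11179.0],
})

# 1. Reassignment: build the transformed column and assign it back.
df["Make"] = df["Make"].str.lower()

# 2. Inplace: fill missing odometer readings with the column mean,
#    modifying df directly.
df.fillna({"Odometer": df["Odometer"].mean()}, inplace=True)
```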

Pandas Docs: Working with Missing Data

Deleting a DataFrame Column

2 Methods

  1. Delete
    • e.g. del df["colname"]
  2. Pop
    • e.g. popped_column = df.pop("colname")
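
A sketch of the two deletion methods on a made-up DataFrame. Both modify df directly; the difference is that pop also hands back the removed column.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Toyota", "Honda"],
    "Doors": [4, 4],
    "Colour": ["White", "Red"],
})

del df["Doors"]                   # removes the column, returns nothing
popped_column = df.pop("Colour")  # removes the column and returns it as a Series
```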

Pandas Docs: Column selection, addition, deletion

Data Randomization

We can use sample(frac=...) to randomly select a fraction of the rows (frac=1 returns every row in shuffled order), and reset_index(drop=True) to renumber the index to match the new order.
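
A minimal sketch of sampling and shuffling. The DataFrame is made up, and random_state is an optional assumption added here only to make the result repeatable.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

# frac=0.2 keeps a random 20% of the rows.
subset = df.sample(frac=0.2, random_state=0)

# frac=1 returns every row in shuffled order.
shuffled = df.sample(frac=1, random_state=0)

# drop=True discards the old index instead of keeping it as a new column.
shuffled = shuffled.reset_index(drop=True)
```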

Pandas Docs