Skip to main content

Pandas cheat sheet

Pandas is a Python library for working with different data sources. It enables to select and manipulate data, and is primarily used for working with CSV files.

Installing pandas:

pip install pandas 

Then, we can import it. The standard is to import it as pd.

Creating a data frame

A data frame is the basis for everything. The in short, df stores the data within Pandas, for enabling us to manipulate the data.

Using CSV data

import pandas as pd 

df = pd.read_csv("homes.csv")

df.head()
df.head() prints the column labels and five more rows. It is often used to verify Pandas is working.

Using dictionaries

data = {
"a": [1, 2, 3],
"b": [4, 10, 2],
"c": [6, 15, 13]
}
df = pd.DataFrame(data)

The provided dictionary generates the following data frame:

abc
123
4102
61513

Selecting data

Getting a single column

df["name"]

Selecting columns in range

df.loc[:,"name":]

Analysing data

While Pandas is not mainly for plotting and analysing data, it has some capabilites for it. For example, we can retrieve information about the distribution of a data column.

name_info = df['name'].value_counts(normalize=True)