Pandas cheat sheet
Pandas is a Python library for working with data from different sources. It lets you select and manipulate data, and is commonly used for working with tabular data such as CSV files.
Installing pandas:
pip install pandas
Then, we can import it. The convention is to import it as pd.
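As a quick check that the installation succeeded, a minimal sketch is to import the library under the conventional alias and print its version:

```python
import pandas as pd

# Printing the version confirms that the import succeeded.
print(pd.__version__)
```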
Creating a data frame
A data frame is the basis for everything in Pandas. In short, a data frame (commonly named df) stores the data within Pandas so that we can manipulate it.
Using CSV data
import pandas as pd
df = pd.read_csv("homes.csv")
df.head()
df.head() returns the column labels and the first five rows; in a notebook the result is displayed automatically. It is often used as a quick check that the data loaded correctly.
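To try the same steps without a file on disk, the sketch below reads CSV text from an in-memory buffer. The column names and values are made up for illustration and are not the contents of homes.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file such as homes.csv.
csv_text = """price,rooms,city
250000,3,Oslo
180000,2,Bergen
320000,4,Oslo
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default; pass n to change that.
first_two = df.head(2)
print(first_two)
print(df.shape)  # (rows, columns)
```

Passing a number to head() is handy when five rows are more than needed for a quick look.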
Using dictionaries
data = {"a": [1, 2, 3], "b": [4, 10, 2], "c": [6, 15, 13]}
df = pd.DataFrame(data)
The provided dictionary generates the following data frame:
| a | b | c |
|---|---|---|
| 1 | 4 | 6 |
| 2 | 10 | 15 |
| 3 | 2 | 13 |
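The mapping from dictionary keys to columns can be checked directly. A minimal sketch using the same dictionary:

```python
import pandas as pd

data = {"a": [1, 2, 3], "b": [4, 10, 2], "c": [6, 15, 13]}
df = pd.DataFrame(data)

# Each key becomes a column label; each list becomes that column's values.
print(list(df.columns))  # ['a', 'b', 'c']
print(df.shape)          # (3, 3)

# The first row takes the first element of every list.
print(df.iloc[0].tolist())  # [1, 4, 6]
```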
Selecting data
Getting a single column
df["name"]
Selecting columns in range
df.loc[:, "name":]
Analysing data
While Pandas is not primarily a plotting or statistics library, it has some capabilities for both. For example, we can retrieve the distribution of values in a data column: value_counts(normalize=True) returns the relative frequency of each distinct value.
name_info = df['name'].value_counts(normalize=True)
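The selection and counting calls above can be tried on a small in-memory frame. This is a sketch with made-up columns (id, name, city); only the Pandas calls themselves come from the sections above.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Ann", "Bob", "Ann", "Cy"],
    "city": ["Oslo", "Oslo", "Bergen", "Oslo"],
})

# A single column comes back as a Series.
names = df["name"]

# Label-based slice: every row, columns from "name" to the last one.
# Unlike positional slices, label slices with loc include both endpoints.
from_name = df.loc[:, "name":]
print(list(from_name.columns))  # ['name', 'city']

# normalize=True returns relative frequencies instead of raw counts.
name_info = df["name"].value_counts(normalize=True)
print(name_info["Ann"])  # 0.5
```

Dropping normalize=True gives the raw counts instead, which is often easier to read for small data sets.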