Pandas

Pandas is a widely used Python library for working with structured (tabular) data, with applications in data analysis, machine learning, statistics, finance, and data preprocessing. It provides two core data structures, Series and DataFrame, for cleaning, analyzing, and manipulating data.

1. Introduction to Pandas

Pandas is built on top of NumPy and provides a powerful set of tools for:

Reading data (CSV, Excel, JSON, SQL, etc.)
Cleaning messy data
Handling missing values
Filtering, sorting, grouping
Merging and joining datasets
Performing statistical analysis

Install Pandas using:

pip install pandas

Then import it:

import pandas as pd

2. Pandas Data Structures

Pandas provides two main data structures:

Structure	Description
Series	One-dimensional labelled array
DataFrame	Two-dimensional table-like structure

3. Series (1-Dimensional Data)

A Series stores a single column of data with an index.

Creating a Series

import pandas as pd
s = pd.Series([10, 20, 30, 40])
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64

Custom index:

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

Accessing values:

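s["b"] # 20 (access by label)
s.iloc[1] # 20 (access by position; plain s[1] is deprecated for a Series with non-integer labels)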
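
Label-based and position-based access can be made explicit with .loc and .iloc. The following is a minimal sketch using the same Series as above; the variable names are illustrative only.

```python
import pandas as pd

# A Series with string labels, matching the custom-index example
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.loc["b"]          # look up by index label
by_position = s.iloc[1]        # look up by integer position (second element)

label_slice = s.loc["a":"b"]   # label slices include both endpoints
position_slice = s.iloc[0:2]   # position slices exclude the end, like Python lists

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
```

Using .loc and .iloc avoids ambiguity when the index labels are themselves integers.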

4. DataFrame (2-Dimensional Data)

A DataFrame is like an Excel sheet or SQL table.

Creating a DataFrame from a dictionary

data = {
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age    City
0  Aarav   21   Delhi
1  Meera   19  Mumbai
2  Rohan   22    Pune
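
A DataFrame can also be built row by row from a list of dictionaries, and each of its columns is a Series. The sketch below assumes the same three records as the example above.

```python
import pandas as pd

# The same records, written as one dictionary per row
rows = [
    {"Name": "Aarav", "Age": 21, "City": "Delhi"},
    {"Name": "Meera", "Age": 19, "City": "Mumbai"},
    {"Name": "Rohan", "Age": 22, "City": "Pune"},
]
df = pd.DataFrame(rows)

ages = df["Age"]               # selecting one column returns a Series
print(type(ages).__name__)     # Series
print(list(df.columns))        # ['Name', 'Age', 'City']
print(df.shape)                # (3, 3)
```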

5. Reading and Writing Data

Read CSV file

df = pd.read_csv("data.csv")

Write DataFrame to CSV

df.to_csv("output.csv", index=False)

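Read Excel file (reading .xlsx files requires an Excel engine such as openpyxl to be installed)

df = pd.read_excel("file.xlsx")

Write to Excel (also requires an Excel engine such as openpyxl)

df.to_excel("output.xlsx", index=False)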
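
To try the CSV functions without creating files on disk, one option is an in-memory text buffer. This is a sketch only: the two-row DataFrame is made up for illustration, and io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera"], "Age": [21, 19]})

# Write the DataFrame as CSV text into an in-memory buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row labels
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

Without index=False, the row labels would be written as an extra unnamed column and read back as data.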

6. Basic DataFrame Operations

View first rows

df.head()

View last rows

df.tail()

Check shape

df.shape # (rows, columns)

Get summary information

df.info()

Statistical summary (numeric columns only, by default)

df.describe()
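
The inspection methods above can be tried on the small example DataFrame from section 4. This is a minimal sketch; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
})

first_two = df.head(2)      # head() and tail() accept a row count (default 5)
n_rows, n_cols = df.shape   # shape is an attribute, not a method

summary = df.describe()     # by default only numeric columns are summarized
mean_age = summary.loc["mean", "Age"]

print(first_two)
print(n_rows, n_cols)
print(summary)
```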

7. Selecting Data

Select a single column

df["Name"]

Select multiple columns

df[["Name", "City"]]

Select row by position

df.iloc[0] # first row, regardless of its label

Select row by label

df.loc[2] # row whose index label is 2
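
The difference between position and label only becomes visible when the index is not the default 0, 1, 2. The sketch below assumes a hypothetical index of 10, 20, 30 to make that difference clear.

```python
import pandas as pd

# Index labels 10, 20, 30 are an assumption chosen for illustration
df = pd.DataFrame(
    {"Name": ["Aarav", "Meera", "Rohan"], "Age": [21, 19, 22]},
    index=[10, 20, 30],
)

first_row = df.iloc[0]        # position 0: the row for Aarav
labelled_row = df.loc[20]     # label 20: the row for Meera
single_cell = df.loc[30, "Age"]   # row label 30, column "Age"

print(first_row["Name"], labelled_row["Name"], single_cell)
```

Here df.loc[0] would raise a KeyError, because no row carries the label 0.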

8. Filtering Data

Filter rows using conditions:

df[df["Age"] > 20]

Multiple conditions:

df[(df["Age"] > 20) & (df["City"] == "Delhi")]
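
Conditions can also be combined with | (or) and negated with ~ (not), and isin() tests membership in a list. Each condition needs its own parentheses, and Python's and/or keywords raise an error on a Series. A short sketch on the example data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
})

# OR: rows where either condition holds
young_or_pune = df[(df["Age"] < 20) | (df["City"] == "Pune")]

# NOT: rows where the condition does not hold
not_delhi = df[~(df["City"] == "Delhi")]

# Membership: rows whose City is one of several values
in_cities = df[df["City"].isin(["Delhi", "Pune"])]

print(young_or_pune["Name"].tolist())
print(not_delhi["Name"].tolist())
print(in_cities["Name"].tolist())
```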

9. Adding and Removing Columns

Add a new column

df["Score"] = [85, 92, 78]

Remove a column

df = df.drop("Score", axis=1)
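
New columns are often computed from existing ones rather than typed in by hand. The sketch below uses hypothetical column names (Passed, ScoreFraction) and an arbitrary pass mark of 80 purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Score": [85, 92, 78],
})

# Derive a boolean column from an existing column
df["Passed"] = df["Score"] >= 80

# assign() returns a new DataFrame with the extra column added
df = df.assign(ScoreFraction=df["Score"] / 100)

# drop(columns=...) is equivalent to drop(..., axis=1)
trimmed = df.drop(columns=["Passed", "ScoreFraction"])

print(df)
print(trimmed)
```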

10. Handling Missing Values

Detect missing values

df.isnull()
df.isnull().sum()

Fill missing values

df = df.fillna(0) # returns a new DataFrame; assign it to keep the result

Remove rows with missing values

df = df.dropna() # also returns a new DataFrame
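
A small worked example may help here. It assumes one missing Age value (an invented gap, not data from a real source) and shows counting, filling with the column mean, and dropping.

```python
import numpy as np
import pandas as pd

# Meera's age is deliberately missing for this illustration
df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, np.nan, 22],
})

missing_per_column = df.isnull().sum()         # count of missing values per column

# Fill only the Age column, using the mean of the known ages (21.5)
filled = df.fillna({"Age": df["Age"].mean()})

dropped = df.dropna()                          # keeps only complete rows

print(missing_per_column)
print(filled)
print(dropped)
```

Neither fillna() nor dropna() changes df itself; both return new DataFrames.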

11. Sorting Data

Sort by a column:

df.sort_values("Age")

Sort descending:

df.sort_values("Age", ascending=False)
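
sort_values() also accepts a list of columns, with a separate direction for each. The sketch below adds a hypothetical fourth person (Isha) so that two rows share the same age and the second sort key matters.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan", "Isha"],
    "Age": [21, 19, 22, 19],
})

# Sort by Age ascending; break ties by Name descending
ordered = df.sort_values(["Age", "Name"], ascending=[True, False])

# sort_values returns a new DataFrame and keeps the original row labels;
# reset_index(drop=True) renumbers the rows from 0
ordered = ordered.reset_index(drop=True)

print(ordered)
```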

12. Grouping Data

Group rows and calculate aggregate values:

grouped = df.groupby("City")["Age"].mean()
print(grouped)
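
Several aggregates can be computed in one pass with agg(). The data below is invented for illustration, with two people in Delhi so that the group mean differs from the individual values.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "Age": [21, 19, 25, 22],
})

# One row per city, one column per aggregate
summary = df.groupby("City")["Age"].agg(["mean", "count"])

print(summary)
```

The group keys (the city names) become the index of the result, sorted alphabetically by default.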

13. Merging and Joining DataFrames

Merge two DataFrames on a shared key column, like a SQL JOIN (an inner join by default)

pd.merge(df1, df2, on="ID")

Concatenate vertically

pd.concat([df1, df2])
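
The snippets above assume that df1 and df2 already exist. A self-contained sketch follows, with two small made-up tables sharing an ID column, showing an inner join, a left join, and vertical concatenation.

```python
import pandas as pd

# Hypothetical tables: df1 holds names, df2 holds scores
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Aarav", "Meera", "Rohan"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [92, 78, 88]})

# Inner join (default): only IDs present in both tables
inner = pd.merge(df1, df2, on="ID")

# Left join: every row of df1, with NaN where df2 has no match
left = pd.merge(df1, df2, on="ID", how="left")

# Stack rows; ignore_index=True renumbers the result from 0
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(left)
print(stacked)
```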

14. Applying Functions

Apply a function to a column:

df["Age_Double"] = df["Age"].apply(lambda x: x * 2)
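
For simple arithmetic, operating on the whole column at once gives the same result as apply() and is usually faster. apply() remains useful for logic that has no direct column operation. In the sketch below, the Age_Group column and its cutoff of 21 are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Age": [21, 19, 22]})

# Vectorized arithmetic: same result as apply(lambda x: x * 2)
df["Age_Double"] = df["Age"] * 2

# apply() with custom per-value logic
df["Age_Group"] = df["Age"].apply(
    lambda age: "21 or over" if age >= 21 else "under 21"
)

print(df)
```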

15. Plotting with Pandas

Pandas integrates with Matplotlib, which must be installed and imported separately.

import matplotlib.pyplot as plt

df["Age"].plot(kind="bar")
plt.show()