Pandas

Pandas is one of the most widely used Python libraries for working with structured (tabular) data. It is used in data analysis, machine learning, statistics, finance, and data preprocessing. Pandas provides two core data structures, Series and DataFrame, that make it easy to clean, analyze, and manipulate data efficiently.

1. Introduction to Pandas

Pandas is built on top of NumPy and provides a powerful set of tools for:

Reading data (CSV, Excel, JSON, SQL, etc.)

Cleaning messy data

Handling missing values

Filtering, sorting, grouping

Merging and joining datasets

Performing statistical analysis

Install Pandas using:

pip install pandas

Then import it:

import pandas as pd

2. Pandas Data Structures

Pandas provides two main data structures:

Structure    Description
Series       One-dimensional labelled array
DataFrame    Two-dimensional table-like structure
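The two structures are closely related: selecting a single column of a DataFrame gives back a Series that shares the DataFrame's index. The following is a minimal check using a small made-up frame; the column names `x` and `y` are arbitrary.

```python
import pandas as pd

# A small two-column DataFrame (illustrative data)
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Selecting one column returns a Series that keeps the DataFrame's index
col = df["x"]

print(type(df).__name__, df.ndim)    # DataFrame 2
print(type(col).__name__, col.ndim)  # Series 1
```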

3. Series (1-Dimensional Data)

A Series stores a single column of data with an index.

Creating a Series

import pandas as pd

s = pd.Series([10, 20, 30, 40])
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64

Custom index:

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

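Accessing values:

s["b"]        # 20 (by label)
s.iloc[1]     # 20 (by integer position; plain s[1] for positional access is deprecated in recent pandas versions)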
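The explicit accessors `.loc` (labels) and `.iloc` (positions) avoid ambiguity, especially when the index itself holds integers. The sketch below also shows label alignment, which pandas applies whenever two Series are combined. The second Series, `other`, is made-up illustrative data.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.loc["b"]       # looked up by index label
by_position = s.iloc[1]     # looked up by integer position

# Label slices include the end label; position slices exclude the end position
first_two_labels = s.loc["a":"b"].tolist()   # [10, 20]
first_two_positions = s.iloc[0:2].tolist()   # [10, 20]

# Arithmetic aligns on index labels; labels missing from either side give NaN
other = pd.Series([1, 2, 3], index=["b", "c", "d"])
total = s + other
print(total)
```

Here `total` has labels a to d, with NaN for "a" and "d" because each of those labels appears in only one of the two Series.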

4. DataFrame (2-Dimensional Data)

A DataFrame is like an Excel sheet or SQL table.

Creating a DataFrame from a dictionary

data = {
   "Name": ["Aarav", "Meera", "Rohan"],
   "Age": [21, 19, 22],
   "City": ["Delhi", "Mumbai", "Pune"]
}

df = pd.DataFrame(data)
print(df)

Output:

    Name  Age    City
0  Aarav   21   Delhi
1  Meera   19  Mumbai
2  Rohan   22    Pune
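A DataFrame can also be built row by row from a list of dictionaries, where each dictionary becomes one row. The following is a minimal sketch reusing the same sample values.

```python
import pandas as pd

# Each dictionary is one row; its keys become column names
rows = [
    {"Name": "Aarav", "Age": 21, "City": "Delhi"},
    {"Name": "Meera", "Age": 19, "City": "Mumbai"},
    {"Name": "Rohan", "Age": 22, "City": "Pune"},
]
df = pd.DataFrame(rows)

print(df.columns.tolist())  # ['Name', 'Age', 'City']
print(df.dtypes)            # column types inferred from the values
```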

5. Reading and Writing Data

Read CSV file

df = pd.read_csv("data.csv")

Write DataFrame to CSV

df.to_csv("output.csv", index=False)

Read Excel file (reading and writing .xlsx files also requires the openpyxl package: pip install openpyxl)

df = pd.read_excel("file.xlsx")

Write to Excel

df.to_excel("output.xlsx", index=False)
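The role of `index=False` is easiest to see in a round trip. This sketch writes to an in-memory `io.StringIO` buffer instead of a real file, so it runs without touching the disk; the data is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera"], "Age": [21, 19]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back reproduces the original DataFrame
restored = pd.read_csv(io.StringIO(csv_text))

# Without index=False, the index is written as an unnamed first column,
# which read_csv turns into an extra "Unnamed: 0" column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'Name', 'Age']
```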

6. Basic DataFrame Operations

View first rows

df.head()

View last rows

df.tail()

Check shape

df.shape    # (rows, columns)

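Get summary information (column names, data types, non-null counts, memory usage)

df.info()

Statistical summary of numeric columns (count, mean, standard deviation, min, quartiles, max)

df.describe()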
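The inspection methods above can be tried on a small frame. In this sketch the five names and ages are made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan", "Isha", "Kabir"],
    "Age": [21, 19, 22, 25, 23],
})

print(df.head(2))   # first 2 rows (head() defaults to 5)
print(df.tail(2))   # last 2 rows
print(df.shape)     # (5, 2)

# describe() summarizes numeric columns only by default, so "Name" is skipped
summary = df.describe()
print(summary.loc["mean", "Age"])  # 22.0
```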

7. Selecting Data

Select a single column

df["Name"]

Select multiple columns

df[["Name", "City"]]

Select row by integer position

df.iloc[0]      # first row

Select row by index label

df.loc[2]       # row whose index label is 2
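The difference between `iloc` and `loc` only becomes visible once labels and positions stop matching, for example after sorting. This is a minimal sketch on sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
})

# Sorting keeps the original labels, so labels and positions no longer match
by_age = df.sort_values("Age")        # index order is now 1, 0, 2

first_by_position = by_age.iloc[0]["Name"]  # "Meera": first row in current order
label_zero = by_age.loc[0, "Name"]          # "Aarav": the row labelled 0

# loc also accepts a boolean mask for rows plus a column name
over_20 = by_age.loc[by_age["Age"] > 20, "Name"].tolist()  # ['Aarav', 'Rohan']
```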

8. Filtering Data

Filter rows using conditions:

df[df["Age"] > 20]

Multiple conditions:

df[(df["Age"] > 20) & (df["City"] == "Delhi")]
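Boolean conditions are combined with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` or `==`. Python's `and`/`or` do not work here, since they try to reduce a whole Series to one True/False value and raise a `ValueError`. The following is a sketch on sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
})

# OR: younger than 20, or living in Pune
either = df[(df["Age"] < 20) | (df["City"] == "Pune")]

# NOT: everyone outside Delhi
not_delhi = df[~(df["City"] == "Delhi")]

# isin() tests membership in a list of values
in_cities = df[df["City"].isin(["Delhi", "Pune"])]

# Python's `and` cannot combine boolean Series
raised = False
try:
    df[(df["Age"] > 20) and (df["City"] == "Delhi")]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```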

9. Adding and Removing Columns

Add a new column

df["Score"] = [85, 92, 78]

Remove a column

df = df.drop("Score", axis=1)
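New columns are often computed from existing ones rather than typed in. A list assigned as a column must contain exactly one value per row; otherwise pandas raises a `ValueError`. In this sketch the pass mark of 80 and the column name `Passed` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera", "Rohan"], "Age": [21, 19, 22]})

# A list must have exactly one value per row
df["Score"] = [85, 92, 78]

# Derived column from a vectorized comparison (hypothetical pass mark of 80)
df["Passed"] = df["Score"] >= 80

# columns= is equivalent to passing axis=1; drop returns a new DataFrame
df = df.drop(columns=["Score"])
print(df)
```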

10. Handling Missing Values

Detect missing values

df.isnull()
df.isnull().sum()

Fill missing values

df.fillna(0)    # returns a new DataFrame; use df = df.fillna(0) to keep the result

Remove rows with missing values

df.dropna()     # also returns a new DataFrame
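In practice, a single fill value rarely suits every column. This sketch uses a per-column dictionary with `fillna`. Filling "Name" with "Unknown" and "Age" with the column mean are illustrative choices, not a rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", None],
    "Age": [21, np.nan, 22],
})

print(df.isnull().sum())   # one missing value in each column

# fillna() accepts a dict to use a different fill value per column
filled = df.fillna({"Name": "Unknown", "Age": df["Age"].mean()})

# dropna() keeps only rows with no missing values
complete = df.dropna()

# Neither call modified df: it still contains both missing values
print(df.isnull().sum().sum())  # 2
```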

11. Sorting Data

Sort by a column:

df.sort_values("Age")

Sort descending:

df.sort_values("Age", ascending=False)
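`sort_values` also accepts a list of columns, with a matching list for `ascending`, so ties in the first column are broken by the next. Sorting keeps the old row labels. The following is a sketch on sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan", "Isha"],
    "Age": [21, 19, 21, 19],
})

# Sort by Age (descending), breaking ties by Name (ascending)
ordered = df.sort_values(["Age", "Name"], ascending=[False, True])

# reset_index(drop=True) discards the old labels and renumbers rows 0..n-1
ordered = ordered.reset_index(drop=True)
print(ordered)
```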

12. Grouping Data

Group rows and calculate aggregate values:

grouped = df.groupby("City")["Age"].mean()
print(grouped)
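Several aggregates can be computed per group in one pass with `agg` and named aggregation, written as `output_column=(input_column, function)`. In this sketch the data and the output names `mean_age`, `oldest`, and `people` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan", "Isha"],
    "Age": [21, 19, 22, 25],
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
})

mean_age = df.groupby("City")["Age"].mean()

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby("City").agg(
    mean_age=("Age", "mean"),
    oldest=("Age", "max"),
    people=("Name", "count"),
)
print(summary)
```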

13. Merging and Joining DataFrames

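Merge two datasets on a shared key column, like a SQL JOIN (an inner join by default; pass how="left", "right", or "outer" for other join types)

pd.merge(df1, df2, on="ID")

Concatenate vertically (stack the rows of df2 below those of df1)

pd.concat([df1, df2])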
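`df1` and `df2` are placeholders. In the sketch below they are replaced with two hypothetical tables, `students` and `scores`, that share an "ID" column. The example shows how the join type changes which rows survive.

```python
import pandas as pd

# Hypothetical tables sharing an "ID" key column
students = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Aarav", "Meera", "Rohan"]})
scores = pd.DataFrame({"ID": [1, 2, 4], "Score": [85, 92, 70]})

# Default how="inner": only IDs present in both tables (1 and 2)
inner = pd.merge(students, scores, on="ID")

# how="left": every student; ID 3 has no score, so its Score is NaN
left = pd.merge(students, scores, on="ID", how="left")

# concat stacks rows; ignore_index=True renumbers the result 0..n-1
new_students = pd.DataFrame({"ID": [4], "Name": ["Isha"]})
all_students = pd.concat([students, new_students], ignore_index=True)
```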

14. Applying Functions

Apply a function to a column:

df["Age_Double"] = df["Age"].apply(lambda x: x * 2)
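For simple arithmetic, the vectorized form `df["Age"] * 2` gives the same result as `apply` and is usually faster. `apply` earns its place with custom logic. In the sketch below, the helper `age_group` and its cut-off of 21 are hypothetical. `DataFrame.apply(..., axis=1)` passes each row to the function as a Series.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera", "Rohan"], "Age": [21, 19, 22]})

df["Age_Double"] = df["Age"].apply(lambda x: x * 2)

# Equivalent vectorized form for simple arithmetic
vectorized = df["Age"] * 2

# Hypothetical helper: custom logic is where apply() is useful
def age_group(age):
    return "21+" if age >= 21 else "under 21"

df["Group"] = df["Age"].apply(age_group)

# Row-wise apply: each row arrives as a Series indexed by column name
df["Label"] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)
print(df)
```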

15. Plotting with Pandas

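Pandas plotting uses Matplotlib as its default backend, so Matplotlib must be installed (pip install matplotlib).

import matplotlib.pyplot as plt

df["Age"].plot(kind="bar")
plt.show()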
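`plt.show()` needs a display. In scripts or on servers, a common alternative is Matplotlib's non-interactive "Agg" backend plus `savefig`. In this sketch the figure goes to an in-memory buffer, and the data and title are illustrative. Setting "Name" as the index puts the names on the x-axis.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera", "Rohan"], "Age": [21, 19, 22]})

# Series.plot returns a Matplotlib Axes object
ax = df.set_index("Name")["Age"].plot(kind="bar", title="Age by name")
ax.set_ylabel("Age")

# Save as PNG to an in-memory buffer; passing a filename writes a file instead
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
```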