Pandas is one of the most important Python libraries for working with structured data. It is widely used in data analysis, machine learning, statistics, finance, and data preprocessing. Pandas provides two core data structures—Series and DataFrame—that make it easy to clean, analyze, and manipulate data efficiently.
Pandas is built on top of NumPy and provides a powerful set of tools for:

- Reading data (CSV, Excel, JSON, SQL, etc.)
- Cleaning messy data
- Handling missing values
- Filtering, sorting, and grouping
- Merging and joining datasets
- Performing statistical analysis
Install Pandas using:

```bash
pip install pandas
```

Then import it:

```python
import pandas as pd
```
Pandas provides two main data structures:
| Structure | Description |
|---|---|
| Series | One-dimensional labelled array |
| DataFrame | Two-dimensional table-like structure |
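To make the table concrete, here is a minimal sketch using a toy DataFrame (the column names `x` and `y` are invented for illustration). It checks that selecting a single column of a DataFrame gives back a Series, which is why the two structures work so closely together.

```python
import pandas as pd

# Toy data, invented for illustration
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

col = df["x"]                        # selecting one column returns a Series

print(isinstance(df, pd.DataFrame))  # True
print(isinstance(col, pd.Series))    # True
print(df.ndim, col.ndim)             # 2 1
print(df.shape, col.shape)           # (3, 2) (3,)
```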
A Series stores a single column of data with an index.
```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
print(s)
```
Output:

```
0    10
1    20
2    30
3    40
dtype: int64
```
Custom index:

```python
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
```
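Accessing values by label or by position:

```python
s["b"]        # 20, looked up by index label
s.iloc[1]     # 20, looked up by position
```

Plain `s[1]` on a Series with a string index used to fall back to positional lookup, but recent pandas versions deprecate that fallback, so use `iloc` when you mean a position.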
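A Series also supports element-wise arithmetic and boolean masks without explicit loops. The minimal sketch below reuses the custom-index Series; the variable names `doubled` and `big` are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.loc["b"])            # 20 (label-based)
print(s.iloc[1])             # 20 (position-based)

# Arithmetic applies element-wise and keeps the index labels
doubled = s * 2
print(doubled.tolist())      # [20, 40, 60]

# A boolean mask keeps only the entries where the condition is True
big = s[s > 15]
print(big.index.tolist())    # ['b', 'c']
```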
A DataFrame is like an Excel sheet or SQL table.
```python
import pandas as pd

data = {
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
}
df = pd.DataFrame(data)
print(df)
```
Output:

```
    Name  Age    City
0  Aarav   21   Delhi
1  Meera   19  Mumbai
2  Rohan   22    Pune
```
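The dictionary above is column-oriented (one list per column). A DataFrame can also be built row by row from a list of dictionaries. The sketch below, with the illustrative names `df_rows` and `df_cols`, checks that both styles produce the same table for this data.

```python
import pandas as pd

# Row-oriented: one dict per row
rows = [
    {"Name": "Aarav", "Age": 21, "City": "Delhi"},
    {"Name": "Meera", "Age": 19, "City": "Mumbai"},
    {"Name": "Rohan", "Age": 22, "City": "Pune"},
]
df_rows = pd.DataFrame(rows)

# Column-oriented: one list per column
df_cols = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
})

print(df_rows.columns.tolist())   # ['Name', 'Age', 'City']
print(df_rows.equals(df_cols))    # True
```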
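Reading and writing files:

```python
# CSV
df = pd.read_csv("data.csv")
df.to_csv("output.csv", index=False)     # index=False skips writing the row index

# Excel (.xlsx files need an engine such as openpyxl installed)
df = pd.read_excel("file.xlsx")
df.to_excel("output.xlsx", index=False)
```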
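`read_csv` and `to_csv` accept file-like objects as well as paths. As a minimal sketch, the example below uses an in-memory `io.StringIO` buffer in place of `data.csv` (an assumption made only so it runs without any files) to write a small DataFrame out and read it back.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera"], "Age": [21, 19]})

# Write CSV text into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text.splitlines())   # ['Name,Age', 'Aarav,21', 'Meera,19']

# Rewind the buffer and read it back with read_csv
buffer.seek(0)
df_back = pd.read_csv(buffer)
print(df_back.equals(df))      # True
```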
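Inspecting a DataFrame:

```python
df.head()       # first 5 rows by default
df.tail()       # last 5 rows by default
df.shape        # (rows, columns)
df.info()       # column names, dtypes, and non-null counts
df.describe()   # summary statistics for the numeric columns
```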
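On the three-person table from earlier, `describe()` summarises only the numeric `Age` column by default. A minimal sketch (the name `stats` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
})

print(df.shape)                   # (3, 3)

stats = df.describe()             # numeric columns only, by default
print(stats.columns.tolist())     # ['Age']
print(stats.loc["count", "Age"])  # 3.0
print(stats.loc["min", "Age"], stats.loc["max", "Age"])  # 19.0 22.0
```

The mean row holds 62 / 3, roughly 20.67.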
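Selecting columns and rows:

```python
df["Name"]              # one column, returned as a Series
df[["Name", "City"]]    # several columns, returned as a DataFrame
df.iloc[0]              # first row, by position
df.loc[2]               # row whose index label is 2
```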
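`loc` and `iloc` only coincide while the index runs 0, 1, 2, ... in order. Sorting keeps the original labels, so the two diverge afterwards. A minimal sketch with the illustrative name `by_age`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
})

# After sorting, the index labels are in the order 1, 0, 2
by_age = df.sort_values("Age")

print(by_age.iloc[0]["Name"])   # Meera (first row by position)
print(by_age.loc[0, "Name"])    # Aarav (row whose label is 0)
```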
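Filter rows using conditions:

```python
df[df["Age"] > 20]
```

Multiple conditions (wrap each condition in parentheses and combine them with `&` for AND or `|` for OR):

```python
df[(df["Age"] > 20) & (df["City"] == "Delhi")]
```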
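Two other common filtering tools are `isin` for membership tests and `~` for negating a condition. A minimal sketch on the same three-person table (the names `west`, `either`, and `not_west` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan"],
    "Age": [21, 19, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
})

# Keep rows whose City appears in a list
west = df[df["City"].isin(["Mumbai", "Pune"])]
print(west["Name"].tolist())       # ['Meera', 'Rohan']

# "|" is element-wise OR
either = df[(df["Age"] < 20) | (df["City"] == "Delhi")]
print(either["Name"].tolist())     # ['Aarav', 'Meera']

# "~" negates a boolean condition
not_west = df[~df["City"].isin(["Mumbai", "Pune"])]
print(not_west["Name"].tolist())   # ['Aarav']
```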
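Adding and removing columns:

```python
df["Score"] = [85, 92, 78]       # new column; the list length must match the number of rows
df = df.drop("Score", axis=1)    # axis=1 means drop a column, not a row
```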
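New columns are often computed from existing ones rather than typed in. This sketch derives a column element-wise, broadcasts a scalar to every row, and then removes both with `drop(columns=...)`, an equivalent spelling of `axis=1`. The column names `Age_Next_Year` and `Active` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera", "Rohan"], "Age": [21, 19, 22]})

# Computed element-wise from an existing column, no loop needed
df["Age_Next_Year"] = df["Age"] + 1
print(df["Age_Next_Year"].tolist())   # [22, 20, 23]

# A single scalar is broadcast to every row
df["Active"] = True
print(df.columns.tolist())            # ['Name', 'Age', 'Age_Next_Year', 'Active']

# drop(columns=...) is equivalent to drop(..., axis=1)
df = df.drop(columns=["Age_Next_Year", "Active"])
print(df.columns.tolist())            # ['Name', 'Age']
```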
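Detecting and handling missing values:

```python
df.isnull()          # True wherever a value is missing
df.isnull().sum()    # number of missing values in each column
df.fillna(0)         # replace missing values with 0 (returns a new DataFrame)
df.dropna()          # drop rows with any missing value (returns a new DataFrame)
```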
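Filling every gap with 0 is rarely right for every column. `fillna` also accepts a dict of per-column fill values, and `dropna` can look at selected columns only. A minimal sketch on toy data with one missing name and one missing age (the names `filled` and `named` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan", None],
    "Age": [21, np.nan, 22, 20],
})

print(df.isnull().sum().tolist())   # [1, 1]

# Fill missing ages with the column mean (21.0); Name is left untouched
filled = df.fillna({"Age": df["Age"].mean()})
print(filled["Age"].tolist())       # [21.0, 21.0, 22.0, 20.0]

# Drop only the rows where Name is missing
named = df.dropna(subset=["Name"])
print(len(named))                   # 3
```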
Sort by a column:

```python
df.sort_values("Age")
```

Sort descending:

```python
df.sort_values("Age", ascending=False)
```
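`sort_values` also takes a list of columns, with one `ascending` flag per column, and `reset_index(drop=True)` renumbers the rows afterwards. A minimal sketch on toy data (the fourth person, "Isha", is invented to give each city two rows):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan", "Isha"],
    "City": ["Pune", "Delhi", "Pune", "Delhi"],
    "Age": [21, 19, 22, 23],
})

# City ascending, then Age descending within each city
ordered = df.sort_values(["City", "Age"], ascending=[True, False])
print(ordered["Name"].tolist())   # ['Isha', 'Meera', 'Rohan', 'Aarav']

# Renumber the rows 0..n-1 and discard the old labels
ordered = ordered.reset_index(drop=True)
print(ordered.index.tolist())     # [0, 1, 2, 3]
```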
Group rows and calculate aggregate values:

```python
grouped = df.groupby("City")["Age"].mean()   # average Age for each City
print(grouped)
```
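When several statistics are needed per group, `agg` with named aggregation computes them in one pass and lets you choose the output column names. A minimal sketch on toy data with two people per city (the output names `people`, `mean_age`, and `oldest` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aarav", "Meera", "Rohan", "Isha"],
    "City": ["Pune", "Delhi", "Pune", "Delhi"],
    "Age": [21, 19, 22, 23],
})

# Each keyword is output_name=(input_column, aggregation)
summary = df.groupby("City").agg(
    people=("Name", "count"),
    mean_age=("Age", "mean"),
    oldest=("Age", "max"),
)
print(summary)
```

Group keys are sorted by default, so the result has one row for Delhi (mean age 21.0) followed by one for Pune (mean age 21.5).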
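Merging and concatenating DataFrames (here `df1` and `df2` are two DataFrames that share an `ID` column):

```python
pd.merge(df1, df2, on="ID")   # join on matching ID values (inner join by default)
pd.concat([df1, df2])         # stack the rows of df1 and df2
```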
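The difference between join types is easiest to see with keys that only partly overlap. In this sketch `df1` and `df2` are hypothetical tables: an inner merge keeps only IDs present in both, a left merge keeps every row of `df1` and fills the gaps with NaN, and `ignore_index=True` renumbers the rows after concatenation.

```python
import pandas as pd

# Hypothetical example tables sharing an "ID" column
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Aarav", "Meera", "Rohan"]})
df2 = pd.DataFrame({"ID": [1, 3, 4], "Score": [85, 78, 90]})

inner = pd.merge(df1, df2, on="ID")              # IDs present in both: 1 and 3
left = pd.merge(df1, df2, on="ID", how="left")   # all of df1; Score is NaN for ID 2

print(inner["ID"].tolist())            # [1, 3]
print(left["Score"].isna().tolist())   # [False, True, False]

# Stack rows and renumber the combined index
stacked = pd.concat([df1, df1], ignore_index=True)
print(stacked.shape)                   # (6, 2)
print(stacked.index.tolist())          # [0, 1, 2, 3, 4, 5]
```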
Apply a function to a column:

```python
df["Age_Double"] = df["Age"].apply(lambda x: x * 2)
```
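`apply` calls the Python function once per element. For simple arithmetic, a vectorized expression gives the same result and is usually faster, while `apply` remains handy for logic such as labelling. A minimal sketch (the column names `Age_Double_Vec` and `Group` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Meera", "Rohan"], "Age": [21, 19, 22]})

# Element by element through a Python function
df["Age_Double"] = df["Age"].apply(lambda x: x * 2)

# The same result as a vectorized expression
df["Age_Double_Vec"] = df["Age"] * 2
print((df["Age_Double"] == df["Age_Double_Vec"]).all())   # True

# apply with non-arithmetic logic: a label per row
df["Group"] = df["Age"].apply(lambda x: "20+" if x >= 20 else "under 20")
print(df["Group"].tolist())   # ['20+', 'under 20', '20+']
```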
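Pandas uses Matplotlib as its default plotting backend, so import `matplotlib.pyplot` to display the figure:

```python
import matplotlib.pyplot as plt

df["Age"].plot(kind="bar")   # one bar per row
plt.show()
```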