Pandas is a Python library for data manipulation and analysis. It provides data structures and functions that make working with structured, tabular data easy and intuitive. Here’s a step-by-step guide to the basics of working with Pandas:
1. Importing Pandas: You need to import the Pandas library before using it in your Python code. Use the following import statement:
import pandas as pd
2. Data Structures in Pandas: Pandas provides two main data structures: Series and DataFrame.
- Series: A one-dimensional array-like object that can hold data of any type. It’s similar to a column in a spreadsheet.
data = pd.Series([10, 20, 30, 40])
- DataFrame: A two-dimensional table of data with rows and columns, similar to a spreadsheet or a SQL table.
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22]}
df = pd.DataFrame(data)
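To make the difference between the two structures concrete, here is a minimal sketch. The index labels "a" to "d" and the variable names scores, people, and ages are made up for illustration and are not part of the examples above.

```python
import pandas as pd

# A Series with explicit labels instead of the default 0..n-1 index.
scores = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
print(scores["b"])    # label-based lookup
print(scores.dtype)   # element type inferred from the data

# A DataFrame built from a dict of equal-length lists; each key becomes a column.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
})
print(people.shape)   # (number of rows, number of columns)
print(people.dtypes)  # one dtype per column

# Selecting a single column of a DataFrame returns a Series.
ages = people["Age"]
print(type(ages).__name__)
```

In other words, a DataFrame can be thought of as a collection of Series that share one row index.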
3. Reading Data: Pandas can read data from various file formats like CSV, Excel, SQL, etc.
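df = pd.read_csv('data.csv')     # CSV file
df = pd.read_excel('data.xlsx')  # Excel file (requires an engine such as openpyxl)
# SQL query through a SQLAlchemy engine
import sqlalchemy
engine = sqlalchemy.create_engine('sqlite:///data.db')
df = pd.read_sql('SELECT * FROM table_name', engine)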
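The readers can be tried without any files on disk. The sketch below assumes a small in-memory CSV string and an in-memory SQLite database. The column names, the sample rows, and the table name people are invented for the example.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for a file such as data.csv.
csv_text = "Name,Age,Joined\nAlice,25,2021-03-01\nBob,30,2020-11-15\n"

# read_csv accepts a path or any file-like object; parse_dates converts the
# named column to a datetime dtype instead of leaving it as strings.
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["Joined"])
print(df_csv.dtypes)

# read_sql also accepts a plain sqlite3 connection, so a quick experiment
# does not need a SQLAlchemy engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Alice", 25), ("Bob", 30)])
conn.commit()

df_sql = pd.read_sql("SELECT * FROM people WHERE Age > 26", conn)
conn.close()
print(df_sql)
```

Filtering in the SQL query, as done here, keeps unneeded rows from being loaded into memory at all.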
4. Basic Data Manipulation:
- Viewing Data: Use head(), tail(), and sample() to view the first, last, and random rows of the DataFrame.
- Selecting Columns: Access columns using their labels.
name_column = df['Name']
- Filtering Data: Use conditional statements to filter rows based on specific conditions.
young_people = df[df['Age'] < 30]
- Adding Columns: Create new columns by assigning values or using operations.
df['Salary'] = df['Age'] * 1000
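The operations in this step can be combined in one short script. This is a sketch on made-up data. The fourth row ("Dana") and the variable names are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 22, 41],
})

first_two = df.head(2)                       # first 2 rows
last_two = df.tail(2)                        # last 2 rows
one_random = df.sample(n=1, random_state=0)  # random_state makes the draw repeatable

# Select several columns at once by passing a list of labels.
subset = df[["Name", "Age"]]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
twenties = df[(df["Age"] >= 20) & (df["Age"] < 30)]

# .loc filters rows and picks a column in a single step.
young_names = df.loc[df["Age"] < 30, "Name"]

# assign returns a new DataFrame and leaves df unchanged.
# The Age * 1000 formula simply mirrors the toy example above.
with_salary = df.assign(Salary=df["Age"] * 1000)
print(with_salary)
```

Using assign instead of direct column assignment is a design choice: it avoids modifying the original DataFrame, which can make longer pipelines easier to follow.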
5. Data Cleaning:
- Handling Missing Data: Pandas provides methods like isna(), fillna(), and dropna() to handle missing data.
df_dropped = df.dropna()  # Returns a new DataFrame without rows that have missing values
df_filled = df.fillna(0)  # Returns a new DataFrame with missing values replaced by 0
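A slightly fuller sketch of the three methods follows. The data, including where the gaps are, is invented, and filling Age with the column mean is just one possible strategy, not a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "Dana"],
    "Age": [25, np.nan, 22, 41],
})

# isna() returns a boolean mask; summing it counts missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Drop a row only when a specific column is missing.
has_name = df.dropna(subset=["Name"])

# Fill each column with its own replacement value by passing a dict.
filled = df.fillna({"Name": "Unknown", "Age": df["Age"].mean()})
print(filled)
```

None of these calls changes df itself. Each returns a new object, so the result has to be assigned to a name to be kept.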
6. Grouping and Aggregation: You can group data based on certain columns and perform aggregation functions on them.
# Assumes df has 'Department' and 'Salary' columns
grouped = df.groupby('Department')['Salary'].mean()
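As a runnable illustration, the sketch below builds a small DataFrame with Department and Salary columns and computes several aggregates at once. The department names and salary figures are invented for the example.

```python
import pandas as pd

# Invented sample data with the columns the grouping needs.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "IT"],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# One aggregate: the result is a Series indexed by Department.
mean_salary = df.groupby("Department")["Salary"].mean()
print(mean_salary)

# Several aggregates with named aggregation: output_name=(column, function).
summary = df.groupby("Department").agg(
    headcount=("Salary", "count"),
    mean_salary=("Salary", "mean"),
    max_salary=("Salary", "max"),
)
print(summary)
```

Named aggregation gives the output columns readable names without a separate renaming step.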
7. Data Visualization: Pandas can work with visualization libraries like Matplotlib and Seaborn to create plots.
import matplotlib.pyplot as plt
df.plot(x='Age', y='Salary', kind='scatter')
plt.show()
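When no display is available, for example on a server, the plot can be written to a file instead of shown. The sketch below assumes Matplotlib is installed and uses its non-interactive Agg backend. The sample data, the axis labels, and the file name salary_by_age.png are made up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 22, 41],
    "Salary": [25000, 30000, 22000, 41000],
})

# DataFrame.plot returns a Matplotlib Axes that can be customised further.
ax = df.plot(x="Age", y="Salary", kind="scatter", title="Salary by age")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Salary")

# Save the figure to a temporary directory instead of calling plt.show().
fig = ax.get_figure()
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "salary_by_age.png")
    fig.savefig(out_path, dpi=100)
    saved = os.path.exists(out_path)
plt.close(fig)
print(saved)
```

Working with the returned Axes object keeps all Matplotlib customisation available while Pandas handles the mapping from columns to plot data.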
8. Exporting Data: Save your processed data to various file formats.
df.to_csv('processed_data.csv', index=False)
df.to_excel('processed_data.xlsx', index=False)
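To see what index=False does, the following sketch writes a small DataFrame to CSV twice and reads both files back. It uses a temporary directory so nothing is left on disk. The data and the file names are assumptions for the example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # index=False leaves the row labels out of the file.
    clean_path = os.path.join(tmp_dir, "processed_data.csv")
    df.to_csv(clean_path, index=False)
    round_trip = pd.read_csv(clean_path)

    # With the default index=True the row labels are written as an extra,
    # unnamed first column, which read_csv then loads as "Unnamed: 0".
    indexed_path = os.path.join(tmp_dir, "with_index.csv")
    df.to_csv(indexed_path)
    with_index = pd.read_csv(indexed_path)

print(list(round_trip.columns))
print(list(with_index.columns))
```

Leaving the index out is usually what you want when the row labels are just the default 0, 1, 2, ... counter and carry no information.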