Introduction to Python's Data Science Libraries

Python is widely used in data science due to its powerful libraries that simplify data analysis and visualization.

Key Data Science Libraries:

NumPy: Fundamental package for numerical computations with support for large, multi-dimensional arrays and matrices.
Pandas: Provides labeled data structures (Series and DataFrame) and tools for loading, cleaning, filtering, and aggregating tabular data.
Matplotlib: Plotting library for creating static, animated, and interactive visualizations.
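SciPy: Builds on NumPy with routines for scientific and technical computing, such as optimization, integration, interpolation, linear algebra, and statistics.
Scikit-learn: Machine learning library offering tools for classification, regression, clustering, preprocessing, and model evaluation.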

Example Using NumPy:

import numpy as np

# Creating arrays
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])

# Basic operations
print(a + 5)        # Output: [6 7 8]
print(np.mean(b))   # Output: 2.5
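
The library list mentions multi-dimensional arrays, so it may help to see how an operation applies along one axis of a 2-D array and how a 1-D array is broadcast across its rows. This is a minimal sketch that reuses the array b from above; the names col_means, row_means, offsets, and shifted are illustrative and not part of the original example.

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])

# Mean down each column (axis=0) and across each row (axis=1)
col_means = np.mean(b, axis=0)
row_means = np.mean(b, axis=1)

# Broadcasting: the 1-D array is added to every row of b
offsets = np.array([10, 20])
shifted = b + offsets

print(col_means)   # Output: [2. 3.]
print(row_means)   # Output: [1.5 3.5]
print(shifted)     # Output: [[11 22]
                   #          [13 24]]
```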

Example Using Pandas:

import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)

# Displaying data
print(df)

# Accessing columns
print(df['Name'])

# Descriptive statistics (by default only numeric columns are summarized, here 'Age')
print(df.describe())
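
Beyond displaying and summarizing data, a common next step is selecting rows and computing simple aggregates. The sketch below rebuilds the same DataFrame so that it runs on its own; the threshold of 28 and the names older, mean_age, and by_age are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Boolean mask: keep only the rows where Age is greater than 28
older = df[df['Age'] > 28]
print(older['Name'].tolist())    # Output: ['Bob', 'Charlie']

# Aggregate a single column
mean_age = df['Age'].mean()
print(mean_age)                  # Output: 30.0

# Sort rows by Age, oldest first
by_age = df.sort_values('Age', ascending=False)
print(by_age.iloc[0]['Name'])    # Output: Charlie
```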

Example Using Matplotlib:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4]
y = [10, 15, 20, 25]

# Creating a line plot
plt.plot(x, y)
plt.title('Sample Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
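
Example Using SciPy and Scikit-learn:

SciPy and Scikit-learn appear in the library list but have no example, so here is a minimal sketch of one typical call from each. It assumes both packages are installed. The integrand x**2 is an arbitrary choice, and the regression reuses the x and y values from the Matplotlib example purely for continuity.

```python
import numpy as np
from scipy import integrate
from sklearn.linear_model import LinearRegression

# SciPy: numerically integrate x**2 from 0 to 1 (the exact value is 1/3)
area, abs_error = integrate.quad(lambda x: x ** 2, 0, 1)
print(round(area, 4))                      # Output: 0.3333

# Scikit-learn: fit a straight line to the sample data
# Features must be 2-D with shape (n_samples, n_features)
X = np.array([[1], [2], [3], [4]])
y = np.array([10, 15, 20, 25])

model = LinearRegression()
model.fit(X, y)

# The data lie exactly on y = 5x + 5
slope = float(model.coef_[0])
intercept = float(model.intercept_)
print(round(slope, 2))                     # Output: 5.0
print(round(intercept, 2))                 # Output: 5.0

# Predict y for an unseen x value
predicted = float(model.predict(np.array([[5]]))[0])
print(round(predicted, 2))                 # Output: 30.0
```

Because the sample points are perfectly linear, the fitted slope and intercept match the underlying line; on real data they would only approximate it.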
