Python is widely used in data science due to its powerful libraries that simplify data analysis and visualization.
Key Data Science Libraries:
- NumPy: Fundamental package for numerical computations with support for large, multi-dimensional arrays and matrices.
- Pandas: Provides high-performance data structures (like DataFrame) and data analysis tools.
- Matplotlib: Plotting library for creating static, animated, and interactive visualizations.
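- SciPy: Builds on NumPy with routines for scientific and technical computing, such as optimization, integration, interpolation, linear algebra, and statistics.
- Scikit-learn: Machine learning library offering tools for classification, regression, clustering, preprocessing, and model evaluation.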
Example Using NumPy:
import numpy as np
# Creating arrays
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])
# Basic operations
print(a + 5) # Output: [6 7 8]
print(np.mean(b)) # Output: 2.5
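The operations above extend naturally to element-wise arithmetic, per-axis aggregation, and broadcasting. The following is a minimal sketch that reuses the arrays a and b from the example; the variable names doubled, col_means, row_means, and shifted are chosen here for illustration only.

```python
import numpy as np

# Same arrays as in the example above
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])

# Element-wise arithmetic: every element is multiplied by 2
doubled = a * 2                    # [2 4 6]

# axis=0 aggregates down the rows, giving one mean per column
col_means = b.mean(axis=0)         # [2. 3.]

# axis=1 aggregates across the columns, giving one mean per row
row_means = b.mean(axis=1)         # [1.5 3.5]

# Broadcasting: the 1-D array [10, 20] is added to each row of b
shifted = b + np.array([10, 20])   # [[11 22] [13 24]]

print(doubled)
print(col_means)
print(row_means)
print(shifted)
```

Passing axis to an aggregation is usually what distinguishes a per-column or per-row summary from the single overall value that np.mean(b) returns.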
Example Using Pandas:
import pandas as pd
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
# Displaying data
print(df)
# Accessing columns
print(df['Name'])
# Descriptive statistics (by default only numeric columns are summarized, so only 'Age' appears)
print(df.describe())
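Beyond displaying and describing data, a common next step is to filter, select, and sort rows. The following is a short sketch, assuming the same three-row DataFrame as above; the names older, names_and_cities, mean_age, and by_age_desc are illustrative and not part of the original example.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Boolean mask: keep only rows where Age is at least 30
older = df[df['Age'] >= 30]

# Select several columns at once by passing a list of names
names_and_cities = df[['Name', 'City']]

# Aggregate a single column
mean_age = df['Age'].mean()   # 30.0

# Sort rows by a column, largest value first
by_age_desc = df.sort_values('Age', ascending=False)

print(older)
print(names_and_cities)
print(mean_age)
print(by_age_desc)
```

Each of these operations returns a new object and leaves df unchanged, which makes it easy to chain steps or compare intermediate results.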
Example Using Matplotlib:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4]
y = [10, 15, 20, 25]
# Creating a line plot
plt.plot(x, y)
plt.title('Sample Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
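The same plot can also be built with Matplotlib's object-oriented interface and written to a file instead of shown in a window, which is useful in scripts or on machines without a display. This is a sketch under a few assumptions: the non-interactive Agg backend is selected explicitly, and the output file name line_plot.png and its location in the system temporary directory are hypothetical choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Same sample data as in the example above
x = [1, 2, 3, 4]
y = [10, 15, 20, 25]

# Create a figure and a single set of axes explicitly
fig, ax = plt.subplots()
ax.plot(x, y, marker='o')
ax.set_title('Sample Line Plot')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')

# Hypothetical output path; adjust to wherever the image should be stored
output_path = os.path.join(tempfile.gettempdir(), 'line_plot.png')
fig.savefig(output_path)
plt.close(fig)  # release the figure's memory once it has been saved

print(output_path)
```

Working with explicit fig and ax objects tends to scale better than the plt.* calls once a script produces more than one plot.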