Day 1 — Diving Deep into Numerical Python using NumPy

23 Jul 2025 4 mins read

Data Science NumPy Python 60DayDSChallenge

When restarting my data science journey, I knew exactly where I needed to begin: NumPy — the powerful foundation for all numerical computing in Python. Whether you’re building ML models, crunching data, or diving deep into neural networks, NumPy is where everything begins.

In this post, I’ll cover the key theory, code implementations, and why NumPy is essential for any data scientist or ML practitioner.

What is NumPy?

NumPy (short for Numerical Python) is the core library for scientific computing in Python. It offers:

A powerful n-dimensional array object (ndarray)
Tools for performing fast vectorized operations
Utilities for linear algebra, random number generation, and Fourier transforms

Think of it as the engine that powers most of your data pipelines, especially when performance matters.

Core Concepts

Here are the key building blocks you must know:

Concept	Explanation
`ndarray`	The core data structure – multi-dimensional array
Vectorization	Replacing loops with fast array-wide operations
Broadcasting	Auto-expanding arrays of different shapes to work together
Axis	Direction across which operations are applied
Shape & Size	Tuple of dimensions, and total number of elements

Array Basics in Python

import numpy as np

# Basic arrays
# note: `ndarray` is the class and np.array() is the function that calls ndarray object
a = np.array([1, 2, 3])
b = np.zeros((2, 3))
c = np.ones((3, 3))
d = np.eye(4)  # Identity matrix
e = np.random.randint(0, 10, (3, 3))

Array Properties

print(e.shape)     # e.g., (3, 3)
print(e.ndim)      # 2 (2D)
print(e.dtype)     # e.g., int64
print(e.size)      # 9

NumPy Operations You Should Know

Arithmetic Operations

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

print(x + y)
print(x * y)
print(np.dot(x, y))

Reshaping and Flattening

z = np.arange(12).reshape(3, 4)
print(z.flatten())

Broadcasting Example

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr + np.array([10, 20, 30]))

This automatically adds the vector [10, 20, 30] to each row of the matrix.

Mini Challenge (Optional Exercise)

Load and process a CSV without using Pandas:

data = np.genfromtxt('iris.csv', delimiter=',', skip_header=1)
features = data[:, :-1]

# Normalize
normalized = (features - features.mean(axis=0)) / features.std(axis=0)

This gives you a taste of how much you can achieve with just NumPy.

Key Takeaways

NumPy is fast, lightweight, and vectorized – it makes for clean and efficient code.
Mastering NumPy sets you up for deeper libraries like Pandas, Scikit-Learn, PyTorch, and TensorFlow.
Concepts like broadcasting, reshaping, slicing are not just academic — they’re practical and powerful tools.
Even without Pandas or Scikit-learn, you can load, clean, and process data using pure NumPy.
It’s worth revisiting even if you’ve used it before — the depth is surprising.

Code for Day 1 can be found on my github repository: Day 1 of Data Science

What’s Next?

Tomorrow, I’ll explore the Pandas library — the powerhouse for handling structured data in Python.

All content is licensed under the CC BY-SA 4.0 License unless otherwise specified