Machine Learning Basic Tool: NumPy
Data science and AI speak the language of numbers. Whether you are building a recommendation system, an autonomous car, or a model that predicts the value of a home, the data must be converted into numbers before any machine learning model can use it.
However, standard Python lists aren’t designed to handle the massive amounts of numerical data required for modern Machine Learning (ML). This is where NumPy comes in. It is the fundamental building block of the entire Python data science ecosystem. Without NumPy, modern ML would be incredibly slow and difficult to implement.
In this comprehensive guide, we will explore why NumPy is the “gold standard” for numerical computing and how you can master its core features.
What is NumPy?
NumPy, which stands for Numerical Python, is an open-source library used for working with numerical data in Python. Created by Travis Oliphant in 2005, it provides a powerful object called the N-dimensional array (or ndarray) and a collection of functions for performing fast operations on these arrays.
Why not just use Python Lists?
You might wonder, “Why do I need NumPy when Python already has lists?”
Imagine you have a list of one million numbers and you want to multiply each of them by 2. Using a standard Python list, you would need to write a for loop (or a list comprehension), which is slow because the interpreter handles every element as a separate Python object and has to check its data type on every iteration.
NumPy is superior for three main reasons:
- Speed (Vectorization): NumPy operations are implemented in C and Fortran, making them much faster than Python loops. It uses a concept called vectorization, which allows it to perform operations on whole arrays at once.
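- Memory Efficiency: NumPy arrays store their elements as raw, fixed-size values in a contiguous block of memory, whereas a Python list stores references to separate Python objects. As a result, arrays typically take up significantly less space than equivalent lists.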
- Functionality: It contains a vast library of mathematical functions, including linear algebra, Fourier transforms, and random number generation, which are essential for Machine Learning.
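The speed claim above can be checked with a minimal sketch like the following. It doubles one million numbers both ways and times each approach with the standard-library timeit module. The variable names are illustrative, and the measured times depend entirely on your machine, so no specific speed-up is claimed here.

```python
import timeit

import numpy as np

n = 1_000_000
numbers_list = list(range(n))
numbers_arr = np.arange(n)

# Pure-Python approach: the interpreter visits every element one at a time.
doubled_list = [x * 2 for x in numbers_list]

# Vectorized approach: one expression, the loop runs inside compiled code.
doubled_arr = numbers_arr * 2

# Both approaches produce the same values.
assert doubled_list == doubled_arr.tolist()

# Rough timing over a few repetitions; results vary between machines.
list_time = timeit.timeit(lambda: [x * 2 for x in numbers_list], number=3)
arr_time = timeit.timeit(lambda: numbers_arr * 2, number=3)
print(f"List comprehension: {list_time:.4f} s")
print(f"NumPy vectorized:   {arr_time:.4f} s")

# Size of the raw array data in bytes (itemsize * number of elements).
print(f"Array data size: {numbers_arr.nbytes} bytes")
```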
How to Install and Import NumPy
If you have Python installed, you can install NumPy using pip:
pip install numpy
Once installed, the standard way to import it into your scripts is:
import numpy as np
Importing NumPy under the alias np is a near-universal convention in the data science community.
Data Types and Their Attributes
In standard Python, a single list can contain a string, an integer, and a boolean all at once. NumPy, however, requires all elements in an array to be of the same data type. This homogeneity is exactly what makes it so fast.
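To see this homogeneity in practice, here is a small sketch (the variable names are illustrative). When a mixed list is passed to np.array, NumPy converts every element to one common type rather than keeping the original types.

```python
import numpy as np

# A Python list happily mixes an int, a float, and a bool.
mixed_list = [1, 2.5, True]

# NumPy picks one common dtype that can represent all of them: float64.
mixed_arr = np.array(mixed_list)

print(mixed_arr.dtype)     # float64
print(mixed_arr.tolist())  # [1.0, 2.5, 1.0]
```

Note that True has become 1.0: the array keeps the values, but not the original Python types.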
Common NumPy Data Types (dtype)
NumPy provides several data types that allow you to control how much memory each number consumes:
- int64 / int32: Integers (whole numbers).
- float64 / float32: Floating-point numbers (decimals).
- bool: Boolean (True/False).
- complex128: Complex numbers.
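As a rough illustration of how the dtype controls memory use, the sketch below stores the same four values with different types and inspects itemsize (bytes per element) and nbytes (bytes for the whole array). The variable names are assumptions made for this example.

```python
import numpy as np

values = [1, 2, 3, 4]

as_int64 = np.array(values, dtype=np.int64)
as_int32 = np.array(values, dtype=np.int32)

# astype returns a converted copy; the original array is unchanged.
as_float32 = as_int64.astype(np.float32)

print(as_int64.itemsize, as_int64.nbytes)  # 8 32
print(as_int32.itemsize, as_int32.nbytes)  # 4 16
print(as_float32.dtype)                    # float32
```

Halving the size of each element halves the memory footprint, which matters once arrays hold millions of values.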
Array Attributes
When you create a NumPy array, it comes with several “attributes” that tell you about its structure. Let’s look at an example:
import numpy as np
# Creating a 2D array (Matrix)
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Data Type: {arr.dtype}")
print(f"Shape: {arr.shape}")
print(f"Number of Dimensions: {arr.ndim}")
print(f"Total Number of Elements: {arr.size}")
Understanding the Attributes:
- .dtype: Tells you the data type (e.g., int64).
- .shape: This is the most important attribute in ML. It returns a tuple representing the size of each dimension. For a 2D array of 2 rows and 3 columns, the shape is (2, 3).
- .ndim: Tells you how many “axes” or dimensions the array has.
- .size: The total count of elements in the array (e.g., 2 rows * 3 columns = 6 elements).
Arrays: The Heart of NumPy
The most important object in NumPy is the ndarray. Think of an array as a grid of values.
Visualizing Dimensions
To understand Machine Learning data, you must understand these four structures:
- Scalar (0D Array): A single number, e.g. 5.
- Vector (1D Array): A list of numbers, e.g. [1, 2, 3].
- Matrix (2D Array): A table of numbers (rows and columns).
- Tensor (3D+ Array): A stack of matrices (used for images or video data).
Diagram Representation:
1D Array (Vector): [ * * * * ] (Shape: (4,))
2D Array (Matrix): [ [ * * * ]
                     [ * * * ] ] (Shape: (2, 3))
3D Array (Tensor): A cube of numbers (Shape: (depth, rows, cols))
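The same structures can be built and inspected in code. This is a minimal sketch with illustrative values; the tensor is simply filled with zeros to show the shape.

```python
import numpy as np

scalar = np.array(5)                        # 0D: a single number
vector = np.array([1, 2, 3])                # 1D: a list of numbers
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # 2D: rows and columns
tensor = np.zeros((2, 3, 4))                # 3D: 2 matrices of 3 rows x 4 columns

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}")
# scalar: ndim=0, shape=()
# vector: ndim=1, shape=(3,)
# matrix: ndim=2, shape=(2, 3)
# tensor: ndim=3, shape=(2, 3, 4)
```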
Creating Arrays
There are several ways to generate arrays:
# 1. From a list
a = np.array([1, 2, 3])
# 2. Array of Zeros (Useful for initializing weights in ML)
zeros = np.zeros((3, 3))
# 3. Array of Ones
ones = np.ones((2, 4))
# 4. Range of numbers (similar to Python's range)
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# 5. Linearly spaced numbers
linspace_arr = np.linspace(0, 1, 5) # 5 numbers evenly spaced between 0 and 1
Indexing and Slicing
Just like Python lists, you can access elements using square brackets.
arr_2d = np.array([[10, 20, 30], [40, 50, 60]])
# Accessing a single element: arr[row, col]
print(arr_2d[0, 1]) # Output: 20
# Slicing: Accessing multiple elements
# Get the first row, columns at index 1 and 2 (the stop index 3 is excluded)
print(arr_2d[0, 1:3]) # Output: [20 30]
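A few more indexing patterns are sketched below, using the same example array. One detail that differs from Python lists: a basic NumPy slice is a view that shares memory with the original array, so writing to the slice changes the original. The variable names are illustrative.

```python
import numpy as np

arr_2d = np.array([[10, 20, 30], [40, 50, 60]])

second_column = arr_2d[:, 1]  # every row, column index 1
last_row = arr_2d[-1]         # negative indices count from the end
print(second_column)  # [20 50]
print(last_row)       # [40 50 60]

# A basic slice is a view: modifying it modifies the original array.
first_row = arr_2d[0, :]
first_row[0] = 99
print(arr_2d[0, 0])   # 99

# Use .copy() when you need an independent array.
independent = arr_2d[0, :].copy()
independent[0] = -1
print(arr_2d[0, 0])   # 99 (unchanged by the edit to the copy)
```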
Array Operations
Machine Learning involves a lot of math. NumPy makes this math incredibly easy to write and incredibly fast to execute.
1. Element-wise Arithmetic
You can perform math on two arrays of the same shape as if they were single numbers; the operation is applied element by element.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5 7 9]
print(a * b) # [ 4 10 18]
print(a ** 2) # [1 4 9]
2. Universal Functions (Ufuncs)
NumPy provides built-in mathematical functions that operate on every element of an array.
arr = np.array([1, 4, 9])
print(np.sqrt(arr)) # [1. 2. 3.] (the result is a float array)
print(np.exp(arr)) # Exponential
print(np.sin(arr)) # Sine values
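Ufuncs can be combined into larger element-wise formulas. As one illustration, the sketch below builds a sigmoid function, a common ML activation that is not part of the examples above, from np.exp. The function and variable names are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    """Squash each element of x into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-2.0, 0.0, 2.0])
activations = sigmoid(scores)

print(activations.shape)  # (3,)
print(activations[1])     # 0.5, because exp(0) = 1
```

No loop is needed: np.exp, the addition, and the division are each applied to every element of the array.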
3. Aggregation (Statistics)
In Machine Learning, we often need to find the average error or the maximum probability.
data = np.array([[1, 2], [3, 4]])
print(np.sum(data)) # 10
print(np.mean(data)) # 2.5 (Average)
print(np.max(data)) # 4
print(np.std(data)) # Standard Deviation
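The aggregation functions above reduce the whole array to one number. They also accept an axis argument to aggregate per column or per row, which is how you would compute, for example, the average of each feature. A minimal sketch on the same data:

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])

col_sums = np.sum(data, axis=0)    # collapse the rows: one sum per column
row_sums = np.sum(data, axis=1)    # collapse the columns: one sum per row
col_means = np.mean(data, axis=0)  # one mean per column

print(col_sums)   # [4 6]
print(row_sums)   # [3 7]
print(col_means)  # [2. 3.]
```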
4. Broadcasting
Broadcasting is a powerful NumPy feature that allows you to perform operations on arrays of different shapes. For example, if you add a single number (scalar) to a matrix, NumPy “stretches” that number to match the matrix’s shape.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
result = matrix + 10
# Conceptually, 10 is treated as [[10, 10, 10], [10, 10, 10]] to match the shape;
# NumPy does not actually build that full array in memory.
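Broadcasting also works between arrays, not only with scalars. The sketch below adds a row vector and a column vector to the same matrix, then uses broadcasting to center each column around zero, a common preprocessing step. The names row, column, and centered are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
column = np.array([[100], [200]])          # shape (2, 1)

# The row is repeated for every row of the matrix.
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is repeated for every column of the matrix.
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# Subtract each column's mean (shape (3,)) from every row of the matrix.
centered = matrix - matrix.mean(axis=0)
print(centered.mean(axis=0))  # every column mean is now 0
```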
5. Matrix Multiplication (The Heart of ML)
In Deep Learning, neural networks are essentially just a series of matrix multiplications.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix product (for 2D arrays, np.dot performs matrix multiplication)
product = np.dot(A, B)
# OR using the @ symbol (Python 3.5+)
product_alt = A @ B
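To make the link to neural networks concrete, here is a minimal sketch. It first contrasts the matrix product with element-wise multiplication, then computes the output of a single hypothetical layer as inputs times weights plus bias. The values of X, W, and b are made up for illustration and do not come from any real model.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A @ B)  # matrix product: rows of A combined with columns of B
# [[19 22]
#  [43 50]]

print(A * B)  # element-wise product, NOT matrix multiplication
# [[ 5 12]
#  [21 32]]

# Hypothetical layer: 2 samples with 3 features each, mapped to 2 outputs.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # shape (2, 3)
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])       # shape (3, 2)
b = np.array([0.5, -0.5])        # shape (2,), broadcast over both samples

output = X @ W + b
print(output.shape)  # (2, 2)
```

The inner dimensions must match: a (2, 3) array can be multiplied by a (3, 2) array, and the result has shape (2, 2).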
Sorting Arrays
Sorting data is important for ranking results, finding outliers, or organizing features.
Simple Sort
The np.sort() function returns a sorted copy of the array.
unordered = np.array([5, 2, 8, 1, 9])
ordered = np.sort(unordered)
print(ordered) # [1 2 5 8 9]
Sorting 2D Arrays (Matrices)
You can sort by rows or by columns using the axis parameter.
- axis=0: Sorts along the columns (vertically).
- axis=1: Sorts along the rows (horizontally).
arr = np.array([[3, 2, 1], [6, 5, 4]])
sort_rows = np.sort(arr, axis=1)
print(sort_rows)
# [[1 2 3]
#  [4 5 6]]
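For comparison, the sketch below sorts one array along both axes. The input values are chosen for illustration so that the two results differ.

```python
import numpy as np

arr = np.array([[3, 5, 1], [2, 4, 6]])

sort_cols = np.sort(arr, axis=0)  # each column is sorted top to bottom
sort_rows = np.sort(arr, axis=1)  # each row is sorted left to right

print(sort_cols)
# [[2 4 1]
#  [3 5 6]]

print(sort_rows)
# [[1 3 5]
#  [2 4 6]]

# np.sort returns a copy, so the original array is unchanged.
print(arr[0])  # [3 5 1]
```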
Argsort: The ML Secret Weapon
In Machine Learning, we often don’t want the sorted numbers; we want to know the index of the numbers. For example, if an AI predicts probabilities for 3 classes (Cat, Dog, Bird), we want the index of the highest probability.
probs = np.array([0.1, 0.7, 0.2]) # Cat, Dog, Bird
indices = np.argsort(probs)
print(indices) # [0 2 1]
# argsort orders the indices from lowest to highest value,
# so the last index, 1 (Dog), has the highest probability.
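When only the winner is needed, np.argmax returns its index directly, and reversing the argsort result gives a ranking from most to least likely. The sketch below uses a labels array, which is an assumption added here to turn indices back into class names.

```python
import numpy as np

labels = np.array(["Cat", "Dog", "Bird"])  # illustrative class names
probs = np.array([0.1, 0.7, 0.2])

# Index of the single highest probability.
best = np.argmax(probs)
print(best, labels[best])  # 1 Dog

# Reverse the ascending argsort result to rank from highest to lowest.
ranking = np.argsort(probs)[::-1]
print(ranking)          # [1 2 0]
print(labels[ranking])  # ['Dog' 'Bird' 'Cat']

# Top-2 predictions.
top2 = labels[ranking[:2]]
print(top2)             # ['Dog' 'Bird']
```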
Conclusion: Why NumPy is your first step to ML
Mastering NumPy is not just about learning a library; it’s about learning how to handle data efficiently. Every major tool you will use later—Pandas for data manipulation, Matplotlib for visualization, and Scikit-learn for modeling—is built directly on top of NumPy.
Summary of what we covered:
- What is NumPy? A high-performance library for numerical data.
- Data Types: Fixed types like float64 and int32 make it fast.
- Arrays: Understanding the shape and dimensions (1D, 2D, 3D) is crucial.
- Operations: Vectorization and Broadcasting allow us to avoid slow for loops.
- Sorting: Using sort and argsort to organize our data.
By understanding these basics, you have laid the foundation for becoming a professional Machine Learning engineer. Your next step is to take a real-world dataset and apply these NumPy techniques to clean and prepare it for a model!
