Introduction to NumPy
March 20, 2018
NumPy, short for Numerical Python, is a library for scientific computing in Python. As the name suggests, it provides a host of tools for mathematical and numerical routines.
One amongst these high-performing tools is the NumPy array. This multidimensional array object is a powerful data structure for efficient computation on vectors and matrices. In this article, we will explore these arrays and their power-packed functionalities.
You are encouraged to follow along with the tutorial and play around with NumPy, trying various things and making sure you’re getting the hang of it. Let’s get started!
Import
As with any other package, we start off by importing the library, NumPy in this case, under its most commonly used alias, np.
import numpy as np
Array Creation Routines
Now with our NumPy library ready to use, we can jump right into array creation. The array data structure may look simple, but it is useful in many ways.
# Create a 1-dimensional array
a = np.array([1,4,2,3,5,7,8,6])
print(a)
Arrays are not limited to one dimension: NumPy's array type, ndarray, is an n-dimensional array object. Below is an example of a 2-dimensional array.
# Create a 2-dimensional array
b = np.array([[5,0,1,0,2,3], [1,3,0,1,2,0], [0,1,0,0,1,3]])
print(b)
Alternatively, we can initialize an array by calling common functions like np.zeros, np.ones, np.full, and np.random.random. The first argument to each of these functions is the shape of the array we would like.
# Create a 2x2 array of all zeros
print('A 2x2 array of all zeros')
print(np.zeros((2, 2)))
print()
# Create a 1x2 array of all ones
print('A 1x2 array of all ones')
print(np.ones((1, 2)))
print()
# Create a 2x2 constant array of the number 7
print('A 2x2 constant array of the number 7')
print(np.full((2, 2), 7))
print()
# Create a 2x2 array filled with random values between 0 and 1
print('A 2x2 array filled with random values between 0 and 1')
np.random.seed(42) # set a seed so that we always get the same random values
c = np.random.random((2, 2))
print(c)
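To see what the seed buys us, here is a minimal sketch that reseeds the generator and draws twice. The names first and second are only illustrative.

```python
import numpy as np

# Seed the global generator, draw once, then reseed and draw again.
np.random.seed(42)
first = np.random.random((2, 2))

np.random.seed(42)
second = np.random.random((2, 2))

# Same seed and same shape: the two draws are identical.
print(np.array_equal(first, second))

# Every value lies in the half-open interval [0, 1).
print(bool((first >= 0).all() and (first < 1).all()))
```

Without the second call to np.random.seed, the second draw would continue the random sequence and produce different values.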
The np.eye function is used to create an identity matrix. Since an identity matrix must be square, we pass it the single argument 2 instead of the shape (2, 2).
# Create a 2x2 identity matrix
print('A 2x2 identity matrix')
print(np.eye(2))
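As a quick check of what "identity" means, the sketch below multiplies a small matrix by the identity matrix and gets the same matrix back. The matrix m is a made-up example, not part of the tutorial data.

```python
import numpy as np

identity = np.eye(3)  # a single integer n gives an n x n matrix

# A hypothetical 3x3 matrix, used only for this illustration.
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

print(identity.shape)

# Matrix-multiplying by the identity leaves m unchanged.
print(np.array_equal(identity @ m, m))
```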
Reading documentation within Python
Did you know that you can read a function’s documentation directly within Python? You can do this using the help() function. For example,
help(np.random.random)
The documentation printed above starts with a description of the function. It then describes the input parameters and the returned output, and finally provides some illustrative examples.
Defining features of NumPy Arrays
Once we have some data as a NumPy array, it is useful to observe its defining features. Here are a few:
Shape and size
Find the dimensions (shape) of an array
print('Shape of NumPy array "a"')
print(a.shape)
print()
print('Shape of NumPy array "b"')
print(b.shape)
Find the number of dimensions (ndim) of an array
print('Number of dimensions of NumPy array "a"')
print(a.ndim)
print()
print('Number of dimensions of NumPy array "b"')
print(b.ndim)
Total number of elements (size) of an array
print('Total number of elements in NumPy array "a"')
print(a.size)
print()
print('Total number of elements in NumPy array "b"')
print(b.size)
print()
print('Total number of elements in NumPy array "c"')
print(c.size)
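The three attributes are related: ndim is the length of the shape tuple, and size is the product of its entries. A small sketch, using the array b as originally created:

```python
import numpy as np

b = np.array([[5, 0, 1, 0, 2, 3], [1, 3, 0, 1, 2, 0], [0, 1, 0, 0, 1, 3]])

print(b.shape)  # (3, 6)

# ndim is the number of entries in the shape tuple.
print(b.ndim == len(b.shape))

# size is the product of the entries in the shape tuple.
print(b.size == np.prod(b.shape))
```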
Datatype
Type of data stored in an array
print('Type of data stored in NumPy array "a"')
print(a.dtype.name)
print()
print('Type of data stored in NumPy array "b"')
print(b.dtype.name)
print()
print('Type of data stored in NumPy array "c"')
print(c.dtype.name)
dtype tells us the type of each element stored in the array. When the array has integers, the type is 'int64', i.e. 64-bit integers. When the array has real numbers, the type is 'float64', i.e. 64-bit floating point values.
Note: If you’re running the code on your own computer, you may get 'int32' or 'int64' for the integer arrays above, depending on your platform and NumPy version. The floating-point array "c" is 'float64' in either case.
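If you want the same element type on every machine, one option is to pass the dtype explicitly when creating the array. A minimal sketch; the variable names are only illustrative.

```python
import numpy as np

# Request a specific element type instead of relying on the platform default.
ints64 = np.array([1, 4, 2], dtype=np.int64)
ints32 = np.array([1, 4, 2], dtype=np.int32)
floats = np.array([1, 4, 2], dtype=np.float64)

print(ints64.dtype.name)  # int64
print(ints32.dtype.name)  # int32
print(floats.dtype.name)  # float64
```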
Size of an individual array element (in bytes)
print('Size of an individual array element in NumPy array "a"')
print(a.itemsize)
print()
print('Size of an individual array element in NumPy array "b"')
print(b.itemsize)
print()
print('Size of an individual array element in NumPy array "c"')
print(c.itemsize)
Since 8 bits = 1 byte, and we have 64-bit integers and floats, the number of bytes per array element is 64 / 8 = 8.
Note: If the integer arrays are 'int32' on your computer, their item size will be 32 / 8 = 4 bytes. The 'float64' array still uses 8 bytes per element.
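The same arithmetic gives the total memory used by the array's data: the number of elements times the bytes per element, which NumPy also exposes as the nbytes attribute. A short sketch, with the dtype fixed to int64 so the numbers are the same on any machine:

```python
import numpy as np

a = np.array([1, 4, 2, 3, 5, 7, 8, 6], dtype=np.int64)

print(a.itemsize)           # 8 bytes per element
print(a.size * a.itemsize)  # 64 bytes in total
print(a.nbytes)             # 64, the same value
```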
The type of array itself
print(type(a))
Awesome! We saw the important features of a NumPy array such as shape, size and type, both for the array and for individual elements in the array.
Often, we use these attributes to create new arrays as well. For example,
# Create an array of zeroes of the same shape as "b"
print('A new array of zeroes of the same shape as "b"')
print(np.zeros(b.shape))
print()
# Create an array of ones of the same shape as "a"
print('A new array of ones of the same shape as "a"')
print(np.ones(a.shape))
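NumPy also offers np.zeros_like and np.ones_like, which copy both the shape and the element type of an existing array. The sketch below contrasts this with np.zeros(b.shape), which uses the default float type.

```python
import numpy as np

b = np.array([[5, 0, 1, 0, 2, 3], [1, 3, 0, 1, 2, 0], [0, 1, 0, 0, 1, 3]])

from_shape = np.zeros(b.shape)  # same shape, default float64 elements
like_b = np.zeros_like(b)       # same shape and same integer dtype as b

print(from_shape.shape == like_b.shape)
print(from_shape.dtype.name)
print(like_b.dtype == b.dtype)
```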
Slicing and Indexing
After creating arrays, we can access and modify different parts of the array through slicing and indexing.
Indexing
We use the notation a[i] to pull out the i-th element from array a (assuming a is 1-dimensional). Indexing starts at 0, as in normal Python. Hence, a[0] gives the first element, a[1] gives the second element, and so on.
# Recall what array a looked like
print('array "a"')
print(a)
print()
# Indexing. As always, indexes start from 0.
print('First element of array "a"')
print(a[0]) # the first element
print()
print('Second element of array "a"')
print(a[1]) # the second element
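As with Python lists, negative indexes count from the end of the array. A brief sketch using the same array a:

```python
import numpy as np

a = np.array([1, 4, 2, 3, 5, 7, 8, 6])

print(a[-1])  # 6, the last element
print(a[-2])  # 8, the second-to-last element

# The last element can also be reached with a non-negative index.
print(a[-1] == a[a.size - 1])
```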
For 2-dimensional arrays, we use the notation array[i, j] to pull out the element in row i, column j.
# Recall what array b looked like
print('array "b"')
print(b)
print()
# Indexing. As always, indexes start from 0.
print('Element from row 0, column 1, of array "b"')
print(b[0, 1]) # (row 0, column 1)
Note: This syntax is different from indexing a list of lists in Python. In that case, you would write a[i][j] (as opposed to a[i, j]).
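The difference can be seen side by side in the sketch below. The small nested list is a made-up example used only for illustration.

```python
import numpy as np

nested = [[5, 0, 1], [1, 3, 0]]  # a plain Python list of lists
arr = np.array(nested)           # the same data as a NumPy array

print(nested[0][2])  # 1, list-of-lists style
print(arr[0, 2])     # 1, NumPy style
print(arr[0][2])     # 1, also works on arrays, but as two indexing steps

# A plain list does not accept the comma notation.
try:
    nested[0, 2]
    raised = False
except TypeError:
    raised = True
print(raised)
```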
Slicing
Instead of just a single element, it is also possible for us to look at sub-arrays or sub-parts of the original array. This is done using slicing.
# Slicing pulls out a subarray
# Example for pulling out elements 2 through 4
# (element 2 is included and 4 is not, same as Python list slicing).
print('Elements 2 through 4 (4 not included), of array "a"')
print(a[2:4])
print()
# Example for pulling out first 3 elements
print('First 3 elements of array "a"')
print(a[:3])
Similar to indexing, slicing can be done on higher dimensional arrays as well.
# Example for pulling out first 2 rows and columns 1 through 3
# (column 1 is included and 3 is not, same as Python list slicing).
print('First 2 rows and columns 1 through 3 (3 not included) of array "b" ')
print(b[:2, 1:3])
print()
# Values from row 1 through 2 (2 not included) and all columns
print('Values from row 1 through 2 (2 not included) and all columns, of array "b"')
print(b[1:2, :])
print()
# Values from all rows and column 2
print('Values from all rows and column 2')
print(b[:, 2])
Note that there is a small but important difference in the last two examples. b[1:2, :] returns a 2-D array of shape (1, 6), whereas b[:, 2] returns a 1-D array of shape (3,).
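The rule of thumb is that a slice keeps its dimension while a single integer index drops it. The sketch below checks the shapes, and also shows that slicing a single column with 2:3 keeps the result 2-D.

```python
import numpy as np

b = np.array([[5, 0, 1, 0, 2, 3], [1, 3, 0, 1, 2, 0], [0, 1, 0, 0, 1, 3]])

row_slice = b[1:2, :]  # slice on rows: still 2-D
col_index = b[:, 2]    # integer on columns: drops to 1-D
col_slice = b[:, 2:3]  # slice on columns: still 2-D

print(row_slice.shape)  # (1, 6)
print(col_index.shape)  # (3,)
print(col_slice.shape)  # (3, 1)
```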
Modifying elements
Assigning to an index or a slice of an array modifies the original array in place, because a slice is a view into the same data rather than a copy.
b[0, 0] = 7 # sets the element in row 0, column 0 to 7.
print(b)
We can also set an entire sub-array to be a particular value.
d = np.copy(b)
d[1:3, 2:4] = 9 # sets all values in the subarray to 9.
print(d)
We can also set each element in a sub-array to different values, as defined by another array. This array must have the same shape as the sub-array we select (or a shape that NumPy can broadcast to it).
d[:2, 1:3] = np.array([[2, 0], [4, 3]])
print(d)
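The difference between a view and a copy, which is why np.copy was used for d, can be sketched as follows. The small array here is a made-up example, and np.shares_memory is used only to confirm which arrays share data.

```python
import numpy as np

# A hypothetical small array, used only for this illustration.
original = np.array([[5, 0, 1], [1, 3, 0]])

view = original[0, :]  # a slice is a view into the same data
view[0] = 7            # writing through the view changes the original
print(original[0, 0])  # 7

independent = np.copy(original)  # a copy has its own data
independent[0, 0] = -1           # this does not touch the original
print(original[0, 0])            # still 7

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```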
Conditionals and Conditional Indexing
Let’s see a conditional. A conditional returns an array of True and False values with the same shape as the original array.
# Recall what array a looked like
print('array "a"')
print(a)
print()
print('Conditional of array "a", for elements of "a" greater than 2')
print(a > 2)
This array of True and False can be used for indexing.
print('Values of array "a" larger than 2')
print(a[a > 2]) # values of a larger than 2
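Since the conditional is itself an array, it can be stored in a variable, counted, and used for assignment as well as selection. A minimal sketch; the names mask and clipped are only illustrative.

```python
import numpy as np

a = np.array([1, 4, 2, 3, 5, 7, 8, 6])

mask = a > 2
print(mask.dtype.name)  # bool
print(mask.sum())       # 6, since each True counts as 1

# Assign through the mask on a copy, so that "a" itself is left unchanged.
clipped = a.copy()
clipped[mask] = 0
print(clipped)          # [1 0 2 0 0 0 0 0]
```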
Let’s do the same thing for array b:
# Recall what array b looked like
print('array "b"')
print(b)
print()
print('Conditional of array "b", for elements of "b" greater than 2')
print(b > 2)
print()
print('Values of array "b" larger than 2')
print(b[b > 2])
Note above that when we performed conditional indexing on a 2-D array, the result was a flattened array. This is true when the conditional (in this case, b > 2) has the same shape as the original array (in this case, b).
NumPy has no choice here but to flatten the result, since the result can include a different number of values from each row and column. In the example above, 2 values in the first row are greater than 2, but only 1 value in each of the second and third rows is greater than 2.
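The uneven counts per row can be checked directly by summing the conditional along each row. In this sketch, b is written out with the value 7 already placed in row 0, column 0, as it was after the modification earlier in the article.

```python
import numpy as np

# Array "b" after b[0, 0] was set to 7.
b = np.array([[7, 0, 1, 0, 2, 3], [1, 3, 0, 1, 2, 0], [0, 1, 0, 0, 1, 3]])

# Number of values greater than 2 in each row.
per_row = (b > 2).sum(axis=1)
print(per_row)    # [2 1 1]

# The selected values come back as a flat 1-D array, row by row.
selected = b[b > 2]
print(selected)   # [7 3 3 3]
print(selected.ndim)
```

Because the rows contribute 2, 1, and 1 values respectively, the result cannot be arranged as a rectangular 2-D array.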
Conclusion
In summary, the advantages of using NumPy include:
- array oriented computing
- efficiently implemented multi-dimensional arrays
- designed for scientific computation
- sophisticated functions for initializing, indexing and slicing
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data (including strings and objects). For more information, you can check the NumPy Reference documentation.