Python lists are very flexible: they can hold objects of any type and can be nested arbitrarily.
However, flexibility often comes at the cost of performance, and lists are not the ideal object for numerical calculations.
This is where Numpy comes in. Numpy is a Python module that defines a powerful n-dimensional array object that uses C and Fortran code behind the scenes to provide high performance.
The downside of Numpy arrays is that they have a more rigid structure, and require a single numerical type (e.g. floating point values), but for a lot of scientific work, this is exactly what is needed.
The Numpy module is imported with:
import numpy
In the rest of this course, and in many packages, the following convention is used instead:
import numpy as np
This is because Numpy is used so often that it is shorter to type np than numpy.
The easiest way to create an array is from a Python list, using the array function:
a = np.array([10, 20, 30, 40])
a
Numpy arrays have several attributes that contain useful information about them:
a.ndim # number of dimensions
a.shape # shape of the array
a.dtype # numerical type
Note: Numpy arrays actually support more than just one integer type and one floating point type - they support signed and unsigned 8-, 16-, 32-, and 64-bit integers, and 16-, 32-, and 64-bit floating point values.
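As a minimal sketch of how a specific numerical type can be requested, the example below uses the dtype argument of np.array and the astype method. Both are standard Numpy features that are not introduced above, and the variable names are only illustrative:

```python
import numpy as np

# Integer input gives a platform-dependent integer type (commonly int64)
a = np.array([10, 20, 30, 40])

# Request a specific type at creation time with the dtype argument
b = np.array([10, 20, 30, 40], dtype=np.float32)

# Convert an existing array to another type; astype returns a new array
c = a.astype(np.uint8)

print(a.dtype, b.dtype, c.dtype)
print(b.itemsize)  # bytes per element: 4 for a 32-bit float
```

Smaller types use less memory per element, at the cost of a reduced range or precision.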
There are several other ways to create arrays. For example, there is an arange function that can be used similarly to the built-in Python range function, with the exception that it can take floating-point input:
np.arange(10)
np.arange(3, 12, 2)
a = np.arange(1.2, 4.4, 0.1)
print(a)
Another useful function is linspace, which can be used to create a given number of linearly spaced values within a range (including the start and endpoint of the specified range):
np.linspace(10, 11, 11)
A similar function, logspace, can be used to create logarithmically spaced values between and including two limits, which are given as powers of 10:
np.logspace(1., 4., 7)
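To make the relation between the two functions concrete, the following sketch rebuilds the logspace result by hand from linearly spaced exponents. The variable names are illustrative only:

```python
import numpy as np

# logspace takes exponents: 7 values from 10**1 to 10**4, both included
log_values = np.logspace(1., 4., 7)

# The same values built by hand from linearly spaced exponents
exponents = np.linspace(1., 4., 7)
by_hand = 10. ** exponents

print(log_values)
print(np.allclose(log_values, by_hand))
```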
Finally, the zeros and ones functions can be used to create arrays initially set to 0 and 1 respectively:
np.zeros(10)
np.ones(5)
Numpy arrays can be combined element by element using the standard +, -, *, / and ** operators. This is probably one of the most powerful and useful features of Numpy.
x = np.array([1,2,3])
y = np.array([4,5,6])
x + 2 * y
x ** y
Note that this greatly differs from lists:
x = [1,2,3]
y = [4,5,6]
x + 2 * y
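The difference can be seen side by side in the short sketch below, which evaluates the same expression once for lists and once for arrays. The names list_result and array_result are illustrative:

```python
import numpy as np

x_list = [1, 2, 3]
y_list = [4, 5, 6]

# For lists, * repeats the list and + concatenates
list_result = x_list + 2 * y_list

# For arrays, the same expression is evaluated element by element
array_result = np.array(x_list) + 2 * np.array(y_list)

print(list_result)   # [1, 2, 3, 4, 5, 6, 4, 5, 6]
print(array_result)  # [ 9 12 15]
```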
Create an array which contains 11 values logarithmically spaced between $10^{-20}$ and $10^{-10}$.
# your solution here
Create an array which contains the value 2 repeated 10 times (hint: there are two quick ways to do that - one is by multiplying a list, one by multiplying an array)
# your solution here
Try using np.empty(10) and compare the results to np.zeros(10). Also compare to what the people next to you get for np.empty. What do you think is going on? You may also want to test different array sizes than 10.
# your solution here
print(np.empty(10))
print(np.zeros(10))
Create an array containing the value 0 five times, stored as 32-bit floating-point numbers (hint: have a look at the docs for np.zeros; the type you'd be looking for is called float32)
# your solution here
Similarly to lists, items in arrays can be accessed individually:
x = np.array([9,8,7])
x[0]
x[1]
and arrays can also be sliced by specifying the start and end of the slice (where the end is exclusive):
y = np.arange(10)
y[:5]
optionally specifying a step:
y[0:10:2]
As for lists, the start, end, and step are all optional, and default to 0, len(array), and 1 respectively:
y[::]
We can also reverse the order by using negative steps:
y[::-2]
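The slicing defaults can be checked directly. The sketch below compares a slice with all parts omitted against one where start, end, and step are written out as 0, len(y), and 1, and then shows two negative steps:

```python
import numpy as np

y = np.arange(10)

# Omitted start, end, and step fall back to 0, len(y), and 1
same = np.array_equal(y[::], y[0:len(y):1])
print(same)

# A negative step walks backwards from the last element
print(y[::-2])  # [9 7 5 3 1]
print(y[::-1])  # the whole array reversed
```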
Given an array x with 10 elements, find the array dx containing 9 values where dx[i] = x[i+1] - x[i]. Do this without loops and think of how you could obtain x[i+1] and what the final length of dx is!
# your solution here
Numpy can be used for multi-dimensional arrays:
x = np.array([[1.,2.],[3.,4.]])
x.ndim
x.shape
y = np.ones([3,2,4]) # ones takes the shape of the array, not the values
y
y.shape
Using np.linalg.inv() we can calculate the inverse of a square array, interpreted as a matrix:
a = np.array([[1,2],[3,4]])
np.linalg.inv(a)
Example: The solution of the system of equations
1*x + 2*y = 5
3*x + 4*y = 2
can be found using np.dot(), which yields the matrix product of 2-D arrays:
np.dot(np.linalg.inv(a), [5,2])
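As a cross-check of the example above, the sketch below solves the same system a second way with np.linalg.solve, a standard Numpy function that is not used in the text and that avoids forming the inverse explicitly. It then substitutes the result back into the equations:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([5, 2])

# Solution via the explicit inverse, as in the text
sol_inv = np.dot(np.linalg.inv(a), b)

# np.linalg.solve solves a . sol = b without forming the inverse
sol = np.linalg.solve(a, b)

print(sol)                             # approximately [-8.   6.5]
print(np.allclose(np.dot(a, sol), b))  # check the result against b
```

For larger systems, solving directly is usually preferable to inverting the matrix first.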
Multi-dimensional arrays can be sliced differently along different dimensions:
d = np.array([0,1,2,3,4,5])
print(d[::3])
z = np.ones([6,6,6]) # a 6x6x6 3D array
zz = z[::3, 1:4, :]
print(zz)
print(zz.shape)
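The resulting shape follows from treating each dimension separately, as the sketch below spells out in comments. The last two lines add one assumption-free extra case, a single integer index, for comparison:

```python
import numpy as np

z = np.ones([6, 6, 6])

# Axis 0: every third element (indices 0 and 3) -> length 2
# Axis 1: indices 1, 2, 3                        -> length 3
# Axis 2: everything                             -> length 6
zz = z[::3, 1:4, :]
print(zz.shape)  # (2, 3, 6)

# A single integer index removes that dimension entirely
print(z[0, :, :].shape)  # (6, 6)
```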
In addition to an array class, Numpy contains a number of vectorized functions, which means functions that can act on all the elements of an array, typically much faster than what could be achieved by looping over the array.
For example:
theta = np.linspace(0., 2.*np.pi, 10)
theta
np.sin(theta)
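To illustrate what "acting on all the elements" means, the sketch below computes the same sines once with an explicit Python loop over math.sin and once with the vectorized np.sin, and checks that the results agree. No timing is shown here; the speed difference depends on the machine and the array size:

```python
import math
import numpy as np

theta = np.linspace(0., 2. * np.pi, 10)

# Explicit Python loop, one element at a time
looped = np.array([math.sin(t) for t in theta])

# Vectorized call acting on the whole array at once
vectorized = np.sin(theta)

print(np.allclose(looped, vectorized))
```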
Another useful part of Numpy is the np.random sub-package, which can be used to generate random numbers:
# uniform distribution between 0 and 1
np.random.random(10)
# 10 values from a gaussian distribution with mean 3 and sigma 1
np.random.normal(3., 1., 10)
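If reproducible random numbers are needed, one option is a seeded generator. The sketch below assumes a Numpy version that provides np.random.default_rng (1.17 or later), which is not used in the text above; the seed value 42 is arbitrary:

```python
import numpy as np

# Seeded generator; the seed value 42 is arbitrary
rng = np.random.default_rng(42)

uniform = rng.random(10)         # 10 values in [0, 1)
gauss = rng.normal(3., 1., 10)   # mean 3, sigma 1

# Re-creating the generator with the same seed repeats the sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(10)))
```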
Another very useful function in Numpy is np.loadtxt(), which makes it easy to read in data from column-based text files. For example, consider the file autofahrt_2018.txt in the data directory.
We can either read it using a single multi-dimensional array:
data = np.loadtxt('data/autofahrt_2018.txt')
data
Or we can read the individual columns:
time, ax, ay, az = np.loadtxt('data/autofahrt_2018.txt', unpack=True)
time[:10]
ax[:10]
There are additional options to skip header rows, ignore comments, define delimiters, or read only certain columns. See the numpy.loadtxt documentation for more details.
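A minimal sketch of some of these options is given below. The file contents are hypothetical and are passed through io.StringIO so that the example runs without a file on disk; they do not reflect the layout of the course data files:

```python
import io
import numpy as np

# Hypothetical file contents, used only for illustration
text = io.StringIO(
    "time,ax,ay\n"
    "# sensor restarted here\n"
    "0.0,1.0,2.0\n"
    "0.1,1.5,2.5\n"
)

time, ay = np.loadtxt(
    text,
    delimiter=",",   # columns are separated by commas
    skiprows=1,      # skip the header line
    comments="#",    # ignore lines starting with #
    usecols=(0, 2),  # read only the first and third columns
    unpack=True,     # return one array per column
)

print(time)
print(ay)
```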
The index notation [...] is not limited to single-element indexing or multiple-element slicing; one can also pass a list or array of indices:
x = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])
x[[1,2,4,3,3,2]]
which returns a new array composed of the elements at indices 1, 2, 4, etc. of the original array.
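The sketch below shows the result of this index list, and also that the returned array is a copy, so modifying it does not change the original. The names idx and picked are illustrative:

```python
import numpy as np

x = np.array([1, 6, 4, 7, 9, 3, 1, 5, 6, 7, 3, 4, 4, 3])
idx = [1, 2, 4, 3, 3, 2]

picked = x[idx]
print(picked)  # [6 4 9 7 7 4]

# The result is a copy, so changing it leaves x untouched
picked[0] = -1
print(x[1])  # still 6
```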
Alternatively, one can also pass a boolean array of True/False values, called a mask, indicating which items to keep:
x[np.array([True, False, False, True, True, True, False, False, True, True, True, False, False, True])]
This doesn't look very useful because it is very verbose, but consider that carrying out a comparison with the array returns such a boolean array:
x > 3.4
It is therefore possible to extract subsets from an array using the following simple notation:
x = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])
x[x > 3.4]
Conditions can be combined as usual:
x[(x<3.4) | (x>5.5)]
Note that we have to use the bitwise operators here and not the Boolean operators (| vs. or and & vs. and), and that each condition has to be wrapped in parentheses.
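The sketch below shows what happens in both cases: the | operator combines the two conditions element by element, while the or keyword raises a ValueError because it needs a single True/False value. The variable raised is only there to record the outcome:

```python
import numpy as np

x = np.array([1, 6, 4, 7, 9, 3, 1, 5, 6, 7, 3, 4, 4, 3])

# Element-wise combination with | works and gives a mask
mask = (x < 3.4) | (x > 5.5)
print(x[mask])  # [1 6 7 9 3 1 6 7 3 3]

# The keyword "or" needs a single True/False value and fails on arrays
try:
    (x < 3.4) or (x > 5.5)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```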
Of course, the boolean mask can be derived from a different array than the one being indexed, as long as it has the same size:
x = np.linspace(-1., 1., 14)
y = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])
y[(x > -0.5) & (x < 0.4)]
Since the mask itself is an array, it can be stored in a variable and used as a mask for different arrays:
keep = (x > -0.5) & (x < 0.4)
x_new = x[keep]
y_new = y[keep]
x_new
y_new
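Because a stored mask is an ordinary boolean array, it can also be counted and inverted. The sketch below uses the sum method and the ~ operator, neither of which is introduced above, on the same x, y, and keep:

```python
import numpy as np

x = np.linspace(-1., 1., 14)
y = np.array([1, 6, 4, 7, 9, 3, 1, 5, 6, 7, 3, 4, 4, 3])

keep = (x > -0.5) & (x < 0.4)

# True counts as 1, so the sum is the number of selected elements
print(keep.sum())  # 6

# ~ inverts the mask and selects everything that was dropped
print(y[keep])   # [9 3 1 5 6 7]
print(y[~keep])  # [1 6 4 7 3 4 4 3]
```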
A mask can also appear on the left hand side of an assignment:
y = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])
y[y > 5] = 0.
y
The file data/munich_temperatures_average_with_bad_data.txt provides the temperature in Munich every day for several years.
Read the file using np.loadtxt. Note, however, that the file contains "bad" values, which you can identify by looking at the minimum and maximum temperatures. Use masking to get rid of the bad temperature values.
# your solution here