Lesson 10 - NumPy

The following topics are discussed in this notebook:

Creating NumPy arrays.
Array operations.
Boolean masking.

Additional Resources

DataCamp: Intro to Python for Data Science, Ch 4

Packages

A package is a pre-built set of functions and data types that can be loaded into a Python session to extend the language's functionality.

The following block of code imports the math package, which contains many useful mathematical functions and constants.

import math

The math package contains the following functions (along with many others):

sqrt() which is used to calculate the square root of a number.
factorial() which is used to calculate the factorial of an integer.

It also contains an object named pi, which stores the value of the mathematical constant π.

To access any of these items within the math package, we must precede its name with the package name and a dot, as in math.sqrt.

print(math.sqrt(20))
print(math.factorial(10))
print(math.pi)

4.47213595499958
3628800
3.141592653589793

When the name of a package is long, it can become tedious to type its entire name every time you wish to use a function from it. Fortunately, we are able to rename packages when we import them. The following code imports the math package under the name mt.

import math as mt

print(mt.sqrt(40))

6.324555320336759

NumPy

NumPy, which is short for "Numerical Python", is a package that provides additional functionality for performing numerical calculations on collections of values. It can greatly simplify certain list-related tasks that would otherwise require loops. In the next cell, we will import NumPy under the name np.

import numpy as np

At the core of NumPy is a new data type called an array. Arrays are similar to lists, and in many ways, arrays and lists behave the same. In the following cell, we create a list and an array, each containing the same elements.

myList = [4, 1, 7, 3, 5]
myArray = np.array([4, 1, 7, 3, 5])

In the next few cells, we show that lists and arrays can behave in very similar ways.

print(myList[3])
print(myArray[3])

3
3

print(myList[:3])
print(myArray[:3])

[4, 1, 7]
[4 1 7]

print(len(myList))
print(len(myArray))

5
5

print(type(myList))
print(type(myArray))

<class 'list'>
<class 'numpy.ndarray'>

Array Operations

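The main difference between arrays and lists is that certain types of operations can be performed more easily on arrays than on lists. Assume that we would like to print out a list/array in which each element of our previously defined list/array has been multiplied by 5. As the cells below show, multiplying an array by 5 multiplies each element by 5. Multiplying a list by 5 instead creates a new list containing five copies of the original, so scaling the elements of a list requires a loop.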

print(5 * myArray)

[20  5 35 15 25]

print(5 * myList)

[4, 1, 7, 3, 5, 4, 1, 7, 3, 5, 4, 1, 7, 3, 5, 4, 1, 7, 3, 5, 4, 1, 7, 3, 5]

temp = []
for i in range(0, len(myList)):
    temp.append(5 * myList[i])
print(temp)

[20, 5, 35, 15, 25]
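The loop above can also be written as a list comprehension, which makes the contrast with the array version easier to see. The following is a minimal sketch; the names scaledList and scaledArray are introduced here for illustration only.

```python
import numpy as np

myList = [4, 1, 7, 3, 5]
myArray = np.array(myList)

# With a list, each element has to be scaled explicitly.
scaledList = [5 * x for x in myList]

# With an array, the multiplication is applied to every element.
scaledArray = 5 * myArray

print(scaledList)
print(scaledArray)
```

Both approaches produce the same values; the array version simply leaves the looping to NumPy.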

We can perform other types of operations on NumPy arrays:

print(myArray ** 2)

[16  1 49  9 25]

print(myArray + 100)

[104 101 107 103 105]

NumPy also provides a meaningful way to multiply two arrays of the same length: the elements are multiplied in pairs, position by position.

array1 = np.array([2,1,4])
array2 = np.array([3,9,2])

print(array1 * array2)

[6 9 8]

array1 = np.array([2,1,4])
array2 = np.array([3,9,2, 7])

#print(array1 * array2) # This results in an error
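To see the error without stopping the notebook, the multiplication can be wrapped in a try/except block. This is a small sketch; the variable product is a placeholder introduced here, and the exact wording of the error message may vary between NumPy versions.

```python
import numpy as np

array1 = np.array([2, 1, 4])
array2 = np.array([3, 9, 2, 7])

try:
    product = array1 * array2
except ValueError as err:
    # NumPy cannot pair up 3 elements with 4 elements.
    product = None
    print("Multiplication failed:", err)
```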

Exercise

Two lists, sales and prices, are provided below. Each entry of sales provides the number of units of a different product sold by a store during a given week. The prices list provides the unit price of each of the products.

Without using NumPy, write some code that will print out a single number totalSales that is equal to the store's total revenue during the week.

sales = [24, 61, 17, 34, 41, 29, 32, 43]
prices = [10.50, 5.76, 13.49, 8.13, 7.79, 12.60, 9.51, 11.34]

totalSales = 0
for i in range(0, len(sales)):
    totalSales += sales[i] * prices[i]
    
print(totalSales)

2585.84

The cell below converts the lists sales and prices into arrays. Use NumPy to calculate totalSales. See if you can do it with only one new line of code.

sales = np.array(sales)
prices = np.array(prices)

totalSales = sum(sales * prices)
print(totalSales)

2585.84
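The solution above uses Python's built-in sum function. As an alternative sketch, NumPy's own np.sum and np.dot functions can compute the same total. The names totalWithSum and totalWithDot are illustrative and not part of the exercise.

```python
import numpy as np

sales = np.array([24, 61, 17, 34, 41, 29, 32, 43])
prices = np.array([10.50, 5.76, 13.49, 8.13, 7.79, 12.60, 9.51, 11.34])

# Multiply the pairs of elements, then add up the products.
totalWithSum = np.sum(sales * prices)

# The dot product performs the multiply-and-add in a single call.
totalWithDot = np.dot(sales, prices)

print(round(totalWithSum, 2))
print(round(totalWithDot, 2))
```

The results are rounded before printing because floating-point arithmetic can introduce tiny differences in the last digits.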

Boolean Masking

Boolean masking is a tool for creating subsets of NumPy arrays. We will explain this concept in steps.

In the cell below, we create two NumPy arrays. The array bList contains boolean values, while the other, myArray, contains numerical values.

We will pass bList to myArray as if it were an index, and will store the result in subArray. Can you explain what is happening here? How was subArray produced?

bList = np.array([True, True, False, True, False])
myArray = np.array([1,2,3,4,5])

subArray = myArray[bList]
print(subArray)

[1 2 4]
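One way to think about the result is that NumPy keeps each element of myArray whose matching entry in bList is True. The loop below is a rough sketch of that idea, not a description of how NumPy works internally; the names kept and manualSubArray are made up for this illustration.

```python
import numpy as np

bList = np.array([True, True, False, True, False])
myArray = np.array([1, 2, 3, 4, 5])

# Keep each value whose matching flag is True.
kept = []
for value, keep in zip(myArray, bList):
    if keep:
        kept.append(value)

manualSubArray = np.array(kept)
print(manualSubArray)
print(myArray[bList])
```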

Unlike lists, we can perform numerical comparisons with arrays. The comparison is carried out for each element of the array, and the result is an array of boolean values, containing the results of each comparison.

someArray = np.array([4, 7, 6, 3, 9, 8])
print(someArray < 5)

[ True False False  True False False]

print(someArray % 2 == 0)

[ True False  True False False  True]
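For contrast, the sketch below attempts the same kind of comparison on a plain list. In Python 3 this raises a TypeError, so the attempt is wrapped in a try/except block. The names someList, result, and mask are introduced here for illustration.

```python
import numpy as np

someList = [4, 7, 6, 3, 9, 8]

try:
    result = someList < 5
except TypeError as err:
    # A list cannot be compared with a number.
    result = None
    print("List comparison failed:", err)

# Converting to an array first makes the elementwise comparison possible.
mask = np.array(someList) < 5
print(mask)
```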

We can combine array comparisons with boolean indexing to create subsets of an array that contain only the elements satisfying certain conditions. This process is called boolean masking.

print(someArray[someArray % 2 == 0])

[4 6 8]

print(someArray[someArray > 5])

[7 6 9 8]
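The one-line masks above can also be split into two steps, which can make longer conditions easier to read. In this sketch, isEven and evens are illustrative names.

```python
import numpy as np

someArray = np.array([4, 7, 6, 3, 9, 8])

# Step 1: build the boolean array (the mask).
isEven = someArray % 2 == 0

# Step 2: use the mask as an index to select elements.
evens = someArray[isEven]

print(isEven)
print(evens)
```

Storing the mask in a variable also allows it to be reused, for example to select from another array of the same length.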

Exercise

A NumPy array called E is given below. Use boolean masking to create the following two variables.

negSum should contain the sum of the negative elements of E.
posSum should contain the sum of the positive elements of E.

Print both negSum and posSum.

E = np.array([-1.23, 3.13, 2.62, -2.56, 1.64, -1.43, -2.36, 2.41, 2.15, -1.26, 3.17])

negSum = sum(E[E < 0])
posSum = sum(E[E > 0])

print(negSum)
print(posSum)

-8.84
15.120000000000001

Using Boolean Masks to Count

Since Python treats True as being equal to 1 and False as being equal to 0, we can use the sum function along with boolean masking to count the number of elements in an array that satisfy a certain criterion.

cat = np.array(['A', 'C', 'A', 'B', 'B', 'C', 'A', 'A' ,'C', 'B', 'C', 'C', 'A', 'B', 'A', 'A'])

print(sum(cat == 'A'))
print(sum(cat == 'B'))
print(sum(cat == 'C'))

7
4
5

val = np.array([8,1,3,6,10,6,12,4,6,1,4,8,5,4,12,4])

print(sum(val > 5) )
print(sum(val < 5) )
print(sum(val % 2 == 0) )
print(sum(val % 2 != 0) )

8
7
12
4
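The counting trick relies on booleans behaving like the numbers 1 and 0 in arithmetic. The sketch below checks this directly, then compares the built-in sum with two NumPy functions that can be used for the same purpose, np.sum and np.count_nonzero. The variable names are placeholders chosen for this example.

```python
import numpy as np

val = np.array([8, 1, 3, 6, 10, 6, 12, 4, 6, 1, 4, 8, 5, 4, 12, 4])

# Booleans act like 1 and 0 when added together.
print(True + True + False)

mask = val > 5

countBuiltin = sum(mask)                # Python's built-in sum
countNumpy = np.sum(mask)               # NumPy's sum
countNonzero = np.count_nonzero(mask)   # counts the True entries

print(countBuiltin, countNumpy, countNonzero)
```

All three give the same count. The NumPy versions are usually the faster choice on large arrays.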

We can use the & and | operators to combine two boolean arrays into a single boolean array.

& performs the and operation on the elements of the two arrays, one pair at a time.
| performs the or operation on the elements of the two arrays, one pair at a time.

b1 = np.array([True, True, False, False])
b2 = np.array([True, False, True, False])

print(b1 & b2)
print(b1 | b2)

[ True False False False]
[ True  True  True False]
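Note that the Python keywords and and or do not work element by element on arrays. The sketch below shows that using and on two boolean arrays raises a ValueError, while & gives the expected result. The variable combined is introduced only for this illustration.

```python
import numpy as np

b1 = np.array([True, True, False, False])
b2 = np.array([True, False, True, False])

try:
    combined = b1 and b2
except ValueError as err:
    # Python cannot reduce a whole array to a single True or False.
    combined = None
    print("Using 'and' failed:", err)

# The & operator compares the arrays one pair of elements at a time.
print(b1 & b2)
```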

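We can use these operators to perform counts that depend on two (or more) conditions. Each condition should be wrapped in parentheses, because & and | are evaluated before comparison operators such as > and ==.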

Exercise

Use boolean masking to count the number of elements in val that are both even and greater than 5.

count = sum( (val > 5) & (val % 2 == 0) )
print(count)

8

Exercise

Use boolean masking to count the number of elements in val that are even, divisible by 3, and greater than 7.

count = sum( (val > 7) & (val % 2 == 0) & (val % 3 == 0))
print(count)

Exercise

Use boolean masking to count the number of elements in cat that are equal to A, and for which the associated element of val is greater than 5.

count = sum( (val > 5) & (cat == 'A') )
print(count)