Boolean Masking

Boolean masking is a tool for creating subsets of NumPy arrays, or in other words, for filtering arrays. To apply a Boolean mask, we supply an array of Boolean values, with the same shape as the array being filtered, inside the square brackets as if it were an index. The result contains only the elements of the indexed array whose positions correspond to True values in the Boolean array.

Let’s see an example.

import numpy as np

boolArray = np.array([True, True, False, True, False])
myArray = np.array([1,2,3,4,5])

# Keep only the elements of myArray whose mask entry is True
subArray = myArray[boolArray]
print(subArray)
[1 2 4]
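The same-size requirement described above can be checked directly. The following is a minimal sketch, assuming a recent NumPy release, in which a mask of the wrong length raises an IndexError; the names values, short_mask, and full_mask are illustrative and do not appear in the original example.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# A mask with only 3 entries cannot be matched against 5 elements.
short_mask = np.array([True, False, True])

try:
    values[short_mask]
    mismatch_raised = False
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)

# A mask with one entry per element works as expected.
full_mask = np.array([True, False, True, False, True])
print(values[full_mask])
```

The second print shows the elements at the positions where full_mask is True.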

Since array comparisons return Boolean arrays, we can place a comparison inside the square brackets to quickly filter an array based on some criterion.

cat = np.array(['A', 'C', 'A', 'B', 'B', 'C', 'A', 'A' ,
                'C', 'B', 'C', 'C', 'A', 'B', 'A', 'A'])

val = np.array([8, 1, 3, 6, 10, 6, 12, 4,
                6, 1, 4, 8,  5, 4, 12, 4])

The first line of the cell below selects and displays only the elements of val that are greater than 6. The second line selects and displays only the elements that are less than or equal to 6.

print(val[val > 6])
print(val[val <= 6])
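It can help to look at the intermediate Boolean array that a comparison produces before it is used as an index. The sketch below repeats the val array so that it runs on its own; the name mask is introduced here only for illustration.

```python
import numpy as np

val = np.array([8, 1, 3, 6, 10, 6, 12, 4,
                6, 1, 4, 8,  5, 4, 12, 4])

# The comparison is evaluated element by element and yields a Boolean array
# with the same shape as val.
mask = val > 6
print(mask.dtype, mask.shape)
print(mask)

# Indexing with the stored mask is equivalent to writing val[val > 6].
print(val[mask])
```

Storing the mask in a variable is convenient when the same filter needs to be applied to several arrays.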

We can perform Boolean masking using array comparisons involving the modulus operator to select the elements of an array that are odd or those that are even.

print(val[val % 2 == 0])
print(val[val % 2 != 0])
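Masks like these can also be combined. As a small sketch going slightly beyond the examples above, NumPy's element-wise operators & (and) and | (or) merge two Boolean arrays; each comparison needs its own parentheses because these operators bind more tightly than comparisons. The names even_and_large and odd_or_large are illustrative.

```python
import numpy as np

val = np.array([8, 1, 3, 6, 10, 6, 12, 4,
                6, 1, 4, 8,  5, 4, 12, 4])

# Elements that are even AND greater than 6.
even_and_large = val[(val % 2 == 0) & (val > 6)]
print(even_and_large)

# Elements that are odd OR greater than 10.
odd_or_large = val[(val % 2 != 0) | (val > 10)]
print(odd_or_large)
```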

When we are working with parallel arrays, we can use a comparison involving one array to select elements from another.

print(val[cat == 'A'])
print(val[cat == 'B'])
print(val[cat == 'C'])

In the cell below, we calculate the total value for objects in each of the three categories, A, B, and C.

print(np.sum(val[cat == 'A']))
print(np.sum(val[cat == 'B']))
print(np.sum(val[cat == 'C']))
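The three sums above follow the same pattern, so they could also be written as a loop over the distinct category labels. The following is one possible sketch, using np.unique to find the labels; the dictionary name totals is an assumption made for this example.

```python
import numpy as np

cat = np.array(['A', 'C', 'A', 'B', 'B', 'C', 'A', 'A',
                'C', 'B', 'C', 'C', 'A', 'B', 'A', 'A'])

val = np.array([8, 1, 3, 6, 10, 6, 12, 4,
                6, 1, 4, 8,  5, 4, 12, 4])

# np.unique returns the sorted distinct labels: 'A', 'B', 'C'.
totals = {}
for label in np.unique(cat):
    # Mask val with the positions where cat matches the current label.
    totals[str(label)] = int(np.sum(val[cat == label]))

print(totals)
```

Because every element belongs to exactly one category, the per-category totals add up to the sum of the whole val array.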

Fancy Indexing

Python lists and NumPy arrays both allow for basic indexing, as well as slicing. We have seen that we can also use Boolean arrays to select elements from an array. NumPy provides one more indexing tool that is not available for lists: fancy indexing. Fancy indexing refers to providing a list or array of integer indices to an array. This returns a new array containing the elements at those indices, in the order given by the indexing list or array.

my_array = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
print(my_array[[6, 3, 8]])
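A few further cases illustrate how the index list controls the result. In this sketch, which reuses my_array from the example above, the names picked, repeated, and from_array are illustrative only.

```python
import numpy as np

my_array = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])

# Elements come back in the order the indices are listed.
picked = my_array[[6, 3, 8]]
print(picked)

# An index may be repeated, and negative indices count from the end.
repeated = my_array[[0, 0, -1]]
print(repeated)

# The indices can be supplied as a NumPy integer array instead of a list.
idx = np.array([6, 3, 8])
from_array = my_array[idx]
print(from_array)
```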