Additional NumPy Topics

np.where()

The NumPy function np.where() is a useful tool for creating new arrays by applying logical rules to existing arrays. Assume that A, B, and C are arrays of the same length. The syntax for using np.where() is as follows:

D = np.where(condition involving elements of A, B, C)

In terms of the results, this code is equivalent to the following loop, written here as pseudocode:

D = []
for i in range(0, len(A)):
    if condition is True for A[i]:
        D.append(B[i])
    else:
        D.append(C[i])
D = np.array(D)

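Although the results are the same, the NumPy version will typically run much faster, especially on large arrays, because the element-wise selection is carried out in compiled code rather than in a Python loop.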
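The equivalence described above can be checked with a small runnable sketch. The arrays and the condition A > 0 below are hypothetical sample values chosen only for illustration; they do not come from the text.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
A = np.array([5, -2, 7, -1, 0])
B = np.array([10, 20, 30, 40, 50])
C = np.array([-10, -20, -30, -40, -50])

# Loop version: take B[i] where A[i] is positive, otherwise C[i].
D_loop = []
for i in range(len(A)):
    if A[i] > 0:
        D_loop.append(B[i])
    else:
        D_loop.append(C[i])
D_loop = np.array(D_loop)

# Vectorized version using np.where().
D_vec = np.where(A > 0, B, C)

print(D_loop)
print(D_vec)
print(np.array_equal(D_loop, D_vec))  # True if both versions agree
```

Both versions select 10, -20, 30, -40, -50 for this data; only the np.where() version avoids the explicit Python loop.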

Example of np.where()

In the cell below, we provide a simple example of np.where(). In this example, we define three arrays: cond_array, arrayA, and arrayB. Each element of cond_array is either the string 'A' or the string 'B'. The other two arrays contain integer values, with arrayA containing positive values and arrayB containing negative values. We will use np.where() to create an array that selects each element from either arrayA or arrayB, as indicated by the corresponding element of cond_array.

import numpy as np

cond_array = np.array(['A', 'B', 'B', 'A', 'B'])
arrayA = np.array([ 1,  2,  3,  4,  5])
arrayB = np.array([-1, -2, -3, -4, -5])

# Take the element from arrayA where cond_array is 'A', otherwise from arrayB.
result = np.where(cond_array == 'A', arrayA, arrayB)
print(result)

Example: Calculating Likelihood

As a second example of using np.where(), assume that we have created a statistical model that estimates the probability that students will pass a certain professional exam based on several factors, such as the amount of time they spent studying, whether or not they attended a workshop, and so on. Assume that the model is applied to five students. In the cell below, we have two arrays. The array prob_of_passing tells us the probability of each student passing the exam, as determined by the model. The array results tells us whether each student actually passed ('P') or failed ('F').

prob_of_passing = np.array([0.3, 0.8, 0.6, 0.9, 0.1])
results = np.array(['F', 'P', 'F', 'P', 'F'])

We want to score how well the model did by calculating its likelihood score. This is equal to the probability that the students got exactly the results observed, according to the model. Since the model only directly estimates the probability of passing and some students failed, we will also need to find the probability of failing.

prob_of_failing = 1 - prob_of_passing
print(prob_of_failing)

We will now use np.where() to create an array that contains the model’s estimate of the probability that each individual student got the outcome that was actually observed.

prob_observed = np.where(results == 'P', prob_of_passing, prob_of_failing)
print(prob_observed)

Finally, assuming the students’ outcomes are independent of one another, the model’s likelihood score is calculated by multiplying together all of the probabilities in prob_observed.

L = np.prod(prob_observed)
print(L)
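As a hand check on the calculation above, the observed outcomes F, P, F, P, F correspond to the probabilities 0.7, 0.8, 0.4, 0.9, 0.9, whose product is 0.18144. The sketch below repeats the computation with an explicit loop and compares it with the vectorized version. The names likelihood_loop and likelihood_vec are introduced here for illustration only.

```python
import numpy as np

prob_of_passing = np.array([0.3, 0.8, 0.6, 0.9, 0.1])
results = np.array(['F', 'P', 'F', 'P', 'F'])

# Loop version: multiply in p for a pass and (1 - p) for a fail.
likelihood_loop = 1.0
for p, outcome in zip(prob_of_passing, results):
    likelihood_loop *= p if outcome == 'P' else 1 - p

# Vectorized version, following the same steps as the cells above.
likelihood_vec = np.prod(
    np.where(results == 'P', prob_of_passing, 1 - prob_of_passing)
)

print(likelihood_loop)
print(np.isclose(likelihood_loop, likelihood_vec))  # True if both agree
```

The comparison uses np.isclose() rather than == because floating-point products can differ in their last few digits depending on the order of multiplication.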

Example: Assigning Grades

When using np.where(condition, B, C), we are allowed to use single values of a basic data type (such as int, float, bool, or str) instead of arrays for either or both of B and C. To see an example of this, assume that we have an array of exam scores for 10 students, and we want to create an array that contains the strings 'Pass' or 'Fail' to indicate whether each student passed or failed the exam. Assume that a score of 70 or higher is required for passing.

scores = np.array([73, 92, 56, 61, 43, 87, 99, 75, 12, 94])
results = np.where(scores >= 70, 'Pass', 'Fail')
print(results)
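A single value can also be combined with an array. The sketch below assumes a hypothetical rule, not part of the original example, in which any score below 50 is raised to 50 while all other scores are left unchanged. The name floored is introduced here for illustration only.

```python
import numpy as np

scores = np.array([73, 92, 56, 61, 43, 87, 99, 75, 12, 94])

# Hypothetical rule: replace any score below 50 with the single value 50;
# otherwise keep the original score from the array.
floored = np.where(scores < 50, 50, scores)
print(floored)
```

Here the second argument is a single integer and the third is an array, so only the scores 43 and 12 are replaced.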

A single np.where() call behaves like an if-else statement inside a loop. We can replicate the effect of an if-elif-else statement by applying np.where() several times in succession, with each call overwriting the grades of the students who fall below the next cutoff. In the example below, we assign a letter grade to each student based on their exam score.

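# Start by giving every student an 'A', then lower the grade at each cutoff.
grades = np.array(['A'] * len(scores))
grades = np.where(scores < 90, 'B', grades)  # below 90: at most a 'B'
grades = np.where(scores < 80, 'C', grades)  # below 80: at most a 'C'
grades = np.where(scores < 70, 'D', grades)  # below 70: at most a 'D'
grades = np.where(scores < 60, 'F', grades)  # below 60: 'F'
print(grades)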
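The same grading rule can also be written as a single expression. The sketch below shows two possible alternatives under the same cutoffs: literally nested calls to np.where(), and NumPy's np.select(), which takes a list of conditions and a matching list of choices and uses the first condition that is true for each element. The names grades_nested, grades_select, conditions, and letters are introduced here for illustration only.

```python
import numpy as np

scores = np.array([73, 92, 56, 61, 43, 87, 99, 75, 12, 94])

# Nested form: each inner np.where() plays the role of an elif/else branch.
grades_nested = np.where(scores >= 90, 'A',
                np.where(scores >= 80, 'B',
                np.where(scores >= 70, 'C',
                np.where(scores >= 60, 'D', 'F'))))

# np.select() form: the first true condition determines the grade,
# and `default` covers scores that satisfy none of the conditions.
conditions = [scores >= 90, scores >= 80, scores >= 70, scores >= 60]
letters = ['A', 'B', 'C', 'D']
grades_select = np.select(conditions, letters, default='F')

print(grades_nested)
print(grades_select)
```

Nesting mirrors the if-elif-else structure most directly, while np.select() tends to stay readable as the number of cutoffs grows.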