Arrays¶

The primary tool provided by NumPy is the n-dimensional array data type. An array is essentially a collection of values arranged in some structured manner.

1-Dimensional Arrays¶

We will begin our discussion of arrays with simple 1-dimensional arrays. In many ways, an array is similar to a standard Python list. In fact, a NumPy array can be created from a Python list by using the np.array() function.

We demonstrate this in the next cell.

import numpy as np

my_list = [4, 1, 7, 3, 5]
my_array = np.array([4, 1, 7, 3, 5])

Let’s check the data type of the two objects we have created.

print('The type for my_list is: ', type(my_list))
print('The type for my_array is:', type(my_array))

The type for my_list is:  <class 'list'>
The type for my_array is: <class 'numpy.ndarray'>

Notice that the data type for my_array is ndarray, which is short for “n-dimensional array”.

Let’s print both of these objects and compare the results.

print(my_list)
print(my_array)

[4, 1, 7, 3, 5]
[4 1 7 3 5]

We see that the output is essentially the same. The only (superficial) difference is that NumPy arrays are displayed without commas separating the elements.
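As a small sketch of the point made earlier, an array can also be built from an existing list object rather than from a list literal. The name array_from_list is hypothetical, and the ndim and shape attributes are standard ndarray attributes that this lesson has not introduced yet.

```python
import numpy as np

my_list = [4, 1, 7, 3, 5]

# Build the array from the existing list object.
array_from_list = np.array(my_list)

print(array_from_list)        # [4 1 7 3 5]
print(array_from_list.ndim)   # 1
print(array_from_list.shape)  # (5,)

# The array holds its own copy of the values, so changing
# the list afterwards does not change the array.
my_list[0] = 99
print(array_from_list[0])     # 4
```

The ndim value of 1 confirms that this is a 1-dimensional array.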

Array Indexing and Slicing¶

We can access an element of an array using an index in exactly the same way we would with a list.

print(my_list[2])
print(my_array[2])

7
7

Arrays also support slicing, just like lists.

print(my_list[:3])
print(my_array[:3])

[4, 1, 7]
[4 1 7]
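The following sketch suggests that the similarity extends to negative indices and to slices with a step. It reuses the values of my_list and my_array from above and is only an illustration, not a complete tour of array indexing.

```python
import numpy as np

my_list = [4, 1, 7, 3, 5]
my_array = np.array(my_list)

# Negative indices count from the end in both cases.
print(my_list[-1], my_array[-1])  # 5 5

# Slices with a start and stop behave alike.
print(my_list[1:4])    # [1, 7, 3]
print(my_array[1:4])   # [1 7 3]

# So do slices with a step.
print(my_list[::2])    # [4, 7, 5]
print(my_array[::2])   # [4 7 5]
```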

Most (but not all) Python functions that accept lists as inputs will also work on arrays. This is demonstrated below using the len() function.

print(len(my_list))
print(len(my_array))

5
5

We can also pass arrays to the sum() function (although we will see later that NumPy provides its own version of sum() that is optimized for arrays).

print(sum(my_list))
print(sum(my_array))

20
20
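As a brief preview of the NumPy version mentioned above, the sketch below compares the built-in sum() with np.sum() and the sum() array method. The variable names are hypothetical; all three calls should agree on the total.

```python
import numpy as np

my_array = np.array([4, 1, 7, 3, 5])

builtin_total = sum(my_array)    # Python's built-in sum()
numpy_total = np.sum(my_array)   # NumPy's function
method_total = my_array.sum()    # the equivalent array method

print(builtin_total, numpy_total, method_total)  # 20 20 20
```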

Array Operations¶

We have demonstrated many similarities between lists and arrays. Both are 1-dimensional (linear), ordered collections of values that can be accessed through the use of an index. However, there are some important differences between arrays and lists.

The most important difference between arrays and lists is that certain types of operations can be performed more easily on arrays than on lists. NumPy arrays are designed to easily support vectorized, or elementwise, operations.

As an example, consider the problem of multiplying each element of a list by 5 to create a new list. Perhaps the most obvious way to perform this operation would be to use a loop as shown in the next cell.

new_list = []
for item in my_list:
    new_list.append(5 * item)
print(new_list)

[20, 5, 35, 15, 25]

We could simplify this task to one line of code using a list comprehension as follows:

new_list = [5*x for x in my_list]

While the list comprehension solution is more concise, it suffers a bit in terms of readability. This task is quite simple to accomplish with arrays, however. We simply multiply my_array by 5.

new_array = 5 * my_array
print(new_array)

[20  5 35 15 25]

Recall that we can also multiply lists by integers. However, this operation replicates the list rather than performing the multiplication operation elementwise.

print(5 * my_list)

[4, 1, 7, 3, 5, 4, 1, 7, 3, 5, 4, 1, 7, 3, 5, 4, 1, 7, 3, 5, 4, 1, 7, 3, 5]

We can perform other types of arithmetic operations on NumPy arrays. In each case, the specified operation is applied to each individual element of the array.

print(my_array ** 2)

[16  1 49  9 25]

print(my_array + 100)

[104 101 107 103 105]
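Elementwise operations can also be chained. The sketch below, using the hypothetical names loop_result and vector_result, squares each element and then adds 100, once with a list comprehension and once with array arithmetic, and checks that the two approaches agree.

```python
import numpy as np

my_list = [4, 1, 7, 3, 5]
my_array = np.array(my_list)

# Loop-based version using a list comprehension.
loop_result = [x ** 2 + 100 for x in my_list]

# Vectorized version: each operation is applied elementwise.
vector_result = my_array ** 2 + 100

print(loop_result)    # [116, 101, 149, 109, 125]
print(vector_result)  # [116 101 149 109 125]

# np.array_equal() compares the two results element by element.
print(np.array_equal(vector_result, loop_result))  # True
```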

Operations Involving Two Arrays¶

NumPy also includes a meaningful way to add, subtract, multiply, and divide arrays, as long as they are of the same length.

array1 = np.array([1,4,3])
array2 = np.array([5,8,2])

print('Sum:        ', array1 + array2)
print('Difference: ', array1 - array2)
print('Product:    ', array1 * array2)
print('Ratio:      ', array1 / array2)

Sum:         [ 6 12  5]
Difference:  [-4 -4  1]
Product:     [ 5 32  6]
Ratio:       [0.2 0.5 1.5]
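For comparison, here is a sketch of what the same addition looks like with plain lists. The names list1, list2, list_sum, and array_sum are hypothetical. Note that the + operator concatenates lists, so pairing elements by position requires zip().

```python
import numpy as np

list1 = [1, 4, 3]
list2 = [5, 8, 2]

# With lists, + concatenates rather than adding elementwise.
print(list1 + list2)  # [1, 4, 3, 5, 8, 2]

# Pairing list elements by position requires zip().
list_sum = [a + b for a, b in zip(list1, list2)]
print(list_sum)       # [6, 12, 5]

# With arrays, + pairs elements by position automatically.
array_sum = np.array(list1) + np.array(list2)
print(array_sum)      # [ 6 12  5]
```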

If we attempt to perform an arithmetic operation on two arrays of different sizes, this will produce an error (except in certain, very specific cases that we will discuss later).

array1 = np.array([2, 1, 4])
array2 = np.array([3, 9, 2, 7])

print(array1 + array2) # This results in an error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-1609c84544c8> in <module>
      2 array2 = np.array([3, 9, 2, 7])
      3 
----> 4 print(array1 + array2) # This results in an error

ValueError: operands could not be broadcast together with shapes (3,) (4,) 
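One possible way to guard against this error is to compare the shape attributes of the two arrays, or to catch the ValueError. The sketch below shows both. It is only an illustration, and the variable total is hypothetical.

```python
import numpy as np

array1 = np.array([2, 1, 4])
array2 = np.array([3, 9, 2, 7])

# The shape attribute reports the length of each 1-dimensional array.
print(array1.shape, array2.shape)  # (3,) (4,)

# Option 1: check the shapes before operating.
if array1.shape == array2.shape:
    print(array1 + array2)
else:
    print('Shapes differ; skipping the addition.')

# Option 2: attempt the operation and handle the error.
try:
    total = array1 + array2
except ValueError as err:
    total = None
    print('ValueError:', err)
```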

Data Types of Array Elements¶

Arrays can contain elements of any data type, but unlike lists, each element within an array must be of the same data type.

Note

NumPy does provide a data type called a structured array that can contain a mix of data types, but we will not discuss those in this lesson.

If we try to change the value of an array element to a value of the wrong data type, the new value will be coerced to the array's data type, assuming that such a coercion is possible.

In the cell below, we attempt to place a floating-point value into an integer array. Notice that the float is coerced into an integer by discarding its fractional part, so 7.9 becomes 7 rather than being rounded to 8.

int_array = np.array([8, 4, 5, 2, 4, 6, 3])
print(int_array)

int_array[2] = 7.9
print(int_array)

[8 4 5 2 4 6 3]
[8 4 7 2 4 6 3]
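The sketch below looks a little closer at this coercion. It suggests that the fractional part is dropped for negative values as well, and that rounding has to be requested explicitly, here with Python's built-in round(). The specific values are chosen only for illustration.

```python
import numpy as np

int_array = np.array([8, 4, 5, 2, 4, 6, 3])

# The fractional part is discarded; the value is not rounded.
int_array[0] = 7.9
int_array[1] = -7.9
print(int_array[:2])  # [ 7 -7]

# To round instead, do so explicitly before assigning.
int_array[2] = round(7.9)
print(int_array[2])   # 8

# The array still has an integer data type.
print(np.issubdtype(int_array.dtype, np.integer))  # True
```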

If a type coercion is not possible, then an error will be produced. For example, an attempt to change one of the values of int_array to a string that cannot be interpreted as an integer, such as 'seven', would result in a ValueError.
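A minimal sketch of such a failed coercion is shown below. The string 'seven' is an arbitrary example of a value that cannot be converted to an integer. The error is caught so that the rest of the code can continue to run.

```python
import numpy as np

int_array = np.array([8, 4, 5, 2, 4, 6, 3])

try:
    # 'seven' cannot be converted to an integer.
    int_array[2] = 'seven'
except ValueError as err:
    print('ValueError:', err)

# The array is left unchanged by the failed assignment.
print(int_array)  # [8 4 5 2 4 6 3]
```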

Occasionally, there is a need to convert all elements of an array to a different data type. This can be accomplished using the astype() method. This is demonstrated in the cell below, where we use astype() to create an array of floats from int_array.

float_array = int_array.astype('float')
print(float_array)

float_array[2] = 7.9
print(float_array)

[8. 4. 5. 2. 4. 6. 3.]
[8.  4.  7.9 2.  4.  6.  3. ]
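Note that astype() returns a new array and leaves the original untouched. The sketch below illustrates this and also converts back to integers, which again discards the fractional part. The name back_to_int is hypothetical.

```python
import numpy as np

int_array = np.array([8, 4, 5, 2, 4, 6, 3])
float_array = int_array.astype('float')

# Changing the new array does not affect the original.
float_array[2] = 7.9
print(int_array[2])    # 5
print(float_array[2])  # 7.9

# Converting back to integers drops the fractional part.
back_to_int = float_array.astype('int')
print(back_to_int)     # [8 4 7 2 4 6 3]
```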

Functions for Creating Special Arrays¶

We can use the functions np.zeros(), np.ones(), np.arange(), and np.linspace() to create arrays with specific structures. We demonstrate the use of these functions below.

The function np.zeros() creates an array consisting of only zeros.

array0 = np.zeros(10)
print(array0)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

The function np.ones() creates an array consisting of only ones.

array1 = np.ones(10)
print(array1)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
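Both functions return arrays of floats by default, as the decimal points in the output suggest. The sketch below uses the optional dtype parameter to request integers instead, and scales an array of ones to fill an array with another constant. The names int_zeros and sevens are hypothetical.

```python
import numpy as np

# Request integer zeros instead of the default floats.
int_zeros = np.zeros(5, dtype=int)
print(int_zeros)  # [0 0 0 0 0]

# Multiply an array of ones to get an array of a constant.
sevens = 7 * np.ones(4)
print(sevens)     # [7. 7. 7. 7.]
```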

The function np.arange() creates a sequence of evenly spaced elements. We specify where the sequence should start, where it should stop, and the difference between consecutive elements. Notice that the value provided to the stop parameter is not included within the returned array. This behavior is similar to that seen with the range() function.

array2 = np.arange(start=2, stop=4, step=0.25)
print(array2)

[2.   2.25 2.5  2.75 3.   3.25 3.5  3.75]
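To make the comparison with range() concrete, the sketch below calls both functions with the same integer arguments. It also shows one difference: range() rejects a non-integer step, whereas np.arange() accepts one, as in the example above.

```python
import numpy as np

# With integer arguments, the two produce the same values.
print(list(range(2, 10, 2)))  # [2, 4, 6, 8]
print(np.arange(2, 10, 2))    # [2 4 6 8]

# With a single argument, both start at 0.
print(np.arange(5))           # [0 1 2 3 4]

# range() does not accept a float step.
try:
    range(2, 4, 0.25)
    range_accepts_float = True
except TypeError:
    range_accepts_float = False
print(range_accepts_float)    # False
```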

Like np.arange(), the function np.linspace() also creates a sequence of evenly spaced elements. Instead of specifying the step size, we provide np.linspace() with the number of elements to be created. Unlike np.arange(), the stop value for np.linspace() is actually included in the returned array.

array3 = np.linspace(start=2, stop=4, num=11)
print(array3)

[2.  2.2 2.4 2.6 2.8 3.  3.2 3.4 3.6 3.8 4. ]
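Because both endpoints are included, the spacing between consecutive elements is (stop - start) / (num - 1). The sketch below computes this by hand and compares it with the step reported by np.linspace() when the optional retstep parameter is set to True. The names step and returned_step are hypothetical.

```python
import numpy as np

start, stop, num = 2, 4, 11

# Eleven points create ten equal gaps between 2 and 4.
step = (stop - start) / (num - 1)
print(step)  # 0.2

# With retstep=True, linspace also returns the spacing it used.
array3, returned_step = np.linspace(start, stop, num, retstep=True)
print(np.isclose(step, returned_step))  # True

# The stop value is the last element of the array.
print(array3[-1] == stop)  # True
```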
