Lesson 05 - Strings

The following topics are discussed in this notebook:

  • An introduction to the string data type.
  • Escape sequences.
  • Operations on strings.
  • String functions.

Introduction to Strings

A string object is a piece of text, or in other words, a sequence of characters. When defining a string, we must put the characters that compose it inside either single or double quotes. These quotes are what allow Python to distinguish between a string and a command.

In the cell below, we define a string variable with the value "Hello world!", and we then print its value.

In [1]:
my_string = "Hello world!"
print(my_string)
Hello world!

The official name of the Python string type is str. We confirm this by calling type() on my_string.

In [2]:
type(my_string)
Out[2]:
str

Empty Strings

It is possible to define a string that contains no characters. Such a string is referred to as an empty string. We can define an empty string by placing two single or double quotes next to each other without any characters between them. We see an example of this in the next cell.

In [3]:
empty_string = ""
print(empty_string)

We will see a practical use of empty strings later in this lesson.

Single Quotes versus Double Quotes

As mentioned above, we can also use single quotes when defining a string. The following definition of my_string is equivalent to the previous definition.

In [4]:
my_string = 'Hello world!'
print(my_string)
Hello world!

One benefit of being able to use either single quotes or double quotes when creating strings is that it allows us a convenient way to create strings that themselves contain quotes as characters. For instance, assume that we want to create a variable containing the following string:

He yelled, "I have had enough!" before storming out of the room.

The following attempt at creating this string will give us an error.

In [5]:
sentence = "He yelled, "I have had enough!" before storming out of the room."
  File "<ipython-input-5-4965ff137a3f>", line 1
    sentence = "He yelled, "I have had enough!" before storming out of the room."
                            ^
SyntaxError: invalid syntax

In the example above, Python got confused by the quotation marks in the middle of the string. When it encountered the second quotation mark, it believed that this was the end of the string, although this was intended to be part of the string.

There are a few ways to fix this. The simplest is to use single quotes to define the string. When Python encounters the first single quote, it knows that a string is being defined. It won't stop reading characters into the string until it hits another single quote. Any double quotes that it encounters along the way will be treated as inert characters.

In [6]:
sentence = 'He yelled, "I have had enough!" before storming out of the room.'
print(sentence)
He yelled, "I have had enough!" before storming out of the room.
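
The same idea works in the other direction. As a small sketch (the variable name contraction is made up for this illustration), a string that contains an apostrophe but no double quotes can be defined using double quotes:

```python
# Double quotes on the outside let us use an apostrophe inside the string.
contraction = "I've had enough!"
print(contraction)
```

Here the apostrophe is treated as an ordinary character, because Python is looking for a closing double quote.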

Escape Sequences

An escape sequence is a sequence of characters, beginning with a backslash, that Python gives a special meaning when it is encountered in a string. Several common escape sequences are listed below.

Escape Sequence    Result
\'                 Inserts a single quote.
\"                 Inserts a double quote.
\n                 Inserts a newline.
\t                 Inserts a tab.
\\                 Inserts a backslash.

For an example use case for escape sequences, assume that we want to define a variable containing the following string of characters:

He yelled, "I've had enough!" before storming out of the room.

As before, the presence of double quotes within the string prevents us from using double quotes to define the string. Furthermore, since the character for the apostrophe is the same as the character for a single quote, we can no longer use single quotes to define our string either. One solution is to use an escape sequence for the apostrophe so that Python knows to interpret it as text.

In [7]:
sentence = 'He yelled, "I\'ve had enough!" before storming out of the room.'
print(sentence)
He yelled, "I've had enough!" before storming out of the room.

Had we wished, we could have escaped the double quotes within the string as well as the apostrophe. In this case, we could have used either single or double quotes to define our string.
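
A minimal sketch of that alternative is shown here. The names sentence_single and sentence_double are chosen only for this illustration. Both kinds of quotes are escaped, so the same text can be wrapped in either delimiter.

```python
# Escape both kinds of quotes, so either delimiter works.
sentence_single = 'He yelled, \"I\'ve had enough!\" before storming out of the room.'
sentence_double = "He yelled, \"I\'ve had enough!\" before storming out of the room."

# Both definitions produce exactly the same string.
print(sentence_single == sentence_double)
print(sentence_double)
```

The first print() displays True, and the second displays the sentence with its quotes and apostrophe intact.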

We can use the \n escape sequence to insert a newline inside of a string.

In [8]:
tale2cities = "It was the best of times.\nIt was the worst of times."
print(tale2cities)
It was the best of times.
It was the worst of times.

We can use \t to insert tabs in our string. This can be used for indenting lines, or for aligning output.

In [9]:
print("Regular.")
print("\tIndented.")
print("\t\tDouble indented.")
Regular.
	Indented.
		Double indented.

The tab escape sequence can be used to align portions of multi-line output. The following example shows how we might use tabs to align columns in the printout of an employee database.

In [10]:
print('ID\tEmployee Name\tSalary')
print('-------------------------------')
print('107\tJane Doe\t$54,000')
print('139\tJohn Smith\t$48,300')
print('162\tPat Jones\t$52,500')
ID	Employee Name	Salary
-------------------------------
107	Jane Doe	$54,000
139	John Smith	$48,300
162	Pat Jones	$52,500
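
The \\ escape sequence listed in the table above is needed whenever a string should contain an actual backslash. The following sketch uses a made-up Windows-style folder path (the path and the variable name folder are assumptions for illustration only). Without the doubled backslashes, the \n and \t in this path would be read as a newline and a tab.

```python
# Each \\ in the source code becomes a single backslash in the string.
# The folder names here are hypothetical.
folder = "C:\\notes\\table"
print(folder)
```

The printed result is C:\notes\table, with one backslash before each folder name.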

Operations Involving Strings

When appearing between numbers, the symbols +, -, *, /, and ** perform the relevant arithmetic operations. However, some of these symbols can also be used to combine instances of other data types. We will see examples of this as we introduce new data types. The only one of these symbols that can be used between two strings is the + symbol.

When + is used between two strings, it combines, or concatenates, the strings. The string that appears on the left side of + will come first, and the string on the right side will be appended to the end.

In [11]:
a = 'star'
b = 'wars'
c = a + b
print(c)
starwars

We can use + to combine several strings at once. It is not necessary for all of the string values to be stored in variables. We see this in the next example, which places a space between the words "star" and "wars".

In [12]:
d = a + ' ' + b
print(d)
star wars
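
As a quick check of the two points above, the following sketch shows that the order of the operands matters and that + never adds a space on its own.

```python
a = 'star'
b = 'wars'

# The left operand always comes first in the result.
print(a + b)
print(b + a)

# No space is added automatically between the pieces.
print(a + b == 'star wars')
```

The first two lines of output are starwars and warsstar, and the comparison prints False.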

Operations Involving Strings and Numbers

If we try to combine a string and a number with +, we will get an error.

In [13]:
print("one" + 2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-e8bf6f236ab3> in <module>
----> 1 print("one" + 2)

TypeError: can only concatenate str (not "int") to str

Note that numbers enclosed in quotes are also considered strings. Python does not recognize them as numbers.

In [14]:
print("1" + 2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-2c8b8ec3d2f2> in <module>
----> 1 print("1" + 2)

TypeError: can only concatenate str (not "int") to str

Although we are not able to "add" strings to numbers, we are able to "multiply" a string by an integer. The result will be a string consisting of the original string repeated the specified number of times.

In [15]:
print("blah " * 5)
blah blah blah blah blah 

Since the product of a string and an integer produces another string, expressions of this type can be concatenated together.

In [16]:
print("la " * 4 + "doo " * 3)
la la la la doo doo doo 
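
One practical use of string multiplication is building separator lines without typing every character by hand. The sketch below is only an illustration; the names width and separator are made up for this example.

```python
# Repeat a single character to build a separator line of a chosen length.
# The variable names are made up for this illustration.
width = 31
separator = '-' * width
print(separator)

# The integer can also appear on the left side of the * symbol.
print(3 * 'ab')
```

The first print() displays a row of 31 dashes, and the second displays ababab.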

Type Coercion with Strings

We will now explore under what situations we are able to convert between str objects and int or float objects.

We can convert a str object to an int or a float if the value contained within the string makes sense as the new data type.

In [17]:
a_str = '61'
a_int = int(a_str)
a_float = float(a_str)
print(a_int)
print(a_float)
61
61.0

In [18]:
b_str = '7.93'
b_float = float(b_str)
print(b_float)
7.93

Since the value of b_str is not interpretable as an integer, we will get an error if we attempt to coerce it to an integer.

In [19]:
b_int = int(b_str)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-c64df22698bc> in <module>
----> 1 b_int = int(b_str)

ValueError: invalid literal for int() with base 10: '7.93'

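If we are very insistent about coercing b_str to an integer, we can first coerce it into a float, and then into an integer. Note that the conversion from float to integer discards the fractional part rather than rounding.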

In [20]:
b_int = int(float(b_str))
print(b_int)
7
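
The result above is 7 rather than 8 because int() drops everything after the decimal point. If rounding to the nearest integer is what we want, one option is the built-in round() function, which is not otherwise covered in this lesson. A minimal sketch:

```python
b_str = '7.93'

# int() drops the fractional part instead of rounding.
print(int(float(b_str)))

# round() gives the nearest integer, if that is what we want.
print(round(float(b_str)))
```

The two lines of output are 7 and 8.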

We can always convert an int or a float object to a str using the str() function.

In [21]:
x_float = 4.5
x_str = str(x_float)
print(x_str)
4.5

In [22]:
y_int = 8675409
y_str = str(y_int)
print(y_str)
8675409

Converting numerical values to strings can be very useful if we want to output a message that contains a mixture of predetermined text and numeric values that are stored in variables. Converting the numeric portions of the message to strings allows them to be concatenated with the rest of the text.

Consider the following example.

In [23]:
z = 3.56
z2 = z**2

print('The square of ' + str(z) + ' is ' + str(z2) + '.')
The square of 3.56 is 12.6736.

We could have obtained the same result as above without using coercion by passing multiple arguments to the print() function and setting the sep parameter equal to the empty string, as shown below.

In [24]:
print('The square of ', z, ' is ', z2, '.', sep='')
The square of 3.56 is 12.6736.
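
To see why sep matters, it helps to compare against the default behavior of print(), which places a single space between its arguments. The sketch below reuses the values of z and z2 from the earlier cell.

```python
z = 3.56
z2 = z**2

# With the default separator, print() puts one space between arguments.
print('The square of', z, 'is', z2)

# With sep='', the arguments are printed with nothing between them,
# so any spaces we want must be written inside the strings themselves.
print('The square of ', z, ' is ', z2, '.', sep='')
```

The second form lets us place the final period directly after the number, which the default separator would not allow.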

The len() Function

Python provides several built-in functions for working with strings. The first such function we will discuss is len(). The len() function allows you to determine the length of a string.

In [25]:
x = "There are 39 characters in this string."
print(len(x))
39

As you might expect, an empty string has a length of zero.

In [26]:
print(len(empty_string))
0
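
When a string contains escape sequences, len() counts the characters that end up in the string, not the symbols we typed to create them. A short sketch:

```python
# An escape sequence is typed as two symbols but counts as one character.
print(len("a\nb"))
print(len("\\"))

# An apostrophe is an ordinary character and is counted like any other.
print(len("I've"))
```

The three lines of output are 3, 1, and 4.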

Methods

The majority of the functions we will encounter when working with strings are methods. The difference between a method and other types of functions we will encounter is subtle, and will be discussed in greater detail later in the course. For now, we simply note the following points regarding methods:

  1. A method is a function that belongs to a specific object (such as an int, float, or str).
  2. To use a method on an object, you write the name of the object, followed by a dot, followed by the name of the method, followed by a set of parentheses.

In the following example, we consider three string methods:

  • upper() converts the string to uppercase.
  • lower() converts the string to lowercase.
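  • title() capitalizes the first letter of each word in the string. It treats any letter that follows a non-letter character, such as an apostrophe, as the start of a new word.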

Note that none of these methods actually change the contents of the string. They instead provide a new string as their output.

In [27]:
myString = "There's a method in the madness."
print(myString.upper())
print(myString.lower())
print(myString.title())
THERE'S A METHOD IN THE MADNESS.
there's a method in the madness.
There'S A Method In The Madness.
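
The following sketch illustrates the point that these methods leave the original string untouched. The variable name shout is made up for this example.

```python
myString = "There's a method in the madness."

# Calling upper() returns a new string; myString itself is unchanged.
myString.upper()
print(myString)

# To keep the result, store it in a variable.
shout = myString.upper()
print(shout)
```

The first print() still shows the original mixed-case sentence. Only shout contains the uppercase version.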

Some methods accept inputs (also called arguments). One example is the count() method. This method searches the string to see how many times the supplied input (also a string) appears within the original string. This is demonstrated below.

In [28]:
print( myString.count("m") )
print( myString.count("e") )
2
5
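
The argument passed to count() can be longer than one character, and the search is case-sensitive. As a small sketch, we can combine count() with lower() to count matches regardless of case:

```python
myString = "There's a method in the madness."

# count() is case-sensitive: the "The" in "There's" is not counted.
print(myString.count("the"))

# Converting to lowercase first makes the count ignore case.
print(myString.lower().count("the"))
```

The output is 1 for the first count and 2 for the second.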

The replace() method accepts two arguments. This method scans the source string and replaces all occurrences of the first argument with the second argument. Again, it does not actually change the contents of the original string. It instead returns a new string as output.

In [29]:
a = "a "
b = "no "
print( myString.replace(a, b) )
There's no method in the madness.
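
Two further behaviors of replace() are worth seeing in a short sketch: every match is replaced, not only the first, and passing an empty string as the second argument removes the matched text entirely.

```python
myString = "There's a method in the madness."

# Every occurrence is replaced, not just the first one.
print(myString.replace("e", "E"))

# Replacing with an empty string removes the matching characters.
print(myString.replace(" ", ""))
```

The output is ThErE's a mEthod in thE madnEss. followed by There'samethodinthemadness.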

f-Strings

Beginning with version 3.6, Python has included a powerful tool for creating and formatting strings known as f-strings. We can define an f-string by simply placing an f character in front of the string, immediately before the initial quote.

In [30]:
boring_fstring = f'This is an example of an f-string.'
print(boring_fstring)
This is an example of an f-string.

The example above illustrates the basic syntax for creating an f-string, but it is not particularly useful. We could have obtained the same result using an ordinary string. The benefits of f-strings arise in situations in which we would like to define a string that incorporates values stored in variables. If we place a set of curly braces containing a variable name within an f-string, then the value of that variable will be inserted into the string at that location. This is illustrated in the next example.

In [31]:
first = 'Robbie'
last = 'Beane'

name_message = f'Hello. My first name is {first} and my last name is {last}.'
print(name_message)
Hello. My first name is Robbie and my last name is Beane.

In the example above, the values we inserted into the f-string were both strings themselves. However, we can place variables containing numerical values inside of the braces as well. When doing so, it is NOT necessary to coerce the variable into a string.

We are not restricted to using names of variables inside of the braces in an f-string. We can, in fact, place any expression that we would like inside of the braces.

These concepts are illustrated in the next example.

In [35]:
z = 3.56
print(f'The square of {z} is {z**2}.')
The square of 3.56 is 12.6736.
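
Since any expression is allowed, the braces can also contain function calls and method calls. The following sketch reuses the variable first from the earlier example.

```python
first = 'Robbie'

# Any expression can go inside the braces, including function and method calls.
print(f'{first} has {len(first)} letters.')
print(f'In uppercase: {first.upper()}')
print(f'Twice 21 is {2 * 21}.')
```

The three lines of output are Robbie has 6 letters., In uppercase: ROBBIE, and Twice 21 is 42.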

Aligning Text with f-Strings

Occasionally, we would like to insert a value into a string with additional spaces padding the value on the left or the right so that the printed value and the additional spaces together constitute a specific number of characters. We can accomplish this using f-strings by following the expression within the braces with a colon :, one of the symbols < or >, and then an integer value. The integer indicates the number of characters that should be set aside for the inserted value, and the selected arrow symbol controls whether the value should be left-justified (<) or right-justified (>).

We illustrate this in the cell below.

In [33]:
my_text = 'text'

print(f'--{my_text:>6}--')
print(f'--{my_text:<6}--')
--  text--
--text  --
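
Two details are easy to overlook here. The width is a minimum rather than a limit, so a value longer than the requested width is inserted in full with no padding. The same syntax also works for numbers. The sketch below illustrates both; the variable name long_text is made up for this example.

```python
long_text = 'longer text'

# The value has 11 characters, which is more than the 6 requested.
# No characters are cut off; the padding is simply omitted.
print(f'--{long_text:>6}--')

# Numbers can be padded in the same way as strings.
print(f'--{42:>6}--')
print(f'--{42:<6}--')
```

The first line of output is --longer text--. The other two lines show 42 padded with four spaces on the left and on the right, respectively.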

This feature of f-strings is particularly useful when we would like to display output in a tabular format, consisting of multiple rows and columns. We can use f-strings to make sure that each entry in a column takes the same amount of space, and is aligned on either the left or the right. This is illustrated below.

In [34]:
num1 = 1
fname1 = 'George'
lname1 = 'Washington'

num2 = 2
fname2 = 'John'
lname2 = 'Adams'

num3 = 16
fname3 = 'Abraham'
lname3 = 'Lincoln'


print('First Name     Last Name      Number')
print('--------------------------------------')
print(f'{fname1:<15}{lname1:<15}{num1:>6}')
print(f'{fname2:<15}{lname2:<15}{num2:>6}')
print(f'{fname3:<15}{lname3:<15}{num3:>6}')
First Name     Last Name      Number
--------------------------------------
George         Washington          1
John           Adams               2
Abraham        Lincoln            16
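
One possible refinement, sketched here rather than taken from the original notebook, is to build the header row and the separator with the same tools used for the data rows. That way the column widths are written in only one style, and the separator is exactly as wide as the table. The names header and separator are chosen only for this illustration.

```python
# Build the header with the same widths used for the data rows.
header = f'{"First Name":<15}{"Last Name":<15}{"Number":>6}'

# The total width is 15 + 15 + 6 = 36 characters.
separator = '-' * 36

print(header)
print(separator)
print(f'{"George":<15}{"Washington":<15}{1:>6}')
```

Because string literals are themselves expressions, they can be placed inside the braces and padded just like variables.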