Strings¶
A string value represents a piece of text, or in other words, a sequence of characters. Strings can be used for a variety of applications, including storing a message to be provided to an end-user or storing text-based data, such as a user’s name. In this chapter we will discuss how to create strings, how to extract information from a string value, and how to perform operations on strings.
Creating a String Value¶
We form a string value in Python by using either a pair of single quotation marks or a pair of double quotation marks to surround the characters that compose the string.
In the cell below, we define a string variable with the value 'Hello world!', and we then print the result.
my_string = 'Hello world!'
print(my_string)
Hello world!
We will now check the type of our variable my_string.
print(type(my_string))
<class 'str'>
The len() Function¶
Python provides several built-in functions for working with strings. The first such function we will discuss is len(). The len() function accepts a string value as an input and returns the number of characters within that string.
x = 'There are 39 characters in this string.'
print(len(x))
39
Single Quotes versus Double Quotes¶
As mentioned above, we can use either single quotes or double quotes when defining a string. Both techniques are illustrated in the cell below.
string1 = 'This is a valid string.'
string2 = "This is also a proper string."
When Python first encounters a quotation mark within an expression, it interprets the subsequent characters as being part of a string. It will continue to do so until it encounters another quotation mark of the same type used to start the string.
One benefit of being able to use either single quotes or double quotes when creating strings is that it allows us a convenient way to create strings that themselves contain quotation marks or apostrophes as characters. For instance, assume that we want to create a variable containing the following string:
He yelled, "I have had enough!" before storming out of the room.
Since this string contains double quotes within it, we will run into difficulties if we attempt to define it using double quotes. This is demonstrated below.
sentence = "He yelled, "I have had enough!" before storming out of the room."
File "<ipython-input-5-4965ff137a3f>", line 1
sentence = "He yelled, "I have had enough!" before storming out of the room."
^
SyntaxError: invalid syntax
The cell above resulted in a syntax error. Python got confused by the quotation marks in the middle of the string. When it encountered the second quotation mark, it treated that character as the end of the string, although it was intended to be part of the string.
There are a few ways to fix this. The simplest is to use single quotes to define the string. When Python encounters the first single quote, it knows that a string is being defined. It won’t stop reading characters into the string until it hits another single quote. Any double quotes that it encounters along the way will be treated as inert characters.
sentence = 'He yelled, "I have had enough!" before storming out of the room.'
print(sentence)
He yelled, "I have had enough!" before storming out of the room.
Escape Sequences¶
An escape sequence is a sequence of characters to which Python applies special meaning when they are encountered in a string. Several common escape sequences are listed below.
Escape Sequence | Result |
---|---|
\' | Inserts a single quote. |
\" | Inserts a double quote. |
\n | Inserts a newline. |
\t | Inserts a tab. |
\\ | Inserts a backslash. |
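As a small illustration of the escape sequences listed above, the sketch below uses a few of them and checks their lengths. The variable names quote_demo and path_demo are made up for this example.

```python
# Each escape sequence is stored as a single character in the string.
quote_demo = 'It\'s here.'   # \' inserts a single quote
path_demo = 'C:\\temp'       # \\ inserts one backslash

print(quote_demo)
print(path_demo)
print(len('\n'))   # a newline counts as one character
print(len('\\'))   # so does an escaped backslash
```

Note that writing 'C:\temp' with a single backslash would not give the same string, since \t would be read as a tab.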
Quotation Marks
For an example use case for escape sequences, assume that we want to define a variable containing the following string of characters:
He yelled, "I've had enough!" before storming out of the room.
As discussed previously, the presence of double quotes within the string prevents us from easily using double quotes to define the string. Furthermore, since the character for the apostrophe is the same as the character for a single quote, we are also unable to define our string by simply surrounding the text above with single quotes. One solution is to use an escape sequence for the apostrophe so that Python knows to interpret it as text.
sentence = 'He yelled, "I\'ve had enough!" before storming out of the room.'
print(sentence)
He yelled, "I've had enough!" before storming out of the room.
We could have escaped the double quotes within the string as well as the apostrophe. In this case, we could have used either single or double quotes to define our string.
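The short sketch below illustrates that last point: once both kinds of quotes are escaped, the same text can be surrounded by either type of quotation mark. The names sentence_single and sentence_double are chosen only for this example.

```python
# With both quote characters escaped, either delimiter works.
sentence_single = 'He yelled, \"I\'ve had enough!\" before storming out of the room.'
sentence_double = "He yelled, \"I\'ve had enough!\" before storming out of the room."

print(sentence_single)
print(sentence_single == sentence_double)  # the two definitions produce equal strings
```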
New Lines
Next, we will demonstrate how to use the \n escape sequence to insert a newline into a string.
tale2cities = "It was the best of times.\nIt was the worst of times."
print(tale2cities)
It was the best of times.
It was the worst of times.
Notice that we did not include a space after the \n escape sequence in the cell above. Had we done so, Python would have printed the space at the beginning of the second line, and the two lines of text would not have been aligned on the left.
Tabs
We can use \t to insert tabs in our string. This can be used for indenting lines, or for aligning output.
print("Regular.")
print("\tIndented.")
print("\t\tDouble indented.")
Regular.
        Indented.
                Double indented.
The tab escape sequence can be used to align portions of multi-line output. The following example shows how we might use tabs to align columns in the printout of an employee database.
print('ID\tEmployee Name\tSalary')
print('-------------------------------')
print('107\tJane Doe\t$54,000')
print('139\tJohn Smith\t$48,300')
print('162\tPat Jones\t$52,500')
ID      Employee Name   Salary
-------------------------------
107     Jane Doe        $54,000
139     John Smith      $48,300
162     Pat Jones       $52,500
Empty Strings¶
It is possible to define a string that contains no characters. Such a string is referred to as an empty string. We can define an empty string by placing two single or double quotes next to each other without any characters between them. We see an example of this in the next cell.
empty_string = ""
print(empty_string)
We will see a practical use of empty strings later in this lesson.
As you might expect, an empty string has a length of zero.
print(len(empty_string))
String Operations¶
When appearing between numbers, the symbols +, -, *, /, and ** perform the relevant arithmetic operations. However, some of these symbols can also be used to combine instances of other data types. We will see examples of this as we introduce new data types. The only one of these symbols that can be used between two strings is the + symbol.
When + is used between two strings, it combines, or concatenates, the strings. The string that appears on the left side of + will come first, and the string on the right side will be appended to the end.
a = 'star'
b = 'wars'
c = a + b
print(c)
We can use + to combine several strings at once. It is not necessary for all of the string values to be stored in variables. We see this in the next example, which places a space between the words “star” and “wars”.
d = a + ' ' + b
print(d)
Operations Involving Strings and Numbers¶
If we try to combine a string and a number with +, we will get an error.
print("one" + 2)
Note that numbers enclosed in quotes are also considered strings. Python does not recognize them as numbers.
print("1" + 2)
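The sketch below shows one way to see the error without stopping the program, and two ways to resolve it depending on what was intended. It uses a try/except block, which has not been introduced in this lesson, and the names as_text and as_number are made up for illustration.

```python
# Mixing a str and an int with + raises a TypeError.
try:
    result = "1" + 2
except TypeError as err:
    print("TypeError:", err)

# Converting one operand makes the intent explicit.
as_text = "1" + str(2)      # concatenation of two strings
as_number = int("1") + 2    # addition of two integers
print(as_text)
print(as_number)
```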
Although we are not able to “add” strings to numbers, we are able to “multiply” a string by an integer. The result is a new string consisting of the original string repeated the specified number of times.
print("blah " * 5)
Since the product of a string and an integer produces another string, expressions of this type can be concatenated together.
print("la " * 4 + "doo " * 3)
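One practical use of string repetition is building a divider line, similar to the row of dashes in the employee table shown earlier, without typing every dash by hand. The sketch below assumes a width of 31 characters; the name divider is chosen only for this example.

```python
# Build a row of dashes by repeating a one-character string.
divider = '-' * 31
print('ID\tEmployee Name\tSalary')
print(divider)
print(len(divider))

# The integer may appear on either side of the * symbol.
print(3 * 'ab')
```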
Type Coercion with Strings¶
We will now explore the situations in which we are able to convert between str objects and int or float objects.
We can convert a str object to an int or a float if the value contained within the string makes sense as the new data type.
a_str = '61'
a_int = int(a_str)
a_float = float(a_str)
print(a_int)
print(a_float)
b_str = '7.93'
b_float = float(b_str)
print(b_float)
Since the value of b_str is not interpretable as an integer, we will get an error if we attempt to coerce it to an integer.
b_int = int(b_str)
If we are very insistent about coercing b_str to an integer, we can first coerce it into a float, and then coerce the float into an integer.
b_int = int(float(b_str))
print(b_int)
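The sketch below revisits this conversion. It assumes we want to see the error message without halting the program, so it uses a try/except block (not yet covered in this lesson). It also compares int(), which drops the fractional part of a float, with the built-in round() function, which rounds to the nearest integer. The names truncated and rounded are made up for illustration.

```python
b_str = '7.93'

# int() cannot parse a string that contains a decimal point.
try:
    int(b_str)
except ValueError as err:
    print("ValueError:", err)

# int() applied to a float discards the fractional part.
truncated = int(float(b_str))
# round() with a single argument rounds to the nearest integer.
rounded = round(float(b_str))
print(truncated)
print(rounded)
```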
We can always convert an int or a float object to a str using the str() function.
x_float = 4.5
x_str = str(x_float)
print(x_str)
y_int = 8675409
y_str = str(y_int)
print(y_str)
Converting numerical values to strings can be very useful if we want to output a message that contains a mixture of predetermined text and numeric values that are stored in variables. Converting the numeric portions of the message to strings allows them to be concatenated with the rest of the text.
Consider the following example.
z = 3.56
z2 = z**2
print('The square of ' + str(z) + ' is ' + str(z2) + '.')
We could have obtained the same result as above without using coercion by passing multiple arguments to the print() function and setting the sep parameter equal to the empty string, as shown below.
print('The square of ', z, ' is ', z2, '.', sep='')
Methods¶
The majority of the functions we will encounter when working with strings are methods. The difference between a method and other types of functions we will encounter is subtle, and will be discussed in greater detail later in the course. For now, we simply note the following points regarding methods:
A method is a function that belongs to a specific object (such as an int, float, or str).
To use a method on an object, you write the name of the object, followed by a dot, followed by the name of the method, followed by a set of parentheses.
In the following example, we consider three string methods:
upper() converts the string to uppercase.
lower() converts the string to lowercase.
title() capitalizes the first letter of each word in the string.
Note that none of these methods actually change the contents of the string. They instead provide a new string as their output.
myString = "There's a method in the madness."
print(myString.upper())
print(myString.lower())
print(myString.title())
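The sketch below checks the claim that these methods return new strings and leave the original untouched. It uses a separate variable, phrase, introduced only for this example. It also shows a quirk of title(): the method treats an apostrophe as a word boundary, so the letter after it is capitalized as well.

```python
phrase = "There's a method in the madness."

# Store the result of upper() in a new variable.
shouted = phrase.upper()
print(shouted)
print(phrase)          # the original string is unchanged

# title() capitalizes the letter following the apostrophe too.
print(phrase.title())
```

To keep the result of a method, assign it to a variable, as done with shouted above.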
Some methods accept inputs (also called arguments). One example is the count() method. This method searches the string to see how many times the supplied input (also a string) appears within the original string. This is demonstrated below.
print( myString.count("m") )
print( myString.count("e") )
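Two details of count() are worth a quick check: the search is case-sensitive, and the argument may be longer than one character, in which case non-overlapping occurrences are counted. The sketch below uses a variable named phrase, chosen only for this example.

```python
phrase = "There's a method in the madness."

# Uppercase and lowercase letters are counted separately.
lower_t = phrase.count("t")
upper_t = phrase.count("T")
print(lower_t, upper_t)

# The argument can be a longer substring.
the_count = phrase.count("the")
print(the_count)

# Occurrences are counted without overlap.
overlap_count = "aaaa".count("aa")
print(overlap_count)
```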
The replace() method accepts two arguments. This method scans the source string and replaces all occurrences of the first argument with the second argument. Again, it does not actually change the contents of the original string. It instead returns a new string as output.
a = "a "
b = "no "
print( myString.replace(a, b) )
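The following sketch confirms that replace() leaves the source string unchanged. It also shows the optional third argument of replace(), which limits how many occurrences are replaced. The names phrase, swapped, and first_only are made up for illustration.

```python
phrase = "There's a method in the madness."

# replace() returns a new string; phrase itself is not modified.
swapped = phrase.replace("a ", "no ")
print(swapped)
print(phrase)

# An optional third argument caps the number of replacements.
first_only = "la la la".replace("la", "do", 1)
print(first_only)
```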
f-Strings¶
Beginning with Version 3.6, Python has come equipped with a powerful tool for creating and formatting strings known as f-strings. We can define an f-string by simply placing an f character in front of the string, immediately before the initial quote.
boring_fstring = f'This is an example of an f-string.'
print(boring_fstring)
The example above illustrates the basic syntax for creating an f-string, but it is not particularly useful. We could have obtained the same result using an ordinary string. The benefits of f-strings arise in situations in which we would like to define a string that incorporates values stored in variables. If we place a pair of braces {} containing a variable name within an f-string, then the value of that variable will be inserted into the string at that location. This is illustrated in the next example.
first = 'Robbie'
last = 'Beane'
name_message = f'Hello. My first name is {first} and my last name is {last}.'
print(name_message)
In the example above, the values we inserted into the f-string were both strings themselves. However, we can place variables containing numerical values inside of the braces as well. When doing so, it is NOT necessary to coerce the variable into a string.
We are not restricted to using names of variables inside of the braces in an f-string. We can, in fact, place any expression that we would like inside of the braces.
These concepts are illustrated in the next example.
z = 3.56
print(f'The square of {z} is {z**2}.')
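As a further sketch, an expression inside the braces can be followed by a colon and a format specification. The specification .2f, used below, displays a number with exactly two digits after the decimal point. The variables items and price are hypothetical values chosen for this example.

```python
z = 3.56
squared_text = f'The square of {z} is {z**2:.2f}.'
print(squared_text)

# Any expression can appear inside the braces, including arithmetic.
items = 3
price = 4.5
total_text = f'{items} items cost ${items * price:.2f} in total.'
print(total_text)
```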
Aligning Text with f-Strings¶
Occasionally, we would like to insert a value into a string with additional spaces padding the value on the left or the right so that the printed value and the additional spaces together constitute a specific number of characters. We can accomplish this using f-strings by following the expression within the braces with a colon :, one of the symbols < or >, and then an integer value. The integer indicates the number of characters that should be set aside for the inserted value, and the selected arrow symbol controls whether the value should be left-justified (<) or right-justified (>).
We illustrate this in the cell below.
my_text = 'text'
print(f'--{my_text:>6}--')
print(f'--{my_text:<6}--')
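To make the padding visible, the sketch below stores each formatted string in a variable and prints it. It also includes the ^ symbol, which is not discussed above and centers the value within the reserved width. The names right, left, and centered are chosen only for this example.

```python
my_text = 'text'

right = f'--{my_text:>6}--'      # two spaces of padding on the left
left = f'--{my_text:<6}--'       # two spaces of padding on the right
centered = f'--{my_text:^6}--'   # one space of padding on each side

print(right)
print(left)
print(centered)
```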
This feature of f-strings is particularly useful when we would like to display output in a tabular format, consisting of multiple rows and columns. We can use f-strings to make sure that each entry in a column takes the same amount of space, and is aligned on either the left or the right. This is illustrated below.
num1 = 1
fname1 = 'George'
lname1 = 'Washington'
num2 = 2
fname2 = 'John'
lname2 = 'Adams'
num3 = 16
fname3 = 'Abraham'
lname3 = 'Lincoln'
print('First Name     Last Name      Number')
print('--------------------------------------')
print(f'{fname1:<15}{lname1:<15}{num1:>6}')
print(f'{fname2:<15}{lname2:<15}{num2:>6}')
print(f'{fname3:<15}{lname3:<15}{num3:>6}')
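One possible refinement, sketched below, is to build the header row with the same width specifications as the data rows, so that the column titles stay aligned if the widths are changed later. String literals placed inside the braces use double quotes so that they do not clash with the single quotes surrounding the f-string. The names header and row are made up for illustration.

```python
# Use the same widths (15, 15, 6) for the header as for the data rows.
header = f'{"First Name":<15}{"Last Name":<15}{"Number":>6}'
row = f'{"George":<15}{"Washington":<15}{1:>6}'

print(header)
print('-' * len(header))   # divider matches the width of the header
print(row)
```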
String Indexing and Slicing¶
substrings