String methods¶
Most Python string methods are built into the str type, so all str objects
automatically have them:
>>> welcome = "hello pythonistas!\n"
>>> welcome.isupper()
False
>>> welcome.isalpha()
False
>>> welcome[0:5].isalpha()
True
>>> welcome.capitalize()
'Hello pythonistas!\n'
>>> welcome.title()
'Hello Pythonistas!\n'
>>> welcome.strip()
'hello pythonistas!'
>>> welcome.split(" ")
['hello', 'pythonistas!\n']
>>> chunks = [snippet.strip() for snippet in welcome.split(" ")]
>>> chunks
['hello', 'pythonistas!']
>>> " ".join(chunks)
'hello pythonistas!'
>>> welcome.replace("\n", "")
'hello pythonistas!'
Below you will find an overview of the most common string methods:
Method | Description |
---|---|
str.count() | returns the number of non-overlapping occurrences of the substring. |
str.endswith() | returns True if the string ends with the given suffix. |
str.startswith() | returns True if the string starts with the given prefix. |
str.join() | uses the string as a delimiter for concatenating a sequence of other strings. |
str.index() | returns the position of the first character of the substring if it was found in the string; triggers a ValueError if it was not found. |
str.find() | returns the position of the first character of the first occurrence of the substring in the string; like str.index(), but returns -1 if nothing was found. |
str.rfind() | returns the position of the first character of the last occurrence of the substring in the string; returns -1 if nothing was found. |
str.replace() | replaces occurrences of a string with another string. |
str.strip(), str.lstrip(), str.rstrip() | strip whitespace, including line breaks. |
str.split() | splits a string into a list of substrings using the passed separator. |
str.lower() | converts alphabetic characters to lower case. |
str.upper() | converts alphabetic characters to upper case. |
str.casefold() | converts characters to lower case and converts all region-specific variable character combinations to a common comparable form. |
str.ljust(), str.rjust() | left-align or right-align the string; fill the opposite side with spaces (or another fill character) to obtain a string with a minimum width. |
str.removeprefix(), str.removesuffix() | Since Python 3.9, these can be used to extract the suffix or file name. |
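A few of the methods from the overview are not demonstrated elsewhere in this section. The following is only a small sketch with made-up sample strings; it shows str.count(), str.casefold(), str.ljust() and str.rjust():

```python
# str.count() counts non-overlapping occurrences only.
overlapping = "aaaa".count("aa")

# str.casefold() is more aggressive than str.lower(): the German "ß"
# becomes "ss", so both spellings compare as equal.
same_word = "Straße".casefold() == "STRASSE".casefold()

# str.ljust() and str.rjust() pad to a minimum width; the fill character
# is optional and defaults to a space.
left = "name".ljust(8, ".")
right = "42".rjust(8)

print(overlapping, same_word, repr(left), repr(right))
```

Strings that are already at least as wide as the requested width are returned unchanged by str.ljust() and str.rjust().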
str.split and str.join¶
While str.split() returns a list of strings, str.join() takes a list of strings
and joins them into a single string. By default, str.split() uses whitespace as
the delimiter, but you can change this behaviour with an optional parameter.
Warning
Concatenating strings with + is convenient but inefficient when it comes to
joining a large number of strings into a single string, as a new string object
is created each time + is applied. Chaining three strings with two + operators,
for example, creates an intermediate string that is discarded as soon as the
final result has been built.
If you join strings with str.join(), you can insert any characters between the
strings:
>>> " :: ".join(["License", "OSI Approved"])
'License :: OSI Approved'
You can also use an empty string, "", for example for the CamelCase notation of
Python classes:
>>> "".join(["My", "Class"])
'MyClass'
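To make the earlier warning about + more concrete, here is a minimal sketch that builds the same string in both ways. The list words is invented for illustration; with only five items the difference is not measurable, but the loop creates a new string on every pass, whereas str.join() builds the result once:

```python
# Hypothetical input: any list of string fragments.
words = ["spam"] * 5

# Repeated + creates a new intermediate string on every iteration.
slow = ""
for word in words:
    slow = slow + word + " "
slow = slow.rstrip()

# str.join() receives all fragments at once and builds one result string.
fast = " ".join(words)

print(fast)
```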
str.split() is mostly used to split strings at whitespace. However, you can also
split a string at a specific other string by passing an optional parameter:
>>> example = "1. You can have\n\twhitespaces, newlines\n and tabs mixed in\n\tthe string."
>>> example.split()
['1.', 'You', 'can', 'have', 'whitespaces,', 'newlines', 'and', 'tabs', 'mixed', 'in', 'the', 'string.']
>>> license = "License :: OSI Approved"
>>> license.split(" :: ")
['License', 'OSI Approved']
Sometimes it is useful to allow the last field in a string to contain arbitrary text. You can do this by specifying an optional second parameter for how many splits should be performed:
>>> example.split(" ", 1)
['1.', 'You can have\n\twhitespaces, newlines\n and tabs mixed in\n\tthe string.']
If you want to use str.split() with the optional second argument, you must also
specify a first argument. To split at any run of whitespace, as in the call
without arguments, use None as the first argument:
>>> example.split(None, 8)
['1.', 'You', 'can', 'have', 'whitespaces,', 'newlines', 'and', 'tabs', 'mixed in\n\tthe string.']
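Limiting the number of splits is handy for records whose last field is free text. The following sketch uses an invented log line and an invented path; str.rsplit() is the counterpart that counts the splits from the right:

```python
# Hypothetical log line: only the first two spaces separate fields.
line = "2024-05-01 ERROR Disk /dev/sda1 is almost full"

# At most two splits are performed, so the message keeps its spaces.
date, level, message = line.split(" ", 2)

# str.rsplit() performs the splits starting from the right-hand end.
path = "docs/types/strings/methods.rst"
directory, filename = path.rsplit("/", 1)

print(date, level, message, directory, filename, sep=" | ")
```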
Tip
I use str.split() and str.join() extensively, mostly for text files generated by
other programmes. For writing CSV or JSON files, however, I usually use the
associated Python libraries.
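One reason for preferring the libraries can be sketched as follows. The row is made-up sample data in which one field contains the delimiter itself; a plain str.join() loses the field boundary, whereas the csv module from the standard library quotes the field:

```python
import csv
import io

# Hypothetical row: the second field contains a comma.
row = ["cusy", "Berlin, Germany", "2024"]

# A naive join yields a line that reads back as four fields.
naive = ",".join(row)

# csv.writer quotes the field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
safe = buffer.getvalue()

# Reading the quoted line back restores the original three fields.
restored = next(csv.reader(io.StringIO(safe)))

print(naive.split(","), restored)
```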
Remove whitespace¶
str.strip() returns a new string that differs from the original string only in
that all whitespace at the beginning and end of the string has been removed.
str.lstrip() and str.rstrip() work similarly, but only remove the whitespace at
the left or right end of the original string:
>>> example = " whitespaces, newlines \n\tand tabs. \n"
>>> example.strip()
'whitespaces, newlines \n\tand tabs.'
>>> example.lstrip()
'whitespaces, newlines \n\tand tabs. \n'
>>> example.rstrip()
' whitespaces, newlines \n\tand tabs.'
In this example, the newline characters \n are regarded as whitespace. You can
find out which ASCII characters Python considers to be whitespace by accessing
the constant string.whitespace. In Python 3, it has the following value:
>>> import string
>>> string.whitespace
' \t\n\r\x0b\x0c'
The characters specified in hexadecimal format (\x0b, \x0c) represent the
vertical tab and form feed characters.
Tip
Do not change the value of this constant to influence the behaviour of
str.strip() etc. Instead, you can pass a string as an additional parameter to
determine which characters these methods remove:
>>> url = "https://www.cusy.io/"
>>> url.strip("htps:/w.")
'cusy.io'
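Note that the argument is treated as a set of characters, not as a prefix or suffix. The sketch below uses a hypothetical host name under example.com to show how this can remove more than intended, and how str.removeprefix() and str.removesuffix() (Python 3.9 or later) remove exact substrings instead:

```python
# Hypothetical URL used only for illustration.
url = "https://shop.example.com/"

# Every leading or trailing character contained in the set is removed,
# so the "sh" of "shop" disappears as well.
stripped = url.strip("htps:/w.")

# removeprefix()/removesuffix() remove the exact substring, at most once.
host = url.removeprefix("https://").removesuffix("/")

print(stripped, host)
```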
Search in strings¶
str objects offer several methods for simple searches within strings. The four
basic methods are str.find(), str.rfind(), str.index() and str.rindex(). A
related method, str.count(), counts how many times a substring occurs in
another string.
str.find() requires a single parameter: the substring being searched for. The
position of the first occurrence is then returned, or -1 if there is no
occurrence:
>>> hipy = "Hello Pythonistas!\n"
>>> hipy.find("\n")
18
str.find() can also accept one or two additional parameters:
start
The index at which the search begins; all characters before this position are ignored.
end
The index at which the search stops; the character at this position and all characters after it are ignored.
In contrast to find(), rfind() starts the search at the end of the string and
therefore returns the position of the last occurrence. index() and rindex()
differ from find() and rfind() in that a ValueError exception is triggered
instead of the return value -1.
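The differences between the four search methods, including the optional start and end parameters, can be sketched as follows. The variable names are chosen only for this illustration:

```python
text = "Hello Pythonistas!\n"

first_o = text.find("o")          # first "o", in "Hello"
second_o = text.find("o", 5)      # search begins at index 5
not_found = text.find("o", 5, 10) # indexes 5 to 9 contain no "o"
last_o = text.rfind("o")          # search starts from the right

# index() behaves like find(), but raises ValueError instead of returning -1.
try:
    text.index("x")
    raised = False
except ValueError:
    raised = True

print(first_o, second_o, not_found, last_o, raised)
```

Because -1 is also a valid index for the last character, checking the result of find() before using it for slicing avoids subtle bugs; index() makes the failure explicit.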
You can use two other string methods to search for strings: str.startswith()
and str.endswith(). These methods return True or False, depending on whether
the string to which they are applied starts or ends with one of the strings
specified as parameters:
>>> hipy.endswith("\n")
True
>>> hipy.endswith(("\n", "\r"))
True
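Passing a tuple is convenient for filtering. The following sketch selects documentation files from an invented list of file names:

```python
# Hypothetical file names used only for illustration.
filenames = ["index.rst", "conf.py", "strings.rst", "README.md", "notes.txt"]

# A tuple matches if the string ends with any one of its items.
docs = [name for name in filenames if name.endswith((".rst", ".md"))]

# startswith() accepts a tuple in the same way.
hidden_or_private = [n for n in [".git", "_build", "src"] if n.startswith((".", "_"))]

print(docs, hidden_or_private)
```

Note that the argument must be a str or a tuple of str; passing a list raises a TypeError.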
There are also several methods that can be used to check the property of a character string:
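Method | [!#$%…] | [a-zA-Z] | [¼½¾] | [¹²³] | [0-9] |
---|---|---|---|---|---|
str.isprintable() | ✅ | ✅ | ✅ | ✅ | ✅ |
str.isalnum() | ❌ | ✅ | ✅ | ✅ | ✅ |
str.isnumeric() | ❌ | ❌ | ✅ | ✅ | ✅ |
str.isdigit() | ❌ | ❌ | ❌ | ✅ | ✅ |
str.isdecimal() | ❌ | ❌ | ❌ | ❌ | ✅ |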
str.isspace() checks whether a string consists only of whitespace.
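As a small sketch of such property checks, the following snippet applies five of the str predicates to a handful of sample characters. The samples are chosen only for illustration; each tuple lists the results of isdecimal(), isdigit(), isnumeric(), isalnum() and isprintable() in that order:

```python
samples = ["5", "²", "½", "a", "!"]

# One tuple of results per sample character.
checks = {
    s: (s.isdecimal(), s.isdigit(), s.isnumeric(), s.isalnum(), s.isprintable())
    for s in samples
}

for char, result in checks.items():
    print(repr(char), result)

# isspace() is True only for non-empty strings made up of whitespace.
only_whitespace = " \t\n".isspace()
empty = "".isspace()
```

Each check in this order is less strict than the one before it, which is why the number of True values grows from "!" to "5".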
Changing strings¶
str objects are immutable, but they have several methods that return a modified version of the original string.
str.replace() can be used to replace occurrences of the first parameter with the second, for example:
>>> hipy.replace("\n", "\n\r")
'Hello Pythonistas!\n\r'
str.maketrans() and str.translate() can be used together to translate
characters in strings into other characters, for example:
1>>> hipy = "Hello Pythonistas!\n"
2>>> trans_map = hipy.maketrans(" ", "-", "!\n")
3>>> hipy.translate(trans_map)
4'Hello-Pythonistas'
- Line 2
str.maketrans() is used to create a translation table from the two string arguments. The two arguments must each contain the same number of characters. Characters that are to be removed are passed as the third argument.
- Line 3
The table generated by str.maketrans() is passed to str.translate().
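str.maketrans() also accepts a single dictionary instead of two strings. In that form, a character can be mapped to a replacement of any length, and mapping it to None deletes it. The mapping and the sample text below are invented for this sketch:

```python
# Hypothetical mapping: German umlauts are transliterated to ASCII,
# and the exclamation mark is deleted (mapped to None).
umlauts = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "!": None}
)

result = "Grüße aus Köln!".translate(umlauts)

print(result)
```

Compared with chained str.replace() calls, the translation table is applied in a single pass over the string.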
Checks¶
- How can you change a heading such as variables and expressions so that it contains hyphens instead of spaces and can therefore be better used as a file name?
- If you want to check whether a line begins with .. note::, which method would you use? Are there any other options?
- Suppose you have a string with exclamation marks, quotation marks and line breaks. How can these be removed from the string?
- How can you change all spaces and punctuation marks from a string to a hyphen (-)?