Strings and Character Data in Python

Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Strings and Character Data in Python

In Python, string objects contain sequences of characters that allow you to manipulate textual data. It’s rare to find an application, program, or library that doesn’t need to manipulate strings to some extent. So, processing characters and strings is integral to programming and a fundamental skill for you as a Python programmer.

In this tutorial, you’ll learn how to:

Create strings using literals and the str() function
Use operators and built-in functions with strings
Index and slice strings
Do string interpolation and formatting
Use string methods

To get the most out of this tutorial, you should have a good understanding of core Python concepts, including variables, functions, and operators and expressions.

Take the Quiz: Test your knowledge with our interactive “Python Strings and Character Data” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

Python Strings and Character Data

This quiz will evaluate your understanding of Python's string data type and your knowledge about manipulating textual data with string objects. You'll cover the basics of creating strings using literals and the `str()` function, applying string methods, using operators and built-in functions with strings, indexing and slicing strings, and more!

Getting to Know Strings and Characters in Python

Python provides the built-in string (str) data type to handle textual data. Other programming languages, such as Java, have a character data type for single characters. Python doesn’t have that. Single characters are strings of length one.

In practice, strings are immutable sequences of characters. This means you can’t change a string once you define it. Any operation that modifies a string will create a new string instead of modifying the original one.

A string is also a sequence, which means that the characters in a string have a consecutive order. This feature allows you to access characters using integer indices that start with 0. You’ll learn more about these concepts in the section about indexing strings. For now, you’ll learn about how to create strings in Python.

Creating Strings in Python

There are different ways to create strings in Python. The most common practice is to use string literals. Because strings are everywhere and have many use cases, you’ll find a few different types of string literals. There are standard literals, raw literals, and formatted literals.

Additionally, you can use the built-in str() function to create new strings from other existing objects.

In the following sections, you’ll learn about the multiple ways to create strings in Python and when to use each of them.

Standard String Literals

A standard string literal is just a piece of text or a sequence of characters that you enclose in quotes. To create single-line strings, you can use single ('') and double ("") quotes:

In the first example, you use single quotes to delimit the string literal. In the second example, you use double quotes.

You can define empty strings using quotes without placing characters between them:

An empty string doesn’t contain any characters, so when you use the built-in len() function with an empty string as an argument, you get 0 as a result.

To create multiline strings, you can use triple-quoted strings. In this case, you can use either single or double quotes:

The primary use case for triple-quoted strings is to create multiline strings. You can also use them to define single-line strings, but this is a less common practice.

Probably the most common use case for triple-quoted strings is when you need to provide docstrings for your packages, modules, functions, classes, and methods.

If you want to include a quote character within the string, then you can delimit that string with another type of quote. For example, if a string contains a single quote character, then you can delimit it with double quotes, and vice versa:

In the first example, your string includes a single quote as part of the text. To delimit the literal, you use double quotes. In the second example, you do the opposite.

Escape Sequences in String Literals

Sometimes, you want Python to interpret a character or sequence of characters within a string differently. This may occur in one of two ways. You may want to:

Apply special meaning to characters
Suppress special character meaning

To achieve these goals, you can use a backslash (\) character combined with other characters. The combination of a backslash and a specific character is called an escape sequence. That’s because the backslash causes the subsequent character to escape its usual meaning.

For example, you can escape a single quote character using the \' escape sequence in a string delimited by single quotes:

In this example, the backslash escapes the single quote character by suppressing its usual meaning, which is delimiting string literals. Now, Python knows that your intention isn’t to terminate the string but to embed the single quote.

The following table shows sequences that escape the default meaning of characters in string literals:

Character	Usual Interpretation	Escape Sequence	Escaped Interpretation
`'`	Delimits a string literal	`\'`	Literal single quote (`'`) character
`"`	Delimits a string literal	`\"`	Literal double quote (`"`) character
`<newline>`	Terminates the input line	`\<newline>`	Newline is ignored
`\`	Introduces an escape sequence	`\\`	Literal backslash (`\`) character

You already have an idea of how the first two escape sequences work. Now, how does the newline escape sequence work? Usually, a newline character terminates a physical line of input. To add a newline character, you just press Enter in the middle of a string. This will raise a SyntaxError exception:

When you press Enter after typing "Hello, you get a SyntaxError. If you need to break up a string over more than one line, then you can include a backslash before each newline or before pressing the Enter key:

When you add a backslash before you press Enter, Python ignores the newline and interprets the whole construct as a single physical line.

Sometimes, you need to include a literal backslash character in a string. If that backslash doesn’t precede a character with a special meaning, then you can insert it right away:

In this example, the character after the backslash doesn’t match any known escape sequence, so Python inserts the actual backslash for you. Note how the resulting string automatically doubles the backslash. Even though this example works, the best practice is to always double the backslash when you need this character in a string.

However, you may have the need to include a backslash right before a character that makes up an escape sequence:

Because the sequence \" matches a known escape sequence, your string fails with a SyntaxError. To avoid this issue, you can double the backslash:

In this update, you double the backslash to escape the character and prevent Python from raising an error.

Up to this point, you’ve learned how to suppress the meaning of a given character by escaping it. Suppose you need to create a string containing a tab character. Some text editors may allow you to insert a tab character directly into your code. However, this is considered a poor practice for several reasons:

Computers can distinguish between tabs and a sequence of spaces, but human beings can’t because these characters are visually indistinguishable.
Some text editors automatically eliminate tabs by expanding them to an appropriate number of spaces.
Some Python REPL environments won’t insert tabs into code.

In Python, you can specify a tab character with the \t escape sequence:

The \t escape sequence changes the usual meaning of the letter t, making Python interpret the combination as a tab character.

Here’s a list of escape sequences that cause Python to apply special meaning to some characters instead of interpreting them literally:

Escape Sequence	Escaped Interpretation
`\a`	ASCII Bell (`BEL`) character
`\b`	ASCII Backspace (`BS`) character
`\f`	ASCII Formfeed (`FF`) character
`\n`	ASCII Linefeed (`LF`) character
`\N{<name>}`	Character from Unicode database with given `<name>`
`\r`	ASCII Carriage return (`CR`) character
`\t`	ASCII Horizontal tab (`TAB`) character
`\uxxxx`	Unicode character with 16-bit hex value `xxxx`
`\Uxxxxxxxx`	Unicode character with 32-bit hex value `xxxxxxxx`
`\v`	ASCII Vertical tab (`VT`) character
`\ooo`	Character with octal value `ooo`
`\xhh`	Character with hex value `hh`

The newline or linefeed character (\n) is probably the most popular of these escape sequences. This sequence is commonly used to create nicely formatted text outputs that span multiple lines.

Here are a few examples of the escape sequences in action:

These escape sequences are useful when you need to insert characters that aren’t readily generated from the keyboard or aren’t easily readable or printable.

Raw String Literals

With raw string literals, you can create strings that don’t translate escape sequences. Any backslash characters are left in the string.

To create a raw string, you have to prepend the string literal with the letter r or R:

The raw string suppresses the meaning of the escape sequence, such as \t, and presents the characters as they are.

Raw strings are commonly used to create regular expressions because they allow you to use several different characters that may have special meanings without restrictions.

For example, say that you want to create a regular expression to match email addresses. To do this, you can use a raw string to create the regular expression like in the code below:

The pattern variable holds a raw string that makes up a regular expression to match email addresses. Note how the string contains several backslashes that are escaped and inserted into the resulting string as they are. Then, you compile the regular expression and check for matches in a sample piece of text.

Formatted String Literals

Formatted string literals, or f-strings for short, allow you to interpolate values into your strings and format them as you need.

To create a string with an f-string literal, you must prepend the literal with an f or F letter. F-strings let you interpolate values into replacement fields in your string literal. You create these fields using curly brackets.

Here’s a quick example of an f-string literal:

In this example, you interpolate the name variable into your string using an f-string literal and a replacement field.

You’ll learn more about using f-strings for string interpolation and formatting in the Doing String Interpolation and Formatting section.

The Built-in `str()` Function

You can create new strings using the built-in str() function. However, a more common use case for this function is to convert other data types into strings, which is also a way to create new strings.

Consider the following examples:

In the first example, you call the str() function without an argument to create an empty string. Then, you use the function to convert objects from different built-in types into strings.

Using Operators on Strings

You can also use a few operators with string values as operands. For example, the + and * operators allow you to perform string concatenation and repetition, respectively. In the following sections, you’ll learn how these and other operators work with strings.

Concatenating Strings: The `+` Operator

The + operator lets you concatenate strings. So, in this context, you can call it the concatenation operator. Concatenation involves joining two or more string objects to create a single new string:

In this example, you use the plus operator to concatenate several string objects. You start with the greeting variable and add a comma followed by a space. Next, you add the name variable and finally, some exclamation points. Note that you’ve combined variables holding strings and string literals to build the final string.

The concatenation operator also has an augmented variation denoted by +=. This operator allows you to do something like string = string + another_string but in a shorter way:

In this example, you create a filename incrementally. You start with the filename without the file extension. Then, you use the augmented concatenation operator to add the file extension to the existing filename.

Note that the augmented operator doesn’t change the initial string because strings are immutable. Instead, it creates a new string and reassigns it to the name file.

Repeating Strings: The `*` Operator

The * operator allows you to repeat a given string a certain number of times. In this context, this operator is known as the repetition operator. Its syntax is shown below:

The repetition operator takes two operands. One operand is the string that you want to repeat, while the other operand is an integer number representing the number of times you want to repeat the target string:

In the first example, you repeat the = character ten times. In the second example, you repeat the text Hi! ten times as well. Note that the order of the operands doesn’t affect the result.

The multiplier operand, n, is usually a positive integer. If it’s a zero or negative integer, the result is an empty string:

You need to be aware of this behavior in situations where you compute n dynamically, and 0 or negative values can occur. Otherwise, this operator comes in handy when you need to present some output to your users in a table format.

For example, say that you have the following data about your company’s employees:

You want to create a function that takes this data, puts it into a table, and prints it to the screen. In this situation, you can write a function like the following:

This function takes the employees’ data as an argument and the headers for the table. Then, it gets the maximum header length, prints the headers using the pipe character (|) as a separator, and justifies the headers to the left.

To separate the headers from the table’s content, you build a line of hyphens. To create the line, you use the repetition operator with the maximum header length as the multiplier operand. The effect of using the operator here is that you’ll get a visual line under each header that’s always as long as the longest header field. Finally, you print the employees’ data.

Here’s how the function works in practice:

Again, creating tables like this one is a great use case for the repetition operator because it allows you to create separators dynamically.

The repetition operator also has an augmented variation that lets you do something like string = string * n in a shorter way:

In this example, the *= operator repeats the string in sep ten times and assigns the result back to the sep variable.

Finding Substrings in a String: The `in` and `not in` Operators

Python also provides the in and not in operators that you can use with strings. The in operator returns True if the left-hand operand is contained within the right-hand one and False otherwise. This type of check is known as a membership test.

You can take advantage of a membership test when you need to determine if a substring appears in a given string:

In these examples, you check whether the string "food" is a substring of the strings on the right-hand side.

The not in operator does the opposite check:

With the not in operator, you can check if a substring isn’t found in a given string. If the substring isn’t in the target string, then you get True. Otherwise, you get False.

Exploring Built-in Functions for String Processing

Python provides many functions that are built into the language and, therefore, are always available to you. Here are a few of these functions that are especially useful when you’re processing strings:

Function	Description
len()	Returns the length of a string
str()	Returns a user-friendly string representation of an object
repr()	Returns a developer-friendly string representation of an object
format()	Allows for string formatting
ord()	Converts a character to an integer
chr()	Converts an integer to a character

In the following sections, you’ll learn how to use these functions to work with your strings. To kick things off, you’ll start by determining the length of an existing string with len().

Finding the Number of Characters: `len()`

A common operation that you’ll perform on strings is determining their number of characters. To complete this task, you can use the built-in len() function.

With a string as an argument, the len() function returns the length of the input string or the number of characters in that string. Here are a couple of examples of how to use len():

The word "Python" has six letters, so calling len() with this word as an argument returns 6. Note that len() returns 0 when you call it with empty strings. The function calculates the number of characters in a string object, independent of how you constructed the string:

You use the \N escape sequence to define a string with a single snake emoji. Even though you wrote nine characters in the string literal, the string object consist of a single 🐍 character. And len() reports its length as one. You get the same result when you create the string with a literal emoji character.

Converting Objects Into Strings: `str()` and `repr()`

To convert objects into their string representation, you can use the built-in str() and repr() functions. With the str() function, you can convert a given object into its user-friendly representation. This type of string representation is targeted at end users.

For example, if you pass an object of a built-in type to str(), then you get a representation that typically consists of a string containing the literal that defines the object at hand:

In these examples, the outputs show user-friendly representations of the input objects. These representations coincide with the objects’ literals.

You can provide a user-friendly string representation for your custom classes through the .__str__() special method, as you’ll learn in a moment.

Similarly, the built-in repr() function gives you a developer-friendly representation of the object at hand:

Ideally, you should be able to copy the output of repr() to re-create the original object. Also, this representation lets you know how the object is built. It’s because of these two distinctions that you can say repr() returns a developer-friendly string representation of the object at hand.

As you can see in the example above, the output of str() and repr() is the same for built-in data types. Both functions return the object’s literal as a string.

To see the difference between str() and repr(), consider the Person class in the following example:

The .__repr__() method gives you a developer-friendly string representation of specific Person objects. You’ll be able to re-create the object using the output of the repr() function. In contrast, the string representation that you get from calling .__str__() should aim to be readable and informative for end users. You can access this representation with the str() function.

Here’s how the two functions work with an instance of your class:

In this example, you can use the output of repr() to re-create the object stored in john. You also have all the information about how this specific object was built. Meanwhile, the output of str() gives you no clue about how to build the object. However, it does provide useful information for the end user.

Formatting Strings: `format()`

You can use the built-in format() function to format strings. This function converts an input value into a formatted representation. To define the desired format, you can use a format specifier, which should be a string that follows the syntax defined in the string formatting mini-language.

Consider the following examples of using the format() function with different input values and format specifiers:

In these examples, the ".4f" specifier formats the input value as a floating-point number with four decimal places.

With the ",.2f" format specifier, you can format a number using commas as thousand separators and with two decimal places, which could be appropriate for formatting currency values.

Next, you use the "=^30" specifier to format the string "Header" centered in a width of 30 characters using the equal sign as a filler character. Finally, you use "%a %b %d, %Y" to format the date.

Processing Characters Through Code Points: `ord()` and `chr()`

The ord() function returns an integer value representing the Unicode code point for the given character.

At the most basic level, computers store all information as numbers. To represent character data, computers use a translation scheme that maps each character to its associated number, which is its code point.

The most commonly used scheme is ASCII, which covers the familiar Latin characters you’re probably most accustomed to working with. For these characters, ord() returns the ASCII value that corresponds to the character at hand:

While the ASCII character set is fine for representing English, many natural languages are used worldwide, and countless symbols and glyphs appear in digital media. The complete set of characters that you may need to represent in code surpasses the Latin letters, numbers, and symbols.

Unicode is a generally accepted standard that attempts to provide a numeric code for every possible character in every possible natural language on every platform or operating system. The Unicode character set is the standard in Python.

The ord() function will return numeric values for Unicode characters as well:

In the Unicode table, the "€" character has 8364 as its associated code point, and the "∑" character has the 8721 code point.

The chr() function does the reverse of ord(). It returns the character value associated with a given code point:

Given the numeric value n, chr(n) returns the character that corresponds to n. Note that chr() handles Unicode characters as well.

Indexing and Slicing Strings

Python’s strings are ordered sequences of characters. Because of this, you can access individual characters from a string using the characters’ associated index. What’s a character’s index? Each character in a string has an index that specifies its position. Indices are integer numbers that start at 0 and go up to the number of characters in the string minus one.

You can use these indices to perform two common operations:

Indexing: Consists of retrieving a given character using its associated index.
Slicing: Allows you to get a portion of an existing string using indices.

In the following sections, you’ll learn how to run these two operations on your Python strings.

Indexing Strings

You can access individual characters in a string by specifying the string object followed by the character’s index in square brackets ([]). This operation is known as indexing.

String indexing in Python is zero-based, which means that the first character in the string has an index of 0, the next has an index of 1, and so on. The index of the last character will be the length of the string minus one.

For example, a schematic diagram of the indices of the string "foobar" would look like this:

String Indices

You can access individual characters by their index:

Using the indexing operator [index], you can access individual characters in an existing string. Note that attempting to index beyond the end of the string results in an error:

When you use an index that’s beyond the number of characters in the string minus one, you get an IndexError exception.

You can also use negative numbers as indices. If you do that, then the indexing occurs from the end of the string backward. Index -1 refers to the last character, index -2 refers to the second-last character, and so on.

Here’s the diagram showing both positive and negative indices in the string "foobar":

Positive and Negative String Indices

Here are some examples of negative indexing:

With negative indices, you can access the characters from the end of the string to its beginning. Again, attempting to index with negative numbers beyond the beginning of the string results in an error:

Because the "foobar" string only has six characters, the last negative index that you can use is -6, which corresponds to the first character in the string.

In short, for any non-empty string, s[len(s) - 1] and s[-1] both return the last character. Similarly, s[0] and s[-len(s)] return the first character. Note that no index makes sense on an empty string.

Finally, you should keep in mind that because strings are immutable, you can’t perform index assignments on them. The following example illustrates this fact:

If you try to change a character in an existing string, then you’ll get a TypeError telling you that strings don’t support item assignment.

Slicing Strings

Python allows you to extract substrings from a string. This operation is known as slicing. If s is a string, then an expression of the form s[m:n] returns the portion of s starting at index m, and up to but not including index n:

The first index sets the starting point of the indexing operation. The second index specifies the first character that isn’t included in the result—the character "r" with index 5 in the example above.

If you omit the first index, then the slicing starts at the beginning of the string:

Therefore, slicings like s[:m] and s[0:m] are equivalent. Similarly, if you omit the second index as in s[n:], then the slicing extends from the first index to the end of the string. This is an excellent and concise alternative to the more cumbersome s[n:len(s)] syntax:

When you omit the last index, the slicing operation extends up to the end of the string.

For any string s and any integer n, provided that n ≥ 0, you’ll have s[:n] + s[n:] is equal to s:

The first slicing extracts the characters from the beginning and up to index 4, which isn’t included. The second slicing extracts the characters from 4 to the end of the string. So, the result will contain the same characters as the original string.

Omitting both indices returns the original string in its entirety. It doesn’t return a copy but a reference to the original string:

When you omit both indices, you tell Python that you want to extract a slice containing the whole string. Again, for efficiency reasons, Python returns a reference instead of a copy of the original string. You can confirm this behavior using the built-in id() function.

If the first index in a slicing is greater than or equal to the second index, then Python returns an empty string:

As you can see, this behavior provides a somewhat obfuscated way to generate empty strings.

You can also use negative indices in slicing. Just as with indexing, -1 refers to the last character, -2 to the second-last character, and so on. The diagram below shows how to slice the substring "oob" from the string "foobar" using both positive and negative indices:

String Slicing With Positive and Negative Indices

Here’s the corresponding Python code:

Slicing with negative indices really shines when you need to get slices that have the last character of the string as a reference point.

There’s one more variant of the slicing syntax to discuss. An additional colon (:) and a third index designate a stride, also known as a step. This index indicates how many characters to jump after retrieving each character in the slice.

For example, for the string "foobar", the slice [0:6:2] starts with the first character and ends with the last character, skipping every second character. This behavior is shown in the following diagram, where the green items are the ones included in the slice:

String Indexing With Stride

Similarly, [1:6:2] specifies a slice that starts with the second character, which is index 1, and ends with the last character. Again, the stride value 2 skips every other character:

Another String Indexing With Stride

The illustrative REPL code is shown below:

As with any slicing, the first and second indices can be omitted and default to the first and last characters, respectively:

In the first slicing, you omit the first and second indices, making the slice start at index 0 and go up to the end of the string. In the second slicing, you only omit the second index, and the slice goes up to the end of the string.

You can also specify a negative stride value, in which case Python steps backward through the string. Note that the initial index should be greater than the ending index:

In the above example, [5:0:-2] means this: start at the last character and step backward by 2, up to but not including the first character.

When you step backward, if the first and second indices are omitted, the defaults are reversed intuitively. In other words, the initial index defaults to the end of the string, and the ending index defaults to the beginning.

Here’s an example:

Using a step equal to -1 provides a common technique for reversing strings:

To dive deeper into how to reverse strings, check out the Reverse Strings in Python: reversed(), Slicing, and More tutorial.

Doing String Interpolation and Formatting

String interpolation refers to building new strings by inserting values into a string that works as a template with replacement fields for each value you insert. String formatting, on the other hand, involves applying a desired format to the values you insert in a given string through interpolation.

In the following sections, you’ll learn the basics of string interpolation and formatting in Python.

Using F-Strings

You can also use f-strings to interpolate values into a string and format them, as you quickly touched on in the section about f-string literals. To do this, you use format specifiers that use the syntax defined in Python’s string format mini-language. For example, here’s how you can present numeric values using a currency format:

Inside the replacement field delimited by the curly brackets, you have the variable you want to interpolate and the format specifier, which is the string that starts with a colon (:). In this example, the format specifier defines a floating-point number with two decimal places.

F-strings have a clean syntax and are quite popular in Python code. However, they’re only suitable for those situations where you want to do the interpolation eagerly because the interpolated values are inserted when Python executes the strings.

You can’t use f-strings for lazy interpolation. In other words, you can’t use f-string for those situations where you need a reusable template that you’ll fill with values dynamically in different parts of your code. In this situation, you can use the str.format() method.

Using the `.format()` Method

Python’s str class has a method called .format() that allows for string interpolation and formatting. This method is pretty powerful. Like f-strings, it supports the string formatting mini-language, but unlike f-strings, it allows for both eager and lazy string interpolation.

Consider the following example that uses .format() to create a balance report:

In this example, you first define two variables to hold the debit and credit of a bank account. Then, you create a template to construct a report showing the account’s credit, debit, and balance. Finally, you call .format() with the required argument to create the report.

Using the Modulo Operator (`%`)

Python has another tool you can use for string interpolation and formatting called the modulo operator (%). This operator was the first tool Python provided for string interpolation and formatting. It also allows you to do both eager and lazy interpolation.

While you can use the modulo operator for string formatting, its formatting capabilities are limited and it doesn’t support the string formatting mini-language.

Here’s how you would use the modulo operator to code the example from the previous section:

In this version of the report template, you use named placeholders with the modulo operator syntax. To provide the values to interpolate, you use a dictionary. The operator unpacks it using the keys that match the replacement fields.

Exploring `str` Class Methods

Methods are functions that you define inside classes. Like a function, you call a method to perform a specific task. Unlike a function, you invoke a method on a specific object or class so it has knowledge of its containing object during execution.

The syntax for invoking a method on an object is shown below:

object.method([arg_0, arg_2, ..., arg_n])
class.method([arg_0, arg_2, ..., arg_n])

This invokes method .method() on object object. Inside the parentheses, you can specify the arguments for the method to work, but these arguments are optional. Note that you can have both instance and class methods.

Python’s str class provides a rich set of methods that you can use to format your strings in several different ways. In the next section, you’ll learn the basics of some of the more commonly used str methods.

Manipulating Casing

The str methods you’ll look at in this section allow you to perform case conversion on the target string. Note that the methods in this section only affect letters. Non-letter characters remain the same because they don’t have uppercase and lowercase variations.

To kick things off, you’ll start by capitalizing some strings using the .capitalize() method.

`.capitalize()`

The .capitalize() method returns a copy of the target string with its first character converted to uppercase, and all other characters converted to lowercase:

In this example, only the first letter in the target string is converted to uppercase. The rest of the letters are converted to lowercase.

`.lower()`

The .lower() method returns a copy of the target string with all alphabetic characters converted to lowercase:

You can see in this example how all the uppercase letters were converted to lowercase.

`.swapcase()`

The .swapcase() method returns a copy of the target string with uppercase alphabetic characters converted to lowercase and vice versa:

In this example, the call to .swapcase() turned the lowercase letters into uppercase and the uppercase letters into lowercase.

`.title()`

The .title() method returns a copy of the target string in which the first letter of each word is converted to uppercase, and the remaining letters are lowercase:

Note that this method doesn’t attempt to distinguish between important and unimportant words, and it doesn’t handle apostrophes, possessives, or acronyms gracefully:

The .title() method converts the first letter of each word in the target string to uppercase. As you can see, it doesn’t do a great job with apostrophes and acronyms!

`.upper()`

The .upper() method returns a copy of the target string with all alphabetic characters converted to uppercase:

By calling .upper() on a string, you get a new string with all the letters in uppercase.

Finding and Replacing Substrings

The methods in this section provide various ways to search the target string for a specified substring.

Each method supports optional start and end arguments. These arguments mean that the action of the method is restricted to the portion of the target string starting with the character at index start and up to but not including the character at index end. If start is specified but end isn’t, then the method applies to the portion of the target string from start through the end of the string.

`.count(sub[, start[, end]])`

The .count(sub) method returns the number of non-overlapping occurrences of the substring sub in the target string:

The count is restricted to the number of occurrences within the substring indicated by start and end if you specify them:

In this example, the search starts at index 0 and ends at index 8. Because of this, the last occurrence of "oo" isn’t counted.

`.find(sub[, start[, end]])`

You can use .find() to check whether a string contains a particular substring. Calling .find(sub) returns the lowest index in the target string where sub is found:

Note that .find() returns -1 if the specified substring isn’t found in the target string:

Again, the search is restricted to the substring indicated by start and end if you specify them:

With the start and end indices, you restrict the search to a portion of the target string.

`.index(sub[, start[, end]])`

The .index() method is similar to .find(), except that it raises an exception rather than returning -1 if sub isn’t found in the target string:

If the substring exists in the target string, then .index() returns the index at which the first instance of the substring starts. If the substring isn’t found in the target string, then you get a ValueError exception.

`.rfind(sub[, start[, end]])`

The .rfind(sub) call returns the highest index in the target string where the substring sub is found:

As with .find(), if the substring isn’t found, then .rfind() returns -1:

You can restrict the search to the substring indicated by start and end:

Again, with the start and end indices, you restrict the search to a portion of the target string.

`.rindex(sub[, start[, end]])`

The .rindex() method is similar to .rfind(), except that it raises an exception if sub isn’t found in the target string:

If the substring isn’t found in the target string, then rindex() raises a ValueError exception.

`.startswith(prefix[, start[, end]])`

The .startswith(prefix) call returns True if the target string starts with the specified prefix and False otherwise:

The comparison is restricted to the substring indicated by start and end if they’re specified:

In these examples, you use the start and end indices to restrict the search to a given portion of the target string.

`.endswith(suffix[, start[, end]])`

The .endswith(suffix) call returns True if the target string ends with the specified suffix and False otherwise:

The comparison is restricted to the substring indicated by start and end if you specify them.

Classifying Strings

The methods in this section classify a string based on its characters. In all cases, the methods are predicates, meaning they return True or False depending on the condition they check.

`.isalnum()`

The .isalnum() method returns True if the target string isn’t empty and all its characters are alphanumeric, meaning either a letter or number. Otherwise, it returns False:

In the first example, you get True because the target string contains only letters and numbers. In the second example, you get False because of the "$" character.

`.isalpha()`

The .isalpha() method returns True if the target string isn’t empty and all its characters are alphabetic. Otherwise, it returns False:

The .isalpha() method allows you to check whether all the characters in a given string are letters. Note that whitespaces aren’t considered alpha characters, which makes sense but might be missed if you’re working with normal text.

`.isdigit()`

You can use the .isdigit() method to check whether your string is made of only digits:

The .isdigit() method returns True if the target string is not empty and all its characters are numeric digits. Otherwise, it returns False.

`.isidentifier()`

The .isidentifier() method returns True if the target string is a valid Python identifier according to the language definition. Otherwise, it returns False:

It’s important to note that .isidentifier() will return True for a string that matches a Python keyword even though that wouldn’t be a valid identifier:

You can test whether a string matches a Python keyword using a function called iskeyword(), which is contained in a module called keyword. One possible way to do this is shown below:

If you want to ensure that a string serves as a valid Python identifier, you should check that .isidentifier() returns True and that iskeyword() returns False.

`.islower()`

The .islower() method returns True if the target string isn’t empty and all its alphabetic characters are lowercase. Otherwise, it returns False:

Note that non-alphabetic characters are ignored when you call the .islower() method on a given string.

`.isprintable()`

The .isprintable() method returns True if the target string is empty or if all its alphabetic characters are printable:

You get False if the target string contains at least one non-printable character. Again, non-alphabetic characters are ignored.

Note that .isprintable() is one of two .is*() methods that return True if the target string is empty. The second one is .isascii().

`.isspace()`

The .isspace() method returns True if the target string isn’t empty and all its characters are whitespaces. Otherwise, it returns False.

The most commonly used whitespace characters are space (" "), tab ("\t"), and newline ("\n"):

There are a few other ASCII characters that qualify as whitespace characters, and if you account for Unicode characters, there are quite a few beyond that:

The "\f" and "\r" combinations are the escape sequences for the ASCII form feed and carriage return characters. The "\u2005" combination is the escape sequence for the Unicode four-per-em space.

`.istitle()`

The .istitle() method returns True if the target string isn’t empty, the first alphabetic character of each word is uppercase, and all other alphabetic characters in each word are lowercase. It returns False otherwise:

This method returns True if the string is title-cased as would result from calling .title(). It returns False otherwise.

`.isupper()`

The .isupper() method returns True if the target string isn’t empty and all its alphabetic characters are uppercase. Otherwise, it returns False:

Again, Python ignores non-alphabetic characters when you call the .isupper() method on a given string object.

Formatting Strings

The methods in this section allow you to modify or enhance the format of a string in many different ways. You’ll start by learning how to center your string within a given space.

`.center(width[, fill])`

The .center(width) call returns a string consisting of the target string centered in a field of width characters. By default, padding consists of the ASCII space character:

If you specify the optional fill argument, then it’s used as the padding character:

Note that if the target string is as long as width or longer, then it’s returned unchanged.

`.expandtabs(tabsize=8)`

The .expandtabs() method replaces each tab character ("\t") found in the target string with spaces. By default, spaces are filled in assuming eight characters per tab:

In the final example, you use the tabsize argument, which is an optional argument that specifies an alternate tab size.

`.ljust(width[, fill])`

The .ljust(width) call returns a string consisting of the target string left-justified in a field of width characters. By default, the padding consists of the ASCII space character:

If you specify the optional fill argument, then it’s used as the padding character:

If the target string is as long as width or longer, then it’s returned unchanged.

`.rjust(width[, fill])`

The .rjust(width) call returns a string consisting of the target string right-justified in a field of width characters. By default, the padding consists of the ASCII space character:

If you specify the optional fill argument, then it’s used as the padding character:

If the target string is as long as width or longer, then it’s returned unchanged:

When the target string is as long as width or longer, then .rjust() won’t have space to justify the text, so you get the original string back.

`.removeprefix(prefix)`

The .removeprefix() method returns a copy of the target string with prefix removed from the beginning:

The prefix is removed if the target string begins with that exact substring. If the original string doesn’t begin with prefix, then the string is returned unchanged.

`.removesuffix(suffix)`

The .removesuffix() method returns a copy of the target string with suffix removed from the end:

The suffix is removed if the target string ends with that exact substring. If the original string doesn’t end with suffix, then the string is returned unchanged. The .removeprefix() and .removesuffix() methods were introduced in Python 3.9.

`.lstrip([chars])`

The .lstrip() method returns a copy of the target string with any whitespace characters removed from the left end:

If you specify the optional chars argument, then it’s a string that specifies the set of characters to be removed:

In this example, you remove the http:// prefix from the input URL using the .lstrip() method with the "/:htp" characters as an argument.

Note that this call to .lstrip() works only because the URL doesn’t start with any of the target characters:

In this example, the result isn’t correct because the URL starts with a p, which is included in the set of letters that you want to remove. You should use .removeprefix() instead.

`.rstrip([chars])`

The .rstrip() method with no arguments returns a copy of the target string with any whitespace characters removed from the right end:

If you specify the optional chars argument, then it should be a string that specifies the set of characters to be removed:

By providing a string value to chars, you can control the set of characters to remove from the right side of the target string.

`.strip([chars])`

The .strip() method is equivalent to invoking .lstrip() and .rstrip() in succession. Without the chars argument, it removes leading and trailing whitespace in one go:

As with .lstrip() and .rstrip(), the optional chars argument specifies the set of characters to be removed:

Again, with chars, you can control the characters you want to remove from the original string, which defaults to whitespaces.

`.replace(old, new[, count])`

To replace a substring of a string, you can use the .replace() method. This method returns a copy of the target string with all the occurrences of the old substring replaced by new:

If you specify the optional count argument, then a maximum of count replacements are performed, starting at the left end of the target string:

The count argument lets you specify how many replacements you want to perform in the original string.

`.zfill(width)`

The .zfill(width) call returns a copy of the target string left-padded with zeroes to the specified width:

If the target string contains a leading sign, then it remains at the left edge of the result string after zeros are inserted:

If the target string is as long as width or longer, then it’s returned unchanged:

Python will still zero-pad a string that isn’t a numeric value:

The .zfill() method is useful for representing numeric values. However, it also works with textual strings.

Joining and Splitting Strings

The methods you’ll explore in this section allow you to convert between a string and some composite data types by joining objects together to make a string or by breaking a string up into pieces.

These methods operate on or return iterables, which are collections of objects. For example, many of these methods return a list or a tuple.

`.join(iterable)`

The .join() method takes an iterable of string objects and returns the string that results from concatenating the objects in the input iterable separated by the target string.

Note that .join() is invoked on a string that works as the separator between each item in the input iterable, which must contain string objects.

In the following example, the separator is a string consisting of a comma and a space. The input iterable is a list of string values:

The result of this call to .join() is a single string consisting of string objects separated by commas.

Again, the input iterable must contain string objects. Otherwise, you’ll get an error:

This example fails because the second object in the input iterable isn’t a string object but an integer number.

When you have iterables of values that aren’t strings, then you can use the str() function to convert them before passing them to .join(). Consider the following example:

In this example, you use the str() function to convert the values in numbers to strings before feeding them to .join(). To do the conversion, you use a generator expression.

`.partition(sep)`

The .partition(sep) call splits the target string at the first occurrence of string sep. The return value is a tuple with three objects:

The portion of the target string that precedes sep
The sep object itself
The portion of the target string that follows sep

Here are a couple of examples of .partition() in action:

Note that if the string ends with the target sep, then the last item in the tuple is an empty string. Similarly, if sep isn’t found in the target string, then the returned tuple contains the string followed by two empty strings, as you see in the final example.

`.rpartition(sep)`

The .rpartition(sep) call works like .partition(sep), except that the target string is split at the last occurrence of sep instead of the first:

In this case, the string partition is done starting from the right side of the target string.

`.split(sep=None, maxsplit=-1)`

Without arguments, .split() splits the target string into substrings delimited by any sequence of whitespace and returns the substrings as a list:

If you specify the sep argument, then it’s used as the delimiter for the splitting:

When sep is explicitly given as a delimiter, consecutive instances of the delimiter in the target string are assumed to delimit empty strings:

However, consecutive whitespace characters are combined into a single delimiter, and the resulting list will never contain empty strings:

If the optional parameter maxsplit is specified, then a maximum of that many splits are performed, starting from the left end of the target string:

In this example, .split() performs only one split, starting from the left end of the target string.

`.rsplit(sep=None, maxsplit=-1)`

The .rsplit() method behaves like .split(), except that if maxsplit is specified, then the splits are counted from the right end of the target string rather than from the left end:

If maxsplit isn’t specified, then the results of .split() and .rsplit() are indistinguishable.

`.splitlines([keepends])`

The .splitlines() method splits the target string into lines and returns them in a list. The following escape sequences can work as line boundaries:

Sequence	Description
`\n`	Newline
`\r`	Carriage return
`\r\n`	Carriage return + line feed
`\v` or `\x0b`	Line tabulation
`\f` or `\x0c`	Form feed
`\x1c`	File separator
`\x1d`	Group separator
`\x1e`	Record separator
`\x85`	Next line (C1 control code)
`\u2028`	Unicode line separator
`\u2029`	Unicode paragraph separator

Here’s an example of using several different line separators:

If consecutive line boundary characters are present in the target string, then they’re assumed to delimit blank lines, which will appear in the result list as empty strings:

Note that if you set the optional keepends argument to True, then the line boundaries are retained in the result strings.

Conclusion

You now know about Python’s many tools for string creation and manipulation, including string literals, operators, and built-in functions. You’ve also learned how to extract items and parts of existing strings using the indexing and slicing operators. Finally, you’ve looked at the methods the str class provides for string formatting and processing.

In this tutorial, you’ve learned how to:

Create strings using literals and the str() function
Use operators and built-in functions with string objects
Extract characters and portions of a string through indexing and slicing
Interpolate and format values into your strings
Use various string methods

With this knowledge, you’re ready to start creating and processing textual data in your Python code. You can always come back to this tutorial if you ever need a quick reference on how to use string methods.

Take the Quiz: Test your knowledge with our interactive “Python Strings and Character Data” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

Python Strings and Character Data

Or copy the link:

Copy

Copied!

Happy Pythoning!

Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Strings and Character Data in Python

Getting to Know Strings and Characters in Python

Creating Strings in Python

Standard String Literals

Escape Sequences in String Literals

Raw String Literals

Formatted String Literals

The Built-in str() Function

Using Operators on Strings

Concatenating Strings: The + Operator

Repeating Strings: The * Operator

Finding Substrings in a String: The in and not in Operators

Exploring Built-in Functions for String Processing

Finding the Number of Characters: len()

Converting Objects Into Strings: str() and repr()

Formatting Strings: format()

Processing Characters Through Code Points: ord() and chr()

Indexing and Slicing Strings

Indexing Strings

Slicing Strings

Doing String Interpolation and Formatting

Using F-Strings

Using the .format() Method

Using the Modulo Operator (%)

Exploring str Class Methods

Manipulating Casing

.capitalize()

.lower()

.swapcase()

.title()

.upper()

Finding and Replacing Substrings

.count(sub[, start[, end]])

.find(sub[, start[, end]])

.index(sub[, start[, end]])

.rfind(sub[, start[, end]])

.rindex(sub[, start[, end]])

.startswith(prefix[, start[, end]])

.endswith(suffix[, start[, end]])

Classifying Strings

.isalnum()

.isalpha()

.isdigit()

.isidentifier()

.islower()

.isprintable()

.isspace()

.istitle()

.isupper()

Formatting Strings

.center(width[, fill])

.expandtabs(tabsize=8)

.ljust(width[, fill])

.rjust(width[, fill])

.removeprefix(prefix)

.removesuffix(suffix)

.lstrip([chars])

.rstrip([chars])

.strip([chars])

.replace(old, new[, count])

.zfill(width)

Joining and Splitting Strings

.join(iterable)

.partition(sep)

.rpartition(sep)

.split(sep=None, maxsplit=-1)

.rsplit(sep=None, maxsplit=-1)

.splitlines([keepends])

Conclusion

Related stories

Other stories

The Built-in `str()` Function

Concatenating Strings: The `+` Operator

Repeating Strings: The `*` Operator

Finding Substrings in a String: The `in` and `not in` Operators

Finding the Number of Characters: `len()`

Converting Objects Into Strings: `str()` and `repr()`

Formatting Strings: `format()`

Processing Characters Through Code Points: `ord()` and `chr()`

Using the `.format()` Method

Using the Modulo Operator (`%`)

Exploring `str` Class Methods

`.capitalize()`

`.lower()`

`.swapcase()`

`.title()`

`.upper()`

`.count(sub[, start[, end]])`

`.find(sub[, start[, end]])`

`.index(sub[, start[, end]])`

`.rfind(sub[, start[, end]])`

`.rindex(sub[, start[, end]])`

`.startswith(prefix[, start[, end]])`

`.endswith(suffix[, start[, end]])`

`.isalnum()`

`.isalpha()`

`.isdigit()`

`.isidentifier()`

`.islower()`

`.isprintable()`

`.isspace()`

`.istitle()`

`.isupper()`

`.center(width[, fill])`

`.expandtabs(tabsize=8)`

`.ljust(width[, fill])`

`.rjust(width[, fill])`

`.removeprefix(prefix)`

`.removesuffix(suffix)`

`.lstrip([chars])`

`.rstrip([chars])`

`.strip([chars])`

`.replace(old, new[, count])`

`.zfill(width)`

`.join(iterable)`

`.partition(sep)`

`.rpartition(sep)`

`.split(sep=None, maxsplit=-1)`

`.rsplit(sep=None, maxsplit=-1)`

`.splitlines([keepends])`