{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CS 200: Strings in Python\n", "\n", "

\n", "\n", "

\n", "This notebook mirrors the Google Python Course: Strings\n", "\n", "### Video:\n", "\n", "See Strings from Socratica." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### str class\n", "\n", "Python strings are instances of the str class.\n", "str(object) is a constructor which creates a new string from the given object." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'123'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "str(123)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'10'" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "str(3 + 7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strings can be notated with single quotes ('), double quotes (\"), or triple quotes (''') for strings that span multiple lines." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'hello world'" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'hello world'" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'hello world'" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"hello world\"" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\nwait for it:\\nhello world!\\n'" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'''\n", "wait for it:\n", "hello world!\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The '\\n' character is \"newline\"." 
] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "wait for it:\n", "hello world!\n", "\n" ] } ], "source": [ "print('\\nwait for it:\\nhello world!\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### escape sequences\n", "\n", "If you want to include a single or double quote inside a string, you can escape it with a backslash (\\\\). If you want to include a backslash in a string, use two backslashes. See Python 3 escape sequences for more information." ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"I can't wait!\"" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'I can\\'t wait!'" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"I can't wait\"" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"I can't wait\"" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\"Who are you?\"'" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'\"Who are you?\"'" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "heaven \\ hell\n" ] } ], "source": [ "print(\"heaven \\\\ hell\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### string methods\n", "\n", "The str class has a boatload of methods." 
] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['__add__',\n", " '__class__',\n", " '__contains__',\n", " '__delattr__',\n", " '__dir__',\n", " '__doc__',\n", " '__eq__',\n", " '__format__',\n", " '__ge__',\n", " '__getattribute__',\n", " '__getitem__',\n", " '__getnewargs__',\n", " '__gt__',\n", " '__hash__',\n", " '__init__',\n", " '__init_subclass__',\n", " '__iter__',\n", " '__le__',\n", " '__len__',\n", " '__lt__',\n", " '__mod__',\n", " '__mul__',\n", " '__ne__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__rmod__',\n", " '__rmul__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " 'capitalize',\n", " 'casefold',\n", " 'center',\n", " 'count',\n", " 'encode',\n", " 'endswith',\n", " 'expandtabs',\n", " 'find',\n", " 'format',\n", " 'format_map',\n", " 'index',\n", " 'isalnum',\n", " 'isalpha',\n", " 'isascii',\n", " 'isdecimal',\n", " 'isdigit',\n", " 'isidentifier',\n", " 'islower',\n", " 'isnumeric',\n", " 'isprintable',\n", " 'isspace',\n", " 'istitle',\n", " 'isupper',\n", " 'join',\n", " 'ljust',\n", " 'lower',\n", " 'lstrip',\n", " 'maketrans',\n", " 'partition',\n", " 'removeprefix',\n", " 'removesuffix',\n", " 'replace',\n", " 'rfind',\n", " 'rindex',\n", " 'rjust',\n", " 'rpartition',\n", " 'rsplit',\n", " 'rstrip',\n", " 'split',\n", " 'splitlines',\n", " 'startswith',\n", " 'strip',\n", " 'swapcase',\n", " 'title',\n", " 'translate',\n", " 'upper',\n", " 'zfill']" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Character Case\n", "\n", "lower() and upper() convert a string to all lowercase or all uppercase, respectively. 
\n", "\n", "Methods are invoked on instances of the object using the syntax\n", "\n", "instance.method()" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "s = \" Hello World! \"" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' Hello World! '" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' hello world! '" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.lower()" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' Hello World! '" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' HELLO WORLD! '" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.upper()" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' Hello World! '" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that lower and upper methods are not destructive. They work on a copy of the given string and do not change the original string. strip() removes leading and trailing spaces from a string." ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello World!'" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.strip()" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' Hello World! 
'" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Character types: alpha, digit, space\n", "\n", "Three common character categories are alpha, digit, and space. There are methods to check if a given string is in one of these categories." ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.isalpha()" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'abcde'.isalpha()" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'123'.isdigit()" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'123.456'.isdigit()" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.isspace()" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "' \\n\\t\\r\\f'.isspace()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that whitespace includes space, newline, tab, return, and formfeed. These characters do not put ink (or pixels) on the page.\n", "\n", "A common string operation is to match the start, end, or middle of a string." 
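, "

The three kinds of match can be tried side by side. This is a minimal sketch, assuming a sample string of our own (the names text, starts, ends, has_space, space_at, and missing_at are invented for this illustration and are not part of the notebook):

```python
# Hypothetical sample text, chosen only for this sketch.
text = 'Hello World!'

# Start and end of the string.
starts = text.startswith('Hello')
ends = text.endswith('World!')

# Middle of the string: the in operator answers yes or no,
# while find() reports where the match begins (or -1 if absent).
has_space = ' ' in text
space_at = text.find(' ')
missing_at = text.find('Q')

print(starts, ends, has_space, space_at, missing_at)
```

The in operator is usually enough when only a yes or no answer is needed; find() is for when the position matters.
"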
] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "s = s.strip()" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello World!'" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.startswith('Hello')" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.endswith('World!')" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.find(' ')" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello World!'" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.find('Q')" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "x = ' xxx '" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [], "source": [ "y = x" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [], "source": [ "x = x.strip()" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'xxx'" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { 
"cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' xxx '" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello World!'" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5 indicates the sixth character of the string, of course. -1 indicates that the given character is not in the string.\n", "\n", "### String indexing\n", "\n", "Strings are indexed using zero-based indexing. That means that the first character of the string is string[0]" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'H'" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Positive integers index the string from left to right, starting with 0. Negative integers index the string from right to left, starting with -1.\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "
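| character | H | e | l | l | o | (space) | W | o | r | l | d | ! |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| negative index | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |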
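
The two rows of index numbers are linked: for a string of length n, position i counted from the left names the same character as position i - n counted from the right. A minimal sketch, assuming the same 'Hello World!' text used in this notebook (the names text, n, first, and last are made up for the example):

```python
text = 'Hello World!'
n = len(text)  # 12 characters, counting the space

# Walk the string left to right and look each character up both ways.
for i in range(n):
    assert text[i] == text[i - n]

first, last = text[0], text[-1]
print(first, last)
```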
\n" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'!'" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[11]" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'!'" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[-1]" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'W'" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[-6]" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "string index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [108]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m s[\u001b[38;5;241m12\u001b[39m]\n", "\u001b[0;31mIndexError\u001b[0m: string index out of range" ] } ], "source": [ "s[12]" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(s)" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len('')" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'x'" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'x'[0]" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "ename": "IndexError", 
"evalue": "string index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [112]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124m'\u001b[39m[\u001b[38;5;241m0\u001b[39m]\n", "\u001b[0;31mIndexError\u001b[0m: string index out of range" ] } ], "source": [ "''[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "len() tells you how many characters are in a string, that is, the length of the string. The empty string has length zero. \n", "\n", "If you try to index a string out of bounds, Python raises an IndexError.\n", "\n", "### String slices\n", "\n", "You can specify a substring using the [start:end] notation, where start is inclusive, but end is not.\n", "\n", "If you leave off the start or end, the slice runs from the beginning or to the end of the string. Thus, [:] gives you the entire string. (For a mutable sequence such as a list, [:] makes a copy; for an immutable string, CPython simply returns the same object.)" 
] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ell'" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[1:4]" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'o World!'" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[4:]" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hell'" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[:4]" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello World!'" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[:] ## the whole string (for a list, this idiom makes a copy)" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Worl'" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[-6:-2]" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello World'" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[:-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### id() gives the memory address\n", "\n", "As noted above, the [:] slice gives you the entire string. Does it build a new copy? The id() function lets us check: it returns the identity of an object, which in CPython is its memory address. 
Objects with the same address are identical.\n" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [], "source": [ "s = \"a string\"" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139683819578672" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id(s)" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [], "source": [ "s2 = s" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139683819578672" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id(s2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "s and s2 have the same address in memory. They are identical. \n", "\n", "== compares values.\n", "\n", "is compares memory addresses." ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s == s2" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s is s2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "s and s2 have the same value and the same address in memory." 
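, "

Equal value does not require equal identity. A minimal sketch, assuming two strings assembled at run time with join (the names parts, t1, t2, and alias are invented for this example). Whether the two results share an address is an implementation detail, so the sketch prints that result instead of promising it:

```python
# Build two equal strings at run time (hypothetical names, for illustration).
parts = ['a', 'string']
t1 = ' '.join(parts)
t2 = ' '.join(parts)
alias = t1  # a second name for the same object

same_value = t1 == t2      # compares the characters
same_object = t1 is t2     # compares identity; the result is up to the interpreter
alias_is_t1 = alias is t1  # plain assignment never copies

print(same_value, same_object, alias_is_t1)
```

For everyday comparisons of text, == is the operator to reach for; is answers a different question.
"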
] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [], "source": [ "s3 = s[:]" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a string'" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139683819578672" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id(s3)" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s == s3" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s is s3" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139683819578672" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: s3 has the same address as s, so s[:] did not make a copy after all. Strings are immutable, so a separate copy would gain nothing. When you slice a string, CPython (the standard interpreter) checks whether the slice covers the whole string. 
If so, it just reuses it.\n", "\n" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [], "source": [ "s = s +''" ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139683819578672" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id(s)" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s is s3" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [], "source": [ "s = s + 'x'" ] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139684083129072" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id(s)" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s is s3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### extended string slices\n", "\n", "You may use a third parameter to specify the step for the slice, string[start:end:step]" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a stringx'" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 143, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'asrnx'" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[::2] ## every other letter" ] }, { "cell_type": "code", "execution_count": 144, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'atn'" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } 
], "source": [ "s[::3] ## every third letter" ] }, { "cell_type": "code", "execution_count": 145, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a stringx'" ] }, "execution_count": 145, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[::1] ## every letter" ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'xgnirts a'" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[::-1] ### Reverses the string!" ] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s == s[::-1]" ] }, { "cell_type": "code", "execution_count": 148, "metadata": {}, "outputs": [], "source": [ "x = 'radar'" ] }, { "cell_type": "code", "execution_count": 149, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x == x[::-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an easy way to check for palindromes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

### string replace, split, and join

\n", "\n", "s.replace('old','new') - return a copy of s with every occurrence of \"old\" replaced by \"new\".\n", "\n", "s.split(delimiter) - return a list of the pieces of string s, using the given delimiter to partition the string.\n", "\n", "s.join(list) - splice together the sequential elements of list (which must be strings) using the string s as the glue." ] }, { "cell_type": "code", "execution_count": 150, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a stringx'" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a stringx'" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.replace('l','*')" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a stringx'" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: replace returns a new string. It does not change the original string." ] }, { "cell_type": "code", "execution_count": 153, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bad boy'" ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'good boy'.replace('good','bad')" ] }, { "cell_type": "code", "execution_count": 154, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bad girl'" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'good boy'.replace('good','bad').replace('boy','girl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: the string returned by the first replace becomes the string on which the second replace is called." 
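, "

Chaining can be unrolled into separate steps to see what each call hands to the next. A minimal sketch, with names (phrase, step1, step2, chained) that are hypothetical and used only here:

```python
phrase = 'good boy'

# Step by step: each call returns a new string.
step1 = phrase.replace('good', 'bad')
step2 = step1.replace('boy', 'girl')

# The chained form does the same thing in one expression.
chained = phrase.replace('good', 'bad').replace('boy', 'girl')

print(step1, '|', step2, '|', chained, '|', phrase)
```

The last value printed is the untouched original, since none of the calls modifies phrase.
"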
] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello Worll'" ] }, "execution_count": 155, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Hello World'.replace('l','d').replace('d','l')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need to be careful when doing sequential replaces: the second replace cannot tell the d characters produced by the first replace from the d that was there originally, so all of them become l.\n", "\n", "### split()" ] }, { "cell_type": "code", "execution_count": 156, "metadata": {}, "outputs": [], "source": [ "r = 'Romeo and Juliet'.split()" ] }, { "cell_type": "code", "execution_count": 157, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Romeo', 'and', 'Juliet']" ] }, "execution_count": 157, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r" ] }, { "cell_type": "code", "execution_count": 158, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 158, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(r)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "split returns a list, which we will cover later." ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [], "source": [ "r2 = 'Romeo and Juliet'.split(' ')" ] }, { "cell_type": "code", "execution_count": 160, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Romeo', 'and', 'Juliet']" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With no argument, split() splits on any run of whitespace (spaces, tabs, newlines). For this string, that gives the same result as split(' ')." 
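, "

The two calls stop agreeing once the text has repeated or leading spaces. A minimal sketch of the difference, assuming a sample line of our own (line, by_whitespace, by_space, area, and rest are names invented for this example):

```python
# Hypothetical sample: a leading space, a doubled space, and a trailing space.
line = ' Romeo' + 2 * ' ' + 'and Juliet '

# No argument: split on runs of whitespace and drop the empty edges.
by_whitespace = line.split()

# Explicit single space: every space is a cut, so empty strings appear.
by_space = line.split(' ')

# Optional second argument: the maximum number of cuts to make.
area, rest = '203-555-1212'.split('-', 1)

print(by_whitespace)
print(by_space)
print(area, rest)
```
"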
] }, { "cell_type": "code", "execution_count": 161, "metadata": {}, "outputs": [], "source": [ "r3 = '203-555-1212'.split()" ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['203-555-1212']" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r3" ] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [], "source": [ "r4 = '203-555-1212'.split('-')" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['203', '555', '1212']" ] }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r4" ] }, { "cell_type": "code", "execution_count": 165, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2035551212'" ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r3[0].replace('-','')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The r3 phone number had no spaces, so it did not get split. Using a hyphen as a delimiter, we split r4 in three parts.\n", "\n", "### join()\n", "\n", "We now can glue the lists together with join." 
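, "

One detail worth seeing before the examples: join() only accepts strings, so other values need converting first. A minimal sketch, assuming small lists of our own (words, digits, sentence, phone, and round_trip are hypothetical names):

```python
words = ['Romeo', 'and', 'Juliet']
digits = [203, 555, 1212]  # integers, not strings

sentence = ' '.join(words)

# join() accepts only strings, so convert other items first.
phone = '-'.join(str(d) for d in digits)

# Splitting on the glue string gives back the original pieces.
round_trip = sentence.split(' ')

print(sentence, phone, round_trip)
```
"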
] }, { "cell_type": "code", "execution_count": 166, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Romeo and Juliet'" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "' '.join(r2)" ] }, { "cell_type": "code", "execution_count": 167, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Romeo***and***Juliet'" ] }, "execution_count": 167, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'***'.join(r2)" ] }, { "cell_type": "code", "execution_count": 168, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'203----555----1212'" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'----'.join(r4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ASCII (and Unicode) characters\n", "\n", "Computer characters from the Roman alphabet are represented as numbers using the American Standard Code for Information Interchange (ASCII).\n", "\n", "However, there are thousands of other characters. Those are represented using Unicode.\n", "\n", "Python has two functions, ord(character) and chr(number), which convert between characters and numeric codes." 
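, "

The two functions undo each other, and arithmetic on the codes moves through the alphabet. A minimal sketch (the names letter, code, back, next_letter, gap, and lowered are invented for this illustration):

```python
letter = 'A'
code = ord(letter)           # character -> number
back = chr(code)             # number -> character
next_letter = chr(code + 1)  # one step further along the alphabet

# The gap between a capital letter and its lowercase form is constant in ASCII.
gap = ord('a') - ord('A')
lowered = chr(ord('Q') + gap)

print(code, back, next_letter, gap, lowered)
```
"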
] }, { "cell_type": "code", "execution_count": 169, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "65" ] }, "execution_count": 169, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ord('A')" ] }, { "cell_type": "code", "execution_count": 170, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "66" ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ord('B')" ] }, { "cell_type": "code", "execution_count": 171, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "97" ] }, "execution_count": 171, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ord('a')" ] }, { "cell_type": "code", "execution_count": 172, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "98" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ord('b')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that ASCII is designed so that sorting words by their numerical values results in sorting alphabetically. However, uppercase letters sort before lowercase letters. In UNIX, the ls command for listing directories often reflects this property." ] }, { "cell_type": "code", "execution_count": 173, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'A'" ] }, "execution_count": 173, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chr(65)" ] }, { "cell_type": "code", "execution_count": 174, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a'" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chr(97)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can specify Unicode characters using hexadecimal (base 16) notation." 
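, "

Hexadecimal and decimal are two spellings of the same integer, and Python converts in both directions. A minimal sketch, using the Greek delta code point as the example (the names delta, code_point, as_hex, and from_text are hypothetical):

```python
delta = chr(0x3b4)          # hexadecimal integer literal
code_point = ord(delta)     # the same number, shown in decimal
as_hex = hex(code_point)    # back to a hexadecimal string
from_text = int('3b4', 16)  # parse hex digits held in a string

print(delta, code_point, as_hex, from_text)
```
"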
] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 175, "metadata": {}, "output_type": "execute_result" } ], "source": [ "0xA" ] }, { "cell_type": "code", "execution_count": 176, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 176, "metadata": {}, "output_type": "execute_result" } ], "source": [ "0x10" ] }, { "cell_type": "code", "execution_count": 177, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "948" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "0x3b4" ] }, { "cell_type": "code", "execution_count": 178, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'δ'" ] }, "execution_count": 178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chr(0x3b4) ## delta" ] }, { "cell_type": "code", "execution_count": 179, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ε'" ] }, "execution_count": 179, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chr(0x3b5) ## epsilon" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'λ'" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chr(0x3bb) ## lambda" ] }, { "cell_type": "code", "execution_count": 181, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Δ'" ] }, "execution_count": 181, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chr(0x394) ## DELTA" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Ε'" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chr(0x395) ## EPSILON" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'Λ'" ] }, "execution_count": 183, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "chr(0x39b) ## LAMBDA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### encode and decode\n", "\n", "We can convert a string into an array of bytes, with a specified encoding. The Python string method encode has lots of options. \n", "See encode() and\n", "list of standard encodings" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "s = 'café'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(s)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "b = s.encode('utf8')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'caf\\xc3\\xa9'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'caf\\xc3\\xa9'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.encode('UTF-8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above example is from Chapter 4 of Fluent Python.\n", "\n", "The str 'café' has four Unicode characters.\n", "\n", "\n", "Encode str to bytes using UTF-8 encoding.\n", "\n", "\n", "bytes literals have a b prefix.\n", "\n", "\n", "bytes b has five bytes (the code point for “é” is encoded as two bytes in UTF-8).\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'café'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.decode('utf8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Decode bytes to str using UTF-8 encoding." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See Fluent Python for more details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "End of strings notebook.\n", "\n", "

\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 4 }