{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## CS 200: Sorting in Python\n", "\n", "\n", "\n", "

\n", "\n", "This notebook mirrors the Google Python Class: Sorting\n", "\n", "### Video:\n", "\n", "See Sorting from Socratica.\n", "\n", "In the Lists notebook, we introduced the sort() method for lists." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "l = [1,2,3,1,2,3,1,2,3]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "l2 = l[:] ## make a copy of l" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 1, 2, 3, 1, 2, 3]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l2" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "l2.sort()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 1, 1, 2, 2, 2, 3, 3, 3]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l2" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 1, 2, 3, 1, 2, 3]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the sort() method is destructive: it sorts the list in place, clobbering the old contents of l2. The original list l is untouched because l2 is a copy.\n", "\n", "Python also has a non-destructive sort function: sorted(). It leaves its argument unchanged and returns a new sorted list. Note that it is a function, not a method. It takes a list (or any other iterable) as an argument." 
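, "\n", "\n", "As a minimal sketch of the difference (the variable names here are illustrative and not part of the original notebook): sort() returns None and changes the list it is called on, while sorted() accepts any iterable and returns a new list.\n", "\n", "
```python
# sort() mutates the list in place and returns None
nums = [3, 1, 2]
returned = nums.sort()

# sorted() accepts any iterable (here a tuple and a string)
# and always returns a new list
from_tuple = sorted((3, 1, 2))
from_string = sorted('cab')

print(returned)     # None
print(nums)         # [1, 2, 3]
print(from_tuple)   # [1, 2, 3]
print(from_string)  # ['a', 'b', 'c']
```
"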
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 1, 2, 3, 1, 2, 3]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 1, 1, 2, 2, 2, 3, 3, 3]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 1, 2, 3, 1, 2, 3]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3, 3, 3, 2, 2, 2, 1, 1, 1]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l, reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "sorted(), like sort(), has an optional reverse parameter for sorting in descending order.\n", "\n", "We can visualize what's going on inside Python with the PythonTutor web application. We will open it \n", "and insert the following code.\n", "\n", "

\n",
    "l = [1,2,3,1,2,3,1,2,3]\n",
    "l2 = l[:]\n",
    "l2.sort()\n",
    "l3 = l\n",
    "l3.sort()\n",
    "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sorting Strings\n", "\n", "We can also sort lists of strings." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "s = 'to be or not to be that is the question'" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "l = s.split()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['be', 'be', 'is', 'not', 'or', 'question', 'that', 'the', 'to', 'to']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['to', 'to', 'the', 'that', 'question', 'or', 'not', 'is', 'be', 'be']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l, reverse=True)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting is one of the fundamental problems in computer science. There are dozens of different sorting algorithms, including insertion sort, bubble sort, merge sort, quicksort, and heapsort. What all of these algorithms have in common is the need to compare two items to see whether they are already in order. The default way to compare two items is with the less-than operator (<). 
This works for numbers and strings, as we have seen that strings are basically numbers, underneath the hood.\n", "\n", "However, there are many other possible comparison functions. Suppose that we have a list of words and we want to sort them by length. sort and sorted have a key parameter which can specify a function." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['to', 'be', 'or', 'to', 'be', 'is', 'not', 'the', 'that', 'question']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l, key=len)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['question', 'that', 'not', 'the', 'to', 'be', 'or', 'to', 'be', 'is']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l,key=len, reverse=True)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "l2 = ['aa', 'BB', 'CC','zz']" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['BB', 'CC', 'aa', 'zz']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As noted above, ASCII uppercase characters are numerically less than lowercase letters. If we want to sort a mixed case list, we can use the key parameter to specify the lower method for strings." 
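, "\n", "\n", "To make the role of the key parameter concrete, here is a small sketch (the list and variable names are illustrative, not from the original notebook). The key function is applied to each element to produce the value used for comparison, while the original elements are what appear in the result. Python's sort is stable, so elements with equal keys keep their original relative order.\n", "\n", "
```python
words = ['pear', 'Fig', 'apple', 'kiwi', 'Date']

# compare by length; the ties ('pear', 'kiwi', 'Date')
# keep their original relative order because the sort is stable
by_length = sorted(words, key=len)

# compare the lowercase versions; the strings themselves are unchanged
ignoring_case = sorted(words, key=str.lower)

print(by_length)      # ['Fig', 'pear', 'kiwi', 'Date', 'apple']
print(ignoring_case)  # ['apple', 'Date', 'Fig', 'kiwi', 'pear']
```
"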
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['aa', 'BB', 'CC', 'zz']" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l2, key=str.lower)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the list elements have not changed. str.lower() is applied when performing the comparison.\n", "\n", "The key argument does not need to be a builtin Python function or method. It can be defined by the user. Below we define a function to return the last character of a string." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def MyFn(s):\n", " return s[-1]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "l3 = ['wd', 'zc', 'xb', 'ya']" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ya', 'xb', 'zc', 'wd']" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l3, key=MyFn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use a lambda expression." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['wd', 'zc', 'xb', 'ya']" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l3" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ya', 'xb', 'zc', 'wd']" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(l3, key=lambda x: x[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tuples\n", "\n", "Tuples are immutable lists, delimited by parentheses." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "tuple = (1, 2, 'hi')" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(tuple)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'hi'" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tuple[2]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'tuple' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtuple\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" ] } ], "source": [ "tuple[1] = 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create a tuple of length 1, the lone element must be followed by a comma. Otherwise, the parentheses are ignored and the value is simply the value of element inside." 
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "newtuple = ('hi',)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tuple" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(newtuple)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "badtuple = ('hi')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(badtuple)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'hi'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "badtuple" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

List Comprehensions - introduction

\n", "\n", "Python allows the users to embed for loops inside the list! This notation is known as a list comprehension. Here are some examples." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "nums = [1,2,3,4]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 4, 9, 16]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[n * n for n in nums]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above list comprehension is equivalent to the following for loop." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "result = []\n", "for n in nums:\n", " result.append(n*n)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 4, 9, 16]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "strs = ['hello', 'and', 'goodbye']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['HELLO!!!', 'AND!!!', 'GOODBYE!!!']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[s.upper() + '!!!' for s in strs]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "result = []\n", "for s in strs:\n", " result.append(s.upper() + '!!!')" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['HELLO!!!', 'AND!!!', 'GOODBYE!!!']" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also introduce conditionals inside the for loops." 
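, "\n", "\n", "As a sketch of what such a conditional means (the variable names small and result are illustrative), the if clause acts as a filter: an element contributes to the new list only when the condition is true. The comprehension and the explicit loop below build the same list.\n", "\n", "
```python
nums = [1, 2, 3, 4]

# comprehension with a filter: keep n only when the condition holds
small = [n for n in nums if n <= 2]

# the same computation written as an explicit loop
result = []
for n in nums:
    if n <= 2:
        result.append(n)

print(small)   # [1, 2]
print(result)  # [1, 2]
```
"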
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nums" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[n for n in nums if n <= 2]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "fruits = ['apple', 'banana', 'cherry', 'lemon']" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['APPLE', 'BANANA']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[s.upper() for s in fruits if 'a' in s]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "End of sorting notebook.\n", "\n", "\n", "

\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 4 }