CS 200: Data Structures and HW4

We have seen a variety of basic data types in Python, including integers, strings, lists, tuples, and dictionaries.

We have also seen how object-oriented programming allows us to define classes that have methods and properties to encapsulate data.

Now, we will use classes to define additional data structures. If you consider the primitive data types as atomic elements, then data structures can be viewed as molecules that are formed by combining various elements.

In this notebook, we shall define and discuss the following data types: stacks, queues, hash tables, heaps, trees, and graphs.

Stacks

A common use of classes is to implement data structures. Below is an example of a stack, which is a collection with LIFO (last in, first out) behavior: the most recently added item is the first one removed.

Items are added to the stack with push and removed with pop.
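A minimal sketch of such a stack, assuming a Python list as the underlying storage. The method names peek, is_empty, and size are illustrative choices; only push and pop are named in the text.

```python
class Stack:
    """A minimal LIFO stack backed by a Python list."""

    def __init__(self):
        self.items = []              # the top of the stack is the end of the list

    def push(self, item):
        self.items.append(item)      # add to the top

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self.items.pop()      # remove and return the top item

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self.items[-1]        # look at the top item without removing it

    def is_empty(self):
        return len(self.items) == 0

    def size(self):
        return len(self.items)


s = Stack()
for x in (1, 2, 3):
    s.push(x)
popped = [s.pop(), s.pop(), s.pop()]   # items come back in reverse order
```

Using the end of the list as the top keeps both push and pop cheap, since neither has to shift the other elements.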

We will see that the Python virtual machine for interpreting bytecode is based on a stack architecture.
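As a small preview, the standard dis module can print the bytecode of a function. The function add below is a hypothetical example, and the exact instruction names vary between Python versions, so treat the output as illustrative.

```python
import dis


def add(a, b):
    return a + b


# Print the bytecode. The operands are loaded onto the stack, then a
# binary-operation instruction pops them and pushes the result.
dis.dis(add)

# The same instructions, collected as a list of names.
opnames = [ins.opname for ins in dis.get_instructions(add)]
```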

Let's take our stack out for a test drive.

hw4 problem 1 (8 points)

Write a procedure balanced(string) that reads string, and determines whether its parentheses are "balanced."

Hint: for left delimiters, push onto stack; for right delimiters, pop from stack and check whether popped element matches right delimiter.
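One possible reading of the hint, as a sketch rather than the staff solution. It assumes that square brackets and braces count as delimiters alongside parentheses, and it uses a plain list as the stack.

```python
def balanced(string):
    """Return True if every delimiter in string is properly matched."""
    pairs = {")": "(", "]": "[", "}": "{"}   # right delimiter -> left delimiter
    stack = []                               # a list used as a stack
    for ch in string:
        if ch in pairs.values():             # left delimiter: push
            stack.append(ch)
        elif ch in pairs:                    # right delimiter: pop and compare
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                         # leftovers mean unmatched lefts
```

Characters that are not delimiters are ignored, so balanced("(a[b]{c})") is True while balanced("(]") and balanced("((") are False.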

We will import the staff solution to demonstrate the functions.

Queues

In the homework, we ask you to write the queue class.

Write a queue data structure, similar to the stack above. Whereas a stack is LIFO (last in, first out), a queue is FIFO (first in, first out).

See Steven Skiena, The Algorithm Design Manual, page 71.
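To see the difference in ordering without giving away the homework class, here is a short sketch that uses a plain list as a stack and collections.deque from the standard library as a queue.

```python
from collections import deque

items = ["a", "b", "c"]

stack = []                     # LIFO: push and pop at the same end
for x in items:
    stack.append(x)
lifo_order = [stack.pop() for _ in items]

queue = deque()                # FIFO: add at the back, remove from the front
for x in items:
    queue.append(x)
fifo_order = [queue.popleft() for _ in items]
```

The stack returns the items in reverse order of insertion, while the queue returns them in the order they arrived.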

Yale online book

hw4 problem 3 (10 points)

Create a queue using two stacks: s1 and s2.

enqueue() pushes items on s1.

dequeue() pops s2, unless s2 is empty, in which case keep popping s1 onto s2 until s1 is empty. Then pop s2.

peek is similar to dequeue, except no final pop.
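The steps above can be sketched as follows. This is one possible arrangement, not the staff solution: the two stacks are plain lists, and the helper _refill and the method is_empty are hypothetical names introduced for illustration.

```python
class TwoStackQueue:
    """A FIFO queue built from two LIFO stacks (plain lists here)."""

    def __init__(self):
        self.s1 = []   # receives newly enqueued items
        self.s2 = []   # holds items in dequeue order

    def _refill(self):
        # Only when s2 is empty: move everything from s1, reversing the order.
        if not self.s2:
            while self.s1:
                self.s2.append(self.s1.pop())

    def enqueue(self, item):
        self.s1.append(item)

    def dequeue(self):
        self._refill()
        if not self.s2:
            raise IndexError("dequeue from empty queue")
        return self.s2.pop()

    def peek(self):
        self._refill()
        if not self.s2:
            raise IndexError("peek at empty queue")
        return self.s2[-1]     # same as dequeue, but without the final pop

    def is_empty(self):
        return not self.s1 and not self.s2
```

Each item is pushed and popped at most twice, so a long sequence of operations stays cheap on average even though a single dequeue may move many items.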

hw4 problem 4 (10 points)

Write a procedure to reverse a queue. It modifies the original queue! It should work with either queue implementation. That is, the function should use only the standard methods, enqueue and dequeue, which are common to both implementations. This demonstrates the value of encapsulation.
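A sketch of one way to do this with a temporary stack. SimpleQueue is a hypothetical stand-in for either implementation, and the sketch assumes the queue also offers an is_empty method so the loop knows when to stop.

```python
from collections import deque


class SimpleQueue:
    """Hypothetical stand-in for either queue implementation."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        return self._items.popleft()

    def is_empty(self):
        return not self._items


def reverse_queue(q):
    """Reverse q in place using only enqueue, dequeue, and is_empty."""
    stack = []
    while not q.is_empty():
        stack.append(q.dequeue())   # the front of the queue ends up at the bottom
    while stack:
        q.enqueue(stack.pop())      # the last item dequeued is enqueued first
    return q
```

Because reverse_queue never touches the queue's internal storage, it works unchanged with any class that provides the same three methods.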

Hash Tables. hw4 problem 5 (20 points)

Python dicts are implemented as hash tables.

Reading: Skiena pages 89-93

Video: hash tables

Create a hash table. It will be a list of size buckets. Each bucket will itself contain a list. If two items fall in the same bucket, the respective list will contain both items.
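The layout described above can be sketched as a list of lists. The class and method names here are illustrative assumptions, and the hash function is passed in as a parameter so that any of the functions discussed later can be plugged in.

```python
class HashTable:
    """A hash table of `size` buckets; each bucket is a list of keys."""

    def __init__(self, size=8, hash_function=hash):
        self.size = size
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(size)]   # one independent list per bucket

    def _bucket(self, key):
        # Reduce the hash value to a valid bucket index.
        return self.buckets[self.hash_function(key) % self.size]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:        # keys that collide share the same list
            bucket.append(key)

    def contains(self, key):
        return key in self._bucket(key)


# Using len as a (poor) hash function makes the collision easy to see.
table = HashTable(size=4, hash_function=len)
for word in ("ab", "cd", "xyz"):
    table.insert(word)
```

Here "ab" and "cd" both have length 2, so they land in the same bucket, which then holds both items.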

See Skiena page 89

Create a hash function using the djb2 algorithm.

We will show you some bad hash functions below.

Bad hash functions

Hash functions are common and useful in many programming applications. They are critical in many cryptography systems. For example, Bitcoin depends on its hash function being (nearly) impossible to invert (one-way). We will return to the topic of cryptography in a few weeks.

A crypto hash function h(x) must provide properties including compression, efficiency, one-way behavior, and collision resistance.

Below are some bad hash functions.

This function achieves the first three objectives: compression, efficiency, and one-way. However, it is not collision resistant. Lots of strings will end up in the same bucket. You want a function that will spread the keys around to different buckets.
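As a hypothetical illustration of a function with this flaw (not necessarily the one discussed above), consider hashing a string by its length. The name length_hash and the default table size are assumptions made for this sketch.

```python
def length_hash(key, size=16):
    """A deliberately bad hash: the bucket depends only on len(key)."""
    return len(key) % size


words = ["cat", "dog", "emu", "fox", "owl"]
buckets = {length_hash(w) for w in words}   # the set of buckets actually used
```

The output is small, quick to compute, and cannot be inverted, yet every three-letter word lands in the same bucket.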

Dynamic hash tables can grow over time. The hash table will start with a table size of N, and once N/2 items have been inserted, the table will expand to 2N. Doing so ensures that the table does not fill up. Of course, this technique will not be effective if the hash function throws every key into the same handful of buckets.
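A minimal sketch of that growth rule, assuming chained buckets and Python's built-in hash as the default hash function. The class name, the method names, and the starting size of 4 are illustrative choices.

```python
class DynamicHashTable:
    """A chained hash table that doubles once it holds size/2 keys."""

    def __init__(self, size=4, hash_function=hash):
        self.size = size
        self.count = 0
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return self.hash_function(key) % self.size

    def insert(self, key):
        bucket = self.buckets[self._index(key)]
        if key in bucket:
            return
        bucket.append(key)
        self.count += 1
        if self.count >= self.size // 2:   # N/2 items inserted: expand to 2N
            self._grow()

    def _grow(self):
        old_keys = [k for bucket in self.buckets for k in bucket]
        self.size *= 2
        self.buckets = [[] for _ in range(self.size)]
        for k in old_keys:                 # rehash: indices depend on the size
            self.buckets[self._index(k)].append(k)

    def contains(self, key):
        return key in self.buckets[self._index(key)]


table = DynamicHashTable()
for key in ("a", "b", "c", "d", "e"):
    table.insert(key)
```

Every key has to be rehashed when the table grows, because the bucket index depends on the table size.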

Below is another bad function. This one sums the ASCII values of the characters in the string.

It is a slight improvement over the first hash. However, it is still deplorable.

If you have the same characters in a different order, you get the same hash value. Let's try to fix that problem.
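A sketch of the character-sum approach and its anagram problem. The function name and the default table size are assumptions; ord gives the character code, which matches ASCII for plain English text.

```python
def ascii_sum_hash(key, size=64):
    """Sum the character codes, then reduce modulo the table size."""
    return sum(ord(ch) for ch in key) % size


# Anagrams contain the same characters, so their sums are identical.
same = ascii_sum_hash("listen") == ascii_sum_hash("silent")
```

Addition is commutative, so the order of the characters has no effect on the result.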

By inserting the multiplication step, we have reduced the collision problem.
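One way such a multiplication step might look, as a sketch. The multiplier 31 and the starting value 0 are assumptions made for illustration, not values taken from the text.

```python
def mult_hash(key, size=64, multiplier=31):
    """Multiply the running value before adding each character code."""
    h = 0
    for ch in key:
        h = h * multiplier + ord(ch)   # position now affects the result
    return h % size
```

Because earlier characters are multiplied more times than later ones, "ab" and "ba" no longer produce the same value.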

The djb2 hash function follows this approach of combining addition with multiplication. Note that ((hash << 5) + hash) is the same as multiplying by 33: shifting left by 5 multiplies by 32, and adding hash once more gives 33 times hash. The shift form was traditionally used because multiplication was slower than shifts and addition on older hardware. Here is the C code for djb2.

    unsigned long
    hash(unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;

        while (c = *str++)
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }
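A Python port of the C code above, as a sketch. Python integers never overflow, so the bits parameter (an assumption, defaulting to 32) masks the result to emulate a fixed-width unsigned long.

```python
def djb2(key, bits=32):
    """Python port of djb2; `bits` emulates a fixed-width unsigned long."""
    mask = (1 << bits) - 1
    h = 5381
    for byte in key.encode("utf-8"):          # iterate over bytes, like unsigned char *
        h = (((h << 5) + h) + byte) & mask    # h * 33 + byte, wrapped to `bits` bits
    return h
```

To use the result as a bucket index, reduce it modulo the table size, for example djb2(key) % size.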

hw4 problem 6 ** (10 points)

Use your hash function to implement remove duplicates for strings.

Hint: you want to use the hash table to answer the question: have I seen this character already?
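The idea in the hint can be sketched with Python's built-in set standing in for the homework hash table. The sketch assumes the first occurrence of each character is kept and the original order is preserved.

```python
def remove_duplicates(string):
    """Return string with repeated characters removed, keeping first occurrences."""
    seen = set()                 # stand-in for the homework hash table
    result = []
    for ch in string:
        if ch not in seen:       # "have I seen this character already?"
            seen.add(ch)
            result.append(ch)
    return "".join(result)
```

Each membership test takes roughly constant time on average, so the whole pass is linear in the length of the string.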

Heaps

Video: heap sort

See heap data structure

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parents) is called the root node.

Below is a max heap.
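As a sketch, a max heap can be stored in a list in level order, and the heap property can then be checked directly. The list layout, in which the parent of index i sits at index (i - 1) // 2, is an assumption of this sketch, and the sample values are illustrative.

```python
def is_max_heap(a):
    """Check the max-heap property for a heap stored as a 0-indexed list."""
    # The parent of index i is (i - 1) // 2; the root (index 0) has no parent.
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


def is_min_heap(a):
    """Check the min-heap property for a heap stored as a 0-indexed list."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


max_heap = [100, 19, 36, 17, 3, 25, 1, 2, 7]   # root 100, children 19 and 36
```

Note that a heap is only partially ordered: each parent dominates its children, but siblings can appear in any order.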

We use Python's heapq module, which implements a min heap.
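A short sketch of the basic heapq operations on a list of sample values.

```python
import heapq

data = [5, 1, 8, 3, 2]
heap = list(data)                # copy, so the original list is left alone
heapq.heapify(heap)              # rearrange in place into a min heap
heapq.heappush(heap, 0)          # add an item, keeping the heap property
smallest = heapq.heappop(heap)   # remove and return the smallest item
drained = [heapq.heappop(heap) for _ in range(len(heap))]
```

heapq works on an ordinary list, and heap[0] is always the smallest item. Popping repeatedly yields the items in ascending order.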

hw4 problem 7 (20 points)

Reading: Skiena pages 109-115

Skiena sorting chapter

Implement a min heap per the description in Skiena.

hw4 problem 8 (10 points)

Write a function that takes a list of n positive integers and returns a sorted list containing the n/2 smallest elements. Use a heap.
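A sketch of the approach using the standard heapq module as a stand-in for your own heap. The function name is hypothetical, and n/2 is assumed to round down when n is odd.

```python
import heapq


def smallest_half(numbers):
    """Return the len(numbers) // 2 smallest values in ascending order."""
    heap = list(numbers)         # copy so the caller's list is not modified
    heapq.heapify(heap)          # build a min heap in linear time
    return [heapq.heappop(heap) for _ in range(len(numbers) // 2)]
```

Each pop removes the current minimum, so the result comes out already sorted without a separate sorting step.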

Trees

Graphs