## CS 200: Dicts and Files in Python

This notebook mirrors the <a target=ee href="https://developer.google.com/edu/python/dict-files">Google Python Course: Dict - Files</a>


#### Video: 

See <a target=ww href="https://www.socratica.com/lesson/dictionaries">Dictionaries</a> from Socratica


Python supports dictionaries as a native data type.  Dictionaries, or dicts, are content-addressable arrays of data: each value is stored under a key and retrieved by that key.  They are normally implemented as hash tables.  We will see how to create a hash table from scratch later on.

Strings are delimited by quotes.  Lists are delimited by square brackets.  Tuples are delimited by parentheses.  Dicts are delimited by curly brackets.
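As a quick illustration of the four delimiters, here is a minimal sketch that builds one literal of each type and prints its type name. The variable names are chosen only for this example.

```python
# One literal of each built-in type mentioned above.
s = 'alpha'                 # string: quotes
lst = ['alpha', 'beta']     # list: square brackets
tup = ('alpha', 'beta')     # tuple: parentheses
dct = {'a': 'alpha'}        # dict: curly brackets around key: value pairs
empty = {}                  # empty curly brackets give an empty dict

for obj in (s, lst, tup, dct, empty):
    print(type(obj).__name__, obj)
```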

In [63]:
d = {}

In [64]:
d['a'] = 'alpha'

In [65]:
d['b'] = 'beta'

In [66]:
d['c'] = 'gamma'

In [67]:
d

{'a': 'alpha', 'b': 'beta', 'c': 'gamma'}

In [68]:
len(d)

3

In [69]:
d['b']

'beta'

In [70]:
d['s']

KeyError: 's'

Python lists and strings are indexed by position, e.g., 0, 1, 2.  Python dicts are indexed by key.  Above we assign the string 'alpha' to the key 'a'.  We access the values using these keys as well.

Dicts, like lists, but unlike tuples, are mutable.

In [71]:
d['b'] = 'new value'

In [72]:
d['b'] 

'new value'

In [73]:
d

{'a': 'alpha', 'b': 'new value', 'c': 'gamma'}

In [74]:
d['z']

KeyError: 'z'

In [75]:
'a' in d

True

In [76]:
'z' in d

False

If you ask for a key that is not in the dict, Python raises a KeyError.  You may use the <b>in</b> operator first to check whether the key is in the dictionary.
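The two usual ways to guard a lookup can be sketched as follows: test with in before indexing, or index and catch the KeyError. The helper names lookup and lookup_try are hypothetical and exist only for this illustration.

```python
d = {'a': 'alpha', 'b': 'beta', 'c': 'gamma'}

def lookup(table, key):
    """Check with `in` first, then index (hypothetical helper)."""
    if key in table:
        return table[key]
    return 'no entry for ' + repr(key)

def lookup_try(table, key):
    """Index first and catch the KeyError (hypothetical helper)."""
    try:
        return table[key]
    except KeyError:
        return 'no entry for ' + repr(key)

print(lookup(d, 'a'))
print(lookup(d, 'z'))
print(lookup_try(d, 'z'))
```

Both helpers return the same results; which style reads better is largely a matter of taste.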

By default, iterating over a dict uses the keys.

In [77]:
for key in d:
     print (key)

a
b
c


In [78]:
for key in d:
    print (d[key])

alpha
new value
gamma


In [79]:
[k for k in d]

['a', 'b', 'c']

### dict methods

Using dir() we can see other native dict methods.

In [80]:
dir(d)

['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__ror__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [81]:
d.keys()

dict_keys(['a', 'b', 'c'])

In [82]:
type(d.keys())

dict_keys

In [83]:
list(d.keys())

['a', 'b', 'c']

In [84]:
for key in d.keys():
    print (key)

a
b
c


In [85]:
for value in d.values():
    print (value)

alpha
new value
gamma


In [86]:
d.items()

dict_items([('a', 'alpha'), ('b', 'new value'), ('c', 'gamma')])

In [87]:
for key, value in d.items():
    print (key, value)

a alpha
b new value
c gamma


### Tuple Assignment

In [88]:
(a, b, c) = 1,2,3

In [89]:
a

1

In [90]:
b

2

In [91]:
c

3

In [92]:
(a,b) = b,a

In [93]:
b

1

In [94]:
a

2

We were able to swap a and b without using a temporary variable.

### copy()

We will try the following code in <a target=wiwiwi href="https://pythontutor.com/">PythonTutor</a>

<pre>
d = {}
d['a'] = 'alpha'
d['b'] = 'beta'
d['c'] = 'gamma'
newd = d.copy()
</pre>


In [95]:
newd = d.copy()

In [62]:
newd

{'a': 'alpha', 'b': 'new value', 'c': 'gamma'}

In [35]:
d == newd

True

In [36]:
newd['a'] = 'new alpha'

In [37]:
newd

{'a': 'new alpha', 'b': 'new value', 'c': 'gamma'}

In [38]:
d == newd

False

In [39]:
d

{'a': 'alpha', 'b': 'new value', 'c': 'gamma'}
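The cells above show that assigning to a key in newd leaves d untouched. One related point is that copy() makes a shallow copy: the new dict gets its own set of keys, but the values themselves are shared. The sketch below uses a list value to make the sharing visible, and uses copy.deepcopy from the standard library for contrast. The dict contents are made up for this example.

```python
import copy

original = {'letters': ['a', 'b']}
shallow = original.copy()          # new dict, same list object inside
deep = copy.deepcopy(original)     # new dict and a new list

shallow['letters'].append('c')     # mutates the list shared with original

print(original)   # the shared list changed here too
print(deep)       # the deep copy is unaffected
```

With immutable values such as the strings used in this notebook, the difference never shows up, because a string cannot be changed in place.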

In [40]:
newd.clear()

In [41]:
newd

{}

In [42]:
keys = ['a', 'b', 'c']

In [43]:
d2 = dict.fromkeys(keys)

In [44]:
d2

{'a': None, 'b': None, 'c': None}

In [45]:
d3 = dict.fromkeys(keys, 'something')

In [46]:
d3

{'a': 'something', 'b': 'something', 'c': 'something'}

fromkeys() is a <b>class method</b>.  It is invoked using the class, not an instance of the class.  It creates a new dict with the given keys, each mapped to the same value.  If no value is specified, it uses None, a special Python value.

In [47]:
d

{'a': 'alpha', 'b': 'new value', 'c': 'gamma'}

In [48]:
d.get('a')

'alpha'

In [49]:
d['a']

'alpha'

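get() is an alternative to the square bracket notation.  The difference shows up when the key is missing: square brackets raise a KeyError, while get() returns None, or a default value passed as a second argument.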
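A short sketch of how get() behaves when the key is missing, using the same dict contents as the cells above. The fallback string 'unknown' is chosen only for illustration.

```python
d = {'a': 'alpha', 'b': 'new value', 'c': 'gamma'}

found = d.get('a')                 # key present: same result as d['a']
missing = d.get('z')               # key absent: None instead of a KeyError
fallback = d.get('z', 'unknown')   # key absent: the supplied default

print(found, missing, fallback)
print('z' in d)                    # get() does not add the key
```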

In [50]:
d.pop('c')

'gamma'

In [51]:
d

{'a': 'alpha', 'b': 'new value'}

In [52]:
d.pop('z')

KeyError: 'z'

pop(key) removes the item with the given key from the dict and returns its value.  If there is no item with the given key, Python raises a KeyError.
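pop() also accepts an optional second argument that is returned when the key is missing, which avoids the KeyError. A minimal sketch, using None as the default:

```python
d = {'a': 'alpha', 'b': 'new value', 'c': 'gamma'}

removed = d.pop('c')          # key present: removed, value returned
absent = d.pop('z', None)     # key absent: default returned, no KeyError

print(removed, absent)
print(d)
```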

In [53]:
d.popitem()

('b', 'new value')

In [54]:
d

{'a': 'alpha'}

In [55]:
d.popitem()

('a', 'alpha')

In [56]:
d

{}

In [57]:
d.popitem()

KeyError: 'popitem(): dictionary is empty'

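popitem() removes the most recently inserted item from the dict and returns it as a (key, value) tuple.  If the dict is empty, Python raises a KeyError.  Note: pop(key) returns only the value and removes the item with the given key, wherever it sits in the dict; popitem() takes no key and always removes the last item.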
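Since Python 3.7, popitem() is guaranteed to remove items in last-in, first-out order. The sketch below drains a small dict and records the order in which the pairs come out. The keys and values are invented for this example.

```python
d = {}
d['first'] = 1
d['second'] = 2
d['third'] = 3

pairs = []
while d:                        # an empty dict is falsy, so the loop stops
    pairs.append(d.popitem())   # each call returns a (key, value) tuple

print(pairs)
```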

In [58]:
person = {'Name': 'Jon', 'Age': 10}

In [59]:
person

{'Name': 'Jon', 'Age': 10}

In [60]:
person.setdefault('Name', None)

'Jon'

In [61]:
person.setdefault('Gender','unknown')

'unknown'

In [62]:
person

{'Name': 'Jon', 'Age': 10, 'Gender': 'unknown'}

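The setdefault() method provides a way to handle missing keys in a dict, avoiding the dreaded KeyError.  If the key is present, it returns the existing value and leaves the dict unchanged.  If the key is missing, it inserts the key with the given default and returns that default.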
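A common use of setdefault() is building a dict of lists without checking for the key first. The following sketch groups words by their first letter. The word list and the name by_letter are made up for illustration.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = {}
for word in words:
    # Insert an empty list the first time a letter is seen,
    # then append to whichever list is stored under that letter.
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
```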

In [63]:
d = {1: 'one', 2: 'three'}

In [64]:
d2 = {2: 'two', 3: "three"}

In [65]:
d

{1: 'one', 2: 'three'}

In [66]:
d2

{2: 'two', 3: 'three'}

In [67]:
d.update(d2)

In [68]:
d

{1: 'one', 2: 'two', 3: 'three'}

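The update() method merges a second dictionary into the first, changing or inserting items as needed.  The first dict is modified in place; when both dicts have the same key, the value from the second dict wins.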
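The dir() listing earlier includes __or__ and __ior__, which back the | and |= operators available for dicts in Python 3.9 and later. Assuming such a version, this sketch contrasts | (which builds a new dict) with update() (which modifies the dict in place):

```python
d = {1: 'one', 2: 'three'}
d2 = {2: 'two', 3: 'three'}

merged = d | d2      # new dict; d and d2 are left alone (Python 3.9+)
print(merged)
print(d)             # still has the old value for key 2

d.update(d2)         # in place; same result as d |= d2
print(d)
```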

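### del statement

Finally, the del statement can be used with variables, list elements, and dict entries.  It removes the given item.  del is a statement rather than a function, so the parentheses in the cells below are optional: del x and del(x) do the same thing.  As they say in the CIA, it terminates with extreme prejudice.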

In [69]:
x = 4

In [70]:
x

4

In [71]:
del(x)

In [72]:
x

NameError: name 'x' is not defined

In [73]:
lst = [1,2,3,4]

In [74]:
del(lst[0])

In [75]:
lst

[2, 3, 4]

In [76]:
d 

{1: 'one', 2: 'two', 3: 'three'}

In [77]:
del(d[1])

In [78]:
d

{2: 'two', 3: 'three'}

## Files

### Video:

See <a target=tt href="https://www.socratica.com/lesson/text-files">Text Files</a> from Socratica

It is quite common for computer programs to use files for input and output.  That is, a program may read from a file (input) or write to a file (output).  It may also add to the end of an existing file (append).

The open(filename, mode) function is used for both reading and writing.  The filename specifies the actual file.  The mode is a string: 'r' (read), 'w' (write, truncating the file if it already exists), 'a' (append), or 'x' (create a new file, raising an error if it already exists).  You should call close() once you are finished reading or writing.
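The four modes can be tried side by side in a small sketch. It works inside a temporary directory from the standard tempfile module so that no real file is touched. The file name modes_demo.txt is hypothetical.

```python
import os
import tempfile

# A temporary directory keeps the demo away from real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modes_demo.txt')  # hypothetical file name

    f = open(path, 'x')      # x: create; the file must not exist yet
    f.write('first\n')
    f.close()

    f = open(path, 'a')      # a: add to the end of the file
    f.write('second\n')
    f.close()

    f = open(path, 'r')      # r: read
    after_append = f.read()
    f.close()

    try:
        open(path, 'x')      # the file exists now, so this raises
        x_raised = False
    except FileExistsError:
        x_raised = True

    f = open(path, 'w')      # w: discard the old contents, then write
    f.write('replaced\n')
    f.close()

    f = open(path, 'r')
    after_write = f.read()
    f.close()

print(repr(after_append))
print(x_raised)
print(repr(after_write))
```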

In [103]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position

In [104]:
f = open("testfile", 'w')

In [105]:
for x in "this is a test".split():
    f.write(x)
    f.write('\n')

In [106]:
f.close()

open() returns a file object.  In text mode, the file object is an iterator that yields the file one line at a time, with the trailing newline included in each line.

In [107]:
for line in open("testfile", "r"):
    print(line, end='')

this
is
a
test


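Note: the file is never closed explicitly here.  CPython usually closes it once the file object is garbage collected after the loop, but that is an implementation detail.  Calling close(), or opening the file in a with statement, is more reliable.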
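A with statement closes the file when its block ends, even if an error occurs inside the block. The sketch below rewrites the earlier write and read cells in that style. It assumes a temporary directory rather than the current working directory, so the path differs from the testfile used above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'testfile')

    # The file is closed automatically when the with block ends.
    with open(path, 'w') as f:
        for word in 'this is a test'.split():
            f.write(word + '\n')

    with open(path, 'r') as f:
        lines = [line.rstrip('\n') for line in f]

    closed_after = f.closed   # True: the with block closed the file

print(lines)
print(closed_after)
```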

In [86]:
f = open("testfile", 'a')
f.write("this is another line\n")
f.close()

In [87]:
for line in open("testfile", 'r'):
    print(line)

this

is

a

test

this is another line



### word count

We will now read in the file, split it up into words, and then use a dict to count how many times each word occurs.  We use the read() method which reads the entire file.

In [88]:
f = open("testfile", 'r')

In [89]:
lines = f.read()

In [90]:
lines

'this\nis\na\ntest\nthis is another line\n'

In [91]:
words = lines.split()

In [92]:
words

['this', 'is', 'a', 'test', 'this', 'is', 'another', 'line']

In [93]:
count = {}

In [94]:
for word in words:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1

In [111]:
count

{'this': 2, 'is': 2, 'a': 1, 'test': 1, 'another': 1, 'line': 1}
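The counting loop can be shortened with get(), and the standard library also offers collections.Counter for the same job. The sketch below starts from the string shown for lines above instead of reading the file, so it runs on its own.

```python
from collections import Counter

text = 'this\nis\na\ntest\nthis is another line\n'
words = text.split()

# get() supplies 0 for a word that has not been seen yet.
count = {}
for word in words:
    count[word] = count.get(word, 0) + 1

# Counter is a dict subclass that does the same counting in one call.
counter = Counter(words)

print(count)
print(counter.most_common(2))
```

Counter compares equal to a plain dict with the same contents, so either form can be used by code that expects a dict of counts.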

End of dict-files notebook.