
import. They are also reserved words!
import but not an install.
We can test whether a given value is found in a list:
2 in [1,2,3,4] # Is 2 in the list?
True
0 in [1,2,3,4]
False
We can even scan for a value’s first occurrence:
[1,2,3,4].index(3) # In what position is 3?
2
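list.index() raises a ValueError when the value is not in the list, so a common pattern is to test with in first. A minimal sketch; the helper name position_of is hypothetical:

```python
def position_of(value, items):
    """Return the index of value's first occurrence in items, or None if absent."""
    # items.index(value) raises ValueError when value is missing, so guard with `in`
    if value in items:
        return items.index(value)
    return None

print(position_of(3, [1, 2, 3, 4]))  # 2
print(position_of(0, [1, 2, 3, 4]))  # None
```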
We can generate subsets of lists (or other sequences) using slicing:
alist[start_index : end_index]
start_index is the index of the first item to include.
end_index is the index just after the last item you want (the item at end_index is not included).
Use -1 to indicate the last index on the list, -2 the next to last, etc.
numbers = [1,"two",3.0,(1+1+1+1)]
numbers
[1, 'two', 3.0, 4]
numbers[1:2]
['two']
numbers[:2]
[1, 'two']
numbers[:-1]
[1, 'two', 3.0]
numbers[:-2]
[1, 'two']
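Omitting end_index runs the slice through the last item. Unlike plain indexing, a slice never raises IndexError: bounds past the end are simply clipped. A short check using the same numbers list:

```python
numbers = [1, "two", 3.0, (1 + 1 + 1 + 1)]

print(numbers[1:])    # ['two', 3.0, 4]  (omitted end: through the last item)
print(numbers[-2:])   # [3.0, 4]         (the last two items)
print(numbers[2:10])  # [3.0, 4]         (end past the list is clipped, no error)
print(numbers[5:])    # []               (start past the list gives an empty list)
```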
Try this yourself, one line at a time:
bunch = ["Mike", "Carol", ["Greg","Marcia", "Peter", "Jan", "Bobby", "Cindy"]]
bunch[2]
bunch[3]
bunch[2][1]
bunch[2][1][2]
bunch[1]
bunch[1][1]
bunch[1][1][1]
bunch = ["Mike", "Carol", ["Greg","Marcia", "Peter", "Jan", "Bobby", "Cindy"]]
bunch[2]
['Greg', 'Marcia', 'Peter', 'Jan', 'Bobby', 'Cindy']
bunch[3]
IndexError Traceback (most recent call last)
----> 1 bunch[3]
IndexError: list index out of range
bunch[2][1]
'Marcia'
bunch[2][1][2]
'r'
bunch[1]
'Carol'
bunch[1][1]
'a'
bunch[1][1][1]
IndexError Traceback (most recent call last)
----> 1 bunch[1][1][1]
IndexError: string index out of range
# Add / Append
area_codes = [212,646]
area_codes += [347, 718, 917, 929]
area_codes
[212, 646, 347, 718, 917, 929]
# Replace
area_codes[2] = 110
area_codes
[212, 646, 110, 718, 917, 929]
# Delete
del area_codes[2]
area_codes
[212, 646, 718, 917, 929]
# Insert
area_codes[2:2] = [347]
area_codes
[212, 646, 347, 718, 917, 929]
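For comparison, here is a sketch of the same kinds of edits using list methods (extend(), insert(), append(), and remove() are built-in list methods; the extra area code 332 is just sample data):

```python
area_codes = [212, 646]
area_codes.extend([347, 718, 917, 929])  # like area_codes += [347, 718, 917, 929]
area_codes[2] = 110                      # replace (no method needed)
del area_codes[2]                        # delete by position
area_codes.insert(2, 347)                # like area_codes[2:2] = [347]
area_codes.append(332)                   # like area_codes += [332]
area_codes.remove(917)                   # delete the first 917 (by value, not position)
print(area_codes)  # [212, 646, 347, 718, 929, 332]
```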
total = 0 # numerical accumulator variable (named total so it doesn't hide the built-in sum())
cpy = [] # list accumulator variable
for v in [2, 5, 2]:
    total += v # same as total = total + v
    cpy += [v] # same as cpy = cpy + [v]
print(total, cpy)
9 [2, 5, 2]
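Python also has a built-in sum() function that does the numeric accumulation in one call. That is also why a variable named sum is risky: it hides the built-in until the variable is deleted. A small sketch:

```python
values = [2, 5, 2]
print(sum(values))  # 9: the built-in sum() adds up the numbers in one call

sum = 0             # a variable named sum hides the built-in function...
try:
    sum(values)     # ...so calling it now fails
except TypeError:
    print("sum is now an int, not a function")

del sum             # deleting the variable uncovers the built-in again
print(sum(values))  # 9
```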
We can create a list containing the items of any sequence with the list() conversion function (like int(), float(), str()):
list( (1,2,3)) # from a tuple (more in a few minutes)
[1, 2, 3]
list("abcd") # from a string (more in a few minutes)
['a', 'b', 'c', 'd']
list(range(4)) # from a range object
[0, 1, 2, 3]
A list comprehension builds a list in a single expression. Its general form is:
[ <output exp> for <item> in <orig seq> if <boolean exp> ]
# Note that the keywords for/in/if separate the clauses
For example, the factors of x:
[i for i in range(1,x+1) if int(x/i)==x/i]
x=100
# Use an accumulator to build a list of factors
factors=[] # initialize an empty factors list
for i in range(1,x+1):
    if int(x/i)==x/i: # test for divisibility (no remainder)
        factors += [i] # add to factors list
print(factors)
# Now do it with a list comprehension
factors=[i for i in range(1,x+1) if int(x/i)==x/i]
print(factors)
[1, 2, 4, 5, 10, 20, 25, 50, 100]
[1, 2, 4, 5, 10, 20, 25, 50, 100]
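An alternative divisibility test uses the remainder operator %: x % i == 0 is True exactly when i divides x. This sketch swaps that in for the int(x/i)==x/i test, which relies on floating-point division and can misjudge very large values of x:

```python
x = 100
# i divides x exactly when the remainder x % i is zero
factors = [i for i in range(1, x + 1) if x % i == 0]
print(factors)  # [1, 2, 4, 5, 10, 20, 25, 50, 100]
```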
Appending (+=) list2 to list1 modifies list1.
Concatenating (+) list2 to list1 creates a new list.
# create two lists
list1 = [212,646]
list2 = [347,718,917,929]
list1 + list2 # Concatenate list2 to list1
[212, 646, 347, 718, 917, 929]
list1 # unchanged
[212, 646]
list1 += list2 # Append list2 to list1
list1 # changed
[212, 646, 347, 718, 917, 929]
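The difference shows up with the is operator, which tests whether two names refer to the same object. A minimal sketch (the name original is just for illustration):

```python
list1 = [212, 646]
list2 = [347, 718, 917, 929]

original = list1          # a second name for the same list object
list1 += list2            # += extends the existing list in place
print(list1 is original)  # True: still the same object

list1 = list1 + list2     # + builds a brand-new list and rebinds list1
print(list1 is original)  # False: list1 now names a different object
```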
A tuple is a fixed (literal) sequence of items enclosed with parentheses (instead of square brackets).
You've already seen them in math class:
Tuples work just like lists except they are immutable: their contents cannot be modified in any way after creation.
x = (1,2,3)
print(type(x), x)
<class 'tuple'> (1, 2, 3)
y = [1,2,3]
print(type(y), y)
<class 'list'> [1, 2, 3]
z = tuple(y)
print(type(z),z)
<class 'tuple'> (1, 2, 3)
(1,2,3) + (4,5) # Concatenate tuples
(1, 2, 3, 4, 5)
x += (4,5) # += works on tuples, but it builds a new tuple and rebinds x to it
x
(1, 2, 3, 4, 5)
(1,2,3)[:-1] # Slicing works fine
(1, 2)
( i*(i-1)/2 for i in range(10) ) # Parentheses give a generator expression, not a tuple comprehension
# Warning: it doesn't throw an error!
<generator object <genexpr> at 0x10e452780>
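To actually get a tuple from a comprehension-style expression, pass the generator expression to tuple() (or use square brackets to get a list):

```python
gen = (i * (i - 1) / 2 for i in range(5))  # a generator object, not a tuple
print(type(gen))  # <class 'generator'>

as_tuple = tuple(i * (i - 1) / 2 for i in range(5))
print(as_tuple)  # (0.0, 0.0, 1.0, 3.0, 6.0)
```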
Go back and look at function definitions again:
def add_two_numbers(a,b):
    return a+b
The parameter definitions are actually tuple-like:
x =(1,2)
y = x # y refers to the same tuple as x
x += (3,4) # builds a new tuple and rebinds x; y still refers to the old tuple
print(x)
print(y)
(1, 2, 3, 4)
(1, 2)
If we modify the original list, we also modify the alias, because both names refer to the same list object:
x = [1,2]
y = x # y is an alias for x
x += [3,4] # modify x
print(x)
print(y)
[1, 2, 3, 4]
[1, 2, 3, 4]
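To get an independent list instead of an alias, make a copy, for example with list() or a full slice x[:]. A short sketch:

```python
x = [1, 2]
y = list(x)  # y is a new list with the same items (x[:] would also work)
x += [3, 4]  # modifies only the list that x refers to
print(x)     # [1, 2, 3, 4]
print(y)     # [1, 2]
```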
String values are said to be literals (like tuples): once created, they cannot be changed. When displayed as values in the interpreter, they are shown with quotes:
"ABC"
'ABC'
Three kinds of string literals:
'ABC'
"I can say 'ABC'"
'''can span
multiple lines.'''
tuple("ABC")
('A', 'B', 'C')
"ABC"[1]
'B'
"ABC" + "DE"
'ABCDE'
"ABC" += "DE"
"ABC" += "DE"
^
SyntaxError: can't assign to literal
Strings get all of the tuple methods for free. However, they also come with lots of additional built-in methods. (RTFM.)
upper(), lower(), capitalize()
strip(), lstrip(), rstrip()
center(), ljust(), rjust()
count(), find(), rfind(), index(), rindex()
replace(), format()
Once again: strings are no more editable than tuples. upper(), strip(), etc. just return new strings.
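A quick sketch showing that these methods hand back new strings while the original stays untouched (the sample text is arbitrary):

```python
s = "  Go Stags!  "
t = s.strip().upper()           # strip() and upper() each return a new string
print(t)                        # GO STAGS!
print(t.replace("GO", "GOOO"))  # GOOO STAGS!
print(t.find("STAGS"))          # 3
print(repr(s))                  # '  Go Stags!  '  (the original is unchanged)
```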
We can split a delimited string into a list of items
"a,b,c".split(",") # produces ['a','b','c']
We can also do the reverse, building a delimited string from a list of items.
",".join(['a','b','c']) # produces 'a,b,c'
Many of the relational comparisons that apply to numbers also apply to strings using lexicographic ordering.
"XYZ" == "ABC"
False
"ABC" < "XYZ"
True
"XYZ" < "ABC"
False
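Lexicographic ordering compares strings character by character using each character's code (see ord()). As a result, all uppercase letters sort before all lowercase letters, and a prefix sorts before any longer string that extends it:

```python
print("apple" < "banana")  # True: 'a' comes before 'b'
print("Zebra" < "apple")   # True: ord('Z') is 90, ord('a') is 97
print("app" < "apple")     # True: a prefix sorts first
print(sorted(["banana", "Cherry", "apple"]))  # ['Cherry', 'apple', 'banana']
```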
for Loops
This works exactly like a tuple, list, or any other sequence:
s = "ABC"
# use s like a tuple or list
for c in s:
    print(c)
# does the same thing but more verbose
for i in range(len(s)):
    print(s[i])
This should be no surprise since strings are just a special kind of sequence.
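When you need both the position and the character, the built-in enumerate() supplies (index, item) tuples, which avoids the range(len(s)) pattern:

```python
s = "ABC"
# enumerate() yields (index, character) tuples, unpacked into i and c
for i, c in enumerate(s):
    print(i, c)  # prints 0 A, then 1 B, then 2 C on separate lines
```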
while Loops
while loops work just like before.
s = "ABC"
i = 0
while i < len(s):
    print(i,s[i])
    i += 1
We can, of course, do lots more inside the loop body. This is just an example.
in (and not in)
We have already seen in used in for loops:
for c in s:
    print(c)
Technically, the in of a for loop header is part of the loop syntax, but the same keyword also works as an operator: the expression c in s is a boolean expression that tests whether the value of variable c is inside the sequence s.
It works fine with literals as well:
"B" in "ABCD" # True
"B" not in "ABCD" # False
1 in [2,1,3] # True
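For strings, in looks for a substring, not just a single character. For lists, it compares against each element, so a sublist only counts if it is itself an element:

```python
print("BC" in "ABCD")         # True: a contiguous substring
print("AC" in "ABCD")         # False: the characters are not adjacent
print([1, 2] in [1, 2, 3])    # False: no element equals [1, 2]
print([1, 2] in [[1, 2], 3])  # True: the first element is [1, 2]
```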
Regular Expressions (regex) are a very, very old idea first explored in the early 1950s.
Python supports regular expressions through the re standard library:
import re
# the string to be searched; in this case a slightly mangled line of course info
base_string = "34379,AY,0010,01,Intro Four-Field Anthropology,3 TF 1100-1215pm,Lacy S,WDIV"
# the regex pattern; compiled before use
pattern = re.compile('(TBA|[Bb]y [Aa]rrangement|[Oo]nline|[MTWRFSU]+ [0-9]{4}-[0-9]{4}[PpAa][Mm])')
# find all pattern matches in the base string; returns a list of strings
timecodes = pattern.findall(base_string)
timecodes
['TF 1100-1215pm']
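When a pattern contains several groups, findall() returns a list of tuples with one string per group. This sketch reuses base_string with a hypothetical variant of the time pattern that captures the days, start time, end time, and am/pm suffix separately; search() returns the first match as a Match object (or None):

```python
import re

base_string = "34379,AY,0010,01,Intro Four-Field Anthropology,3 TF 1100-1215pm,Lacy S,WDIV"

# Hypothetical pattern with four groups: days, start time, end time, am/pm suffix
pattern = re.compile(r"([MTWRFSU]+) ([0-9]{4})-([0-9]{4})([PpAa][Mm])")
print(pattern.findall(base_string))  # [('TF', '1100', '1215', 'pm')]

match = pattern.search(base_string)  # first match, or None if there is none
if match:
    print(match.group(1), match.group(2))  # TF 1100
```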
Dictionaries are a lot like lists:
However, instead of using numerical indexes (0, 1, 2, ...), dictionaries use keys, which can be any immutable type (though we usually use strings).
Instead of [ and ], dictionaries are bracketed with { and }:
{ <key-1> : <value-1>, <key-2> : <value-2>, ... }
Each key:value pair represents one term in the dictionary:
For example,
nums = {'one':1,'two':2,'three':3}
nums['one']
1
We can even use integer keys, just like a list.
listlike_dict = {1:'a',2:'b',3:'c'}
listlike_dict[2]
'b'
Here is the md5 hash function applied to ‘Go Stags!’:
import hashlib
hashlib.md5(b'Go Stags!').hexdigest()
'59a060123aeddcba30023c46396aa5d8'
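Keys must be hashable, which in practice means immutable: a tuple works as a key, but a list raises TypeError. A small sketch (the grid dictionary is hypothetical):

```python
grid = {(0, 0): "origin", (1, 2): "point"}  # tuples are immutable, so they work as keys
print(grid[(1, 2)])  # point

message = None
try:
    bad = {[1, 2]: "point"}  # a list is mutable, so it cannot be a key
except TypeError as err:
    message = str(err)
print(message)  # the exact wording varies by Python version, but it mentions "unhashable"
```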
nums = {'one':1,'two':2,'three':3}
nums['four'] = 4 # add a key-value pair
nums
{'one': 1, 'two': 2, 'three': 3, 'four': 4}
nums['four']= 5 # Replace an item
nums
{'one': 1, 'two': 2, 'three': 3, 'four': 5}
del nums['four'] # Delete an item
nums
{'one': 1, 'two': 2, 'three': 3}
nums['one':'four'] # An attempt to slice
TypeError Traceback (most recent call last)
----> 1 nums['one':'four'] # An attempt to slice
TypeError: unhashable type: 'slice'
(Python 3.12 and later make slice objects hashable, so there the same line raises KeyError instead. Either way, dictionaries cannot be sliced.)
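Looking up a missing key with [] raises KeyError. Two common ways to avoid that are testing with in first, or using get(), which returns a default instead:

```python
nums = {'one': 1, 'two': 2, 'three': 3}
print('four' in nums)       # False: in checks the keys
print(nums.get('four'))     # None: no KeyError
print(nums.get('four', 0))  # 0: the supplied default
print(nums.get('two', 0))   # 2: key present, so its value
```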
nums = {'one':1,'two':2,'three':3}
ints = nums # ints is an alias of nums
ints
{'one': 1, 'two': 2, 'three': 3}
del nums['three']
ints
{'one': 1, 'two': 2}
ints = dict(nums) # ints is a copy of nums
nums['three']=3
ints
{'one': 1, 'two': 2}
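dict(nums) makes a shallow copy: the new dictionary has its own key-value pairs, but mutable values such as lists are still shared. copy.deepcopy() from the standard library copies those too. A sketch with a hypothetical roster dictionary:

```python
import copy

roster = {'kids': ['Greg', 'Marcia']}
shallow = dict(roster)        # new dict, but the same inner list
deep = copy.deepcopy(roster)  # new dict and a new inner list

roster['kids'] += ['Peter']   # mutates the shared inner list in place
print(shallow['kids'])  # ['Greg', 'Marcia', 'Peter']
print(deep['kids'])     # ['Greg', 'Marcia']
```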
Dictionaries come with a few methods designed to make iteration really simple:
nums = {'one':1,'two':2,'three':3}
print(nums.keys())
print(nums.values())
for (k,v) in nums.items(): # steps through each key:value pair
    print(k, "maps to", v)
dict_keys(['one', 'two', 'three'])
dict_values([1, 2, 3])
one maps to 1
two maps to 2
three maps to 3
FYI: the (k,v) in the for loop is a tuple. Each pass through the loop gets a fresh (k,v) tuple from items().
zip() Function
The zip() function pairs up corresponding items in two equal-length lists; passing those pairs to dict() turns them into key-value pairs.
columns = ['first_name', 'last_name']
dict( zip(columns, ['Al','Gebra']) )
{'first_name': 'Al', 'last_name': 'Gebra'}
dict( zip(columns, ['Betty','Boop']) )
{'first_name': 'Betty', 'last_name': 'Boop'}
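If the two lists differ in length, zip() quietly stops at the shorter one, so the extra items are dropped:

```python
columns = ['first_name', 'last_name']
print(dict(zip(columns, ['Al'])))                    # {'first_name': 'Al'}
print(dict(zip(columns, ['Al', 'Gebra', 'extra'])))  # {'first_name': 'Al', 'last_name': 'Gebra'}
```

In Python 3.10 and later, passing strict=True to zip() raises a ValueError on unequal lengths instead of silently truncating.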
Most datasets are organized into tables. Consider, for example, the typical CSV file:
This suggests a simple data structure: a list of dictionaries, one per row, keyed by column name:
columns = ['ID','Waist','Hip','Gender']
table = []
table += [dict(zip(columns,[1,30,32,'M']))]
table += [dict(zip(columns,[2,32,37,'M']))]
table += [dict(zip(columns,[3,30,36,'M']))]
table
[{'ID': 1, 'Waist': 30, 'Hip': 32, 'Gender': 'M'},
{'ID': 2, 'Waist': 32, 'Hip': 37, 'Gender': 'M'},
{'ID': 3, 'Waist': 30, 'Hip': 36, 'Gender': 'M'}]
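As an aside, the standard library's csv module can build this same list-of-dictionaries structure from a file: csv.DictReader uses the header row as the keys. In this sketch an in-memory string (via io.StringIO) stands in for a real CSV file, with the header and rows taken from the table above. Note that DictReader returns every value as a string:

```python
import csv
import io

# Stand-in for the contents of a CSV file (same rows as the table above)
csv_text = "ID,Waist,Hip,Gender\n1,30,32,M\n2,32,37,M\n3,30,36,M\n"

table = list(csv.DictReader(io.StringIO(csv_text)))
print(table[0])  # {'ID': '1', 'Waist': '30', 'Hip': '32', 'Gender': 'M'}

# Convert the numeric columns yourself if you need numbers
for row in table:
    for col in ('ID', 'Waist', 'Hip'):
        row[col] = int(row[col])
print(table[1])  # {'ID': 2, 'Waist': 32, 'Hip': 37, 'Gender': 'M'}
```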

A relative path navigates to the file starting from the current working directory, which is usually (but not always) the folder where your Python program resides.
data1.txt
../MyData/data2.txt
../../otherFiles/ExtraData/data4.txt
Each segment of the path is an instruction:
<subfolder name>/ : go down into that subfolder
../ : go up to the parent folder
<file name> : the file to open, in the folder reached so far
Try it: for each path above, say exactly what navigation directions are being given.
Due to a poor design decision by IBM in the early 1980s, Windows uses \ (backslash) instead of / (slash) in its file paths.
In modern versions of Windows, slash and backslash can be used fairly interchangeably. However, there are bugs ...
Fortunately, Python's os and os.path modules provide utilities for building OS-independent (canonical) paths in your code.
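A small sketch with os.path.join(), which glues path segments together using the separator for whatever OS the code runs on; basename() and dirname() split a path back apart:

```python
import os.path

# Build ../MyData/data2.txt from its segments; join() inserts the right separator
path = os.path.join("..", "MyData", "data2.txt")
print(path)                    # ../MyData/data2.txt on macOS/Linux, ..\MyData\data2.txt on Windows
print(os.path.basename(path))  # data2.txt
print(os.path.dirname(path))   # ../MyData (with the platform's separator)
```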


bash-2.3$ is a prompt from the Bash 'shell'. Bash is yet another language (complete with variables, loops, conditionals, etc.) that is commonly used for terminal sessions on Unix systems. pwd is a bash command to display the present working directory.
We can also give ls a path to see things in other folders. In this case the path is a subfolder of the current folder.
The current folder itself is . (dot) (or, if you prefer, ./). If no path is supplied, ls assumes you meant ./.
To move into another folder, use the cd (change directory) command, again supplying a path. cd without a path argument returns us to the user's home folder.
The parent of the current folder is .. (dot-dot); its parent is ../../, and so on.
The open() function returns a Python file object that we can use to read data from, or write data to, a file. The file object's close() method closes it when we are done.
f_in=open(<filepath>,"r") # open for reading
# read data
f_in.close() # close the file when done reading
f_out=open(<filepath>,"w") # open for writing
# write data
f_out.close() # close the file when done writing
To avoid data corruption, never open a file for reading and writing at the same time.
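Read every line at once into a list:
lines = list(f)
Step through the lines one at a time with a for loop:
for current_line in f:
    # do something with current_line
Or read lines on demand with readline() or readlines():
f.readline() # read the next line
f.readlines() # read all remaining lines into a list
f.readlines(hint) # read whole lines, stopping once their total size exceeds hint characters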
Each line of the file is a string. Typically, a data file will use a delimiter character like a tab, space, or comma to divide each line (string) into fields:
We use the string split() method to generate a list of strings (one per field) for each line.
fields = line.split(',')
The string for the last field on a line will have a newline character (or two!) at the end. You'll need to strip that out yourself. (RTFM.)
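A sketch of splitting one line, assuming a comma-delimited line like the rows of the table earlier. rstrip("\r\n") removes any trailing carriage-return and newline characters before the split:

```python
line = "1,30,32,M\n"    # one line, as read from a file
print(line.split(","))  # ['1', '30', '32', 'M\n']: the newline sticks to the last field

fields = line.rstrip("\r\n").split(",")  # strip trailing \r and \n, then split
print(fields)  # ['1', '30', '32', 'M']
```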
Writing to a text file is similar to reading from one, except that you have to explicitly write a newline character \n at the end of each line.
f=open("myfile.txt","w")
f.write("a line of text\n")
f.close()
Always, always close the file when you are done writing. Python buffers the data you write and sends it to the file in chunks instead of one character at a time. Closing the file forces Python to write the last chunk.
with Statements
To avoid ever forgetting to close a file, use a with statement:
with open(<filepath>) as <file alias>:
    # do something with the file
For example ...
with open("myfile.txt") as f:
    f.readline() # read the first line
Python automatically calls close() for us at the end of the with statement body.
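Putting the pieces together, here is a sketch that writes two lines and reads them back, with with statements handling close(). It works inside a temporary folder (tempfile.TemporaryDirectory) so it does not touch your own files; the file name myfile.txt is just an example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")

    with open(path, "w") as f_out:  # closed automatically when this block ends
        f_out.write("a line of text\n")
        f_out.write("another line\n")

    with open(path) as f_in:        # default mode is "r" (read)
        lines = [line.rstrip("\n") for line in f_in]

print(lines)  # ['a line of text', 'another line']
```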
The following is due before class next week:
Please email chuntley@fairfield.edu if you have any problems or questions.