A brief introduction to the Python programming language¶
What is Python?¶
Python is a "high-level" programming language - one that is fairly easy for a human to learn to understand and use.
It relies on a program called an "interpreter" which "translates" human-readable instructions that we write in the Python language into machine code that can be used to directly manipulate computer hardware.
It is the most popular programming language at present, by many measures. It is especially popular with scientific researchers for its readability, relative simplicity (as compared to other programming languages), and because of the many, many "libraries" of useful Python programs that have been authored and open-sourced (meaning we can all use them for free).
Basics¶
Let's begin with the print function. A call to print (often loosely called a "print statement") accepts any piece of information that we can render as text and displays it for us.
Calls to print are useful both for displaying information to the user of a program and for debugging (figuring out why your program isn't working).
You can run the following code cells by clicking the "play" button, or using Shift+Enter.
The first cell you run will take a while as Google Colab sets up a connection for you.
# You can pass in a piece of text and display it.
print('Hello, world!')
# You must surround that text with single or double quotes.
print("Hello, world!")
Note that the above two lines of code do the exact same thing. In Python, a string (an ordered sequence of text characters) can be specified using either single quotes or double quotes. Both are fine!
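One practical reason to have both quote styles is sketched below: a string written with one kind of quote can contain the other kind with no extra effort. The example sentences are made up for illustration.

```python
# Double quotes on the outside let us use an apostrophe inside the text.
print("It's a nice day.")

# Single quotes on the outside let us use double quotes inside the text.
print('She said "hello" to me.')
```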
We can check whether something is a string, a number, or any other sort of data type using the type function.
# The type() function tells you that 'Hello, world!' is a string: type str.
print(type('Hello, world!'))
# Single or double quotes - it's still a string!
print(type("Hello, world!"))
Common data types¶
Besides the str type for text, some other important data types are int (integers), float (numbers with decimal points), and bool (booleans, which take the values True and False).
# Integers are numbers with no decimal point.
print('the type of 37 is:', type(37))
# Floating point numbers have a decimal point somewhere.
print('the type of 37.0 is:', type(37.0))
# Actually, nothing needs to follow the decimal point!
print('the type of 37. is:', type(37.))
# Note that True is capitalized and does not have quotes!
print('the type of True is:', type(True))
# Same for False!
print('the type of False is:', type(False))
You may have noticed by this point that text in the code cells that follows a hash (#) symbol is just more writing - it is not treated as code!
These are referred to as comments - text that is intended to be ignored by the Python interpreter.
Comments are important because they allow us to remember what a piece of code does if we come back to it after a long time. They also help other programmers understand our code without looking too closely.
Mathematics¶
Computers are mainly designed to work with numbers, or things representable as numbers. Here are the basic mathematical operations that Python offers.
print('7 + 3 =', 7 + 3) # Addition.
print('7 - 3 =', 7 - 3) # Subtraction.
print('3 - 7 =', 3 - 7) # Negatives are fine!
print('7 * 3 =', 7 * 3) # Multiplication uses an asterisk.
print('7 / 3 =', 7 / 3) # Division uses a forward slash.
print('7 ** 3 =', 7 ** 3) # Power (exponent) - seven to the third power.
print('7 // 3 =', 7 // 3) # Floor division - 7/3, rounded down to a whole number.
print('7 % 3 =', 7 % 3) # Modulo - 7/3, and give just the remainder.
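The // and % operators are easiest to understand together. The sketch below, using negative numbers chosen purely for illustration, shows that // rounds down (toward negative infinity) rather than toward zero, and that the quotient and remainder always recombine into the original number.

```python
# With positive numbers, rounding down looks like "chopping off the decimal".
print('7 // 3 =', 7 // 3)        # 2

# With a negative result, rounding down moves away from zero.
print('-7 // 3 =', -7 // 3)      # -3, because -2.33... rounds down to -3.

# The remainder takes the sign of the number we divide by.
print('-7 % 3 =', -7 % 3)        # 2

# Quotient and remainder always fit back together:
print('(7 // 3) * 3 + (7 % 3) =', (7 // 3) * 3 + (7 % 3))        # 7
print('(-7 // 3) * 3 + (-7 % 3) =', (-7 // 3) * 3 + (-7 % 3))    # -7
```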
Boolean logic (True and False)¶
There are many expressions and operators for deciding if a statement is True or False. This is important, because it is a simple way for the program to make different decisions in different circumstances! We'll see how that works in the next section.
print('7 == 3:', 7 == 3) # Check for equality - note double equal sign!
print('7 != 3:', 7 != 3) # Not equal - exclamation points often mean "not" in programming.
print('7 > 3:', 7 > 3) # Greater.
print('7 < 3:', 7 < 3) # Less.
print('7 >= 3:', 7 >= 3) # Greater OR equal.
print('7 <= 3:', 7 <= 3) # Less OR equal.
Additionally, we can combine several conditions together using logical operators (and, or, and not):
print(' True and True: ', True and True)
print(' True or not True: ', True or not True)
print('not False and True: ', not False and True)
print(' False or False: ', False or False)
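When several logical operators appear in one expression, Python applies not first, then and, then or. The short sketch below adds parentheses to make that order visible; the specific expressions are just illustrations.

```python
# "not" binds most tightly, so this reads as: True or (not True).
print('True or not True:', True or not True)                    # True

# "and" is applied before "or", so this reads as: True or (False and False).
print('True or False and False:', True or False and False)      # True

# Parentheses override the default order.
print('(True or False) and False:', (True or False) and False)  # False
```

When in doubt, adding parentheses costs nothing and makes the intent clear to other readers.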
Using boolean logic in a program¶
Let's have the program make a choice! We let a program make decisions using if, elif (short for else-if) and else statements.
An if statement checks whether a condition is True or False. If the condition is True, the code associated with that if statement is executed.
We may optionally add elif or else statements, but these are never required.
We associate code with each "branch" of the if-elif-else statements by indenting it (starting each line with the same amount of leading whitespace, typically by pressing Tab). In general, code that belongs to a particular context is indented once for each level of nesting.
NOTE: In any given collection of if-elif-else statements, only ONE of the branches is executed.
Try playing with the condition below and see how this works!
# Store the value of the statement (5 > 3) into a variable.
# The variable name is "condition".
#
# The following line of code can be read as:
# "Store the value of the expression (5 > 3) into the condition variable."
#
# Note the single equals sign (=) which means:
# "Store the value on the right into the variable on the left."
condition = (5 > 3)  # Evaluates the expression (5 > 3), then stores the result.
if condition:  # An if statement checks if a condition is True.
    print('5 is greater than 3.')  # This line is indented,
    print('Condition evaluates to:', condition)  # and this line too!
else:  # If the condition is False, the else branch is executed instead.
    print('5 is not greater than 3.')  # Indented once again, this time for the else branch,
    print('Condition evaluates to:', condition)  # and again!
In the above example, note that we did not put quotes around the word "condition" when writing our code.
This is because condition is not a string or a piece of text here - it is the name of a variable.
A variable is a way to refer back to a stored value. Whenever we use that variable, we access and operate on its value.
A slightly more complicated example is below with 3 branches. Try playing with values of x and y to see what happens.
x = 7 # Store the value 7 into the variable x.
y = 2 # Store the value 2 into the variable y.
if x > y:  # You can evaluate the condition right in the if statement!
    print('x is more.')
elif y > x:  # Same for an elif statement.
    print('y is more.')
else:  # If neither is more than the other, they must be equal!
    print('x and y are the same.')
Note that only the first branch whose condition evaluates to True is executed, even if the condition of a later branch would also be True!
x = 6 # Now, store the value 6 into the variable x.
y = 6 # Also, update y to contain the value 6.
if x == y:  # If x and y are both 6, this is True.
    print('x equals y.')
elif x >= y:  # If x and y are both 6, this is also True, but is skipped.
    print('x is greater than y or equal to y.')
else:
    print('x is less than y.')
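Because the first True branch wins, the order of the branches matters. Here is a small sketch of that idea; the score value and the grade thresholds are hypothetical and chosen only for illustration.

```python
score = 85  # Hypothetical value; try changing it.

# Conditions are checked from top to bottom, so the thresholds
# are listed from highest to lowest.
if score >= 90:
    grade = 'A'
elif score >= 80:
    grade = 'B'  # 85 lands here; the checks below are never reached.
elif score >= 70:
    grade = 'C'
else:
    grade = 'below C'

print('score:', score, 'grade:', grade)
```

If the branches were listed from lowest to highest instead, a score of 85 would already satisfy score >= 70 and would be labeled 'C'.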
Data structures¶
A data structure is a data type that is used to organize, access, and store pieces of data.
Python implements a few basic ones. Many others are available through standard-library modules such as collections and through third-party libraries on PyPI.
The basic ones are often powerful enough for most purposes.
Lists¶
Lists are ordered sequences of data. They are mutable (you can update them after creating them by adding more items, removing items, replacing items, etc).
A list is specified using square brackets [] and comma-separating the entries.
# Creating a list:
x = [5, 6, 7, 8, 9, 10, 11, 12]
print(' original list: ', x)
# Appending an entry to the end of a list:
x.append(13)
print(' after appending 13 to the end: ', x)
# Replacing an entry in a list - note that the first position is numbered 0!
x[2] = 30 # Updating the third position to have the value 30.
print('replaced third position (index 2) with value 30: ', x)
# Deleting an entry from a list:
del x[2]
print(' deleted that entry: ', x)
# Inserting an entry into a list:
x.insert(2, 7)
print(' inserted 7 at third position (index 2): ', x)
# "Slicing" = selecting some contiguous part of the list.
# Note that we are selecting indices 2-4 inclusive, which are
# the third, fourth, and fifth positions:
print(' selecting entries at positions 3, 4, and 5: ', x[2:5])
# Checking for membership of some element:
print(' checking whether 5 is in the list: ', 5 in x)
print(' checking whether 1 is in the list: ', 1 in x)
# Checking how many entries are in the list:
print(' the length of the list x: ', len(x))
Tuples¶
Tuples are just like lists, except for two key differences:
- They are specified using parentheses () instead of brackets.
- They are immutable (cannot be modified after being created).
# Creating a tuple:
x = (5, 6, 7, 8, 9, 10, 11, 12)
print(' original tuple: ', x)
# Slicing works just as it does for lists, but appending, replacing, and deleting entries do not!
print('selecting entries at positions 3, 4, and 5: ', x[2:5])
# Checking for membership of some element:
print(' checking whether 5 is in the tuple: ', 5 in x)
print(' checking whether 1 is in the tuple: ', 1 in x)
# Checking how many entries are in the tuple:
print(' the length of the tuple x: ', len(x))
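To see what "immutable" means in practice, the sketch below tries to replace an entry in a tuple. It uses try/except, which has not been covered here, only to catch the resulting error so that the rest of the cell keeps running; the variable names are illustrative.

```python
numbers = (5, 6, 7)

# Reading a single entry by index works, just as with a list.
print('first entry:', numbers[0])

# Trying to replace an entry raises a TypeError.
try:
    numbers[0] = 50
except TypeError as error:
    print('could not modify the tuple:', error)

# To get a modified sequence, make a list copy and change that instead.
as_list = list(numbers)
as_list[0] = 50
print('modified list copy:', as_list)
print('original tuple is unchanged:', numbers)
```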
Sets¶
Sets are also just like lists, except for three key differences:
- They are specified using braces {} instead of brackets.
- They are unordered - elements have no positions, so a set cannot be indexed or sliced, and the order in which elements were inserted is not guaranteed to be preserved.
- They do not allow duplicates - an element either is or is not in the set; it cannot be added multiple times.
x = {9, 8, 7, 6, 5}
print(' original set: ', x)
# Checking for membership of some element:
print(' checking whether 5 is in the set: ', 5 in x)
print(' checking whether 1 is in the set: ', 1 in x)
# Printing the elements contained in x:
print(' iterating over the contents of x: ', end=' ')
for element in x:
    print(element, end=' ')  # We also indent the code "inside" a for loop.
print()
# Comparing two sets that look different but aren't!
print(' x is the same as {9, 9, 8, 7, 6, 5}: ', x == {9, 9, 8, 7, 6, 5})
# Checking how many entries are in the set:
print(' the length of the set x: ', len(x))
# Joining two sets together:
print(' union of {1, 2, 3} with {4, 5, 6}: ', {1, 2, 3}.union({4, 5, 6}))
The construction we just used above is called a for loop.
A for loop is generally used when we want to repeat a piece of code a known number of times, such as once for each item in a data structure.
Rather than rewrite the same piece of code $n$ times with slight variations each time, we can write a single for loop that encodes all the same instructions!
The above for loop will assign the variable element to take every possible value contained in the data structure x, one after the other.
The code inside the for loop (clearly "inside" because it's indented) will be repeated for each of those assignments of element.
Therefore, the loop prints the value of each item in our set x, one after the other.
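As a small illustration of how a loop replaces repeated code, the sketch below adds up the entries of a list twice: once with one hand-written line per entry, and once with a for loop. The list contents and variable names are made up for the example.

```python
values = [5, 6, 7, 8]  # Hypothetical example data.

# Without a loop: one nearly identical line per item.
total_by_hand = 0
total_by_hand = total_by_hand + values[0]
total_by_hand = total_by_hand + values[1]
total_by_hand = total_by_hand + values[2]
total_by_hand = total_by_hand + values[3]

# With a loop: the same instruction, written once.
total = 0
for value in values:
    total = total + value  # Runs once for each item in values.

print('total by hand:', total_by_hand)
print('total by loop:', total)
```

The loop version also keeps working unchanged if the list grows to 100 entries.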
Dictionaries¶
A dictionary (dict) combines features of a set and a list: like a set, it holds a collection of unique keys, and like a list, it lets us look up stored data by indexing.
A dictionary can be thought of as a set of keys (no duplicate keys allowed). Keys are not numbered positions, although since Python 3.7 a dictionary does remember the order in which its keys were inserted.
Every key in our dictionary is associated with some piece of data, called a value. Keys are generally integers, strings, or some other simple, unchanging data type (lists and sets cannot be used as keys). Values can be anything you want.
We can index a dictionary by its keys and iterate over them. A for loop over a dictionary assigns each key in turn to the loop variable (called key in the example below).
x = {
'a': [1, 2, 3],
'b': (4, 5, 6),
'c': {7, 8, 9}
}
print(" x['a'] contains: ", x['a']) # Indexing with the string 'a' gives the list [1, 2, 3].
print(" x['b'] contains: ", x['b']) # Indexing with the string 'b' gives the tuple (4, 5, 6).
print(" x['c'] contains: ", x['c']) # Indexing with the string 'c' gives the set {7, 8, 9}.
# Assigning a value to a key in a dictionary:
x['d'] = 5
print("x['d'] created, contains: ", x['d'])
# Overwriting a value in a dictionary:
x['a'] = [11, 12, 13]
print("x['a'] updated, contains: ", x['a'])
# Iterating over a dictionary:
for key in x:
    print()
    print(' key: ', key)
    print(' value: ', x[key])
print()
# Checking how many entries are in the dict:
print('the length of the dict x: ', len(x))
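A few other dictionary operations come up often. The sketch below, with a made-up dictionary of names and ages, shows that membership checks look at keys rather than values, that .items() yields each key together with its value, and that .get() supplies a fallback for a missing key.

```python
ages = {'Ana': 31, 'Ben': 27}  # Hypothetical example data.

# "in" checks the keys, not the values.
print("'Ana' in ages:", 'Ana' in ages)  # True
print('31 in ages:', 31 in ages)        # False

# .items() gives each key together with its value.
for name, age in ages.items():
    print(name, 'is', age)

# Indexing with a missing key raises a KeyError;
# .get() returns a fallback value instead.
print("ages.get('Cy', 'unknown'):", ages.get('Cy', 'unknown'))
```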
Exercises Part I¶
Let's practice what we learned with some exercises.
- Define a variable called hello and assign it the string value "world".
# YOUR CODE HERE:
- Print the type and the value of the expression 7 % 3 == 1.
# YOUR CODE HERE:
- Assign a variable named condition any boolean expression you like (something that evaluates to True or False). Write a program that prints 1 if condition evaluates to True and prints 0 if condition evaluates to False.
# YOUR CODE HERE:
- Create an empty list called z. Use a for loop to append the integers 1 through 8 inclusive. Check that the length of the list is 8 when you finish.
HINT: The expression range(x, y) gives a data structure containing the integers $x$ through $y-1$, inclusive.
# YOUR CODE HERE:
- Define a variable name and assign it your first name (as a string) as its value. Define a dictionary called char_to_index.
Iterate over the characters in name and add key-value pairs to the char_to_index dictionary such that each character in your name is mapped to the index of that character in your name.
Example: My name is "Dan", so my dictionary would look like the following when my code finishes running:
char_to_index = {
"D": 0, # "D" is the first character in "Dan" (index 0)
"a": 1, # "a" is the second character in "Dan" (index 1)
"n": 2 # "n" is the third character in "Dan" (index 2)
}
Extra challenge: handle duplicates by letting each value be a list of integers rather than just one integer.
# YOUR CODE HERE:
Functions¶
Functions are among the most important and useful constructions in Python. They let you define custom behaviors and are highly reusable: write them once and use them as many times as you need to!
def average(numbers):  # Define the function "average". It expects an argument called "numbers".
    total = 0  # Start a sum off at 0.
    for number in numbers:  # Iterate over the given numbers.
        total = total + number  # Add number to the total (update the value of total).
    avg = total / len(numbers)  # Divide the total by the number of elements.
    return avg  # That's the average. Send that information back.
# Calling the average() function replaces the call with the value it returns.
print(average([2, 4])) # average([2, 4]) is a "call" to the average() function.
print(average([1, 2, 8, 9])) # Another call to the average function with different numbers.
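One thing to watch for: calling average([]) divides by zero and stops with a ZeroDivisionError. Below is a sketch of one way to guard against that. The name safe_average and the choice to return None for an empty list are assumptions made for this example, not the only reasonable design.

```python
def safe_average(numbers):
    """Return the average of numbers, or None if numbers is empty."""
    if len(numbers) == 0:  # Nothing to average, so avoid dividing by zero.
        return None
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)


print(safe_average([2, 4]))  # 3.0
print(safe_average([]))      # None
```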
Importing libraries¶
Pandas¶
pandas is a library that lets you define and use DataFrames, much like in R.
!pip install pandas
# Import the pandas library, refer to it as "pd".
import pandas as pd
df = pd.read_csv(  # Gives a DataFrame much like R.
    'sample_data/california_housing_test.csv',  # File is in the sample_data folder.
    sep=','  # Entries are comma-separated.
)
print(df.head(5)) # Show first 5 rows of the df.
SciPy¶
scipy is a library with a lot of statistical functions and other scientific tools.
# Import the pearsonr() function from the scipy.stats module.
from scipy.stats import pearsonr
print(  # Pearson correlation on two lists.
    pearsonr([1, 4, 3, 2, 5], [2, 1, 3, 4, 5])
)
print(  # Pearson correlation on two columns from our df.
    pearsonr(df['median_income'], df['median_house_value'])
)
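To make the number less of a black box, here is a plain-Python sketch of the Pearson correlation coefficient. It works on the raw values (not on their ranks): it measures how the two lists vary together, scaled by how much each varies on its own. The function name pearson_r is made up for this example, and unlike scipy's pearsonr it computes only the coefficient, not the p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (a sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = 0  # How x and y move together.
    spread_x = 0     # How much x varies on its own.
    spread_y = 0     # How much y varies on its own.
    for i in range(n):
        covariation += (xs[i] - mean_x) * (ys[i] - mean_y)
        spread_x += (xs[i] - mean_x) ** 2
        spread_y += (ys[i] - mean_y) ** 2
    return covariation / math.sqrt(spread_x * spread_y)


# Same two lists as in the scipy example; the result is approximately 0.3.
print(pearson_r([1, 4, 3, 2, 5], [2, 1, 3, 4, 5]))
```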
StatsModels¶
statsmodels is a library with various statistical modeling functionalities (regressions of all kinds, among other things).
# Import the api submodule of statsmodels, call it "sm".
import statsmodels.api as sm
# Note: sm.OLS does not add an intercept (constant) term automatically.
model = sm.OLS(  # Set up an ordinary least-squares linear regression.
    df['median_house_value'],  # Predict the house value column,
    df.drop(columns=['median_house_value'])  # using all the other columns.
)
result = model.fit()  # Fit the model to the data.
print(result.summary())
Transformers¶
transformers is a Hugging Face library that provides access to a huge number of open-source AI models that can do things like speaker diarization (segmenting audio into pieces depending on who is talking) and automatic speech recognition (speech-to-text transcription). Many of these models can run entirely on your own machine, so your data does not need to be sent to a third-party service.
You can try running Whisper on an audio file I provide here.
# Import the pipeline() function from the transformers library.
from transformers import pipeline
pipe = pipeline(  # Get a pipeline and assign it to the "pipe" variable.
    "automatic-speech-recognition",  # Task is speech-to-text.
    model="openai/whisper-large-v3-turbo"  # Use this model, Whisper Turbo.
)
print(pipe('audio.wav')) # Result of processing this audio file with Whisper Turbo.
Matplotlib¶
matplotlib is a library that serves as the basis for many data visualization (plotting) libraries, and it has many basic functionalities of its own.
# Import the pyplot module and call it "plt".
import matplotlib.pyplot as plt
plt.scatter(  # Scatter plot.
    df['median_income'],  # x-values.
    df['median_house_value'],  # y-values.
    s=2  # Make dots small, size 2.
)
plt.title('Median house value vs. median income') # Add a title.
plt.xlabel('Median income (scaled)') # Add an x-axis label.
plt.ylabel('Median house value') # Add a y-axis label.
plt.show() # Display the plot.
Plotly¶
plotly is a more advanced data visualization library that produces interactive visualizations.
# Import graph_objects module, call it "go".
import plotly.graph_objects as go
# Copied an example from the internet and changed it a bit.
fig = go.Figure(data=go.Scattergeo(
    locationmode = 'USA-states',
    lon = df['longitude'],
    lat = df['latitude'],
    hovertext = df['median_house_value'],
    mode = 'markers',
    marker = dict(
        size = 8,
        opacity = 0.8,
        reversescale = True,
        autocolorscale = False,
        symbol = 'square',
        line = dict(
            width=1,
            color='rgba(102, 102, 102)'
        ),
        colorscale = 'Viridis',
        color = df['median_house_value'],
        cmin = df['median_house_value'].min(),
        cmax = df['median_house_value'].max(),
        colorbar=dict(
            title=dict(
                text="Median house value"
            )
        )
    )))
fig.update_layout(
    title = 'California cities by median house value',
    geo = dict(
        scope='usa',
        projection_type='albers usa',
        showland = True,
        landcolor = "rgb(250, 250, 250)",
        subunitcolor = "rgb(217, 217, 217)",
        countrycolor = "rgb(217, 217, 217)",
        countrywidth = 0.5,
        subunitwidth = 0.5
    ),
)
fig.show()
Exercises Part II¶
- Write a function sum_even which takes a list of numbers called numbers and computes the sum of all the numbers in that list which are even.
HINT: You can check if a number x is even with the expression x % 2 == 0.
# YOUR CODE HERE:
- Write a function concatenate which takes a list of strings called strings and joins them together, end-to-end.
HINT: The + operator can be used on strings in a very intuitive way.
# YOUR CODE HERE:
- Head to the documentation for pd.DataFrame. Choose any three of the Methods available to a Pandas DataFrame, execute them, and describe what they do.
# YOUR CODE HERE:
- Head to the documentation for scipy.stats.shapiro and run a Shapiro-Wilk test on each of the columns in our df.
HINT: you can iterate over df.columns.
# YOUR CODE HERE:
- Head to the documentation for statsmodels.formula.api.mixedlm (usually written smf.mixedlm, after importing statsmodels.formula.api as smf). Use formula notation to model median_house_value as a function of median_income, grouping by 'housing_median_age'.
NOTE: The example they give in the documentation is almost exactly what you want, but note that our data is stored in df already - we don't need lines 1 or 3 in their example.
# YOUR CODE HERE:
[IF TIME] Classes¶
Classes are one of the trickiest concepts to deal with when learning to program.
The right way to think of a class is as a "blueprint" or a "template."
We can define a class to represent any kind of data we want - these are basically custom data types.
In Python, there is a class called int which implements the functionality of integers. Similarly, there are bool, str, and float classes.
We won't get into all the details of how this code works, but it is useful to see, and it will help us understand syntax that uses dot (.) notation, such as x.append(13) or df.head(5).
class Car:
    # When we call the Car() class, this function is executed.
    def __init__(
        self,
        color: str,  # Expects a str for "color".
        make: str,  # Expects a str for "make".
        model: str,  # Expects a str for "model".
        x: float,  # Expects a number for "x".
        y: float  # Expects a number for "y".
    ):
        self.color = color  # Store this car's color long-term.
        self.make = make  # Store this car's make long-term.
        self.model = model  # Store this car's model long-term.
        self.x = x  # Store this car's x-coordinate long-term.
        self.y = y  # Store this car's y-coordinate long-term.
        # Print a message after creating this car.
        print(f'Placed a {self.color} {self.make} {self.model} at: ({self.x}, {self.y}).')

    # When we call .drive(...) on the car, this function is executed.
    def drive(
        self,
        x_change: float,  # Expects a float for x_change.
        y_change: float  # Expects a float for y_change.
    ):
        self.x += x_change  # Add x_change to this car's x-coord.
        self.y += y_change  # Add y_change to this car's y-coord.
        # Print a message whenever we move this car.
        print(f'Drove the {self.color} {self.make} {self.model} car to: ({self.x}, {self.y}).')
car1 = Car('red', 'Porsche', '911', 1, 7) # Create a car and refer to it as "car1".
car2 = Car('black', 'Buick', 'GNX', 4, 2) # Create another car and refer to it as "car2".
car1.drive(-3, 5) # Drive 3 to the left, 5 up.
car2.drive(4, 7) # Drive 4 to the right, 7 up.
# Dot notation allows us to access info about car2.
print('What color is car2?', car2.color)
# Similar dot notation allows us to access similar info about car1.
print('What color is car1?', car1.color)
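The "blueprint" idea can be seen in an even smaller sketch. The Tally class below is hypothetical and exists only for this example; the point is that each object created from the class keeps its own separate data, so changing one does not affect the other.

```python
class Tally:
    # Runs when we call Tally(...) to create a new object.
    def __init__(self, label: str):
        self.label = label  # Each Tally remembers its own label...
        self.count = 0      # ...and its own count, starting at 0.

    # Runs when we call .add(...) on a Tally.
    def add(self, amount: int):
        self.count += amount


apples = Tally('apples')  # Two separate objects built from one blueprint.
pears = Tally('pears')

apples.add(3)
apples.add(2)
pears.add(1)

print(apples.label, apples.count)  # apples 5
print(pears.label, pears.count)    # pears 1
```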