## Dataframes
Another important technique in programming languages is creating "composite" variables, which
have several different parts. These are used to group related things together. In general these are called "structures" or "classes" in Python (as well as in many other programming languages).

Before we look at how to create general Classes and Structures, we will look at a specific example, the Dataframe. Dataframes are a particular example of a class/structure in Python. They are a very useful tool, and they show what can be done with the Classes, Structures and Functions that we will learn about.

Here we use Dataframes together with Numpy arrays to make some plots of sea level versus time for different locations.

Dataframes are very like spreadsheets, but geared toward use in a program.

In [None]:
# Check your conda environment to make sure pandas is installed. (In Anaconda Navigator this is easy.)
# ! conda install pandas
import pandas
import matplotlib.pyplot as plt

In [None]:
# Let's create a really simple Dataframe
mydataset = { 'course number': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] ,
              'course name': ["CEE", "Mech E", "DMSE", "Arch", "Chem", "EECS", "Bio", "Phys", "BCS", "Chem E"]}

In [None]:
myDF=pandas.DataFrame(mydataset)

In [None]:
# Now we have a simple table
myDF

In [None]:
# We can look up a course number!
myDF[myDF['course number']==10]
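
The lookup above works in two steps, which can be easier to see when they are pulled apart. This is only a small sketch on a made-up three-row table (the name `courses` is not part of the notebook): the comparison produces one True/False value per row, and indexing with that result keeps the True rows.

```python
import pandas

# Hypothetical three-row table, used only for illustration
courses = pandas.DataFrame({'course number': [1, 2, 3],
                            'course name': ["CEE", "Mech E", "DMSE"]})

# Comparing a column with a value gives one True/False entry per row
mask = courses['course number'] == 2
print(mask.tolist())

# Indexing the Dataframe with that mask keeps only the rows marked True
selected = courses[mask]
print(selected)
```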

In [None]:
# Note that we used a dictionary to initialize the Dataframe, e.g.
print("Type of variable mydataset =", type(mydataset))
# The dictionary keys are
print("Dictionary keys ", mydataset.keys())
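
To see how the dictionary maps onto the table, here is a minimal sketch with a made-up two-column dictionary (the names `small` and `small_df` are just for illustration). Each key becomes a column name and each list becomes that column's values.

```python
import pandas

# Hypothetical dictionary: each key names a column, each list holds that column's values
small = {'station': ["A", "B"], 'height_mm': [7010, 6985]}
small_df = pandas.DataFrame(small)

print(list(small.keys()))       # the dictionary keys
print(list(small_df.columns))   # the Dataframe column names
print(small_df.shape)           # (number of rows, number of columns)
```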

In [None]:
# As a "Class", the Dataframe package includes all sorts of nice features.
#
# For example it knows how to find and read HTML tables from a simple page. 
#
# Here is an example scraping some tide-gauge data measuring sea-surface height.
# Site -  https://www.psmsl.org maintains tide gauge data for sea-level for locations all around the world,
#         including Woods Hole.
#
# The site page "obtaining/" has a table that pandas can identify and read. 
#
# This table lists tide gauges all over the planet that measure sea level.
#
# read_html returns a list of dataframes, one per table found on the page. Here
# sl_list_read contains only one item (use ? sl_list_read to check).
# sl_list = sl_list_read[0] picks out that dataframe, with one row per
# tide gauge (check the size of the frame with ? sl_list).
if ( 'sl_list_read' in locals()) :
    print('sl_list_read already read')
else:     
    sl_list_read = pandas.read_html("https://www.psmsl.org/data/obtaining/")
    print('Getting sl_list_read from www.psmsl.org')
    
sl_list = sl_list_read[0]
sl_list

In [None]:
? sl_list

In [None]:
# As a "Class", a pandas Dataframe comes with useful functions for its kind of data.
# For example it includes code for searching its tables, which we can use to find
# data from Woods Hole.
# na=False counts rows with a missing station name as "no match"
swh=sl_list[sl_list['Station Name'].str.contains('WOODS', na=False)]
swh
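
The search above can be tried without the network on a tiny made-up table. In this sketch the station names, the IDs and the `example/` path are all placeholders, not PSMSL values. It also shows one way to pull a single ID out of the matching row as a plain number, which is what a `%d` format expects.

```python
import pandas

# Made-up station list; names and IDs are placeholders, not PSMSL values
stations = pandas.DataFrame({'Station Name': ["WOODS COVE", "NORTH PIER", None],
                             'ID': [101, 102, 103]})

# na=False treats the missing name as "no match", so the mask is purely True/False
matches = stations[stations['Station Name'].str.contains('WOODS', na=False)]
print(matches)

# .iloc[0] takes the first matching row's ID as a single number
station_id = int(matches['ID'].iloc[0])
print("example/%d.rlrdata" % station_id)
```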

In [None]:
# We can then read the Woods Hole data, extract it into a Numpy array
# and make a plot
# 1. construct station URL using ID field (per PSMSL documentation)
# 2. read data
# 3. convert table to an array of rows and columns
# 4. plot height v time
# .iloc[0] takes the ID of the first matching station as a single number
whurl="http://www.psmsl.org/data/obtaining/rlr.monthly.data/%d.rlrdata"%(swh['ID'].iloc[0])
print(whurl)
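# header=None: the file is assumed to have no header row, so the first line is kept as data
df = pandas.read_csv(whurl,delimiter=';',header=None)
npdat = df.to_numpy()
t = npdat[:,0]
h = npdat[:,1]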
plt.rcParams['figure.figsize'] = [20, 10]
plt.plot(t,(h-h[0])/10);
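
The read step can be tried offline by feeding `read_csv` a short piece of text instead of a URL. The sample below is made up, laid out as semicolon-separated values with no header row; the column names passed in `names` are hypothetical labels chosen for this sketch.

```python
import io
import pandas

# Made-up sample: semicolon-separated, no header row. Values are illustrative only.
sample_text = """1990.0417;7010;0;000
1990.1250;-99999;0;000
1990.2083;7025;0;000
"""

# header=None stops pandas from using the first data row as column names;
# names= supplies labels instead (hypothetical names, chosen for this sketch)
sample = pandas.read_csv(io.StringIO(sample_text), delimiter=';', header=None,
                         names=['year', 'height_mm', 'flag_a', 'flag_b'])
print(sample)

# to_numpy() turns the table into a plain array of rows and columns
print(sample.to_numpy().shape)
```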

In [None]:
# Uh-oh, that looks weird!
# Check the minimum - it has a "bad-data" flag by the looks of it.
print("Min h =",h.min())
# Fix and try again
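# header=None: the file is assumed to have no header row, so the first line is kept as data
df = pandas.read_csv(whurl,delimiter=';',header=None)
# Keep only rows whose height is above the bad-data flag value
df=df[df.iloc[:,1]>=-1000]
npdat = df.to_numpy()
t = npdat[:,0]
h = npdat[:,1]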
plt.rcParams['figure.figsize'] = [20, 10]
plt.plot(t,(h-h[0])/10);
plt.xlabel("Year",fontsize=20)
plt.ylabel("Tide Height (cm)",fontsize=20)
# Sea-level looks to have risen by about 30cm at the Woods Hole location over the last 100 years.
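
The fix used above, dropping flagged rows and then plotting height relative to the first reading, can be checked by hand on a few made-up numbers. This sketch assumes the raw heights are in millimetres, as the division by 10 and the cm axis label above imply; the values and the names `record` and `good` are invented for illustration.

```python
import pandas

# Made-up record in mm; -99999 stands in for a "bad-data" flag
record = pandas.DataFrame({'year': [2000.0, 2000.5, 2001.0, 2001.5],
                           'height_mm': [7000, -99999, 7020, 7050]})

# Keep only rows at or above the same cut-off used in the cell above
good = record[record['height_mm'] >= -1000]

t = good['year'].to_numpy()
h = good['height_mm'].to_numpy()

# Height relative to the first valid reading, converted from mm to cm
h_cm = (h - h[0]) / 10
print(t)
print(h_cm)
```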

In [None]:
# Now let's look at another location - this time near Finland
sfi=sl_list[sl_list['Station Name'].str.contains('FOGLO')==True]
print(sfi)
# .iloc[0] takes the ID of the first matching station as a single number
fiurl="http://www.psmsl.org/data/obtaining/rlr.monthly.data/%d.rlrdata"%(sfi['ID'].iloc[0])
print(fiurl)

In [None]:
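# header=None: the file is assumed to have no header row, so the first line is kept as data
df = pandas.read_csv(fiurl,delimiter=';',header=None)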
df=df[df.iloc[:,1]>=-1000]

In [None]:
npdat =df.to_numpy()

In [None]:
t = npdat[:,0]
h = npdat[:,1]

In [None]:
plt.rcParams['figure.figsize'] = [20, 10]
plt.plot(t,(h-h[0])/10)
plt.xlabel("Year",fontsize=20)
plt.ylabel("Tide Height (cm)",fontsize=20)
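
Reading a change "over 100 years" off a plot by eye is rough. One possible way to put a number on it is a straight-line fit, sketched here with Numpy's `polyfit` on a synthetic series. The 3 mm per year rate is an assumption chosen so the answer is easy to check; it is not a measured value for either station.

```python
import numpy as np

# Synthetic series: 100 years rising at exactly 3 mm per year (an assumed rate)
years = np.arange(1920, 2020)
height_mm = 7000.0 + 3.0 * (years - years[0])

# A degree-1 polyfit returns [slope, intercept] of the best straight line.
# Fitting against years since the start keeps the numbers small.
slope_mm_per_year, intercept = np.polyfit(years - years[0], height_mm, 1)

# Convert the slope to cm per century: times 100 years, divided by 10 mm per cm
change_cm_per_century = slope_mm_per_year * 100 / 10
print(round(change_cm_per_century, 1))
```

The same two lines of fitting code could be applied to the `t` and `h` arrays from the cells above, since `h` is already filtered for bad data there.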

In [None]:
# Any thoughts on why the series near Finland appears to go down by ~20cm over 100 years?