Lesson 7: Databases

Filesystems Review Questions

  • What might happen to a busy ext2 volume on power loss?
  • ext3?
  • ext4?

Note

  • ext2 may not be consistent upon restart
  • ext3 and ext 4 are not
  • but consistency only guarantees metadata is intact
  • What might happen to a busy ext2 volume on power loss?
  • ext3?
  • ext4?

But what about our poor data?

  • Possibly gone, like the wind
  • Or worse: Half completed writes!
  • General purpose operating systems, by design, don’t understand structured data

Enter the Database

A database system’s fundamental goal is to provide consistent views of structured data using the tools the operating system makes available.

Chief among them is fsync(2)

Note

fsync instructs the operating system to flush all writes to disk before returning

Structure

SQL databases are structured around Relational Algebra

  • Tables
    • Columns are fields
    • Rows define a relation between fields
  • A Primary key is a set of columns that uniquely identify rows in a table
  • A Foreign key is a column that matches the primary key of another table

Relational Algebra Visualized


_images/inner-outer-join-venn.jpg

Note

joins are the principle use of relations.

Installing MariaDB (better, open source MySQL)

$ sudo apt-get install python-software-properties

# Add MariaDB repositories
$ sudo vim /etc/apt/sources.list.d/mariadb.list

# Add these lines
deb http://mirror.jmu.edu/pub/mariadb/repo/10.0/debian wheezy main
deb-src http://mirror.jmu.edu/pub/mariadb/repo/10.0/debian wheezy main

# Import repository GPG key
sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xcbcb082a1bb943db

# Install MariaDB
sudo apt-get update && sudo apt-get install mariadb-server mariadb-client

Managing MariaDB

# Check if MariaDB is running
$ sudo service mysql status

# Start/Stop/Restart MariaDB
$ sudo service mysql start/stop/restart

# Check if database server is up
$ mysqladmin -p ping

# Create table
$ mysqladmin -p create nobel

Configuration

  • /etc/my.conf

  • The most important MySQL tuning rule:

    • almost always prefer InnoDB (MariaDB default)

Users & Permissions

$ sudo mysql -p
mysql> CREATE USER 'vagrant'@'localhost'
       IDENTIFIED BY 'password';

mysql> GRANT ALL PRIVILEGES ON nobel.*
       TO 'vagrant'@'localhost'
       WITH GRANT OPTION;

Importing Data

$ wget https://sw.cs.wwu.edu/~griffi21/nobel.mysql
$ mysql -p nobel < nobel.mysql
$ mysql -p nobel
SHOW TABLES;
DESCRIBE nobel;

Installing PostgreSQL (Usually pronounced Postgres)

$ sudo apt-get install postgresql postgresql-client

Managing PostgreSQL

# Check if PostgreSQL is running
$ sudo service postgresql status

# Start/Stop/Restart Postgres
$ sudo service postgresql start/stop/restart

Configuration

  • /etc/postgresql/<version>/postgresql.conf
    • Version is probably 9.1

Users & Permissions

$ sudo su postgres
$ psql
postgres=# CREATE USER vagrant WITH PASSWORD 'password';
postgres=# CREATE DATABASE nobel OWNER vagrant;

Importing Data

$ wget https://sw.cs.wwu.edu/~griffi21/nobel.psql
$ psql nobel < nobel.psql
# Show tables
postgres=# \dt

# Describe nobel table
postgres=# \d+ nobel

Basic Queries

4 basic operations on data:

  • SELECT
  • INSERT
  • UPDATE
  • DELETE

SELECT

SELECT
   yr, subject, winner
FROM
   nobel
WHERE
   yr = 1960;

Practice

  • Who won the prize for Medicine in 1952?
  • How many people were awarded the 1903 Nobel in Physics?
  • How many prizes were awarded to Linus Pauling?
  • How many people have won more than once? (Difficult)

INSERT

INSERT VALUES
   ('2013','Literature','Herta Müller')
INTO
   nobel;

Note

this data stops at 2008, so lets insert some 2009 awards

Practice

In 2009:

UPDATE

UPDATE
   nobel
SET
   winner='Andrew Ryan'
WHERE
   subject='Peace' AND yr='1951';

Note

obviously Andrew Ryan deserves the peace price for his work in the Rapture planned community

Practice

  • Brigid Tenenbaum Medicine prize in 1952

DELETE

DELETE FROM
   nobel
WHERE
   yr = 1989, subject = peace;

Note

peace prizes can be controversial, and perhaps there’s a political interest in censoring our database?

Further Reading, Resources, etc.

  • Codd, E.F. (1970). “A Relational Model of Data for Large Shared Data Banks”. Communications of the ACM 13 (6): 377–387.
  • sqlzoo.net
  • CS 440: Database Management Systems

Hands-On: Make a Database

  • Create a new database
mysql> create database systemview

mysql> GRANT ALL PRIVILEGES ON systemview.*
       TO 'vagrant'@'localhost'
       WITH GRANT OPTION;
  • Grant a user privileges on your new database

Note

challenge them to do this based on the material in the last hour, maybe also demo the mysql console. Make sure everyone remembers the username and password for the next step.

Databases in Applications

Applications love databases.

  • Application data - the information to be displayed and manipulated
  • User data - complex authentication and authorization
  • Logging, statistics, state and session data, etc...

Note

All the various things an app might use a database for - note that the vast majority of web apps use them for something

Native SQL

Most languages allow you to speak directly to a database

Python:

#!/usr/bin/python
import MySQLdb

db = ("localhost","testuser","test123","nobel" )

cursor = db.cursor()

cursor.execute("SELECT subject, yr, winner FROM nobel WHERE yr = 1960)

data = cursor.fetchall()

for winner in data:
    print "%s winner in %s: %s " % (winner[0], winner[1], winner[2])

db.close()

Note

Note the plain SQL statement, recognizable from earlier. Point out the cumbersome nature of creating the connection, creating a cursor, sending the sql, getting data from the cursor (iterating over it if you want multiple results), etc. Similar interfaces exist for virtually all languages.

Introducing the ORM

Object Relational Mapper

  • Maps an Object in an application to a database table or relationship
  • Talks SQL to the database, your favorite language to you
  • Lets you point to different databases with the same syntax
  • Intelligently manages transactions to the database

Note

Make sure people know what you mean by “object”, mention possible difference between Postgres, sqlite, MySql, etc. Objects may map to one table, but might also incorporate relationships. ORMs also often optimize queries and manage transactions to make database queries as efficient as possible (like all other magic, though, sometimes this can backfire).

Life With a Python ORM


Look, ma! No SQL!

for subject, yr, winner in session.query(Nobel).filter_by(yr=1960):
    print "%s winner in %s: %s " % (subject, yr, winner)

Much easier to read and understand, but requires some setting up first.

Note

Of course we actually have to do a lot of setup work - setting up the model, engine, session, etc - but you do that once and can interact with the database as much as you want, without worrying about the cursor or connection. Note that we have no SQL in this statement, it is pythonic and has pythonic methods. The database table is now an object.

Setting Up the Magic - SqlAlchemy

SqlAlchemy - a popular Python ORM, frequently used in Flask apps (like SystemView!).

To use it, we’ll need to:

  • Import sqlalchemy
  • Create a “model” - a representation of our data in code
  • Create an “engine” and connect it to the database
  • Create a session to store the model instances and transactions

Note

Model:A object with all the properties, attributes, etc of our data, can also include code to manipulate that data in order to represent a specific view (i.e. automatically returning sorted results). It’s just a python class, instances are just python objects.
Engine:This handles the authentication with the database, it’s like the MySQLdb.connect above.
Session:An in-memory record of your changes to objects - all the orm objects you instantiate live int he session, and are only saved to the database when you say so.

Let’s Databasify Systemview

Project:

  • Store search terms, then provide them as links on the search page, so you can just click the most common terms you search for.

What else? Ideas?

Note

Solicit ideas for another column or two, maybe number of times the term is used (easy incrementing example), or number of results from the least search.

Hands On

  • Install the following packages:
sudo yum install python-devel
sudo yum install mysql-devel
  • Check out systemview from GitHub (if you don’t have it already)
git clone git@github.com:DevOpsBootcamp/systemview

Hands On (Cont...)

  • Switch to ‘save-search’ branch
git checkout -tb save-search origin/save-search
  • Activate your virtualenv
source <path to virtualenv>/bin/activate
  • Install the requirements
pip install -r requirements.txt

Note

Talk about git branches again, explain tracking, git pull for people who already have it cloned, etc. Talk about the virtualenv, have people create a new one if they have lost the one they made last time. Talk about pip and what requirements.txt is all about - point out how easy it is to set up an app this way. Make sure requirements.txt contains sqlalchemy.

DANGER! - people will need mysql-dev package! name varies by distribution, for centos it is libmysqlclient-dev

Goals

  • Connect the app to your new database
  • Add a new column
  • Save data to that column whenever someone searches
  • Fetch the data from that column and display it on the search page
  • challenge: limit the returned result to only 5 terms

http://docs.sqlalchemy.org/en/rel_0_9/orm/tutorial.html

Note

The code in the repo should have a simple model with one column, ‘term’, you can make a models.py, or just put it all in one file. If you separate them, talk about MVC. The code should start an sqlalchemy engine and session, save the search term normalized (lowercased, stripped), the column should be set to unique. Make sure the code handles the case of the term already existing in the database (when you add a count, increment the count when the term exists). You should probably initialize the db directly in the code, otherwise you’ll have to open up a python console, import the app and run the db update.