GEOG 4/575:

Digital Compliation and Database Design


Instructor: Jackson Voelkel (jvoelkel@pdx.edu)

Winter 2018

GIS Data Management

Database - Definition

A very large, integrated collection of data.

A shared collection of logically related data designed to meet the information needs of an organization.

  • User doesn’t need to know how data are physically stored
  • Requires access control, etc.

Models of the “real world”.

  • Entities (e.g. students, courses)
  • Relationships (e.g. ‘Jackson [person] likes Pizza [food]’)

What Does a Database Do?

ESRI Stack

Open Source Stack

What is a ‘relation’?

Cardinality

How does the data get stored?

geog575=# \d countries
                                   Table "public.countries"
   Column   |         Type          |                        Modifiers                        
------------+-----------------------+---------------------------------------------------------
 gid        | integer               | not null default nextval('countries_gid_seq'::regclass)
 fips_cntry | character varying(2)  | 
 gmi_cntry  | character varying(3)  | 
 iso_2digit | character varying(2)  | 
 iso_3digit | character varying(3)  | 
 iso_num    | smallint              | 
 cntry_name | character varying(40) | 
 long_name  | character varying(40) | 
 isoshrtnam | character varying(45) | 
 unshrtnam  | character varying(55) | 
 locshrtnam | character varying(43) | 
 loclngnam  | character varying(74) | 
 status     | character varying(60) | 
 pop2007    | double precision      | 
 sqkm       | double precision      | 
 sqmi       | double precision      | 
 land_sqkm  | integer               | 
 colormap   | smallint              | default 0
 the_geom   | geometry              | 

Indexes:
 "countries_pkey" PRIMARY KEY, btree (gid)
Check constraints:
  "countries_sqkm_check" CHECK (sqkm > 0::double precision)
  "countries_sqmi_check" CHECK (sqmi > 0::double precision)
  "countries_cntry_name_check" CHECK (cntry_name::text !~ '[^a-zA-Z.&, ''-]'::text)
  "enforce_dims_the_geom" CHECK (st_ndims(the_geom) = 2)
  "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'MULTIPOLYGON' OR the_geom IS NULL)
  "enforce_srid_the_geom" CHECK (st_srid(the_geom) = 3995)

geog575=# \d countries
                                   Table "public.countries"
   Column   |         Type          |                        Modifiers                        
------------+-----------------------+---------------------------------------------------------
 gid        | integer               | not null default nextval('countries_gid_seq'::regclass)
 fips_cntry | character varying(2)  | 
 gmi_cntry  | character varying(3)  | 
 iso_2digit | character varying(2)  | 
 iso_3digit | character varying(3)  | 
 iso_num    | smallint              | 
 cntry_name | character varying(40) | 
 long_name  | character varying(40) | 
 isoshrtnam | character varying(45) | 
 unshrtnam  | character varying(55) | 
 locshrtnam | character varying(43) | 
 loclngnam  | character varying(74) | 
 status     | character varying(60) | 
 pop2007    | double precision      | 
 sqkm       | double precision      | 
 sqmi       | double precision      | 
 land_sqkm  | integer               | 
 colormap   | smallint              | default 0
 the_geom   | geometry              | 

Indexes:
 "countries_pkey" PRIMARY KEY, btree (gid)
Check constraints:
  "countries_sqkm_check" CHECK (sqkm > 0::double precision)
  "countries_sqmi_check" CHECK (sqmi > 0::double precision)
  "countries_cntry_name_check" CHECK (cntry_name::text !~ '[^a-zA-Z.&, ''-]'::text)
  "enforce_dims_the_geom" CHECK (st_ndims(the_geom) = 2)
  "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'MULTIPOLYGON' OR the_geom IS NULL)
  "enforce_srid_the_geom" CHECK (st_srid(the_geom) = 3995)

Representing Numbers

Binary Numbers


   Number   |   Code    |   Bit Depth                        
------------+-----------+---------------
0           | 0         |      1
1           | 1         |      1
2           | 10        |      2
3           | 11        |      2
4           | 100       |      3
5           | 101       |      3
6           | 110       |      3
7           | 111       |      3
8           | 1000      |      4
9           | 1001      |      4
10          | 1010      |      4
11          | 1011      |      4
12          | 1100      |      4
13          | 1101      |      4
14          | 1110      |      4
15          | 1111      |      4

ASCII

American Standards Code for Information Interchange

128 Characters:

Char  Dec  Oct  Hex | Char  Dec  Oct  Hex | Char  Dec  Oct  Hex | Char Dec  Oct   Hex
-------------------------------------------------------------------------------------
(nul)   0 0000 0x00 | (sp)   32 0040 0x20 | @      64 0100 0x40 | `      96 0140 0x60
(soh)   1 0001 0x01 | !      33 0041 0x21 | A      65 0101 0x41 | a      97 0141 0x61
(stx)   2 0002 0x02 | "      34 0042 0x22 | B      66 0102 0x42 | b      98 0142 0x62
(etx)   3 0003 0x03 | #      35 0043 0x23 | C      67 0103 0x43 | c      99 0143 0x63
(eot)   4 0004 0x04 | $      36 0044 0x24 | D      68 0104 0x44 | d     100 0144 0x64
(enq)   5 0005 0x05 | %      37 0045 0x25 | E      69 0105 0x45 | e     101 0145 0x65
(ack)   6 0006 0x06 | &      38 0046 0x26 | F      70 0106 0x46 | f     102 0146 0x66
(bel)   7 0007 0x07 | '      39 0047 0x27 | G      71 0107 0x47 | g     103 0147 0x67
(bs)    8 0010 0x08 | (      40 0050 0x28 | H      72 0110 0x48 | h     104 0150 0x68
(ht)    9 0011 0x09 | )      41 0051 0x29 | I      73 0111 0x49 | i     105 0151 0x69
(nl)   10 0012 0x0a | *      42 0052 0x2a | J      74 0112 0x4a | j     106 0152 0x6a
(vt)   11 0013 0x0b | +      43 0053 0x2b | K      75 0113 0x4b | k     107 0153 0x6b
(np)   12 0014 0x0c | ,      44 0054 0x2c | L      76 0114 0x4c | l     108 0154 0x6c
(cr)   13 0015 0x0d | -      45 0055 0x2d | M      77 0115 0x4d | m     109 0155 0x6d
(so)   14 0016 0x0e | .      46 0056 0x2e | N      78 0116 0x4e | n     110 0156 0x6e
(si)   15 0017 0x0f | /      47 0057 0x2f | O      79 0117 0x4f | o     111 0157 0x6f
(dle)  16 0020 0x10 | 0      48 0060 0x30 | P      80 0120 0x50 | p     112 0160 0x70
(dc1)  17 0021 0x11 | 1      49 0061 0x31 | Q      81 0121 0x51 | q     113 0161 0x71
(dc2)  18 0022 0x12 | 2      50 0062 0x32 | R      82 0122 0x52 | r     114 0162 0x72
(dc3)  19 0023 0x13 | 3      51 0063 0x33 | S      83 0123 0x53 | s     115 0163 0x73
(dc4)  20 0024 0x14 | 4      52 0064 0x34 | T      84 0124 0x54 | t     116 0164 0x74
(nak)  21 0025 0x15 | 5      53 0065 0x35 | U      85 0125 0x55 | u     117 0165 0x75
(syn)  22 0026 0x16 | 6      54 0066 0x36 | V      86 0126 0x56 | v     118 0166 0x76
(etb)  23 0027 0x17 | 7      55 0067 0x37 | W      87 0127 0x57 | w     119 0167 0x77
(can)  24 0030 0x18 | 8      56 0070 0x38 | X      88 0130 0x58 | x     120 0170 0x78
(em)   25 0031 0x19 | 9      57 0071 0x39 | Y      89 0131 0x59 | y     121 0171 0x79
(sub)  26 0032 0x1a | :      58 0072 0x3a | Z      90 0132 0x5a | z     122 0172 0x7a
(esc)  27 0033 0x1b | ;      59 0073 0x3b | [      91 0133 0x5b | {     123 0173 0x7b
(fs)   28 0034 0x1c | <      60 0074 0x3c | \      92 0134 0x5c | |     124 0174 0x7c
(gs)   29 0035 0x1d | =      61 0075 0x3d | ]      93 0135 0x5d | }     125 0175 0x7d
(rs)   30 0036 0x1e | >      62 0076 0x3e | ^      94 0136 0x5e | ~     126 0176 0x7e
(us)   31 0037 0x1f | ?      63 0077 0x3f | _      95 0137 0x5f | (del) 127 0177 0x7f

UTF

Unicode (Universal Coded Character Set) Transformation Format

1,112,064 Characters!

gis=# \l
                                  List of databases
   Name    |  Owner   | Encoding |  Collation  |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 gis       | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 

Week 1 Review: What can go wrong?

What’s wrong?

Find all “Hedging”: IF use1 = ‘hedging’ OR use2 = ‘hedging’ …

How many uses can we have?

How can we fix this?

How can we determine what is a “use”?

Chinchilla hallucinogenic

Swinging like Tarzan

How can we restrict these uses?

User input, field restrictions, etc (Churcher Ch. 11)

What’s wrong?

Farm/Field repeated enough that errors are likely.

Date is repeated for every entry.

How can we fix this?

Why did that work? What are we actually measuring here (farms, insects, etc)?

Week 1 Review: Data Modeling

Relationships

What is the cardinality?

What could be a real 1.1 < - - - > 1.1 cardinality?

[US Citizens]1.1 < - - - > 1.1[SSN]

What could be a 0.1 < - - - > 1.1?

[Observation]0.1 < - - - > 1.1[Tree Falling]?

Week 1 Review: Use Cases

What can/should be recorded?

Get in the head of the actor!

GIS Use Cases

What are they? (e.g. in business, government, academics, etc.)

Lab Questions?

… besides problem 12 …

  1. Why won’t the SELECT clause work without the FROM clause?

  1. When you use the ORDER BY clause and choose the option descending, what does that do to the data returned?

  1. What would be the ordering of the following statement:
SELECT prod_desc,prod_id,prod_price
FROM products
ORDER BY 3,1;

  1. What is another way to write the ORDER clause in the previous query?

  1. Can I have more than one AND in the WHERE clause?

  1. True or false: Both conditions when using the OR operator must be TRUE.

  1. True or false: All specified values must match when using the IN operator.

  1. True or false: The AND operator can be used in the SELECT and the WHERE clauses.

  1. What is the logical negation of the IN operator?

  1. Write a SELECT statement that returns the name and cost of each product from the products. Which product is the most expensive?

  1. Write a query that generates a list of all customers and their email addresses.

  1. Using the customer table, write a SELECT statement that returns customer IDs and customer names (in alphabetical order) for customers who live in Indiana, Ohio, Michigan, or Illinois and whose names begin with the letters A or B. (Note: the query will not return any rows, the point is to write the query).

  1. Using the products table, write a SELECT statement that returns the product ID, product description, and product cost. Limit the product cost to between $4.00 and $10.00.
SELECT prod_id, prod_desc, prod_price
FROM products
WHERE prod_price BETWEEN 4.00 AND 10.00; 

  1. Assuming that you used the BETWEEN operator in the previous question, rewrite your SQL statement to achieve the same results using different operators. If you did not use the BETWEEN operator, do so now.
SELECT prod_id, prod_desc, prod_price
FROM products
WHERE prod_price > 4.00
AND prod_price < 10.00;

  1. Pick three items from the products table. Now write a query to return the rows of data from the table associated with those three items. Now rewrite the query to return everything but those three items. For your query use combinations of equality operators and conjunctive operators.

Data Integrity

ACID

Atomicity

Either all of a trasaction is executed or none of it is - a transaction can never be partially completed.

Atomicity

00 BEGIN; - -Starts a transaction

HR’s Balance is $2,000

IT’s Balance is $5,000

01 UPDATE accounts SET balance=balance-1000 WHERE account_code=‘HR’;

- - - POWER OUTAGE - - -

02 SELECT balance FROM accounts WHERE account_code=‘HR’;

HR’s Balance is $ ? $2,000!

IT’s Balance is $ ? $5,000!

Consistency

Data in a database remain, before and after the transaction, in a “consistent state” as specified by the database schema and other constraints and integrity rules imposed on the database.

Limits:

  • Pizza toppings
  • Plant uses
  • Age (<= 125)

Constraint:

  • Can’t add farm without field
  • Can’t add a duplicate entry

Isolation

Requires the results of simultaneous transactions to be independent of each other.

Durability

After the completion of a transaction, its results can and always will be traced even if the system fails or crashes.

The transaction is incomplete until the results are stored!

Keys


SELECT * FROM employees;
  emp_id   | last_name | first_name | middle_name |        address        |     city     | state |  zip  |   phone    |   pager    
-----------+-----------+------------+-------------+-----------------------+--------------+-------+-------+------------+------------
 311549902 | STEPHENS  | TINA       | DAWN        | RR 3 BOX 17A          | GREENWOOD    | IN    | 47890 | 3178784465 | 
 442346889 | PLEW      | LINDA      | CAROL       | 3301 BEACON           | INDIANAPOLIS | IN    | 46224 | 3172978990 | 
 213764555 | GLASS     | BRANDON    | SCOTT       | 1710 MAIN ST          | WHITELAND    | IN    | 47885 | 3178984321 | 3175709980
 313782439 | GLASS     | JACOB      |             | 3789 WHITE RIVER BLVD | INDIANAPOLIS | IN    | 45734 | 3175457676 | 8887345678
 220984332 | WALLACE   | MARIAH     |             | 7889 KEYSTONE AVE     | INDIANAPOLIS | IN    | 46741 | 3173325986 | 
 443679012 | SPURGEON  | TIFFANY    |             | 5 GEORGE COURT        | INDIANAPOLIS | IN    | 46234 | 3175679007 | 

SELECT * FROM employee_pay;
  emp_id   |   position    | date_hire  | pay_rate | date_last_raise |  salary  |  bonus  
-----------+---------------+------------+----------+-----------------+----------+---------
 311549902 | MARKETING     | 1989-05-23 |          | 1999-05-01      | 40000.00 |        
 442346889 | TEAM LEADER   | 1990-06-17 |    14.75 | 1999-06-01      |          |        
 213764555 | SALES MANAGER | 1994-08-14 |          | 1999-08-01      | 30000.00 | 2000.00
 313782439 | SALESMAN      | 1997-06-28 |          |                 | 20000.00 | 1000.00
 220984332 | SHIPPER       | 1996-07-22 |    11.00 | 1999-07-01      |          |        
 443679012 | SHIPPER       | 1991-01-14 |    15.00 | 1999-01-01      |          |        

Primary


SELECT * FROM employee_pay;
  emp_id   |   position    | date_hire  | pay_rate | date_last_raise |  salary  |  bonus  
-----------+---------------+------------+----------+-----------------+----------+---------
 311549902 | MARKETING     | 1989-05-23 |          | 1999-05-01      | 40000.00 |        
 442346889 | TEAM LEADER   | 1990-06-17 |    14.75 | 1999-06-01      |          |        
 213764555 | SALES MANAGER | 1994-08-14 |          | 1999-08-01      | 30000.00 | 2000.00
 313782439 | SALESMAN      | 1997-06-28 |          |                 | 20000.00 | 1000.00
 220984332 | SHIPPER       | 1996-07-22 |    11.00 | 1999-07-01      |          |        
 443679012 | SHIPPER       | 1991-01-14 |    15.00 | 1999-01-01      |          |        

Foreign?


SELECT * FROM employee_pay;
  emp_id   |   position    | date_hire  | pay_rate | date_last_raise |  salary  |  bonus  
-----------+---------------+------------+----------+-----------------+----------+---------
 311549902 | MARKETING     | 1989-05-23 |          | 1999-05-01      | 40000.00 |        
 442346889 | TEAM LEADER   | 1990-06-17 |    14.75 | 1999-06-01      |          |        
 213764555 | SALES MANAGER | 1994-08-14 |          | 1999-08-01      | 30000.00 | 2000.00
 313782439 | SALESMAN      | 1997-06-28 |          |                 | 20000.00 | 1000.00
 220984332 | SHIPPER       | 1996-07-22 |    11.00 | 1999-07-01      |          |        
 443679012 | SHIPPER       | 1991-01-14 |    15.00 | 1999-01-01      |          |        

They’re the same!

Insect Observations

Next Week…

JOINS!