Query Evaluation

Due by Friday, April 5 at midnight

Last accepted with late days is Monday, April 8 at midnight

Goal of assignment

The goal is for you to work with query evaluation.

What to do

You will answer questions 12.4 (not part 4) and 12.5 from the textbook. For your convenience they are reproduced below. Items in bold were added in addition to the text in the book.

Notation

The questions use relational algebra which we have not studied but is discussed in Chapter 4 of the textbook. The two questions below only use one aspect of relational algebra - the selection operator which is denoted by σ. This effectively gives the where clause of the SQL query. For example, σSailors.sid<50,000∧Sailors.age=21 (Sailors) means you are doing a where of Sailors.sid < 50,000 and Sailors.age = 21. The ∧ means the logical and operator. The "(Sailors)" at the end means you are acting on the Sailors relation. The other queries are similar and ask if they are not clear.

Exercise 12.4

Consider the following schema with the Sailors relation:

Sailors(sid: integer, sname: string, rating: integer, age: real)

For each of the following indexes, list whether the index matches the given selection conditions. If there is a match, list the primary conjuncts. Explain your reasoning.

  1. A B+-tree index on the search key ⟨ Sailors.sid ⟩.
    1. σSailors.sid<50,000 (Sailors)
    2. σSailors.sid=50,000 (Sailors)
  2. A hash index on the search key ⟨ Sailors.sid ⟩.
    1. σSailors.sid<50,000 (Sailors)
    2. σSailors.sid=50,000 (Sailors)
  3. A B+-tree index on the search key ⟨ Sailors.sid, Sailors.age ⟩.
    1. σSailors.sid<50,000∧Sailors.age=21 (Sailors)
    2. σSailors.sid=50,000∧Sailors.age>21 (Sailors)
    3. σSailors.sid=50,000 (Sailors)
    4. σSailors.age=21 (Sailors)

Exercise 12.5

Consider again the schema with the Sailors relation:

Sailors(sid: integer, sname: string, rating: integer, age: real)

Assume that each tuple of Sailors is 50 bytes long, that a page can hold 80 Sailors tuples, and that we have 500 pages of such tuples. For each of the following selection conditions, estimate the number of pages retrieved, given the catalog information in the question. Explain how you came to your conclusion including showing how you estimated costs of the chosen method and any alternatives. IHeight is the height of the tree or the number of reads to get to the matching index entry in the hash index. INPages is the number of pages in the index but that is not needed for these questions. Low/High are the lowest and highest values in the table.

  1. Assume that we have a B+-tree index T on the search key ⟨ Sailors.sid ⟩, and assume that IHeight(T) = 4, INPages(T) = 50, Low(T) = 1, and High(T) = 100,000. Consider both the possibility of having a clustered and unclustered index.
    1. σSailors.sid<50,000 (Sailors)
    2. σSailors.sid=50,000 (Sailors)
  2. Assume that we have a hash index T on the search key ⟨ Sailors.sid ⟩, and assume that IHeight(T) = 2, INPages(T) = 50, Low(T) = 1, and High(T) = 100,000.
    1. σSailors.sid<50,000 (Sailors)
    2. σSailors.sid=50,000 (Sailors)

Working in groups

You may (and are even encouraged) to work in groups of up to three (3) students. You only need to turn in one solution for each group unless there is a problem. Place all group members names on the solution. All members of the group must know the details and reasoning of the solution. If only some members worked on and understand a particular sproblem solution then this should be noted on what you turn in by giving the name(s) of the people who accomplished the work for that problem.

Turning in your work

There is an assignment link on Moodle. Turn in a file with your answers. It can be in any reasonable format (PDF, RTF, MS Word, Open Office, ....).