Day 27, more database

Today is the last day of the database sprint yet El and I barely finished the basic requirements. I can’t image any other group is significantly ahead of us since El and I worked on the sprint yesterday and both of us had a basic working knowledge of database coming into the sprint.

I have to say that Marcus really knew the database stuff and did a great job teaching us today.

Clarification on database index:

– Each row has a constant length. This is the result of each column (or cell) has a constant length and each row contains a fixed number of columns.  

– Index indeed provides constant lookup time, similar to HashTable.

Interview questions on designing schema for database:

Question: given this dataset, can you design a database schema for me?

Answer: I am afraid that I might need to know the query pattern too. The two criterion for a good schema are:

– the most frequent queries should run the fastest. Hence, query pattern is necessary to come up with a good design;

– data should be normalized to a reasonable degree. It is almost impractical to achieve  perfect data normalization in a production system. For example, 

//consider the following table

user_name, user_mom_name

Max, Alice

Andy, Emily

// at this moment, the data is fairly normalized

// however, if the most frequent query to this table is, given a username, what is his/her great, great, great, great, great mom’s name?

// then we might want to add an additional column, and precompute the “great, great, great, great, great mom’s name”

// adding a column de-normalize the data, but speeds the lookup

Marcus gave us a lecture on how to implement your own ORM for SQL.