Day 68, the curse of ORM

Tried to get mongoose to work and once again, faced the curse of ORM. The problem I had with ORM is that, it is hard to debug because it doesn’t tell you where the problem is.

The task was simple: create an instance method such that the document can go to another document, fetching _id for another document, and fetch the final document. Extremely easy task yet I kept getting empty result.

Then I realized that the schema for one of the document was changed. Because the schema didn’t match the actual database, mongoose returned nothing. Will be more careful next time. But the fundamental problem is, how am I supposed to debug this, especially when a complex query involves many tables(collections).


How to write an ORM

ORM stands for Object Relationship Mapping, basically an abstraction layer on top the database.

Basic ORM structure:

var _ = require('underscore');
var mysql = require('mysql');

var connection = mysql.createConnection({
  user     : 'root',
  password : "root",
  database : 'chat',
  socketPath  : '/Applications/MAMP/tmp/mysql/mysql.sock',

connection.connect(function(err) {
  if (err) {
    console.log('db_connection_err', err);

//do NOT use 'new'
var makeORM = function(name){
  //class-level variables
  var tableName = name;

  //plese use 'new'
  var ORM = function(param){
    this.tableName = tableName;
      //init variables on instance
      this.msg_id = param.msg_id;
      this.text = param.text;

  ORM.findAll = function(callback){
    var sql_query = "select * from " + tableName;
    connection.query(sql_query,  function(err, rows){
      var result = _(rows).map(function(row){
        return ( new ORM(row) );

  ORM.find = function(){


  ORM.prototype = {
    getID: function(){
      return this.msg_id;
    save: function(){
    update: function(){

  return ORM;

//this refers to the 'messages' table
var Messages = makeORM('messages');
  console.log('result', result);
    console.log( 'each result also has methods such as getID: ', obj.getID() );


ORM results vs. normal database query results

ORM provides another layer of abstraction, and ORM queries tend to return non-generic objects.

Regular database query returns a generic object or most likely, an array of generic objects. Those generic objects haven’t gone through construction, hence don’t have methods. On a side note, SQL language was designed such that the query only specify what is returned but not how the result is returned.

Join table vs join query

Join table is indeed a real table existing in the database.

A join query tends to produce a temporary join table during execution, but that temporary join table is destroyed as soon as the join query is finished.


Day 25, database

Today is the second day of the Node.js fullstack sprint, but El and I were done with it last night. Hence, we started working on the next sprint – the database sprint.

The last several sprints are built on top of each other. It is amazing. Initially, we built a chat client, with as the backend. Then, we build a chat server with Node.js, replacing Then, we persisted our server side chat log to file system. Now, we are going to persist the server side chat log to databases.

It is a big hassle to talk to file system directly. Some drawbacks include:

– have to read / write the whole thing (we could break files into smaller files or try reading the file by index)

– no abstraction layer

– conflicts when the same file is read and written by multiple users at the same time

Hence, we need an abstraction layer on top of the file system. That system needs to:

– deal with tons of files and provide persistent storage.

– allow read from middle of the file and allow random access small piece of info in the middle of the file.

– keep a index of data locations

– data chunks have a fixed size

– provide speedy lookup, ideally in constant time

– support multiple programming language

– accept connection protocols similar to HTTP protocols

We briefly talked about SQL vs NoSQL. I am not very happy with the lecture. I feel like we over simplified the concept of a SQL database and created unnecessary confusion.

The purpose of index in a database:

– I had this faint idea of how index works and now I think I know better.

– Index, similar to the index pages in the back of the book, for example, the word “Long March” can be found in page 33.

– Image we have a table with 100 rows:

id, website, timestamp
1,, day1
2,, day2
3,, day3
100,, day5

– without an index, if we need to find out the timestamp for, we would have to traverse all records before finding the result. The complexity would be O(n).

– There are multiple steps in using an index.

Step 1 is to build the index. We need to traverse the entire table and build a index. It is O(n) but we only need to do this once.

My native thoughts are index might be similar to a HashTable, hence

myIndex[''] = 1;
myIndex[''] = 2;
myIndex[''] = 100;

Step 2 is to lookup the location from the index. Hopefully index behaves like a HashTable, with constant look up time. The time complexity here would be O(1).

In this case, we are looking for ‘’, and it would return a location of 100

Step 3 is to calculate the location of the real record. Assuming each record occupy the same size, the location of “” is: 100 * size_of_the_record

Step 4 is to visit the location of “” directly, without visiting any of the other records.

Again, my very native understanding of how index would speed up lookups.

Relational vs non-relational databases:

Relational comes from relational algebra, a branch of mathematics dealing with sets. It makes sense since sets and tables are very similar. For example, they are unique.

SQL questions are common in job interviews. Basic syntax is SFW – select * from * where. Common topics are: normalization, schema design and foreign keys.

Types of collections between tables: 1 to 1, 1 to many, and many to many.

It is not a good idea to store arrays in a column because it is hard to index.


ORMs enable people to use SQL, but they might produce bad or slow SQLs.


– technology.

— Schemaless / doc oriented. Good for start-ups who change their schema all the time.

— auto-scaling / shading. However, auto-scaling only works with primary indices. At a large scale, you have to choose auto-scaling or secondary indices. Primary index means index based on the primary key(???).

— won’t tell you if a write fails

– user experience

– fashion

The Twitter dilemma. Natively, there are two tables.

// UserTable
user_id, user_name
1, shao
2, js
3, python

// TweetTable
tweet_id, tweet_text, timestamp, user_id
11, tweet1, time1, 1
12, tweet2, time2, 2
13, tweet3, time3, 1

id, user_id, following_user
1, 1, 2
1, 1, 3

// to build shao's timeline, we would write
// pretty sure this is NOT the right syntax
SELECT * from TweetTable where user_id in (SELECT following_user from FollowTable where user_id = 1)

The problem is we need to re-build the entire index every time a new Tweet is created and the users are expecting results to be returned at almost real time. Enough for now, but I need to study more on this.