Day 73, God bless upsert

Trying to clean up the MongoDB code I wrote. Solved a bug by using upsert.

The problem I had might be related to this StackOverflow question:
http://stackoverflow.com/a/7855281/1239621

I don’t know why the other method wasn’t working, but the solution presented in the SO answer did work. Basically, you want to do:

// with {upsert: true}, update() inserts a new document when nothing matches _id
User.update({_id: input_id}, inputData, {upsert: true}, function (error, data) { /* handle result */ })
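For reference, here is the same call in a self-contained form. This is a minimal sketch assuming Mongoose; the model, connection string, and sample data are all made up:

var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/test'); // placeholder connection string

// loose schema so inputData can have any shape
var User = mongoose.model('User', new mongoose.Schema({}, { strict: false }));

var input_id  = new mongoose.Types.ObjectId(); // placeholder id
var inputData = { name: 'shao' };              // placeholder payload

// upsert: update the doc matching _id, or insert it if nothing matches
User.update({ _id: input_id }, inputData, { upsert: true }, function (error, data) {
  if (error) return console.error(error);
  console.log('upserted', data);
});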

Day 32, more review sprint, hello Backbone

Continuing my review sprint. Spent the morning working on MongoDB and the afternoon working on Backbone.

It was surprising that nobody really knew enough about MongoDB. I guess none of the instructors had really used MongoDB in their work because it was so new.

Backbone is an interesting one. I followed the video at http://backbonetutorials.com/. Instead of writing the app he was building, I re-wrote my own ChatClient. I paused the video many times and was able to finish my re-write. A couple of takeaways:
– Can’t believe how many small bits of magic there are. The video tutorial really helped.
– I am constantly paralyzed by the number of choices I have. Basically, I can achieve the same result in many ways and I don’t know which one is the best practice. So, in that regard, the tutorial also helped. However, I am not sure the tutorial itself follows best practices all the time.

Personal projects:
We talked a little bit more about personal projects. The criteria are:
– heavy in employer-relevant skills. Note: using cutting-edge tech tools is also employer-relevant. The tool itself might not be relevant, but your appetite for cutting-edge tech is.
– Hacker News worthy. This is an interesting idea. The teachers went on to clarify that Hacker News worthy is different from general-audience worthy. For example, a fairly nerdy idea might not appeal to the general public but might be interesting to the tech community.
– focused & viable in two weeks. It is better to have a focused and complete product than a half-baked large one.
– show relevant skills / interest in a certain field

Tips:
In a one-to-many relationship, store the reference on the “many” side. A good example is a celebrity and his/her many fans: only the fans remember the celebrity, not the other way around.
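A quick sketch of the idea (the tables and names here are made up): the reference lives on the “many” side.

// CelebrityTable -- stores nothing about fans
celebrity_id, name
1, celeb_a

// FanTable -- each fan row remembers which celebrity it follows
fan_id, fan_name, celebrity_id
10, fan_x, 1
11, fan_y, 1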


Day 25, database

Today is the second day of the Node.js fullstack sprint, but El and I were done with it last night. Hence, we started working on the next sprint – the database sprint.

The last several sprints have been built on top of each other. It is amazing. Initially, we built a chat client with Parse.com as the backend. Then, we built a chat server with Node.js, replacing Parse.com. Then, we persisted the server-side chat log to the file system. Now, we are going to persist the server-side chat log to a database.

It is a big hassle to talk to the file system directly. Some drawbacks include:

– have to read / write the whole thing (we could break the data into smaller files or try reading the file by index)

– no abstraction layer

– conflicts when the same file is read and written by multiple users at the same time

Hence, we need an abstraction layer on top of the file system. That system needs to (see the sketch after this list):

– deal with tons of files and provide persistent storage.

– allow reads from the middle of a file, i.e. random access to a small piece of info without loading the whole file

– keep an index of data locations

– data chunks have a fixed size

– provide speedy lookup, ideally in constant time

– support multiple programming languages

– accept connections over a protocol similar to HTTP
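Here is a sketch of the fixed-size-record idea from the list above, in Node.js. The file name and record size are made up; this is not the database we built:

var fs = require('fs');

var RECORD_SIZE = 64; // every record is padded to exactly 64 bytes

// Read record number n without scanning the rest of the file:
// its byte offset is simply n * RECORD_SIZE.
function readRecord(fd, n) {
  var buffer = Buffer.alloc(RECORD_SIZE);
  fs.readSync(fd, buffer, 0, RECORD_SIZE, n * RECORD_SIZE);
  return buffer.toString('utf8').trim();
}

var fd = fs.openSync('records.db', 'r');
console.log(readRecord(fd, 99)); // the 100th record, without reading the first 99
fs.closeSync(fd);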

We briefly talked about SQL vs NoSQL. I am not very happy with the lecture. I feel like we oversimplified the concept of a SQL database and created unnecessary confusion.

The purpose of an index in a database:

– I had a faint idea of how indexes work, and now I think I know better.

– An index is similar to the index pages at the back of a book: for example, the word “Long March” can be found on page 33.

– Imagine we have a table with 100 rows:

id, website, timestamp
1, google.com, day1
2, amazon.com, day2
3, ebay.com, day3
...
100, baidu.com, day5

– Without an index, if we need to find the timestamp for baidu.com, we would have to traverse all the records before finding the result. The complexity would be O(n).

– There are multiple steps in using an index.

Step 1 is to build the index. We need to traverse the entire table and build an index. It is O(n), but we only need to do this once.

My naive thought is that an index might be similar to a HashTable, hence:

myIndex['google.com'] = 1;
myIndex['amazon.com'] = 2;
...
myIndex['baidu.com'] = 100;

Step 2 is to look up the location in the index. Hopefully the index behaves like a HashTable, with constant lookup time. The time complexity here would be O(1).

In this case, we are looking for ‘baidu.com’, and it would return a location of 100

Step 3 is to calculate the location of the real record. Assuming each record occupies the same amount of space, the offset of “baidu.com” from the start of the file is: (100 - 1) * size_of_the_record.

Step 4 is to visit the location of “baidu.com” directly, without visiting any of the other records.

Again, this is my very naive understanding of how an index speeds up lookups.
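Putting the four steps together in plain JavaScript (still just my naive mental model, with made-up data):

// Step 1: traverse the table once and build the index, O(n)
function buildIndex(records) {
  var index = {};
  for (var i = 0; i < records.length; i++) {
    index[records[i].website] = i; // website -> record number
  }
  return index;
}

var records = [
  { website: 'google.com', timestamp: 'day1' },
  { website: 'amazon.com', timestamp: 'day2' },
  { website: 'baidu.com',  timestamp: 'day5' }
];
var index = buildIndex(records);

var n = index['baidu.com'];          // Step 2: O(1) hash lookup -> record number
var offset = n * 64;                 // Step 3: byte offset, assuming 64-byte records
console.log(records[n].timestamp);   // Step 4: jump straight to the record -> 'day5'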

Relational vs non-relational databases:

“Relational” comes from relational algebra, a branch of mathematics dealing with sets. That makes sense, since sets and tables are very similar: for example, the elements of a set are unique, just as the rows of a table should be.

SQL questions are common in job interviews. The basic syntax is SFW: SELECT … FROM … WHERE. Common topics are normalization, schema design, and foreign keys.
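For example, a one-liner in that shape, run against the UserTable further down:

SELECT user_name FROM UserTable WHERE user_id = 1;
-- returns 'shao'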

Types of relationships between tables: one-to-one, one-to-many, and many-to-many.

It is not a good idea to store arrays in a column because they are hard to index.
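For example, instead of a column holding an array of followed users, the usual fix is one row per pair in a separate table, which is exactly what the FollowTable below does:

// bad: an array in a column, hard to index
user_id, following
1, "[2, 3]"

// good: one row per relationship, each column indexable
user_id, following_user
1, 2
1, 3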

ORMs:

ORMs let people use a SQL database without writing raw SQL, but they might produce bad or slow SQL.
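A classic way an ORM produces slow SQL is the N+1 query pattern. A minimal sketch, assuming Sequelize (a Node ORM; the models here are made up and assumed to be synced):

var Sequelize = require('sequelize');
var sequelize = new Sequelize('sqlite::memory:');

var User  = sequelize.define('User',  { user_name: Sequelize.STRING });
var Tweet = sequelize.define('Tweet', { tweet_text: Sequelize.STRING });
User.hasMany(Tweet);

// N+1: one query for all users, then one extra query per user
User.findAll().then(function (users) {
  users.forEach(function (user) {
    user.getTweets().then(function (tweets) { /* one query each */ });
  });
});

// Better: eager-load with a join, a single query
User.findAll({ include: [Tweet] }).then(function (users) {
  // each users[i].Tweets is already populated
});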

Mongo:

– technology.

— Schemaless / document-oriented. Good for start-ups that change their schema all the time.

— auto-scaling / sharding. However, auto-scaling only works with primary indices. At a large scale, you have to choose between auto-scaling and secondary indices. A primary index means an index based on the primary key (???).

— won’t tell you if a write fails (with the old default write concern; see the sketch after this list)

– user experience

– fashion
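On the “won’t tell you if a write fails” point: with the old driver default of write concern 0, the callback never sees server-side errors. A sketch with the Node MongoDB driver (connection string and collection name made up):

var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
  if (err) throw err;

  // w: 0 is "fire and forget": err stays null even if the server drops the write
  db.collection('chatlog').insert({ text: 'hi' }, { w: 0 }, function (err) {
    // no way to know whether this write succeeded
  });

  // w: 1 makes the server acknowledge the write, so failures surface
  db.collection('chatlog').insert({ text: 'hi' }, { w: 1 }, function (err, result) {
    if (err) console.error('write failed:', err);
    db.close();
  });
});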

The Twitter dilemma. Naively, there are three tables.

// UserTable
user_id, user_name
1, shao
2, js
3, python

// TweetTable
tweet_id, tweet_text, timestamp, user_id
11, tweet1, time1, 1
12, tweet2, time2, 2
13, tweet3, time3, 1

//FollowTable
id, user_id, following_user
1, 1, 2
2, 1, 3

// to build shao's timeline:
SELECT * FROM TweetTable WHERE user_id IN (SELECT following_user FROM FollowTable WHERE user_id = 1);

The problem is that new tweets arrive constantly, so this query (and the indexes behind it) must be kept up to date while users expect their timelines in almost real time. Enough for now, but I need to study more on this.
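One standard answer I’ve seen (not from the lecture) is “fan-out on write”: pre-compute each user’s timeline when a tweet is created, so reads become a single lookup. A rough sketch with the Node MongoDB driver (collection names made up; assumes each user already has a timeline doc):

// When a new tweet comes in, push it onto every follower's timeline.
function onNewTweet(db, tweet, callback) {
  db.collection('follows')
    .find({ following_user: tweet.user_id }) // who follows the author?
    .toArray(function (err, rows) {
      if (err) return callback(err);
      var followerIds = rows.map(function (row) { return row.user_id; });
      db.collection('timelines').update(
        { user_id: { $in: followerIds } },
        { $push: { tweets: tweet } }, // append to each pre-built timeline
        { multi: true },
        callback
      );
    });
}

// Reading shao's timeline is now one cheap lookup:
// db.collection('timelines').findOne({ user_id: 1 }, function (err, timeline) { ... });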


How to upload database to meteor.com

If you want to upload your local db to replay.meteor.com (replace replay.meteor.com with yourapp.meteor.com), follow these steps:

Step 1, create three script files: export.sh, showdb.sh, restore.sh

For export.sh, write:

mongodump -h 127.0.0.1:82 -d meteor -o meteor
#the port is 82 because I am running the local meteor app at 80
#it should be your local port + 2, so the default would be 3000 + 2 = 3002

For showdb.sh, write:

meteor mongo --url replay.meteor.com

For restore.sh, write:

mongorestore -u client -p $1 -h production-db-a1.meteor.io:27017 --db replay_meteor_com meteor/meteor/ --objcheck --drop
#production-db-a1.meteor.io is the physical server and might be different for your app

Step 2:

sh export.sh
#this will export your db into a local file

Step 3:

sh showdb.sh
#output should be
#mongodb://client:d2f3d2a0-73eb-f641-a41b-e12517cb5a2a@production-db-a1.meteor.io:27017/replay_meteor_com
#copy d2f3d2a0-73eb-f641-a41b-e12517cb5a2a to your clipboard

Step 4:

sh restore.sh d2f3d2a0-73eb-f641-a41b-e12517cb5a2a
#that should get the local file and restore it to the server
#WARNING: it will drop your remote db at meteor.com

Note:

– The ‘d2f3d2a0-73eb-f641-a41b-e12517cb5a2a’ token seems to work only once and expires within a minute. So, hurry up

– Pay attention to the server address, since it does change from time to time
