invert .



Learning about Learning - Part 1

The plan for the next few weeks is to develop a deep understanding of machine learning and statistics. Even though I have applied machine learning to many projects and have taken a number of offline and online courses on it, I feel like I don’t have a visceral understanding of it and have barely scratched the surface. My goal is to build a strong theoretical foundation and avoid the “throw things at it and see what sticks” approach.

Stage 1: Build a strong mathematical foundation, be able to reason about probability from a measure-theoretic perspective, and fill in the holes in my math. The plan of attack is:

  • Learn real analysis by going through Francis Su’s YouTube lectures, using the following books as reference
    • Principles of Mathematical Analysis by Rudin
    • Understanding Analysis by Abbott
    • Real Mathematical Analysis by Pugh
  • The next step is to do some...

Continue reading →

Yosemite battery issues.

After upgrading OS X to Yosemite, I noticed a sharp rise in battery usage in sleep mode. I usually close the lid instead of shutting down, and I used to get really good battery life on my Air. But after the upgrade, I would lose about 30-40% of the charge overnight.

When you close the lid and your Mac enters sleep mode, it is still running; only the display has been turned off. After some time, memory is flushed to disk and the machine actually enters a deeper standby state. It turns out Apple has increased the delay before this happens in the default configuration. To view your power management settings, use the command-line tool pmset.

$ pmset -g
Active Profiles:
Battery Power       -1*
AC Power        -1
Currently in use:
 standbydelay         10800
 standby              1
 halfdim              1
 hibernatefile        /var/vm/sleepimage
 darkwakes            0
 disksleep            10
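To make the numbers above easier to read, here is a small sketch in Python that parses the "Currently in use" section of that output and converts standbydelay (which pmset reports in seconds) into hours. The sample text is copied from the listing above; the function name parse_pmset is just an illustrative choice, not part of any tool.

```python
# Sample text copied from the `pmset -g` listing above.
PMSET_OUTPUT = """\
Currently in use:
 standbydelay         10800
 standby              1
 halfdim              1
 hibernatefile        /var/vm/sleepimage
 darkwakes            0
 disksleep            10
"""


def parse_pmset(text):
    """Return the settings listed under 'Currently in use:' as a dict of strings."""
    settings = {}
    in_section = False
    for line in text.splitlines():
        if line.strip() == "Currently in use:":
            in_section = True
            continue
        if in_section:
            # Each setting line is "<name> <value>" separated by whitespace.
            parts = line.split(None, 1)
            if len(parts) == 2:
                settings[parts[0]] = parts[1].strip()
    return settings


settings = parse_pmset(PMSET_OUTPUT)
# standbydelay is expressed in seconds, so divide by 3600 to get hours.
delay_hours = int(settings["standbydelay"]) / 3600
print(f"standby after {delay_hours:g} hours of sleep")
```

With the values shown, the machine waits three hours in regular sleep before writing memory to disk.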

Continue reading →

An algorithm for trending topics

In this post I will describe a really simple algorithm for identifying trending items or posts in your application. TF-IDF (term frequency, inverse document frequency) is a technique that was used to rank search results in early search engines.

Assume that you have a large corpus of text documents and want to find the documents containing a certain phrase or set of keywords. With TF-IDF, you calculate two quantities for each keyword:

  1. term frequency - the number of times the keyword appears in a particular document.
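  2. inverse document frequency - the inverse of the number of documents containing the keyword (in practice usually log-scaled, as the log of the total number of documents divided by that count).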

For each document, sum the TF-IDF values (term frequency multiplied by inverse document frequency) of the keywords in the query, then rank the documents by that sum. This works because the IDF part helps filter out common words that are present in tons of documents. So a document containing a rare term is given...
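The ranking described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: documents are split on whitespace, and the IDF uses the common log-scaled form log(N / df) rather than a plain inverse. The names tokenize and rank are made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def rank(documents, query):
    """Return (score, document index) pairs, best match first."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)  # term frequency within this document
        score = 0.0
        for term in tokenize(query):
            if df[term]:
                # Assumption: log-scaled IDF, so a term found in every
                # document contributes log(1) = 0 to the score.
                score += tf[term] * math.log(n_docs / df[term])
        scores.append((score, i))
    return sorted(scores, key=lambda s: s[0], reverse=True)


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the quantum computer beat the benchmark",
]
ranking = rank(docs, "the quantum cat")
for score, index in ranking:
    print(f"{score:.3f}  {docs[index]}")
```

Here "the" appears in every document and so adds nothing to any score, while the rare term "quantum" pushes the third document to the top.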

Continue reading →

How does Git work?

So what’s behind the abstractions of branches and commits in git? How are the files really stored? At the heart of git is an object database: commits, files and folders are all objects. Inside your repo, the whole commit tree is stored in your .git directory.

Git takes the SHA-1 hash of every file’s contents (prefixed with a short header), compresses the contents using zlib/deflate, and stores the result in its object database, where each object is a file named after its SHA-1 hash. Each directory is stored as a tree object, which is basically a flat list of its files and subdirectories with their permissions and hash references. A commit object contains the commit message, its parent, its author and a reference to the hash of the root directory tree. So when you make a change to a file, its hash changes. When you commit it, a new tree with the updated entry is written. A branch is simply a reference to a commit. The...
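As a rough illustration of the file (blob) case, the Python sketch below computes the object id the way Git does for a blob: the SHA-1 is taken over a small header ("blob", the content length, and a NUL byte) followed by the file contents, and the same bytes are zlib-compressed for storage. The function names git_blob and object_path are invented for this example; it does not write anything into a real repository.

```python
import hashlib
import zlib


def git_blob(content: bytes):
    """Return (object id, compressed bytes) for file content stored as a blob."""
    # Git hashes a header plus the content, not the raw content alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    oid = hashlib.sha1(store).hexdigest()
    return oid, zlib.compress(store)


def object_path(oid):
    """Loose objects live under a directory named after the first two hex digits."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


oid, compressed = git_blob(b"hello world\n")
print(oid)
print(object_path(oid))
```

The id printed here should match what `git hash-object` reports for a file with the same contents, which is an easy way to check the sketch against a real repository.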

Continue reading →

Writing a web server from scratch - Part 2

Over the past few days I spent some time writing a web server. It has been very gratifying, and I learnt quite a few things. Let me start with the design. The thing about web servers is that the core functionality is quite simple to implement; the bulk of the work is the plumbing around parsing the different rules of the HTTP protocol, which I’ve kept to the bare minimum, as modern browsers have sane defaults.


So the basic outline is like this:

  1. Open a socket, bind it to a port and start accepting connections.
  2. Once you receive a request from a client, pass the socket information to a handler function and continue servicing other requests.
  3. In the handler function, parse the request headers and generate your response headers and body accordingly. For serving static files, simply do a buffered write on the socket after writing the headers.
  4. Close the socket and...
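The outline above can be sketched compactly in Python. This is not the server from the post; it is a stand-in that uses one thread per connection, assumes the whole request arrives in a single recv call, and only answers GET requests for files under a root directory. The names handle_client and serve, and the default port 8080, are assumptions made for illustration.

```python
import os
import socket
import threading


def handle_client(conn, root):
    """Parse one HTTP request, reply with a file from `root`, then close."""
    with conn:  # step 4: the socket is closed when this block exits
        request = conn.recv(4096).decode("latin-1")
        try:
            # The request line looks like: "GET /index.html HTTP/1.1"
            method, target, _version = request.split("\r\n", 1)[0].split(" ")
        except ValueError:
            conn.sendall(b"HTTP/1.0 400 Bad Request\r\nContent-Length: 0\r\n\r\n")
            return
        # Resolve the path and refuse anything that escapes the root directory.
        path = os.path.realpath(os.path.join(root, target.lstrip("/")))
        inside_root = path.startswith(os.path.realpath(root) + os.sep)
        if method != "GET" or not inside_root or not os.path.isfile(path):
            conn.sendall(b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n")
            return
        size = os.path.getsize(path)
        conn.sendall(f"HTTP/1.0 200 OK\r\nContent-Length: {size}\r\n\r\n".encode("ascii"))
        # Step 3: buffered write of the file body in fixed-size chunks.
        with open(path, "rb") as f:
            while chunk := f.read(64 * 1024):
                conn.sendall(chunk)


def serve(root, host="127.0.0.1", port=8080):
    """Steps 1 and 2: accept connections and hand each one to a handler thread."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen()
        while True:
            conn, _addr = server.accept()
            threading.Thread(
                target=handle_client, args=(conn, root), daemon=True
            ).start()
```

Calling serve(".") would serve the current directory until interrupted. A thread per connection keeps the sketch short; a pre-forking or non-blocking design would change serve but could leave handle_client largely as it is.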

Continue reading →

Seneca on Reading

Seneca, in his first-century classic Letters from a Stoic, has some gems about reading books.

Don’t keep changing from book to book, passion to passion, one thing to another, that is a sign of a sick and restless tourist.

Only read the works of those whose genius is unquestionable.

If you don’t like something that you read, go back to reading someone whom you’ve read before.

Paraphrased a little but I thought they were worth sharing.

View →

Writing a web server from scratch

As I mentioned in an earlier post, I have been reading more about system calls in the Unix kernel recently and am thinking of writing a small application to apply what I have learnt and gain some experience with systems programming. So, I’ll be building a minimalist, high-performance web server in C over the next few days.

“Why reinvent the wheel?” you may ask. This is just to expand my personal knowledge, and the result may not be used by anyone else. The first step is to get a simple static server running that just serves all the files present in a directory provided as an option to it. Architecturally, this wouldn’t be radically different from a web server with dynamic content, and this is the part I want to focus on for now. To handle multiple requests concurrently, there are a number of possible designs, like using a pre-forking model, multithreading or using non-blocking I/O...

Continue reading →

F1 is not interesting anymore.

I recently saw an old race on YouTube with Senna and Mansell battling each other from start to finish. Man, that was an interesting race! It’s a stark contrast to today’s F1. Mercedes has taken Red Bull’s place this year, but it’s the same boring races marked by tyre and fuel conservation and very little racing. Many of the overtaking and defensive maneuvers in Senna’s days would be illegal today. Formula One is supposed to be the pinnacle of racing, so why introduce so many unnecessary constraints? Fuel economy and tyre degradation have now become more important than the driving itself. I am going to stop watching the races for a while, possibly forever. I will find better ways to bide my time on a Sunday afternoon.

View →

Using Vim as a password manager

The other day at work, we were having a discussion about managing passwords. Gone are the days when you could keep everything in your head. With accounts on hundreds of sites and apps, I find myself clicking the forgot password link far too often. A non-solution is using the same password for all your accounts. If one of them gets leaked, all of your accounts become vulnerable. There are password management apps that generate and store passwords for you, like LastPass, but they can’t be trusted as they store your passwords on their servers. There are a couple of apps which locally encrypt your passwords and then back them up in the location of your choice (iCloud, Dropbox, etc.), but they cost something like $50.

If you’re like me and don’t want to spend 50 bucks on a password app, there is a crude alternative: plain old Vim! Remember, Vim has an option that enables the encryption...

Continue reading →

Advanced Linux Programming

I stumbled upon a book called Advanced Linux Programming, by Mark Mitchell, Jeffrey Oldham, and Alex Samuel, while reorganizing my Dropbox folder. I started reading the first few pages and got hooked on it, and ended up spending the rest of the Sunday reading and trying things out from it. I always wanted to learn about operating systems but wasn’t able to find the resources for it. Berkeley has a nice set of video lectures on operating systems, but I felt they were too theoretical and abstract. While there is nothing wrong with that, I couldn’t fathom watching 20 one-hour lectures on the topic. I had read the Operating Systems book by Silberschatz earlier; that was a bit boring, and I still had no clue about how various system calls in *nix systems worked.

This book is more practical and deals with GNU/Linux specifically, though the ideas/APIs are portable to most Unix-based systems...

Continue reading →