One reason why node.js has an innovation advantage

The Sum of All Page Ranks is (probably) a Googol

Preface: While the majority of this post is anecdotal, its conclusion, drawn from the perspective of creative programmers Larry and Sergey as of 1998, follows Occam’s razor.

The real meaning behind Google’s name

It’s fairly common knowledge that Google’s name comes from the number googol. As time went on they used the number behind the name for various other things, such as the 1e100.net domain which also stands for a googol (1.0×10^100). But why a googol?

According to Wikipedia, the term was coined by a nine-year-old in 1920, and it is one of the largest numbers that can also be easily pronounced. This gives it an innocent, playful, childlike connotation that lends itself well to the name of a business. That also explains the choice of color scheme, but that’s another story for another day.

But still, why 10^100? The real answer comes from Larry and Sergey’s first steps into the search industry in the 1990s. Back then, search engines did little more than index plain text, and they calculated little about the meaning of that text. Google’s main differentiator was the idea of a page rank, which would use anchor text and hyperlinks to sort the results for the lexicon to reference.

So page rank is what it was all about. Without it, the relevance of Google’s search results would be no better than that of any other search engine.

PageRank is defined as follows: We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one. [1]
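To make the definition concrete, here is a minimal Python sketch of the iteration on a made-up three-page web. It is only an illustration, not Google’s implementation: the `pagerank` function, the `toy_web` graph, and the iteration count are my own assumptions, and it uses the common normalized variant, with (1-d)/N in place of (1-d), so that the ranks really do sum to one as the quote says.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the PageRank formula over a small link graph.

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to `page`.
            incoming = sum(
                rank[t] / len(links[t]) for t in pages if page in links[t]
            )
            # Normalized variant: (1-d)/N keeps the total at one.
            new_rank[page] = (1 - d) / n + d * incoming
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
total = sum(ranks.values())
```

In this toy graph `total` stays at one up to floating point rounding, which is exactly the rounding the rest of this post is about.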

Firstly, a googol can be represented with 333 bits, or a little less than 42 bytes. Although the number is large and hard for us mere mortals to comprehend, a computer has no such trouble working with it. Secondly, and more importantly, Google’s page rank algorithm specifies that the sum of the page ranks of all web pages on the internet will be one.

Taken at face value, this seems straightforward enough. However, the secret sauce of the project was page rank, and the secret sauce of page rank is the idea that all page ranks total one. You may now see where this is going.
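The bit count is easy to check. A quick sketch in Python, whose integers are arbitrary precision (the variable names are just mine):

```python
GOOGOL = 10 ** 100

bits = GOOGOL.bit_length()     # binary digits needed to hold a googol: 333
exact_bytes = bits / 8         # 41.625, a little less than 42
whole_bytes = (bits + 7) // 8  # 42 once rounded up to whole bytes
```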

Google was initially called BackRub[1] and it was only after the first implementation that the name was changed to Google.

The page ranks together form a probability distribution over web pages, and the sum of all page ranks does indeed total one of something, but that something is not the integer one; it is one googol. The idea that all page ranks total “one” is also an optimization: every page rank can then be assumed to fall on one side of the decimal point, which happens to work brilliantly with fixed-point arithmetic, in BackRub’s (later Google’s) case scaled by a factor of a googol.

For this reason, a scale factor large enough to precisely rank every single URL on the internet had to be defined. That way, floating-point precision with all of its issues need not be introduced. The vastness of a googol allows a high level of precision that fits into a mere 333 bits in base 2.
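Here is a rough sketch of what that could look like, purely to illustrate my conjecture and not anything Google is known to have shipped. Every rank is a plain integer counted in units of 1/googol, so a googol stands for “one”. The function name, the toy graph, the (1-d)/N normalization, and writing d as the fraction 85/100 are all my own assumptions.

```python
GOOGOL = 10 ** 100  # the fixed-point scale: this integer represents "one"


def pagerank_fixed(links, d_num=85, d_den=100, iterations=50):
    """PageRank using integers only; d is the fraction d_num/d_den.

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: GOOGOL // n for p in pages}  # uniform start, rounded down
    base = (GOOGOL * (d_den - d_num)) // (d_den * n)  # the (1-d)/N term
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Integer share handed over by each page linking to `page`.
            incoming = sum(
                rank[t] // len(links[t]) for t in pages if page in links[t]
            )
            new_rank[page] = base + (incoming * d_num) // d_den
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_fixed(toy_web)
total = sum(ranks.values())
shortfall = GOOGOL - total  # units lost to rounding down
```

Because every division rounds down, `total` can only land at or just below a googol, and the shortfall is a handful of units out of 10^100. No float ever enters the calculation.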

More anecdotally, this same optimization can be seen throughout Google’s various APIs, e.g. its weather API, which uses lat/long coordinates multiplied by one million to omit the decimal point.[2]
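The same trick in miniature, as a sketch: scale a coordinate by one million and keep it as an integer. The helper names and the sample coordinates are mine, chosen for illustration, and are not taken from any Google API.

```python
from decimal import Decimal

SCALE = 1_000_000  # one million: six decimal places of a degree


def to_scaled_int(degrees: str) -> int:
    """Turn a decimal-degree string into an integer count of millionths."""
    return int(Decimal(degrees) * SCALE)


def from_scaled_int(value: int) -> Decimal:
    """Recover decimal degrees from the scaled integer."""
    return Decimal(value) / SCALE


# Sample coordinates, made up for illustration.
lat = to_scaled_int("45.5")
lng = to_scaled_int("-73.583")
```

Parsing from a string with `Decimal` keeps the conversion exact, so the integer never picks up binary floating point noise on the way in.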

The publication of “The Anatomy of a Large-Scale Hypertextual Web Search Engine”[1] was at one time an academic endeavor. The Wayback Machine stops recording crawls of Google’s “manuscript for pagerank” at stanford.edu in late 2001, and the document has since been purged and is now hosted elsewhere on the internet.

I have no direct access to the historical knowledge needed to claim that the name change from BackRub to Google was caused by this, but the lines do seem to connect. Most likely it was only after the first iteration of the page rank algorithm that it was realized the sum of all page ranks would be a predefined finite number. This is plausible because (a) the name BackRub had its roots in page rank, as its meaning came from the use of hyperlinks between websites (page rank) rather than just text search, and (b) the name seems to have iterated along with the project, as is common in software development.

Only after at least the first iteration would it be necessary to propose that a number in the page rank algorithm would hold this significance.

10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

This is just my opinion, of course. I don’t work for Google or know their secrets, so maybe they did just like the name. Other accounts of its origin lack the computational relevance, attributing it to serendipity instead.

And until something better comes along, we will all continue to search the googol.

  1. http://infolab.stanford.edu/~backrub/google.html
  2. http://www.google.com/ig/api?weather=,,,4550000,-7358300

Node.js PaaS: Nodejitsu (part 1 of 3)

1. Getting started

2. Install the command line interface utility

3. The platform

4. Web interface

haha awesome

realtime jQuery on the server with nodequery

Introducing nQuery, an implementation of jQuery’s API to control the browser / DOM from the server in real time. It’s easy! Just pass a function to nQuery.use().

Methods include live(), serialize(), attr(), css(), html(), and append(). It’s still in beta, but I have been adding more and more functionality.

tblobaum/nodeQuery - GitHub