Who's #1?: The Science of Rating and Ranking
by Amy N. Langville and Carl D. Meyer
"Yes. Google searches, all the shopping websites that say ‘you liked that, so you may like this,’ the ranking of teams (particularly in the US, where there is a lot of interest in ranking sports teams), or movie sites where people rate movies and use that to predict what movies you might like. These are the sorts of applications the authors have in mind. The same authors wrote an earlier book purely about internet search engines, and Google in particular. That led them to think more generally about ranking, and hence this book came along.

It’s quite mathematical. There are a number of equations in there. But the mathematics is all elementary undergraduate mathematics. If you are familiar with matrices, vectors and basic probability, you should be able to follow much of the book. Indeed it’s an excellent read for anyone doing a linear algebra course, as it beautifully illustrates the power of linear algebra for solving real-life problems.

I wouldn’t call it a field of mathematics, as such. There have been papers on this topic for many years, though this might actually be the first book dedicated to ranking. But the sorts of questions they’re asking are pretty fundamental. For instance, if you’ve got two different rankings, how far apart are they? If you and I both come up with a ranking of the best football teams in the UK and then someone else comes up with a ranking, which pair of us is the closest? How do you compare different rankings? That’s one of the questions they address here.

Also, if there are ties, how can you break them? If your ranking has two football teams on the same level, what extra information can you put into the equation to decide how to order those? How sensitive are those rankings to the underlying data that you’re working from? That’s particularly relevant to internet search engines when people try to spam. They try to create lots of pages purely to generate extra links that can influence a ranking.
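[The "how far apart are two rankings?" question can be made concrete. One standard measure (among several the book discusses) is the Kendall tau distance: the number of pairs of items the two rankings order differently. A minimal sketch in Python, with made-up team names:]

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count the pairs of items that the two rankings order differently.

    rank_a, rank_b: lists of the same items, best first.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    disagreements = 0
    for x, y in combinations(rank_a, 2):
        # A pair is discordant when one ranking puts x above y
        # and the other puts y above x.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            disagreements += 1
    return disagreements

# Hypothetical rankings of four UK football teams.
mine   = ["Arsenal", "Liverpool", "Chelsea", "Spurs"]
yours  = ["Liverpool", "Arsenal", "Chelsea", "Spurs"]
theirs = ["Spurs", "Chelsea", "Arsenal", "Liverpool"]

print(kendall_tau_distance(mine, yours))   # 1: only Arsenal/Liverpool swapped
print(kendall_tau_distance(mine, theirs))  # 5: nearly reversed
```

[By this measure, "which pair of us is closest" just means: which pair disagrees on the fewest head-to-head comparisons.]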
How much do rankings change when you introduce those sorts of additional nodes or make perturbations to the data? That’s another important question that is addressed here.

The one that I’m most familiar with is the internet search engine rankings. They are based on representing the whole web as either a graph or a matrix. There are several billion pages on the web now. Any one page may link to any other page, so you have to form a massive array where each row is a webpage. If I’m the n-th row, the question is which of the other several billion pages I link to. You put a 0 there if I don’t link to the page and a 1 if I do. So, you end up with a huge matrix of noughts and ones.

Yes, with just noughts and ones in there. What Google does—or what people think they do, and they certainly did in 1998 when they published an algorithm—is use the properties of that matrix to produce an ordering of pages based on importance. It’s a clever argument: if I link to an important page – say the New York Times – I’m linking to a page that clearly has some authority. If they link to me, then that gives me authority, because they’re important and they’re linking to me. So, by using those sorts of ideas, and a little bit of mathematics, they came up with a way of converting this massive matrix into an ordering of essentially all pages on the web. That is an example of a ranking.

The maths behind that is relatively elementary. The maths had been known for a long while, but it had not been used in quite this way before. It’s actually quite a simple computation, because all you have to do with that matrix is multiply it into vectors a few times, and that can be done quite efficiently. It sounds harder because of the sheer size of the matrix, but it’s actually much easier. It hasn’t got the combinatorial explosion property that the travelling salesman problem has, where there are so many routes you can take that you couldn’t possibly check them all.
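[The "multiply it into vectors a few times" step is the power method at the heart of PageRank, as published in 1998. A toy sketch on a made-up four-page web (plain power iteration with the standard damping factor of 0.85 — an illustration of the idea, not Google's production system):]

```python
import numpy as np

# Tiny made-up web: entry [i, j] is 1 if page i links to page j, else 0.
links = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Turn each row of noughts and ones into a probability
# distribution over that page's out-links.
transition = links / links.sum(axis=1, keepdims=True)

damping = 0.85               # standard PageRank damping factor
n = len(links)
rank = np.full(n, 1.0 / n)   # start from the uniform vector

# Repeatedly multiply the vector into the matrix until it settles;
# the limit is the importance score of each page.
for _ in range(100):
    rank = (1 - damping) / n + damping * rank @ transition

print(int(np.argmax(rank)))  # 2: the page that collects the most links wins
```

[Each iteration is one matrix-vector multiply, so the cost grows with the number of links, not with the number of candidate orderings — which is why this scales to billions of pages while the travelling salesman problem does not.]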
Whereas here, the hidden properties of the matrix are not too hard to pull out with a little bit of relatively well-known mathematics.

The thing with Google is that they published an algorithm in 1998. A lot of what they’ve done since then is catering to advertisers and avoiding spamming. They have all sorts of techniques that will juggle the list to achieve their many aims. It will not be a simple algorithm any more, I’m sure.

Well, this was in 1998, when they were just starting up. Brin and Page were PhD students in computer science at Stanford University. They actually never finished their theses, because they decided to set up the company. At that point, they didn’t know what was going to become of the company, so they had no qualms about publishing a paper. That was the only time they’ve ever, as far as I’m aware, published anything like that.

Exactly. They cannot reveal their precise methods. Maybe hotels don’t fit too well into this idea of linking to other important pages. They’re not going to be linking to a lot of things other than local maps and tourist sites.

For a website, probably their earlier book on page ranking, and specifically about search engines, would be the first place to look. This certainly would be a good follow-up. It’s looking at ranking in general, not just webpages."
Applied Mathematics · fivebooks.com