soam's home


Archive for internet

Which Linux Distro Is The Most Popular?

About six years ago, I had a hankering to dig deeper into Linux distros. I took an old desktop (and when I say old, I mean dated circa 2000!) and went through the long and painful process of installing Gentoo on it. I then installed apache and movable type and pretty soon I had this desktop box running 24/7 at home powering my site. By today’s standards, the h/w specs of the box meant that it would have comfortably been beaten by my iPhone blindfolded and with both hands tied behind its back. Yet, because Gentoo insists on compiling the entire distro from scratch per installation, it actually performed its web hosting duties pretty well. Ultimately, I ended up moving the site over to an ISP, but only because such a setup provides things like 24/7 power and net access, something not possible at home then due to the PG&E-imposed rolling blackouts in the SF Bay Area.

The next time I had to consider Linux distros in any meaningful way was when I had to start moving the services in our startup to Amazon. I ended up picking Ubuntu for our AMI. It seemed to have the biggest footprint and support. Gentoo didn’t really enter the picture at the time. Since then, I’ve seen, at least post acquisition at Limelight, the slow supplanting of Ubuntu and Debian by CentOS, certainly for server installs.

Imagine my surprise when a friend recently updated his FB status thus: “Setting up gentoo linux. It’s really designed for self torture.” Did people still use Gentoo? So, I did a bit of digging and found a site, DistroWatch, that offers various distro downloads and keeps track of their popularity. According to them, the top five are:

  1. Mint
  2. Ubuntu
  3. Fedora
  4. openSUSE
  5. Debian

Gentoo comes in at 18. To be honest, I had never really heard of Mint either. Apparently it is a desktop distro that is:

an Ubuntu-based distribution whose goal is to provide a more complete out-of-the-box experience by including browser plugins, media codecs, support for DVD playback, Java and other components. It also adds a custom desktop and menus, several unique configuration tools, and a web-based package installation interface. Linux Mint is compatible with Ubuntu software repositories.

I would imagine DistroWatch is targeted at desktop downloads, hence the skew. Interesting nonetheless.

Here’s a post at Geektrio dated nearly two years ago listing the then top ten from DistroWatch. The top five at that time:

  1. Ubuntu
  2. Fedora
  3. openSUSE
  4. Debian
  5. Mandriva

Mint is at 6 and Gentoo comes in at 9. The trend for these two would seem to be pretty clear. Mandriva (also known as Linux Mandrake) has now dropped to 17. Fascinating stuff for Linux enthusiasts.
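For anyone who wants the trend spelled out, here is a minimal sketch that computes how many places each distro moved between the two snapshots. The ranks are the ones quoted above; the function name `rank_changes` is purely illustrative, and distros whose rank is not given in both lists are left out rather than guessed.

```python
# Ranks as quoted in the post for the two DistroWatch snapshots.
earlier = {"Ubuntu": 1, "Fedora": 2, "openSUSE": 3, "Debian": 4,
           "Mandriva": 5, "Mint": 6, "Gentoo": 9}
current = {"Mint": 1, "Ubuntu": 2, "Fedora": 3, "openSUSE": 4,
           "Debian": 5, "Mandriva": 17, "Gentoo": 18}


def rank_changes(before, after):
    """Return {distro: places gained}; positive means it moved up the list."""
    return {name: before[name] - after[name] for name in before if name in after}


changes = rank_changes(earlier, current)

# Print biggest climbers first, biggest fallers last.
for name, delta in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} {delta:+d}")
```

Mint is the only distro in the list that gained ground; Mandriva and Gentoo show the steepest drops.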


The Case for an Open Source As Service Platform

In his article on Steve Jobs (The Tinkerer), Malcolm Gladwell gets to the core of what made the UK dominate the industrial revolution:

They believe that Britain dominated the industrial revolution because it had a far larger population of skilled engineers and artisans than its competitors: resourceful and creative men who took the signature inventions of the industrial age and tweaked them—refined and perfected them, and made them work.

Similarly, Steve Jobs, as per Isaacson’s biography:

But Isaacson’s biography suggests that he was much more of a tweaker. He borrowed the characteristic features of the Macintosh—the mouse and the icons on the screen—from the engineers at Xerox PARC, after his famous visit there, in 1979. The first portable digital music players came out in 1996. Apple introduced the iPod, in 2001, because Jobs looked at the existing music players on the market and concluded that they “truly sucked.” Smart phones started coming out in the nineteen-nineties. Jobs introduced the iPhone in 2007, more than a decade later, because, Isaacson writes, “he had noticed something odd about the cell phones on the market: They all stank, just like portable music players used to.”

And so on. This observation does give rise to a question – if Jobs could rise to such exalted heights by mere ruthless refinement, what hope is left for the rest of us mere mortals? What the article does not say and should be obvious to anyone in the tech industry is that we’ve been living through the golden age of tweaking. After all, what is open source if not tweaking unleashed? I don’t need to go through the sheer quantity and variety of tools, programs, methods and systems that open source has produced. There is an open source equivalent for pretty much every functionality you can think of. Yet, I wonder if we have already lived through its golden age and are moving on to something else.

What I mean is that in the past ten years or so, we have moved from the paradigm of software as executable to software as service. It is not enough to produce a program or system. You have to run it as well and keep it running. Hence websites, search engines, social networks and pretty much everything else in our grand world wide web. This leads to the next question: while there is an open source equivalent to pretty much any software from Microsoft or Adobe, where is the open source as service (OSaaS) equivalent to Google or Facebook? Nutch doesn’t count – it’s code that has to be installed and run. Doing so is nontrivial and illustrates some of the issues facing OSaaS:

  • cost of machines to run the service and supporting services
  • cost of bandwidth
  • storage costs
  • operations costs

I would argue Wikipedia is a good example of an OSaaS – and it is continually in fundraising mode. Furthermore, given Google and Facebook’s massive scale, it’s impossible to produce any kind of competing system the traditional way without massive amounts of money. So much for open source purity! It is true that SaaS and PaaS providers often make APIs and platforms available on a tiered basis, with the first x or so requests being free, which is a great way of getting started with your app. But you have to pay once you exceed a certain level of usage. Again, not scalable, and sure to discourage the next budding Steve Jobs toiling away in his/her garage.

I think the success of the SETI@home project (not in finding aliens but in getting people to contribute their spare cycles) or even what I saw at Looksmart when we acquired Grub indicates there might be another way. Grub was an open source crawler that users could download and install on their computers. It showed nifty screen savers when your computer needed to snooze and crawled URLs at the same time. We were surprised by Grub’s uptake. People wanted to make search better and were happy to download and run it on their own computers. We had people allocating farms of machines devoted to running Grub. We used it for nothing but dead link checking for our Wisenut search engine – but even that made people happy to contribute.

One possible lesson from this could be that if it is possible to develop a framework/platform for effectively partitioning the service amongst many participants, each participant would pay a fraction of the total cost. Of course, as BitTorrent shows, load balancing has to be carefully done. People who host too many files leech too many ISP resources and get sued. Grid computing and projects like BOINC are certainly promising but seem to be specialized for long running jobs of certain types like protein folding or astrophysics computations. It’s not clear whether they can provide a distributed, public, OSaaS platform. Such a platform, if carefully engineered, could pave the way to many interesting applications that could provide alternatives to the Facebooks and Googles of the world and ensure tinkering in the new millennium remains within the reach of dreamers.
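To make the partitioning idea a little more concrete, here is a rough sketch of how work items (say, URLs to check) might be spread across volunteers without asking anyone for more than they offered. Everything in it is an assumption for illustration: the function `partition_work`, the participant names and the per-participant caps are made up, and this is not how Grub or BOINC actually schedule work.

```python
import heapq


def partition_work(items, capacities):
    """Spread items over participants without exceeding anyone's cap.

    capacities: {participant: max number of items it volunteers to handle}
    Returns ({participant: [items]}, [items nobody had room for]).
    """
    assignment = {name: [] for name in capacities}
    # Min-heap keyed on the fraction of offered capacity already used, so the
    # next item goes to whoever is least loaded relative to what they offered.
    heap = [(0.0, name) for name, cap in capacities.items() if cap > 0]
    heapq.heapify(heap)
    unassigned = []
    for item in items:
        if not heap:
            unassigned.append(item)
            continue
        _, name = heapq.heappop(heap)
        assignment[name].append(item)
        used = len(assignment[name])
        # Only participants with spare capacity go back on the heap.
        if used < capacities[name]:
            heapq.heappush(heap, (used / capacities[name], name))
    return assignment, unassigned


# Hypothetical volunteers: two home machines and one small farm.
urls = [f"http://example.com/page{i}" for i in range(10)]
caps = {"home-pc": 2, "laptop": 4, "farm": 8}
assignment, leftover = partition_work(urls, caps)
for name, assigned in assignment.items():
    print(name, len(assigned), "of", caps[name])
```

The cap is the important part: it is what keeps any single participant from carrying a disproportionate share, which is the load balancing concern raised above.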


Brave New World Of Oversharing

From the New York Times:

“Ten years ago, people were afraid to buy stuff online. Now they’re sharing everything they buy,” said Barry Borsboom, a student at Leiden University in the Netherlands, who this year created an intentionally provocative site called Please Rob Me. The site collected and published Foursquare updates that indicated when people were out socializing — and therefore away from their homes.

In this day and age of Too Much Information (TMI), the only real security, it would seem, would be the “security through obscurity” variety. If everyone flooded the web with the minutiae of their day to day lives, chances are it’s going to be tough to single out anyone in particular. That approach, however, puts early adopters at risk. No longer would they be just a face in the crowd. Comes with the territory, I guess.

That being said, websites making said TMI possible should probably realize there are still some boundaries best left uncrossed.

Video Clip Lengths

In a NYT article, Rise of Web Video, Beyond 2-Minute Clips, Brian Stelter writes:

Video creators, by and large, thought their audiences were impatient. A three-minute-long comedy skit? Shrink it to 90 seconds. Slow Internet connections made for tedious viewing, and there were few ads to cover high delivery costs. And so it became the first commandment of online video: Keep it short.

I recall coming across this phenomenon in 1997 and 1998 while researching the characteristics of video stored on the web at the time. Here’s an interesting graph from the paper I wrote on the subject:

Web Video Lengths (1997)


The number of videos on the web was relatively small and their lengths could be measured in seconds. 90% were 45 seconds or less. The graph is capped at around 2 minutes for maximum length although I did find outliers that were longer.

What I found interesting, however, in a followup study was that if you took away the bandwidth chokepoints, video lengths ballooned. I was studying the video access patterns of a Video On Demand experiment at Luleå University in Sweden – the setup here was over a dedicated high speed network, effectively removing slow access as a determinant of behavior. Specifically:

Since 1995, the Centre for Distance-spanning Technology at Luleå University (CDT) has been researching distance education and collaboration on the Internet [17]. Specifically, it has developed a hardware/software infrastructure for giving WWW-based courses and creating a virtual student community. The hardware aspects include the deployment of a high speed network (2-34 Mbps backbone links) to attach the local communities to the actual University campus. The campus is also connected to the national academic backbone by a high speed 34 Mbps link [13] with student apartments being wired together with the rest of campus via 10 or 100 Mbps ethernet.

The following graph shows the distribution of video lengths for the files used in the system:

VoD Video Durations

The mean duration of these files was around 75 minutes or so. This finding hinted that as videos grew in popularity and infrastructure hurdles fell away, video durations would increase. From the original NYT article:

New Web habits, aided by the screen-filling video that faster Internet access allows, are now debunking the rule. As the Internet becomes a jukebox for every imaginable type of video — from baby videos to “Masterpiece Theater” — producers and advertisers are discovering that users will watch for more than two minutes at a time.
“People are getting more comfortable, for better or for worse, bringing a computer to bed with them,” said Dina Kaplan, the co-founder of

Ms. Kaplan’s firm distributes dozens of Web series. A year ago all but one of the top 25 shows on her Web servers clocked in at under five minutes. Now, the average video hosted by Blip is 14 minutes long — “surprising even to us,” she said. The longest video uploaded in May was 133 minutes long, equivalent to a feature-length film.

Intrigued by this, I took a look at the duration of the videos hosted by Delve. This is based on data a couple of weeks old, so it is not representative of the latest trends. Still, I found the average video duration to be a little under 6 minutes. Within this, there were definite disparities between publishers. Our top 25 publishers (by video duration) had videos that were a little under 25 minutes on average. This indicates mixed video use by our publishers. While some are still sticking to shorter videos, a significant number are definitely taking full advantage of long form clips – one of the longest videos is around 12 hours in length!

It’ll be interesting to see how these trends hold over the next year or so.

Point To Point Vs Broadcasting

In an article arguing the transformative nature of bloggers, Scott Rosenberg writes on how mainstream publishers are missing the point:

Diller and his species of executive have always excelled at finding rare talents that can, at their best, enchant a mass market. But this very success has blinded them to the different, more diffuse sort of talent present among the Web’s millions of contributors. Of course talent isn’t universal, nor is it evenly distributed. But there is far more of it in the world than Diller’s blinkered vision allows. On the Web it can reveal itself in a far wider range of ways, and far more people will have a chance to cultivate it. It will never be perceived in a uniform way; you and I will recognize it in very different places and judge it in very different ways. But it is surely there — and, fortunately, denigrating it will not make it go away.

Scott is pointing out how the web makes it easy for bloggers (or any other self started media publisher for that matter) to find and cultivate smaller audiences. And if you expand that line of thought further, you’ll come to the Long Tail phenomenon and how the best way to succeed these days is to find new and innovative ways of content aggregation that span the spectrum from publishing to five people to publishing to millions.

Thus far, I really haven’t said anything new. What does occur to me however is that the very underlying technical structure of the web (HTTP and TCP/IP) makes it far more convenient to set up point to point communication structures versus one to many. The Internet just isn’t that well structured for broadcasting – one of the reasons for the rise of Content Delivery Networks. The client-server approach actually serves niche markets better than mass ones. In short – if you want to broadcast your programs to an audience of millions, transmission over cable or air is still the optimum way to go. If you want to reach a small, specialized, targeted audience – the web would almost seem tailor-made for that need. The medium is the message indeed!
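A back-of-the-envelope sketch of why point to point delivery favors small audiences: with one stream per viewer, the sender's outgoing bandwidth grows linearly with the audience, whereas a true broadcast medium sends a single copy no matter how many people tune in. The function names and the 1 Mbps stream rate below are assumptions picked for illustration, not measurements.

```python
def unicast_egress_mbps(viewers, stream_mbps):
    """Sender bandwidth when every viewer gets a separate point to point stream."""
    return viewers * stream_mbps


def broadcast_egress_mbps(viewers, stream_mbps):
    """A broadcast medium carries one copy regardless of audience size."""
    return stream_mbps if viewers > 0 else 0.0


STREAM_MBPS = 1.0  # hypothetical video bitrate

for audience in (5, 5_000, 5_000_000):
    unicast = unicast_egress_mbps(audience, STREAM_MBPS)
    broadcast = broadcast_egress_mbps(audience, STREAM_MBPS)
    print(f"{audience:>9,} viewers: unicast {unicast:,.0f} Mbps, broadcast {broadcast:,.0f} Mbps")
```

This ignores caching, multicast and CDNs, which exist precisely to soften that linear growth.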