soam's home

home mail us syndication

Replacing my MacBook Pro Drive

Back in grad school, my thesis advisor Brian Smith, to his eternal credit, really put the systems into computer systems where our research was concerned. He also placed the same emphasis on our group and how we dealt with our own computers. We joked that much like Marine Boot Camp, our group members needed to know how to take apart and put together an entire computer in less than minute in order to be able to graduate!

I tried carrying on this entire DIY ethic in my post graduate career. There were stumbles though when I first started dealing with laptops. In 2000, I permanently crippled my Dell Inspiron while taking it apart in order to replace the internal hard drive. It never worked quite as well post operation. It literally was held together by liberal applications of masking tape and glue. I still take pride in being able to sell it to a fly by night computer repair operator in Kolkata sometime around 2005. It probably exists in some incarnation somewhere, fueling some kid’s IIT aspirations right about now.

Given my last experience, I was somewhat nervous about replacing my MacBook Pro’s hard drive. No question it was due – two years of hard labor had squeezed the existing Fujitsu down to cacophonous joint on joint grinding. Each day, I thought, would be its last. However, it wasn’t a straighforward processs. Apple provides a How-To on its site if you want upgrade your MacBook but for your Pro, no dice.

Luckily, help exists online, in particular, here, here, here and here. The basic procedure is the same: you first buy a 2.5″ SATA drive, ideally 7200 RPM for faster performance even though it’s a bigger drain on the battery. I’ve never gone wrong with Western Digital, so I bought a WD 320G Scorpio. Next, you’ll need a 2.5″ enclosure – make sure it can do SATA drives. I learned the hard way. Finally, you’ll need a Phillips and a T-6 Torx screwdriver. I bought all except the Phillips screwdriver (I already own a set) from Fry’s. Not the cheapest but at least they’ll take returns in the first 30 days.

After fitting the WD drive to the enclosure, I hooked it up to the Pro’s USB drive and used SuperDuper to completely clone my main HD into an externally bootable drive. I rebooted the Pro from the external drive to confirm (hold down the option key when rebooting your Pro and, if multiple bootable devices are available, it’ll ask you to choose).

For the actual physical work, I printed out ifixit’s guide and followed it step by step. You have to take out a lot of screws and some parts. To keep track, I placed all the pages in the guide side by side and placed each set of screws next to the pictures, once I completed the step. This was particularly useful when putting everything back together again. My Phillips screwdriver is magnetized, so it holds the screw to itself. This was invaluable as many of the screws in the Pro’s casing are tiny and placing them can be tricky.

It was quite a relief when I put everything back together again, powered up my laptop and, after a brief, yet agonizing period, the Apple logo came on. And soon after, the machine booted up quite happily and my desktop appeared. Now, I have a souped up box and quite possibly have saved my company, Delve, a fair chunk of change in terms of not having to replace my laptop. One of the keys in the keyboard is still loose but I am hoping the tape and chewing gum approach will work in holding it in place!

jconsole, ec2, ubuntu

Remote debugging your jmx enabled process in the ec2 cloud via jconsole isn’t easy for any number of reasons. Perhaps it’s the NAT setup at AWS. Perhaps it’s Ubuntu or Linux related. The most common workarounds given are to have jconsole run on the remote box and either export its display locally via X (using, for example ssh -X or ssh -Y to tunnel) or via VNC. I found the former too slow and the latter too time consuming to set up on our existing systems.

However, I discovered nx, a set of technologies to greatly compress the X protocol, was very easy to set up on our Ubuntu boxes. It made the process of running jconsole remotely but displaying to my laptop locally very tolerable indeed. Not surprising as nx is intended to allow you to run xterm even over dialup! Here are the set of steps I followed (instructions derived from the nomachine site) to set up a nx enabled account on a remote Ubuntu Hardy Heron box. Your mileage may well vary.

On your remote box:

  1. install jconsole if you don’t have it. It’s in the Sun JDK package: apt-get install  sun-java6-jdk
  2. Get the nx debian packages from nomachine:
    1. wget http://64.34.161.181/download/3.4.0/Linux/nxnode_3.4.0-6_i386.deb
    2. wget http://64.34.161.181/download/3.4.0/Linux/nxnode_3.4.0-6_i386.deb
    3. wget http://64.34.161.181/download/3.4.0/Linux/FE/nxserver_3.4.0-8_i386.deb
  3. Required by nxserver: apt-get install libaudiofile0
  4. Install nx client, node, server:
    1. dpkg -i nxclient_3.4.0-5_i386.deb
    2. dpkg -i nxnode_3.4.0-6_i386.deb
    3. dpkg -i nxserver_3.4.0-8_i386.deb
  5. Create an nx enabled account called “nxtest”. You supply the account password.
    1. /usr/NX/bin/nxserver –useradd nxtest –system
    2. You might have to edit ~nxtest/.profile to add the “/usr/bin” to the PATH if it’s not added already.
    3. There are other and more secure ways of doing this.
  6. Install enough X libraries to get the remote desktop going. You can do various patchwork stuff but nx appearts to work best with KDE, so: sudo apt-get install kubuntu-desktop

On your local box:

  1. go to nomachine.com, download and install the free nx client for your PC type.
  2. start up the nx client
  3. give the remote hostname/ip address, provide a session name
  4. username: nxtest, password: <whatever password you used to set up the account>
  5. this should open a remote desktop with KDE running
  6. run “jconsole” as a separate command, use the remote option to connect to the java process.

That should do it!

Bing’s Engineers

Nice San Jose Mercury article on the ex-Inktomi and Yahoo-ites behind Bing’s real time search launch – I particularly enjoyed the opening paragraphs:

Microsoft engineer Chad Carson wasn’t thrilled about surrendering his solo window seat on the Alaska Airlines flight from San Jose to Seattle so he could talk shop with his boss Sean Suchter and colleague Eric Scheel.

But that innocent decision last July 22 would spark a 91-day sprint to a previously unreached Internet milestone.

By the time Flight 321 was over Oregon, the group in Row 6 had evolved from a technology klatch to a cabal of plotters who scrawled a schematic tangle of boxes on a sheet of paper to map out something no big Internet search engine had yet achieved. The three members of Microsoft’s new Silicon Valley search team would try to make their company’s Bing a window into America’s stream of consciousness, serving up the chatter on Twitter and blog posts, with the latest updates on everything from celebrity gossip to breaking news.

Knowing Sean and Chad’s talent and work ethic, it’s great to see them get this exposure, particularly after spending so much time in Google’s shadow. Congrats guys! Also, I found the mention of row 6 in Alaska Airlines particularly amusing. If you’re not MVP or Gold or any type of high falutin’ flyin’ status holder, you can still get on early before the rest of the folks in cattle class if you score a seat on row 6. It’s my own shortcut on flights to Seattle to Delve HQ.

Peak Load

The graph below shows the requests/sec on the Delve production load balancers for our playlist service system. The time frame roughly covers the past 7 days.

As you can see, we’ve had at least three major peaks over the past couple of days. Some of these have been due to some big traffic partners coming online (including Pokemon) and at least one of them (the most recent one) was because of singer/songwriter Jay Reatard’s untimely passing and the subsequent massive demand for his videos by way of pitchfork, a partner of ours. In other words, some we predicted. Others – well those just happen.

All of these hits are great for the business and for our growth but is definitely white knuckle time for those of us responsible for keeping the system running. Fortunately, through some luck and a whole lot of planning, things have gone very smoothly thus far, fingers crossed. Some of the things we did in advance to prepare:

  • load testing the entire system to isolate the weakest links: we found apachebench and httperf to be our good friends
  • instrument components to print out response times: in particular, we did this with nginx, our load balancer of choice, as it is very easy to print out upstream and client response times
  • utilize the cloud to prepare testbeds: instead of hitting our production system, we were able to set up smaller replicas in the cloud and test there
  • monitor each machine in the chain: running something as simple as top or a little more sophisticated as netstat during load testing can provide great insights. In fact, this is something not limited to load testing. Simply, monitoring production machines during heavy traffic can provide a lot of information.

Our testing showed that:

  • we needed to offload serving more static files to the CDN
  • we could use the load balancer to serve some files which were generated dynamically in the backend, yet never changed. This was an enormous saving.
  • our backend slave cloud dbs needed some semblance of tuning. I don’t consider myself to be a mysql expert by any stretch of the imagination. However, I found that our dbs were small enough and there was sufficient RAM in our AWS instances such that tweaks like increasing the query cache size and raising the innodb buffer pool ensured no disk I/O when serving requests.
  • altering our backend caching to evict after a longer period of time – this would reduce load on our dbs
  • smoothen our deployment process so we can fire up additional backend nodes and load balancers if necessary

There’s much more to be done but surviving the onslaught thus far (with plenty of remaining capacity) has definitely been very heartening. It almost (but not quite) makes up for working through most of the holiday season :)

What You Know Vs What You Don’t

To this, I have to add one of my favorite quotes:

I am not young enough to know everything

– Oscar Wilde

Architecting For The Cloud: Some Thoughts

Introduction

A key advantage associated with cloud computing is that of scalability. Theoretically, it should be easy to provision new machines or decommission older ones from an existing application and scale thus. In reality, things are not so simple. The application has to be suitably structured from ground up in order to best leverage this feature. Merely adding more CPUs or storage will not deliver linear performance improvements unless the application was explicitly designed with that goal. Most legacy systems are not and consequently, as traffic and usage grows, must be continually monitored and patched to keep performing at an acceptable level. This is not optimal. Consequently, extracting maximum utility from the cloud requires applications follow a set of architectural guidelines. Some thoughts on what those should be:

Stateless, immutable components

An important guideline for linear scalabilty is to have relatively lightweight, independent stateless processes which can execute anywhere and run on newly deployed resource (threads/nodes/cpus) as appropriate in order to serve an increasing number of requests. These services share nothing with others, merely processing asynchronous messages. At Delve, we make extensive use of this technique for multimedia operations such as thumbnail extraction, transcoding and transcription that fit well into this paradigm. Scaling for these services involves spinning up, automatically configuring and deploying additional dedicated instances which can be put to work immediately and subsequently taken down once they are no longer needed. Without planning for this type of scenario, however, it is difficult for legacy applications to leverage this type of functionality.

Reduced reliance on relational databases

Relational databases are primarily designed for managing updates and transactions on a single instance. They scale well, but usually on a single node. When the capacity of that single node is reached, it is necessary to scale out and distribute the load across multiple machines. While there are best practices such as clustering, replication and sharding to allow this type of functionality, they have to be incorporated into the system design from the beginning for the application to benefit. Moving into the cloud does not get rid of this problem.

Furthermore, even if these techniques are utilized by the application, their complexity makes it very difficult to scale to hundreds or thousands of nodes, drastically reducing their viability for large distributed systems. Legacy applications are more likely to be reliant on relational databases and moving the actual database system to the cloud does not eliminate any of these issues.

Alternatively, applications designed for the cloud have the opportunity to leverage a number of cloud based storage systems to reduce their dependence on RDBMS systems. For example, we use Amazon’s SimpleDB as core for a persistent key/value store instead of MySQL. Our use case does not require relational database features such as joining multiple tables. However, scalability is essential and SimpleDB provides a quick and uncomplicated way for us to implement this feature. Similarly, we use Amazon’s Simple Storage Service (S3) to store Write Once Read Many data such as very large video file backups and our analytics reports. Both of these requirements, were we to use MySQL like many legacy applications providing similar functionality, would require a heavy initial outlay of nodes and management infrastructure. By using SimpleDB and S3, we are able to provide functionality comparable to or better than legacy systems at lower cost.

There are caveats, however, with using nosql type systems. They have their own constraints and using them effectively requires understanding those limitations. For example, S3 works under a version of the eventual consistency model which does not provide the same guarantees as a standard file system. Treating it as such would lead to problems. Similarly, SimpleDB provides limited db functionality – treating it as a mysql equivalent would be a mistake.

Integration with other cloud based applications

A related advantage of designing for the cloud is the ability to leverage systems offered by the cloud computing provider. In our case, we extensively use Amazon’s Elastic Map Reduce (EMR) service for our analytics. EMR, like Amazon’s other cloud offerings, is a pay as you go system. It is also tightly coupled with the rest of Amazon’s cloud infrastructure such that transferring data within Amazon is free. At periodic intervals, our system spins up a number of nodes within EMR, transfers data from S3, performs computations, saves the results and tears down the instances. The functionality we thus achieve is similar to constantly maintaining a large dedicated map-reduce cluster such as that would be required by a legacy application but at a fraction of the cost.

Deployment Issues

Deploying an application to the cloud demands special preparation. Cloud machines are typically commodity hardware – preparing an environment able to run the different types of services required by an application is time consuming. In addition, deployment is not a one time operation. The service may need additional capacity to be added later, fast. Consequently, it is important to be able to quickly commission, customize and deploy a new set of boxes as necessary. Existing tools do not provide the functionality required. As cloud computing is relatively new, tools to deploy and administer in the cloud are similarly nascent and must be developed in addition to the actual application. Furthermore, developing, using and maintaining such tools requires skills typically not found in the average sysadmin. The combination of tools and personnel required to develop and run them poses yet another hurdle for moving existing applications to the cloud. For new applications, these considerations must be part of any resource planning.

Fault Tolerance

Typically, cloud infrastructure providers do not guarantee uptime for a node. This implies a box can go down at any time. Additionally, providers such as Amazon will provide a certain percentage uptime confidence level for data centers. While in reality nodes are usually stable, an application designed for the cloud has to have redundancy built in such that a) backups are running and b) they are running in separate data centers. These backup systems also must meet other application requirements such as scalability. Their data must also be in sync with that of the primary stores. Deploying and coordinating such a system, imposes additional overhead in terms of design, implementation, deployment and maintenance, particularly relational databases are involved. Consequently, applications designed from the grounds up with these constraints in mind are much more likely to have an easier transition to the cloud.

References

Entrepreneurship

Recently, courtesy house repairwork gone amok, I had occasion to spend time with Felix Ejeckam, a friend from my grad school days. Felix was recently voted Innovator of the Year by Black Enterprise magazine for his work with Group4Labs. His field, semiconductors and such, is somewhat different from mine. Yet, as long as I’ve known him, Felix has always been fiercely entrepreneurial and anyway, if you move up enough, it’s all hi-tech. I value his insights and, during my stay, I asked him about his thoughts on what qualities separated good entrepreneurs from bad. I was a little surprised to see determination, a topic recently in vogue courtesy the Paul Graham essay, not make the list, so I thought I’d share. Here goes:

  1. Unwavering self confidence in one’s own abilities. I liked how Felix phrased this – would you enter a situation where your life depended on your work? Do you have that much trust in yourself?
  2. Zoom in, zoom out – the best entrepreneurs have the ability to see the forest for the trees. However, they can also quickly become close and personal with a specific problem if necessary. While running small startups, you don’t have the luxury of sticking exclusively to either big picture or small picture.
  3. Neuroses free zone – there’s no question startups are physically demanding. However, mentally, they can be even more grueling. There’s no question it takes a certain level of obsession, recklessness and risktaking several levels beyond the norm to launch a career of this sort. However, self doubt, insecurities and all of that can easily be exacerbated in the day to day grind, sometimes with tragic results. Keeping mentally healthy is important. This is not to say there aren’t crazy chair throwing CEOs out there, just that you probably won’t have that luxury!

Video Clip Lengths

In a NYT article, Rise of Web Video, Beyond 2-Minute Clips, Brian Stelter writes:

Video creators, by and large, thought their audiences were impatient. A three-minute-long comedy skit? Shrink it to 90 seconds. Slow Internet connections made for tedious viewing, and there were few ads to cover high delivery costs. And so it became the first commandment of online video: Keep it short.

I recall coming across this phenomenon in 1997 and 1998 while doing research work into characteristics of web video stored on the web at that point in time. Here’s an interesting graph from the paper I wrote on the subject:

Web Video Lengths (1997)

Web Video Lengths (1997)

The number of videos on the web were relatively small and their sizes could be measured in seconds. 90% were 45 seconds or less. The graph is capped at around 2 minutes for maximum length although I did find outliers that were longer.

What I found interesting, however, in a followup study was that if you took away the bandwidth chokepoints, video lengths ballooned. I was studying the video access patterns of a Video On Demand experiment at the Lulea University in Sweden – the setup here was over a dedicated high speed network, effectively removing slow access as a determinant of behavior. Specifically:

Since 1995, the Centre for Distance-spanning Technology at Luleå University (CDT) has been researching distance education and collaboration on the Internet [17]. Specifically, it has developed a hardware/software infrastructure for giving WWW-based courses and creating a virtual student community. The hardware aspects include the deployment of a high speed network (2-34 Mbps backbone links) to attach the local communities to the actual University campus. The campus is also connected to the national academic backbone by a high speed 34 Mbps link [13] with student apartments being wired together with the rest of campus via 10 or 100 Mbps ethernet.

The following graph shows the distribution of video lengths for the files used in the system:

VoD Video Durations

The mean duration of these files were around 75 minutes or so. This finding hinted that as videos grew in popularity and infrastucture hurdles fell away, video durations would increase. From the original NYT article:

New Web habits, aided by the screen-filling video that faster Internet access allows, are now debunking the rule. As the Internet becomes a jukebox for every imaginable type of video — from baby videos to “Masterpiece Theater” — producers and advertisers are discovering that users will watch for more than two minutes at a time.
:
:
“People are getting more comfortable, for better or for worse, bringing a computer to bed with them,” said Dina Kaplan, the co-founder of Blip.tv.

Ms. Kaplan’s firm distributes dozens of Web series. A year ago all but one of the top 25 shows on her Web servers clocked in at under five minutes. Now, the average video hosted by Blip is 14 minutes long — “surprising even to us,” she said. The longest video uploaded in May was 133 minutes long, equivalent to a feature-length film.

Interested by this, I took a look at the duration of the videos hosted by Delve. This is based on data a couple of weeks old, so this is not representative of the latest trends. However, I found the average video duration to be a little under 6 minutes. However, within this I found definite disparities between publishers. Our top 25 publishers (by video duration) had videos that were a little under 25 minutes on average. This indicates mixed video use by our publishers. While some are still sticking to shorter videos, a significant number are definitely taking full advantage of long form clips – one of the largest videos is around 12 hours in length!

It’ll be interesting to see how these trends hold over the next year or so.

Delve Analytics: A New Foundation

If you’re using Delve Networks read on – we just swept the rug from underneath your feet and I bet you didn’t even know it! If you’re considering using us, well, you’d definitely want to know about this too.

For the past two weeks, Delve has been running a totally new, revamped analytics system – one that allows us to scale well beyond our current levels and provide a foundation for many more features to come. A peek under the hood:

  • Our event collection system is now completely in the cloud, running on Amazon EC2 instances. This gives us the ability to quickly scale up (hopefully always up :=) or down quickly depending on load. Correspondingly, we can now instrument our players to send more granular data to improve our accuracy vis a vis metrics such as playback times
  • Our analytics processing is also now completely in the cloud. We use (EMR) Amazon’s Elastic MapReduce, a service built atop Hadoop, to process our event data and generate reports. We are early adopters of this service and have engaged the Amazon EMR tech team to catch and resolve issues. One of the biggest benefits of using EMR is that we don’t need to maintain a dedicated hadoop cluster. Instead we simply select the number of machines to run for each given job submission – again, this simplifies scaling as our data sets grow.
  • We have now moved away from our dependence on MySQL and instead use S3 as a report storage repository. This allows publishers access to an archive of all past computed reports. While our front end does not offer this feature yet, the basic structure is in place to allow us to do so in the near future.

Perhaps the biggest advantage of the new rehaul is now we have a strong foundation for further enhancements, reports and features, futuristic and otherwise. Several are already in the works and are slated to be released soon. Stay tuned!

NoSQL vs S3

MongoDB, Tokyo Cabinet, Project Voldemort … Systems that provide distributed, persistent key value store functionality are proliferating like kudzu. Sometimes it seems that not a day goes by without my hearing about a new one. Case in point: just right now, while browsing, I came across Riak, a “decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.”

I understand the motivation behind the NoSQL movement: one of them has to be backlash at the problems associated with MySQL. It’s one of those beasts that is very easy to get started on but, if you don’t start with the right design decisions and growth plan, can be hellacious to scale and maintain. Sadly, this is something I’ve seen happen at places repeatedly throughout my tenure at various companies. It happens all too often. Small wonder then that developers and companies have identified one of the most frequent use cases with modern web applications and MySQL – that of needing to quickly look up key value pairs reliably – and have built or are building numerous pieces of software optimized for doing precisely that at very high performance levels.

The trouble is if you need one yourself. Which one to pick? There are some nice surveys out there (here’s one from highscalability with many good links) but most are in various stages of development, usually with version numbers less than 1 and qualifiers like “alpha” or “beta” appended. Some try to assuage your fears:

Version 0.1? Is it ready for my production data?

That kind of decision depends on many factors, most of which cannot be answered in general but depend on your business. We gave it a low version number befitting a first public appearance, but Riak has robustly served the needs of multiple revenue-generating applications for nearly two years now.

In other words, “we’ve had good experiences with it but caveat emptor. You get what you pay for.”

This is why I really enjoyed the following entry in BJ Clark’s recent survey:

Amazon S3

url: http://aws.amazon.com/s3/
type: key/value store
Conclusion: Scales amazingly well

You’re probably all like “What?!?”. But guess what, S3 is a killer key/value store. It is not as performant as any of the other options, but it scales *insanely* well. It scales so well, you don’t do anything. You just keep sticking shit in it, and it keeps pumping it out. Sometimes it’s faster than other times, but most of the time it’s fast enough. In fact, it’s faster than hitting mysql with 10 queries (for us). S3 is my favorite k/v data store of any out there.

I couldn’t agree more. Recently, I finished a major project at Delve (and I hope to write about more of this later) where one of our goals was to have all our reports we computed for our customers to be available indefinitely. Our current system stores all the reports in, you guessed it, MySQL. The trouble is this eats up MySQL resources and since we don’t do any queries on these reports, we, in essence, are simply using MySQL as a repository. By moving our reporting storage to S3 (and setting up a simple indexing scheme to list and store lookup keys), we have greatly increased the capacity for our current MySQL installation and are now able to keep and lookup reports for customers indefinitely. We are reliant on S3 lookup times and availability – but, for this use case, the former is not as big an issue and having Amazon take care of the latter frees us to worry about other pressing problems of which are fairly plentiful at a startup!

All

Next entries »