soam's home


Archive for January, 2011

Data Trends For 2011

From Edd Dumbill at O’Reilly Radar come some nice thoughts on key data trends for 2011. First, the emergence of a data marketplace:

Marketplaces mean two things. Firstly, it’s easier than ever to find data to power applications, which will enable new projects and startups and raise the level of expectation—for instance, integration with social data sources will become the norm, not a novelty. Secondly, it will become simpler and more economic to monetize data, especially in specialist domains.

The knock-on effect of this commoditization of data will be that good quality unique databases will be of increasing value, and be an important competitive advantage. There will also be key roles to play for trusted middlemen: if competitors can safely share data with each other they can all gain an improved view of their customers and opportunities.

A number of companies are emerging that crawl the general web, Facebook and Twitter to extract raw data, process and cross-reference it, and sell access. The article mentions Infochimps and Gnip. Other practitioners include BackType, Klout and RapLeaf, among others. Their success indicates a growing hunger for this type of information. I am definitely seeing this need where I am currently. Limelight, by virtue of its massive CDN infrastructure and customers such as Netflix, collects massive amounts of user data. Such data could greatly increase in value when cross-referenced against other databases that provide additional dimensions such as demographic information. This is something that might best be obtained from some sort of third-party exchange.
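To make the cross-referencing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record fields (user_id, title, seconds_watched), the demographic attributes and the values are invented, and it says nothing about how Limelight or any data exchange actually structures its data.

```python
# Hypothetical usage records; field names and values are invented.
cdn_events = [
    {"user_id": "u1", "title": "movie-a", "seconds_watched": 5400},
    {"user_id": "u2", "title": "movie-b", "seconds_watched": 1200},
    {"user_id": "u3", "title": "movie-a", "seconds_watched": 300},
]

# Stand-in for a third-party demographic data set keyed on the same user id.
demographics = {
    "u1": {"age_band": "25-34", "region": "US-West"},
    "u2": {"age_band": "35-44", "region": "US-East"},
}


def cross_reference(events, demo_by_user):
    """Attach demographic fields to each event; skip events with no match."""
    enriched = []
    for event in events:
        demo = demo_by_user.get(event["user_id"])
        if demo is None:
            continue  # no demographic record for this user
        enriched.append({**event, **demo})
    return enriched


def seconds_by_age_band(rows):
    """Total viewing time per age band, a dimension the raw logs lack."""
    totals = {}
    for row in rows:
        band = row["age_band"]
        totals[band] = totals.get(band, 0) + row["seconds_watched"]
    return totals


enriched = cross_reference(cdn_events, demographics)
print(seconds_by_age_band(enriched))
```

The point of the sketch is only that the usage data gains a new dimension (age band) once it is matched against a second data set. In practice the hard part is agreeing on a shared key.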

Another trend that seems familiar is the rise of real time analytics:

This year’s big data poster child, Hadoop, has limitations when it comes to responding in real-time to changing inputs. Despite efforts by companies such as Facebook to pare Hadoop’s MapReduce processing time down to 30 seconds after user input, this still remains too slow for many purposes.
:
:
It’s important to note that MapReduce hasn’t gone away, but systems are now becoming hybrid, with both an instant element in addition to the MapReduce layer.

The drive to real-time, especially in analytics and advertising, will continue to expand the demand for NoSQL databases. Expect growth to continue for Cassandra and MongoDB. In the Hadoop world, HBase will be ever more important as it can facilitate a hybrid approach to real-time and batch MapReduce processing.
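The hybrid idea in the quote can be sketched roughly as follows. This is a toy illustration in plain Python, not Hadoop, HBase or anyone's production design: the class name HybridCounter and its methods are hypothetical, and it assumes the batch job is always handed every event recorded so far.

```python
from collections import Counter


class HybridCounter:
    """Toy hybrid view: a periodic batch result plus a real-time delta."""

    def __init__(self):
        self.batch_view = Counter()      # output of the last batch run
        self.realtime_delta = Counter()  # events seen since that run

    def record(self, key):
        """Real-time path: count the event immediately."""
        self.realtime_delta[key] += 1

    def run_batch(self, all_events):
        """Batch path: recompute from the full event log, then reset the delta.

        Assumes all_events already contains everything passed to record().
        """
        self.batch_view = Counter(all_events)
        self.realtime_delta.clear()

    def query(self, key):
        """Answer from both layers so recent events are visible at once."""
        return self.batch_view[key] + self.realtime_delta[key]


counter = HybridCounter()
counter.run_batch(["video-1", "video-2", "video-1"])
counter.record("video-1")  # arrives after the batch run
print(counter.query("video-1"))  # 3
```

Queries never wait for the next batch run. The slow layer only has to be periodically correct, while the fast layer covers the gap in between.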

Having built Delve’s (near) real time analytics last year, I am familiar with the pain points of leveraging Hadoop to fit into this kind of role. In addition to NoSQL-based solutions, I’d note that other approaches are emerging:

It’s interesting to see how a new breed of companies has evolved from treating their actual code as a valuable asset to giving away their code and tools and treating their data (and the models they extract from that data) as major assets instead. With that in mind, I would add a third trend to this list: the rise of cloud-based data processing. Many of the startups in the data space use Amazon’s cloud infrastructure for storage and processing. Amazon’s Elastic MapReduce, which I’ve written about before, is a very well put together and stable system that obviates the need to maintain a continuously running Hadoop cluster. Obviously, not every application fits this model, but for those that do, it can be very cost-effective.

View From My Office

After a couple of years of working remotely, it still feels strange to have an office of my own, let alone one with a modicum of a view of the downtown SF skyline, so I am enjoying it while it lasts. We’re scheduled to move to a more central SOMA location sometime in the next month.


Gesture Recognition And Music

Despite Farhad Manjoo’s assertions that a week at CES is essentially a week wasted (“The Most Worthless Week in Tech”), I found this LA Times article talking about gesture recognition vendors at CES to be particularly interesting:

Competing examples on display were from PrimeSense, the Israeli designers of the microchips that power Microsoft’s popular controller-free Kinect gaming accessory, and Softkinetic, a Belgian rival that powered an interactive billboard in Hollywood last summer for “The Sorcerer’s Apprentice.” The former relies on an approach called structured light — a projector fills the area in front of the display with beams of infrared light, then a sensor detects how the beams are distorted by moving objects. The latter takes the so-called time of flight approach, which detects motion by projecting light in front of a display and measuring how long it takes to bounce back.

PrimeSense has a considerable head start in the gesture recognition field thanks to the inclusion of its technology in Kinect — Microsoft sold some 8 million units of the device in 60 days. But games are “just the tip of the iceberg,” said Uzi Breier, executive vice president of PrimeSense. “We’re in the middle of a revolution. We’re changing the interface between man and machine.”

PrimeSense is focused on living room devices, while Softkinetic is also active in display advertising and medical applications. Breier said other possible uses include automobile security and safety, robotics, home security and rehabilitation.

To this list of uses, I would add another: music. Anyone who has played air guitar, air drums and/or the theremin would agree, I think. Percussion, in particular, would be a natural fit. Perhaps, in the future, conducting itself would be the actual performance and the orchestra would not even be there!
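As an aside on the time-of-flight approach described in the quote above: the underlying arithmetic is simple, since light travels to the object and back, so the distance is half the round trip multiplied by the speed of light. The sketch below shows only that calculation. The function name is my own, and it is not a description of how Softkinetic's sensors are actually implemented.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # metres per second, in a vacuum


def distance_from_round_trip(seconds):
    """Distance to an object given the light's round-trip time."""
    # The light covers the distance twice (out and back), hence the / 2.
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2


# A round trip of 10 nanoseconds corresponds to roughly 1.5 metres.
print(distance_from_round_trip(10e-9))
```

The numbers also show why this is hard to build: telling a hand at 1.5 m from one at 1.6 m means resolving differences of well under a nanosecond.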
