DataSift gets historical with Twitter firehose

Krystal Peak · February 28, 2012

The Twitter filtering and search tool launches access to tweets dating back to January 2010

As marketing executives and analysts continue to find new ways to use Twitter data for brand insights and trend spotting, one company is offering more access to that data than has ever been available.

DataSift announced the launch of its new product, Historics, which lets users search the Twitter firehose back to January 2010.

DataSift's leaders had always planned to organize and archive historical Twitter data, and just four months after its launch the company has accrued an astonishing two years of it. This is no easy feat: in 2011 alone, 85 billion tweets were sent, and the data tweeted each day can average a terabyte.

While some simpler information and trends can be mined from Twitter's generally accessible data, more specific analytics require access to the entire Twitter firehose. The firehose carries deeper layers of data, such as the location of tweets, their language of origin and the gender of the user, and is accessible only to a handful of companies that have been thoroughly vetted by Twitter.

DataSift uses its complex filters to let users analyze, at any level of detail, how certain topics, brands, citizens or events are being discussed on Twitter.

I spoke with DataSift CEO Rob Bailey about the types of searches and analysis that he has been excited by since his company's launch in November.

"I just keep getting blown away by the research and studies that people can derive from Twitter data," Bailey told me. "I have seen everything from searches that measure company sentiment in order to predict stock price changes to major US cities looking at its citizenry and their biggest concerns."

Bailey has even seen studies that look at the tweets from farmers to predict weather patterns and farming cycles.

The possibilities are wide open. Because more than half of the people who had already started working with DataSift were interested in historical data, the company set to work archiving it to offer better search options and results. Previously, such searches could usually go back only a month, or only as far as the start of the company doing the collecting.

Now that companies can tap into both real-time and historical tweets, DataSift is interested in seeing just how powerful its data can be for different companies, entities and brands. It is important to note, however, that not just anyone can tap into DataSift's information. The San Francisco company puts customers through a rigorous vetting process to ensure that the data is used for positive purposes and not to spam or negatively target tweeters.

"Just as we were vetted to gain Twitter firehose access, we will reject access to companies that aren't using this data for good," Bailey told me.

Founder and CTO Nick Halstead explained just what a mountain of data the company is working with: DataSift can now process a whole month of Twitter data in an hour, and it runs more than half a petabyte of storage to handle the extreme volume of tweets.

Historics is available today as a limited release to existing customers and is scheduled to become generally available in April 2012. The service is offered either as a corporate subscription or on a pay-as-you-go basis.

Since its inception, DataSift has raised $8 million in funding from various investors and financial groups, and it is gearing up to hire aggressively in the U.S.

Co-founder Halstead has been a passionate coder since childhood and wanted to harness the immense amount of information available on social platforms such as Twitter, but it took a while to get the system to the point where it could launch.

"As Twitter grew, it weathered some serious pains and people became familiar with the infamous fail whale," said Halstead, who has founded and worked for several technology companies, including TweetMeme. "We wanted to make sure that we had the infrastructure to handle the sheer data we were going to be sifting through before we went live."

DataSift has already partnered with companies like Klout to add more search options to the service, and it has signed on several global brands that will begin using its search tools immediately for marketing and analysis.

For now, DataSift is looking to move deeper into the Twitter space and hopes to offer more features for sorting Twitter data soon after the launch. Eventually, however, the company would like to expand to other platforms, such as Facebook, Google+ and blogs.

"Our goal is to make data available to everyone interested," said Bailey, who pointed to the ability of paid users to share their data results with anyone for free. "We think that data should be disseminated and help everyone better understand the climate we are in, moment by moment."


Related Companies, Investors, and Entrepreneurs


DataSift Inc. is a social data platform company, enabling enterprises and entrepreneurs to aggregate, filter and extract insights from the billions of public social conversations on Twitter, leading social networks and millions of other sources.

Through a licensing agreement with Twitter, DataSift provides companies with both real-time and historical Tweets to filter and uncover insights and trends that relate to brands, businesses, financial markets, news and public opinion.

DataSift is an on-demand platform with a flexible pricing scale that makes enterprise-level data accessible to companies of any size. DataSift has offices in San Francisco and Reading, U.K. It has received investment from IA Ventures (Roger Ehrenberg), a fund that is focused exclusively on Big Data, and from GRP Partners (Mark Suster). For more information, visit and follow us on Twitter @datasift.