Databricks takes on Google data streaming analysis with Spark

Databricks Cloud will provide Spark-based streaming analysis as a service

Taking on Google, Databricks plans to offer its own cloud service for analyzing live data streams, one based on the Apache Spark software.

Databricks Cloud is designed to provide a platform for analyzing streaming data, much like the Google DataFlow service announced last week.

Like Google DataFlow, Databricks Cloud promises to offer a single programming model that cuts across different approaches to data analysis, including support for batch programming and live data streaming. And like Google DataFlow, Databricks Cloud will first be offered in preview mode, with full commercial support due by the end of the year.
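To make the idea of a single programming model concrete, here is a minimal sketch in plain Python. It is not Spark's or Google's API, and every name in it (`count_events`, `run_batch`, `run_streaming`) is hypothetical. It assumes a stream can be treated as a sequence of small batches, which is one common way to let the same analysis logic serve both modes.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_events(events: Iterable[str]) -> Counter:
    """Analysis logic, written once: count occurrences of each event type."""
    return Counter(events)


def run_batch(events: List[str]) -> Counter:
    """Batch mode: apply the logic to the whole stored data set at once."""
    return count_events(events)


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Streaming mode: apply the same logic to each small batch as it
    arrives, yielding a snapshot of the running totals after each one."""
    running: Counter = Counter()
    for batch in micro_batches:
        running += count_events(batch)
        yield Counter(running)  # copy, so later updates don't alter old snapshots


if __name__ == "__main__":
    stored = ["login", "purchase", "login"]
    print(run_batch(stored))

    live = [["login"], ["purchase", "login"]]
    for snapshot in run_streaming(live):
        print(snapshot)
```

In this sketch only the driver differs between the two modes. The analysis function is shared, so the final streaming snapshot matches the batch result over the same events.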

The two services are aimed at different markets, according to Ion Stoica, CEO of Databricks.

"Google DataFlow is really targeted to developers. We also have higher-level interfaces for data scientists and data engineers," Stoica said.

Databricks also guarantees application portability. Because the entire stack is based on open source software, users can move their workloads to other Apache Spark installations should they need to, Stoica said. "You can take your application and run it in another cloud," he said.

Such a service could be used by enterprises for tasks such as churn analysis, which can determine why a customer stops using a product, or for fraud detection, where malicious activity can be spotted while it is still taking place.

The University of California, Berkeley's AMP (Algorithms, Machines and People) Lab originally developed Spark as a unified processing engine, one able to provide a platform for a variety of data analysis tasks, including interactive queries, streaming data analysis, machine learning and graph computation.

A number of developers behind Spark went on to form Databricks. The software itself, designed to run on a cluster of servers, is now managed as an open source project under the guidance of the Apache Software Foundation.

Offering Spark as a service eliminates the arduous task of setting up and maintaining an in-house implementation of Spark, Stoica noted.

"Clusters are hard to set up and maintain. To build a data pipeline, you need to stitch together multiple tools, and the tools are still hard to use. So extracting value out of the data is still a struggle," Stoica said.

Initially, Databricks Cloud will run on Amazon Web Services, though eventually it will also run on other cloud providers such as Google.

In addition to the Spark platform itself, Databricks will provide a set of built-in applications that can do common data analysis tasks. Users can build their own workflows, or issue queries and interact with the data directly. Output can be piped to a dashboard or a report.

Databricks is not the only company making use of Spark's capabilities. ClearStory offers an analytics software package based on Spark that allows organizations to aggregate dozens of unstructured data sources for analysis, far more than can be easily done through traditional business intelligence tools, said ClearStory CEO Sharmila Mulligan.

Databricks also announced Monday that it has received US$33 million in Series B funding led by venture capital firm New Enterprise Associates, with follow-on investment from Andreessen Horowitz.
