Google's Dataflow can analyze data as it crosses the wire
Taking what many see as the next step in big data analysis, Google is previewing a service called Google Cloud Dataflow that analyzes live data, potentially giving users the ability to view trends and be alerted to events as they happen.
"There's an enormous amount of data being created, and so you need a way to ingest that in a more intelligent way," said Brian Goldfarb, Google Cloud Platform head of marketing. With big data, "the program models are different. The technologies are different. It requires developers to learn a lot and manage a lot to make it happen."
"It is a fully managed service that lets you create data pipelines for ingesting, transforming and analyzing arbitrary amounts of data in both batch or streaming mode, using the same programming model," Goldfarb said.
Google Cloud Dataflow is designed so the user can focus on devising proper analysis, without worrying about setting up and maintaining the underlying data piping and processing infrastructure.
It could be used for live sentiment analysis, for instance, where an organization gauges public sentiment around a product by scanning social networks such as Twitter. It could also be used as a security tool, monitoring activity logs for unusual behavior.
"There are a bunch of different business applications in which it could apply. In a lot of data-centric verticals, like retail or oil and gas, a technology like this could open the door to getting analytics," Goldfarb said.
It could also be used as an alternative to commercial ETL (extract, transform and load) programs, widely used to prepare data for analysis by business intelligence software.
Google Cloud Dataflow is based on technologies that the company built internally for its own use, following up on work it did on the MapReduce programming model, which is used in Apache Hadoop.
Live data stream analysis appears to be the next logical step in big data analysis, a field pioneered by Hadoop. Hadoop provides a way to analyze massive amounts of unstructured data spread across multiple servers. Originally, Hadoop used MapReduce as the platform to write programs that analyze the data.
MapReduce's limitation is that it works only in batch mode, meaning all the data must be collected before it can be analyzed. A number of newer programs, such as Storm (developed at Twitter) and Apache Spark, have been built to get around this batch-only constraint; both are available as open source and can run on Hadoop.
The service provides a software development kit that can be used to build complex pipelines and analysis. Like MapReduce, Cloud Dataflow will initially use the Java programming language. In the future, other languages may be supported.
The pipelines can ingest data from external sources and process it in a variety of ways. The service provides a library to prepare and reformat data for further analysis, and users can write their own transformations.
The processed dataset can then be queried with Google's BigQuery service. Alternatively, the user can write modules that examine the data as it crosses the wire, looking for aberrant behavior or trends in real time.
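For illustration only, the following is a minimal sketch of what such a pipeline might look like with the Java SDK described above, counting mentions of a product in a batch of collected tweets. The bucket paths, hashtag filter and class name are hypothetical, and the exact API may differ in the SDK Google ultimately releases.

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.TextIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.Count;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.values.KV;

    public class ProductMentions {
      public static void main(String[] args) {
        // Build a pipeline from command-line options (runner, project, staging location).
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(TextIO.Read.from("gs://example-bucket/tweets/*.txt"))    // ingest raw records
         .apply(ParDo.of(new DoFn<String, String>() {                    // user-written transformation
            @Override
            public void processElement(ProcessContext c) {
              // Keep only records that mention the (hypothetical) product hashtag.
              if (c.element().contains("#exampleproduct")) {
                c.output("#exampleproduct");
              }
            }
          }))
         .apply(Count.<String>perElement())                              // aggregate mention counts
         .apply(ParDo.of(new DoFn<KV<String, Long>, String>() {          // format results for output
            @Override
            public void processElement(ProcessContext c) {
              c.output(c.element().getKey() + "," + c.element().getValue());
            }
          }))
         .apply(TextIO.Write.to("gs://example-bucket/output/mentions")); // write results for later querying

        p.run();
      }
    }

In principle, the same pipeline could be pointed at a streaming source instead of stored files, which is the "same programming model" for batch and streaming data that Goldfarb describes.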
Google announced Cloud Dataflow at the company's Google I/O user conference in San Francisco. A small number of Google customers are testing it and the company plans to open it up as a public preview later this year.
Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com