Ebay: Hadoop 'flexibility' favoured alongside Teradata for customer data analysis
- 07 November, 2013 17:03
Ebay is favouring the flexibility of NoSQL tools such as Hadoop alongside Teradata as it seeks to gain better insight into its customer data.
Speaking at the Financial Information Management (FIMA) event in London yesterday, Mark Uksusman, senior manager, data architecture at Ebay, claimed the online retailer has one of the largest implementations of Teradata in production. This includes two major Teradata clusters, one a normal data warehouse for its traditional reporting system, involving very structured data, and the other a bespoke platform, called Singularity, developed for deep analytics and data discovery.
However, he said that open source database software such as Hadoop, MongoDB and Cassandra offers more benefits for analysing data than Ebay's relational databases.
"We are one of the biggest Teradata implementations in the world, we are processing 90PB of data," said Uksusman. "But are we optimised 100 percent? Maybe not, and maybe we need to think a bit about optimising our data warehouse and offload Teradata to Hadoop environments, which is more flexible and more developed for data discovery."
Uksusman explained that although tools such as Hadoop have clear benefits in areas such as real-time analytics, NoSQL databases are only considered for certain use cases, and Ebay would still tend towards relational databases where possible.
"I don't want to say that it is a 100 percent right solution [using non-relational databases]. If you want to talk about secure transactions you have to ensure data governance and that your records are accurate, [for this] we are still using relational database management system, Oracle, Teradata and so on.
"But if you are looking at something around data discovery, you would like to do some very quick processing and analyse information on the fly, and do some analysis of non-structural information, this is why NoSQL technology is there."
The use of these tools is helping Ebay gain better insight into the masses of data that it generates and processes across its business. Uksusman said that, with the analysis of unstructured data such as social media feeds, this runs into many petabytes of data.
"We are not talking about terabytes any more at Ebay, not even hundreds of terabytes, we are talking about hundreds of petabytes that we are processing on different channels."
The information gained by Ebay's team of data scientists is being used to bring insights into customer behaviour, and enable predictive marketing.
This provides benefits as the company, which started trading purely as an auction site, moves into fixed-price retail and attempts to tie in wider services such as its payments platform PayPal and its StubHub ticket trading site.
"Right now we are doing more and more integration between the different companies. The challenge is huge because the companies already have some historic information, historic systems and data platforms, so integration is a humongous project for us."
"We would like to know if you go to StubHub or PayPal, and we would like to know if you are a customer of Ebay so we can offer you something on Ebay.com. Or, if you are a customer of Ebay, you can go to StubHub and buy tickets there."
He added: "To make sure we have a 360 degree view of our customer is very important, and big data is actually helping us to create this."