Facebook's Graph Search puts Apache Giraph on the map

Powering Facebook Open Graph, Apache Giraph was built from Yahoo and Google technologies

Facebook has found that Giraph scales near linearly with the number of workers or the problem size.

Move over Hadoop, there is another highly scalable data processing powerhouse in town: Apache Giraph. Facebook is using the technology to bring a new style of search to its billion users.

When Facebook built its Graph Search service, the social networking company picked Giraph over other social graphing technologies -- such as the Hadoop-based Apache Hive and GraphLab -- because of Giraph's speed and immense scalability.

"Analyzing these real-world graphs at our scale ... with available software was impossible last year. We needed a programming framework to express a wide range of graph algorithms in a simple way and scale them to massive datasets," wrote Facebook software engineer Avery Ching, in a blog post that discussed Facebook's use of the technology.

With a little modification, Facebook has used Giraph to analyze a trillion edges, or connections between different entities, in under four minutes.

In addition to using Giraph for its Graph Search, Facebook also plans to use the software for other duties such as targeting ads and ranking data.

Launched in January, Facebook's Graph Search service provides a way for users to query Facebook's massive collection of user-generated data and get back personalized results.

"Open Graph allows application developers to connect objects in their applications with real-world actions (such as user X is listening to song Y)," Ching explained.

Facebook's Graph Search, while still not as mature as regular search services such as Google's, may be the first widespread public introduction to the benefits of using social graphs.

A social graph maps the complex relationships between many different entities (called nodes). A node can be anything: a person, a restaurant, a city. Nodes are connected by edges. An edge might, for instance, assert that a particular person lives in a certain city.
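As a rough illustration of that node-and-edge idea, here is a minimal Python sketch. The class names, the `lives_in` label and the sample people and places are all hypothetical, chosen only for this example; they do not reflect how Facebook or Giraph represent graphs internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str   # e.g. "person", "restaurant", "city"
    name: str


@dataclass(frozen=True)
class Edge:
    source: Node
    label: str  # what the edge asserts, e.g. "lives_in"
    target: Node


class SocialGraph:
    """A toy graph that stores edges in a flat list."""

    def __init__(self):
        self.edges = []

    def add_edge(self, source, label, target):
        self.edges.append(Edge(source, label, target))

    def targets(self, source, label):
        # Follow every edge with the given label that starts at `source`.
        return [e.target for e in self.edges
                if e.source == source and e.label == label]


alice = Node("person", "Alice")
paris = Node("city", "Paris")
bistro = Node("restaurant", "Chez Exemple")

graph = SocialGraph()
graph.add_edge(alice, "lives_in", paris)
graph.add_edge(alice, "likes", bistro)

home_cities = graph.targets(alice, "lives_in")
```

A flat edge list keeps the sketch short; a system working at the scale described here would need a far more compact and partitioned layout.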

Yahoo first developed Giraph using the principles set forth in a 2010 paper published by Google engineers, "Pregel: a system for large-scale graph processing."

Using the Bulk Synchronous Parallel model of computing, Google designed Pregel to generate graphs from very large data sets, using lots of commodity servers.
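In the Bulk Synchronous Parallel model, work proceeds in rounds called supersteps: each vertex reads the messages sent to it in the previous round, updates its own value, sends new messages, and then all vertices wait at a barrier before the next round begins. The single-process Python sketch below shows that pattern by spreading the largest value through a graph. The function name `propagate_max` and the sample graph are assumptions made for illustration; this is not the Pregel or Giraph API.

```python
from collections import defaultdict


def propagate_max(values, edges):
    """Spread the largest vertex value through the graph in supersteps.

    values: dict mapping vertex -> number
    edges:  dict mapping vertex -> list of neighbouring vertices
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = defaultdict(list)
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)

    # Later supersteps: run until no messages are in flight.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                # The vertex changed, so it notifies its neighbours.
                values[vertex] = best
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(best)
        # Barrier: messages sent now are only visible in the next superstep.
        inbox = outbox

    return values


chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
start = {"a": 3, "b": 6, "c": 2, "d": 1, "e": 9}
result = propagate_max(start, chain)
```

Because a vertex only ever sees messages from the previous superstep, the per-vertex work inside each round could in principle be handed to different machines, which is what makes the model suited to large clusters of commodity servers.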

As it did with Hadoop, Yahoo bequeathed Giraph to the Apache Software Foundation, where it is now a fully open-source project worked on by developers from Facebook, LinkedIn, Twitter and Hortonworks.

Because Giraph is written in Java, Ching explained, it can connect very easily with the various parts of Facebook's Hadoop deployment, which it relies upon for data storage management and resource scheduling.

Facebook stores its user-generated data in a data warehouse running on Apache Hive, a component of Hadoop. Giraph, however, can generate graphs four times faster than Hive itself. Because it runs on Hadoop's MapReduce, a Giraph job can be split across multiple servers so it can be executed in parallel.

Facebook modified Giraph in a number of ways to make it run more efficiently, according to Ching.

Company engineers devised a number of tweaks to trim Giraph's memory usage on servers. "Giraph was a memory behemoth due to all data types being stored as separate Java objects," Ching wrote.
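One general way to avoid paying a per-object cost for every value is to pack many small records into a single contiguous byte buffer and decode them only when needed. The Python sketch below illustrates that idea for a vertex's edge list. It is an assumption-laden illustration, not a description of Facebook's actual changes: the 16-byte record layout (an 8-byte target ID followed by an 8-byte weight) and the helper names are hypothetical.

```python
import struct

# Hypothetical fixed-width edge record: 8-byte target id + 8-byte weight.
EDGE = struct.Struct("<qd")


def pack_edges(edges):
    """Pack (target_id, weight) pairs into one contiguous bytes object."""
    buf = bytearray(EDGE.size * len(edges))
    for i, (target, weight) in enumerate(edges):
        EDGE.pack_into(buf, i * EDGE.size, target, weight)
    return bytes(buf)


def iter_edges(buf):
    """Decode edges lazily, one record at a time."""
    for target, weight in EDGE.iter_unpack(buf):
        yield target, weight


edges = [(42, 1.0), (7, 0.5), (1_000_000_000_000, 2.25)]
packed = pack_edges(edges)
decoded = list(iter_edges(packed))
```

The packed form costs exactly 16 bytes per edge in this layout, however many edges there are, whereas storing each ID and weight as its own object adds a per-object header on top of the payload.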

To improve Giraph's scalability, Facebook linked it with the Netty event-driven framework.

In one test using user interaction data, Facebook was able to use Giraph to create a 1 trillion-edge social graph in under four minutes, using 200 commodity servers.

Facebook's benchmark dwarfed previously published Giraph tests by other companies by at least two orders of magnitude. Heretofore, researchers have been able to create a 6.6 billion-edge graph using Yahoo Altavista data and a graph of Twitter data with 1.5 billion edges.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
