Picture this. You’re at a Gourmerican burger joint chomping on a cheeseburger when an outspoken vegan friend starts preaching that you’re killing the planet. Last week, that same vegan downed a pricey glass of pinot before flying to a far-flung destination, armed with their strongest mossie repellent and a first aid kit. Anything amiss?
A team of university researchers in Austria is combining human intelligence with cutting-edge data mining technology in a bid to interpret consumer sentiment on social media and elsewhere online more accurately.
The uComp research project at Modul University Vienna aims to extract complex, unstructured and often contradictory knowledge from social media engagement and other noisy, multilingual online data sets, and to interpret it in a robust, accurate and scalable way. It plans to do this by combining newly developed automated knowledge extraction tools with the “wisdom of the crowds”.
In June, the uComp project announced the extensible Web Retrieval Toolkit (eWRT), an open source tool that captures data from public sources such as social media and uses language recognition to identify the information items it gathers. The project also claims the toolkit promotes a transparent approach to analysing data from social media platforms.
The toolkit also supports text acquisition, detection of phonetic similarities, and standardised integration and archiving of captured information. Additional functions include archiving large volumes of data and managing and normalising the associated metadata.
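The article does not say how eWRT detects phonetic similarities. As a rough illustration only, the sketch below uses the classic American Soundex algorithm. This is a swapped-in textbook technique, not necessarily what eWRT does. Soundex maps words that sound alike to the same four-character code, which lets spelling variants common in social media posts be grouped together. The function name `soundex` is our own and not part of eWRT.

```python
def soundex(word: str) -> str:
    """American Soundex code for `word` (illustrative; not eWRT's implementation)."""
    # Consonant groups that sound alike share a digit; vowels, h, w and y get no digit.
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    result = [first.upper()]          # the first letter is kept as-is
    prev = codes.get(first, "")       # its digit still suppresses an identical next digit
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:     # skip adjacent duplicates
            result.append(code)
        if ch not in "hw":            # h and w do not separate duplicate digits; vowels do
            prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")  # pad short codes with zeros
```

For example, `soundex("mossie")` and `soundex("mozzie")` both return `"M200"`, and `soundex("repellant")` matches `soundex("repellent")`. A retrieval tool could use such codes to treat these spellings as the same term.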
“Millions of people express their opinions using social media, but with conventional methods we are unable to determine the collective mood expressed in social media in real time,” Professor Arno Scharl, head of Modul University’s New Media Technology department and the project’s technical director, said.
“We do not know which aspects move people, mobilise people or stimulate their thoughts. The technologies from the uComp project provide us with better ways to capture opinions on a global basis, irrespective of language barriers, national borders and cultural differences.”
Unlike traditional structured databases such as libraries or large corporate archives, online information is fragmented and disordered, which makes it difficult to extract knowledge automatically, Scharl explained. Social media complicates matters further because the specific context of a post is hard to determine, and slang, dialects and foreign words challenge existing text analysis tools.
The eWRT software package has its roots in Divine, another Austrian research project, which examines dynamic information integration and visualisation. uComp also draws on emerging findings in Embedded Human Computation (EHC), a field that aims to integrate and advance human and machine computation research.
According to the uComp website, EHC goes beyond mere data collection by embedding human computation into adaptive knowledge extraction workflows. The project aims to provide a scalable, generic human computation (HC) framework for knowledge extraction and evaluation. The framework is meant to delegate the most difficult tasks to large communities of users and continuously learn from their feedback to optimise automated methods.
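The idea of delegating only the hardest tasks to people can be sketched as a confidence-threshold workflow. This is a minimal illustration under our own assumptions, not uComp's actual framework:

- The machine step returns a label plus a confidence score.
- Anything below a hypothetical `CONFIDENCE_THRESHOLD` goes to the crowd.
- The crowd's answers are combined by simple majority vote.
- Each human judgement is stored as feedback for later retraining.

`Workflow`, `toy_classifier` and `toy_crowd` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical cut-off: machine labels below this confidence go to the crowd.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Workflow:
    classify: Callable[[str], tuple[str, float]]  # machine step: (label, confidence)
    ask_crowd: Callable[[str], list[str]]         # human step: labels from several contributors
    feedback: list[tuple[str, str]] = field(default_factory=list)  # (text, label) for retraining

    def label(self, text: str) -> str:
        label, confidence = self.classify(text)
        if confidence >= CONFIDENCE_THRESHOLD:
            return label                          # easy case: trust the automated method
        votes = Counter(self.ask_crowd(text))     # hard case: delegate to people
        if not votes:
            return label                          # no human answers: fall back to the machine
        crowd_label, _ = votes.most_common(1)[0]  # simple majority vote
        self.feedback.append((text, crowd_label)) # keep the judgement to improve the classifier
        return crowd_label


def toy_classifier(text: str) -> tuple[str, float]:
    """Hypothetical keyword classifier for climate-change posts."""
    t = text.lower()
    if "hoax" in t:
        return ("sceptical", 0.9)
    if "warming" in t:
        return ("concerned", 0.6)
    return ("neutral", 0.5)


def toy_crowd(text: str) -> list[str]:
    """Stand-in for answers collected from three contributors."""
    return ["concerned", "concerned", "neutral"]
```

With `wf = Workflow(toy_classifier, toy_crowd)`, a confident call such as `wf.label("Climate change is a hoax")` returns `"sceptical"` without human input. The uncertain `wf.label("Global warming worries me")` is settled by the crowd as `"concerned"` and recorded in `wf.feedback`. A real system would also weight contributors by reliability and guard against manipulation, concerns Scharl raises later in this article.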
Although uComp’s work is generic, the team’s main focus is climate change, a field marked by complex data sets and often conflicting interpretations. It is now collaborating with a range of international bodies, including the European Environment Agency and the NASA Earth Observatory.
The uComp project is funded by the Austrian Science Fund and supported by the UK’s University of Sheffield, France’s Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, the Vienna University of Economics and Business, and Modul University Vienna.
Over the next two and a half years, the uComp team plans to focus on human analysis and validation of the data gathered with the new eWRT tool. Professor Scharl also said the work is entering “unknown digital territory” by integrating the ‘games with a purpose’ approach into its framework to identify complex knowledge patterns.
The ‘games with a purpose’ approach has already been used in EHC research, for example in online games that classify documents or evaluate automatic translations.
“We are currently investigating ways of engaging people and providing incentives for participants to share their knowledge,” Professor Scharl said. “At the same time, we need to evaluate the reliability of their contributions, prevent manipulation and assess the quality of results.
“The uComp project will advance the state of the art by offering all these capabilities in an integrated, reusable framework.”