Computer Science Researchers and Google Develop an Algorithm to Identify Fake Reviews
Computer science researchers have teamed up with Google to investigate how spam reviews online can be identified automatically and shut down. The research was published by Arjun Mukherjee and Bing Liu from the computer science department of the University of Illinois at Chicago, and Natalie Glance, a software designer and engineer at Google.
The researchers started from the observation that an individual who spams a site such as Amazon, Yelp, or Netflix with fake positive or negative reviews can be difficult, or even impossible, to detect. They decided to come at the problem from a different angle: identifying groups of fake reviewers, which they call spammer groups. In the abstract of their published paper, the researchers claim that these groups are even more damaging than individual spammers. They combined several methods to identify the spammer groups that should be shut down:
“The proposed method first uses a frequent itemset mining method to find a set of candidate groups. It then uses several behavioral models derived from the collusion phenomenon among fake reviewers and relation models based on the relationships among groups, individual reviewers, and products they reviewed to detect fake reviewer groups.”
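To make the first step more concrete, here is a minimal Python sketch of how candidate groups could be found. It assumes each product is treated as a "transaction" containing the IDs of the reviewers who reviewed it, and it counts reviewer subsets by brute force rather than with a real frequent itemset mining algorithm, so it only suits small data. The function name `candidate_groups`, its parameters, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def candidate_groups(reviews, min_support=3, min_size=2, max_size=3):
    """Return reviewer groups that reviewed at least `min_support` products together.

    reviews: iterable of (reviewer_id, product_id) pairs.
    Result: dict mapping a sorted tuple of reviewer IDs to the set of shared products.
    """
    # One "transaction" per product: the set of reviewers who reviewed it.
    by_product = defaultdict(set)
    for reviewer, product in reviews:
        by_product[product].add(reviewer)

    # Brute-force stand-in for frequent itemset mining: enumerate every
    # reviewer subset of the allowed sizes and record where it occurs.
    support = defaultdict(set)
    for product, reviewers in by_product.items():
        for size in range(min_size, max_size + 1):
            for group in combinations(sorted(reviewers), size):
                support[group].add(product)

    # Keep only the groups that co-occur often enough to be suspicious.
    return {g: p for g, p in support.items() if len(p) >= min_support}


# Made-up sample data: reviewers "a" and "b" appear together on three products.
sample = [
    ("a", "p1"), ("b", "p1"), ("c", "p1"),
    ("a", "p2"), ("b", "p2"),
    ("a", "p3"), ("b", "p3"), ("c", "p3"), ("d", "p3"),
]
for group, products in candidate_groups(sample).items():
    print(group, sorted(products))  # ('a', 'b') ['p1', 'p2', 'p3']
```

A group found this way is only a candidate. In the approach the abstract describes, the behavioral and relation models then decide which candidates actually look like spam.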
The researchers also built a labeled dataset of fake reviewer groups, a task they were surprised to find easier than labeling individual fake reviewers.
According to the researchers, providing fake reviews has become a business, and spammer groups can be powerful when they take control of the opinion of a product or service. Spammer groups are not necessarily made up of multiple fake reviewers. Often, a group could be just one person with multiple accounts. The researchers’ algorithms take into account the frequency of reviews, how similar the reviews are, and how quickly they are posted, since spammers tend to get their fake reviews up first. Also, reviews by a spammer group might cover a wide range of unrelated products, which is highly suspicious.
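As a rough illustration of the signals mentioned above, the Python sketch below computes a few simple indicators for one candidate group: how tightly its reviews cluster in time, how similar their wording is, how soon after a product's first review the group posts, and how many products it covers. The function names, field names, and formulas here are assumptions made for the example. They are simplified stand-ins, not the behavioral models defined in the paper.

```python
from itertools import combinations


def jaccard(text_a, text_b):
    """Word-overlap similarity between two review texts, from 0.0 to 1.0."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mean(values):
    return sum(values) / len(values) if values else 0.0


def group_signals(group_reviews, first_review_day):
    """Summarize the behavior of one candidate group.

    group_reviews: list of dicts with keys 'reviewer', 'product', 'day', 'text'
                   ('day' is a day number, an assumption of this sketch).
    first_review_day: dict mapping product -> day of its first review by anyone.
    """
    by_product = {}
    for review in group_reviews:
        by_product.setdefault(review["product"], []).append(review)

    windows, similarities, delays = [], [], []
    for product, revs in by_product.items():
        days = [r["day"] for r in revs]
        # Small window: the group's reviews of this product arrive in a burst.
        windows.append(max(days) - min(days))
        # High similarity: the group's members write near-identical text.
        for r1, r2 in combinations(revs, 2):
            similarities.append(jaccard(r1["text"], r2["text"]))
        # Small delay: the group is among the first to review the product.
        delays.append(min(days) - first_review_day[product])

    return {
        "avg_time_window_days": mean(windows),
        "avg_content_similarity": mean(similarities),
        "avg_delay_after_first_review_days": mean(delays),
        "products_reviewed": len(by_product),
    }


# Made-up reviews from a two-member candidate group.
reviews = [
    {"reviewer": "a", "product": "p1", "day": 1, "text": "great phone love it"},
    {"reviewer": "b", "product": "p1", "day": 2, "text": "great phone love it"},
    {"reviewer": "a", "product": "p2", "day": 10, "text": "best blender ever"},
    {"reviewer": "b", "product": "p2", "day": 10, "text": "worst blender ever"},
]
signals = group_signals(reviews, {"p1": 1, "p2": 8})
print(signals)
```

In a real system, values like these would feed a ranking or classification step rather than being judged one at a time against fixed thresholds.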
Experimental results reported in the paper showed that the researchers’ new algorithms outperformed other “state-of-the-art supervised classification, regression, and learning to rank algorithms.” This is good news for consumers who want unbiased online reviews from real people who use the products.
(via The Register)