
Quarantining deceptive Yelp users by detecting unreliable rating reviews

Online reviews have become a valuable resource for decision making, not only for consumers but also for companies. In the absence of a trusted system, highly "popular and trustworthy" internet users are assumed to be members of the trusted circle. The problem statement is: given a set of user rating reviews, determine whether each is trustworthy or unreliable; then, from the deceptive reviews of a particular target business, identify and quarantine any Yelp account that produces more than a threshold number of them.

The purpose of this paper is to describe the authors' approach to quarantining deceptive Yelp users, which employs both a review spike detection (RSD) algorithm and a spam detection technique in bridging review networks (BRN), both applied to extracted key features.

How do I get set up?

  1. Run this application with Python 2.7.
  2. To run the unit tests for each module, uncomment line 50 (unit_test()) and comment out line 51 (main()).
  3. To run the entire application on the entire dataset:
    3.1 import "yelp_academic_dataset_review.csv" and "yelp_academic_dataset_user.csv" into the dataset folder
    3.2 make sure that all file paths in filelocation.py are correct
    3.3 run main()
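
Before starting a long run, it may help to confirm that the two dataset files are where the application expects them. The snippet below is only a minimal sketch under assumptions: the `dataset` folder name and the `missing_files` helper are hypothetical, and the actual constants defined in filelocation.py may be named or laid out differently.

```python
import os

# Hypothetical layout: the two Yelp CSV files sit in a local "dataset" folder.
# The real paths are whatever filelocation.py defines.
DATASET_DIR = "dataset"
REQUIRED_FILES = [
    "yelp_academic_dataset_review.csv",
    "yelp_academic_dataset_user.csv",
]


def missing_files(dataset_dir=DATASET_DIR, required=REQUIRED_FILES):
    """Return the required file names that are not present in dataset_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(dataset_dir, name))]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing dataset files: " + ", ".join(missing))
    else:
        print("All dataset files found.")
```

An empty list from `missing_files()` means both CSV files were found in the assumed folder.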

Additional Notes

  • main() first clusters user.csv to find a group of popular users; the result is saved to gen_pop_uid.csv
  • next, businesses rated by the popular users from step 1 are determined from review.csv; outputs are saved to gen_pop_bid.csv and gen_bid_trusty.csv
  • rsd runs next to determine spiky businesses
  • client_spam_score then runs to calculate spam scores for target businesses and potential deceptive ratings
  • last, quarantined users are classified, and statistical results are output.
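
To illustrate the idea behind the spike detection and quarantine steps, here is a minimal sketch. It is not the authors' RSD or BRN implementation. It assumes a simple rule (a time window is "spiky" when its review count exceeds the mean plus `k` standard deviations over all windows) and a simple count threshold for quarantining users. The function names, the 7-day window, and the parameters `k` and `max_flagged` are all hypothetical.

```python
from collections import Counter


def spiky_windows(review_days, window=7, k=2.0):
    """Return start days of windows whose review count exceeds mean + k * std.

    review_days: integer day offsets, one per review of a single business.
    Assumed rule for illustration only, not the repository's rsd module.
    """
    if not review_days:
        return []
    first = min(review_days)
    last = max(review_days)
    n_windows = (last - first) // window + 1
    # Count reviews falling into each fixed-size window.
    counts = [0] * n_windows
    for day in review_days:
        counts[(day - first) // window] += 1
    mean = sum(counts) / float(n_windows)
    var = sum((c - mean) ** 2 for c in counts) / float(n_windows)
    threshold = mean + k * var ** 0.5
    return [first + i * window for i, c in enumerate(counts) if c > threshold]


def quarantine(flagged_reviews, max_flagged=2):
    """Return sorted user ids with more than max_flagged flagged reviews.

    flagged_reviews: iterable of (user_id, business_id) pairs already
    judged deceptive by some earlier step (hypothetical input format).
    """
    per_user = Counter(user_id for user_id, _ in flagged_reviews)
    return sorted(u for u, n in per_user.items() if n > max_flagged)


if __name__ == "__main__":
    days = [0, 8, 15, 22, 29, 29, 30, 30, 31, 31, 32, 36, 43]
    print(spiky_windows(days))
    flagged = [("u1", "b1"), ("u1", "b2"), ("u1", "b3"), ("u2", "b1")]
    print(quarantine(flagged))
```

In the sample data, the burst of reviews around days 29 to 32 is the only window flagged, and only the user with three flagged reviews crosses the assumed threshold of two.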

Step 1 takes about 15 minutes, step 2 takes less than 40 minutes, and the other steps each run in less than 5 minutes. In total, the entire application should finish in less than an hour.

  • Viet Trinh - vqtrinh@ucsc.edu
  • Vikrant More - vmore@ucsc.edu
  • Samira Zare - szare@ucsc.edu
  • Sheideh Homayon - shomayon@ucsc.edu