AI/ML

Artificial Intelligence / Machine Learning projects

Social network analysis: Detection of fraudulent reviewers who skew product ratings on Amazon. Modified a graph algorithm for the “Fine Foods” dataset to compute user trust, item reliability, and review honesty via iterative updates in Python.

Poster Report Presentation

The data was the “Web data: Amazon reviews” dataset, available via the Social and Information Network Analysis course website. The full dataset contains 34,686,770 reviews; we focused on the “Fine Foods” subset, which contains 568,454 reviews from 256,059 users. Each review carries counts of positive and negative ratings, called helpfulness votes.

While many of the more than 34 million Amazon reviews are helpful, fake reviews can be written to skew product ratings. (For example, book authors may plant fake reviews saying how great their book is, or manufacturers may write fake reviews trashing a competitor’s product.) Although users can vote a review as either helpful or unhelpful, previous ratings on a product influence how a user votes: when review star ratings have high variance (reviewers disagree), polarizing reviews (1 or 5 stars) are voted most helpful; when variance is low (reviewers agree), average reviews (near the product mean) are voted most helpful.

How do we tell which reviews are authentic and identify fraudulent reviewers? We use a graph-based algorithm to determine the honesty of users’ reviews and the trustworthiness of the users writing them. We modify this algorithm for our purposes to use a single store and apply it to the “Fine Foods” Amazon review data. We then compare the user trust ratings computed by the algorithm with the user helpfulness metric in order to classify fraudulent users.

We want to determine a user’s helpfulness rating from the positive (helpful) and negative (unhelpful) votes that all of his or her reviews receive. Suppose a user wrote 2 reviews: the first has 95 positive votes and 5 negative votes, the second has 1 positive vote and 3 negative votes. Our first method computes the ratio of helpful votes to all votes; in the example, the user’s rating is 96/104, on a scale from 0.0 to 1.0. Our second method computes the total of helpful votes minus the total of unhelpful votes; the same user now has a helpfulness rating of +88, on a scale from negative infinity to positive infinity. Under the first method, users with a small number of total votes can achieve a perfect helpfulness score. The second method has the desired property that most users cluster near a mean of zero, with only a few outliers having a large positive or negative score; it therefore handles users with few helpfulness votes more fairly.
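
The two calculations above can be sketched in Python; the vote counts below are the worked example from the text, and the function names are our own:

```python
def helpfulness_ratio(reviews):
    """Method 1: helpful votes / all votes, on a 0.0 to 1.0 scale."""
    pos = sum(p for p, n in reviews)
    neg = sum(n for p, n in reviews)
    return pos / (pos + neg)

def helpfulness_difference(reviews):
    """Method 2: helpful votes minus unhelpful votes, unbounded."""
    pos = sum(p for p, n in reviews)
    neg = sum(n for p, n in reviews)
    return pos - neg

# Each tuple is (positive votes, negative votes) for one review.
reviews = [(95, 5), (1, 3)]
print(helpfulness_ratio(reviews))       # 96/104 ≈ 0.923
print(helpfulness_difference(reviews))  # 88
```

Note that method 1 divides by zero for a user whose reviews received no votes at all; a real implementation would have to special-case that.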

The algorithm to compute user trust, which we implemented in Python, works as follows:

  • User trust (-1 distrust to +1 trust): Users are trusted if their reviews are honest; users are untrusted if their reviews are dishonest.
  • Item reliability (-1 unreliable to +1 reliable): Items are reliable if they receive high scores from trusted users; items are unreliable if they receive low scores from trusted users.
  • Review honesty (-1 dishonest to +1 honest): First, compute review agreement: agreement is high when the review agrees with the majority trusted opinion and low when it disagrees. Second, compute review honesty by normalizing the review agreement and taking item reliability into account.
    User trust, item reliability, and review honesty are computed through iterative updates: each variable is initially set to 1 and updated many times in a loop that computes their values interdependently.
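
As a rough illustration of the iterative scheme, here is a self-contained Python sketch. The toy review data and the exact update formulas (centering star scores around 3, squashing with tanh) are our own simplified stand-ins, not the update rules used in the project:

```python
import math

# Toy data: reviews[(user, item)] = star score from 1 to 5 (hypothetical).
reviews = {
    ("alice", "tea"): 5, ("bob", "tea"): 5, ("mallory", "tea"): 1,
    ("alice", "jam"): 4, ("mallory", "jam"): 1,
}
users = {u for u, _ in reviews}
items = {i for _, i in reviews}

# Every quantity starts at 1 and is refined by iterative updates.
trust = {u: 1.0 for u in users}        # -1 distrust .. +1 trust
reliability = {i: 1.0 for i in items}  # -1 unreliable .. +1 reliable
honesty = {r: 1.0 for r in reviews}    # -1 dishonest .. +1 honest

for _ in range(20):
    # Item reliability: high scores from trusted users push it up.
    for i in items:
        s = sum(trust[u] * (reviews[(u, j)] - 3) / 2
                for (u, j) in reviews if j == i)
        reliability[i] = math.tanh(s)
    # Review honesty: high when the review agrees with the majority
    # trusted opinion (approximated here by the item's reliability).
    for (u, i) in reviews:
        agreement = (reviews[(u, i)] - 3) / 2 * reliability[i]
        honesty[(u, i)] = math.tanh(2 * agreement)
    # User trust: the average honesty of the user's reviews.
    for u in users:
        hs = [honesty[(v, i)] for (v, i) in reviews if v == u]
        trust[u] = sum(hs) / len(hs)

print(sorted(trust.items()))
```

In this toy run, "mallory", who contradicts the trusted majority on both items, converges to a strongly negative trust score, while "alice" and "bob" stay positive.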

Now we relate user helpfulness (from votes) to user trust (from our algorithm). When only outlier values are considered, there is a strong correlation between the two: for example, 87% of users with helpfulness ratings below -100 have trustworthiness ratings below -0.9. Therefore, our algorithm, using pure graph analysis alone, could be helpful in fraudulent user detection. However, a bot that rates every product 5 stars out of 5 will fool our algorithm into treating it as trustworthy. This is due to the skewed ratings in the data: the products, fine foods, are most commonly reviewed as 5 stars and second most commonly as 4 stars, so an always-5-stars bot usually agrees with the majority opinion. Textual analysis of the review text, such as searching for common spam phrases, would likely improve our results.

Time series forecasting: Analysis of the Global Energy Forecasting Competition “A wind power forecasting problem: predicting hourly power generation up to 48 hours ahead at 7 wind farms.” Prediction of wind power using locally weighted regression and autoregressive integrated moving average (ARIMA) models in R.

Poster Report

The goal was to predict the power generated at 7 wind farms. The inputs were a 48-hour look-ahead forecast of wind speed, wind direction, and the zonal and meridional wind components at each of the farms. The outputs were normalized wind power measurements between zero and one for each of the 7 wind farms. The model identification and training period ran from Sept. 1st, 2009 to Sept. 30th, 2010; the evaluation period ran from Oct. 1st, 2010 to Dec. 31st, 2010. In the evaluation period there is a repeating 84-hour pattern: in the first 36 hours we are given the generated wind power, and in the remaining 48 hours the generated wind power is missing and must be predicted by our model.

We initially used two time series prediction models: ARIMA and locally weighted regression. The ARIMA model is trained on the hourly generated power from the entire training set, while the locally weighted regression model gives more regression weight to the training instances that are similar to the 36 hours of given data preceding each prediction interval. However, this weighting often leads to large prediction errors. The ARIMA model predicts better than the locally weighted regression model, even though ARIMA tends to average out its predictions.
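
To make the weighting idea concrete, here is a minimal sketch of locally weighted regression on synthetic data, written in Python with NumPy (the project itself used R). The Gaussian kernel, bandwidth, window length, and data are illustrative assumptions:

```python
import numpy as np

# Synthetic hourly series standing in for normalized wind power.
rng = np.random.default_rng(0)
series = np.sin(np.arange(400) * 0.3) + 0.1 * rng.standard_normal(400)

window = 36   # length of the known history used as the query
tau = 2.0     # kernel bandwidth (assumed)

query = series[-window:]              # the 36 hours of given data
X, y, w = [], [], []
for t in range(window, len(series)):
    past = series[t - window:t]
    # Gaussian kernel on the distance between windows: training
    # instances similar to the query get larger regression weight.
    dist = np.linalg.norm(past - query)
    w.append(np.exp(-dist**2 / (2 * tau**2)))
    X.append(past)
    y.append(series[t])

X, y, w = np.array(X), np.array(y), np.array(w)
# Weighted least squares: solve (X^T W X) beta = (X^T W y).
W = np.diag(w)
beta, *_ = np.linalg.lstsq(X.T @ W @ X, X.T @ W @ y, rcond=None)
prediction = query @ beta   # one-step-ahead forecast
print(prediction)
```

Repeating the one-step forecast and appending each prediction to the series extends this to the 48-hour horizon.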

We improved both models by adding wind speed features, chosen in a feature selection procedure based on their correlation with the generated power. The new models, ARIMAX and locally weighted regression with wind speed, showed a substantial performance improvement over the original ARIMA and locally weighted regression models; locally weighted regression with wind speed beat the benchmark method by almost a factor of 2.
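
As a sketch of how the exogenous wind-speed feature enters the model (the “X” in ARIMAX), here is a toy autoregressive model with an exogenous regressor, again in Python with NumPy rather than R. The synthetic data, lag order, and plain least-squares fit are illustrative assumptions, not the project's actual configuration:

```python
import numpy as np

# Synthetic data: wind speed drives a toy "generated power" series.
rng = np.random.default_rng(1)
n = 500
wind_speed = 5 + 2 * np.sin(np.arange(n) * 0.1) + 0.3 * rng.standard_normal(n)
power = np.clip(0.1 * wind_speed + 0.05 * rng.standard_normal(n), 0.0, 1.0)

p = 3  # number of autoregressive lags (assumed)
rows, target = [], []
for t in range(p, n):
    # Features: p lagged power values, the wind-speed feature, a bias term.
    rows.append(np.r_[power[t - p:t], wind_speed[t], 1.0])
    target.append(power[t])
A, b = np.array(rows), np.array(target)
coef, *_ = np.linalg.lstsq(A, b, rcond=None)

# One-step-ahead forecast; in the competition the wind speed for the
# next hour would come from the provided forecasts, but here we simply
# reuse the last observed value as a stand-in.
x_next = np.r_[power[-p:], wind_speed[-1], 1.0]
print(x_next @ coef)
```

The same feature vector could instead be passed as the exogenous regressor of a full ARIMAX fit; the point of the sketch is only that the wind-speed column joins the lagged power values on the input side of the regression.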