Final Project v1.7
Description and note
Task 1. Choose one specific type of product to analyze. (Product type information is already merged in the dataframe merged_df, we can just filter with the columns ‘title’ and ‘description’) Tool link (additional reference): https://colab.research.google.com/drive/12r4KJVbNqjjhiZ6aeiaG809x4-Tg5fm8?usp=sharing#scrollTo=V06gw2d93Q5D 2. Consider how to utilize different scores (1-5) to extract ideas on product features ie: high score have what kind of features & low score have what kind of feature 3. High and low score specific analysis
Load the data
import pandas as pd
import json
# 2
def load_json_file(file_path):
data = []
with open(file_path, 'r') as file:
for line in file:
json_line = json.loads(line.strip())
data.append(json_line)
return data
file_path = ''
data_list = load_json_file(file_path)
df = pd.DataFrame(data_list)
# Replace the file path and name with your own
file_path = ''
# Load the data into a list of dictionaries
data_list = load_json_file(file_path)
# Convert the list of dictionaries into a pandas DataFrame
meta_df = pd.DataFrame(data_list)
df.head(10)
overall | verified | reviewTime | reviewerID | asin | style | reviewerName | reviewText | summary | unixReviewTime | vote | image | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5.0 | True | 01 5, 2018 | A2HOI48JK8838M | B00004U9V2 | {‘Size:’: ‘ 0.9 oz.’} | DB | This handcream has a beautiful fragrance. It d… | Beautiful Fragrance | 1515110400 | NaN | NaN |
1 | 5.0 | True | 04 5, 2017 | A1YIPEY7HX73S7 | B00004U9V2 | {‘Size:’: ‘ 3.5 oz.’} | Ajaey | wonderful hand lotion, for seriously dry skin,… | wonderful hand lotion | 1491350400 | NaN | NaN |
2 | 5.0 | True | 03 27, 2017 | A2QCGHIJ2TCLVP | B00004U9V2 | {‘Size:’: ‘ 250 g’} | D. Jones | Best hand cream around. Silky, thick, soaks i… | Best hand cream around | 1490572800 | NaN | NaN |
3 | 5.0 | True | 03 20, 2017 | A2R4UNHFJBA6PY | B00004U9V2 | {‘Size:’: ‘ 3.5 oz.’} | Amazon Customer | Thanks!! | Five Stars | 1489968000 | NaN | NaN |
4 | 5.0 | True | 02 28, 2017 | A2QCGHIJ2TCLVP | B00004U9V2 | {‘Size:’: ‘ 0.9 oz.’} | D. Jones | Great hand lotion. Soaks right in and leaves … | Great hand lotion! | 1488240000 | NaN | NaN |
5 | 5.0 | True | 02 25, 2017 | A1606LA683WZZU | B00004U9V2 | {‘Size:’: ‘ 250 g’} | Amr | Great product. Doesn’t leave you hands feeling… | Five Stars | 1487980800 | NaN | NaN |
6 | 5.0 | True | 02 25, 2017 | A1606LA683WZZU | B00004U9V2 | {‘Size:’: ‘ 3.5 oz.’} | Amr | Great product. Doesn’t leave you hands feeling… | Five Stars | 1487980800 | NaN | NaN |
7 | 5.0 | True | 01 30, 2017 | A1606LA683WZZU | B00004U9V2 | {‘Size:’: ‘ 0.9 oz.’} | Amr | Just as described. Arrived on time. | Five Stars | 1485734400 | NaN | NaN |
8 | 4.0 | False | 01 24, 2017 | A1YY53NQXFKMRN | B00004U9V2 | {‘Size:’: ‘ 3.5 oz.’} | Trixie | Nice lightweight hand cream for the summer. | Smells good, absorbs quickly | 1485216000 | NaN | NaN |
9 | 5.0 | True | 12 1, 2016 | A3R0NQ9E53JHYQ | B00004U9V2 | {‘Size:’: ‘ 250 g’} | T. Hooth | Best hand cream ever. | Five Stars | 1480550400 | NaN | NaN |
meta_df.head(10)
category | tech1 | description | fit | title | also_buy | tech2 | brand | feature | rank | also_view | details | main_cat | similar_item | date | price | asin | imageURL | imageURLHighRes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [] | [After a long day of handling thorny situation… | Crabtree & Evelyn – Gardener’s Ultra-Moist… | [B00GHX7H0A, B00FRERO7G, B00R68QXCS, B000Z65AZ… | [] | 4,324 in Beauty & Personal Care ( | [B00FRERO7G, B00GHX7H0A, B07GFHJRMX, B00TJ3NBN… | {‘ Product Dimensions: ‘: ‘2.2 x 2.2 … | Luxury Beauty | $30.00 | B00004U9V2 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||||||
1 | [] | [If you haven’t experienced the pleasures of b… | AHAVA Bath Salts | [] | [] | 1,633,549 in Beauty & Personal Care ( | [] | {‘ Product Dimensions: ‘: ‘3 x 3.5 x … | Luxury Beauty | B0000531EN | [] | [] | |||||||
2 | [] | [Rich, black mineral mud, harvested from the b… | AHAVA Dead Sea Mineral Mud, 8.5 oz, Pack of 4 | [] | [] | 1,806,710 in Beauty & Personal Care ( | [] | {‘ Product Dimensions: ‘: ‘5.1 x 3 x … | Luxury Beauty | B0000532JH | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | |||||||
3 | [] | [This liquid soap with convenient pump dispens… | Crabtree & Evelyn Hand Soap, Gardeners, 10… | [] | [] | [] | [B00004U9V2, B00GHX7H0A, B00FRERO7G, B00R68QXC… | {‘ Product Dimensions: ‘: ‘2.6 x 2.6 … | Luxury Beauty | $15.99 | B00005A77F | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||||||
4 | [] | [Remember why you love your favorite blanket? … | Soy Milk Hand Crme | [B000NZT6KM, B001BY229Q, B008J724QY, B0009YGKJ… | [] | 42,464 in Beauty & Personal Care ( | [] | {‘ Product Dimensions: ‘: ‘7.2 x 2.2 … | Luxury Beauty | $18.00 | B00005NDTD | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||||||
5 | [] | [Winter, summer, spring or fall, this soothing… | AHAVA Dermud Enriched Intensive Foot Cream, 4…. | [] | [] | 1,527,650 in Beauty & Personal Care ( | [] | {‘ Product Dimensions: ‘: ‘2.5 x 2.3 … | Luxury Beauty | B00005R7ZZ | [] | [] | |||||||
6 | [] | [Highly concentrated formula created to rejuve… | AHAVA Dermud Intensive Nourishing Hand Cream, … | [] | [] | 1,538,330 in Beauty & Personal Care ( | [] | {‘ Product Dimensions: ‘: ‘2.5 x 2.3 … | Luxury Beauty | B00005R7ZY | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | |||||||
7 | [] | [<P><STRONG>Please note: Due to product improv… | Supersmile Powdered Mouthrinse | [B0010Y3M2S, B00005V50B, B00NNZWXEK, B001AC3VI… | [] | 122,723 in Beauty & Personal Care ( | [B07CHTPD6W, B07D72B2VX, B07CLR4T96, B07B9XZ3K… | {‘ Product Dimensions: ‘: ‘5.8 x 2.8 … | Luxury Beauty | $21.73 | B00005V50C | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||||||
8 | [] | [Created by Dr. Irwin Smigel, world-renowned “… | Supersmile Professional Teeth Whitening Toothp… | [B00NNZWXEK, B0057MMSWY, B00TZJDY4Q, B001ABYRZ… | [] | 5,522 in Beauty & Personal Care ( | [B00TZJDY4Q, B07CHTPD6W, B076GZSV93, B00KAC7LE… | {‘ Product Dimensions: ‘: ‘1.8 x 1.4 … | Luxury Beauty | $23.00 | B00005V50B | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||||||
9 | [] | [Naturally stimulating essential oils make our… | Archipelago Morning Mint Body Lotion ,18 Fl Oz | [B001IJOYJA, B008J720A4, B001IJQR68, B008J722A… | [] | 20,146 in Beauty & Personal Care ( | [B001JB55SQ, B00J0A448K, B001IJQR68, B008J722A… | {‘ Product Dimensions: ‘: ‘2.6 x 2.6 … | Luxury Beauty | $25.00 | B000066SYB | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… |
# Merge the DataFrames on the 'asin' column
merged_df = df.merge(meta_df, on='asin', how='left')
# Display the first few rows of the merged DataFrame
merged_df
overall | verified | reviewTime | reviewerID | asin | style | reviewerName | reviewText | summary | unixReviewTime | … | feature | rank | also_view | details | main_cat | similar_item | date | price | imageURL | imageURLHighRes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5.0 | True | 01 5, 2018 | A2HOI48JK8838M | B00004U9V2 | {‘Size:’: ‘ 0.9 oz.’} | DB | This handcream has a beautiful fragrance. It d… | Beautiful Fragrance | 1515110400 | … | [] | 4,324 in Beauty & Personal Care ( | [B00FRERO7G, B00GHX7H0A, B07GFHJRMX, B00TJ3NBN… | {‘ Product Dimensions: ‘: ‘2.2 x 2.2 … | Luxury Beauty | $30.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
1 | 5.0 | True | 01 5, 2018 | A2HOI48JK8838M | B00004U9V2 | {‘Size:’: ‘ 0.9 oz.’} | DB | This handcream has a beautiful fragrance. It d… | Beautiful Fragrance | 1515110400 | … | [] | 4,324 in Beauty & Personal Care ( | [B00FRERO7G, B00GHX7H0A, B07GFHJRMX, B00TJ3NBN… | {‘ Product Dimensions: ‘: ‘2.2 x 2.2 … | Luxury Beauty | $30.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
2 | 5.0 | True | 04 5, 2017 | A1YIPEY7HX73S7 | B00004U9V2 | {‘Size:’: ‘ 3.5 oz.’} | Ajaey | wonderful hand lotion, for seriously dry skin,… | wonderful hand lotion | 1491350400 | … | [] | 4,324 in Beauty & Personal Care ( | [B00FRERO7G, B00GHX7H0A, B07GFHJRMX, B00TJ3NBN… | {‘ Product Dimensions: ‘: ‘2.2 x 2.2 … | Luxury Beauty | $30.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
3 | 5.0 | True | 04 5, 2017 | A1YIPEY7HX73S7 | B00004U9V2 | {‘Size:’: ‘ 3.5 oz.’} | Ajaey | wonderful hand lotion, for seriously dry skin,… | wonderful hand lotion | 1491350400 | … | [] | 4,324 in Beauty & Personal Care ( | [B00FRERO7G, B00GHX7H0A, B07GFHJRMX, B00TJ3NBN… | {‘ Product Dimensions: ‘: ‘2.2 x 2.2 … | Luxury Beauty | $30.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
4 | 5.0 | True | 03 27, 2017 | A2QCGHIJ2TCLVP | B00004U9V2 | {‘Size:’: ‘ 250 g’} | D. Jones | Best hand cream around. Silky, thick, soaks i… | Best hand cream around | 1490572800 | … | [] | 4,324 in Beauty & Personal Care ( | [B00FRERO7G, B00GHX7H0A, B07GFHJRMX, B00TJ3NBN… | {‘ Product Dimensions: ‘: ‘2.2 x 2.2 … | Luxury Beauty | $30.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
35853 | 4.0 | False | 09 3, 2017 | A2CF66KIQ3RKX3 | B01GOZ61O8 | NaN | Vivian Deliz | I like to use moisturizers and sunscreens that… | Works great as a moisturizer and sunscreen | 1504396800 | … | [] | 60,938 in Beauty & Personal Care ( | [B00YHMQDC6, B01GKH6FTQ, B00YHPQLO8, B00J9POUB… | {‘ Product Dimensions: ‘: ‘5.8 x 1 x … | Luxury Beauty | $49.99 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
35854 | 4.0 | False | 09 3, 2017 | A1LKOIZXPQ9VG0 | B01GOZ61O8 | NaN | Elisa 20 | I wouldn’t be able to afford this if not asked… | Nice skin care product and sunscreen if you do… | 1504396800 | … | [] | 60,938 in Beauty & Personal Care ( | [B00YHMQDC6, B01GKH6FTQ, B00YHPQLO8, B00J9POUB… | {‘ Product Dimensions: ‘: ‘5.8 x 1 x … | Luxury Beauty | $49.99 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
35855 | 1.0 | True | 08 25, 2017 | AV2RWORXTFRJU | B01H353HUY | NaN | Gapeachmama | Did nothing | One Star | 1503619200 | … | [] | 40,994 in Beauty & Personal Care ( | [B00UZFYSTY, B01N9PTAFL, B01N0Z13T1, B01LYMSEH… | {‘ Product Dimensions: ‘: ‘2 x 2 x 5…. | Luxury Beauty | $58.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
35856 | 5.0 | False | 07 8, 2017 | A22S7D0LP8GRDH | B01H353HUY | NaN | Jacob and Kiki Hantla | I love the Oribe bright blonde radiance spray…. | No more brass! | 1499472000 | … | [] | 40,994 in Beauty & Personal Care ( | [B00UZFYSTY, B01N9PTAFL, B01N0Z13T1, B01LYMSEH… | {‘ Product Dimensions: ‘: ‘2 x 2 x 5…. | Luxury Beauty | $58.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
35857 | 5.0 | True | 07 9, 2018 | AAF5D1LTFGB7L | B01HGSJPMW | NaN | Libby Johnson | I love all of the Elemis products. | Five Stars | 1531094400 | … | [] | 13,211 in Beauty & Personal Care ( | [B0714LK2WT, B078K2QSTR, B00175W3HK, B00DZP5SJ… | {‘ Product Dimensions: ‘: ‘2.2 x 1.5 … | Luxury Beauty | $55.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… |
35858 rows × 30 columns
Filtering product type
# keywords filtering function
def filter_dataset_by_keywords(dataframe, keywords, columns):
keywords = keywords.lower().split()
mask = []
for _, row in dataframe.iterrows():
keyword_found = False
for column in columns:
text = str(row[column]).lower()
if all(keyword in text for keyword in keywords):
keyword_found = True
break
mask.append(keyword_found)
return dataframe[mask]
#Change the ‘hand cream’ with any product name you want to search
apply the function¶
###################### change the ‘hand cream’ to try differet product!!##################
filtered_df = filter_dataset_by_keywords(merged_df, ”, [‘description’, ‘title’])¶
filtered_df = filter_dataset_by_keywords(merged_df, ‘shampoo’, [‘description’,’title’])
check how many rows are there¶
for instance: (660, 37) means there are 660 products with the name in it¶
print(filtered_df.shape)
beauty_products = [ ‘Hand Crme’, ‘hand cream’, ‘lipstick’, ‘eyeliner’, ‘mascara’, ‘foundation’, ‘moisturizer’, ‘face mask’, ‘nail polish’, ‘shampoo’, ‘conditioner’, ‘eyeshadow’, ‘blush’, ‘concealer’, ‘bronzer’, ‘highlighter’, ‘primer’, ‘makeup remover’, ‘face wash’, ‘face serum’, ‘body lotion’, ‘body wash’, ‘sunscreen’, ‘hair mask’, ‘hair serum’, ‘hair spray’, ‘hair oil’, ‘hair gel’, ‘hair mousse’, ‘hair color’, ‘hair dye’, ‘perfume’, ‘cologne’, ‘fragrance’, ‘deodorant’, ‘antiperspirant’, ‘bath salts’, ‘bath bombs’, ‘body scrub’, ‘exfoliator’, ‘toner’, ‘face oil’, ‘eye cream’, ‘lip balm’, ‘lip gloss’, ‘lip liner’, ‘makeup brushes’, ‘makeup sponge’, ‘beauty tools’, ‘tweezers’, ‘eyelash curler’, ‘nail clippers’, ‘nail files’, ‘nail care’, ‘cuticle care’, ‘beard oil’, ‘beard balm’, ‘beard comb’, ‘shaving cream’, ‘shaving soap’, ‘razor’, ‘aftershave’, ‘hair removal’, ‘waxing’, ‘tanning lotion’, ‘tanning spray’, ‘teeth whitening’, ‘toothpaste’, ‘mouthwash’, ‘dental floss’, ‘skin care sets’, ‘makeup sets’, ‘hair care sets’, ‘fragrance sets’, ‘bath & body sets’ ]
create an empty dataframe to store the product name and the number of reviews¶
product_review_counts = pd.DataFrame(columns=[‘product’, ‘review_count’])
for product in beauty_products: filtered_df = filter_dataset_by_keywords(merged_df, product, [‘description’, ‘title’]) product_review_counts = product_review_counts.append({‘product’: product, ‘review_count’: len(filtered_df)}, ignore_index=True)
sort the dataframe in descending order of the number of reviews¶
product_review_counts = product_review_counts.sort_values(by=’review_count’, ascending=False)
product_review_counts
# Export the list
# product_review_counts.to_csv('beauty_product_counts.csv', index=False)
Test filter the data
# change keywords here
merged_df = filter_dataset_by_keywords(merged_df, 'foundation', ['title'])
merged_df
overall | verified | reviewTime | reviewerID | asin | style | reviewerName | reviewText | summary | unixReviewTime | … | feature | rank | also_view | details | main_cat | similar_item | date | price | imageURL | imageURLHighRes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
718 | 3.0 | True | 04 9, 2018 | A1KSC91G9AIY2Z | B00014GT8W | {‘Color:’: ‘ 30W Yellow Beige: For light skin … | RYW | Good foundation. High coverage but be prepared… | Good coverage. Requires patient effort to blen… | 1523232000 | … | [] | 25,522 in Beauty & Personal Care ( | [B07J6SWRND, B076LW35CN, B0749Z1PSS, B0748G28V… | {‘ Product Dimensions: ‘: ‘2.3 x 2.3 … | Luxury Beauty | $39.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
719 | 3.0 | True | 04 9, 2018 | A1KSC91G9AIY2Z | B00014GT8W | {‘Color:’: ‘ 30W Yellow Beige: For light skin … | RYW | Good foundation. High coverage but be prepared… | Good coverage. Requires patient effort to blen… | 1523232000 | … | [] | 25,522 in Beauty & Personal Care ( | [B07J6SWRND, B076LW35CN, B0749Z1PSS, B0748G28V… | {‘ Product Dimensions: ‘: ‘2.3 x 2.3 … | Luxury Beauty | $39.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
720 | 4.0 | False | 03 13, 2018 | A1WKQ94M45D8MG | B00014GT8W | {‘Color:’: ‘ 35W Warm Beige: For medium skin t… | Denise Crawford | I have light skin with some imperfections. As … | Full coverage, | 1520899200 | … | [] | 25,522 in Beauty & Personal Care ( | [B07J6SWRND, B076LW35CN, B0749Z1PSS, B0748G28V… | {‘ Product Dimensions: ‘: ‘2.3 x 2.3 … | Luxury Beauty | $39.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
721 | 4.0 | False | 03 13, 2018 | A1WKQ94M45D8MG | B00014GT8W | {‘Color:’: ‘ 35W Warm Beige: For medium skin t… | Denise Crawford | I have light skin with some imperfections. As … | Full coverage, | 1520899200 | … | [] | 25,522 in Beauty & Personal Care ( | [B07J6SWRND, B076LW35CN, B0749Z1PSS, B0748G28V… | {‘ Product Dimensions: ‘: ‘2.3 x 2.3 … | Luxury Beauty | $39.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
722 | 4.0 | False | 02 28, 2018 | AFICF7DKHTQ87 | B00014GT8W | {‘Color:’: ‘ 40W Caramel Beige: For medium ski… | Amazon Customer | My wife likes the smoothness of the product as… | SPF30 makeup | 1519776000 | … | [] | 25,522 in Beauty & Personal Care ( | [B07J6SWRND, B076LW35CN, B0749Z1PSS, B0748G28V… | {‘ Product Dimensions: ‘: ‘2.3 x 2.3 … | Luxury Beauty | $39.00 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
35648 | 5.0 | False | 08 1, 2016 | AAZ5OJ2OOJ2DK | B01BMBL56S | {‘Color:’: ‘ Natural’} | Krisilou | I’ve tried many Bliss skin care products in th… | Smooth and even coverage | 1470009600 | … | [] | 666,773 in Beauty & Personal Care ( | [B01BTOD520, B01MXRO1ZF, B01BTO8LEW, B01BTOC16… | {‘ Item Weight: ‘: ‘2.88 ounces’, ‘Sh… | Luxury Beauty | $34.99 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
35649 | 3.0 | False | 07 24, 2016 | A1ZO9D554VQO9F | B01BMBL56S | {‘Color:’: ‘ Natural’} | Jadecat | Well, I received the “Natural” color of this p… | Not what I expected, probably best if your ski… | 1469318400 | … | [] | 666,773 in Beauty & Personal Care ( | [B01BTOD520, B01MXRO1ZF, B01BTO8LEW, B01BTOC16… | {‘ Item Weight: ‘: ‘2.88 ounces’, ‘Sh… | Luxury Beauty | $34.99 | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | ||
35664 | 3.0 | False | 08 9, 2016 | A3JLOIXFM75QNV | B01BMBNUQQ | {‘Color:’: ‘ Buff’} | Valerya Couto | I’m not a huge fan of foundation and use it sp… | Mediocre for the price | 1470700800 | … | [] | 975,868 in Beauty & Personal Care ( | [B01BTOD520, B01BTO8836, B01MXRO1ZF, B01BTOBHI… | {‘Shipping Weight:’: ‘4 ounces’, ‘ASIN:’: ‘B01… | Luxury Beauty | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | |||
35665 | 3.0 | False | 07 18, 2016 | A9P07NJ7UV0M | B01BMBNUQQ | {‘Color:’: ‘ Buff’} | Meryl K. Evans, Digital Marketing Pro | I could not figure out how to use the dropper … | Nice color and lightness, but bottle design ma… | 1468800000 | … | [] | 975,868 in Beauty & Personal Care ( | [B01BTOD520, B01BTO8836, B01MXRO1ZF, B01BTOBHI… | {‘Shipping Weight:’: ‘4 ounces’, ‘ASIN:’: ‘B01… | Luxury Beauty | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… | |||
35666 | 3.0 | False | 07 17, 2016 | A2X2WTEVCZ5L8N | B01BMBNUQQ | {‘Color:’: ‘ Buff’} | Sandy Kay | I thought from the name and the description of… | Foundation and sun protection combined | 1468713600 | … | [] | 975,868 in Beauty & Personal Care ( | [B01BTOD520, B01BTO8836, B01MXRO1ZF, B01BTOBHI… | {‘Shipping Weight:’: ‘4 ounces’, ‘ASIN:’: ‘B01… | Luxury Beauty | [https://images-na.ssl-images-amazon.com/image… | [https://images-na.ssl-images-amazon.com/image… |
1741 rows × 30 columns
import pandas as pd
import matplotlib.pyplot as plt
import re
def extract_volume(details):
# Convert the details dictionary to a string
details_str = str(details)
# Extract volume in ounces
ounces_pattern = r"(\d+(\.\d+)?)\s*ounces?"
ounces_match = re.search(ounces_pattern, details_str, flags=re.IGNORECASE)
if ounces_match:
volume = float(ounces_match.group(1))
return volume
# If no match found, return None
return None
merged_df['volume'] = merged_df['details'].apply(extract_volume)
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/2753941139.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['volume'] = merged_df['details'].apply(extract_volume)
merged_df['volume']
718 2.08 719 2.08 720 2.08 721 2.08 722 2.08 ... 35648 2.88 35649 2.88 35664 4.00 35665 4.00 35666 4.00 Name: volume, Length: 1741, dtype: float64
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Remove rows with missing volume or price information
cleaned_df = merged_df.dropna(subset=['volume', 'price'])
# Calculate the frequency of each (volume, price) pair
frequency = cleaned_df.groupby(['volume', 'price']).size().reset_index(name='frequency')
# Create a scatter plot with point sizes based on frequency
plt.scatter(frequency['volume'], frequency['price'], s=frequency['frequency']*10, alpha=0.5)
# Set the labels for x and y axes
plt.xlabel('Volume (ounces)')
plt.ylabel('Price ($)')
# Set the title of the plot
plt.title('Scatter Plot of Volume vs. Price with Frequency Indicated by Point Size')
# Show the plot
plt.show()
import matplotlib.pyplot as plt
# Remove rows with missing volume or price information
cleaned_df = merged_df.dropna(subset=['volume', 'price'])
# Create a scatter plot
plt.scatter(cleaned_df['volume'], cleaned_df['price'])
# Set the labels for x and y axes
plt.xlabel('Volume (ounces)')
plt.ylabel('Price ($)')
# Set the title of the plot
plt.title('Scatter Plot of Volume vs. Price')
# Show the plot
plt.show()
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Assuming the dataset is already in a pandas DataFrame named merged_df
# If not, load the dataset into a DataFrame first
# merged_df = pd.read_csv('your_dataset.csv')
# Clean the price data by removing the '$' sign and converting the column to a numeric type
merged_df['price'] = merged_df['price'].str.replace('$', '')
merged_df['price'] = pd.to_numeric(merged_df['price'], errors='coerce')
# Drop rows with NaN values in the 'price' column (Optional)
merged_df = merged_df.dropna(subset=['price'])
# Create a histogram for the distribution of prices
sns.histplot(data=merged_df, x='price', bins=30, kde=True)
# Set the chart title and labels
plt.title('Distribution of Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
# Display the chart
plt.show()
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/1696552794.py:10: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True. merged_df['price'] = merged_df['price'].str.replace('$', '') /var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/1696552794.py:10: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['price'] = merged_df['price'].str.replace('$', '') /var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/1696552794.py:11: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['price'] = pd.to_numeric(merged_df['price'], errors='coerce')
Import the packages
# Install the packages, remove if not necessary!
!pip install textblob
!pip install scikit-learn
!pip install gensim
!pip install nltk
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
Requirement already satisfied: textblob in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (0.17.1) Requirement already satisfied: nltk>=3.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from textblob) (3.7) Requirement already satisfied: joblib in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk>=3.1->textblob) (1.2.0) Requirement already satisfied: regex>=2021.8.3 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk>=3.1->textblob) (2022.3.15) Requirement already satisfied: click in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk>=3.1->textblob) (8.0.4) Requirement already satisfied: tqdm in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk>=3.1->textblob) (4.64.0) [notice] A new release of pip is available: 23.0.1 -> 23.1 [notice] To update, run: pip install --upgrade pip Requirement already satisfied: scikit-learn in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (1.2.2) Requirement already satisfied: joblib>=1.1.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from scikit-learn) (1.2.0) Requirement already satisfied: numpy>=1.17.3 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from scikit-learn) (1.21.5) Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from scikit-learn) (2.2.0) Requirement already satisfied: scipy>=1.3.2 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from scikit-learn) (1.7.3) [notice] A new release of pip is available: 23.0.1 -> 23.1 [notice] To update, run: pip install --upgrade pip Requirement already satisfied: gensim in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (4.1.2) Requirement already satisfied: numpy>=1.17.0 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from gensim) (1.21.5) Requirement already satisfied: scipy>=0.18.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from gensim) (1.7.3) Requirement already satisfied: smart-open>=1.8.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from gensim) (5.1.0) [notice] A new release of pip is available: 23.0.1 -> 23.1 [notice] To update, run: pip install --upgrade pip Requirement already satisfied: nltk in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (3.7) Requirement already satisfied: tqdm in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk) (4.64.0) Requirement already satisfied: regex>=2021.8.3 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk) (2022.3.15) Requirement already satisfied: click in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk) (8.0.4) Requirement already satisfied: joblib in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from nltk) (1.2.0) [notice] A new release of pip is available: 23.0.1 -> 23.1 [notice] To update, run: pip install --upgrade pip
[nltk_data] Downloading package stopwords to [nltk_data] /Users/lailongying/nltk_data... [nltk_data] Package stopwords is already up-to-date! [nltk_data] Downloading package punkt to [nltk_data] /Users/lailongying/nltk_data... [nltk_data] Package punkt is already up-to-date! [nltk_data] Downloading package wordnet to [nltk_data] /Users/lailongying/nltk_data... [nltk_data] Package wordnet is already up-to-date!
True
import re
import pandas as pd
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from gensim.corpora import Dictionary
from gensim.models import Phrases
from gensim.models.phrases import Phraser
from gensim.models import LdaModel
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
Preprocessing
# 3
ENGLISH_STOP_WORDS = stopwords.words('english')
def preprocess(text):
text = str(text) # Convert non-string values to strings
text = text.lower()
text = re.sub(r'[^\w\s]', '', text)
text = re.sub(r'\d+', '', text)
text = " ".join(text.split())
text = text.split()
text = [x for x in text if x not in ENGLISH_STOP_WORDS]
# remove stop-words
text = [x for x in text if x not in ['work well', 'well','great','good','like','product','look','make','realli']] # remove topic related stop-words
text = " ".join(text)
# stemmer_ps = PorterStemmer()
#text = [stemmer_ps.stem(word) for word in text.split()]
#text = " ".join(text)
#lemmatizer = WordNetLemmatizer()
#text = [lemmatizer.lemmatize(word) for word in text.split()]
#text = " ".join(text)
return text
import nltk
nltk.download('omw-1.4')
[nltk_data] Downloading package omw-1.4 to [nltk_data] /Users/lailongying/nltk_data... [nltk_data] Package omw-1.4 is already up-to-date!
True
# Preprocess the text data
merged_df['cleaned_reviewText'] = merged_df['reviewText'].apply(preprocess)
merged_df['cleaned_summary'] = merged_df['summary'].apply(preprocess)
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/2423404023.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['cleaned_reviewText'] = merged_df['reviewText'].apply(preprocess) /var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/2423404023.py:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['cleaned_summary'] = merged_df['summary'].apply(preprocess)
Sentiment Analysis
# Sentiment Analysis
merged_df['reviewText_sentiment'] = merged_df['cleaned_reviewText'].apply(lambda x: TextBlob(x).sentiment.polarity)
merged_df['summary_sentiment'] = merged_df['cleaned_summary'].apply(lambda x: TextBlob(x).sentiment.polarity)
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/3911972301.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['reviewText_sentiment'] = merged_df['cleaned_reviewText'].apply(lambda x: TextBlob(x).sentiment.polarity) /var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/3911972301.py:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['summary_sentiment'] = merged_df['cleaned_summary'].apply(lambda x: TextBlob(x).sentiment.polarity)
Tokenization
# Tokenization
merged_df['tokens_reviewText'] = merged_df['cleaned_reviewText'].apply(word_tokenize)
merged_df['tokens_summary'] = merged_df['cleaned_summary'].apply(word_tokenize)
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/1506086681.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['tokens_reviewText'] = merged_df['cleaned_reviewText'].apply(word_tokenize) /var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/1506086681.py:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['tokens_summary'] = merged_df['cleaned_summary'].apply(word_tokenize)
N-grams
# Bigrams and Trigrams
bigram_phraser = Phrases(merged_df['tokens_reviewText'], min_count=5)
trigram_phraser = Phrases(bigram_phraser[merged_df['tokens_reviewText']], min_count=5)
bigrams = [bigram_phraser[line] for line in merged_df['tokens_reviewText']]
trigrams = [trigram_phraser[bigram_phraser[line]] for line in merged_df['tokens_reviewText']]
LDA modeling
# Topic Modeling with LDA
dictionary = Dictionary(trigrams)
Modify this part! for low and high end in the LDA model
# modify the parameter
dictionary.filter_extremes(no_below=10, no_above=0.1)
corpus = [dictionary.doc2bow(text) for text in trigrams]
lda_model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=10, passes=15)
merged_df['topic_distribution'] = merged_df['tokens_reviewText'].apply(lambda x: lda_model.get_document_topics(dictionary.doc2bow(x)))
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/2219660351.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy merged_df['topic_distribution'] = merged_df['tokens_reviewText'].apply(lambda x: lda_model.get_document_topics(dictionary.doc2bow(x)))
# Print the topics
for topic_idx, topic in lda_model.print_topics(num_topics=10, num_words= 10):
print(f"Topic #{topic_idx + 1}:")
print(topic)
print("\n")
Topic #1: 0.014*"setting_powder" + 0.013*"sunscreen" + 0.012*"smooth_liquid_camo_medium" + 0.011*"without" + 0.009*"want" + 0.008*"since" + 0.008*"line" + 0.008*"yet" + 0.008*"spf" + 0.008*"stick" Topic #2: 0.014*"sponge" + 0.014*"blends" + 0.009*"applying" + 0.009*"overall" + 0.007*"little_goes_long_way" + 0.007*"lighter" + 0.007*"line" + 0.007*"quality" + 0.006*"job" + 0.006*"imperfections" Topic #3: 0.013*"brand" + 0.009*"easy" + 0.008*"comes" + 0.008*"covered" + 0.008*"goes" + 0.007*"stick" + 0.007*"felt" + 0.007*"redness" + 0.007*"setting_powder" + 0.006*"looking" Topic #4: 0.009*"pretty" + 0.009*"give" + 0.008*"tried" + 0.008*"put" + 0.008*"could" + 0.008*"makes" + 0.007*"find" + 0.007*"may" + 0.006*"lines" + 0.006*"foundations" Topic #5: 0.012*"concealer" + 0.009*"stick" + 0.008*"way" + 0.007*"definitely" + 0.006*"youre" + 0.006*"colors" + 0.006*"hard" + 0.006*"body" + 0.006*"take" + 0.006*"natural" Topic #6: 0.010*"colors" + 0.008*"right" + 0.008*"darker" + 0.008*"match" + 0.008*"said" + 0.008*"natural" + 0.007*"shades" + 0.007*"lighter" + 0.007*"find" + 0.006*"way" Topic #7: 0.013*"summer" + 0.011*"cream" + 0.011*"medium" + 0.010*"looked" + 0.009*"say" + 0.009*"tan" + 0.008*"darker" + 0.008*"easily" + 0.007*"happy" + 0.007*"pretty" Topic #8: 0.011*"applying" + 0.010*"imperfections" + 0.010*"sponge" + 0.010*"oily" + 0.010*"wear" + 0.009*"want" + 0.009*"prefer" + 0.008*"foundations" + 0.008*"made" + 0.007*"finish" Topic #9: 0.010*"easily" + 0.010*"sponge" + 0.009*"dry" + 0.008*"liquid_foundation" + 0.007*"cream" + 0.007*"feels" + 0.007*"way" + 0.006*"spf" + 0.006*"seems" + 0.006*"pretty" Topic #10: 0.010*"easy_apply" + 0.008*"sunscreen" + 0.007*"since" + 0.007*"find" + 0.007*"easily" + 0.006*"without" + 0.006*"needed" + 0.006*"lot" + 0.006*"set" + 0.005*"spf"
# Create a DataFrame with topic distribution as columns
topic_df = pd.DataFrame([{f'topic_{i}': topic_prob for i, topic_prob in row} for row in merged_df['topic_distribution']])
topic_df.fillna(0, inplace=True)
LDA visualization
!pip install pyLDAvis
!pip install gensim
Collecting pyLDAvis Downloading pyLDAvis-3.4.0-py3-none-any.whl (2.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 5.1 MB/s eta 0:00:0000:0100:01 Requirement already satisfied: pandas>=1.3.4 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (1.4.2) Requirement already satisfied: numexpr in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (2.8.1) Requirement already satisfied: setuptools in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (61.2.0) Collecting funcy Downloading funcy-2.0-py2.py3-none-any.whl (30 kB) Requirement already satisfied: joblib>=1.2.0 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (1.2.0) Requirement already satisfied: gensim in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (4.1.2) Requirement already satisfied: scikit-learn>=1.0.0 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (1.2.2) Requirement already satisfied: jinja2 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (2.11.3) Collecting numpy>=1.22.0 Downloading numpy-1.24.3-cp39-cp39-macosx_10_9_x86_64.whl (19.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.8/19.8 MB 4.1 MB/s eta 0:00:0000:0100:01 Requirement already satisfied: scipy in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pyLDAvis) (1.7.3) Requirement already satisfied: python-dateutil>=2.8.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pandas>=1.3.4->pyLDAvis) (2.8.2) Requirement already satisfied: pytz>=2020.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from pandas>=1.3.4->pyLDAvis) (2021.3) Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from scikit-learn>=1.0.0->pyLDAvis) (2.2.0) Downloading numpy-1.22.4-cp39-cp39-macosx_10_15_x86_64.whl (17.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.7/17.7 MB 2.3 MB/s eta 0:00:0000:0100:01m Requirement already satisfied: smart-open>=1.8.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from gensim->pyLDAvis) (5.1.0) Requirement already satisfied: MarkupSafe>=0.23 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from jinja2->pyLDAvis) (2.0.1) Requirement already satisfied: packaging in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from numexpr->pyLDAvis) (21.3) Requirement already satisfied: six>=1.5 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas>=1.3.4->pyLDAvis) (1.16.0) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from packaging->numexpr->pyLDAvis) (3.0.4) Installing collected packages: funcy, numpy, pyLDAvis Attempting uninstall: numpy Found existing installation: numpy 1.21.5 Uninstalling numpy-1.21.5: Successfully uninstalled numpy-1.21.5 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. daal4py 2021.5.0 requires daal==2021.4.0, which is not installed. numba 0.55.1 requires numpy<1.22,>=1.18, but you have numpy 1.22.4 which is incompatible. Successfully installed funcy-2.0 numpy-1.22.4 pyLDAvis-3.4.0 [notice] A new release of pip is available: 23.0.1 -> 23.1 [notice] To update, run: pip install --upgrade pip Requirement already satisfied: gensim in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (4.1.2) Requirement already satisfied: smart-open>=1.8.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from gensim) (5.1.0) Requirement already satisfied: numpy>=1.17.0 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from gensim) (1.22.4) Requirement already satisfied: scipy>=0.18.1 in /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages (from gensim) (1.7.3) [notice] A new release of pip is available: 23.0.1 -> 23.1 [notice] To update, run: pip install --upgrade pip
import gensim
import gensim.corpora as corpora
from gensim.models import LdaModel
from gensim.utils import simple_preprocess
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
pyLDAvis.enable_notebook()
corpus = [dictionary.doc2bow(text) for text in trigrams]
lda_visualization = gensimvis.prepare(lda_model, corpus, dictionary=lda_model.id2word)
pyLDAvis.display(lda_visualization)
/Users/lailongying/opt/anaconda3/lib/python3.9/site-packages/pyLDAvis/_prepare.py:243: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only. default_term_info = default_term_info.sort_values(
Top rating analysis
# top rating df
high_rated_df = merged_df[merged_df['overall'] >= 4]
high_rated_df.shape
(1258, 38)
# most common n-grams
from collections import Counter
def get_top_n_grams(tokenized_texts, n, top_n):
n_grams = Counter()
for text in tokenized_texts:
n_grams.update(zip(*[text[i:] for i in range(n)]))
return n_grams.most_common(top_n)
# top 20 n-grams
top_words_high_rated = get_top_n_grams(high_rated_df['tokens_reviewText'], 1, 20)
top_bigrams_high_rated = get_top_n_grams(high_rated_df['tokens_reviewText'], 2, 20)
top_trigrams_high_rated = get_top_n_grams(high_rated_df['tokens_reviewText'], 3, 20)
LDA model for high rated products
# Bigrams and Trigrams
bigram_phraser = Phrases(high_rated_df['tokens_reviewText'], min_count=5)
trigram_phraser = Phrases(bigram_phraser[high_rated_df['tokens_reviewText']], min_count=5)
bigrams_high = [bigram_phraser[line] for line in high_rated_df['tokens_reviewText']]
trigrams_high = [trigram_phraser[bigram_phraser[line]] for line in high_rated_df['tokens_reviewText']]
# Topic Modeling with LDA
dictionary_high = Dictionary(trigrams_high)
# modify the parameter
dictionary_high.filter_extremes(no_below=10, no_above=0.1)
corpus = [dictionary_high.doc2bow(text) for text in trigrams]
lda_model_high = LdaModel(corpus=corpus, id2word=dictionary_high, num_topics=10, passes=15)
high_rated_df['topic_distribution'] = high_rated_df['tokens_reviewText'].apply(lambda x: lda_model_high.get_document_topics(dictionary_high.doc2bow(x)))
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/3522050477.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy high_rated_df['topic_distribution'] = high_rated_df['tokens_reviewText'].apply(lambda x: lda_model_high.get_document_topics(dictionary_high.doc2bow(x)))
# Print the topics
for topic_idx, topic in lda_model_high.print_topics(num_topics=10, num_words= 10):
print(f"Topic #{topic_idx + 1}:")
print(topic)
print("\n")
Topic #1: 0.010*"blemishes" + 0.010*"natural" + 0.009*"blends" + 0.008*"brand" + 0.008*"try" + 0.008*"lot" + 0.007*"concealer" + 0.007*"isnt" + 0.007*"slightly" + 0.007*"matte" Topic #2: 0.012*"stick" + 0.010*"online" + 0.010*"pretty" + 0.009*"setting_powder" + 0.009*"ive" + 0.009*"small" + 0.008*"concealer" + 0.008*"body" + 0.008*"still" + 0.008*"hard" Topic #3: 0.013*"stick" + 0.012*"blend" + 0.012*"professional" + 0.012*"smooth_liquid_camo_medium" + 0.012*"job" + 0.011*"since" + 0.009*"darker" + 0.009*"scar" + 0.009*"spf" + 0.009*"water" Topic #4: 0.015*"didnt" + 0.012*"blend" + 0.010*"line" + 0.009*"spf" + 0.009*"made" + 0.008*"imperfections" + 0.007*"enough" + 0.007*"liquid_foundation" + 0.007*"never" + 0.007*"got" Topic #5: 0.015*"want" + 0.013*"may" + 0.011*"covering" + 0.010*"setting_powder" + 0.009*"go" + 0.008*"try" + 0.008*"said" + 0.008*"see" + 0.007*"tone" + 0.007*"colors" Topic #6: 0.008*"see" + 0.008*"go" + 0.008*"medium" + 0.008*"different" + 0.007*"find" + 0.007*"could" + 0.006*"primer" + 0.006*"ive" + 0.006*"price" + 0.006*"still" Topic #7: 0.016*"sunscreen" + 0.013*"way" + 0.011*"looks" + 0.011*"beauty_blender" + 0.011*"put" + 0.010*"pretty" + 0.009*"blend" + 0.008*"creme" + 0.008*"areas" + 0.008*"found" Topic #8: 0.014*"find" + 0.011*"blend" + 0.008*"natural" + 0.008*"makes" + 0.008*"old" + 0.008*"looks" + 0.007*"yellow" + 0.007*"time" + 0.007*"oily_skin" + 0.007*"way" Topic #9: 0.010*"feels" + 0.009*"worked" + 0.009*"sunscreen" + 0.009*"felt" + 0.009*"easily" + 0.009*"natural" + 0.008*"easy" + 0.008*"stuff" + 0.008*"blends" + 0.008*"sensitive_skin" Topic #10: 0.011*"didnt" + 0.008*"darker" + 0.008*"cream" + 0.008*"applying" + 0.008*"wear" + 0.008*"dry" + 0.008*"looked" + 0.008*"foundations" + 0.007*"easily" + 0.007*"pretty"
LDA visualization for high rated products
pyLDAvis.enable_notebook()
corpus = [dictionary_high.doc2bow(text) for text in trigrams]
lda_visualization = gensimvis.prepare(lda_model_high, corpus, dictionary=lda_model_high.id2word)
pyLDAvis.display(lda_visualization)
/Users/lailongying/opt/anaconda3/lib/python3.9/site-packages/pyLDAvis/_prepare.py:243: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only. default_term_info = default_term_info.sort_values(
Low rating analysis
low_rated_df = merged_df[merged_df['overall'] < 4]
low_rated_df.shape
(317, 38)
# most common n-grams
from collections import Counter
def get_top_n_grams(tokenized_texts, n, top_n):
n_grams = Counter()
for text in tokenized_texts:
n_grams.update(zip(*[text[i:] for i in range(n)]))
return n_grams.most_common(top_n)
# top 20 n-grams
top_words_low_rated = get_top_n_grams(low_rated_df['tokens_reviewText'], 1, 20)
top_bigrams_low_rated = get_top_n_grams(low_rated_df['tokens_reviewText'], 2, 20)
top_trigrams_low_rated = get_top_n_grams(low_rated_df['tokens_reviewText'], 3, 20)
LDA model for low rated products
# Bigrams and Trigrams
bigram_phraser = Phrases(low_rated_df['tokens_reviewText'], min_count=5)
trigram_phraser = Phrases(bigram_phraser[low_rated_df['tokens_reviewText']], min_count=5)
bigrams_low = [bigram_phraser[line] for line in low_rated_df['tokens_reviewText']]
trigrams_low = [trigram_phraser[bigram_phraser[line]] for line in low_rated_df['tokens_reviewText']]
# Topic Modeling with LDA
dictionary_low = Dictionary(trigrams_low)
# modify the parameter
dictionary_low.filter_extremes(no_below=10, no_above=0.1)
corpus = [dictionary_low.doc2bow(text) for text in trigrams]
lda_model_low = LdaModel(corpus=corpus, id2word=dictionary_low, num_topics=10, passes=15)
low_rated_df['topic_distribution'] = low_rated_df['tokens_reviewText'].apply(lambda x: lda_model_low.get_document_topics(dictionary_low.doc2bow(x)))
/var/folders/0d/rtr5srz94xxb4_5xm7x2vqbh0000gn/T/ipykernel_66253/4039827119.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy low_rated_df['topic_distribution'] = low_rated_df['tokens_reviewText'].apply(lambda x: lda_model_low.get_document_topics(dictionary_low.doc2bow(x)))
# Print the topics
for topic_idx, topic in lda_model_low.print_topics(num_topics=10, num_words= 10):
print(f"Topic #{topic_idx + 1}:")
print(topic)
print("\n")
Topic #1: 0.054*"sunscreen" + 0.034*"want" + 0.028*"set" + 0.026*"concealer" + 0.024*"recommend" + 0.022*"full_coverage" + 0.020*"pretty" + 0.019*"sure" + 0.019*"love" + 0.018*"going" Topic #2: 0.030*"wear" + 0.022*"thats" + 0.021*"enough" + 0.020*"cream" + 0.018*"spf" + 0.017*"made" + 0.017*"give" + 0.015*"complexion" + 0.015*"actually" + 0.015*"stays" Topic #3: 0.027*"best" + 0.025*"full_coverage" + 0.024*"without" + 0.023*"job" + 0.022*"sponge" + 0.020*"ive" + 0.019*"blends" + 0.019*"covers" + 0.018*"spots" + 0.016*"almost" Topic #4: 0.061*"works" + 0.051*"stick" + 0.037*"said" + 0.027*"lot" + 0.024*"liquid_foundation" + 0.022*"tone" + 0.022*"goes" + 0.018*"easy_apply" + 0.017*"hand" + 0.017*"area" Topic #5: 0.037*"best" + 0.035*"covers" + 0.034*"cream" + 0.029*"pretty" + 0.027*"full_coverage" + 0.023*"covering" + 0.021*"dermablend_products" + 0.021*"want" + 0.019*"imperfections" + 0.017*"online" Topic #6: 0.032*"different" + 0.031*"primer" + 0.025*"foundations" + 0.020*"take" + 0.020*"oily" + 0.019*"full_coverage" + 0.017*"brush" + 0.017*"colors" + 0.017*"looking" + 0.015*"applying" Topic #7: 0.041*"setting_powder" + 0.033*"beige" + 0.027*"concealer" + 0.021*"cover_creme" + 0.020*"easy_apply" + 0.020*"covers" + 0.019*"colors" + 0.018*"cream" + 0.018*"many" + 0.017*"spf" Topic #8: 0.047*"love" + 0.029*"brand" + 0.021*"line" + 0.021*"red" + 0.017*"never" + 0.016*"got" + 0.016*"fine" + 0.015*"pretty" + 0.015*"without" + 0.014*"always" Topic #9: 0.131*"brush" + 0.044*"sponge" + 0.029*"compact" + 0.018*"quality" + 0.017*"lot" + 0.017*"foundations" + 0.016*"want" + 0.013*"say" + 0.013*"thought" + 0.012*"liquid" Topic #10: 0.023*"feels" + 0.023*"sensitive_skin" + 0.020*"blends" + 0.018*"easy" + 0.016*"covers" + 0.016*"water" + 0.015*"greasy" + 0.015*"overall" + 0.015*"texture" + 0.014*"matte"
LDA visualization for low rated products
pyLDAvis.enable_notebook()
corpus = [dictionary_low.doc2bow(text) for text in trigrams]
lda_visualization = gensimvis.prepare(lda_model_low, corpus, dictionary=lda_model_low.id2word)
pyLDAvis.display(lda_visualization)
/Users/lailongying/opt/anaconda3/lib/python3.9/site-packages/pyLDAvis/_prepare.py:243: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only. default_term_info = default_term_info.sort_values(
Conclusion
# lda analysis
def display_top_words(lda_model, num_topics, num_words):
for topic_idx, topic in lda_model.print_topics(num_topics=num_topics, num_words=num_words):
print(f"Topic #{topic_idx + 1}:")
print(topic)
print("\n")
display_top_words(lda_model, 10, 10)
Topic #1: 0.014*"setting_powder" + 0.013*"sunscreen" + 0.012*"smooth_liquid_camo_medium" + 0.011*"without" + 0.009*"want" + 0.008*"since" + 0.008*"line" + 0.008*"yet" + 0.008*"spf" + 0.008*"stick" Topic #2: 0.014*"sponge" + 0.014*"blends" + 0.009*"applying" + 0.009*"overall" + 0.007*"little_goes_long_way" + 0.007*"lighter" + 0.007*"line" + 0.007*"quality" + 0.006*"job" + 0.006*"imperfections" Topic #3: 0.013*"brand" + 0.009*"easy" + 0.008*"comes" + 0.008*"covered" + 0.008*"goes" + 0.007*"stick" + 0.007*"felt" + 0.007*"redness" + 0.007*"setting_powder" + 0.006*"looking" Topic #4: 0.009*"pretty" + 0.009*"give" + 0.008*"tried" + 0.008*"put" + 0.008*"could" + 0.008*"makes" + 0.007*"find" + 0.007*"may" + 0.006*"lines" + 0.006*"foundations" Topic #5: 0.012*"concealer" + 0.009*"stick" + 0.008*"way" + 0.007*"definitely" + 0.006*"youre" + 0.006*"colors" + 0.006*"hard" + 0.006*"body" + 0.006*"take" + 0.006*"natural" Topic #6: 0.010*"colors" + 0.008*"right" + 0.008*"darker" + 0.008*"match" + 0.008*"said" + 0.008*"natural" + 0.007*"shades" + 0.007*"lighter" + 0.007*"find" + 0.006*"way" Topic #7: 0.013*"summer" + 0.011*"cream" + 0.011*"medium" + 0.010*"looked" + 0.009*"say" + 0.009*"tan" + 0.008*"darker" + 0.008*"easily" + 0.007*"happy" + 0.007*"pretty" Topic #8: 0.011*"applying" + 0.010*"imperfections" + 0.010*"sponge" + 0.010*"oily" + 0.010*"wear" + 0.009*"want" + 0.009*"prefer" + 0.008*"foundations" + 0.008*"made" + 0.007*"finish" Topic #9: 0.010*"easily" + 0.010*"sponge" + 0.009*"dry" + 0.008*"liquid_foundation" + 0.007*"cream" + 0.007*"feels" + 0.007*"way" + 0.006*"spf" + 0.006*"seems" + 0.006*"pretty" Topic #10: 0.010*"easy_apply" + 0.008*"sunscreen" + 0.007*"since" + 0.007*"find" + 0.007*"easily" + 0.006*"without" + 0.006*"needed" + 0.006*"lot" + 0.006*"set" + 0.005*"spf"
display_top_words(lda_model_high, 10, 10)
Topic #1: 0.010*"blemishes" + 0.010*"natural" + 0.009*"blends" + 0.008*"brand" + 0.008*"try" + 0.008*"lot" + 0.007*"concealer" + 0.007*"isnt" + 0.007*"slightly" + 0.007*"matte" Topic #2: 0.012*"stick" + 0.010*"online" + 0.010*"pretty" + 0.009*"setting_powder" + 0.009*"ive" + 0.009*"small" + 0.008*"concealer" + 0.008*"body" + 0.008*"still" + 0.008*"hard" Topic #3: 0.013*"stick" + 0.012*"blend" + 0.012*"professional" + 0.012*"smooth_liquid_camo_medium" + 0.012*"job" + 0.011*"since" + 0.009*"darker" + 0.009*"scar" + 0.009*"spf" + 0.009*"water" Topic #4: 0.015*"didnt" + 0.012*"blend" + 0.010*"line" + 0.009*"spf" + 0.009*"made" + 0.008*"imperfections" + 0.007*"enough" + 0.007*"liquid_foundation" + 0.007*"never" + 0.007*"got" Topic #5: 0.015*"want" + 0.013*"may" + 0.011*"covering" + 0.010*"setting_powder" + 0.009*"go" + 0.008*"try" + 0.008*"said" + 0.008*"see" + 0.007*"tone" + 0.007*"colors" Topic #6: 0.008*"see" + 0.008*"go" + 0.008*"medium" + 0.008*"different" + 0.007*"find" + 0.007*"could" + 0.006*"primer" + 0.006*"ive" + 0.006*"price" + 0.006*"still" Topic #7: 0.016*"sunscreen" + 0.013*"way" + 0.011*"looks" + 0.011*"beauty_blender" + 0.011*"put" + 0.010*"pretty" + 0.009*"blend" + 0.008*"creme" + 0.008*"areas" + 0.008*"found" Topic #8: 0.014*"find" + 0.011*"blend" + 0.008*"natural" + 0.008*"makes" + 0.008*"old" + 0.008*"looks" + 0.007*"yellow" + 0.007*"time" + 0.007*"oily_skin" + 0.007*"way" Topic #9: 0.010*"feels" + 0.009*"worked" + 0.009*"sunscreen" + 0.009*"felt" + 0.009*"easily" + 0.009*"natural" + 0.008*"easy" + 0.008*"stuff" + 0.008*"blends" + 0.008*"sensitive_skin" Topic #10: 0.011*"didnt" + 0.008*"darker" + 0.008*"cream" + 0.008*"applying" + 0.008*"wear" + 0.008*"dry" + 0.008*"looked" + 0.008*"foundations" + 0.007*"easily" + 0.007*"pretty"
display_top_words(lda_model_low, 10, 10)
Topic #1: 0.054*"sunscreen" + 0.034*"want" + 0.028*"set" + 0.026*"concealer" + 0.024*"recommend" + 0.022*"full_coverage" + 0.020*"pretty" + 0.019*"sure" + 0.019*"love" + 0.018*"going" Topic #2: 0.030*"wear" + 0.022*"thats" + 0.021*"enough" + 0.020*"cream" + 0.018*"spf" + 0.017*"made" + 0.017*"give" + 0.015*"complexion" + 0.015*"actually" + 0.015*"stays" Topic #3: 0.027*"best" + 0.025*"full_coverage" + 0.024*"without" + 0.023*"job" + 0.022*"sponge" + 0.020*"ive" + 0.019*"blends" + 0.019*"covers" + 0.018*"spots" + 0.016*"almost" Topic #4: 0.061*"works" + 0.051*"stick" + 0.037*"said" + 0.027*"lot" + 0.024*"liquid_foundation" + 0.022*"tone" + 0.022*"goes" + 0.018*"easy_apply" + 0.017*"hand" + 0.017*"area" Topic #5: 0.037*"best" + 0.035*"covers" + 0.034*"cream" + 0.029*"pretty" + 0.027*"full_coverage" + 0.023*"covering" + 0.021*"dermablend_products" + 0.021*"want" + 0.019*"imperfections" + 0.017*"online" Topic #6: 0.032*"different" + 0.031*"primer" + 0.025*"foundations" + 0.020*"take" + 0.020*"oily" + 0.019*"full_coverage" + 0.017*"brush" + 0.017*"colors" + 0.017*"looking" + 0.015*"applying" Topic #7: 0.041*"setting_powder" + 0.033*"beige" + 0.027*"concealer" + 0.021*"cover_creme" + 0.020*"easy_apply" + 0.020*"covers" + 0.019*"colors" + 0.018*"cream" + 0.018*"many" + 0.017*"spf" Topic #8: 0.047*"love" + 0.029*"brand" + 0.021*"line" + 0.021*"red" + 0.017*"never" + 0.016*"got" + 0.016*"fine" + 0.015*"pretty" + 0.015*"without" + 0.014*"always" Topic #9: 0.131*"brush" + 0.044*"sponge" + 0.029*"compact" + 0.018*"quality" + 0.017*"lot" + 0.017*"foundations" + 0.016*"want" + 0.013*"say" + 0.013*"thought" + 0.012*"liquid" Topic #10: 0.023*"feels" + 0.023*"sensitive_skin" + 0.020*"blends" + 0.018*"easy" + 0.016*"covers" + 0.016*"water" + 0.015*"greasy" + 0.015*"overall" + 0.015*"texture" + 0.014*"matte"
# sentiment analysis
rating_sentiment_summary = merged_df.groupby('overall').agg({'reviewText_sentiment': 'mean', 'summary_sentiment': 'mean'}).reset_index()
print(rating_sentiment_summary)
overall reviewText_sentiment summary_sentiment 0 1.0 0.023442 -0.147807 1 2.0 0.087539 0.059064 2 3.0 0.103638 0.069082 3 4.0 0.179245 0.192631 4 5.0 0.229236 0.312825
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
sns.barplot(data=rating_sentiment_summary, x='overall', y='reviewText_sentiment', ax=axes[0])
axes[0].set_title('Average Review Text Sentiment per Rating')
axes[0].set_xlabel('Rating')
axes[0].set_ylabel('Average Sentiment Score')
sns.barplot(data=rating_sentiment_summary, x='overall', y='summary_sentiment', ax=axes[1])
axes[1].set_title('Average Summary Sentiment per Rating')
axes[1].set_xlabel('Rating')
axes[1].set_ylabel('Average Sentiment Score')
plt.show()
/Users/lailongying/opt/anaconda3/lib/python3.9/site-packages/seaborn/rcmod.py:400: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. if LooseVersion(mpl.__version__) >= "3.0": /Users/lailongying/opt/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
sns.scatterplot(data=merged_df, x='reviewText_sentiment', y='overall', ax=axes[0], alpha=0.2)
axes[0].set_title('Review Text Sentiment vs Rating')
axes[0].set_xlabel('Review Text Sentiment Score')
axes[0].set_ylabel('Rating')
sns.scatterplot(data=merged_df, x='summary_sentiment', y='overall', ax=axes[1], alpha=0.2)
axes[1].set_title('Summary Sentiment vs Rating')
axes[1].set_xlabel('Summary Sentiment Score')
axes[1].set_ylabel('Rating')
plt.show()
# top 20 n-grams
def plot_top_ngrams(top_ngrams, title, n):
labels, values = zip(*top_ngrams)
labels = [" ".join(label) for label in labels]
index = list(range(n))
plt.figure(figsize=(10, 5))
plt.bar(index, values)
plt.xlabel('Phrases')
plt.ylabel('Frequency')
plt.xticks(index, labels, rotation=45)
plt.title(title)
plt.show()
plot_top_ngrams(top_words_high_rated, 'Top 20 Words in High-Rated Products', 20)
plot_top_ngrams(top_bigrams_high_rated, 'Top 20 Bigrams in High-Rated Products', 20)
plot_top_ngrams(top_trigrams_high_rated, 'Top 20 Trigrams in High-Rated Products', 20)
# top 20 n-grams
def plot_top_ngrams(top_ngrams, title, n):
labels, values = zip(*top_ngrams)
labels = [" ".join(label) for label in labels]
index = list(range(n))
plt.figure(figsize=(10, 5))
plt.bar(index, values)
plt.xlabel('Phrases')
plt.ylabel('Frequency')
plt.xticks(index, labels, rotation=45)
plt.title(title)
plt.show()
plot_top_ngrams(top_words_low_rated, 'Top 20 Words in Low-Rated Products', 20)
plot_top_ngrams(top_bigrams_low_rated, 'Top 20 Bigrams in Low-Rated Products', 20)
plot_top_ngrams(top_trigrams_low_rated, 'Top 20 Trigrams in Low-Rated Products', 20)