
Introduction to product recommender (with Apple’s Turi Create)

import turicreate as tc

# Load the MovieLens movie catalogue (movieId, title, genres)
movies = tc.SFrame.read_csv("ml-latest-small/movies.csv", header=True,
                            delimiter=',')
movies
Finished parsing file /home/antonello/Documents/py-notebooks/movielens_recommender/ml-latest-small/movies.csv
Parsing completed. Parsed 100 lines in 0.036318 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as
column_type_hints=[int,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /home/antonello/Documents/py-notebooks/movielens_recommender/ml-latest-small/movies.csv
Parsing completed. Parsed 9742 lines in 0.03129 secs
movies.show()
ratings = tc.SFrame.read_csv("ml-latest-small/ratings.csv", header=True,
delimiter=',')
ratings
Finished parsing file /home/antonello/Documents/py-notebooks/movielens_recommender/ml-latest-small/ratings.csv
Parsing completed. Parsed 100 lines in 0.049596 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as
column_type_hints=[int,int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /home/antonello/Documents/py-notebooks/movielens_recommender/ml-latest-small/ratings.csv
Parsing completed. Parsed 100836 lines in 0.051093 secs.
ratings['rating'].show()
# Baseline: score each movie by its mean rating
model = tc.recommender.popularity_recommender.create(ratings,
                                                     user_id='userId',
                                                     item_id='movieId',
                                                     target='rating')
# Top 3 recommendations for users 1 to 5, joined with titles and genres
most_popular = model.recommend(users=[1,2,3,4,5],k=3)
most_popular = most_popular.join(right=movies,on={'movieId':'movieId'},how='inner').sort(['userId','rank'], ascending=True)
most_popular.print_rows(num_rows=15)
Recsys training: model = popularity
Warning: Ignoring columns timestamp;
To use these columns in scoring predictions, use a model that allows the use of additional features.
Preparing data set.
Data has 100836 observations with 610 users and 9724 items.
Data prepared in: 0.099973s
100836 observations to process; with 9724 unique items.
+--------+---------+-------+------+--------------------------------+
| userId | movieId | score | rank | title |
+--------+---------+-------+------+--------------------------------+
| 1 | 6835 | 5.0 | 1 | Alien Contamination (1980) |
| 1 | 5746 | 5.0 | 2 | Galaxy of Terror (Quest) (... |
| 1 | 131724 | 5.0 | 3 | The Jinx: The Life and Dea... |
| 2 | 3851 | 5.0 | 1 | I'm the One That I Want (2000) |
| 2 | 6835 | 5.0 | 2 | Alien Contamination (1980) |
| 2 | 5746 | 5.0 | 3 | Galaxy of Terror (Quest) (... |
| 3 | 1151 | 5.0 | 1 | Lesson Faust (1994) |
| 3 | 3851 | 5.0 | 2 | I'm the One That I Want (2000) |
| 3 | 131724 | 5.0 | 3 | The Jinx: The Life and Dea... |
| 4 | 6835 | 5.0 | 1 | Alien Contamination (1980) |
| 4 | 5746 | 5.0 | 2 | Galaxy of Terror (Quest) (... |
| 4 | 131724 | 5.0 | 3 | The Jinx: The Life and Dea... |
| 5 | 6835 | 5.0 | 1 | Alien Contamination (1980) |
| 5 | 5746 | 5.0 | 2 | Galaxy of Terror (Quest) (... |
| 5 | 131724 | 5.0 | 3 | The Jinx: The Life and Dea... |
+--------+---------+-------+------+--------------------------------+
+--------------------------------+
| genres |
+--------------------------------+
| Action|Horror|Sci-Fi |
| Action|Horror|Mystery|Sci-Fi |
| Documentary |
| Comedy |
| Action|Horror|Sci-Fi |
| Action|Horror|Mystery|Sci-Fi |
| Animation|Comedy|Drama|Fantasy |
| Comedy |
| Documentary |
| Action|Horror|Sci-Fi |
| Action|Horror|Mystery|Sci-Fi |
| Documentary |
| Action|Horror|Sci-Fi |
| Action|Horror|Mystery|Sci-Fi |
| Documentary |
+--------------------------------+
[15 rows x 6 columns]
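Every recommendation above has a score of exactly 5.0 and most titles are fairly obscure. That is what you would expect if popularity is measured by mean rating: a movie with a single 5-star rating outranks a widely watched one that averages 4.5. The sketch below reproduces this effect in plain Python on made-up data. It is not Turi Create's implementation, and the function name and toy ratings are assumptions for illustration only.

```python
from collections import defaultdict

def top_k_by_mean_rating(ratings, k=3):
    """Rank items by mean rating; ties are broken by item id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    return sorted(means.items(), key=lambda pair: (-pair[1], pair[0]))[:k]

# Hypothetical toy data: (userId, movieId, rating).
# Item 30 has one 5-star rating; item 10 has four ratings averaging 4.5.
toy_ratings = [
    (1, 10, 5.0), (2, 10, 4.0), (3, 10, 4.5), (4, 10, 4.5),
    (1, 20, 3.0), (2, 20, 2.0),
    (3, 30, 5.0),
]

top = top_k_by_mean_rating(toy_ratings, k=2)
print(top)
```

A common mitigation is to require a minimum number of ratings per item before ranking, which this sketch deliberately leaves out.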
# Hold out about 20% of each user's ratings for validation
training_data, validation_data = tc.recommender.util.random_split_by_user(ratings, 'userId', 'movieId', item_test_proportion=0.2)
model = tc.recommender.item_similarity_recommender.create(training_data,
                                                          user_id='userId',
                                                          item_id='movieId',
                                                          target='rating')
# One row per (movie, similar movie) pair, with a similarity score and rank
items_similarity = model.get_similar_items()
Recsys training: model = item_similarity
Warning: Ignoring columns timestamp;
To use these columns in scoring predictions, use a model that allows the use of additional features.
Preparing data set.
Data has 80673 observations with 610 users and 8972 items.
Data prepared in: 0.105496s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 2.441ms | 100 |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 311.493ms | 0 | 3 |
| 1.47s | 100 | 8972 |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 1.51143s
(items_similarity[items_similarity['movieId'] == 1214]
    .join(right=movies, on={'similar': 'movieId'}, how='inner')
    .sort('rank', ascending=True)
    .print_rows())
+---------+---------+---------------------+------+
| movieId | similar | score | rank |
+---------+---------+---------------------+------+
| 1214 | 1200 | 0.517241358757019 | 1 |
| 1214 | 1097 | 0.3395061492919922 | 2 |
| 1214 | 1089 | 0.33529412746429443 | 3 |
| 1214 | 1210 | 0.32692307233810425 | 4 |
| 1214 | 1198 | 0.3051643371582031 | 5 |
| 1214 | 1136 | 0.2971428632736206 | 6 |
| 1214 | 1387 | 0.28767120838165283 | 7 |
| 1214 | 1653 | 0.2789115905761719 | 8 |
| 1214 | 260 | 0.273809552192688 | 9 |
| 1214 | 1213 | 0.273809552192688 | 10 |
+---------+---------+---------------------+------+
+-------------------------------+--------------------------------+
| title | genres |
+-------------------------------+--------------------------------+
| Aliens (1986) | Action|Adventure|Horror|Sci-Fi |
| E.T. the Extra-Terrestrial... | Children|Drama|Sci-Fi |
| Reservoir Dogs (1992) | Crime|Mystery|Thriller |
| Star Wars: Episode VI - Re... | Action|Adventure|Sci-Fi |
| Raiders of the Lost Ark (I... | Action|Adventure |
| Monty Python and the Holy ... | Adventure|Comedy|Fantasy |
| Jaws (1975) | Action|Horror |
| Gattaca (1997) | Drama|Sci-Fi|Thriller |
| Star Wars: Episode IV - A ... | Action|Adventure|Sci-Fi |
| Goodfellas (1990) | Crime|Drama |
+-------------------------------+--------------------------------+
[10 rows x 6 columns]
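The scores in this table are similarities between 0 and 1, not ratings. As far as I know, the item similarity recommender uses Jaccard similarity unless told otherwise: the number of users who rated both movies divided by the number who rated either. The sketch below shows that calculation in plain Python. The `raters` dictionary and its user ids are invented for illustration, and the code is not taken from Turi Create.

```python
def jaccard(users_a, users_b):
    """Jaccard similarity between two collections of user ids."""
    a, b = set(users_a), set(users_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical data: which users rated each movie.
raters = {
    "alien": {1, 2, 3, 4},
    "aliens": {2, 3, 4, 5},
    "jaws": {1, 6},
}

sim_aliens = jaccard(raters["alien"], raters["aliens"])  # 3 shared of 5 total
sim_jaws = jaccard(raters["alien"], raters["jaws"])      # 1 shared of 5 total
print(sim_aliens, sim_jaws)
```

Note that the rating values play no part in this measure: only who rated what matters.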
model.evaluate(validation_data)
Overall RMSE: 3.5057222562438963
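RMSE (root mean squared error) is the square root of the average squared gap between predicted and actual ratings. A value around 3.5 on a 5-star scale is very high. A plausible reading is that this model's scores are similarities near zero rather than ratings, so comparing them to real ratings produces a large error. The sketch below shows the metric itself in plain Python, with made-up numbers. It is not the library's evaluation code.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical example: both predictions are off by 3 stars.
value = rmse([4.0, 1.0], [1.0, 4.0])
print(value)
```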
model = tc.recommender.ranking_factorization_recommender.create(training_data,
user_id='userId',
item_id='movieId',
target='rating')
results = model.recommend(k=3)
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 80673 observations with 610 users and 8972 items.
Data prepared in: 0.133147s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 10084 / 80673 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 16.6667 | Not Viable |
| 1 | 4.16667 | Not Viable |
| 2 | 1.04167 | Not Viable |
| 3 | 0.260417 | Not Viable |
| 4 | 0.0651042 | 1.10129 |
| 5 | 0.0325521 | 1.53943 |
| 6 | 0.016276 | 1.89349 |
| 7 | 0.00813802 | 1.97929 |
+---------+-------------------+------------------------------------------+
| Final | 0.0651042 | 1.10129 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 226us | 2.32822 | 1.08971 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 451.511ms | 2.1881 | 1.16773 | 0.0651042 |
| 2 | 939.443ms | 1.90281 | 1.08491 | 0.0651042 |
| 3 | 1.36s | 1.76298 | 1.02327 | 0.0651042 |
| 4 | 1.84s | 1.65405 | 0.987049 | 0.0651042 |
| 5 | 2.26s | 1.5937 | 0.965333 | 0.0651042 |
| 10 | 4.35s | 1.43729 | 0.899279 | 0.0651042 |
| 20 | 8.51s | 1.1761 | 0.8083 | 0.0651042 |
| 25 | 10.58s | 1.07333 | 0.766321 | 0.0651042 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.04304
Final training RMSE: 0.734723
def join_titles(sframe,on):
    # Attach title and genres from the movies SFrame
    return sframe.join(right=movies, on=on, how='inner')

results = join_titles(results,'movieId')
results.sort(['userId','rank'], ascending=True).print_rows(20)
+--------+---------+--------------------+------+-------------------------------+
| userId | movieId | score | rank | title |
+--------+---------+--------------------+------+-------------------------------+
| 1 | 296 | 5.4481021343133005 | 1 | Pulp Fiction (1994) |
| 1 | 318 | 5.38932802065821 | 2 | Shawshank Redemption, The ... |
| 1 | 858 | 5.369748759116844 | 3 | Godfather, The (1972) |
| 2 | 1198 | 4.917704129159317 | 1 | Raiders of the Lost Ark (I... |
| 2 | 356 | 4.897511178195343 | 2 | Forrest Gump (1994) |
| 2 | 260 | 4.891662919461593 | 3 | Star Wars: Episode IV - A ... |
| 3 | 541 | 5.270915883626655 | 1 | Blade Runner (1982) |
| 3 | 1394 | 5.122784393872932 | 2 | Raising Arizona (1987) |
| 3 | 50 | 4.789787578429893 | 3 | Usual Suspects, The (1995) |
| 4 | 1193 | 5.191378074731544 | 1 | One Flew Over the Cuckoo's... |
| 4 | 318 | 5.121037292327598 | 2 | Shawshank Redemption, The ... |
| 4 | 1247 | 5.0619253991505655 | 3 | Graduate, The (1967) |
| 5 | 2959 | 4.786239120211318 | 1 | Fight Club (1999) |
| 5 | 7361 | 4.614468338932708 | 2 | Eternal Sunshine of the Sp... |
| 5 | 1193 | 4.590772422995284 | 3 | One Flew Over the Cuckoo's... |
| 6 | 1197 | 4.813408214050211 | 1 | Princess Bride, The (1987) |
| 6 | 50 | 4.795824933248438 | 2 | Usual Suspects, The (1995) |
| 6 | 2858 | 4.793457645374216 | 3 | American Beauty (1999) |
| 7 | 1198 | 5.20988603219004 | 1 | Raiders of the Lost Ark (I... |
| 7 | 2571 | 5.106401997651771 | 2 | Matrix, The (1999) |
+--------+---------+--------------------+------+-------------------------------+
+-------------------------------+
| genres |
+-------------------------------+
| Comedy|Crime|Drama|Thriller |
| Crime|Drama |
| Crime|Drama |
| Action|Adventure |
| Comedy|Drama|Romance|War |
| Action|Adventure|Sci-Fi |
| Action|Sci-Fi|Thriller |
| Comedy |
| Crime|Mystery|Thriller |
| Drama |
| Crime|Drama |
| Comedy|Drama|Romance |
| Action|Crime|Drama|Thriller |
| Drama|Romance|Sci-Fi |
| Drama |
| Action|Adventure|Comedy|Fa... |
| Crime|Mystery|Thriller |
| Drama|Romance |
| Action|Adventure |
| Action|Sci-Fi|Thriller |
+-------------------------------+
[1830 rows x 6 columns]
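Some scores in the table above exceed 5, the top of the rating scale. That is normal for a factorization model, because nothing clamps its output. In the usual formulation, a predicted score is a global average plus a user bias, an item bias, and the dot product of a user vector and an item vector (32 numbers each here, per `num_factors` in the training log). The sketch below shows that formula with two factors and invented values. It is a simplified illustration, not Turi Create's exact model.

```python
def predict_score(global_mean, user_bias, item_bias, user_factors, item_factors):
    """Score = global mean + biases + dot product of the latent factors."""
    interaction = sum(u * v for u, v in zip(user_factors, item_factors))
    return global_mean + user_bias + item_bias + interaction

# Hypothetical values: a generous user, a well-liked movie, matching tastes.
score = predict_score(
    global_mean=3.5,
    user_bias=0.5,
    item_bias=0.25,
    user_factors=[1.0, 2.0],
    item_factors=[0.5, 0.25],
)
print(score)
```

If you need to display predictions as star ratings, clip them to the valid range after scoring.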
model.evaluate(validation_data)
'rmse_overall': 1.0967441224583008}

Tech consultant (antonellocalamea.com) | Avid learner | Composer | Proudly believing less is more, except for love and knowledge
