Build your own recommendation engine

Tom Weiss, Thu 27 June 2019

Recommendations, or recommender systems, are among the most powerful tools in the data science toolkit. Some recommendation engines serve top-level executives within a company, such as those who decide what will be available on Netflix or similar OTT services this autumn, or those who make the spending decisions in any large company. However, the vast majority of recommendations are aimed at customers, and these are the most common kind in the media industry.

A recommender system is simply an algorithm that examines the products and services a company offers and then recommends some of them to customers. The analog comparison is a woman going into a dress shop wanting a dress. A skilled employee can recommend one or more dresses to her, perhaps based on previous experience of her shopping in the store or, even if it is her first visit, based on such things as her size and her hair color. The recommender system tries to do the same, drawing on the data available about both the products and the customer to find the best match between them.

Understanding the customer

We would argue that for businesses in 2019 that use a recommender system, getting recommendations consistently right (or wrong) can make or break the company, as Netflix, among others, has demonstrated. Getting the system right is therefore of critical importance to any enterprise using one, and the first step the data science team needs to take in building a recommender system is to understand the customer.

The main reason businesses need recommender systems for online customers is the sheer range of choice available. In the past, someone who wanted to buy a TV would go to a store that stocked perhaps 50 different models. Contrast this with the digital world, where the customer might choose from 5,000 or more separate devices. For the digital consumer, what the TV does (is it a smart TV?) matters more than its appearance, whereas a consumer in a store might pick a set simply because they like the look of it. The same deluge of choice applies to watching TV: in the analog era there were perhaps 50 channels to choose from, but on YouTube there are millions of possible shows the consumer could watch. When recommender systems work well, they let the consumer find what they want quickly and smoothly in a world swamped with choice.

Search engines also play an essential role in helping consumers find what they want, but they are better suited to consumers who know what they are looking for and are not sure where to find it. If a consumer wants to watch Breaking Bad tonight, a search engine will help them find where they can watch it. People don't always know exactly what they want to watch, though; often they simply want a TV show they are going to enjoy. To take a different example, someone who gets a bonus at work or a small inheritance may be unsure what to spend it on and know only that they want to buy something from an e-commerce site. Recommendations work best where people have a general idea of what they want but nothing more specific. If the recommendation is good, so that the viewer finds something to watch or buy that they really enjoy, they will likely use the recommender system, and hence your site, again in the future. Recommendations are therefore great for customer retention, always a key goal for any business.

We see recommendations for customers everywhere online, whether it is Amazon suggesting we might like product B because we bought product A, or a TV broadcaster suggesting we might like Russian Doll because we devoured Game of Thrones. These examples may sound simple, but if a recommender system is to have real value for customers, it needs to work well every time and for every customer. Just as there is typically a wide variety of products and services, there are masses of customers, all with different likes and tastes.

Understanding the product

For a recommender system to function well, the team needs to understand not merely the customers but also the products or services. If the product is a fridge, this is relatively easy, but if it is a new TV show such as Chernobyl, it isn't reasonable to expect a member of the team to watch the show and take notes to generate the data the recommender system needs, especially given that there are likely thousands of TV shows the system might recommend. However, the company probably already has plenty of data about this and other shows. Chernobyl has already been categorized as a historical drama by critics, many reviewers (some professional, some amateur) have already watched the show, and according to Wikipedia "the show was acclaimed by critics, with particular praise towards historical accuracy and attention to detail."

Data is therefore the key to understanding both the product and the customer. The way to use the available data in building the recommender system is to assign attributes to both the product and the customer. Giving a product attributes allows the recommender system to rate it on specific factors, which it can then match to relevant customers. The team achieves this by using the data to extract as many attributes about each product as possible. For instance, "historically accurate" might be an attribute of a drama, so dramas carrying it would make suitable recommendations for a lover of historical and factual documentaries. A historically inaccurate program such as The Tudors might equally receive a "historically inaccurate" attribute and would thus make a better recommendation for a lover of fictional shows, for whom historical accuracy is unimportant.
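To make the idea concrete, here is a minimal Python sketch of matching customers to products through attributes. It assumes each show carries a set of attribute tags and each customer has a weight per attribute (positive for liked, negative for disliked). The tags, the weights, the placeholder title `Example Documentary` and the function names `match_score` and `rank_shows` are all hypothetical, chosen only for illustration; summing weights is one simple scoring choice, not the only one.

```python
# Hypothetical attribute tags per show, for illustration only.
SHOW_ATTRIBUTES = {
    "Chernobyl": {"drama", "historical", "historically_accurate"},
    "The Tudors": {"drama", "historical", "historically_inaccurate"},
    "Example Documentary": {"documentary", "factual"},
    "Russian Doll": {"comedy", "drama"},
}

# Hypothetical customer profile: weight per attribute, positive = likes, negative = dislikes.
CUSTOMER_PROFILE = {
    "documentary": 0.9,
    "factual": 0.8,
    "historical": 0.5,
    "historically_accurate": 0.9,
    "historically_inaccurate": -0.7,
    "comedy": -0.5,
}


def match_score(profile: dict[str, float], attributes: set[str]) -> float:
    """Sum the customer's weights for every attribute the show carries (unknown attributes count as 0)."""
    return sum(profile.get(attr, 0.0) for attr in attributes)


def rank_shows(profile: dict[str, float], catalogue: dict[str, set[str]]) -> list[str]:
    """Return show titles ordered from best to worst match for this customer."""
    return sorted(catalogue, key=lambda title: match_score(profile, catalogue[title]), reverse=True)


if __name__ == "__main__":
    for title in rank_shows(CUSTOMER_PROFILE, SHOW_ATTRIBUTES):
        print(f"{title}: {match_score(CUSTOMER_PROFILE, SHOW_ATTRIBUTES[title]):.2f}")
```

With this profile the documentary ranks first and Chernobyl outranks The Tudors, because the customer's weights on historical accuracy push the two otherwise similar dramas apart.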

It is also necessary to compare products with each other using the attribute scheme the team has devised, as a fan of one show is likely to enjoy a second show if the two share many attributes. The key to this, once the team has collected and cleansed the relevant datasets, is to mine the data for specific patterns and to apply business rules the team has created. These patterns and rules allow each product and each customer to receive multiple attributes. For example, the team might give a TV-viewing customer a high score for documentaries and historically accurate dramas and a low score for sport and comedies, while also tagging shows (the products) with matching descriptions, such as documentary or comedy. Ideally, the team can also mine a 400-word synopsis or review of each show for the specific attributes it is looking for. Once the attributes are all in place, the recommender system should easily be able to match a customer with a product and also a product with another product, so that if a consumer likes product A, she will probably like product B.
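The product-to-product comparison can be sketched with Jaccard similarity, which measures the share of attributes two items have in common. This is one common choice of measure, swapped in here for illustration rather than taken from any particular system; the attribute sets, the placeholder title `Example Documentary` and the helper names `jaccard` and `most_similar` are hypothetical.

```python
# Hypothetical attribute sets; in practice these would come from mined reviews, synopses and business rules.
SHOW_ATTRIBUTES = {
    "Chernobyl": {"drama", "historical", "historically_accurate", "miniseries"},
    "The Tudors": {"drama", "historical", "historically_inaccurate", "costume"},
    "Example Documentary": {"documentary", "historical", "historically_accurate"},
    "Russian Doll": {"comedy", "drama"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared attributes divided by all distinct attributes (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(title: str, catalogue: dict[str, set[str]], top_n: int = 2) -> list[tuple[str, float]]:
    """Rank every other show by Jaccard similarity to `title`, most similar first."""
    target = catalogue[title]
    scores = [(other, jaccard(target, attrs)) for other, attrs in catalogue.items() if other != title]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    for other, score in most_similar("Chernobyl", SHOW_ATTRIBUTES):
        print(f"{other}: {score:.2f}")
```

For Chernobyl this ranks the documentary (0.40) ahead of The Tudors (0.33): both share two attributes with it, but The Tudors brings more attributes of its own, which enlarges the union.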

So how do we build one?

Ten years ago, building a recommender system was an extremely challenging task, beyond the reach of anyone without years of experience in statistics and computer programming, so most companies that wanted one had no real option other than a commercially available product. In 2019, however, it is relatively simple for anyone with some programming knowledge to build such a system using Python, the general-purpose programming language. A first attempt is unlikely to match the sophisticated recommender systems built by the likes of YouTube and Amazon, but making your own can help you understand how recommender systems are constructed, which will then allow you and your team to build more complex and improved ones.

In a content-based recommender, we categorize all the items (perhaps e-commerce products, TV shows or songs) by genre and other essential attributes, and the fundamental rating concept is that people who like something will probably like something similar.
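A minimal content-based sketch, assuming each item is described only by a set of genres: each item becomes a 0/1 genre vector, the customer's profile is the average of the vectors of the items they liked, and unseen items are ranked by cosine similarity to that profile. The genre list, the placeholder titles `Show A` to `Show E` and the function names are hypothetical; a real system would typically use many more attributes and some weighting scheme.

```python
import math

# Hypothetical genre vocabulary and catalogue; titles are placeholders.
GENRES = ["comedy", "documentary", "drama", "historical", "sport"]
ITEM_GENRES = {
    "Show A": {"drama", "historical"},
    "Show B": {"documentary", "historical"},
    "Show C": {"comedy"},
    "Show D": {"documentary", "drama", "historical"},
    "Show E": {"sport"},
}


def to_vector(genres: set[str]) -> list[float]:
    """One-hot encode an item's genres against the fixed GENRES vocabulary."""
    return [1.0 if g in genres else 0.0 for g in GENRES]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(liked: list[str], catalogue: dict[str, set[str]], top_n: int = 2) -> list[tuple[str, float]]:
    """Average the liked items' vectors into a profile, then rank unseen items by similarity to it."""
    vectors = {title: to_vector(genres) for title, genres in catalogue.items()}
    profile = [sum(vectors[t][i] for t in liked) / len(liked) for i in range(len(GENRES))]
    candidates = [(t, cosine(profile, v)) for t, v in vectors.items() if t not in liked]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_n]


if __name__ == "__main__":
    print(recommend(["Show A", "Show B"], ITEM_GENRES))
```

A customer who liked `Show A` and `Show B` is offered `Show D` first, since it covers all of their genres, while the comedy and sport titles score zero.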

In a collaborative filtering recommender, the rating is calculated using a type of predictive analytics that tries to estimate how users will rate future items based on how other users rated similar past items. Unlike content-based systems, collaborative filters don't require items to be assigned attributes.
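One common way to implement this is user-based collaborative filtering, sketched below under stated assumptions. Similarity between two users is the Pearson correlation of their ratings on the items both have rated, and a missing rating is predicted as the user's own average plus a similarity-weighted average of how far each other user's rating sits above or below that user's average. The users, ratings and function names are hypothetical. Note that no item attributes appear anywhere, only ratings.

```python
import math

# Hypothetical 1-5 star ratings; a missing key means the user hasn't rated that item.
RATINGS = {
    "alice": {"Show A": 5, "Show B": 4, "Show C": 1},
    "bob": {"Show A": 4, "Show B": 5, "Show C": 2, "Show D": 5},
    "carol": {"Show A": 1, "Show B": 2, "Show C": 5, "Show D": 1},
}


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def pearson(u: str, v: str, ratings: dict[str, dict[str, float]]) -> float:
    """Pearson correlation of two users' ratings over the items both have rated."""
    common = sorted(ratings[u].keys() & ratings[v].keys())
    if len(common) < 2:
        return 0.0  # too little overlap to measure agreement
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    du = [ratings[u][i] - mu for i in common]
    dv = [ratings[v][i] - mv for i in common]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0


def predict(user: str, item: str, ratings: dict[str, dict[str, float]]) -> float:
    """Predict `user`'s rating of `item` from other users who have rated it."""
    user_mean = mean(ratings[user].values())
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = pearson(user, other, ratings)
        if sim == 0.0:
            continue
        # How far this neighbor rated the item above or below their own average.
        num += sim * (their[item] - mean(their.values()))
        den += abs(sim)
    return user_mean + num / den if den else user_mean


if __name__ == "__main__":
    print(f"Predicted rating of Show D for alice: {predict('alice', 'Show D', RATINGS):.2f}")
```

Bob's tastes correlate positively with Alice's and Carol's are exactly opposite, so Bob liking `Show D` and Carol disliking it both push Alice's predicted rating well above her average of about 3.3.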

When building a collaborative filtering recommender, the focus isn't on the attributes of the products or services but on the customers and their behavior. So if person Y likes a product and person Z has similar tastes, the recommender can recommend that product to person Z. Products still play a role in this system, but they are now defined by customer behavior: if persons X and Y both like the same two products, the recommender can identify those products as similar, and when person Z buys either one, it can recommend the other. The difference is that the content-based system focuses on the attributes of the products and services, whereas collaborative filtering focuses entirely on customer behavior, which means using different datasets and different formulas to build the recommender.
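The item-to-item case (X and Y both like the same two products, so Z is offered the second one) can be sketched by counting how often pairs of products appear in the same customer's purchase history. This co-occurrence count is a deliberately simple stand-in for the similarity formulas production systems use; the customers, products and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer -> set of products bought.
PURCHASES = {
    "X": {"product_a", "product_b"},
    "Y": {"product_a", "product_b", "product_c"},
    "Z": {"product_a"},
    "W": {"product_c", "product_d"},
}


def co_occurrence(purchases: dict[str, set[str]]) -> dict[str, Counter]:
    """Count, for every pair of products, how many customers bought both."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for basket in purchases.values():
        for p, q in combinations(sorted(basket), 2):
            counts[p][q] += 1
            counts[q][p] += 1
    return counts


def recommend_for(customer: str, purchases: dict[str, set[str]], top_n: int = 2) -> list[str]:
    """Score products the customer doesn't own by how often they co-occur with products they do own."""
    counts = co_occurrence(purchases)
    owned = purchases[customer]
    scores: Counter = Counter()
    for item in owned:
        for other, n in counts.get(item, {}).items():
            if other not in owned:
                scores[other] += n
    return [product for product, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    print(recommend_for("Z", PURCHASES))
```

Because X and Y both bought `product_a` and `product_b`, customer Z, who has only bought `product_a`, is offered `product_b` first, followed by `product_c`, which co-occurs with `product_a` only once.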


Dativa is a global consulting firm providing data consulting and engineering services to companies that want to build and implement strategies to put data to work. We work with primary data generators, businesses harvesting their own internal data, data-centric service providers, data brokers, agencies, media buyers and media sellers.

145 Marina Boulevard
San Rafael
California
94901

Registered in Delaware

Thames Tower
Station Road
Reading
RG1 1LX

Registered in England & Wales, number 10202531