Joined-up data is a key concept for any enterprise that uses data from multiple sources, whether or not those sources belong to the company. Once the data science team has cleansed these sources and ensured that each strand of data works properly, it is time to join up the strands so that they work together seamlessly. Just as a soccer team needs eleven players to work in a coordinated manner for the good of the team, joined-up data can achieve what individual strands cannot. We have already written about the single customer view (SCV), where a company holds every piece of data about a customer in the same place, and this is a great illustration of joined-up data at work. And with propensity modeling able to automate, through machine learning, the processes that create an SCV, it is feasible to have SCVs updated in real time.
Our data science consulting team recommends that any company which does not have SCVs in place for all its customers carry out a gap analysis (which compares actual performance with desired performance in the business) with the aim of creating those SCVs. The other important activity in creating joined-up data is data matching. We have written before about IP matching, which is one type of data matching.
Getting the data joined up
Almost no business in 2019 will have to deal with just one dataset. There is so much useful data available that such a business would quickly be outcompeted by rivals with multiple datasets and strong cross-analysis between them. Joining up these datasets represents what our data science consulting team can only describe as a huge prize for the enterprises that manage it. In practical terms, the prize is an increased ability to execute goals, drive KPIs, and build an SCV.
When you have two or more datasets that come from different sources, the same item, such as somebody's name, their email address, their IP address or another identifier the company has assigned to the individual, may appear in several of them. Data matching can be either deterministic, where a perfect match is found, such as the same email address, or probabilistic, where a match is inferred as likely from partial or indirect evidence. If deterministic matching gives more certain results, why would any company ever use probabilistic matching? The answer is PII (personally identifiable information). We have written previously about the 46 elements which make up the BIT protocol, and these 46 elements are what constitute PII data. If you are a company which does business in the EU, then you need to be extremely careful that your deterministic matching does not violate GDPR. Probabilistic matching is also helpful in other situations. For instance, if a company has an IP address with seven devices connected to it, but only five of them connect regularly while the other two connect only occasionally, then a probabilistic match might conclude that the five belong to the household and that the other two belong to visitors. This conclusion is reinforced when the two other devices are portable and appear more regularly in data from other locations.
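The household example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_devices`, the 10-day log and the 50% threshold are all hypothetical, and a real probabilistic matcher would weigh more signals (such as sightings at other locations) than simple frequency.

```python
from collections import Counter

def classify_devices(sightings, total_days, threshold=0.5):
    """Label each device 'household' or 'visitor' by how often it appears.

    sightings: iterable of (day, device_id) pairs observed on one IP address.
    total_days: number of days in the observation window.
    threshold: assumed minimum share of days for a resident device.
    """
    # Deduplicate first so several sightings on the same day count once.
    days_seen = Counter(device for _day, device in set(sightings))
    labels = {}
    for device, days in days_seen.items():
        share = days / total_days
        labels[device] = "household" if share >= threshold else "visitor"
    return labels

# Hypothetical 10-day log: five devices seen every day, two seen rarely.
log = [(day, f"dev{n}") for day in range(10) for n in range(1, 6)]
log += [(2, "dev6"), (7, "dev6"), (4, "dev7")]

labels = classify_devices(log, total_days=10)
household = sorted(d for d, label in labels.items() if label == "household")
print(household)  # ['dev1', 'dev2', 'dev3', 'dev4', 'dev5']
```

The threshold is the judgment call here: set it too low and visitors are counted as residents, too high and a rarely used household device is dropped.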
Data matching is particularly useful in helping a company understand its customers and in advertising its products and services, that is, in marketing and sales. One example is matching the users of a subscription app which, among other things, advertises a particular product against a database of email addresses used to advertise the same product. By removing from one of the two datasets the people who appear in both, the company avoids targeting the same users twice, which both saves on advertising costs and helps prevent churn.
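A minimal sketch of this de-duplication, assuming both datasets carry an email address that can be matched deterministically after light normalization. The function names and sample addresses are hypothetical.

```python
def normalize_email(email):
    """Trim and lower-case an address so trivially different spellings match."""
    return email.strip().lower()

def remove_overlap(email_list, app_users):
    """Return the email list without addresses that also belong to app users."""
    app_emails = {normalize_email(e) for e in app_users}
    return [e for e in email_list if normalize_email(e) not in app_emails]

# Hypothetical sample data for the two advertising channels.
app_users = ["ana@example.com", "Raj@Example.com"]
email_list = ["ana@example.com ", "raj@example.com", "li@example.com"]

email_only = remove_overlap(email_list, app_users)
print(email_only)  # ['li@example.com']
```

Here the overlap is removed from the email list, so matched users keep seeing the product only in the app. Removing them from the app audience instead would be an equally valid choice.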
Another matching example involves the large databases held by information service providers (ISPs) such as Experian and Acxiom, which segment users according to behavior, attitudes, and purchasing habits. If an enterprise has access to such a database, it can search for matches with its own customer database. When it finds a match, it can use the segment information about that customer in its own operations, e.g., targeting a customer whom the ISP database has segmented as a soccer fanatic with soccer-related advertising, or with sportswear if the customer already has a long history of leisurewear purchases. Data matching against ISP databases is so worthwhile because it is a much more potent tool than the demographics which advertisers traditionally used. A company advertising a particular brand of truck can focus its efforts on people with a track record of liking or purchasing trucks. This is far more efficient, and wastes much less effort and money, than advertising to demographic groups that merely contain a reasonable number of truck purchasers.
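The enrichment step might look like the sketch below. Everything in it is an assumption made for illustration: the record layout, the segment labels, the campaign names and the use of email as the matching key do not reflect how any particular provider's database is actually structured.

```python
# Hypothetical first-party customer records.
customers = [
    {"customer_id": 1, "email": "ana@example.com"},
    {"customer_id": 2, "email": "li@example.com"},
]

# Hypothetical third-party segments, keyed by normalized email address.
provider_segments = {
    "ana@example.com": "soccer fanatic",
    "sam@example.com": "truck enthusiast",
}

# Hypothetical mapping from a segment to the campaign it should receive.
campaigns = {
    "soccer fanatic": "soccer-related advertising",
    "truck enthusiast": "truck advertising",
}

def pick_campaigns(customers, provider_segments, campaigns):
    """Map customer_id to a campaign for every customer the provider has segmented."""
    targeted = {}
    for customer in customers:
        key = customer["email"].strip().lower()
        segment = provider_segments.get(key)  # None when there is no match
        if segment in campaigns:
            targeted[customer["customer_id"]] = campaigns[segment]
    return targeted

targeted = pick_campaigns(customers, provider_segments, campaigns)
print(targeted)  # {1: 'soccer-related advertising'}
```

Customers with no match in the provider's data are simply left out of the result, so they can fall back to whatever default targeting the company already uses.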
Another area where matching is of interest to businesses is in understanding their customers better. All companies want better customer acquisition and retention, and many would like to upsell to their customers as well; data matching can help with all of these activities. The more you know about a customer, the easier it becomes to engage with them, to build products and services tailored to them, to take them into account in business planning, and to tailor marketing and advertising so that sales to them increase. With an SCV in place which has already matched a particular customer to wherever she or he appears in any database held by the enterprise, joined-up data becomes a reality.