Introducing the Overlays Data Capture Architecture

Paul Knowles, Sun 17 March 2019

The implementation of GDPR last year, together with the global rise of Distributed Ledger Technologies (DLT) and the adoption of blockchains as an immutable store of information, has highlighted a risk in how enterprises handle personally identifiable information (PII). A blockchain is a visible, distributed ledger that keeps a permanent and unalterable record of whatever data the enterprise captures, such as transactions. With GDPR and similar data laws worldwide already passed or in the pipeline, a key flaw in existing data capture and storage solutions is how they deal with PII.

Suppose a company researching cancer among young people publishes its data to a pooled data lake. Among the information the researchers have captured and stored are the participants' dates of birth, perhaps alongside their age, ethnicity and medical history. Date of birth, especially in combination with these other data, could be used to identify a person, which is both immoral if the individual has not consented to be identified and illegal under regimes such as GDPR.

Storing personal data on a public blockchain is not sensible, but there are many legitimate reasons to share data, ranging from sharing clinical trial research to building better advertising-funded services. Last September the Blinding Identity Taxonomy (BIT) initiative was released. It lists 46 elements that require encryption when stored in databases, including date of birth and any other data elements that can identify an individual, such as their name, where they live, their email address or their phone number.

When capturing or storing information, how do companies protect PII? To answer this challenging question, Dativa's Innovation and Emerging Technology Lead, Paul Knowles, who also chairs the semantics workgroup at Hyperledger Indy, a distributed ledger purpose-built for decentralized identity, has developed the Overlays Data Capture Architecture (ODCA). The aim is to provide a standardized global solution for data capture and exchange that protects PII.

Data Capture and Exchange Architecture

A schema, a machine-readable definition of the semantics of a data structure, is typically created as a single data object. This new architecture instead represents the schema as a multi-dimensional object consisting of a generic schema base with overlays. Overlays are data structures that add a layer of contextual and conditional information to the schema, providing extensions, coloration, and functionality to the base object.

ODCA works in an open source environment, ensuring that schema bases can remain generic and so support diverse use cases. Because the base definitions remain in their simplest and purest form, they provide a standard starting point from which to decentralize data. Overlays are separate data objects linked to specific schema bases. This degree of separation allows multiple parties to use the same base objects for similar data capture requirements, offering a structure for capturing data in a standardized format that is available to all. Each party can create its own suite of overlays to add extra context, to transform how information is displayed to a viewer, or to guide an agent in how to apply a custom process to schema data. Although the early development of ODCA will take place on Hyperledger Indy, the aim is to develop the code so that it is truly platform-agnostic.
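
The separation between a schema base and its overlays can be pictured as a set of linked data objects. The following Python sketch is illustrative only and is not the ODCA specification: the class names (`SchemaBase`, `Overlay`), the field names, the overlay types ("label", "format"), and the use of a SHA-256 digest as the link between objects are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class SchemaBase:
    """Generic, stable definition of attributes (hypothetical layout)."""
    name: str
    attributes: dict            # attribute name -> type, e.g. {"date_of_birth": "Date"}
    pii_attributes: tuple = ()  # attributes flagged as PII at the base layer

    def identifier(self) -> str:
        # Assumption: a content digest is what overlays use to link to a base.
        payload = json.dumps(
            {
                "name": self.name,
                "attributes": self.attributes,
                "pii_attributes": list(self.pii_attributes),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Overlay:
    """Separate object adding one kind of context to a specific base."""
    schema_base_id: str  # link to the base this overlay applies to
    overlay_type: str    # hypothetical types: "label", "format"
    issuer: str          # party that published the overlay
    content: dict        # attribute name -> value for this overlay type


def resolve(base: SchemaBase, overlays: list) -> dict:
    """Combine a base with its linked overlays without modifying the base."""
    base_id = base.identifier()
    view = {attr: {"type": attr_type} for attr, attr_type in base.attributes.items()}
    for overlay in overlays:
        if overlay.schema_base_id != base_id:
            continue  # overlay is linked to a different base, so skip it
        for attr, value in overlay.content.items():
            if attr in view:
                view[attr][overlay.overlay_type] = value
    return view


base = SchemaBase(
    name="ClinicalParticipant",
    attributes={"date_of_birth": "Date", "ethnicity": "Text"},
    pii_attributes=("date_of_birth",),
)
# Two different parties publish overlays against the same base.
labels = Overlay(base.identifier(), "label", "Org A",
                 {"date_of_birth": "Date of birth", "ethnicity": "Ethnicity"})
formats = Overlay(base.identifier(), "format", "Org B",
                  {"date_of_birth": "YYYY-MM-DD"})

view = resolve(base, [labels, formats])
```

In this sketch, changing a label or a format means publishing a new overlay; the base and its identifier stay the same, which is the property that lets several parties share one base.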

Figure: Overlays architecture

In the Overlays architecture figure, multiple overlays developed by different organizations provide a set of metadata that adequately describes a single set of data.

Advantages of ODCA implementation

There are many advantages to using ODCA:

  1. Data pooling. Because overlays are linked objects, decoupling can occur at any time. With all coloration stored in the overlays, data from related sources becomes much easier to combine. Overlays can be removed from the base objects before the data merging process begins and reapplied afterwards to ensure consistent coloration post data pooling.
  2. Stable schema bases. Most updates to a schema tend to occur at the application stage. In ODCA, all additional extensions, coloration, and functionality are defined in the overlays, so users can make simple updates by editing one or more of the linked objects rather than reissuing schema bases on a regular basis, saving a tremendous amount of effort.
  3. PII encryption. Issuers can flag PII attributes in the schema bases by referencing the BIT initiative's definition of PII data. With PII attributes flagged at the base object layer, all corresponding data can be treated as sensitive throughout the data lifecycle and encrypted at any stage, making it far harder to identify individuals and so protecting their privacy.
  4. Data decentralization. The architecture ensures that schema definitions can remain in their purest form, providing a standard base from which to decentralize data. Any company contributing data, with consent, to a decentralized data lake for third-party usage can capture it using publicly available, open source, generic schema bases. This ensures that captured data is standardized before any data lake migration happens.
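
The PII point in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ODCA tooling: the flag set `PII_ATTRIBUTES`, the function `blind_record`, and the sample record are hypothetical, and a keyed hash (HMAC-SHA-256 from the Python standard library) stands in for whatever encryption scheme a real deployment would choose.

```python
import hashlib
import hmac

# Hypothetical flags; in ODCA these would come from the schema base,
# set by the issuer with reference to the BIT list.
PII_ATTRIBUTES = {"date_of_birth", "email"}


def blind_record(record: dict, pii_attributes: set, key: bytes) -> dict:
    """Return a copy of the record with flagged attributes replaced by keyed digests."""
    blinded = {}
    for attr, value in record.items():
        if attr in pii_attributes:
            # Keyed hash as a stand-in for encryption (an assumption of this sketch).
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            blinded[attr] = digest.hexdigest()
        else:
            blinded[attr] = value  # non-PII values pass through unchanged
    return blinded


record = {
    "date_of_birth": "1999-04-12",
    "email": "participant@example.org",
    "ethnicity": "Not stated",
}
blinded = blind_record(record, PII_ATTRIBUTES, key=b"demo-key-not-for-production")
```

Because the flags live with the schema rather than with each application, the same blinding step can run at capture time, before pooling, or at any later stage. A keyed hash is one-way, so a deployment that needs to recover the original values would use reversible encryption with proper key management instead.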

Combined with the emergence of Distributed Ledger Technologies (DLT) and an innovative blockchain solution, the Overlays architecture, conceived primarily for data object interoperability and data sharing, is well suited to the Fourth Industrial Revolution of disruptive technologies and user-centric data trends in which we are living.

At Dativa, we believe this architecture will significantly enhance the ability to pool data in terms of simplicity, accuracy and the allocation of resources. Data interoperability is a high priority for a number of our clients. Developing and deploying the right data capture architecture would improve the quality of externally pooled data. We believe this will offer a positive alternative to what is currently available to the many enterprises involved in capturing and exchanging data. We are presently developing middleware tooling in Dativa's innovation hub.

