Dativa Pipeline API: Validating basic data types

The basic data type validation rules are String, Number, and Date.

String Validation

The String rule validates a field as a string, checking its length and, optionally, matching it against a regular expression.

It takes the following parameters:

  • minimum_length - the minimum length of the string, defaults to 0
  • maximum_length - the maximum allowed length of the string, defaults to 1024
  • regex - a regular expression used to validate the string. This is processed by the standard Python regular expression engine and supports all of its syntax. It defaults to ".*"

The example below validates that a string is between 1 and 40 characters and contains only alphanumeric characters:

config = {
    "rules": [
        {
            "rule_type": "String",
            "field": "Name",
            "params": {
                "fallback_mode": "remove_record",
                "maximum_length": 40,
                "minimum_length": 1,
                "regex": "[\\w\\d]*"
            }
        }
    ]
}
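
As a rough illustration of what this rule checks, the sketch below reimplements the length and regex tests in plain Python with the standard re module. It is an approximation for intuition only, not the Pipeline API's implementation; in particular, whether the service anchors the regex as a full match is an assumption here.

import re

def validate_string(value, minimum_length=0, maximum_length=1024, regex=".*"):
    # Length bounds first, then the pattern; fullmatch requires the
    # whole string to conform, not just a prefix (an assumption)
    if not isinstance(value, str):
        return False
    if not minimum_length <= len(value) <= maximum_length:
        return False
    return re.fullmatch(regex, value) is not None

print(validate_string("Widget42", 1, 40, "[\\w\\d]*"))  # True
print(validate_string("", 1, 40, "[\\w\\d]*"))          # False: too short
print(validate_string("not ok!", 1, 40, "[\\w\\d]*"))   # False: fails the regex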

Number Validation

The Number rule validates a field as a number, checking its size and its number of decimal places.

It takes the following parameters:

  • minimum_value - the minimum allowed value of the number, defaults to 0
  • maximum_value - the maximum allowed value of the number, defaults to 65535
  • decimal_places - the number of decimal places the number should contain, defaults to 0
  • fix_decimal_places - defaults to True; when set, the rule automatically fixes all records to the configured number of decimal places

If the column of the passed data is of dtype float, then decimal places validation is skipped, since a float column does not preserve the number of decimal places in the source data.

The example below validates that a number is between 0 and 255 and contains no decimal places. Any invalid decimal places are fixed automatically.

config = {
    "rules": [
        {
            "rule_type": "Number",
            "field": "Name",
            "params": {
                "fallback_mode": "remove_record",
                "minimum_value": 0,
                "maximum_value": 255,
                "decimal_places": 0,
                "fix_decimal_places": True
            }
        }
    ]
}
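
For intuition, here is a minimal sketch of these checks using Python's decimal module. How the API actually coerces precision (for example, its rounding mode) is an assumption here, not documented behaviour.

from decimal import Decimal, ROUND_HALF_UP

def validate_number(value, minimum_value=0, maximum_value=65535,
                    decimal_places=0, fix_decimal_places=True):
    # Returns (is_valid, value), with the value coerced to the
    # configured precision when fix_decimal_places is True
    number = Decimal(str(value))
    if not minimum_value <= number <= maximum_value:
        return False, value
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. 0.01 for 2 places
    if fix_decimal_places:
        return True, number.quantize(quantum, rounding=ROUND_HALF_UP)
    return number == number.quantize(quantum), value

print(validate_number(3.7))                     # (True, Decimal('4'))
print(validate_number(300, maximum_value=255))  # (False, 300)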

Date Validation

The Date rule validates a field as a date, checking its format and that it falls within a specified range.

It takes the following parameters:

  • date_format - the format of the date parameter. This is based on the standard strftime parameters, with the addition of '%s', which represents an integer epoch time in seconds. The default date format is '%Y-%m-%d %H:%M:%S'
  • range_check - this can be set to the following values:
    • none - in which case the date is not checked
    • fixed - in which case the field is validated to lie between two fixed dates
    • rolling - in which case the field is validated against a rolling window of days.
  • range_minimum - if range_check is set to "fixed" then this represents the start of the date range. If the range check is set to "rolling" it represents the offset from the current time, in days, that should be used as the earliest point in the range. The default value is '2000-01-01 00:00:00'
  • range_maximum - if range_check is set to "fixed" then this represents the end of the date range. If the range check is set to "rolling" it represents the offset from the current time, in days, that should be used as the latest point in the range. The default value is '2020-01-01 00:00:00'

The example below validates that a date is in epoch time and within the last week of whenever the rule is run:

config = {
    "rules": [
        {
            "rule_type": "Date",
            "field": "Name",
            "params": {
                "fallback_mode": "remove_record",
                "date_format": "%s",
                "range_check": "rolling",
                "range_minimum": -7,
                "range_maximum": 0
            }
        }
    ]
}
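
The rolling-window check above can be sketched with the standard datetime module, including handling for the '%s' epoch extension. This sketch covers only the "rolling" and "none" cases and treats all times as UTC, which is an assumption; it is an illustration, not the API's implementation.

from datetime import datetime, timedelta, timezone

def validate_date(value, date_format="%Y-%m-%d %H:%M:%S",
                  range_check="rolling", range_minimum=-7, range_maximum=0):
    if date_format == "%s":
        # The API's '%s' extension: an integer epoch time in seconds
        parsed = datetime.fromtimestamp(int(value), tz=timezone.utc)
    else:
        parsed = datetime.strptime(value, date_format).replace(tzinfo=timezone.utc)
    if range_check != "rolling":
        return True  # 'none'; the 'fixed' case is omitted from this sketch
    now = datetime.now(timezone.utc)
    # Offsets are in days relative to now, e.g. -7..0 for the last week
    earliest = now + timedelta(days=range_minimum)
    latest = now + timedelta(days=range_maximum)
    return earliest <= parsed <= latest

# An epoch timestamp from three days ago falls inside the last-week window
three_days_ago = int((datetime.now(timezone.utc) - timedelta(days=3)).timestamp())
print(validate_date(str(three_days_ago), date_format="%s"))  # True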

Related documentation

  • Dativa Pipeline API on AWS - The Dativa Pipeline API is available through the AWS Marketplace
  • Dativa Pipeline Python Client - Dativa Tools includes a client for the Pipeline API
  • Dativa Pipeline API: Sample Data - Sample files to demonstrate usage of the Dativa Pipeline API
  • Dativa Pipeline API: Anonymizing data - The Dativa Pipeline API supports tokenization, hashing, and encryption of incoming datasets for anonymization and pseudonymization
  • Dativa Pipeline API: Referential Integrity - Using the Dativa Pipeline API to validate data against other known good datasets to ensure referential integrity
  • Dativa Pipeline API: Handling invalid data - Invalid data can be quarantined or automatically fixed by the Dativa Data Pipeline API
  • Dativa Pipeline API: Working with session data - The Dativa Pipeline API can check for gaps and overlaps in session data and automatically fix them
  • Dativa Pipeline API: Reporting and monitoring data quality - The Dativa Pipeline API logs data that does not meet the defined rules and quarantines bad data
  • Dativa Pipeline API: Full API reference - A field-by-field breakdown of the full functionality of the Dativa Data Pipeline API
