Dativa Pipeline Python Client

The dativa tools library provides functionality to extend other Python libraries, typically boto3 and pandas, to make it easier to process, cleanse, and load large datasets at scale.

You can install it using pip as follows:

pip install dativatools

The library includes the PipelineClient class. Given an API key, a source S3 location, a destination S3 location, and a set of rules, it cleans the source file and posts the result to the destination.

Refer to https://www.dativa.com/tools/dativatools/aws-api/ for more details.

  • param api_key - The individual key provided by the Pipeline API
  • type api_key - str
  • param source_s3_url - The S3 source location where the CSV files are present
  • type source_s3_url - str
  • param destination_s3_url - The S3 destination where the files are to be posted after cleansing
  • type destination_s3_url - str
  • param rules - The rules by which to clean the file
  • type rules - list, or str specifying the location of the rules file
  • param url - The URL of the Pipeline API, defaults to https://pipeline-api.dativa.com/clean
  • type url - str
  • param status_url - The URL to query to check the status of the API call, defaults to https://pipeline-api.dativa.com/status/{0}
  • type status_url - str
  • param source_delimiter - The delimiter of the source file, defaults to ,
  • type source_delimiter - str
  • param destination_delimiter - The delimiter of the destination file, defaults to ,
  • type destination_delimiter - str
  • param source_encoding - The encoding of the source file, defaults to utf-8
  • type source_encoding - str
  • param destination_encoding - The encoding of the destination file, defaults to utf-8
  • type destination_encoding - str
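
The defaults in the parameter list can be gathered in one place before the client is constructed. The following is a minimal sketch and not part of dativatools: DEFAULTS, REQUIRED, and build_client_kwargs are hypothetical names, the default values are copied from the list above, and treating the four parameters without a documented default as required is an assumption.

```python
# Hypothetical helper, not part of dativatools: collects the documented
# parameters into a dict of keyword arguments.

# Default values copied from the parameter list above.
DEFAULTS = {
    "url": "https://pipeline-api.dativa.com/clean",
    "status_url": "https://pipeline-api.dativa.com/status/{0}",
    "source_delimiter": ",",
    "destination_delimiter": ",",
    "source_encoding": "utf-8",
    "destination_encoding": "utf-8",
}

# Assumption: parameters listed without a default must be supplied.
REQUIRED = ("api_key", "source_s3_url", "destination_s3_url", "rules")


def build_client_kwargs(**overrides):
    """Merge caller-supplied values over the documented defaults."""
    missing = [name for name in REQUIRED if name not in overrides]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    unknown = set(overrides) - set(REQUIRED) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(sorted(unknown)))
    kwargs = dict(DEFAULTS)   # start from the documented defaults
    kwargs.update(overrides)  # caller-supplied values take precedence
    return kwargs
```

The resulting dict could then be passed on as PipelineClient(**kwargs), assuming the constructor accepts the keyword names exactly as they appear in the list.
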
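
from dativa.tools.aws import PipelineClient

# Placeholder values: substitute your own API key, bucket, and rules.
api_key = "YOUR_API_KEY"
bucket = "your-bucket-name"
rules = "rules.json"  # placeholder: a list of rules, or the location of a rules file

# Create the client; url and status_url are shown here with their default values.
obj = PipelineClient(api_key=api_key,
                     rules=rules,
                     source_s3_url="https://s3-us-west-2.amazonaws.com/{0}/source_key".format(bucket),
                     destination_s3_url="https://s3-us-west-2.amazonaws.com/{0}/dest_key".format(bucket),
                     url="https://pipeline-api.dativa.com/clean",
                     status_url="https://pipeline-api.dativa.com/status/{0}",
                     )
# Clean the source file and post the result to the destination.
obj.run_job()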
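
The S3 URLs in the example follow a fixed pattern, and status_url carries a {0} placeholder. The sketch below shows how such strings can be assembled with str.format. Both function names are hypothetical and not part of dativatools, and the idea that {0} is filled with a job identifier is an assumption: the text above does not say what value the client substitutes.

```python
def s3_object_url(bucket, key, region="us-west-2"):
    """Build an S3 URL in the same form as the example above (hypothetical helper)."""
    return "https://s3-{0}.amazonaws.com/{1}/{2}".format(region, bucket, key.lstrip("/"))


def status_url_for(job_id, template="https://pipeline-api.dativa.com/status/{0}"):
    """Fill the {0} placeholder of the status URL template.

    Assumption: the placeholder takes a job identifier.
    """
    return template.format(job_id)


source_url = s3_object_url("your-bucket-name", "source_key")
destination_url = s3_object_url("your-bucket-name", "dest_key")
```

Here source_url and destination_url have the same shape as the source_s3_url and destination_s3_url arguments passed to PipelineClient in the example.
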

Related documentation

  • Dativa Pipeline API on AWS - The Dativa Pipeline API is available through the AWS Marketplace
  • Dativa Pipeline API: Sample Data - Sample files to demonstrate usage of the Dativa Pipeline API
  • Dativa Pipeline API: Validating basic data types - Validating incoming datasets for basic string, number, and date type formatting and range checks using the Dativa Pipeline API
  • Dativa Pipeline API: Anonymizing data - The Dativa Pipeline API supports tokenization, hashing, and encryption of incoming datasets for anonymization and pseudonymization
  • Dativa Pipeline API: Referential Integrity - Using the Dativa Pipeline API to validate data against other known good datasets to ensure referential integrity
  • Dativa Pipeline API: Handling invalid data - Invalid data can be quarantined or automatically fixed by the Dativa Pipeline API
  • Dativa Pipeline API: Working with session data - The Dativa Pipeline API can check for gaps and overlaps in session data and automatically fix them
  • Dativa Pipeline API: Reporting and monitoring data quality - The Dativa Pipeline API logs data that does not meet the defined rules and quarantines bad data
  • Dativa Pipeline API: Full API reference - A field-by-field breakdown of the full functionality of the Dativa Pipeline API
