Reporting
The run() method returns a list of ReportEntry objects. Each ReportEntry contains the following attributes:
- ReportEntry.date - a Python datetime object recording when the entry was created
- ReportEntry.field - a string giving the name of the field that was processed
- ReportEntry.number_records - the number of records affected
- ReportEntry.category - a string categorizing the type of action applied to the rows
- ReportEntry.description - a string with more detail on the action taken
- ReportEntry.df - a pandas DataFrame containing any records that did not pass validation, together with any values they were replaced with
The ReportEntry class serializes to a human-readable log file, but it is more common for the entries to be post-processed into a machine-readable format and for the DataFrames to be saved to disk for later review.
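For example, a post-processing step along the following lines could save each DataFrame to disk and write the remaining attributes out as JSON. This is a minimal sketch, assuming run() has returned the list of ReportEntry objects described above; export_report is a hypothetical helper, not part of the Pipeline API:

import os
import json

def export_report(report, output_dir="pipeline_report"):
    # Hypothetical helper: assumes 'report' is the list of ReportEntry
    # objects described above.
    os.makedirs(output_dir, exist_ok=True)
    summary = []
    for i, entry in enumerate(report):
        # Save the records that failed validation for later review
        csv_path = os.path.join(output_dir, "{0}_{1}.csv".format(i, entry.field))
        entry.df.to_csv(csv_path, index=False)
        # Collect the remaining attributes in a machine-readable structure
        summary.append({
            "date": entry.date.isoformat(),
            "field": entry.field,
            "number_records": entry.number_records,
            "category": entry.category,
            "description": entry.description,
            "df": csv_path,
        })
    # Write a single JSON summary alongside the CSV files
    with open(os.path.join(output_dir, "report.json"), "w") as f:
        json.dump(summary, f, indent=2)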
Custom reporting
By default the FileProcessor class uses the DefaultReportWriter() class, which aggregates ReportEntry objects and returns them in a list at the end of the run.
To write your own custom reporting class, implement a class with two methods: log_history and get_report.
Here is an example that simply logs all information to stdout:
class MyReportWriter():

    def log_history(self,
                    rule,
                    field,
                    df,
                    category,
                    description):
        # Print one line per processing step to stdout
        print("{0}, Field {1}: #{2} {3}/{4}".format(rule,
                                                    field,
                                                    df.shape[0],
                                                    category,
                                                    description))

    def get_report(self):
        return None
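An instance of the custom report writer is then passed to the FileProcessor via the report_writer parameter: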
import pandas as pd

fp = FileProcessor(report_writer=MyReportWriter())

df = pd.read_csv(my_file)

# Apply a single String rule to the "name" field; records whose value does
# not match the regex are removed from the dataset.
report = fp.run(df,
                config={"rules": [
                    {
                        "rule_type": "String",
                        "field": "name",
                        "params": {
                            "fallback_mode": "remove_record",
                            "regex": r"[\w\s]*"
                        },
                    }]})
Related documentation
- Dativa Pipeline API on AWS - The Dativa Pipeline API is available through the AWS marketplace
- Dativa Pipeline Python Client - Dativa Tools includes a client for the Pipeline API
- Dativa Pipeline API: Sample Data - Sample files to demonstrate usage of the Dativa Pipeline API
- Dativa Pipeline API: Validating basic data types - Validating incoming datasets for basic string, number, and date type formatting and range checks using the Dativa Data Pipeline API
- Dativa Pipeline API: Anonymizing data - The Dativa Pipeline API supports tokenization, hashing, and encryption of incoming datasets for anonymization and pseudonymization
- Dativa Pipeline API: Referential Integrity - Using the Dativa Pipeline API to validate data against other known good datasets to ensure referential integrity
- Dativa Pipeline API: Handling invalid data - Invalid data can be quarantined or automatically fixed by the Dativa Data Pipeline API
- Dativa Pipeline API: Working with session data - The Dativa Pipeline API can check for gaps and overlaps in session data and automatically fix them
- Dativa Pipeline API: Full API reference - A field by field breakdown of the full functionality of the Dativa Data Pipeline API