Dativa tools Athena client

Dativa tools Athena client

An easy to use client for AWS Athena that will create tables from S3 buckets (using AWS Glue) and run queries against these tables. It support full customisation of SerDe and column names on table creation.

Examples:

Creating tables

The library creates a temporary Glue crawler which is deleted after use, and will also create the database if it does not exist.

from dativa.tools.aws import AthenaClient
ac = AthenaClient("us-east-1", "my_athena_db")
ac.create_table(table_name='my_first_table',
                crawler_target={'S3Targets': [
                    {'Path': 's3://my-bucket/table-data'}]}
                )

# Create a table with a custom SerDe and column names, typical for CSV files
ac.create_table(table_name='comcast_visio_match',
                crawler_target={'S3Targets': [
                    {'Path': 's3://my-bucket/table-data-2', 'Exclusions': ['**._manifest']}]},
                serde='org.apache.hadoop.hive.serde2.OpenCSVSerde',
                columns=[{'Name': 'id', 'Type': 'string'}, {
                    'Name': 'device_id', 'Type': 'string'}, {'Name': 'subscriber_id', 'Type': 'string'}]
                )

Running queries

from dativa.tools.aws import AthenaClient

ac = AthenaClient("us-east-1", "my_athena_db")
ac.add_query(sql="select * from table",
                 name="My first query",
                 output_location= "s3://my-bucket/query-location/")

ac.wait_for_completion()

Fetch results of query

from dativa.tools.aws import AthenaClient

ac = AthenaClient("us-east-1", "my_athena_db")
query = ac.add_query(sql="select * from table",
                     name="My first query",
                     output_location= "s3://my-bucket/query-location/")

ac.wait_for_completion()
ac.get_query_result(query)

Running queries with the output in Parquet and create an Athena table

from dativa.tools.aws import AthenaClient, S3Csv2Parquet

scp = S3Csv2Parquet(region="us-east-1",
                    template_location="s3://my-bucket/glue-template-path/")
ac = AthenaClient("us-east-1", "my_athena_db", s3_parquet=scp)
ac.add_query(sql="select * from table",
                 name="my query that outputs Parquet",
                 output_location="s3://my-bucket/query-location/",
                 parquet=True)

ac.wait_for_completion()

ac.create_table({'S3Targets': [{'Path': "s3://my-bucket/query-location/"}]},
                                        table_name="query_location")

Related documentation

  • Querying AWS Athena and getting the results in Parquet format - (more)
  • Dativa tools data validation - A library to perform basic validation on incoming data files (more)
  • Dativa tools pandas extensions - A set of extensions to pandas for consistent CSV processing and better date time handling (more)
  • Dativa tools pandas extensions - A set of extensions to pandas for consistent CSV processing and better date time handling (more)

Need help? Get in touch...

Sign up below and one of our data consultants will get right back to you


Dativa is a global consulting firm providing data consulting and engineering services to companies that want to build and implement strategies to put data to work. We work with primary data generators, businesses harvesting their own internal data, data-centric service providers, data brokers, agencies, media buyers and media sellers.

145 Marina Boulevard
San Rafael
California
94901

Registered in Delaware

Thames Tower
Station Road
Reading
RG1 1LX

Registered in England & Wales, number 10202531