I was tasked with storing a CSV file in Amazon S3, with the prerequisite that it first be converted to Apache Parquet format. I assume one of the reasons for this conversion is to reduce costs: Parquet is a compressed, columnar format, so it typically takes up less storage and lets query engines scan less data. AWS Glue is well suited to this task. I had planned to describe and show my implementation here, but I realized that I had followed several existing how-tos, so I am sharing their links below as sources instead:
Convert CSV / JSON files to Apache Parquet using AWS Glue
Three AWS Glue ETL job types for converting data to Apache Parquet
Format Options for ETL Inputs and Outputs in AWS Glue
AWS Glue | CSV to Parquet transformation | Getting started
AWS: How to use AWS Glue ETL to convert CSV to Parquet – Tutorial
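As a rough idea of what such a job can look like, here is a minimal sketch of a Glue PySpark script. It is an illustration under assumptions, not the exact implementation from any of the how-tos above: the job parameters `SOURCE_PATH` and `TARGET_PATH`, the helper names `build_options` and `run_job`, and the comma-separated CSV layout with a header row are all hypothetical.

```python
import sys


def build_options(source_path, target_path):
    """Build the reader and writer settings for a CSV-to-Parquet job.

    Hypothetical helper: it only assembles keyword arguments, so it can be
    checked without a Glue environment.
    """
    source = {
        "connection_type": "s3",
        "connection_options": {"paths": [source_path]},
        "format": "csv",
        # Assumption: the CSV has a header row and uses commas.
        "format_options": {"withHeader": True, "separator": ","},
    }
    sink = {
        "connection_type": "s3",
        "connection_options": {"path": target_path},
        "format": "parquet",
    }
    return source, sink


def run_job():
    """Read CSV from S3 and write it back as Parquet (runs only inside Glue)."""
    # These modules are provided by the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # SOURCE_PATH and TARGET_PATH are hypothetical job parameters,
    # passed to the job as --SOURCE_PATH and --TARGET_PATH.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "SOURCE_PATH", "TARGET_PATH"])

    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    source, sink = build_options(args["SOURCE_PATH"], args["TARGET_PATH"])

    # Load the CSV files into a DynamicFrame, then write them out as Parquet.
    frame = glue_context.create_dynamic_frame.from_options(**source)
    glue_context.write_dynamic_frame.from_options(frame=frame, **sink)

    job.commit()


# In the actual Glue script, finish with a call to run_job().
```

Keeping the option building separate from the Glue calls is only a convenience for checking the settings locally; inside a Glue job the script would end by calling `run_job()`.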
Hi! I am Bruno, a Brazilian born and bred. I am a former Oracle ACE and a computer scientist with an MSc in Data Science and over ten years of experience at companies such as IBM, Epico Tech, and Playtech, based in three different countries (Brazil, Hungary, and Sweden), and I have joined projects remotely in many others. I am super excited to share my interest in Databases, Cloud, Data Science, Data Engineering, Big Data, AI, Programming, Software Engineering, and data in general.