Exam Preparation – Google Cloud Certified – Professional Data Engineer

The Google Professional Data Engineer exam is one of the most exciting and interesting exams I have aimed to pass. For most of my career I worked mainly with database management, until I realized that something had changed: RDBMS databases were no longer the inevitable choice for data management. A Data Engineer is still expected to work with databases, but also with streaming technologies, microservices, third-party APIs, data pipelines, big data technologies, and more. Back then, I had no idea what was happening. So what does one do when one does not know something? One starts searching for answers in trustworthy sources. In my case, I started master's studies in Data Science at the University of Luleå and in Software Engineering at the Blekinge Institute of Technology, both in Sweden. They aligned with my bachelor's in Computer Science and gave me a deeper understanding of data structures and of the different types of machine learning algorithms, such as Supervised Learning, Unsupervised Learning, and Reinforcement Learning. They also strengthened my foundation in software engineering and in tools such as Airflow, Python, and Terraform.

In addition to these studies, I started doing some research. It was a Eureka moment for me when I learned about the different types of data: structured, unstructured, and semi-structured. I also learned about the growth of NoSQL databases and data warehousing alongside the traditional RDBMS for data retrieval, and about the different methodologies and steps for managing raw or processed data from various storage providers. As the years passed, and in line with my studies, I started to work with data preparation, understanding, and manipulation to build business logic for the crème de la crème of data analytics: data-driven decision-making, where one gains important information from the dataset being analyzed (if you have never heard about it, check this other blog post: https://www.techdatabasket.com/2022/12/20/analyzing-my-personal-running-history-of-2022/). That said, I finally aimed for my first Data Engineer certification, the Google Professional Data Engineer exam. The first thing to do was to cover all the topics below, taken from https://cloud.google.com/certification/guides/data-engineer:

Section 1: Designing data processing systems

1.1 Selecting the appropriate storage technologies. Considerations include:

    ●  Mapping storage systems to business requirements

    ●  Data modeling

    ●  Trade-offs involving latency, throughput, transactions

    ●  Distributed systems

    ●  Schema design

1.2 Designing data pipelines. Considerations include:

    ●  Data publishing and visualization (e.g., BigQuery)

    ●  Batch and streaming data (e.g., Dataflow, Dataproc, Apache Beam, Apache Spark and Hadoop ecosystem, Pub/Sub, Apache Kafka)

    ●  Online (interactive) vs. batch predictions

    ●  Job automation and orchestration (e.g., Cloud Composer)

1.3 Designing a data processing solution. Considerations include:

    ●  Choice of infrastructure

    ●  System availability and fault tolerance

    ●  Use of distributed systems

    ●  Capacity planning

    ●  Hybrid cloud and edge computing

    ●  Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)

    ●  At least once, in-order, and exactly once, etc., event processing
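
To make the last item of 1.3 more concrete, here is a minimal sketch of how at-least-once delivery can still produce an effectively-once result when the consumer is idempotent. It is only an illustration under assumptions: the `Event` and `IdempotentConsumer` names and the `event_id` field are hypothetical and do not come from any Google Cloud API, and a real pipeline would keep the seen IDs in durable storage rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique ID attached by the producer
    amount: int


@dataclass
class IdempotentConsumer:
    """Applies each event at most once, even if it is delivered repeatedly."""
    seen_ids: set = field(default_factory=set)
    total: int = 0

    def handle(self, event: Event) -> bool:
        # A redelivered event has an ID we already processed, so skip it.
        if event.event_id in self.seen_ids:
            return False
        self.seen_ids.add(event.event_id)
        self.total += event.amount
        return True


# At-least-once delivery: event "a" arrives twice.
deliveries = [Event("a", 10), Event("b", 5), Event("a", 10)]
consumer = IdempotentConsumer()
applied = [consumer.handle(e) for e in deliveries]
print(applied, consumer.total)  # [True, True, False] 15
```

The duplicate delivery of event "a" is ignored, so the total reflects each event exactly once even though the transport only promises at-least-once.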

1.4 Migrating data warehousing and data processing. Considerations include:

    ●  Awareness of current state and how to migrate a design to a future state

    ●  Migrating from on-premises to cloud (Data Transfer Service, Transfer Appliance, Cloud Networking)

    ●  Validating a migration

Section 2: Building and operationalizing data processing systems

2.1 Building and operationalizing storage systems. Considerations include:

    ●  Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Datastore, Memorystore)

    ●  Storage costs and performance

    ●  Life cycle management of data

2.2 Building and operationalizing pipelines. Considerations include:

    ●  Data cleansing

    ●  Batch and streaming

    ●  Transformation

    ●  Data acquisition and import

    ●  Integrating with new data sources

2.3 Building and operationalizing processing infrastructure. Considerations include:

    ●  Provisioning resources

    ●  Monitoring pipelines

    ●  Adjusting pipelines

    ●  Testing and quality control

Section 3: Operationalizing machine learning models

3.1 Leveraging pre-built ML models as a service. Considerations include:

    ●  ML APIs (e.g., Vision API, Speech API)

    ●  Customizing ML APIs (e.g., AutoML Vision, AutoML text)

    ●  Conversational experiences (e.g., Dialogflow)

3.2 Deploying an ML pipeline. Considerations include:

    ●  Ingesting appropriate data

    ●  Retraining of machine learning models (AI Platform Prediction and Training, BigQuery ML, Kubeflow, Spark ML)

    ●  Continuous evaluation

3.3 Choosing the appropriate training and serving infrastructure. Considerations include:

    ●  Distributed vs. single machine

    ●  Use of edge compute

    ●  Hardware accelerators (e.g., GPU, TPU)

3.4 Measuring, monitoring, and troubleshooting machine learning models. Considerations include:

    ●  Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics)

    ●  Impact of dependencies of machine learning models

    ●  Common sources of error (e.g., assumptions about data)
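
As a small illustration of the evaluation metrics mentioned under 3.4, the sketch below computes precision and recall for a binary classifier in plain Python. The function name `precision_recall` and the sample labels are made up for this example; in practice one would likely use a library implementation instead.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    # Guard against division by zero when there are no positive predictions or labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


labels = [1, 0, 1, 1, 0, 0]       # ground truth
predictions = [1, 1, 1, 0, 0, 0]  # model output
precision, recall = precision_recall(labels, predictions)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
```

Precision answers "of the items predicted positive, how many really were?", while recall answers "of the items that really were positive, how many did the model find?".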

Section 4: Ensuring solution quality

4.1 Designing for security and compliance. Considerations include:

    ●  Identity and access management (e.g., Cloud IAM)

    ●  Data security (encryption, key management)

    ●  Ensuring privacy (e.g., Data Loss Prevention API)

    ●  Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))

4.2 Ensuring scalability and efficiency. Considerations include:

    ●  Building and running test suites

    ●  Pipeline monitoring (e.g., Cloud Monitoring)

    ●  Assessing, troubleshooting, and improving data representations and data processing infrastructure

    ●  Resizing and autoscaling resources

4.3 Ensuring reliability and fidelity. Considerations include:

    ●  Performing data preparation and quality control (e.g., Dataprep)

    ●  Verification and monitoring

    ●  Planning, executing, and stress testing data recovery (fault tolerance, rerunning failed jobs, performing retrospective re-analysis)

    ●  Choosing between ACID, idempotent, eventually consistent requirements

4.4 Ensuring flexibility and portability. Considerations include:

    ●  Mapping to current and future business requirements

    ●  Designing for data and application portability (e.g., multicloud, data residency requirements)

    ●  Data staging, cataloging, and discovery

The best thing about preparing for a Google exam is that, like Oracle, Microsoft, and Amazon, Google provides full documentation where one can find the material to study. I am therefore sharing two important links for this exam below:

Google Cloud documentation: https://cloud.google.com/docs

Google Cloud quickstarts and tutorials: https://cloud.google.com/docs/tutorials

One thing that I like about the Google exams is that the questions are pretty straightforward: if you know the services and the architecture described in each question, you can pass the exam. In my case, I rebooked the exam to a date I had not expected, because I had gotten the days confused. Even so, by preparing for three weeks and building on what I learned from my previous experience in “EXAM PREPARATION – GOOGLE CLOUD CERTIFIED – PROFESSIONAL CLOUD DATABASE ENGINEER”, I was able to pass. A few days later, I received the results from Google.

I hope the tips shared here motivate you to pursue this certification. I wish you happy studies!
