The sections below will help you prepare for the Google Cloud Platform Professional Data Engineer certification.

Please click on the links below to read the details of the relevant sections. 

Section 1: Introduction to Data Engineering
  • Explore the role of a data engineer
  • Data engineering challenges
  • Intro to BigQuery
  • Data Lakes and Data Warehouses
  • Federated Queries with BigQuery
  • Transactional Databases vs Data Warehouses
  • Manage data access and governance
  • Build production-ready pipelines
Section 2: Building a Data Lake
  • Introduction to Data Lakes
  • Data Storage and ETL options on GCP
  • Building a Data Lake using Cloud Storage
  • Optimizing cost with Google Cloud Storage classes and Cloud Functions
  • Securing Cloud Storage
  • Storing All Sorts of Data Types
  • Running federated queries on Parquet and ORC files in BigQuery
  • Cloud SQL as a relational Data Lake
Section 3: Building a Data Warehouse
  • The modern data warehouse
  • Intro to BigQuery
  • Getting Started
  • Loading Data
  • Querying Cloud SQL from BigQuery
  • Exploring Schemas
    • Exploring BigQuery Public Datasets with SQL using INFORMATION_SCHEMA
  • Schema Design
  • Nested and Repeated Fields
  • Working with JSON and Array data in BigQuery
  • Optimizing with Partitioning and Clustering
  • Transforming Batch and Streaming Data
Section 4: Introduction to Building Batch Data Pipelines
  • ELT and ETL
  • Quality considerations
  • How to carry out operations in BigQuery
  • ELT to improve data quality in BigQuery
  • Shortcomings
  • ETL to solve data quality issues
Section 5: Executing Spark on Cloud Dataproc
  • The Hadoop ecosystem
  • Running Hadoop on Cloud Dataproc
  • GCS instead of HDFS
  • Optimizing Dataproc
    • Running Apache Spark jobs on Cloud Dataproc
Section 6: Serverless Data Processing with Cloud Dataflow
  • Cloud Dataflow
    • Why customers value Dataflow
  • Dataflow Pipelines
    • A Simple Dataflow Pipeline (Python/Java)
    • MapReduce in Dataflow (Python/Java)
  • Side Inputs (Python/Java)
  • Dataflow Templates
  • Dataflow SQL
Section 7: Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
  • Building Batch Data Pipelines visually with Cloud Data Fusion
  • Components
  • UI Overview
  • Building a Pipeline
  • Exploring Data using Wrangler
  • Lab: An Introduction to Cloud Composer
  • Orchestrating work between GCP services with Cloud Composer
  • Apache Airflow Environment
  • DAGs and Operators
  • Workflow Scheduling
  • Monitoring and Logging
Section 8: Introduction to Processing Streaming Data
  • Processing Streaming Data
Section 9: Serverless Messaging with Cloud Pub/Sub
  • Cloud Pub/Sub
Section 10: Cloud Dataflow Streaming Features
  • Cloud Dataflow Streaming Features
Section 11: High-Throughput BigQuery and Bigtable Streaming Features
  • BigQuery Streaming Features
  • Streaming Analytics and Dashboards
  • Cloud Bigtable
  • Streaming Data Pipelines into Bigtable
Section 12: Advanced BigQuery Functionality and Performance
  • Analytic Window Functions
  • Using WITH Clauses
  • GIS Functions
  • Demo: Mapping Fastest Growing Zip Codes with BigQuery GeoViz
  • Performance Considerations
  • Optimizing your BigQuery Queries for Performance
  • Creating Date-Partitioned Tables in BigQuery
Section 13: Introduction to Analytics and AI
  • What is AI?
  • From Ad-hoc Data Analysis to Data-Driven Decisions
  • Options for ML models on GCP
Section 14: Prebuilt ML model APIs for Unstructured Data
  • Unstructured Data is Hard
  • ML APIs for Enriching Data
  • Using the Natural Language API to Classify Unstructured Text
Section 15: Big Data Analytics with Cloud AI Platform Notebooks
  • What's a Notebook?
  • BigQuery Magic and Ties to Pandas
  • BigQuery in JupyterLab on AI Platform
Section 16: Production ML Pipelines with Kubeflow
  • Ways to do ML on GCP
  • Kubeflow
  • AI Hub
  • Running AI models on Kubeflow
Section 17: Custom Model building with SQL in BigQuery ML
  • BigQuery ML for Quick Model Building
  • Demo: Train a model with BigQuery ML to predict NYC taxi fares
  • Supported Models
  • Lab Option 1: Predict Bike Trip Duration with a Regression Model in BQML
  • Lab Option 2: Movie Recommendations in BigQuery ML
Section 18: Custom Model building with Cloud AutoML
  • Why AutoML?
  • AutoML Vision
  • AutoML NLP
  • AutoML Tables

© 2024 All rights reserved.