
Mohamed Taha
Abo Heiba

Building data systems that are reliable by design,
maintainable in practice, and useful to the business.

Distributed Pipelines · Lakehouse Architecture · Data Warehousing · Workflow Orchestration · Fault-Aware Processing
Available for Data Engineering roles  ·  Egypt & Remote
#1 / 30+: Best-performing trainee, NTI Huawei Big Data cohort
200K+: Records processed across Databricks & SQL Server DWs
2 Certs: AWS CLF-C02 & Huawei HCIA-Big Data, both first attempt

Approach & Background

I approach data infrastructure as a systems problem, not a scripting task. A pipeline that runs once is not the same as one that runs reliably every day — and the gap between those two things is where most engineering decisions live: schema contracts, failure handling, execution logging, partition strategy, layer separation.

My work spans the full data stack — from raw ingestion to governed, analytics-ready outputs. SQL Server warehouses with stored-procedure transformation layers. Databricks lakehouses built on the Medallion architecture with Unity Catalog for lineage. FastAPI retrieval systems backed by vector stores. Each system is designed to be debuggable, reproducible, and maintainable — not just functional.

Certified in Huawei HCIA-Big Data (best trainee across 30+ participants, first attempt) and AWS Cloud Practitioner. Finishing a Computer Science degree at Fayoum University — GPA 3.4/4.0, graduating June 2026.


How I Build Systems

01
Reliability First
Correctness precedes performance. A pipeline that delivers wrong data quickly is worse than one that fails loudly at the boundary.
From SQL DW project: per-step execution-time logging surfaces failures at the exact stored procedure — not buried in aggregate error output.
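A minimal Python sketch of the per-step logging idea, for illustration only. It is not code from the SQL DW project: the table name `etl_step_log`, the step names, and the use of SQLite in place of SQL Server are all assumptions made so the example runs on its own.

```python
import sqlite3
import time
from contextlib import contextmanager


def create_log_table(conn):
    # Hypothetical log table; one row per executed step.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_step_log ("
        "step_name TEXT, status TEXT, duration_ms REAL, error TEXT)"
    )


@contextmanager
def logged_step(conn, step_name):
    """Time one step, record its outcome, and re-raise any failure."""
    start = time.perf_counter()
    status, error = "ok", None
    try:
        yield
    except Exception as exc:
        status, error = "failed", repr(exc)
        raise  # fail loudly; the log row is still written in `finally`
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        conn.execute(
            "INSERT INTO etl_step_log VALUES (?, ?, ?, ?)",
            (step_name, status, duration_ms, error),
        )
        conn.commit()


def run_demo():
    conn = sqlite3.connect(":memory:")
    create_log_table(conn)
    with logged_step(conn, "load_customers"):
        pass  # placeholder for real work
    try:
        with logged_step(conn, "load_orders"):
            raise ValueError("bad date format")
    except ValueError:
        pass  # the demo swallows the error only to inspect the log
    return conn.execute(
        "SELECT step_name, status FROM etl_step_log ORDER BY rowid"
    ).fetchall()
```

Because each step writes its own row, a failure points at one named step rather than at the run as a whole.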
02
Observability by Design
Logging and row-count validation are part of the initial design — not patched in after the first production incident.
From Lakehouse project: record counts validated across all 3 medallion layers. SQL DW: each stored procedure logs its own execution duration to a dedicated log table.
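One way row-count validation between layers could look, sketched in plain Python. This is an assumption about the general technique, not the project's notebook code: `LayerHop`, `reconcile`, and the idea of tracking intentionally rejected rows are hypothetical names and choices.

```python
from dataclasses import dataclass


@dataclass
class LayerHop:
    source: str
    target: str
    source_rows: int
    target_rows: int
    rejected_rows: int  # rows dropped on purpose (nulls, duplicates, ...)


def reconcile(hops):
    """Return one message per hop whose row counts do not add up."""
    problems = []
    for hop in hops:
        accounted = hop.target_rows + hop.rejected_rows
        if accounted != hop.source_rows:
            problems.append(
                f"{hop.source}->{hop.target}: {hop.source_rows} in, "
                f"{accounted} accounted for"
            )
    return problems
```

The check suits row-preserving hops such as Bronze to Silver; an aggregated Gold table would need its own expectation, for example a fact-table row count.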
03
Layer Separation
Each zone — Bronze, Silver, Gold — is independently queryable and testable. Transformation logic doesn't bleed across boundaries.
From Lakehouse project: the Gold layer exposes a governed star schema via Unity Catalog views, fully decoupled from the ETL notebooks in Silver.
04
Schema Contracts
Upstream schema changes must never silently corrupt downstream consumers. Validation belongs at the boundary, not at the point of failure.
From Lakehouse project: Delta Lake schema enforcement rejects malformed records at ingestion — they cannot propagate past the Bronze layer.
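The principle of validating at the boundary can be illustrated with a small plain-Python stand-in. This is not Delta Lake's API and does not reproduce its enforcement behaviour; the schema, column names, and helper functions below are hypothetical.

```python
# Hypothetical contract: column name -> required Python type.
EXPECTED_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}


def check_record(record, schema=EXPECTED_SCHEMA):
    """Return a list of contract violations for one record (empty if valid)."""
    errors = []
    for column, expected_type in schema.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            errors.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(record[column]).__name__}"
            )
    for column in record:
        if column not in schema:
            errors.append(f"unexpected column: {column}")
    return errors


def split_at_boundary(records, schema=EXPECTED_SCHEMA):
    """Separate valid records from rejected ones before anything downstream runs."""
    accepted, rejected = [], []
    for record in records:
        errors = check_record(record, schema)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping the rejected records together with their reasons makes the boundary auditable instead of silently lossy.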
05
Modular Composition
Decomposable steps, not monolithic jobs. Each transformation unit can be tested, replaced, or rerun without touching the surrounding system.
From SQL DW project: 17 stored procedures orchestrated by 2 master procedures — any single unit reruns independently, no full-pipeline restarts.
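The master-procedure pattern can be sketched in Python as a registry of small steps plus a runner. This is an analogy only, not the project's T-SQL: the step names, the `state` dictionary, and the sample data are made up for illustration.

```python
STEPS = {}  # step name -> callable; a stand-in for individual stored procedures


def step(name):
    """Register a function as a named, independently runnable step."""
    def register(func):
        STEPS[name] = func
        return func
    return register


@step("load_customers")
def load_customers(state):
    state["customers"] = ["c1", "c2"]


@step("load_orders")
def load_orders(state):
    state["orders"] = [("c1", 10.0), ("c2", 5.5)]


@step("build_sales_fact")
def build_sales_fact(state):
    # Keep only orders whose customer is known.
    known = set(state["customers"])
    state["sales_fact"] = [order for order in state["orders"] if order[0] in known]


def run_master(step_names, state):
    """Run the listed steps in order, like a master procedure calling its children."""
    for name in step_names:
        STEPS[name](state)
    return state
```

A full run passes all three names; rerunning a single unit is `run_master(["build_sales_fact"], state)`, which leaves the already-loaded inputs untouched.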
06
Business Alignment
Data infrastructure exists to support decisions. The transformation is defined by what the analysis needs — not by what's technically interesting.
From Power BI project: every Power Query step — removing 135K nulls, engineering 8 features — driven directly by downstream analytical requirements.

Experience & Training

AWS Cloud Data Engineer Trainee Oct 2025 — Dec 2025
NTI — National Telecommunications Institute  ·  Digital Egypt Youth Program
  • Architected a multi-tier cloud data platform on AWS from S3 static hosting to a production-grade EC2/RDS/VPC architecture, applying Well-Architected Framework principles across compute, storage, and networking layers.
  • Configured RDS Multi-AZ with automated failover and S3 data lifecycle policies (Standard to Infrequent Access to Glacier) for cost-optimised cloud storage management.
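A Standard to Infrequent Access to Glacier lifecycle rule of the kind described above can be expressed as a configuration dictionary. The sketch below only builds that dictionary; the prefix, the 30- and 90-day thresholds, and the function name are assumptions, not the values used in the training project.

```python
def build_lifecycle_config(prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-down-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the call itself is left out so the example runs without AWS credentials.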
Big Data Engineer Trainee Jul 2025 — Sep 2025
NTI — National Telecommunications Institute  ·  Huawei Big Data Program
Best-Performing Trainee — Exam Voucher Awarded
  • Awarded best-performing trainee across a cohort of 30+ and earned a Huawei exam voucher — passed the HCIA-Big Data certification on the first attempt.
  • Built GB-scale Spark batch pipelines on Hadoop (HDFS/YARN/Hive) and Kafka + Flink streaming pipelines for real-time log ingestion across 15+ hands-on labs.
Data Engineer Trainee Apr 2024 — Oct 2024
DEPI — Digital Egypt Pioneers Initiative
  • Built 4 SSIS ETL packages ingesting 7 CSV data sources into SQL Server via 17 stored procedures, automating data ingestion and transformation workflows end-to-end.
  • Delivered a validated dataset of 70K records powering a customer churn model achieving 97.4% accuracy, enabling data-driven retention strategies.

Projects

Certifications

Huawei Technologies
HCIA — Big Data Associate
Issued Mar 2026  ·  Valid until Mar 2029
Cert No. 010101801855810426731409
Best-Performing Trainee · Passed on First Attempt
Amazon Web Services
AWS Certified Cloud Practitioner (CLF-C02)
Issued May 2026  ·  Valid until May 2029
AWS Certified · Cloud Practitioner

Tools & Technologies

Pipeline Engineering
PySpark · Delta Lake · Databricks · SSIS · Medallion Architecture · Kafka · Flink · ETL/ELT patterns · Batch Processing
Storage & Warehousing
SQL Server · Star Schema · Unity Catalog · ChromaDB · Dimensional Modeling · HDFS · HBase · Snowflake Schema
Cloud: AWS
S3 · EC2 · RDS · IAM · VPC · Glue · Redshift · Athena · Lambda · CloudWatch
Languages & APIs
Python · T-SQL · FastAPI · LangChain · Pandas · Bash · NumPy
Analytics & Tooling
Power BI · DAX · Power Query · Git · Docker · Matplotlib · Linux

Education

Oct 2022 — Jun 2026
Bachelor of Science — Computer Science
Fayoum University  ·  Faculty of Computers and Artificial Intelligence
GPA 3.4 / 4.0. Core coursework: Big Data Engineering, Cloud Computing, Database Management Systems, Advanced SQL, Data Structures & Algorithms, Machine Learning, Operating Systems (Linux, Docker), Python Programming, Data Visualization (Power BI).

Graduation Project: EduMate, a unified academic management platform with a RAG chatbot built on LangChain and ChromaDB.

Get In Touch

Actively looking for Data Engineering, Big Data, and Cloud Data Engineering roles. If you're building data infrastructure, pipelines, or lakehouses — reach out. I respond within 24 hours.