Senior Data Engineer

Building Scalable
Data Pipelines

I transform raw data into actionable insights, specializing in distributed computing with Apache Spark, robust SQL architectures, and intuitive BI dashboards.

View Projects | GitHub

Technical Arsenal

Tools and technologies I use to solve complex data problems.

Core Engineering

Python (Pandas, PySpark)
Advanced SQL (Window Functions, CTEs)
Apache Spark (Optimization)
Airflow / Dagster

BI & Visualization

Microsoft Power BI (DAX)
Tableau (Storytelling)
Metabase (Self-service)
Data Modeling (Star Schema)

Cloud & Infra

AWS (S3, EMR, Redshift)
Docker & Kubernetes
Linux / Bash Scripting
Git / CI/CD Pipelines

Featured Projects

End-to-end data solutions from ingestion to visualization.

Real-Time ETL Pipeline with Spark

Designed a scalable pipeline that ingests 50k+ events/sec from Kafka, processes them with PySpark on AWS EMR, writes the results in Delta Lake format, and loads them into Snowflake for analytics.

PySpark, Kafka, AWS EMR, Delta Lake

View Code | Live Demo

Pipeline Architecture Diagram
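
To put the 50k+ events/sec figure for the pipeline above into perspective, here is a rough back-of-the-envelope sizing sketch. The per-partition throughput (5,000 events/sec) and the 1.5x headroom factor are hypothetical values chosen for illustration, not measurements from the project; the function names are likewise made up for this example.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(events_per_sec: int) -> int:
    """Daily event volume implied by a sustained per-second rate."""
    return events_per_sec * SECONDS_PER_DAY


def partitions_needed(
    events_per_sec: int,
    per_partition_events_per_sec: int,
    headroom: float = 1.5,
) -> int:
    """Estimate how many Kafka partitions cover the target rate.

    per_partition_events_per_sec and headroom are assumptions that
    should be replaced with numbers measured on the real cluster.
    """
    if per_partition_events_per_sec <= 0:
        raise ValueError("per_partition_events_per_sec must be positive")
    return math.ceil(events_per_sec * headroom / per_partition_events_per_sec)


if __name__ == "__main__":
    rate = 50_000  # events/sec, the figure quoted in the project summary
    print(f"events per day: {events_per_day(rate):,}")
    # 5_000 events/sec per partition is a hypothetical consumer throughput
    print(f"partitions (assumed): {partitions_needed(rate, 5_000)}")
```

At a sustained 50,000 events/sec the pipeline would see about 4.32 billion events per day, which is why partition count and consumer parallelism are worth estimating before tuning the Spark job.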

Enterprise Sales Dashboard

Built a comprehensive BI suite on top of a SQL Server warehouse. Created complex DAX measures for year-over-year (YoY) growth and cohort analysis, and deployed interactive dashboards in Power BI and Metabase for stakeholders.

Power BI, SQL Server, Metabase, DAX

View Case Study

Dashboard Preview
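
The YoY growth measure mentioned for the dashboard above boils down to (current - prior) / prior. The following is a minimal Python sketch of that arithmetic, not the DAX used in the project; in Power BI a comparable measure is commonly built with SAMEPERIODLASTYEAR and DIVIDE. The function name and the sample revenue figures are hypothetical.

```python
from typing import Dict, Optional


def yoy_growth(revenue_by_year: Dict[int, float]) -> Dict[int, Optional[float]]:
    """Return year-over-year growth as a fraction for each year that has
    a directly preceding year in the input.

    A prior-year value of zero yields None rather than a division error,
    similar to how a safe-divide measure returns blank.
    """
    growth: Dict[int, Optional[float]] = {}
    for year in sorted(revenue_by_year):
        prior = revenue_by_year.get(year - 1)
        if prior is None:
            continue  # no prior year to compare against
        if prior == 0:
            growth[year] = None
        else:
            growth[year] = (revenue_by_year[year] - prior) / prior
    return growth


if __name__ == "__main__":
    # Hypothetical revenue figures, for illustration only
    sample = {2021: 0.0, 2022: 100.0, 2023: 120.0}
    for year, value in yoy_growth(sample).items():
        label = "n/a" if value is None else f"{value:.1%}"
        print(year, label)
```

Returning None for a zero prior year keeps the dashboard from showing an infinite or error value for the first period in which revenue appears.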

Let's Connect

I am currently open to opportunities where I can apply my experience in big data and analytics to drive business value.

GitHub
