Resume

My professional journey and expertise

Vijay Anand Pandian

Senior Data Engineer | Loyalty Analytics | Customer Insights

London, United Kingdom

Skills

Python · Scala · Azure · Databricks · Streamlit · AWS · GCP · Terraform · Kafka · Big Data · AI/ML · DevOps

Professional Summary

Senior Data Engineer with 10+ years of experience architecting scalable data platforms, building customer analytics solutions, and delivering cloud-native applications. Currently at Marks & Spencer, working on loyalty programme analytics with Azure Databricks and building interactive Streamlit applications. Proven track record of transforming team productivity through innovative tooling, including a Python Code Generator that reduced pipeline development time from days to minutes. Expert in end-to-end data engineering, stream processing, cloud architecture, and customer analytics, with a strong bias for automation and business impact.

Core Competencies

  • Customer Analytics: Loyalty programme analytics, customer segmentation, behavioral insights
  • Platform Engineering: Azure Databricks, Streamlit applications, self-service analytics platforms
  • Data Infrastructure: Real-time streaming (Kafka), batch processing (Spark), data warehousing (BigQuery, Redshift, Snowflake)
  • Cloud Architecture: Multi-cloud expertise (Azure, AWS, GCP), Infrastructure-as-Code (Terraform, CloudFormation)
  • Innovation & R&D: Rapid technology adoption, POC development, automation frameworks

Technical Expertise

Languages & Frameworks

Python, Scala, Java, Bash, SQL, C#

Streamlit, Django, Flask, React, Spring Boot

Cloud Platforms

Azure (Databricks, Blob Storage, Data Factory, Synapse)

AWS (EC2, Lambda, Fargate, EMR, S3, Redshift, DynamoDB)

GCP (BigQuery, Composer, GCS, Pub/Sub)

Data Engineering

Azure Databricks, Apache Kafka, Spark, Airflow

dbt, KSQL, Kafka Connect, Sqoop

Data Warehousing

BigQuery, Snowflake, Redshift, Hive

Star Schema, Dimensional Modeling

Infrastructure & DevOps

Terraform, Docker, Kubernetes

Jenkins, GitHub Actions, Bamboo

New Relic, Splunk, CloudWatch

AI/ML & Analytics

TensorFlow, Scikit-learn, NLTK

NLP, Sentiment Analysis, LLMs

Elasticsearch, Kibana

Certifications & Publications

Apache Airflow Fundamentals

Apache Airflow DAG Authoring

Certified Scrum Product Owner (CSPO)

Published Research: "A Novel Cloud Based NIDPS for Smartphones" – International Conference on Security in Computer Networks and Distributed Systems (SNDS'14), Springer CCIS Series, Volume 420

Professional Experience

Senior Data Engineer | Marks & Spencer

May 2025 – Present | London, United Kingdom

Key Achievement: Building customer loyalty analytics platform using Azure Databricks and Streamlit, transforming millions of customer transactions into actionable insights that drive personalization and retention strategies.

Loyalty Programme Analytics

  • Architecting end-to-end analytics platform for M&S loyalty programme using Azure Databricks
  • Processing millions of customer transactions to generate insights on customer behavior, segmentation, and lifetime value
  • Building data pipelines that integrate point-of-sale, online transactions, and customer profile data
  • Implementing advanced analytics for customer churn prediction, personalized recommendations, and targeted campaigns

Interactive Analytics Applications

  • Developing Streamlit applications that democratize data access for business stakeholders
  • Creating self-service dashboards for marketing teams to explore customer segments and campaign performance
  • Building real-time visualization tools for loyalty metrics, redemption rates, and customer engagement
  • Impact: Empowering non-technical teams to make data-driven decisions without engineering support

Data Platform Engineering

  • Implementing medallion architecture (Bronze, Silver, Gold layers) in Azure Databricks for scalable data processing
  • Optimizing Spark jobs for cost efficiency and performance at scale
  • Establishing data quality frameworks ensuring accuracy in customer analytics
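The medallion layering described above can be sketched in miniature. This is an illustrative sketch only: plain Python dicts stand in for Spark/Delta DataFrames, and the field names are hypothetical, not the actual M&S schema.

```python
# Minimal medallion-architecture sketch: Bronze (raw), Silver (cleaned),
# Gold (aggregated). Plain dicts stand in for Spark/Delta DataFrames.

def to_bronze(raw_rows):
    """Bronze: land raw records as-is, tagging each with its layer."""
    return [dict(row, _layer="bronze") for row in raw_rows]

def to_silver(bronze_rows):
    """Silver: drop malformed records and normalise types."""
    silver = []
    for row in bronze_rows:
        if row.get("customer_id") and row.get("amount") is not None:
            silver.append({
                "customer_id": str(row["customer_id"]),
                "amount": float(row["amount"]),
            })
    return silver

def to_gold(silver_rows):
    """Gold: aggregate spend per customer for downstream analytics."""
    totals = {}
    for row in silver_rows:
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["amount"]
    return totals

raw = [
    {"customer_id": 1, "amount": "12.50"},
    {"customer_id": 1, "amount": "7.50"},
    {"customer_id": None, "amount": "3.00"},   # malformed: dropped at Silver
]
print(to_gold(to_silver(to_bronze(raw))))  # {'1': 20.0}
```

In a real Databricks implementation each function would read and write Delta tables; the layering discipline (raw landing, validated intermediate, business-level aggregates) is the point being illustrated.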

Technologies: Azure Databricks, Streamlit, PySpark, Python, Azure Blob Storage, Delta Lake, SQL, Power BI

Data Engineer | Sky UK

December 2022 – May 2025 | London, United Kingdom

Key Achievement: Architected and developed a Python Code Generator Framework that automated entire data pipeline creation, reducing development time from days to minutes and enabling the team to execute large-scale refactoring in record time.

Real-Time Analytics Platform

  • Engineered Kafka streaming pipeline connecting IBM SevOne NPM to GCP BigQuery for real-time network monitoring data ingestion
  • Implemented KSQL transformations and Kafka Connect for sub-second data availability
  • Built metadata extraction API service (Python) that processes device/object/indicator data to BigQuery via GCS

Innovation: Code Generator Framework

  • Developed JSON-to-Terraform code generator that creates complete pipeline infrastructure from simple configuration files
  • Enforced code consistency, naming standards, and automated documentation generation across 100+ pipelines
  • Enabled non-technical users to onboard new data sources without manual coding
  • Impact: 10x faster pipeline deployment, zero configuration drift, eliminated copy-paste errors
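A JSON-to-Terraform generator of this kind can be sketched with the standard library. This is a hypothetical miniature: the real framework used Jinja2 templates and far richer resource definitions, so the template text and config fields below are illustrative placeholders.

```python
import json
from string import Template

# Illustrative Terraform template; the production framework used Jinja2
# and complete pipeline resource definitions, not just one table.
PIPELINE_TEMPLATE = Template("""\
resource "google_bigquery_table" "${name}" {
  dataset_id = "${dataset}"
  table_id   = "${name}"
}
""")

def generate_pipeline(config_json: str) -> str:
    """Render Terraform source from a simple JSON pipeline config."""
    config = json.loads(config_json)
    return PIPELINE_TEMPLATE.substitute(
        name=config["name"], dataset=config["dataset"]
    )

print(generate_pipeline('{"name": "network_events", "dataset": "telemetry"}'))
```

Because every pipeline is rendered from the same template, naming standards and structure are enforced by construction, which is what eliminates configuration drift and copy-paste errors.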

Reliability Engineering

  • Created Kafka Connect task monitoring tool with auto-restart capabilities and circuit-breaker pattern for persistent failures
  • Reduced pipeline downtime by 95% through proactive failure detection and automated recovery
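The auto-restart-with-circuit-breaker logic can be sketched as pure decision code. In production the tool would poll the Kafka Connect REST API (`GET /connectors/<name>/status`, `POST .../tasks/<id>/restart`); here the status reports are supplied directly, and the restart threshold is an assumed example value.

```python
# Sketch of the auto-restart decision with a circuit breaker:
# transient failures trigger restarts, persistent failures open
# the circuit so the task is escalated instead of restart-looped.

class TaskMonitor:
    def __init__(self, max_restarts: int = 3):
        self.max_restarts = max_restarts  # assumed threshold
        self.failures: dict[str, int] = {}

    def handle_status(self, task_id: str, state: str) -> str:
        """Return the action to take for one task status report."""
        if state != "FAILED":
            self.failures.pop(task_id, None)  # healthy: reset the breaker
            return "noop"
        count = self.failures.get(task_id, 0) + 1
        self.failures[task_id] = count
        if count > self.max_restarts:
            return "open-circuit"  # persistent failure: stop restarting, escalate
        return "restart"

monitor = TaskMonitor(max_restarts=2)
print(monitor.handle_status("sink-0", "FAILED"))   # restart
print(monitor.handle_status("sink-0", "FAILED"))   # restart
print(monitor.handle_status("sink-0", "FAILED"))   # open-circuit
print(monitor.handle_status("sink-0", "RUNNING"))  # noop (breaker reset)
```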

Technologies: Apache Kafka, Apache Airflow, GCP BigQuery, Python, Terraform, Docker, Jinja2, IBM SevOne NPM

AWS Big Data Consultant (Contractor) | Channel 4

May 2022 – December 2022 | London, United Kingdom

Key Achievement: Architected and delivered end-to-end MarTech Analytics Platform integrating data from Facebook, Instagram, Google Ads, Snapchat, and YouTube into a unified Star Schema data warehouse.

  • Designed Star Schema data model in AWS Redshift with dimension and fact tables for multi-channel marketing analytics
  • Built ETL pipelines using AWS Glue, Lambda, and SNS for daily API data extraction and transformation
  • Implemented incremental data loading strategy reducing processing time by 70% and costs by 50%
  • Integrated Snowflake for advanced analytics and cross-platform reporting
  • Created automated data quality checks ensuring 99.9% accuracy in marketing metrics
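An incremental (high-watermark) loading strategy of the kind described can be sketched as follows; the `updated_at` column name and the in-memory rows are illustrative, standing in for the Redshift/Glue source tables.

```python
from datetime import datetime

def incremental_batch(rows, watermark):
    """Select only rows changed since the last successful load,
    and return the new watermark to persist for the next run."""
    fresh = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2022, 6, 1)},
    {"id": 2, "updated_at": datetime(2022, 6, 3)},
]
batch, wm = incremental_batch(rows, watermark=datetime(2022, 6, 2))
print([r["id"] for r in batch], wm)  # [2] 2022-06-03 00:00:00
```

Processing only the delta since the persisted watermark, rather than full daily extracts, is what drives the reported reductions in processing time and cost.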

Technologies: AWS (Redshift, Glue, Lambda, Fargate, S3, Aurora), Snowflake, Terraform, Python, Scala, Spark

Senior Associate Manager | Eli Lilly and Company

May 2021 – April 2022 | Bangalore, India

Key Achievement: Led cloud modernization initiative for Clinical Supply Platform, migrating legacy Fuse/Java application to AWS serverless architecture, reducing operational costs by 60%.

Application Modernization - eCSP

  • Architected serverless message transformation platform using AWS Lambda, Fargate, and Apache Camel
  • Designed event-driven architecture handling 1M+ messages/day with an IBM MQ → Amazon MQ → Lambda → DynamoDB flow
  • Implemented CI/CD pipeline using GitHub Actions, CodePipeline, and CloudFormation with pre-commit hooks
  • Built comprehensive XML validation framework using Python (xmltodict, xmlschema) with 100% test coverage (pytest)

ETL & Data Transformation

  • Built Azure Databricks ETL pipelines processing multi-source data (SQL, Snowflake, JSON) with PySpark
  • Implemented a dbt transformation layer ensuring data quality and business logic consistency
  • Automated cluster profile switching via Databricks API for cost optimization (40% reduction in compute costs)
  • Established data quality framework with automated integrity checks and anomaly detection

Technologies: AWS (Lambda, Fargate, MQ, DynamoDB, S3, Step Functions), Azure Databricks, Apache Camel, IBM MQ, dbt, Snowflake, Python, Terraform

Software Engineer | United Health Group – Optum

July 2019 – April 2021 | Chennai, India

Key Achievement: Served as Product Owner & Full Stack Engineer for Initiative Manager web application while simultaneously delivering Oracle-to-Spark migration for healthcare data processing.

Initiative Manager Web Application

  • Led full-stack development using Python/Django backend and React frontend serving 500+ internal users
  • Managed AWS infrastructure (EC2, CloudFormation, SNS, Alarms) with 99.95% uptime SLA
  • Implemented CI/CD pipeline using Bamboo, reducing deployment time from 2 hours to 15 minutes
  • Established monitoring and observability using New Relic, Splunk, and Sonar for proactive issue detection

Data Migration & Replication

  • Architected Oracle-to-Data Lake migration strategy processing 5TB+ of healthcare data using Spark and Scala
  • Created UDFs for complex Oracle procedures/functions, achieving 100% functional parity in Spark
  • Built data replication framework for continuous sync between on-prem Oracle and AWS S3 Data Lake
  • Developed JSON-to-flat table transformation pipelines for downstream analytics consumption
  • Designed Low-Level Design (LLD) documentation aligned with enterprise architecture standards
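The JSON-to-flat-table transformation mentioned above can be sketched with a small recursive flattener; the column separator and the sample record are illustrative, not the actual healthcare schema.

```python
def flatten(record, prefix="", sep="_"):
    """Flatten nested JSON into a single-level dict of column -> value,
    joining nested keys with a separator so they map to flat columns."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=column, sep=sep))
        else:
            flat[column] = value
    return flat

claim = {"id": 7, "patient": {"name": "A. N. Other", "plan": {"tier": "gold"}}}
print(flatten(claim))
# {'id': 7, 'patient_name': 'A. N. Other', 'patient_plan_tier': 'gold'}
```

At scale this logic would run inside a Spark UDF or DataFrame transformation rather than over single dicts, but the key-path-to-column mapping is the same.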

Technologies: Python, Django, React, Postgres, Spark, Scala, AWS (EC2, S3, EMR), Bash, Jenkins, Bamboo, New Relic, Splunk

Data Analytics Engineer | CEB (acquired by Gartner)

August 2015 – July 2019 | Chennai, India

Corporate Executive Board (CEB) was acquired by Gartner in 2017.

Key Achievement: Pioneered multiple AI/ML-powered analytics products including Email Bot, Competitors Dashboard, and Next Best Action Recommendation Engine, directly impacting sales and marketing efficiency.

Email Analytics Bot (NLP-Powered)

  • Architected NLP-based email analytics service processing 10K+ emails daily from Sales & Marketing teams
  • Implemented text mining pipeline (NLTK) extracting company names, contact info, and sentiment from email threads
  • Built automated reporting system generating daily/weekly/monthly insights (OOO notifications, leads, invalid addresses)
  • Impact: Identified 200+ qualified leads/month, reduced manual email processing by 90%

Competitors Intelligence Dashboard

  • Developed web scraping framework (Scrapy, Python) harvesting competitor data from 50+ sources
  • Created Elasticsearch + Kibana dashboard visualizing events, locations, sponsors, and market trends
  • Orchestrated Airflow workflows for daily data refresh and automated alerts
  • Integrated Google Maps API for geospatial analysis of competitor activities

Next Best Action Recommendation Model

  • Collaborated with data scientists to deliver end-to-end ML pipeline for subscriber recommendation engine
  • Built ETL framework extracting features from Hive, transforming with PySpark, and serving via Elasticsearch
  • Deployed TensorFlow model on AWS with auto-scaling, handling 100K+ predictions/day
  • Transitioned data cleansing processes from local to distributed big data environment

Additional Projects

  • ETL Service: BigQuery → Hive migration service for Google Analytics data (scheduled daily processing)
  • Data Cleansing Tool: ML-based classifier (Naive Bayes, Decision Tree) for automated data munging
  • Hackathon Winner: Honorable Mention for Android Survey App with QR code integration and real-time analytics

Technologies: Python, TensorFlow, NLTK, Scrapy, Apache Airflow, Hive, PySpark, Sqoop, Elasticsearch, Kibana, MongoDB, GCP BigQuery, AWS

Software Engineer (Automation) | Symantec – Norton

January 2014 – August 2015 | Chennai, India

Key Achievement: Built a comprehensive Python automation framework for Norton AntiVirus Mac security testing, reducing manual testing cycles from weeks to days. Submitted three patent ideas to Symantec's Patent Filter Committee.

Security Testing Automation

  • Developed BAT (Build Acceptance Testing) automation framework for Norton AntiVirus and Symantec Cloud Security
  • Automated security scenarios: port scanning, firewall validation, brute force detection, malicious URL prevention
  • Implemented license validation automation ensuring server-client parameter accuracy across network conditions
  • Built telemetry ping validation tool capturing and validating HTTP packet transmissions to servers
  • Created scheduled scan automation (daily/weekly/monthly) with virus entry log verification

Framework Development

  • Contributed to core library development: logger, core-utils, and testing wrappers adopted across teams
  • Recognized by leadership for completing license test automation ahead of schedule

Security Analyst (Master's Internship)

  • Worked with Managed Security Services (MSS) team on SIEM tools (HP ArcSight, RSA Security Analytics)
  • Monitored security events from IDS/IPS, firewalls, and endpoint protection for threat detection
  • Simulated real-time system logs using Python and Bash scripting for security testing
  • Delivered weekly/monthly security reports highlighting vulnerabilities and triggered signatures
  • Built Proof of Concept using Apache Kafka (Java) for enhanced security log processing

Technologies: Python, Bash, Unix/Mac OS, Jenkins, HP ArcSight, RSA Security Analytics, Apache Kafka, Java

Education

Amrita Vishwa Vidyapeetham

Master of Technology in Cyber Security

2014 | Coimbatore, Tamil Nadu, India

Focus: Cloud Security, Network Intrusion Detection, SIEM Technologies

University College of Engineering Tindivanam

Bachelor of Technology in Information Technology

2012 | Tindivanam, Tamil Nadu, India

Focus: Software Engineering, Database Systems, Distributed Computing

Key Achievements & Recognition

🏆 Innovation Award – Sky UK (February 2025)

Received Sky Star Award for creative innovation in automation and code generation

📝 Patent Submissions

3 patent ideas submitted to the Symantec Patent Filter Committee for innovative software solutions

🎤 Published Research Paper

"A Novel Cloud Based NIDPS for Smartphones" – SNDS'14, Springer CCIS (March 2014)

🥈 Hackathon Honorable Mention – Gartner (2016)

Android Survey App with QR code integration and real-time analytics dashboard

Leadership Recognition – Symantec

Completed all license test case automation ahead of schedule, recognized by senior leadership

Career Highlights

From Manual QA to Platform Engineering: My career is defined by a relentless drive to eliminate repetitive work through automation and innovation.

Symantec (2014-2015): Started as a Security Analyst intern working with SIEM tools, quickly transitioned to building Python automation frameworks that the entire Norton team adopted. This is where I discovered my passion for building tools that multiply team productivity.

CEB/Gartner (2015-2019): Evolved into a full-stack data engineer at Corporate Executive Board (acquired by Gartner in 2017), creating AI/ML-powered products including an Email Bot (NLP), Competitors Dashboard (web scraping + Elasticsearch), and contributing to a Next Best Action Recommendation Engine. Won hackathon recognition for innovative mobile survey application.

Optum (2019-2021): Took on dual role as Product Owner and Full Stack Engineer for Initiative Manager web application while simultaneously leading Oracle-to-Spark data migration processing terabytes of healthcare data.

Eli Lilly (2021-2022): Led cloud modernization initiatives, migrating legacy systems to AWS serverless architecture while building robust ETL pipelines in Azure Databricks.

Channel 4 (2022): Architected a complete MarTech Analytics Platform integrating multi-source advertising data into a unified Star Schema warehouse.

Sky UK (2022-2025): Built game-changing Code Generator Framework that automated entire data pipeline creation process, transforming team velocity and enabling large-scale infrastructure management through simple JSON configurations.

Marks & Spencer (2025-Present): Building customer loyalty programme analytics platform using Azure Databricks and Streamlit, transforming millions of transactions into actionable business insights.


What drives me: I don't just solve problems; I build tools that prevent entire classes of problems from existing. Every role has been about pushing the boundaries of what's possible through automation, innovation, and intelligent tooling.