Resume
My professional journey and expertise
Vijay Anand Pandian
Senior Data Engineer | Loyalty Analytics | Customer Insights
London, United Kingdom
Professional Summary
Senior Data Engineer with 10+ years of experience architecting scalable data platforms, building customer analytics solutions, and delivering cloud-native applications. Currently at Marks & Spencer working on loyalty programme analytics using Azure Databricks and building interactive Streamlit applications. Proven track record of transforming team productivity through innovative tooling—including a Python Code Generator that reduced pipeline development time from days to minutes. Expert in end-to-end data engineering, stream processing, cloud architecture, and customer analytics with a strong bias for automation and business impact.
Core Competencies
- Customer Analytics: Loyalty programme analytics, customer segmentation, behavioral insights
- Platform Engineering: Azure Databricks, Streamlit applications, self-service analytics platforms
- Data Infrastructure: Real-time streaming (Kafka), batch processing (Spark), data warehousing (BigQuery, Redshift, Snowflake)
- Cloud Architecture: Multi-cloud expertise (Azure, AWS, GCP), Infrastructure-as-Code (Terraform, CloudFormation)
- Innovation & R&D: Rapid technology adoption, POC development, automation frameworks
Technical Expertise
Languages & Frameworks
Python, Scala, Java, Bash, SQL, C#
Streamlit, Django, Flask, React, Spring Boot
Cloud Platforms
Azure (Databricks, Blob Storage, Data Factory, Synapse)
AWS (EC2, Lambda, Fargate, EMR, S3, Redshift, DynamoDB)
GCP (BigQuery, Composer, GCS, Pub/Sub)
Data Engineering
Azure Databricks, Apache Kafka, Spark, Airflow
DBT, kSQL, Kafka Connect, Sqoop
Data Warehousing
BigQuery, Snowflake, Redshift, Hive
Star Schema, Dimensional Modeling
Infrastructure & DevOps
Terraform, Docker, Kubernetes
Jenkins, GitHub Actions, Bamboo
New Relic, Splunk, CloudWatch
AI/ML & Analytics
TensorFlow, Scikit-learn, NLTK
NLP, Sentiment Analysis, LLMs
Elasticsearch, Kibana
Certifications & Publications
Apache Airflow Fundamentals
Apache Airflow DAG Authoring
Certified Scrum Product Owner (CSPO)
Published Research: "A Novel Cloud Based NIDPS for Smartphones" – International Conference on Security in Computer Networks and Distributed Systems (SNDS'14), Springer CCIS Series, Volume 420
Professional Experience
Senior Data Engineer | Marks & Spencer
May 2025 – Present | London, United Kingdom
Key Achievement: Building customer loyalty analytics platform using Azure Databricks and Streamlit, transforming millions of customer transactions into actionable insights that drive personalization and retention strategies.
Loyalty Programme Analytics
- Architecting end-to-end analytics platform for M&S loyalty programme using Azure Databricks
- Processing millions of customer transactions to generate insights on customer behavior, segmentation, and lifetime value
- Building data pipelines that integrate point-of-sale, online transaction, and customer profile data
- Implementing advanced analytics for customer churn prediction, personalized recommendations, and targeted campaigns
Interactive Analytics Applications
- Developing Streamlit applications that democratize data access for business stakeholders (sketch below)
- Creating self-service dashboards for marketing teams to explore customer segments and campaign performance
- Building real-time visualization tools for loyalty metrics, redemption rates, and customer engagement
- Impact: Empowering non-technical teams to make data-driven decisions without engineering support
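As a flavour of the Streamlit work described above, here is a minimal self-service dashboard sketch; the CSV stand-in, column names, and metrics are illustrative placeholders rather than the actual M&S data model.

```python
# Minimal self-service loyalty dashboard sketch (illustrative columns and metrics).
import pandas as pd
import streamlit as st

st.title("Loyalty Programme Overview")

@st.cache_data
def load_metrics(path: str) -> pd.DataFrame:
    # In the real platform this would read a curated (Gold-layer) table; a CSV stands in here.
    return pd.read_csv(path, parse_dates=["week"])

df = load_metrics("loyalty_weekly_metrics.csv")

segment = st.selectbox("Customer segment", sorted(df["segment"].unique()))
filtered = df[df["segment"] == segment]

st.metric("Latest redemption rate", f"{filtered['redemption_rate'].iloc[-1]:.1%}")
st.line_chart(filtered.set_index("week")["active_members"])
```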
Data Platform Engineering
- Implementing medallion architecture (Bronze, Silver, Gold layers) in Azure Databricks for scalable data processing (illustrated below)
- Optimizing Spark jobs for cost efficiency and performance at scale
- Establishing data quality frameworks ensuring accuracy in customer analytics
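A minimal sketch of the Bronze → Silver → Gold flow referenced above, assuming illustrative table and column names rather than the real schema:

```python
# Bronze -> Silver -> Gold hop in Databricks (table/column names are illustrative).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in a Databricks notebook

# Bronze: raw transactions landed as-is.
bronze = spark.read.table("loyalty.bronze_transactions")

# Silver: deduplicated, typed, and filtered.
silver = (
    bronze.dropDuplicates(["transaction_id"])
    .withColumn("amount", F.col("amount").cast("decimal(10,2)"))
    .filter(F.col("customer_id").isNotNull())
)
silver.write.format("delta").mode("overwrite").saveAsTable("loyalty.silver_transactions")

# Gold: business-level aggregate consumed by dashboards.
gold = silver.groupBy("customer_id").agg(
    F.sum("amount").alias("lifetime_value"),
    F.count("*").alias("txn_count"),
)
gold.write.format("delta").mode("overwrite").saveAsTable("loyalty.gold_customer_value")
```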
Technologies: Azure Databricks, Streamlit, PySpark, Python, Azure Blob Storage, Delta Lake, SQL, Power BI
Data Engineer | Sky UK
December 2022 – May 2025 | London, United Kingdom
Key Achievement: Architected and developed a Python Code Generator Framework that automated entire data pipeline creation, reducing development time from days to minutes and enabling the team to execute large-scale refactoring in record time.
Real-Time Analytics Platform
- Engineered Kafka streaming pipeline connecting IBM SevOne NPM to GCP BigQuery for real-time network monitoring data ingestion
- Implemented kSQL transformations and Kafka Connect for sub-second data availability
- Built a metadata extraction API service (Python) that collects device, object, and indicator data and loads it into BigQuery via GCS
Innovation: Code Generator Framework
- Developed JSON-to-Terraform code generator that creates complete pipeline infrastructure from simple configuration files (sketch below)
- Enforced code consistency, naming standards, and automated documentation generation across 100+ pipelines
- Enabled non-technical users to onboard new data sources without manual coding
- Impact: 10x faster pipeline deployment, zero configuration drift, eliminated copy-paste errors
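To make the idea concrete, here is a heavily simplified sketch of a JSON-to-Terraform generator; the Jinja2 template, config schema, and the google_bigquery_table resource are illustrative stand-ins, not the actual Sky framework.

```python
# Render Terraform pipeline resources from a JSON config (illustrative schema and template).
import json
from jinja2 import Template

PIPELINE_TEMPLATE = Template("""
resource "google_bigquery_table" "{{ name }}" {
  dataset_id = "{{ dataset }}"
  table_id   = "{{ name }}"
  labels = {
    owner = "{{ owner }}"
  }
}
""")

def render_pipelines(config_path: str) -> str:
    with open(config_path) as f:
        config = json.load(f)
    # One Terraform block per entry under "pipelines" in the JSON config.
    return "\n".join(PIPELINE_TEMPLATE.render(**p) for p in config["pipelines"])

if __name__ == "__main__":
    print(render_pipelines("pipelines.json"))
```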
Reliability Engineering
- Created Kafka Connect task monitoring tool with auto-restart capabilities and circuit-breaker pattern for persistent failures (sketch below)
- Reduced pipeline downtime by 95% through proactive failure detection and automated recovery
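A minimal sketch of the auto-restart idea, using Kafka Connect's standard REST endpoints; the Connect URL and fixed polling interval are simplifications, and the circuit-breaker logic is omitted.

```python
# Watchdog that restarts FAILED Kafka Connect tasks via the Connect REST API.
import time
import requests

CONNECT_URL = "http://localhost:8083"  # illustrative Connect worker address

def restart_failed_tasks() -> None:
    for name in requests.get(f"{CONNECT_URL}/connectors").json():
        status = requests.get(f"{CONNECT_URL}/connectors/{name}/status").json()
        for task in status.get("tasks", []):
            if task["state"] == "FAILED":
                # Standard endpoint: POST /connectors/{name}/tasks/{id}/restart
                requests.post(f"{CONNECT_URL}/connectors/{name}/tasks/{task['id']}/restart")

if __name__ == "__main__":
    while True:
        restart_failed_tasks()
        time.sleep(60)  # a production version would add backoff / circuit-breaking here
```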
Technologies: Apache Kafka, Apache Airflow, GCP BigQuery, Python, Terraform, Docker, Jinja2, IBM SevOne NPM
AWS Big Data Consultant (Contractor) | Channel 4
May 2022 – December 2022 | London, United Kingdom
Key Achievement: Architected and delivered end-to-end MarTech Analytics Platform integrating data from Facebook, Instagram, Google Ads, Snapchat, and YouTube into a unified Star Schema data warehouse.
- Designed Star Schema data model in AWS Redshift with dimension and fact tables for multi-channel marketing analytics
- Built ETL pipelines using AWS Glue, Lambda, and SNS for daily API data extraction and transformation
- Implemented incremental data loading strategy reducing processing time by 70% and costs by 50% (sketch below)
- Integrated Snowflake for advanced analytics and cross-platform reporting
- Created automated data quality checks ensuring 99.9% accuracy in marketing metrics
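A sketch of the delete-and-insert upsert step behind that incremental pattern (new rows land in a staging table first, typically via COPY); the table names, keys, and connection string are illustrative, not the Channel 4 implementation.

```python
# Stage-and-upsert pattern for daily incremental loads into Redshift (illustrative names).
import os
import psycopg2

UPSERT_SQL = """
DELETE FROM fact_ad_performance
USING staging_ad_performance s
WHERE fact_ad_performance.ad_id = s.ad_id
  AND fact_ad_performance.report_date = s.report_date;
INSERT INTO fact_ad_performance SELECT * FROM staging_ad_performance;
DELETE FROM staging_ad_performance;
"""

def run_upsert(dsn: str) -> None:
    # Remove any rows being reloaded, append the fresh batch, then clear the staging table.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(UPSERT_SQL)

if __name__ == "__main__":
    run_upsert(os.environ["REDSHIFT_DSN"])  # e.g. "dbname=martech host=... user=... password=..."
```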
Technologies: AWS (Redshift, Glue, Lambda, Fargate, S3, Aurora), Snowflake, Terraform, Python, Scala, Spark
Senior Associate Manager | Eli Lilly and Company
May 2021 – April 2022 | Bangalore, India
Key Achievement: Led cloud modernization initiative for Clinical Supply Platform, migrating legacy Fuse/Java application to AWS serverless architecture, reducing operational costs by 60%.
Application Modernization - eCSP
- Architected serverless message transformation platform using AWS Lambda, Fargate, and Apache Camel
- Designed event-driven architecture handling 1M+ messages/day with an IBM MQ → Amazon MQ → Lambda → DynamoDB flow
- Implemented CI/CD pipeline using GitHub Actions, CodePipeline, and CloudFormation with pre-commit hooks
- Built comprehensive XML validation framework using Python (xmltodict, xmlschema) with 100% test coverage (pytest)
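A condensed sketch of that validation approach with xmlschema and xmltodict; the file names and the asserted root element are illustrative.

```python
# Validate an XML message against its XSD, then parse it into a dict (illustrative files).
import xmltodict
import xmlschema

def validate_and_parse(xml_path: str, xsd_path: str) -> dict:
    schema = xmlschema.XMLSchema(xsd_path)
    schema.validate(xml_path)  # raises XMLSchemaValidationError with details if invalid
    with open(xml_path) as f:
        return xmltodict.parse(f.read())

def test_valid_message_parses():
    doc = validate_and_parse("sample_order.xml", "order.xsd")
    assert "Order" in doc  # illustrative root element
```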
ETL & Data Transformation
- Built Azure Databricks ETL pipelines processing multi-source data (SQL, Snowflake, JSON) with PySpark
- Implemented DBT transformation layer ensuring data quality and business logic consistency
- Automated cluster profile switching via Databricks API for cost optimization (40% reduction in compute costs)
- Established data quality framework with automated integrity checks and anomaly detection
Technologies: AWS (Lambda, Fargate, MQ, DynamoDB, S3, Step Functions), Azure Databricks, Apache Camel, IBM MQ, DBT, Snowflake, Python, Terraform
Software Engineer | UnitedHealth Group – Optum
July 2019 – April 2021 | Chennai, India
Key Achievement: Served as Product Owner & Full Stack Engineer for Initiative Manager web application while simultaneously delivering Oracle-to-Spark migration for healthcare data processing.
Initiative Manager Web Application
- Led full-stack development using Python/Django backend and React frontend serving 500+ internal users
- Managed AWS infrastructure (EC2, CloudFormation, SNS, CloudWatch Alarms) with 99.95% uptime SLA
- Implemented CI/CD pipeline using Bamboo, reducing deployment time from 2 hours to 15 minutes
- Established monitoring and observability using New Relic, Splunk, and Sonar for proactive issue detection
Data Migration & Replication
- Architected Oracle-to-Data Lake migration strategy processing 5TB+ of healthcare data using Spark and Scala
- Created UDFs for complex Oracle procedures/functions, achieving 100% functional parity in Spark
- Built data replication framework for continuous sync between on-prem Oracle and AWS S3 Data Lake
- Developed JSON-to-flat table transformation pipelines for downstream analytics consumption (sketch below)
- Authored Low-Level Design (LLD) documentation aligned with enterprise architecture standards
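A sketch of the JSON-to-flat-table step in PySpark; the bucket paths and claim schema are illustrative rather than the actual healthcare structures.

```python
# Flatten nested claim JSON into a tabular layout (illustrative paths and fields).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("json-flatten").getOrCreate()

raw = spark.read.json("s3://example-bucket/claims/*.json")

flat = (
    raw
    # Explode the nested array so each line item becomes its own row.
    .withColumn("line", F.explode("claim.line_items"))
    .select(
        F.col("claim.claim_id").alias("claim_id"),
        F.col("claim.member_id").alias("member_id"),
        F.col("line.code").alias("procedure_code"),
        F.col("line.amount").cast("decimal(12,2)").alias("amount"),
    )
)

flat.write.mode("overwrite").parquet("s3://example-bucket/curated/claim_lines/")
```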
Technologies: Python, Django, React, Postgres, Spark, Scala, AWS (EC2, S3, EMR), Bash, Jenkins, Bamboo, New Relic, Splunk
Data Analytics Engineer | CEB (acquired by Gartner)
August 2015 – July 2019 | Chennai, India
*Corporate Executive Board (CEB) was acquired by Gartner in 2017
Key Achievement: Pioneered multiple AI/ML-powered analytics products including Email Bot, Competitors Dashboard, and Next Best Action Recommendation Engine, directly impacting sales and marketing efficiency.
Email Analytics Bot (NLP-Powered)
- Architected NLP-based email analytics service processing 10K+ emails daily from Sales & Marketing teams
- Implemented text mining pipeline (NLTK) extracting company names, contact info, and sentiment from email threads (sketch below)
- Built automated reporting system generating daily/weekly/monthly insights (OOO notifications, leads, invalid addresses)
- Impact: Identified 200+ qualified leads/month, reduced manual email processing by 90%
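A toy sketch of the email-mining idea; regex-based address extraction plus NLTK's VADER sentiment scorer is one plausible combination, and the fields and patterns shown are illustrative rather than the production bot.

```python
# Extract addresses, score sentiment, and flag out-of-office replies (illustrative rules).
import re
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-off lexicon download
sia = SentimentIntensityAnalyzer()

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def analyse_email(body: str) -> dict:
    return {
        "addresses": EMAIL_RE.findall(body),
        "sentiment": sia.polarity_scores(body)["compound"],
        "out_of_office": "out of office" in body.lower(),
    }

print(analyse_email("Hi, I am out of office until Monday. Please contact jane.doe@example.com."))
```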
Competitors Intelligence Dashboard
- Developed web scraping framework (Scrapy, Python) harvesting competitor data from 50+ sources
- Created Elasticsearch + Kibana dashboard visualizing events, locations, sponsors, and market trends (sketch below)
- Orchestrated Airflow workflows for daily data refresh and automated alerts
- Integrated Google Maps API for geospatial analysis of competitor activities
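A short sketch of how scraped events could be pushed into Elasticsearch to feed the Kibana dashboard; the index name, fields, and local cluster URL are illustrative, and the document= keyword assumes a recent elasticsearch-py client.

```python
# Index a scraped competitor event for the Kibana dashboard (illustrative index and fields).
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def index_event(event: dict) -> None:
    doc = {
        "competitor": event["competitor"],
        "event_name": event["name"],
        "city": event.get("city"),
        "scraped_at": datetime.utcnow().isoformat(),
    }
    es.index(index="competitor-events", document=doc)

index_event({"competitor": "ExampleCo", "name": "Annual Summit", "city": "London"})
```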
Next Best Action Recommendation Model
- Collaborated with data scientists to deliver end-to-end ML pipeline for subscriber recommendation engine
- Built ETL framework extracting features from Hive, transforming with PySpark, and serving via Elasticsearch
- Deployed TensorFlow model on AWS with auto-scaling, handling 100K+ predictions/day
- Transitioned data cleansing processes from a local setup to a distributed big data environment
Additional Projects
- ETL Service: BigQuery → Hive migration service for Google Analytics data (scheduled daily processing)
- Data Cleansing Tool: ML-based classifier (Naive Bayes, Decision Tree) for automated data munging (sketch below)
- Hackathon Winner: Honorable Mention for Android Survey App with QR code integration and real-time analytics
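A toy sketch of the data-cleansing classifier mentioned above, using scikit-learn; the training values, labels, and character n-gram features are illustrative, not the original tool.

```python
# Classify free-text field values so junk records can be routed for cleansing (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

values = ["ACME Corp Ltd", "n/a", "123 Fake Street", "Globex Industries", "unknown", "-"]
labels = ["company", "junk", "address", "company", "junk", "junk"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams suit short strings
    MultinomialNB(),
)
model.fit(values, labels)

# Route anything predicted as "junk" to the cleansing queue.
print(model.predict(["Initech LLC", "N/A"]))
```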
Technologies: Python, TensorFlow, NLTK, Scrapy, Apache Airflow, Hive, PySpark, Sqoop, Elasticsearch, Kibana, MongoDB, GCP BigQuery, AWS
Software Engineer (Automation) | Symantec – Norton
January 2014 – August 2015 | Chennai, India
Key Achievement: Built comprehensive Python automation framework for Norton AntiVirus Mac security testing, reducing manual testing cycles from weeks to days. Submitted 3 patent ideas to Patent Filter Committee.
Security Testing Automation
- Developed BAT (Build Acceptance Testing) automation framework for Norton AntiVirus and Symantec Cloud Security
- Automated security scenarios: port scanning, firewall validation, brute force detection, malicious URL prevention
- Implemented license validation automation ensuring server-client parameter accuracy across network conditions
- Built telemetry ping validation tool capturing and validating HTTP packet transmissions to servers
- Created scheduled scan automation (daily/weekly/monthly) with virus entry log verification
Framework Development
- Contributed to core library development: logger, core-utils, and testing wrappers adopted across teams
- Recognized by leadership for completing license test automation ahead of schedule
Security Analyst (Master's Internship)
- Worked with Managed Security Services (MSS) team on SIEM tools (HP ArcSight, RSA Security Analytics)
- Monitored security events from IDS/IPS, firewalls, and endpoint protection for threat detection
- Simulated real-time system logs using Python and Bash scripting for security testing
- Delivered weekly/monthly security reports highlighting vulnerabilities and triggered signatures
- Built Proof of Concept using Apache Kafka (Java) for enhanced security log processing
Technologies: Python, Bash, Unix/Mac OS, Jenkins, HP ArcSight, RSA Security Analytics, Apache Kafka, Java
Education
Amrita Vishwa Vidyapeetham
Master of Technology in Cyber Security
2014 | Coimbatore, Tamil Nadu, India
Focus: Cloud Security, Network Intrusion Detection, SIEM Technologies
University College of Engineering Tindivanam
Bachelor of Technology in Information Technology
2012 | Tindivanam, Tamil Nadu, India
Focus: Software Engineering, Database Systems, Distributed Computing
Key Achievements & Recognition
Innovation Award - Sky UK (February 2025)
Received Sky Star Award for creative innovation in automation and code generation
Patent Submissions
3 patent ideas submitted to Symantec Patent Filter Committee for innovative software solutions
Published Research Paper
"A Novel Cloud Based NIDPS for Smartphones" - SNDS'14, Springer CCIS (March 2014)
Hackathon Honorable Mention - Gartner (2016)
Android Survey App with QR code integration and real-time analytics dashboard
Leadership Recognition - Symantec
Completed all license test case automation ahead of schedule, recognized by senior leadership
Career Highlights
From Manual QA to Platform Engineering: My career is defined by a relentless drive to eliminate repetitive work through automation and innovation.
Symantec (2014-2015): Started as a Security Analyst intern working with SIEM tools, quickly transitioned to building Python automation frameworks that the entire Norton team adopted. This is where I discovered my passion for building tools that multiply team productivity.
CEB/Gartner (2015-2019): Evolved into a full-stack data engineer at Corporate Executive Board (acquired by Gartner in 2017), creating AI/ML-powered products including an Email Bot (NLP), Competitors Dashboard (web scraping + Elasticsearch), and contributing to a Next Best Action Recommendation Engine. Won hackathon recognition for innovative mobile survey application.
Optum (2019-2021): Took on a dual role as Product Owner and Full Stack Engineer for the Initiative Manager web application while simultaneously leading an Oracle-to-Spark data migration processing terabytes of healthcare data.
Eli Lilly (2021-2022): Led cloud modernization initiatives, migrating legacy systems to AWS serverless architecture while building robust ETL pipelines in Azure Databricks.
Channel 4 (2022): Architected a complete MarTech Analytics Platform integrating multi-source advertising data into a unified Star Schema warehouse.
Sky UK (2022-2025): Built a game-changing Code Generator Framework that automated the entire data pipeline creation process, transforming team velocity and enabling large-scale infrastructure management through simple JSON configurations.
Marks & Spencer (2025-Present): Building customer loyalty programme analytics platform using Azure Databricks and Streamlit, transforming millions of transactions into actionable business insights.
What drives me: I don't just solve problems—I build tools that prevent entire classes of problems from existing. Every role has been about pushing the boundaries of what's possible through automation, innovation, and intelligent tooling.