Resume

My professional journey and expertise

Vijay Anand Pandian

Senior Data Engineer | Loyalty Analytics | Customer Insights

London, United Kingdom

Skills

Python · Scala · Azure · Databricks · Streamlit · AWS · GCP · Terraform · Kafka · Big Data · AI/ML · DevOps

Professional Summary

Senior Data Engineer with 10+ years of experience architecting scalable data platforms, building customer analytics solutions, and delivering cloud-native applications. Currently at Marks & Spencer, working on loyalty programme analytics with Azure Databricks and building interactive Streamlit applications. Proven track record of transforming team productivity through innovative tooling, including a Python Code Generator that reduced pipeline development time from days to minutes. Expert in end-to-end data engineering, stream processing, cloud architecture, and customer analytics, with a strong bias for automation and business impact.

Core Competencies

  • Customer Analytics: Loyalty programme analytics, customer segmentation, behavioral insights
  • Platform Engineering: Azure Databricks, Streamlit applications, self-service analytics platforms
  • Data Infrastructure: Real-time streaming (Kafka), batch processing (Spark), data warehousing (BigQuery, Redshift, Snowflake)
  • Cloud Architecture: Multi-cloud expertise (Azure, AWS, GCP), Infrastructure-as-Code (Terraform, CloudFormation)
  • Innovation & R&D: Rapid technology adoption, POC development, automation frameworks

Technical Expertise

Languages & Frameworks

Python, Scala, Java, Bash, SQL, C#

Streamlit, Django, Flask, React, Spring Boot

Cloud Platforms

Azure (Databricks, Blob Storage, Data Factory, Synapse)

AWS (EC2, Lambda, Fargate, EMR, S3, Redshift, DynamoDB)

GCP (BigQuery, Composer, GCS, Pub/Sub)

Data Engineering

Azure Databricks, Apache Kafka, Spark, Airflow

dbt, KSQL, Kafka Connect, Sqoop

Data Warehousing

BigQuery, Snowflake, Redshift, Hive

Star Schema, Dimensional Modeling

Infrastructure & DevOps

Terraform, Docker, Kubernetes

Jenkins, GitHub Actions, Bamboo

New Relic, Splunk, CloudWatch

AI/ML & Analytics

TensorFlow, Scikit-learn, NLTK

NLP, Sentiment Analysis, LLMs

Elasticsearch, Kibana

Certifications & Publications

Apache Airflow Fundamentals

Apache Airflow DAG Authoring

Certified Scrum Product Owner (CSPO)

Published Research: "A Novel Cloud Based NIDPS for Smartphones" – International Conference on Security in Computer Networks and Distributed Systems (SNDS'14), Springer CCIS Series, Volume 420

Professional Experience

Senior Data Engineer | Marks & Spencer

May 2025 – Present | London, United Kingdom

Key Achievement: Building customer loyalty analytics platform using Azure Databricks and Streamlit, transforming millions of customer transactions into actionable insights that drive personalization and retention strategies.

Loyalty Programme Analytics

  • Architecting end-to-end analytics platform for M&S loyalty programme using Azure Databricks
  • Processing millions of customer transactions to generate insights on customer behavior, segmentation, and lifetime value
  • Building data pipelines that integrate point-of-sale, online transactions, and customer profile data
  • Implementing advanced analytics for customer churn prediction, personalized recommendations, and targeted campaigns

Interactive Analytics Applications

  • Developing Streamlit applications that democratize data access for business stakeholders
  • Creating self-service dashboards for marketing teams to explore customer segments and campaign performance
  • Building real-time visualization tools for loyalty metrics, redemption rates, and customer engagement
  • Impact: Empowering non-technical teams to make data-driven decisions without engineering support

Data Platform Engineering

  • Implementing medallion architecture (Bronze, Silver, Gold layers) in Azure Databricks for scalable data processing
  • Optimizing Spark jobs for cost efficiency and performance at scale
  • Establishing data quality frameworks ensuring accuracy in customer analytics
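The medallion layering described above can be sketched in miniature. This is an illustrative sketch only: plain Python dicts stand in for Spark/Delta DataFrames, and the field names are hypothetical, not the actual M&S schema.

```python
# Minimal medallion-architecture sketch: Bronze (raw), Silver (cleaned),
# Gold (aggregated). Plain dicts stand in for Spark/Delta DataFrames.

def to_bronze(raw_rows):
    """Bronze: land raw records as-is, tagging each with its layer."""
    return [dict(row, _layer="bronze") for row in raw_rows]

def to_silver(bronze_rows):
    """Silver: drop malformed records and normalise types."""
    silver = []
    for row in bronze_rows:
        if row.get("customer_id") and row.get("amount") is not None:
            silver.append({
                "customer_id": str(row["customer_id"]),
                "amount": float(row["amount"]),
            })
    return silver

def to_gold(silver_rows):
    """Gold: aggregate spend per customer for downstream analytics."""
    totals = {}
    for row in silver_rows:
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["amount"]
    return totals

raw = [
    {"customer_id": 1, "amount": "12.50"},
    {"customer_id": 1, "amount": "7.50"},
    {"customer_id": None, "amount": "3.00"},   # malformed: dropped at Silver
]
print(to_gold(to_silver(to_bronze(raw))))  # {'1': 20.0}
```

In a real Databricks implementation each function would read and write Delta tables; the layering discipline (raw landing, validated intermediate, business-level aggregates) is the point being illustrated.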

Technologies: Azure Databricks, Streamlit, PySpark, Python, Azure Blob Storage, Delta Lake, SQL, Power BI

Data Engineer | Sky UK

December 2022 – May 2025 | London, United Kingdom

Key Achievement: Architected and developed a Python Code Generator Framework that automated entire data pipeline creation, reducing development time from days to minutes and enabling the team to execute large-scale refactoring in record time.

Real-Time Analytics Platform

  • Engineered Kafka streaming pipeline connecting IBM SevOne NPM to GCP BigQuery for real-time network monitoring data ingestion
  • Implemented KSQL transformations and Kafka Connect for sub-second data availability
  • Built metadata extraction API service (Python) that processes device/object/indicator data to BigQuery via GCS

Innovation: Code Generator Framework

  • Developed JSON-to-Terraform code generator that creates complete pipeline infrastructure from simple configuration files
  • Enforced code consistency, naming standards, and automated documentation generation across 100+ pipelines
  • Enabled non-technical users to onboard new data sources without manual coding
  • Impact: 10x faster pipeline deployment, zero configuration drift, eliminated copy-paste errors
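A JSON-to-Terraform generator of this kind can be sketched with the standard library. This is a hypothetical miniature: the real framework used Jinja2 templates and far richer resource definitions, so the template text and config fields below are illustrative placeholders.

```python
import json
from string import Template

# Illustrative Terraform template; the production framework used Jinja2
# and complete pipeline resource definitions, not just one table.
PIPELINE_TEMPLATE = Template("""\
resource "google_bigquery_table" "${name}" {
  dataset_id = "${dataset}"
  table_id   = "${name}"
}
""")

def generate_pipeline(config_json: str) -> str:
    """Render Terraform source from a simple JSON pipeline config."""
    config = json.loads(config_json)
    return PIPELINE_TEMPLATE.substitute(
        name=config["name"], dataset=config["dataset"]
    )

print(generate_pipeline('{"name": "network_events", "dataset": "telemetry"}'))
```

Because every pipeline is rendered from the same template, naming standards and structure are enforced by construction, which is what eliminates configuration drift and copy-paste errors.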

Reliability Engineering

  • Created Kafka Connect task monitoring tool with auto-restart capabilities and circuit-breaker pattern for persistent failures
  • Reduced pipeline downtime by 95% through proactive failure detection and automated recovery
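The auto-restart-with-circuit-breaker logic can be sketched as pure decision code. In production the tool would poll the Kafka Connect REST API (`GET /connectors/<name>/status`, `POST .../tasks/<id>/restart`); here the status reports are supplied directly, and the restart threshold is an assumed example value.

```python
# Sketch of the auto-restart decision with a circuit breaker:
# transient failures trigger restarts, persistent failures open
# the circuit so the task is escalated instead of restart-looped.

class TaskMonitor:
    def __init__(self, max_restarts: int = 3):
        self.max_restarts = max_restarts  # assumed threshold
        self.failures: dict[str, int] = {}

    def handle_status(self, task_id: str, state: str) -> str:
        """Return the action to take for one task status report."""
        if state != "FAILED":
            self.failures.pop(task_id, None)  # healthy: reset the breaker
            return "noop"
        count = self.failures.get(task_id, 0) + 1
        self.failures[task_id] = count
        if count > self.max_restarts:
            return "open-circuit"  # persistent failure: stop restarting, escalate
        return "restart"

monitor = TaskMonitor(max_restarts=2)
print(monitor.handle_status("sink-0", "FAILED"))   # restart
print(monitor.handle_status("sink-0", "FAILED"))   # restart
print(monitor.handle_status("sink-0", "FAILED"))   # open-circuit
print(monitor.handle_status("sink-0", "RUNNING"))  # noop (breaker reset)
```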

Technologies: Apache Kafka, Apache Airflow, GCP BigQuery, Python, Terraform, Docker, Jinja2, IBM SevOne NPM

AWS Big Data Consultant (Contractor) | Channel 4

May 2022 – December 2022 | London, United Kingdom

Key Achievement: Architected and delivered end-to-end MarTech Analytics Platform integrating data from Facebook, Instagram, Google Ads, Snapchat, and YouTube into a unified Star Schema data warehouse.

  • Designed Star Schema data model in AWS Redshift with dimension and fact tables for multi-channel marketing analytics
  • Built ETL pipelines using AWS Glue, Lambda, and SNS for daily API data extraction and transformation
  • Implemented incremental data loading strategy reducing processing time by 70% and costs by 50%
  • Integrated Snowflake for advanced analytics and cross-platform reporting
  • Created automated data quality checks ensuring 99.9% accuracy in marketing metrics
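An incremental (high-watermark) loading strategy of the kind described can be sketched as follows; the `updated_at` column name and the in-memory rows are illustrative, standing in for the Redshift/Glue source tables.

```python
from datetime import datetime

def incremental_batch(rows, watermark):
    """Select only rows changed since the last successful load,
    and return the new watermark to persist for the next run."""
    fresh = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2022, 6, 1)},
    {"id": 2, "updated_at": datetime(2022, 6, 3)},
]
batch, wm = incremental_batch(rows, watermark=datetime(2022, 6, 2))
print([r["id"] for r in batch], wm)  # [2] 2022-06-03 00:00:00
```

Processing only the delta since the persisted watermark, rather than full daily extracts, is what drives the reported reductions in processing time and cost.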

Technologies: AWS (Redshift, Glue, Lambda, Fargate, S3, Aurora), Snowflake, Terraform, Python, Scala, Spark

Senior Associate Manager | Eli Lilly and Company

May 2021 – April 2022 | Bangalore, India

Key Achievement: Led cloud modernization initiative for Clinical Supply Platform, migrating legacy Fuse/Java application to AWS serverless architecture, reducing operational costs by 60%.

Application Modernization - eCSP

  • Architected serverless message transformation platform using AWS Lambda, Fargate, and Apache Camel
  • Designed event-driven architecture handling 1M+ messages/day with an IBM MQ → Amazon MQ → Lambda → DynamoDB flow
  • Implemented CI/CD pipeline using GitHub Actions, CodePipeline, and CloudFormation with pre-commit hooks
  • Built comprehensive XML validation framework using Python (xmltodict, xmlschema) with 100% test coverage (pytest)

ETL & Data Transformation

  • Built Azure Databricks ETL pipelines processing multi-source data (SQL, Snowflake, JSON) with PySpark
  • Implemented a dbt transformation layer ensuring data quality and business logic consistency
  • Automated cluster profile switching via Databricks API for cost optimization (40% reduction in compute costs)
  • Established data quality framework with automated integrity checks and anomaly detection

Technologies: AWS (Lambda, Fargate, MQ, DynamoDB, S3, Step Functions), Azure Databricks, Apache Camel, IBM MQ, dbt, Snowflake, Python, Terraform

Software Engineer | United Health Group – Optum

July 2019 – April 2021 | Chennai, India

Key Achievement: Served as Product Owner & Full Stack Engineer for Initiative Manager web application while simultaneously delivering Oracle-to-Spark migration for healthcare data processing.

Initiative Manager Web Application

  • Led full-stack development using Python/Django backend and React frontend serving 500+ internal users
  • Managed AWS infrastructure (EC2, CloudFormation, SNS, Alarms) with 99.95% uptime SLA
  • Implemented CI/CD pipeline using Bamboo, reducing deployment time from 2 hours to 15 minutes
  • Established monitoring and observability using New Relic, Splunk, and Sonar for proactive issue detection

Data Migration & Replication

  • Architected Oracle-to-Data Lake migration strategy processing 5TB+ of healthcare data using Spark and Scala
  • Created UDFs for complex Oracle procedures/functions, achieving 100% functional parity in Spark
  • Built data replication framework for continuous sync between on-prem Oracle and AWS S3 Data Lake
  • Developed JSON-to-flat table transformation pipelines for downstream analytics consumption
  • Designed Low-Level Design (LLD) documentation aligned with enterprise architecture standards
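The JSON-to-flat-table transformation mentioned above can be sketched with a small recursive flattener; the column separator and the sample record are illustrative, not the actual healthcare schema.

```python
def flatten(record, prefix="", sep="_"):
    """Flatten nested JSON into a single-level dict of column -> value,
    joining nested keys with a separator so they map to flat columns."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=column, sep=sep))
        else:
            flat[column] = value
    return flat

claim = {"id": 7, "patient": {"name": "A. N. Other", "plan": {"tier": "gold"}}}
print(flatten(claim))
# {'id': 7, 'patient_name': 'A. N. Other', 'patient_plan_tier': 'gold'}
```

At scale this logic would run inside a Spark UDF or DataFrame transformation rather than over single dicts, but the key-path-to-column mapping is the same.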

Technologies: Python, Django, React, Postgres, Spark, Scala, AWS (EC2, S3, EMR), Bash, Jenkins, Bamboo, New Relic, Splunk

Data Analytics Engineer | CEB (acquired by Gartner)

August 2015 – July 2019 | Chennai, India

Corporate Executive Board (CEB) was acquired by Gartner in 2017.

Key Achievement: Pioneered multiple AI/ML-powered analytics products including Email Bot, Competitors Dashboard, and Next Best Action Recommendation Engine, directly impacting sales and marketing efficiency.

Email Analytics Bot (NLP-Powered)

  • Architected NLP-based email analytics service processing 10K+ emails daily from Sales & Marketing teams
  • Implemented text mining pipeline (NLTK) extracting company names, contact info, and sentiment from email threads
  • Built automated reporting system generating daily/weekly/monthly insights (OOO notifications, leads, invalid addresses)
  • Impact: Identified 200+ qualified leads/month, reduced manual email processing by 90%

Competitors Intelligence Dashboard

  • Developed web scraping framework (Scrapy, Python) harvesting competitor data from 50+ sources
  • Created Elasticsearch + Kibana dashboard visualizing events, locations, sponsors, and market trends
  • Orchestrated Airflow workflows for daily data refresh and automated alerts
  • Integrated Google Maps API for geospatial analysis of competitor activities

Next Best Action Recommendation Model

  • Collaborated with data scientists to deliver end-to-end ML pipeline for subscriber recommendation engine
  • Built ETL framework extracting features from Hive, transforming with PySpark, and serving via Elasticsearch
  • Deployed TensorFlow model on AWS with auto-scaling, handling 100K+ predictions/day
  • Transitioned data cleansing processes from local to distributed big data environment

Additional Projects

  • ETL Service: BigQuery → Hive migration service for Google Analytics data (scheduled daily processing)
  • Data Cleansing Tool: ML-based classifier (Naive Bayes, Decision Tree) for automated data munging
  • Hackathon Winner: Honorable Mention for Android Survey App with QR code integration and real-time analytics

Technologies: Python, TensorFlow, NLTK, Scrapy, Apache Airflow, Hive, PySpark, Sqoop, Elasticsearch, Kibana, MongoDB, GCP BigQuery, AWS

Software Engineer (Automation) | Symantec – Norton

January 2014 – August 2015 | Chennai, India

Key Achievement: Built a comprehensive Python automation framework for Norton AntiVirus Mac security testing, reducing manual testing cycles from weeks to days. Submitted three patent ideas to Symantec's Patent Filter Committee.

Security Testing Automation

  • Developed BAT (Build Acceptance Testing) automation framework for Norton AntiVirus and Symantec Cloud Security
  • Automated security scenarios: port scanning, firewall validation, brute force detection, malicious URL prevention
  • Implemented license validation automation ensuring server-client parameter accuracy across network conditions
  • Built telemetry ping validation tool capturing and validating HTTP packet transmissions to servers
  • Created scheduled scan automation (daily/weekly/monthly) with virus entry log verification

Framework Development

  • Contributed to core library development: logger, core-utils, and testing wrappers adopted across teams
  • Recognized by leadership for completing license test automation ahead of schedule

Security Analyst (Master's Internship)

  • Worked with Managed Security Services (MSS) team on SIEM tools (HP ArcSight, RSA Security Analytics)
  • Monitored security events from IDS/IPS, firewalls, and endpoint protection for threat detection
  • Simulated real-time system logs using Python and Bash scripting for security testing
  • Delivered weekly/monthly security reports highlighting vulnerabilities and triggered signatures
  • Built Proof of Concept using Apache Kafka (Java) for enhanced security log processing

Technologies: Python, Bash, Unix/Mac OS, Jenkins, HP ArcSight, RSA Security Analytics, Apache Kafka, Java

Education

Amrita Vishwa Vidyapeetham

Master of Technology in Cyber Security

2014 | Coimbatore, Tamil Nadu, India

Focus: Cloud Security, Network Intrusion Detection, SIEM Technologies

University College of Engineering Tindivanam

Bachelor of Technology in Information Technology

2012 | Tindivanam, Tamil Nadu, India

Focus: Software Engineering, Database Systems, Distributed Computing

Key Achievements & Recognition

🏆 Innovation Award – Sky UK (February 2025)

Received Sky Star Award for creative innovation in automation and code generation

📝 Patent Submissions

3 patent ideas submitted to the Symantec Patent Filter Committee for innovative software solutions

🎤 Published Research Paper

"A Novel Cloud Based NIDPS for Smartphones" – SNDS'14, Springer CCIS (March 2014)

🥈 Hackathon Honorable Mention – Gartner (2016)

Android Survey App with QR code integration and real-time analytics dashboard

Leadership Recognition – Symantec

Completed all license test case automation ahead of schedule, recognized by senior leadership

Career Highlights

From Manual QA to Platform Engineering: My career is defined by a relentless drive to eliminate repetitive work through automation and innovation.

Symantec (2014-2015): Started as a Security Analyst intern working with SIEM tools, quickly transitioned to building Python automation frameworks that the entire Norton team adopted. This is where I discovered my passion for building tools that multiply team productivity.

CEB/Gartner (2015-2019): Evolved into a full-stack data engineer at Corporate Executive Board (acquired by Gartner in 2017), creating AI/ML-powered products including an Email Bot (NLP), Competitors Dashboard (web scraping + Elasticsearch), and contributing to a Next Best Action Recommendation Engine. Won hackathon recognition for innovative mobile survey application.

Optum (2019-2021): Took on dual role as Product Owner and Full Stack Engineer for Initiative Manager web application while simultaneously leading Oracle-to-Spark data migration processing terabytes of healthcare data.

Eli Lilly (2021-2022): Led cloud modernization initiatives, migrating legacy systems to AWS serverless architecture while building robust ETL pipelines in Azure Databricks.

Channel 4 (2022): Architected a complete MarTech Analytics Platform integrating multi-source advertising data into a unified Star Schema warehouse.

Sky UK (2022-2025): Built game-changing Code Generator Framework that automated entire data pipeline creation process, transforming team velocity and enabling large-scale infrastructure management through simple JSON configurations.

Marks & Spencer (2025-Present): Building customer loyalty programme analytics platform using Azure Databricks and Streamlit, transforming millions of transactions into actionable business insights.


What drives me: I don't just solve problems; I build tools that prevent entire classes of problems from existing. Every role has been about pushing the boundaries of what's possible through automation, innovation, and intelligent tooling.