Resume
Alan Liang
Staff Platform Engineer
Specialising in AWS, DevOps, and Kubernetes solutions
Contact Information
Professional Summary
Staff Platform Engineer with six years of experience in cloud infrastructure, DevOps practices, and enterprise-scale platform engineering. Specialised in AWS services, Kubernetes orchestration, and building developer-centric platforms that improve productivity and reduce operational overhead.
Professional Experience
Staff Platform Engineer, Network Monitoring Observability
Contributed to the development of the bank's Network Monitoring Observability (NMO) platform, a cloud-native, containerised, and scalable stack built on Grafana Mimir, Telegraf, and various exporters.
Key Responsibilities & Achievements:
- Led the design and implementation of a greenfield NetFlow ingestion and observability feature, expanding the NMO platform to provide new network flow analysis capabilities.
- Designed the end-to-end data pipeline, using Telegraf to collect NetFlow data and stream it to an S3 data lake, with ClickHouse providing high-performance analysis.
- Architected the infrastructure and deployment pipeline on AWS and Kubernetes, applying GitOps principles with Argo CD to manage all containerised services.
- Engineered and optimised the ClickHouse database to achieve sub-second query response times on terabytes of NetFlow data.
- Built CI/CD pipelines for Infrastructure as Code (IaC) using GitHub Actions, saving 1 hour of manual operations per week.
- Developed a Helm template and diff-check pipeline that renders raw Kubernetes manifests and validates changes to them before deployment to clusters, reducing unexpected changes and failed deployments.
- Integrated ChatIT, an MS Teams Copilot AI chatbot, with product documentation to improve tenant support and user experience.
- Conducted two knowledge-sharing sessions within the first three months on the team.
- Mentored two engineers in the group from outside my domain (API Gateway/Risk).
Impact & Results:
- Improved observability of network devices through NetFlow analysis, significantly reducing MTTR.
Senior Platform Engineer, Public Cloud Container Services
Led enterprise-scale platform engineering initiatives, focusing on AWS migration, Kubernetes platform development, and DevOps transformation.
Key Responsibilities & Achievements:
- Built and managed 100+ production-grade EKS clusters, including patching, vulnerability management, upgrades, and support.
- Engineered and scaled a monitoring stack across 100+ EKS clusters, meeting a 99.95% uptime SLA by implementing highly available Prometheus with a centralised Thanos/AMP backend.
- Enhanced developer experience by releasing self-service Argo CD deployments and reducing image retrieval time by up to 3 hours through migration to ECR Pull Through Cache.
- Led key platform improvements, including the PodSecurityPolicy (PSP) migration, Kyverno security guardrails, and a secure EFS cross-account feature for critical clients.
- Championed customer success by resolving 88 tenant queries, developing the EFS cross-account feature, building a feature request dashboard, and leading a documentation uplift that enabled users to self-serve.
Impact & Results:
- Achieved significant AWS cost reductions, including up to 58.33% weekly savings by integrating an auto-shutdown solution for EKS nodes and pioneering the use of Spot and Graviton instances.
- Ensured 99.95% uptime SLA for 100+ EKS clusters.
- Reduced image retrieval time by up to 3 hours through migration to ECR Pull Through Cache.
- Resolved 88 tenant queries.
Systems Engineer, Public Cloud Container Services
Focused on building a scalable Kubernetes platform for the group.
Key Responsibilities & Achievements:
- Implemented Infrastructure as Code practices using Terraform and CloudFormation.
- Built automated deployment pipelines, reducing manual deployment effort by 80%.
- Established monitoring and alerting systems for production applications.
Impact & Results:
- Improved deployment frequency from monthly to daily releases.
- Reduced infrastructure provisioning time from days to hours.
Cloud Support Engineer (II), Containers
Focused on providing world-class support to large enterprise organisations running containerised workloads at scale on AWS services such as EKS and ECS.
Key Responsibilities & Achievements:
- Assisted customers with a wide range of container-related challenges across AWS services like EKS, ECS, and ECR.
- Advised customers on container technologies, offering solution recommendations and architectural guidance for deploying applications on AWS.
- Created proof of concepts and replicated customer issues to accelerate resolution and analysis.
- Provided timely and effective support through various channels (phone, live-chat, email), addressing technical challenges related to containers and CI/CD.
- Participated in the recruitment process by interviewing candidates for Cloud Support Engineer roles.
- Developed a Python-based tool to automate the collection and analysis of employee performance data, providing valuable insights.
Impact & Results:
- Reduced the time required to analyse team performance and backlog from days to minutes by automating the collection and analysis of employee performance data with a Python-based tool.
Integration Engineer, Business Support Systems
Focused on building Ericsson's world-first Business Support System (BSS) built on top of Kubernetes and integrating it into Telstra's ecosystem.
Key Responsibilities & Achievements:
- Leveraged Kubernetes, Docker, and agile methodologies to ensure efficient and reliable releases on cloud platforms.
- Automated testing and deployment processes, streamlining software delivery.
- Proactively identified and resolved operational issues and critical defects, minimising downtime.
- Enhanced Ericsson product functionality and integration by developing REST APIs using Velocity JavaScript.
- Created Python and Ansible scripts to automate tasks like backup/restore, infrastructure management, and environment monitoring.
- Worked with customers and internal teams on production deployments, minimising disruptions and conducting knowledge transfer sessions.
- Contributed to the design and optimisation of environment clusters to ensure high availability and performance.
Impact & Results:
- Delivered the project on schedule, meeting all committed dates.
Technical Skills
Cloud Platforms
AWS
EC2, EKS, RDS, Lambda, CloudFormation, IAM, VPC, Route 53
Azure
AKS, Azure DevOps, ARM Templates
Google Cloud
GKE, Cloud Build, Cloud Functions
Container Orchestration
Kubernetes
Cluster management, RBAC, networking, storage, operators
Docker
Containerisation, multi-stage builds, registry management
Amazon ECS
Task definitions, service management, auto-scaling
Infrastructure as Code
Terraform
Module development, state management, enterprise patterns
CloudFormation
Template development, stack management
Crossplane
Infrastructure management via the Kubernetes API
CI/CD & GitOps
GitHub Actions
Workflow automation, custom actions, enterprise deployment
Argo CD
GitOps deployment, application management, multi-cluster setup
AWS CodeBuild
Pipeline development, custom build environments, enterprise integration
Monitoring & Observability
Prometheus/Thanos/Alertmanager
Metrics collection, alerting rules, service discovery
Grafana/Mimir
Dashboard development, data source integration
Fluent Bit/Observe/Splunk
APM, infrastructure monitoring, log management
Telegraf
Custom metrics collection, custom processors, data lakes
Programming Languages
Python
Automation scripts, API development, data processing
Go
CLI tools, microservices, Kubernetes operators
Bash
System administration, deployment scripts
Education
Bachelor of Engineering (Honours) & Bachelor of Information Technology (IT)
- Relevant coursework in electrical engineering, software engineering, and computer networks
- Final-year thesis on building an LLM for vehicle detection
Certifications
AWS Certified Solutions Architect - Associate
Certified Kubernetes Application Developer (CKAD)
Certified Kubernetes Administrator (CKA)
Kubernetes and Cloud Native Associate (KCNA)
Projects & Contributions
bAIwatch: Real-Time Car Park Occupancy System
Active contributor to Kubernetes ecosystem tools and Terraform modules, focusing on platform engineering solutions and developer productivity improvements.
Technical Writing
Regular blog posts on platform engineering, AWS, and Kubernetes best practices, sharing knowledge and experiences with the broader tech community.
Community Involvement
Speaker at local DevOps meetups and cloud computing events, contributing to knowledge sharing and professional development in the community.