Joan Marc Riera Duocastella - Infrastructure and Platform Engineering Manager

About

Technical Program Manager with 15+ years leading large-scale infrastructure programs across bioinformatics, enterprise IT, and supercomputing environments. Currently at EMBL-EBI managing 120+PB of storage infrastructure growing by 20PB annually, with a proven track record of delivering multi-million euro projects from concept to production.

Based in the UK with roots in Catalonia, I specialise in:

Petabyte-scale storage systems - Led a 6-year program scaling the FIRE archive from prototype to 120+PB in production (growing 20PB/year), achieving 99.9% uptime during the pandemic while other services experienced weeks of downtime
Network infrastructure modernization - Drove the evolution of the EBI network from 20Gbps to 100Gbps with JISC, introduced NIC bonding, and migrated from 1Gbps links to 2x25Gbps bonded links
Infrastructure automation - Led a multi-year transformation of an unreliable data-transfer platform from daily incidents to 99.9% uptime through cross-team collaboration, achieving the first sustained week of uptime within the initial months (something that had not been possible for years)
Cross-functional program leadership - Led an institute-wide IAM transformation (15+ departments, 2,000+ users), a tape tender and migration (consolidated 4 LTO drive types, retired Spectralogic, saved millions), and data centre migrations using VLAN stretching
Enterprise monitoring systems - Deployed distributed CheckMK across 3,500+ servers with 100,000+ monitoring points, reducing incident detection time by 60%

I excel at translating complex technical requirements into measurable business outcomes, managing stakeholder expectations across technical and executive audiences, and delivering mission-critical infrastructure that directly enables groundbreaking scientific research impacting millions worldwide.

120+ Petabytes Managed

15+ Years Experience

3,500+ Servers Monitored

100K+ Monitoring Points

Professional Experience

2016 - Present

SDO Service & Data Technical Coordinator

EMBL-EBI (European Bioinformatics Institute) • 9 Years

Lead cross-functional technical programs delivering mission-critical infrastructure for Europe's leading bioinformatics research institute, managing €15M+ in infrastructure investments.

Program Leadership: Coordinate 5+ concurrent infrastructure programs across storage, networking, monitoring, and security teams (20+ engineers)
Stakeholder Management: Present quarterly roadmaps to executive leadership and external funding bodies, securing continued investment
Risk Management: Reduced system downtime from daily incidents to 99.9% uptime through proactive monitoring and automation programs
Budget Management: Delivered all major projects on time and within budget, achieving a 40% reduction in per-TB storage costs
Process Improvement: Introduced ITIL frameworks and centralized monitoring (CheckMK) across 1,000+ hosts, improving incident response time by 60%

2010 - 2016

Senior Systems Engineer & Team Lead

Atos | Bull • Enterprise IT & Supercomputing

Led technical teams delivering infrastructure services to major clients, including the Barcelona Supercomputing Centre, while advancing through progressively senior leadership roles, recognized by selection for the Atos "Talent Pool" and a "Top Contributor" award at Bull.

Client Delivery: Managed infrastructure for 10+ enterprise clients (Quadis, Arag, Estevanell Energia, TMB, Avertis, Barcelona Supercomputing Centre)
Team Leadership: Led engineering teams of 5-10 members across mixed environments (Linux, Windows, AIX, macOS)
Recognition: Selected for Atos "Talent Pool" program; awarded "Top Contributor" at Bull for exceptional technical delivery
Technical Excellence: Designed enterprise-scale solutions, mentored junior engineers, and delivered projects on time and within budget

2007 - 2010

Systems Engineer

Eurecat (Technology Centre of Catalonia)

Provided technical infrastructure support for research teams, managing heterogeneous IT environments and leading a small engineering team.

Team Leadership: Led a team of 3 engineers supporting research infrastructure
Multi-platform Expertise: Managed Linux, Windows, and macOS environments
Research Support: Enabled technical research through reliable infrastructure delivery

Program Impact Metrics

Key performance indicators across 15+ years of infrastructure program delivery

Infrastructure Scale

120PB

Current FIRE Size

Growing 20PB annually

3,500+

Servers Monitored

Distributed CheckMK deployment

100K+

Monitoring Points

Services across infrastructure

Operational Excellence

99.9%

Uptime SLA

From ~80% baseline

85%

Manual Work Reduced

Through automation initiatives

60%

Faster MTTD

Mean Time To Detection improvement

Cost & Efficiency

40%

Cost Reduction

Per-TB storage costs

€2M+

Cost Avoidance

vs. commercial alternatives

1,500+

Hours Saved/Year

Process automation & efficiency

Team & Stakeholders

15+

Peak Team Size

Direct & indirect leadership at EBI

15+

Departments Coordinated

IAM transformation program

10+

Enterprise Clients

Atos/Bull period (BSC, TMB, others)

Technical Expertise

Infrastructure & Systems

Enterprise Linux Administration
High-Availability System Design
Storage Systems (15PB → 100PB+)
Data Centre Operations
Disaster Recovery Planning

Platform Engineering

Scalability Architecture
Automation & Orchestration
Container Technologies
DevOps Practices
Infrastructure as Code

Data Management

Petabyte-scale Storage
Data Archiving Solutions
Transfer Services
Erasure Coding
Geo-distributed Replication

Security & Access

Identity & Access Management
Multi-factor Authentication
Security Architecture
Compliance Management
Risk Assessment

Key Achievements

Petabyte-Scale Infrastructure Program

Impact: Led 5-year infrastructure scaling program from 15PB to 100+PB (600% growth), supporting 2,000+ scientists across 50+ countries. Reduced per-TB costs by 40% while maintaining 99.9% uptime SLA.

100 PB Milestone Article

Platform Automation Program

Impact: Led a multi-year transformation of the data-transfer platform from daily incidents to 99.9% uptime through cross-team collaboration and stakeholder engagement, reducing manual intervention by 85% and saving 500+ engineering hours annually. An early breakthrough delivered the platform's first sustained week of uptime, something that had not been possible for years.

Technical Paper (IEEE MSST 2020)

Cross-Functional IAM Transformation

Impact: Coordinated 15+ departments (including IT, HR, and Security) to implement an institute-wide IAM overhaul serving 2,000+ users. Automated joiner/leaver processes, saving 500+ hours annually and improving security compliance by centralizing account lifecycle management.

Infrastructure Monitoring Initiative

Impact: Led the deployment of distributed CheckMK monitoring across 1,000+ hosts in multiple data centres. Reduced mean time to detection (MTTD) by 60% and enabled proactive incident prevention, contributing to the 99.9% uptime achievement.

FIRE Archive: Research to Production

Impact: Managed the FIRE (File Replication) archive system from research prototype to production scale (100+PB), delivering vendor-agnostic, geo-distributed replication serving users in 50+ countries and achieving €2M+ in cost savings vs. commercial alternatives. The architecture is documented in a published IEEE paper.

FIRE Archive Evolution Technical Paper (IEEE MSST 2020)

Data Centre Migration Program

Impact: Persuaded the network teams to adopt VLAN stretching for data centre migrations, fundamentally changing the migration approach. This eliminated the need for coordination meetings (previously scheduled via Doodle polls), saving 1,000+ hours of cross-team coordination time.

Program Case Studies

Deep dives into major programs demonstrating end-to-end delivery, from requirements to measurable impact

FIRE Archive System: Research to Production

2017 - 2022 • 5 years

Challenge

EMBL-EBI needed a cost-effective, vendor-agnostic archive solution to replace expensive commercial tape systems while scaling to support exponential growth in research data. The system had to meet the highest reliability standards while operating at petabyte scale.

Approach

Led a cross-functional team of 8 engineers to design an erasure-coded, geo-distributed architecture
Managed stakeholder expectations across research teams, IT operations, funding bodies, and executive leadership
Implemented phased rollout: research prototype → pilot deployment → full production scale
Coordinated multi-site infrastructure buildout with network and data centre teams
Established operational procedures and monitoring for 24/7 production reliability

Impact & Results

120+PB current production capacity (growing 20PB/year)
99.9% uptime during COVID-19 pandemic (while other services had weeks of downtime)
€2M+ annual cost savings vs. commercial tape alternatives
50+ countries served globally
IEEE publication documenting technical architecture (MSST 2020)
Zero data loss incidents since production launch

Research to Production • Cross-functional Leadership • Risk Management • Stakeholder Communication

Institute-Wide IAM Transformation

2022 - 2025 • 3 years

Challenge

Decentralized account management created security risks and operational inefficiency. Accounts were created ad hoc across departments, with no systematic joiner/leaver process or HR integration.

Approach

Coordinated 15+ departments (HR, IT, Research, Security) to define unified requirements
Led technical design sessions balancing security needs with user experience
Managed the vendor selection process and integration roadmap
Implemented a change management program to drive adoption across 2,000+ users
Acted as primary liaison between HR and IT, translating business needs into technical solutions

Impact & Results

2,000+ users migrated to centralized IAM
15+ departments coordinated
500+ hours saved annually in manual processes
100% joiner/leaver automation achieved
Zero security incidents post-implementation

Multi-stakeholder Coordination • Change Management • Security Compliance • Process Automation

Transfer Platform Reliability Program

2016 - 2019 • 3 years

Challenge

Inherited a data-transfer platform with years of reliability issues and daily incidents. The previous administrator had been unable to resolve its systemic problems, resulting in researcher frustration and disrupted workflows.

Approach

Analyzed years of incident patterns to identify root causes and dependency chains across teams
Built the business case for automation investment and presented it to executive leadership
Led cross-team collaboration to help dependent teams understand the service's impact on end users
Sequenced improvements across multiple dependencies and teams over several years
Developed automated monitoring, alerting, and self-healing capabilities
Established SLA framework and incident response procedures

Impact & Results

1 week of continuous uptime achieved in the initial phase (a breakthrough after years of daily incidents)
99.9% uptime achieved over multi-year program through sustained cross-team collaboration
85% reduction in manual interventions
7 years later, transfer services are no longer a problem for anyone
Network upgrade driven: the resulting traffic growth necessitated upgrading the EBI network from 20Gbps to 100Gbps with JISC

Automation Strategy • SLA Management • Stakeholder Trust Building • Long-term Impact

Tape Infrastructure Consolidation & S3 Migration

2019 - 2022 • 3 years (Tender 2019-2020, Migration 2020-2022)

Challenge

Complex tape infrastructure holding 2 redundant FIRE replicas across Spectralogic and IBM systems, using 4 different, incompatible LTO drive types. The legacy system had to read an entire tape to retrieve a single file and depended on multiple servers, a Lustre filesystem, and an Oracle database, making it operationally expensive and inefficient.

Approach

Conducted a comprehensive tender process for a next-generation tape solution
Evaluated S3-compatible tape interfaces capable of selective file retrieval
Led vendor selection balancing cost, performance, and operational simplicity
Designed migration strategy to consolidate infrastructure while maintaining redundancy
Managed the 3-year migration from the legacy Lustre/Oracle system to a modern S3 tape interface
Coordinated retirement of the Spectralogic system and consolidation of 4 LTO drive types into 1

Impact & Results

Multi-million € cost reduction through infrastructure consolidation
4 → 1 LTO tape drive types consolidated (standardized on IBM)
Spectralogic retirement simplified vendor management
Single-file retrieval instead of full-tape reads (major efficiency gain)
Infrastructure simplification eliminated Lustre, Oracle, and multiple servers
S3-compatible interface providing a modern API and cloud-like access patterns

Tender Management • Vendor Evaluation • Cost Optimization • Infrastructure Modernization

Program Timeline

9 years of parallel program delivery at EMBL-EBI (2016 to 2025)

FIRE Archive System: 120PB Production

IAM Transformation: 2,000+ Users

Transfer Platform Automation: 99.9% Uptime

Monitoring Infrastructure: 3,500+ Hosts, 100K+ Points

Data Centre Migration (VLAN Stretching): 1,000+ Hours Saved

ITIL Process Implementation: Institute-wide

Tape Tender & S3 Migration: Multi-million Cost Savings

Research to Production • Cross-functional Coordination • Operational Excellence • Infrastructure Modernization

Professional Endorsements

Recognition from senior scientists and technical leaders

"Marc's technical leadership has been instrumental in delivering mission-critical infrastructure that enables groundbreaking research across the European bioinformatics community."

Senior Scientist, EMBL-EBI

"Marc demonstrates exceptional ability to coordinate complex technical programs across multiple stakeholder groups while maintaining focus on measurable outcomes and delivery excellence."

Group Leader, EMBL-EBI

Detailed professional reference letters are available in the Documents section below

Professional Development & Recognition

Continuous learning and industry recognition

Executive Education

Chief Technology Officer Program

Judge Business School, University of Cambridge

Technology leadership, strategic planning, executive decision-making

Strategic Thinking for the CXO

Judge Business School, University of Cambridge

Strategic thinking, executive leadership, organizational transformation

Steering Complex Projects

Professional Development Program

Large-scale program management, stakeholder coordination, risk management

Industry Recognition

Talent Pool Selection

Atos (Global IT Services)

Selected for high-potential talent development program recognizing exceptional performance and leadership potential

Top Contributor Award

Bull (Enterprise IT & Supercomputing)

Awarded for outstanding technical contributions and project delivery excellence

Get In Touch

Ready to discuss your next infrastructure challenge?

I'm always interested in exploring opportunities where I can leverage my expertise in platform engineering and large-scale infrastructure to drive innovation and efficiency.

Email: marc@riera.co.uk

LinkedIn: linkedin.com/in/joanmarcriera

GitHub: github.com/joanmarcriera

Location

Based in the United Kingdom

Originally from Catalonia, Spain

Open to remote and hybrid opportunities