Joan Marc Riera Duocastella

Infrastructure and Platform Engineering Manager

Scaling bioinformatics infrastructure from petabytes to exabytes • EMBL-EBI

About

Technical Program Manager with 15+ years leading large-scale infrastructure programs across bioinformatics, enterprise IT, and supercomputing environments. Currently at EMBL-EBI managing 120+PB of storage infrastructure growing at 20PB annually, with a proven track record of delivering multi-million euro projects from concept to production.

Based in the UK with roots in Catalonia, I specialise in:

  • Petabyte-scale storage systems - Led 6-year program scaling FIRE archive from prototype to 120+PB production (growing 20PB/year), achieving 99.9% uptime during pandemic while other services experienced weeks of downtime
  • Network infrastructure modernization - Drove EBI network evolution from 20Gbps to 100Gbps with JISC, introduced NIC bonding, migrated from 1Gbps to 2x25Gbps bonded links
  • Infrastructure automation - Led multi-year transformation of an unreliable transfer platform from daily incidents to 99.9% uptime through cross-team collaboration, reaching the first sustained week of uptime within the initial months (something not achieved in the preceding years)
  • Cross-functional program leadership - Led institute-wide IAM transformation (15+ departments, 2,000+ users), tape tender/migration (consolidated 4 LTO drive types, retired Spectralogic, saved millions), data centre migrations using VLAN stretching
  • Enterprise monitoring systems - Deployed distributed CheckMK across 3,500+ servers with 100,000+ monitoring points, reducing incident detection time by 60%

I excel at translating complex technical requirements into measurable business outcomes, managing stakeholder expectations across technical and executive audiences, and delivering mission-critical infrastructure that directly enables groundbreaking scientific research impacting millions worldwide.

120+ Petabytes Managed
15+ Years Experience
3,500+ Servers Monitored
100K+ Monitoring Points

Professional Experience

2016 - Present

SDO Service & Data Technical Coordinator

EMBL-EBI (European Bioinformatics Institute) • 9 Years

Lead cross-functional technical programs delivering mission-critical infrastructure for Europe's leading bioinformatics research institute, managing €15M+ in infrastructure investments.

  • Program Leadership: Coordinate 5+ concurrent infrastructure programs across storage, networking, monitoring, and security teams (20+ engineers)
  • Stakeholder Management: Present quarterly roadmaps to executive leadership and external funding bodies, securing continued investment
  • Risk Management: Moved services from daily incidents to 99.9% uptime through proactive monitoring and automation programs
  • Budget Management: Delivered all major projects on time and within budget, achieving a 40% reduction in per-TB storage costs
  • Process Improvement: Introduced ITIL frameworks and centralized monitoring (CheckMK) across 1,000+ hosts, improving incident response time by 60%

2010 - 2016

Senior Systems Engineer & Team Lead

Atos | Bull • Enterprise IT & Supercomputing

Led technical teams delivering infrastructure services to major clients, including Barcelona Supercomputing Centre, while advancing through progressively senior leadership roles. Selected for the "Talent Pool" at Atos and received the "Top Contributor" award at Bull.

  • Client Delivery: Managed infrastructure for 10+ enterprise clients (Quadis, Arag, Estevanell Energia, TMB, Avertis, Barcelona Supercomputing Centre)
  • Team Leadership: Led engineering teams of 5-10 members across mixed environments (Linux, Windows, AIX, macOS)
  • Recognition: Selected for Atos "Talent Pool" program; awarded "Top Contributor" at Bull for exceptional technical delivery
  • Technical Excellence: Designed enterprise-scale solutions, mentored junior engineers, delivered projects on time and within budget

2007 - 2010

Systems Engineer

Eurecat (Technology Centre of Catalonia)

Provided technical infrastructure support for research teams, managing heterogeneous IT environments and leading a small engineering team.

  • Team Leadership: Led team of 3 engineers supporting research infrastructure
  • Multi-platform Expertise: Managed Linux, Windows, and macOS environments
  • Research Support: Enabled technical research through reliable infrastructure delivery

Program Impact Metrics

Key performance indicators across 15+ years of infrastructure program delivery

Infrastructure Scale

  • Current FIRE Size: 120PB (growing 20PB annually)
  • Servers Monitored: 3,500+ (distributed CheckMK deployment)
  • Monitoring Points: 100K+ (services across infrastructure)

Operational Excellence

  • Uptime SLA: 99.9% (from ~80% baseline)
  • Manual Work Reduced: 85% (through automation initiatives)
  • Faster MTTD: 60% (Mean Time To Detection improvement)
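
For context, an uptime percentage can be translated into a yearly downtime budget. The following is a minimal sketch, assuming a 365-day year and continuous (24/7) measurement; the function name is illustrative, and how uptime was actually measured for these services is not stated here.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, service measured 24/7


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime (in hours) implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    # Compare the approximate baseline with the target quoted above.
    for pct in (80.0, 99.9):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.9% uptime corresponds to roughly 8.8 hours of downtime per year, compared with roughly 1,750 hours at an 80% baseline.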

Cost & Efficiency

  • Cost Reduction: 40% (per-TB storage costs)
  • Cost Avoidance: €2M+ (vs. commercial alternatives)
  • Hours Saved/Year: 1,500+ (process automation & efficiency)

Team & Stakeholders

  • Peak Team Size: 15+ (direct & indirect leadership at EBI)
  • Departments Coordinated: 15+ (IAM transformation program)
  • Enterprise Clients: 10+ (Atos/Bull period: BSC, TMB, others)

Technical Expertise

Infrastructure & Systems

  • Enterprise Linux Administration
  • High-Availability System Design
  • Storage Systems (15PB → 100PB+)
  • Data Centre Operations
  • Disaster Recovery Planning

Platform Engineering

  • Scalability Architecture
  • Automation & Orchestration
  • Container Technologies
  • DevOps Practices
  • Infrastructure as Code

Data Management

  • Petabyte-scale Storage
  • Data Archiving Solutions
  • Transfer Services
  • Erasure Coding
  • Geo-distributed Replication

Security & Access

  • Identity & Access Management
  • Multi-factor Authentication
  • Security Architecture
  • Compliance Management
  • Risk Assessment

Key Achievements

Petabyte-Scale Infrastructure Program

Impact: Led 5-year infrastructure scaling program from 15PB to 100+PB (600% growth), supporting 2,000+ scientists across 50+ countries. Reduced per-TB costs by 40% while maintaining 99.9% uptime SLA.

Platform Automation Program

Impact: Led multi-year transformation of data-transfer platform from daily incidents to 99.9% uptime through cross-team collaboration and stakeholder engagement, reducing manual intervention by 85% and saving 500+ engineering hours annually. The initial breakthrough was the first sustained week of uptime, which had not been achieved in the preceding years.

Cross-Functional IAM Transformation

Impact: Coordinated 15+ departments (IT, HR, Security) to implement institute-wide IAM overhaul serving 2,000+ users. Automated joiner/leaver processes saving 500+ hours annually while improving security compliance by centralizing account lifecycle management.

Infrastructure Monitoring Initiative

Impact: Led deployment of distributed CheckMK monitoring across 1,000+ hosts and multiple data centres. Reduced mean time to detection (MTTD) by 60% and enabled proactive incident prevention, contributing to the 99.9% uptime achievement.

FIRE Archive: Research to Production

Impact: Managed FIRE (File Replication) archive system from research prototype to production scale (100+PB). Delivered vendor-agnostic, geo-distributed replication serving 50+ countries, published an IEEE paper on the architecture, and achieved €2M+ cost savings vs. commercial alternatives.

Data Centre Migration Program

Impact: Persuaded network teams to adopt VLAN stretching for data centre migrations, which changed how migrations were carried out. This removed the need for coordination meetings (previously scheduled via Doodle polls), saving 1,000+ hours of cross-team coordination time.

Program Case Studies

Deep dives into major programs demonstrating end-to-end delivery, from requirements to measurable impact

FIRE Archive System: Research to Production

2017 - 2022 • 5 years

Challenge

EMBL-EBI needed a cost-effective, vendor-agnostic archive solution to replace expensive commercial tape systems while scaling to support exponential research data growth. The system had to meet the highest reliability standards while operating at petabyte scale.

Approach

  • Led cross-functional team of 8 engineers to design erasure-coded, geo-distributed architecture
  • Managed stakeholder expectations across research teams, IT operations, funding bodies, and executive leadership
  • Implemented phased rollout: research prototype → pilot deployment → full production scale
  • Coordinated multi-site infrastructure buildout with network and data centre teams
  • Established operational procedures and monitoring for 24/7 production reliability
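
To illustrate why an erasure-coded design can be cost-effective at this scale, the sketch below compares the raw capacity needed to hold the same usable data under full-copy replication and under a k+m erasure code. The shard counts (8 data + 3 parity), the 3-copy baseline, and the 100PB figure are hypothetical values chosen for illustration; they are not a description of FIRE's actual layout.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per usable byte when keeping full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def erasure_coding_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per usable byte for a k+m erasure code."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need data_shards >= 1 and parity_shards >= 0")
    return (data_shards + parity_shards) / data_shards


def raw_capacity_pb(usable_pb: float, overhead: float) -> float:
    """Raw capacity (PB) required to store usable_pb at a given overhead."""
    return usable_pb * overhead


if __name__ == "__main__":
    usable_pb = 100.0  # hypothetical usable data volume
    ec = erasure_coding_overhead(data_shards=8, parity_shards=3)  # hypothetical 8+3
    rep = replication_overhead(copies=3)  # hypothetical 3-copy baseline
    print(f"Erasure coding 8+3: {raw_capacity_pb(usable_pb, ec):.1f} PB raw")
    print(f"3x replication:     {raw_capacity_pb(usable_pb, rep):.1f} PB raw")
```

With these illustrative parameters, 100PB of usable data needs 137.5PB of raw capacity under erasure coding versus 300PB under three-way replication.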

Impact & Results

  • 120+PB current production capacity (growing 20PB/year)
  • 99.9% uptime during COVID-19 pandemic (while other services had weeks of downtime)
  • €2M+ annual cost savings vs. commercial tape alternatives
  • 50+ countries served globally
  • IEEE publication documenting technical architecture (MSST 2020)
  • Zero data loss incidents since production launch

Research to Production • Cross-functional Leadership • Risk Management • Stakeholder Communication

Institute-Wide IAM Transformation

2022 - 2025 • 3 years

Challenge

Decentralized account management created security risks and operational inefficiency. Accounts were created ad hoc across departments, with no systematic joiner/leaver process or HR integration.

Approach

  • Coordinated 15+ departments (HR, IT, Research, Security) to define unified requirements
  • Led technical design sessions balancing security needs with user experience
  • Managed vendor selection process and integration roadmap
  • Implemented change management program to drive adoption across 2,000+ users
  • Acted as primary liaison between HR and IT, translating business needs to technical solutions

Impact & Results

  • 2,000+ users migrated to centralized IAM
  • 15+ departments coordinated
  • 500+ hours saved annually in manual processes
  • 100% joiner/leaver automation achieved
  • Zero security incidents post-implementation

Multi-stakeholder Coordination • Change Management • Security Compliance • Process Automation

Transfer Platform Reliability Program

2016 - 2019 • 3 years

Challenge

Inherited a data-transfer platform with years of reliability issues and daily incidents. The previous administrator had been unable to resolve the systemic problems, resulting in researcher frustration and workflow disruptions.

Approach

  • Analyzed years of incident patterns to identify root causes and dependency chains across teams
  • Built business case for automation investment, presenting to executive leadership
  • Led cross-team collaboration to help dependent teams understand service impact on end users
  • Delivered improvements incrementally over several years, working through dependencies across multiple teams
  • Developed automated monitoring, alerting, and self-healing capabilities
  • Established SLA framework and incident response procedures

Impact & Results

  • 1 week continuous uptime achieved in initial phase (breakthrough after years of daily incidents)
  • 99.9% uptime achieved over multi-year program through sustained cross-team collaboration
  • 85% reduction in manual interventions
  • 7 years on, transfer services are no longer a pain point
  • Network upgrade enabled: traffic growth made it necessary to upgrade the EBI network from 20Gbps to 100Gbps with JISC

Automation Strategy • SLA Management • Stakeholder Trust Building • Long-term Impact

Tape Infrastructure Consolidation & S3 Migration

2019 - 2022 • 3 years (Tender 2019-2020, Migration 2020-2022)

Challenge

Complex tape infrastructure holding 2 redundant FIRE replicas across 4 incompatible LTO drive types and both Spectralogic and IBM systems. The legacy system had to read an entire tape to retrieve a single file and depended on multiple servers, a Lustre filesystem, and an Oracle database, making it operationally expensive and inefficient.

Approach

  • Conducted comprehensive tender process for next-generation tape solution
  • Evaluated S3-compatible tape interfaces capable of selective file retrieval
  • Led vendor selection balancing cost, performance, and operational simplicity
  • Designed migration strategy to consolidate infrastructure while maintaining redundancy
  • Managed 3-year migration from legacy Lustre/Oracle system to modern S3 tape interface
  • Coordinated retirement of Spectralogic system and consolidation of the 4 LTO drive types

Impact & Results

  • Multi-million € cost reduction through infrastructure consolidation
  • LTO tape drive types consolidated from 4 to 1 (standardised on IBM)
  • Spectralogic retirement simplified vendor management
  • Single-file retrieval instead of full tape reads (large efficiency gain)
  • Infrastructure simplification: eliminated Lustre, Oracle, and multiple servers
  • S3-compatible interface: modern API enabling cloud-like access patterns

Tender Management • Vendor Evaluation • Cost Optimization • Infrastructure Modernization

Program Timeline

9 years of parallel program delivery at EMBL-EBI

Timeline: 2016 to 2025

  • FIRE Archive System: 120PB production
  • IAM Transformation: 2,000+ users
  • Transfer Platform Automation: 99.9% uptime
  • Monitoring Infrastructure: 3,500+ hosts, 100K+ points
  • Data Centre Migration (VLAN Stretching): 1,000+ hours saved
  • ITIL Process Implementation: institute-wide
  • Tape Tender & S3 Migration: multi-million cost savings

Categories: Research to Production • Cross-functional Coordination • Operational Excellence • Infrastructure Modernization

Professional Endorsements

Recognition from senior scientists and technical leaders

"Marc's technical leadership has been instrumental in delivering mission-critical infrastructure that enables groundbreaking research across the European bioinformatics community."

Senior Scientist, EMBL-EBI

"Marc demonstrates exceptional ability to coordinate complex technical programs across multiple stakeholder groups while maintaining focus on measurable outcomes and delivery excellence."

Group Leader, EMBL-EBI

Detailed professional reference letters available in the Documents section below

Professional Development & Recognition

Continuous learning and industry recognition

Executive Education

  • Chief Technology Officer Program, Judge Business School, University of Cambridge: technology leadership, strategic planning, executive decision-making
  • Strategic Thinking for the CXO, Judge Business School, University of Cambridge: strategic thinking, executive leadership, organizational transformation
  • Steering Complex Projects, Professional Development Program: large-scale program management, stakeholder coordination, risk management

Industry Recognition

  • Talent Pool Selection, Atos (Global IT Services): selected for high-potential talent development program recognizing exceptional performance and leadership potential
  • Top Contributor Award, Bull (Enterprise IT & Supercomputing): awarded for outstanding technical contributions and project delivery excellence

Get In Touch

Ready to discuss your next infrastructure challenge?

I'm always interested in exploring opportunities where I can leverage my expertise in platform engineering and large-scale infrastructure to drive innovation and efficiency.

Location

Based in the United Kingdom

Originally from Catalonia, Spain

Open to remote and hybrid opportunities