
Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Sarthak Pati
Ujjwal Baid
Brandon Edwards
Micah J. Sheller
Shih-Han Wang
G. A. Reina
Patrick Foley
Alexey Gruzdev
Deepthi Karkada
Christos Davatzikos
C. Sako
S. Ghodasara
Michel Bilello
S. Mohan
Philipp Vollmuth
G. Brugnara
C. J. Preetha
F. Sahm
Klaus Maier-Hein
M. Zenk
Martin Bendszus
Wolfgang Wick
Evan Calabrese
J. Rudie
J. Villanueva-Meyer
S. Cha
M. Ingalhalikar
Manali Jadhav
Umang Pandey
Jitender Saini
J. Garrett
Matthew H. Larson
R. Jeraj
S. Currie
R. Frood
K. Fatania
Raymond Y. Huang
Ken Chang
C. Quintero
J. Capellades
J. Puig
J. Trenkler
J. Pichler
Georg Necker
Andreas Haunschmidt
S. Meckel
G. Shukla
Spencer Liem
G. Alexander
Joseph Lombardo
J. Palmer
Adam Flanders
A. Dicker
Haris I. Sair
Craig K. Jones
A. Venkataraman
Meirui Jiang
T. So
Cheng Chen
Pheng Ann Heng
Qi Dou
Michal Kozubek
F. Lux
Jan Michálek
P. Matula
Miloš Keřkovský
Tereza Kopřivová
Marek Dostál
Václav Vybíhal
M. Vogelbaum
J. R. Mitchell
Joaquim M Farinhas
J. Maldjian
C. Yogananda
Marco Pinho
Divya Reddy
J. Holcomb
B. Wagner
B. Ellingson
T. Cloughesy
Catalina Raymond
T. Oughourlian
A. Hagiwara
Chencai Wang
Minh Nguyen Nhat To
Sargam Bhardwaj
Chee Chong
M. Agzarian
A. X. Falcão
S. B. Martins
Bernardo C. A. Teixeira
Flávia Sprenger
David Menotti
D. Lucio
Pamela LaMontagne
Daniel S. Marcus
Benedikt Wiestler
Florian Kofler
Ivan Ezhov
M. Metz
Rajan Jain
Matthew C. H. Lee
Yvonne W. Lui
Richard McKinley
J. Slotboom
Piotr Radojewski
Raphael Meier
Roland Wiest
D. Murcia
Eric Fu
Rourke Haas
J. Thompson
D. Ormond
Chaitra Badve
A. Sloan
V. Vadmal
K. Waite
Rivka R Colen
Linmin Pei
M. Ak
A. Srinivasan
J. Bapuraj
Arvind Rao
Nicholas C. Wang
Ota Yoshiaki
T. Moritani
Sevcan Turk
Joonsan Lee
Snehal Prabhudesai
Fanny E. Moron
J. Mandel
Konstantinos Kamnitsas
Ben Glocker
Luke V. M. Dixon
Matthew Williams
P. Zampakis
V. Panagiotopoulos
P. Tsiganos
Sotiris Alexiou
Ilias Haliassos
E. Zacharaki
Konstantinos Moustakas
C. Kalogeropoulou
D. Kardamakis
Y. Choi
Seung-Koo Lee
Jong-Hee Chang
S. Ahn
Bing Luo
L. Poisson
Ning Wen
Pallavi Tiwari
R. Verma
R. Bareja
I. Yadav
Jonathan Chen
Neeraj Kumar
M. Smits
S. V. D. Voort
A. Alafandi
Fatih Incekara
M. Wijnenga
G. Kapsas
R. Gahrmann
J. Schouten
H. Dubbink
A. Vincent
M. Bent
P. French
Stefan Klein
Yading Yuan
Sonam Sharma
T. Tseng
S. Adabi
S. Niclou
O. Keunen
A. Hau
M. Vallières
D. Fortin
M. Lepage
Bennett Landman
Karthik Ramadass
Kaiwen Xu
Silky Chotai
L. Chambless
A. Mistry
Reid C. Thompson
Yuriy Gusev
K. Bhuvaneshwar
A. Sayah
Camelia Bencheqroun
A. Belouali
Subha Madhavan
Thomas C Booth
Alysha Chelliah
Marc Modat
Haris Shuaib
Carmen Dragos
Aly H. Abayazeed
K. Kolodziej
Michael Hill
A. Abbassy
S. Gamal
Mahmoud Mekhaimar
Mohamed Qayati
M. Reyes
Ji Eun Park
J. Yun
H. Kim
A. Mahajan
M. Muzi
Sean Benson
R. Beets-Tan
Jonas Teuwen
A. Herrera-Trujillo
M. Trujillo
W. Escobar
A. Abello
Jose Bernal
Jhonny C. Gómez
Josephine Choi
Stephen Seung-Yeob Baek
Yusung Kim
H. Ismael
B. Allen
John Buatti
Aikaterini Kotrotsou
Hongwei Bran Li
T. Weiss
M. Weller
A. Bink
Bertrand Pouymayou
Hassan F Shaykh
Joel H. Saltz
Prateek Prasanna
Sampurna Shrestha
K. M. Mani
David Payne
Tahsin M. Kurc
Enrique Peláez
Heydy Franco-Maldonado
Francis R. Loayza
Sebastián Quevedo
Pamela Guevara
Esteban Torche
C. Mendoza
Franco Vera
Elvis Ríos
E. López
S. Velastín
G. Ogbole
Dotun Oyekunle
O. Odafe-Oyibotha
B. Osobu
Mustapha Shu’aibu
Adeleye Dorcas
M. Soneye
Farouk Dako
Amber L. Simpson
M. Hamghalam
Jacob J. Peoples
Ricky Hu
A. Tran
D. Cutler
F. Moraes
M. Boss
James F. Gimpel
Deepak Kattil Veettil
Kendall Schmidt
Brian Bialecki
S. Marella
C. Price
Lisa Cimino
Charles Apgar
Prashant Shah
Bjoern H. Menze
J. Barnholtz-Sloan
Jason Martin
Spyridon Bakas
Abstract

Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even infeasible) due to various limitations. Federated ML (FL) provides an alternative for training accurate and generalizable ML models by sharing only numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model in delineating the surgically targetable tumor, and a 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
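
The idea of training a consensus model by sharing only numerical model updates can be illustrated with a minimal sketch of sample-weighted federated averaging. The abstract does not state which aggregation rule the study used, so this is a generic illustration under that assumption, not the authors' method; the function name, the parameter name `conv.weight`, and the sample counts are all hypothetical.

```python
from typing import Dict, List, Tuple

# A model is represented here as a mapping from parameter name to a flat list
# of floats. This layout is an assumption made for illustration only.
Weights = Dict[str, List[float]]


def federated_average(site_updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine per-site weights into one consensus model.

    Each entry is (locally trained weights, number of local training samples).
    Only these numbers leave a site; the underlying scans never do.
    """
    if not site_updates:
        raise ValueError("need at least one site update")
    total_samples = sum(n for _, n in site_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros with the same shape as the first site's weights.
    first_weights, _ = site_updates[0]
    consensus = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    # Add each site's weights, scaled by its share of the total samples.
    for weights, n_samples in site_updates:
        share = n_samples / total_samples
        for name, vals in weights.items():
            for i, value in enumerate(vals):
                consensus[name][i] += share * value
    return consensus


# Two hypothetical sites with different amounts of local data.
site_a = ({"conv.weight": [1.0, 2.0]}, 100)
site_b = ({"conv.weight": [3.0, 4.0]}, 300)
consensus = federated_average([site_a, site_b])
print(consensus)  # {'conv.weight': [2.5, 3.5]}
```

In a real federation this aggregation step would be repeated over many rounds, with the consensus weights sent back to each site for further local training.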
