MatSwarm: Trusted Swarm Transfer Learning in NMDMS

The rapid progression of Industry 4.0 has created a critical need for efficient collaboration among material research institutions to accelerate the discovery of advanced materials. However, existing platforms face challenges in effectively aggregating, normalizing, and utilizing large-scale heterogeneous data, leading to data silos and limited collaboration. Many current solutions fall short in supporting true cross-institutional data sharing and utilization, restricting the potential of material data for innovative research.

National Material Data Management and Services (NMDMS)

To address these challenges, researchers from the University of Science and Technology Beijing have developed and deployed a novel big data aggregation and sharing framework within the National Material Data Management and Services (NMDMS) platform, now actively serving over thirty research institutions and enterprises in the materials domain across China. The NMDMS platform has successfully aggregated over 14 million material data entries, becoming a critical resource for collaborative intelligence in materials genomic engineering. This real-world deployment has demonstrated significant value in supporting high-throughput experimentation and collaborative research, positioning the platform as a vital tool for advancing materials science.

What sets the platform apart is its ability to seamlessly handle heterogeneous datasets from multiple sources. By introducing advanced data normalization techniques and distributed storage architecture, the platform enables efficient and scalable aggregation of diverse data, providing users with access to comprehensive, high-quality material datasets. These innovations allow institutions to collaborate and share data more effectively, something traditional platforms have struggled to achieve at this scale.

The following video presents the background of the NMDMS platform and introduces the MatSwarm framework built on top of it.

Introduction of MatSwarm

MatSwarm: Trusted Swarm Transfer Learning in NMDMS

Further distinguishing the NMDMS platform is its integration of a blockchain-based middleware, which secures data exchanges between organizations while maintaining the independence of their systems. This ensures both data integrity and confidentiality, addressing key concerns in multi-institutional collaborations. Through these unique features, the platform facilitates a trusted and efficient environment for data sharing and computation, driving significant progress in materials research.
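To make the integrity idea concrete, here is a minimal sketch of a hash-chained ledger. It is an illustration under stated assumptions, not the NMDMS middleware itself: the function names, the record fields (`sender`, `dataset_digest`), and the choice of SHA-256 are all hypothetical. Only a digest of each dataset is recorded, so the raw data never leaves the owning organization.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block (assumption)

def block_hash(index, prev_hash, payload):
    """Hash a block's contents deterministically with SHA-256."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a payload, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })
    return chain

def verify_chain(chain):
    """Return True only if every block's hash and back-link are intact."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["prev"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True

# Hypothetical exchange records: each organization publishes only a digest.
ledger = []
append_block(ledger, {
    "sender": "org_a",
    "dataset_digest": hashlib.sha256(b"alloy records").hexdigest(),
})
append_block(ledger, {
    "sender": "org_b",
    "dataset_digest": hashlib.sha256(b"ceramic records").hexdigest(),
})
```

Calling `verify_chain(ledger)` on the untouched ledger succeeds, while changing any recorded field afterwards makes verification fail, because the stored hash no longer matches the block contents.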

The MatSwarm framework was proposed to achieve collaborative computing for the materials domain. Building upon existing swarm learning, which integrates federated learning with blockchain technology, MatSwarm introduces a swarm transfer learning method to enhance the congruence of local model parameters across different organizations. This approach significantly improves the accuracy and generalization of models trained on data that are not independent and identically distributed (non-i.i.d.).
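The federated part of this idea can be sketched with plain federated averaging. This is a stand-in for illustration only: it is not MatSwarm's swarm transfer learning method, whose details are not given in this text. The organizations, their data, the one-parameter model y = w * x, and the learning rate are all assumptions. Each organization holds a different range of x values (a simple form of non-i.i.d. data), trains locally, and shares only its parameters.

```python
def local_step(params, data, lr=0.05):
    """One gradient-descent step on y = w * x with mean squared error."""
    w = params[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad]

def federated_average(local_params, sample_counts):
    """Average parameter vectors, weighted by each organization's sample count."""
    if not local_params or len(local_params) != len(sample_counts):
        raise ValueError("need one sample count per parameter vector")
    total = sum(sample_counts)
    dim = len(local_params[0])
    return [
        sum(p[i] * n for p, n in zip(local_params, sample_counts)) / total
        for i in range(dim)
    ]

# Hypothetical local datasets: same relation y = 3x, different x ranges.
org_data = {
    "org_a": [(0.5, 1.5), (1.0, 3.0)],
    "org_b": [(1.5, 4.5), (2.0, 6.0)],
    "org_c": [(2.5, 7.5), (3.0, 9.0), (3.5, 10.5)],
}

global_params = [0.0]
for _ in range(20):
    # Each organization starts from the shared parameters and trains locally.
    updates = [local_step(global_params, data) for data in org_data.values()]
    counts = [len(data) for data in org_data.values()]
    # Only parameters are aggregated; raw data stays with its owner.
    global_params = federated_average(updates, counts)
```

After the loop, `global_params[0]` is close to 3, the slope shared by all three datasets. In a swarm learning setting, the aggregation step would be coordinated through the blockchain rather than a central server.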

Enabling Secure and Collaborative Materials Computation of the Future

The NMDMS platform represents a major advancement in materials genomic engineering. MatSwarm not only addresses the challenges of heterogeneous data integration and secure sharing but also demonstrates its value through successful deployment and application in real-world scenarios. This pioneering platform is poised to continue driving innovation, accelerating material discovery, and fostering collaboration in the materials science community.

The following video details the procedures of swarm learning tasks based on MatSwarm.

MatSwarm-procedures

The following video demonstrates the practical operation process of federated tasks based on MatSwarm.

MatSwarm-operations