
Collaborative and privacy-preserving retired battery sorting for profitable direct recycling via federated machine learning

Springer Nature
Nature Communications

Unsorted retired batteries with varied cathode materials hinder the adoption of direct recycling due to their cathode-specific nature. The surge in retired batteries necessitates precise sorting for effective direct recycling, but challenges arise from varying operational histories, diverse manufacturers, and data privacy concerns of recycling collaborators (data owners). Here we show, from a unique dataset of 130 lithium-ion batteries spanning 5 cathode materials and 7 manufacturers, a federated machine learning approach can classify these retired batteries without relying on past operational data, safeguarding the data privacy of recycling collaborators. By utilizing the features extracted from the end-of-life charge-discharge cycle, our model exhibits 1% and 3% cathode sorting errors under homogeneous and heterogeneous battery recycling settings respectively, attributed to our innovative Wasserstein-distance voting strategy. Economically, the proposed method underscores the value of precise battery sorting for a prosperous and sustainable recycling industry. This study heralds a new paradigm of using privacy-sensitive data from diverse sources, facilitating collaborative and privacy-respecting decision making for distributed systems.
This content is subject to copyright. Terms and conditions apply.
Article https://doi.org/10.1038/s41467-023-43883-y
Shengyu Tao¹,⁶, Haizhou Liu¹,⁶, Chongbo Sun¹,⁶, Haocheng Ji¹, Guanjun Ji¹, Zhiyuan Han¹, Runhua Gao¹, Jun Ma¹, Ruifei Ma¹, Yuou Chen¹, Shiyi Fu², Yu Wang², Yaojie Sun², Yu Rong³, Xuan Zhang¹, Guangmin Zhou¹ & Hongbin Sun¹,⁴,⁵
Lithium-ion batteries (LIBs), serving as energy storage devices, have gained widespread use across various domains, from industrial production to daily life, as an accepted technical route. Projections suggest that the global production scale of LIBs will surpass 1.3 TWh by 2030 [1], when the escalating demand for batteries will far outpace the availability of vital metal resources like lithium and cobalt [2,3]. However, the current average lifespan of LIB products stands at 5–8 years, leading to an imminent surge in retired batteries in many countries. If not appropriately managed, retired batteries will result in unsustainable resource wastage and environmental harm. Given these circumstances, the development of battery recycling technology assumes crucial importance as we confront the impending tide of LIB retirements [4].
Recent advances in battery recycling research have focused on the pyrometallurgical, hydrometallurgical, and direct recycling approaches [5]. Unlike the pyrometallurgical and hydrometallurgical methods, direct recycling does not inflict secondary damage on the material structure, enabling more efficient structural repair and performance restoration. Moreover, direct recycling exhibits higher profitability, as it entails lower energy consumption, reduced greenhouse gas emissions, and a lighter environmental footprint [6,7]. In actual production, however, battery recyclers frequently encounter LIBs comprising unknown components or battery modules that consist of a mixture of different cathode material types. Considering that direct recycling can be heavily cathode-specific, such complexity renders the application of direct recycling infeasible for achieving value conversion of the retired batteries [8]. It is crucial to emphasize that even if the vital metals from mixed cathode material types can be extracted using conventional recycling strategies, the interplay between different cathode materials during the recycling process can adversely impact product quality [9]. Therefore, knowing the cathode material type on the recycling side markedly impacts the direct recycling route choice and ultimately improves product quality, profitability, and sustainability.

Received: 22 June 2023
Accepted: 22 November 2023

¹Tsinghua-Berkeley Shenzhen Institute, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China. ²School of Information Science and Technology, Fudan University, Shanghai, China. ³Tencent AI Lab, Tencent, Shenzhen, China. ⁴Department of Electrical Engineering, Tsinghua University, Beijing, China. ⁵College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan, China. ⁶These authors contributed equally: Shengyu Tao, Haizhou Liu, Chongbo Sun. e-mail: xuanzhang@sz.tsinghua.edu.cn; guangminzhou@sz.tsinghua.edu.cn; shb@tsinghua.edu.cn

Nature Communications | (2023) 14:8032
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Human-assisted direct recycling has been proposed to identify the cathode material type of retired batteries in the pre-treatment link, but it is still not financially viable as the recycling industry scales up [1]. To effectively retrieve the retired battery cathode type information, the scientific and industrial community has recently initiated a battery lifetime tracing system [10] and emerging concepts like the battery passport [11] and the battery data genome [12]. Although a substantial number of batteries were put into use before those initiatives, there is a growing consensus that battery information should be accessible throughout the life chain to facilitate second-life decision-making [13]. This is notably the case for the battery recycling sector, the last station of the battery's second life, as the recycling route can be heavily cathode-specific. However, battery lifetime tracing systems and battery passports are enabled by electronic gadgets like bar codes and near-field communications, which could require intensive investment and could be widely incompatible across battery designers. Furthermore, electronic gadgets remain challenging to manage consistently throughout their lifespan, leading to worn-out devices and inaccessibility at the recycling stage, since the modern manufacturing process of LIBs is still not production-to-recycling integrated [14]. Hence, more breakthroughs are urgently needed to achieve efficient battery cathode type sorting using only easy-to-access field information [15,16], as opposed to relying on recorded historical data or human assistance, facilitating the adoption of direct recycling to improve the quality and profitability of recycled products.
In the past few years, machine learning has emerged as a viable tool to tackle open questions across battery fields. In other battery-related topics, machine learning has recently allowed us to automatically discover complex battery mechanisms [17–19], predict remaining useful life [20–24], evaluate the state of health [19,25,26], optimize the cycling profile [27,28], approximate the failure distribution [29], and even guide the battery design [30,31] and predict life-long performance immediately after manufacturing [32]. In the case of battery recycling, few works have investigated machine learning regarding cathode materials [33,34], which is attributable to the scarcity of battery data, especially for batteries cycled to the end-of-life stage. The vast majority of published studies showcase very limited sample sizes [35] and are even more limited in battery cathode diversity [36]. The scarcity is attributed to the intensive cost, the long testing time [37], and, most importantly, data privacy due to commercial or interest concerns. Consequently, the privacy issue rigidifies the dilemma where the existing battery data, though substantial in volume and diversity across multiple parties such as battery manufacturers, practical applications, academic institutions, and third-party platforms, cannot be shared. Such a dilemma calls for studying cathode material sorting to optimize battery recycling route choice in a collaborative yet privacy-preserving fashion.
Federated machine learning, as a distributed and privacy-preserving paradigm, has the potential to resolve both multi-party collaboration (equivalently, the battery data volume and diversity) and privacy issues through collaborative machine learning [38–40]. In each training iteration, the distributed data owners perform local training with their local computational power, encrypt the as-trained model parameters/results, and upload them to a central coordinator for aggregation. Because raw datasets never leave their respective data owners and the transferred parameters/results are properly encrypted, data privacy is protected. Federated machine learning has been extensively investigated in numerous application fields, including public health [41,42], clinical diagnosis [43–45], e-commerce [46], Internet of Things [47], mobile computing [48], and smart grid [49–51]. This approach can revolutionize the data-driven research paradigm across energy sectors by enabling privacy-preserving collaboration, especially for those with limited data access. Regarding the battery recycling sector, federated machine learning holds promise for leveraging the vast amount of battery data that already exists but cannot be shared due to privacy concerns. With such a collaborative yet privacy-preserving paradigm, retired battery sorting can be implemented with high accuracy, efficiency, scalability, and generalization, optimizing the quality and profitability of recycled products. To our knowledge, federated machine learning studies focused on battery recycling have never been reported.
In this study, we perform cathode material sorting of retired batteries, leveraging the existing battery data from multiple collaborators, such as battery manufacturers, practical application operators, academic research institutions, and third-party platforms, in a collaborative yet privacy-preserving machine learning fashion, as illustrated in Fig. 1. Our federated machine learning model was trained using only one cycle of field testing data via a standardized feature extraction process, without any prior knowledge of the historical operation conditions. We compare the predictive power of our federated machine learning model with that of independently learned local models based on local data under both homogeneous and heterogeneous battery recycling circumstances. The heterogeneity issue is resolved by our proposed Wasserstein-distance voting strategy. An economic evaluation of retired battery recycling using our proposed federated machine learning framework is conducted, highlighting the relevance and necessity of accurate sorting of retired batteries. We comprehensively discuss the model interpretability, battery recycling implications, and broader prospects of future battery recycling practice integrated with federated machine learning.
Results
Data collection and standardization
The unique battery kinetics in different battery types are often high-dimensional and hard to characterize due to divergent operating cases, manufacturing variability, and historical usage [52]. To find a solution to this dilemma, we collected and standardized 130 retired batteries with 5 cathode material types from 7 manufacturers to construct an out-of-distribution, equivalently heterogeneous, dataset. Given different historical usage, the capacities of the collected batteries are below 90% of the nominal capacity. The battery cathode materials are lithium cobalt oxide (LCO), nickel manganese cobalt (NMC), lithium ferrophosphate (LFP), nickel-cobalt-aluminum oxide (NCA), and NMC-LCO blended types, which are further grouped into 9 classes based on the manufacturers (Supplementary Table 1). We intentionally include batteries with divergent historical usage, from laboratory testing to electric vehicle driving profiles, to train a generalized model for the battery recycler that is independent of historical usage and battery type.
For standardization, the only data required from the recycler are the currently probed (field-testing) cycle with one charging and one discharging test, which is easy to implement in practical cases. The as-probed data are first denoised by filling in missing values, replacing outliers, and performing median filtering. Human-induced and cathode-heterogeneity-induced noises are deliberately retained, though, to make the model robust to imperfect inputs. The data are then linearly interpolated for curve filling (Supplementary Fig. 1) and feature engineered for dimensionality reduction, with a shared set of standardization parameters (Supplementary Note 1). Features extracted from the standardization pipeline are well interpretable, a concern of significant commercial interest. To the best of our knowledge, this is the first time that heterogeneous battery data from multiple sources and historical usages have been utilized to assist in the strategy design of battery recycling.
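The denoising and curve-filling steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the window size, and the edge handling are assumptions made for illustration, and the separate outlier-replacement rule is omitted because this section does not specify it (the median filter below only suppresses isolated spikes).

```python
from statistics import median


def fill_missing(values):
    """Fill None entries by linear interpolation between the nearest valid samples.

    Gaps at either end copy the nearest valid value (an assumed convention).
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def median_filter(values, window=3):
    """Sliding-window median; the window shrinks at the edges."""
    half = window // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def resample_linear(x, y, n_points):
    """Linearly interpolate y(x) onto n_points evenly spaced x values.

    Assumes x is strictly increasing and n_points >= 2.
    """
    lo, hi = x[0], x[-1]
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    out, j = [], 0
    for g in grid:
        # Advance to the segment [x[j], x[j + 1]] that contains g.
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, out
```

A shared grid size (`n_points`) plays the role of a standardization parameter here, so that curves measured by different collaborators end up with the same length before feature extraction.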
Figure 2a, b demonstrates the feature engineering process. We focus on the charging and discharging curves of the retired batteries in the last cycle, i.e., one charging and one discharging cycle (Supplementary Figs. 2–5). In the charging cycle, 15 features are extracted from the voltage-capacity and dQ/dV curves, where V and Q refer to the voltage and capacity values, respectively. The same set of features is extracted for the discharging cycle. As a result, 30 features are extracted in total, denoted F1 to F30. Refer to Supplementary Table 2 and Supplementary Note 2 for a detailed explanation of the features. Figure 2c showcases the absolute and relative feature values of the selected batteries from each class. Most relative feature values in different classes overlap in the −1 to 0 region (with the light green color) and are indistinguishable, illustrating the difficulty of classifying battery type using one cycle of battery data. The difficulty is expected because the divergent historical operation conditions can influence the charging-discharging kinetics of the batteries, so that the extracted features can be largely correlated despite the different battery types (Supplementary Fig. 6). Rather than directly interpreting the extracted features using expert knowledge, we employ an alternative data-driven approach that automatically leverages the latent patterns across various battery types.
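To make the kind of features named in Fig. 2 concrete (quantiles, kurtosis and skewness of capacity and voltage, and quantities read from the dQ/dV curve), the following sketch computes a few of them for one curve. The exact definitions of F1 to F30 are given in the Supplementary material and are not reproduced here, so the function names, the dictionary keys, and the `peak_dqdv` feature are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def skewness(xs):
    """Population skewness (third standardized moment); xs must not be constant."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def kurtosis(xs):
    """Population kurtosis (fourth standardized moment, not excess kurtosis)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs)


def dq_dv(voltage, capacity):
    """Finite-difference dQ/dV, skipping steps where the voltage does not change."""
    return [
        (capacity[i + 1] - capacity[i]) / (voltage[i + 1] - voltage[i])
        for i in range(len(voltage) - 1)
        if voltage[i + 1] != voltage[i]
    ]


def curve_features(voltage, capacity):
    """Illustrative feature subset for one charging (or discharging) curve."""
    v_q1, v_q2, v_q3 = quantiles(voltage, n=4, method="inclusive")
    return {
        "voltage_quartiles": (v_q1, v_q2, v_q3),  # Qk = 25 x k% quantile
        "kurtosis_capacity": kurtosis(capacity),
        "kurtosis_voltage": kurtosis(voltage),
        "skewness_capacity": skewness(capacity),
        "skewness_voltage": skewness(voltage),
        "peak_dqdv": max(dq_dv(voltage, capacity)),  # hypothetical feature
    }
```

Applying the same function to the charging and the discharging curve would give two parallel feature sets, mirroring the F1 to F15 and F16 to F30 split described in the text.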
Retired battery sorting with homogeneous data access
We first consider a setting where the battery data are homogeneously distributed across the collaborators (namely, the clients). Homogeneity means that each client shares battery data across all 9 classes, while the specific number of batteries per class is not restricted (Supplementary Table 3). We train our federated machine learning model without requiring information on the historical use of the retired batteries. In our work, the recycler and the clients only need to test the retired batteries at the current (field-testing) cycle, specifically, with a complete charging-discharging cycle for a standard feature engineering process initiated by the recycler. Local models are trained based on features extracted from the clients' private battery data. The federated machine learning framework aggregates the local model parameters, rather than the private battery data, for the recycler to classify the retired batteries.
Figure 3 shows the sorting results when clients contribute homogeneous battery data. Figure 3a compares two federated machine learning methods, i.e., majority voting (MV) and our proposed Wasserstein distance voting (WDV), with the independent learning (IL) paradigm. It should be noted that the accuracy for the IL is averaged over all clients in a non-federated manner. Compared with the IL, the MV does not sacrifice sorting performance, with an average accuracy of 95%, while being capable of protecting data privacy and mitigating computational burden. However, 3 classes are missorted using the MV. For instance, 3 batteries in NMC (SNL, class 8, 15 in total) are missorted into NCA (SNL, class 7), resulting in a sorting accuracy of 80%. The sorting accuracy for NCA (UL-PUR, class 9) is 81%, with 2 batteries missorted into NMC (MICH_Form, class 4) and 1 battery missorted into the NMC/LCO blended type (HNEI, class 2). In contrast, the WDV outperforms the MV since it missorts only one battery, resulting in a sorting accuracy of 99%. We also evaluate the prediction probability of each class for the MV and WDV, respectively. The WDV sorts more confidently than the MV, since the prediction probabilities of the WDV are generally shifted toward higher probability values. Therefore, our proposed WDV produces higher sorting accuracies across all classes, with wider probability confidence margins.
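The difference between the two voting schemes can be sketched as a weighted vote in which only the client weights change. The sketch below is an illustration under stated assumptions, not the authors' implementation: the exact WDV weighting rule is defined in the paper's Methods and is not reproduced here, so `wasserstein_1d`, `weighted_vote`, and the way weights are supplied are all hypothetical. The distance is the one-dimensional Wasserstein-1 distance between two empirical samples of a single feature.

```python
from bisect import bisect_right
from collections import defaultdict


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D empirical samples.

    Computed as the integral of |CDF_u - CDF_v| over the pooled support.
    """
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def weighted_vote(labels, weights):
    """Return the label with the largest total weight across clients."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Client predictions for one retired battery (illustrative labels).
client_labels = ["NMC", "NCA", "NCA"]

# Size-weighted vote: weights are the clients' sample counts (made-up numbers).
size_weighted = weighted_vote(client_labels, [195, 40, 30])

# Distance-based vote: weights are derived from each client's distance to the
# recycler's feature sample. How a distance is mapped to a weight is left to
# the caller because that rule is not given in this section.
recycler_feature = [0.0, 1.0, 3.0]
client_features = [[5.0, 6.0, 8.0], [0.5, 1.0, 2.5], [0.0, 1.5, 3.0]]
distances = [wasserstein_1d(recycler_feature, f) for f in client_features]
```

With sample-count weights, a single large client can decide the outcome on its own, which is the bias discussed later for the heterogeneous setting; a distance-based weighting replaces those counts with a quantity that depends on the data distributions instead.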
Fig. 1 | The federated machine learning framework of retired battery sorting for recycling. a Multiple data sources, such as battery manufacturers (image courtesy of Addionics), practical application operators (battery pack in the floorpan of a Tesla; image courtesy of Tesla), academic research institutions, and third-party platforms, can be data contributors. The battery data are neither exchanged between contributors nor uploaded to the battery recycler. Instead, the data contributors train local models and share model parameters with the battery recycler to build a global model. The proposed Wasserstein-distance voting technique fuses the local models into the global model, which is robust to data imbalance and noise. Battery recyclers can use the jointly built model for battery sorting, combined with the easy-to-access field testing data. b Our federated machine learning framework encourages collaborators to share the data while preserving data privacy, as opposed to the traditional data islanding paradigm.
Fig. 2 | The feature engineering result. a For the charging process, 15 features are extracted from the voltage-capacity (left) and dQ/dV (right) curves. b The same set of features is extracted for the discharging process as F16 to F30. c Features are visualized by classes, following the format CxBn, indicating the nth battery from class x. The size of a circle maps the absolute feature value; the color maps the Z-score (standard score) of the feature value. Panel annotations: Qk denotes the 25 × k% quantile; F12 and F27 are the kurtosis of capacity, F13 and F28 the kurtosis of voltage, F14 and F29 the skewness of capacity, and F15 and F30 the skewness of voltage. Source data are provided as a Source Data file.

Fig. 3 | Sorting results when clients have homogeneous data access. a The confusion matrix for the majority voting (MV) and Wasserstein distance voting (WDV) methods, respectively. We consider the prediction probability distribution for each class. The sorting of independent learning (IL) is annotated. b Sorting accuracy distribution and privacy budget (PB) of the IL, MV, and WDV in the presence of random noise. The PB value is referenced at a 90% accuracy level. c Average F1-score of sorting results and PBs in each class using the IL, MV, and WDV. The PB values are all referenced at a 0.9 F1-score level. Data are presented as mean values ± 1 standard deviation. d Feature importance (arb. units), in descending order. The subplot shows the feature space spanned by the first two most salient features. Data are presented as mean values + 1 standard deviation. Source data are provided as a Source Data file.

We also evaluate the privacy budget (PB, "Methods" section), considering that client data might be vulnerable to reverse engineering by eavesdropping on private data [53]. In this regard, we add random Gaussian noise to the client data with different intensities. The intensity of the randomness is controlled by a noise-to-signal ratio (NSR), ranging from 1% to 10%. Figure 3b shows the accuracy and privacy budget comparison when using IL, MV, and WDV, respectively. The sorting accuracy of the MV decreases from 95% to 82%, similar to that of the IL, when the noise intensity increases from 1% to 10%. In this noise range, the median sorting accuracy of the MV and IL is 92% and 86%, respectively. In contrast, the sorting accuracy of the WDV is still above 90% in the presence of 10% noise, which is a stringent noise level in practical cases. The WDV has a median sorting accuracy of over 95% in the same noise range. Taking an average sorting accuracy of 90% as an acceptable reference level, the PB values of IL, MV, and WDV are 4, 6, and 10, respectively. Therefore, applying federated machine learning produces a more privacy-secure sorting than IL, hence being capable of preventing data eavesdropping. Furthermore, our proposed WDV is more accurate and performs much better (with nearly doubled PB values) in the privacy-accuracy trade-off than the MV. In addition, the robustness of the WDV to stringent noise implies a good tolerance to battery measurement error, relaxing the measurement requirements and reducing the expense of battery testing.
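The noise sweep described above can be sketched as follows. The formal definitions of the NSR and the PB are given in the paper's Methods, which are not part of this section, so two assumptions are made and should be read as such: the noise standard deviation is taken as NSR times the standard deviation of the signal, and the helper `largest_tolerated_nsr` simply reports the largest noise level in a sweep at which accuracy stays at or above a reference level. It is a stand-in for reading such a threshold off a curve, not the authors' PB formula, and the accuracy values in the example are invented.

```python
import random
from statistics import pstdev


def add_gaussian_noise(values, nsr, rng=None):
    """Add zero-mean Gaussian noise with std = nsr * std(values) (assumed definition)."""
    rng = rng or random.Random()
    sigma = nsr * pstdev(values)
    return [v + rng.gauss(0.0, sigma) for v in values]


def largest_tolerated_nsr(accuracy_by_nsr, reference=0.90):
    """Largest swept NSR whose accuracy is still >= reference.

    The sweep is scanned from low to high noise and stops at the first level
    that falls below the reference; returns None if even the lowest fails.
    """
    tolerated = None
    for nsr in sorted(accuracy_by_nsr):
        if accuracy_by_nsr[nsr] < reference:
            break
        tolerated = nsr
    return tolerated


# Illustrative sweep with made-up accuracies (not values from the paper).
sweep = {0.01: 0.95, 0.05: 0.92, 0.10: 0.85}
threshold = largest_tolerated_nsr(sweep, reference=0.90)

# Reproducible noisy copy of a feature vector at 5% NSR.
noisy = add_gaussian_noise([1.0, 2.0, 3.0, 4.0], nsr=0.05, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps a sweep reproducible, which matters when the same noise realizations must be applied to several voting schemes for a fair comparison.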
Noting that a high overall sorting accuracy does not necessarily imply acceptable sorting for a specific class, we also consider within-class sorting performance. Figure 3c shows the F1-score and privacy budget of the IL, MV, and WDV in each predicted class; note that the privacy setting is identical to that in Fig. 3b. The result shows that the IL has smaller F1-scores than the federated machine learning methods in all the classes, indicating poor sorting. Regarding federated machine learning, the WDV outperforms the MV in each predicted class by producing higher average F1-scores. The deviation range of the F1-scores for the WDV is smaller than that of the MV, indicating that the WDV is more robust (Supplementary Fig. 7). Therefore, compared with the MV, our proposed WDV has a better sorting accuracy not only overall among all nine classes (Fig. 3b) but also within each class. Regarding the privacy budget, the PB value when using the non-federated IL, referenced at a 0.9 F1-score level, is significantly lower than that of the federated methods (Supplementary Table 4) across all classes. This indicates a more severe risk of data leakage for IL compared with federated machine learning. When further applying our proposed WDV, the privacy budget can increase by 78% and 44% compared with the non-federated IL and the federated MV, respectively (Supplementary Table 4). The results demonstrate that the WDV successfully leverages the battery-chemistry-related insights hidden in clients while effectively preserving client data privacy.
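The distinction between overall accuracy and within-class performance can be made concrete with a short sketch that computes the per-class F1-score from true and predicted labels. It uses the standard definition F1 = 2TP / (2TP + FP + FN); the function names are illustrative, and the label lists in the usage example are invented apart from the one case quoted in the text (3 of 15 class-8 batteries missorted into class 7).

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Per-class F1-score, F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 15 class-8 batteries, 3 of them missorted into class 7 (as in the MV example);
# the 10 correctly sorted class-7 batteries are a made-up complement.
y_true = [8] * 15 + [7] * 10
y_pred = [8] * 12 + [7] * 3 + [7] * 10
class_scores = per_class_f1(y_true, y_pred)
```

In this toy case the overall accuracy is 88%, while the recall of class 8 is 80%, which is the kind of gap that motivates reporting class-wise scores alongside the aggregate.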
We then interpret our federated machine learning model by evaluating the most salient features correlated with battery cathode chemistry. Figure 3d shows the importance of the features in descending order. The error bar indicates the importance deviation. Features F1 and F16 rank as the top two features in terms of out-of-bag importance ("Methods" section). Interestingly, these two features have a clear physical interpretation in terms of the battery dynamics, which we discuss further in later sections. Here, we rationalize these two features by plotting the grouped battery samples in the feature space spanned by features F1 and F16. The subplot of Fig. 3d shows that the NMC/LCO blended type (HNEI, class 2), NMC (MICH_Expa, class 3), and LFP (SNL, class 6) (sharing the color with Fig. 3a) are clearly separable in the spanned feature space. The remaining classes are still separable (see the zoomed-in view), though at a finer scale. In contrast, the non-salient features have a relatively weaker sorting ability because the feature space they span is not separable (Supplementary Fig. 8). As a result, our federated machine learning framework successfully discovered useful mechanistic insights that support the sorting accuracy. Such an insight could be further extended to simplify the model for lighter computation and hence lower investment. Once the client models classify the batteries, the recycler can aggregate the client results to make a final decision on the battery cathode material types, underpinned by the salient features.
Retired battery sorting with heterogeneous data access
We also consider an extreme yet more realistic situation where the data can be exclusively scattered among clients, i.e., the data distribution is heterogeneous. In this situation, the heterogeneity issue poses more challenges to battery type sorting since the clients are prone to train biased models and deteriorate global accuracy, which is still an open question in federated machine learning. In this section, we explore this more challenging situation instead of assuming homogeneous data access for each client (Supplementary Note 3). We demonstrate that our federated machine learning framework can still classify retired batteries based on the standard feature engineering process at the current (field-testing) cycle without any knowledge of the previous operation conditions.
Figure 4 shows the sorting results when clients have heterogeneous data access. We consider the heterogeneity index, defined as the minimum number of battery classes held by any client in each Monte Carlo simulation run. A higher heterogeneity index indicates a less heterogeneous battery data distribution. The heterogeneity index is no smaller than two, so that every client can train a local model for a sorting task. Figure 4a shows the average sorting accuracy as the heterogeneity index varies. The average accuracies are plotted with solid lines, with the ± 1 standard deviation range indicated by the shaded region. As the heterogeneity index decreases from 9 to 2, the performance of the MV and the IL rapidly deteriorates at a sublinear rate. When the heterogeneity index is two, the average sorting accuracy of the MV is 0.55, only slightly better than that of the IL and equivalent to a random guess. This observation shows that the MV helps little in aggregating the local models under heterogeneous data access. In contrast, the WDV outperforms its MV counterpart at all heterogeneity levels, successfully mitigating the heterogeneous data distribution issue. Moreover, the WDV shows an interesting asymptotic effect as the heterogeneity index increases. This indicates that the WDV can potentially support the optimal allocation/distribution of client battery data to reduce the collaboration cost in practical battery recycling situations.
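The heterogeneity index defined above is simple enough to state as code. The sketch below follows the definition in the text (the minimum number of distinct battery classes held by any client, required to be at least two); the function name and the client dictionary are illustrative, with the two clients loosely modeled on the example discussed for Fig. 4b.

```python
def heterogeneity_index(client_classes):
    """Minimum number of distinct battery classes held by any client.

    client_classes maps a client identifier to the class labels of its batteries.
    A higher index means a less heterogeneous data distribution.
    """
    index = min(len(set(classes)) for classes in client_classes.values())
    if index < 2:
        # A client with a single class cannot train a classifier.
        raise ValueError("every client needs at least two battery classes")
    return index


# Client 2 holds every class except class 3; client 5 holds only classes 2 and 8.
clients = {
    "client_2": [1, 2, 4, 5, 6, 7, 8, 9],
    "client_5": [2, 2, 8, 8, 8],
}
index = heterogeneity_index(clients)
```

Because the index is a minimum over clients, a single narrowly specialized client is enough to place a whole simulation run at the most heterogeneous level.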
We select the best model using the MV when the heterogeneity
index equals two and compare it with the sorting result of our pro-
posed WDV under the same setting. The selected best model has an
average sorting accuracy of 71%, as shown in Fig. 4a.
The detailed battery data distribution setting of the best model using the MV is illustrated in Fig. 4b and is heterogeneous (Supplementary Table 5). For instance, client 2 contributes to all battery classes except for NMC (MICH_Expa, class 3), while client 5 only contributes to the NMC/LCO blended type (HNEI, class 2) and NMC (SNL, class 8). Under the heterogeneous data distribution setting in Supplementary Table 5, we further compare the class-wise and client-wise sorting performance of the MV and the WDV to the non-federated IL with two considerations: (1) the significance of our federated machine learning framework and (2) why our proposed WDV outperforms the MV. First, we evaluate the client-wise sorting accuracy, shown in the lower part of Fig. 4c. Client 5 achieves an average sorting accuracy of 25%, ranking last among all clients. Meanwhile, client 2 achieves an average sorting accuracy of 86%, ranking first among all clients. However, the average sorting accuracy across clients is only 55%, close to a random guess. Therefore, the client performance using the non-federated IL depends heavily on data access (Supplementary Fig. 9). In fact, without our federated machine learning framework, the battery recycler is equivalent to a single client and can only sort the battery types stored in its local database. This non-federated paradigm could not handle various types of retired batteries unless the recycler built a database covering all the battery types it would handle. With our federated machine learning framework, the recycler can collaborate with several clients, even under heterogeneous data situations.
We turn to analyze how to collaborate with clients under het-
erogeneous data access settings. The upper part of Fig. 4cshowsthe
class-wise accuracy of the MV and WDV. It is noticed that the average
sorting accuracy after using the MV is better than the non-federated
way, which is 79%, as indicated in the lower side of Fig. 4c. It demon-
strates the success of applying the federated machine learning
Article https://doi.org/10.1038/s41467-023-43883-y
Nature Communications | (2023) 14:8032 6
Content courtesy of Springer Nature, terms of use apply. Rights reserved
framework to address the heterogeneous data distribution issue in this case.
However, the MV completely missorted LFP (SNL, class 6) and NMC (SNL, class
8), with zero accuracy. The failure of the MV in specific classes can be
rationalized by its core idea of giving more weight to the clients who
contribute more battery samples, without guaranteeing diversity in battery
types. For instance, the MV strengthens the contribution of client 7 because
of its large number of batteries (specifically, 195 augmented batteries,
ranking second among clients), even though it contributes only four classes
of batteries. As a result, the MV biases the aggregated model towards large
clients such as client 7 (Supplementary Table 4). This bias is evidenced by
the zero sorting accuracy for LFP (SNL, class 6) described above, since large
clients such as client 7 never contributed any batteries in class 6.
Similarly, client 1, the largest client with 197 augmented batteries, failed
to contribute helpful information to the recycler for classifying NMC (SNL,
class 8), which is consistent with the zero accuracy in class 8. In contrast
to the MV, our proposed WDV focuses on the battery similarities between the
recycler and each client by measuring the pairwise distance. We aim to assign
lower weights to the clients with biased data distributions (equivalently,
higher heterogeneity), whose batteries are more similar to those of the
recycler, such that the recycler can obtain generalized information from each
client. The results show that our proposed WDV successfully leverages helpful
information from the heterogeneous data distribution among clients. The WDV
achieves 100% and 89% sorting accuracy for the otherwise missorted batteries
in LFP (SNL, class 6) and NMC (SNL, class 8), respectively. The overall
sorting accuracy using the WDV is up to 97%, with only 5 batteries missorted
out of 144 samples. In Supplementary Fig. 10, we also notice that the
missorted batteries are of similar cathode materials. Specifically, 2
batteries with the NMC cathode material were missorted into the NMC/LCO
blended type, while 1 battery with the NCA cathode material was correct in
material type but missorted into another manufacturer. In contrast, the
missorted results produced by the MV can spread to many irrelevant
[Fig. 4 graphic. Axis labels: prediction accuracy; accuracy for recycler (by
class); accuracy by collaborator (averaged among classes); collaborators
(data owners) 1-10; class distribution. Legend: federated learning,
independent learning, MV, WDV. Average accuracy: 0.71 (MV), 0.97 (WDV), 0.55
(IL).]
Fig. 4 | Sorting results when clients have heterogeneous data access. a Sorting
accuracy as a function of the heterogeneity index. The results are averaged
over 50 Monte Carlo runs (n = 50), with the one-standard-deviation region
(±1σ) indicated by shaded color. b The data distribution when benchmarking the
best majority voting (MV) performance. c Class-wise (upper part) and
client-wise (lower part) sorting accuracy of our federated and independent
machine learning (IL) methods. The Sankey chart (middle) presents the
heterogeneous data distribution among clients. Source data are provided as a
Source Data file.
classes or manufacturers. Therefore, we conclude that the WDV can aggregate
helpful client insights by distinguishing inherent differences in cathode
material types. The WDV also suggests that clients should be encouraged to
contribute more diverse battery data rather than more data in some specific
classes. The recycler can optimize the benefit distribution based on the
helpful client information provided. Ultimately, our federated machine
learning framework enables the recycler to identify the battery cathode
material type, even without its own access to various battery data, while
preserving the data privacy of potential clients.
An economic evaluation of retired battery recycling
To help understand the relevance and necessity of battery sorting in actual
recycling practice, and to verify the significance of our proposed WDV
strategy, an economic evaluation is performed. The evaluation includes three
recycling methods (pyrometallurgy, hydrometallurgy, and direct recycling),
two battery cathode types (LFP-graphite and NMC-graphite), two recycling
modes (individual and hybrid), and three sorting accuracy levels (97%, 71%,
and 55%) induced by the federated and non-federated machine learning methods
(WDV, MV, and IL). The notation ML-direct in Fig. 5a denotes direct recycling
enabled by our federated machine learning framework. The individual mode
denotes that batteries have been previously sorted in a human-aided manner
(Fig. 5b–d), which is used to compare different recycling methods given a
known cathode type. The hybrid mode denotes that batteries are collected with
mixed cathode types (Fig. 5e–g), which is used to analyze the significance of
battery sorting for recycling profits. The detailed calculation procedure and
numerical results can be found in Supplementary Note 4 and Supplementary
Tables 6–15, respectively.
Figure 5a shows a schematic diagram of three recycling methods, including
pyrometallurgy, hydrometallurgy, and ML-direct recycling. The final product of
pyrometallurgy is metal alloy, while the final products of hydrometallurgy are
lithium salt and precursor, which should be further processed to assemble
batteries, as indicated by the red and blue arrows in Fig. 5a. Compared to the
other two non-machine
[Fig. 5 graphic. Panel a labels: retired NMC and LFP batteries;
classification, disassembly, annealing, assembly; separation, acid/alkali
leaching, extraction/precipitation, lithium salt + precursor; crushing,
smelting, refine, metal alloy, slags; new batteries. Cost categories: raw
material, reagent, average labor, electricity & water, equipment
depreciation, sewage treatment (inner ring: LFP; outer ring: NMC). Axes:
cost (¥), revenue (¥), profit (¥), and values in k¥ for WDV, MV, and IL at
LFP : NMC ratios of 1:2, 1:1, and 2:1 under Pyro-, Hydro-, and ML-Direct.
Comparison axes: data sharing, profit, environment protection, privacy,
operation simplicity. NMC: Nickel Manganese Cobalt Oxide; LFP: Lithium Iron
Phosphate; ML: Machine Learning.]
Fig. 5 | An economic evaluation of retired battery recycling. a Comparison of
the Pyro- (pyrometallurgical), Hydro- (hydrometallurgical), and ML-direct
(machine learning aided direct) recycling methods. b Cost analysis of Lithium
Iron Phosphate (LFP) and Nickel Manganese Cobalt Oxide (NMC) batteries using
different recycling methods in individual mode. c Cost analysis of LFP and NMC
batteries using ML-direct recycling in individual mode. d Cost, revenue, and
profit comparison of the individual battery type using different recycling
methods in individual mode. e Cost, revenue, and profit comparison using
Wasserstein distance voting (WDV), majority voting (MV), and independent
learning (IL) methods in hybrid mode. The ratio is the amount of LFP battery
to that of NMC battery. f Sensitivity analysis of the profit of the WDV, MV,
and IL methods towards sorting accuracy in hybrid mode. The ratio is the
amount of LFP to that of the NMC battery. g Comprehensive comparison of
different battery recycling technologies in hybrid mode. Source data are
provided as a Source Data file. The graphics in panel a were created using
icons from Flaticon.com.
learning-aided methods, ML-direct recycling has the shortest process flow
since the product is standard battery materials, which brings about the
greatest convenience and the smallest environmental footprint. It should be
stressed that such convenience is enabled by accurate sorting, a vital link
in pretreatment for actual battery recycling practice, thanks to our
federated machine learning framework.
The cost analysis of LFP and NMC batteries using different recycling methods
is shown in Fig. 5b, including raw material, reagent, average labor,
electricity & water, equipment depreciation, and sewage treatment. It can be
observed that the raw material accounts for the largest proportion of the
cost. As a result, the cost of NMC is always higher than that of LFP in any
method, owing to the large price difference between NMC and LFP. Besides, for
the same type of batteries, the cost of ML-direct recycling is the largest
while that of pyrometallurgy is the smallest, owing to the large expense of
reagents. Considering that the reagents can be heavily cathode material
specific, the profitability of ML-direct recycling largely depends on the
sorting accuracy of the mixed retired batteries. Further analysis of the
detailed proportion of cost structure in ML-direct recycling is summarized in
Fig. 5c. The outer and inner
annuluses stand for NMC and LFP batteries, respectively. Except for raw
material and reagent, the sum of the other costs is the same in absolute
terms (5620 ¥/t) but differs by more than a factor of two in percentage
(NMC for 13%, LFP for 28%). The cost of raw materials for NMC (29900
¥/t, accounting for 74%) is nearly three times that of LFP (9687.5 ¥/t,
accounting for 54%), which again indicates that the profit of ML-direct
recycling is sensitive to the sorting accuracy. Figure 5d lists the cost,
revenue, and profit of LFP and NMC batteries using different recycling
methods. The most profitable option, NMC batteries using ML-direct recycling
(29944.25 ¥/t), yields 2.25 times the profit of the second most profitable
option (LFP batteries using ML-direct recycling, 13279.51 ¥/t). In summary,
ML-direct recycling has the largest revenue and profit. Moreover, the profit
of recycling NMC is always larger than that of LFP, highlighting the
significance of efficiently sorting high-value recycling candidates from a
bulk of mixed retired batteries.
In a practical scenario, collected retired batteries could be expensive or
even impossible to sort by human-aided pretreatment, especially when
recycling is scaled up. In contrast, ML-direct recycling has the unique
advantage of efficiently sorting the retired batteries by leveraging existing
data sources from multiple battery recycling collaborators. An economic
analysis using different machine learning paradigms (independent learning,
i.e., IL; and federated machine learning, i.e., MV and WDV) is carried out in
Fig. 5e, f. Due to the high sorting accuracy of WDV, the two types of
batteries (LFP and NMC) can be completely sorted and the final product can be
utilized to assemble new batteries directly. In contrast, the MV and IL would
produce significant errors in distinguishing cathode materials, thus leading
to low-value products (impure materials) that cannot be directly utilized and
require further refining. As a result, the profit decreases for the MV and IL
methods, whose sorting accuracy is lower than that of WDV (97%). In hybrid
mode, WDV-based ML-direct recycling has a high profit of 24389.33, 21611.88,
and 18834.42 ¥/t for LFP/NMC ratios of 1:2, 1:1, and 2:1, respectively,
which is higher than those of pyrometallurgy (4372.32, 3994.46, and 3616.61
¥/t) and hydrometallurgy (9957.45, 10039.27, and 10121.09 ¥/t). The profits
of pyrometallurgy and hydrometallurgy are not sensitive to sorting accuracy
since these methods do not require stringent information on the cathode
material of retired batteries. Such a high profit from ML-direct recycling
stems not merely from the inherent advantage of direct recycling but is
enabled by our effective and accurate retired battery sorting. Finally, a
qualitative comparison of different battery recycling technologies is
illustrated in Fig. 5g. ML-direct recycling shows noticeable advantages in
environmental protection (refs. 7,54), operation simplicity, privacy, data
sharing, and profit. Our ML-direct recycling method has substantial
socioeconomic value and can accelerate the development of the battery
recycling industry, especially as next-generation batteries become even more
diverse in cathode materials.
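As a quick arithmetic check, the hybrid-mode profits quoted above for
WDV-based ML-direct recycling agree, to within about 0.01 ¥/t, with a
ratio-weighted average of the two individual-mode ML-direct profits (29944.25
¥/t for NMC and 13279.51 ¥/t for LFP). The Python sketch below reproduces
that observation. It is only an illustration: the function name
`blended_profit` is hypothetical, and the weighted-average reading is an
assumption rather than the calculation procedure of Supplementary Note 4.

```python
# Individual-mode ML-direct profits quoted in the text (yuan per tonne).
PROFIT_LFP = 13279.51
PROFIT_NMC = 29944.25

# Hybrid-mode WDV profits quoted in the text, keyed by (LFP parts, NMC parts).
REPORTED = {(1, 2): 24389.33, (1, 1): 21611.88, (2, 1): 18834.42}


def blended_profit(lfp_parts, nmc_parts):
    """Ratio-weighted average of the two individual-mode profits.

    Assumption for illustration: a fully sorted mixed feed earns each
    chemistry its individual-mode profit in proportion to its share.
    """
    total = lfp_parts + nmc_parts
    return (lfp_parts * PROFIT_LFP + nmc_parts * PROFIT_NMC) / total


for (lfp, nmc), reported in REPORTED.items():
    estimate = blended_profit(lfp, nmc)
    print(f"LFP:NMC = {lfp}:{nmc}  blended = {estimate:.2f}  reported = {reported:.2f}")
```

Under this reading, the profit of the mixed feed falls as the LFP share
grows simply because LFP is the lower-value stream.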
Discussion
We have demonstrated that our federated machine learning framework,
especially our proposed WDV strategy, serves as a key to profitable battery
recycling from a practical perspective. This is achieved by leveraging
existing data sources to train a general data-driven battery sorting model,
rather than relying on expensive human-aided sorting. Our model works in a
collaborative yet privacy-preserving fashion, enabling the direct recycling
methodology, which is currently heavily cathode-specific and sensitive to the
recycling candidates. We discuss the merit of our work from a multi-level
perspective, including the fundamental mechanism of battery sorting, the
implication for profitable recycling, and the advantage of the federated
battery recycling paradigm.
To realize the sorting of retired batteries, we extracted 30 features based
on the battery charging-discharging and dQ/dV curves in the feature
engineering process. In the previous sections, we rationalized the salient
features, i.e., F1 and F16, from the machine learning perspective: the
feature space spanned by F1 and F16 is separable for different cathode
material types, as shown in Fig. 3d. Here we aim to give the physical
interpretation of these two salient features. Features F1 and F16 are
extracted from the dQ/dV curve, which is commonly used to analyze phase
reactions in electrochemistry, though agnostic to the underlying mechanism.
Regarding battery thermodynamics, the number of dQ/dV peaks and the
corresponding voltage values can be used to analyze the reaction on
electrodes and to judge the composition of the cathode material. Regarding
battery kinetics, the shape of the dQ/dV curve can help analyze the transport
capacity of electrons and ions inside the battery, from which the chemical
properties of battery materials can be deduced. Here, the Gibbs phase rule
can further help the rationalization: F = C − P + n, where F represents the
degrees of freedom, C represents the number of independent components, P
represents the number of phases, and n represents external factors. When
studying electrode materials, constant temperature and pressure are assumed;
thus, n = 0. The number of independent components on the cathode is C = 2.
Since the discharging process of LFP is a phase change process, there are two
phases, i.e., P = 2. Since LCO, NMC, and NCA are solid solutions, only one
phase exists during the discharging process, i.e., P = 1. Therefore, the
degree of freedom of LFP (F = 0) is lower than that of LCO, NMC, and NCA
(F = 1). As a result, the voltage of LFP does not exhibit significant change
during the reaction process; consequently, there is a noticeable peak on the
dQ/dV curve. By comparison, during the charging and discharging process of
LCO, NMC, and NCA (F = 1), the slope of the voltage change in the plateau is
more significant than that of LFP, which is reflected on the dQ/dV curve
accordingly. Although LCO, NMC, and NCA (F = 1) have similar structures,
their components and Li-ion mobility during the charging-discharging process
differ, resulting in different dQ/dV peak values, which can be interpreted
from F1 and F16. While other features may help decipher battery kinetics,
they demonstrate less importance since more complicated expert knowledge is
required for further processing. However, we highlight the power of our
machine learning model in automatically utilizing the information provided by
F1 and F16, which have a clear physical interpretation and underpin a general
and high-accuracy model. Such good accuracies are independent of historical
usage and use only one cycle of end-of-life charging and discharging data.
Consequently, the battery recycling collaborators realize good sorting
accuracies with our proposed salient features aided by machine learning.
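To make the link between plateau flatness and dQ/dV peak height concrete, the
following Python sketch differentiates two synthetic charge curves
numerically. Everything in it is an illustrative assumption: the curve shape
in `synthetic_voltage`, the slopes 0.02 and 0.80, and the helper names are
hypothetical, and the sketch does not reproduce the definitions of F1 or F16,
which are not given in this section.

```python
def synthetic_voltage(q, plateau_slope):
    """Toy charge curve: a mid-range plateau with steep rises at both ends.

    q is the normalized capacity in [0, 1]; plateau_slope (V per unit
    capacity) sets how flat the plateau is. Purely illustrative.
    """
    return 3.4 + plateau_slope * (q - 0.5) + 0.3 * (2.0 * q - 1.0) ** 9


def dq_dv(capacity, voltage):
    """Central-difference estimate of dQ/dV at the interior sample points."""
    points = []
    for i in range(1, len(capacity) - 1):
        dv = voltage[i + 1] - voltage[i - 1]
        if dv != 0.0:
            dq = capacity[i + 1] - capacity[i - 1]
            points.append((voltage[i], dq / dv))
    return points


def peak(points):
    """Return the (voltage, dQ/dV) pair with the highest dQ/dV value."""
    return max(points, key=lambda p: p[1])


capacity = [i / 200 for i in range(201)]
flat = [synthetic_voltage(q, 0.02) for q in capacity]    # nearly constant voltage
sloped = [synthetic_voltage(q, 0.80) for q in capacity]  # steadily rising voltage

flat_peak = peak(dq_dv(capacity, flat))
sloped_peak = peak(dq_dv(capacity, sloped))
print("flat plateau  : peak dQ/dV = %.2f at %.3f V" % (flat_peak[1], flat_peak[0]))
print("sloping curve : peak dQ/dV = %.2f at %.3f V" % (sloped_peak[1], sloped_peak[0]))
```

Because dQ/dV is the reciprocal of the local voltage slope, the nearly flat
curve produces a peak many times higher than the sloping one, which is the
qualitative contrast drawn above between two-phase and solid-solution
cathodes.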
Once the recycling collaborators have sorted the retired batteries from the
recycler, a voting procedure is performed. Since the data volume and data
diversity of each recycling collaborator
may differ, the voting results could be biased towards a specific cathode
type depending on the data distribution. This still hinders the profitability
of battery recycling given low sorting accuracies, highlighting the
significance of our WDV strategy. The results show that, using our WDV-based
federated machine learning framework, the battery recycling industry has a
high possibility of transforming from the current human-aided battery sorting
to an automatic, collaborative, and privacy-preserving fashion with high
sorting accuracy. Such effective retired battery sorting serves as a key to
battery recycling practice using direct recycling, equivalently, our
ML-direct recycling. Without our method, the economic benefits of direct
recycling could be greatly reduced, to a level lower than traditional
pyrometallurgy and hydrometallurgy, even with small errors in sorting
accuracy. In next-generation battery recycling, there could be various
battery types involving different chemical compositions, including Si anode,
Na-ion, lithium-sulfur, and zinc-air batteries. The data collection for model
training will be even more challenging due to data privacy and data
heterogeneity (battery diversity), calling for federated machine learning to
address such issues. In addition to battery type information, direct
recycling also requires sorting by state of health (SOH), since SOH directly
determines the amount of reagents added to the direct recycling process,
accounting for the majority of the recycling cost. Excessive or insufficient
reagents will lead to a decline in product quality, which in turn leads to a
decline in revenue and sustainability. Unlike the sorting problem, SOH
estimation is more challenging since it requires historical data to formulate
a regression problem. Moreover, the SOH can be heavily dependent on
historical usage, while such information is difficult to retrieve at the
end-of-life stage due to poor lifecycle management. Using only
field-available information to determine SOH remains a critical challenge. It
should be noted that the profit calculation assumes an 80% SOH for the
retired batteries. Future work should consider SOH information to increase
the profitability of ML-direct recycling, which is of great commercial
concern.
As mentioned above, the core idea of adopting federated machine learning in
battery recycling is leveraging the existing data in a collaborative yet
privacy-preserving manner, which is intuitively consistent with the
distributed nature of battery data. We note that the cost of sorting through
machine learning is not considered, owing to a lack of relevant industry data
and conversion standards. Under a federated machine learning setting, the
recycler only needs to process battery information, thus the machine learning
cost is not sensitive to the recycling scale. We therefore assume that the
once-for-all machine learning cost will be covered when the battery recycling
scale enlarges. Even though more in-depth investigations on the
accuracy-privacy-cost balance should be conducted, we emphasize that the
proposed federated machine learning framework tackles the common concerns in
collaborative learning, including privacy, efficiency, and fairness,
consistently and elegantly. First, the data privacy of the collaborating
clients is fully protected, as neither the raw battery data nor the extracted
features leave their respective data sources; even the trained local models
are kept confidential to the data sources themselves. The only information
being transferred, the local battery cathode sorting result, can be
appropriately encrypted before transfer to guard against potential
eavesdroppers, preserving the privacy budget. Also, with the full support of
parallelized local training and only one round of result transfer, the
proposed framework is highly efficient in computation and communication,
which remains a huge challenge for commercialization in other fields.
Specifically, the selection of random forests as the bottom-level machine
learning algorithm, instead of more advanced neural network architectures, is
made with full consideration of the feature engineering settings and
cost-effectiveness requirements. Feature engineering, which prepares the data
for federated machine learning with expert-knowledge-based information
extraction, transforms the raw gigabyte-scale sequential data into
kilobyte-scale tabular data. Decision-tree-based models such as random
forests are more adept at learning from such low-dimensional data with
heterogeneous features, whether in terms of accuracy, efficiency, or
interpretability. Also, advanced neural network architectures such as
Convolutional Neural Networks (CNNs) require much higher computational power
from every collaborating manufacturer, with a significantly lengthened
training time and compromised model interpretability, despite gaining a
slight edge in accuracy (see Supplementary Fig. 11). Thus, our proposed
framework is light and scalable without requiring intensive investment in the
battery recycling sector, which is of significant interest to industrial
practice. The framework further achieves fairness by assigning a local
training task of the same scale, though with different cathode types and
sample sizes, to all clients under the recycling collaboration. Even when
compared with alternative federated machine learning frameworks, the proposed
framework is still better in terms of these metrics, as those alternative
frameworks would most likely require multiple rounds of model updates with
considerable parameter transfers, which would compromise efficiency and
expose collaborators to an immense level of privacy leakage. Currently, the
framework is implemented under the ideal assumption that all collaborators
are fully cooperative, such that the uploaded local results are assumed to be
reliable. Supplementary Figure 12 shows that, despite such an ideal
assumption, the random forest model, combined with the Wasserstein distance
voting, is naturally robust against random parameter transfer losses even if
parameters from a few collaborators end up missing. The sorting accuracy only
slightly degrades given the same heterogeneity setting. In the case of a
blockchain-like environment with numerous collaborators of unknown
trustworthiness and reliability, our research could be further extended in
search of a proper incentivization mechanism such that all recycling
collaborators would fully contribute to their respective local models instead
of attempting to become total free-riders. We envision quantifying the
helpful information that the recycling collaborators provide in order to
design a benefit distribution strategy and a free-rider detection scheme that
make the federated battery recycling ecosystem economically feasible.
By exploring federated machine learning in the battery recycling sector, the
major concern about the profitability of recycled products can be addressed.
Our work highlights a general retired battery sorting model using only one
cycle of end-of-life battery data, enabling the rational design of a direct
recycling route for higher product quality and profitability in practice. The
privacy-preserving information-sharing mechanism encourages extensive
multi-party collaboration in battery recycling practices, thanks to our
proposed WDV strategy. Our work sheds light on using machine learning to
facilitate an efficient and profitable next-generation battery recycling
industry.
To conclude, federated machine learning is a promising route for retired
battery sorting and enables emerging battery recycling technologies,
especially direct recycling, in their development, practical application, and
optimization. We create a retired battery sorting model using only one cycle
of end-of-life charging and discharging data, as opposed to any historical
data, while preserving the data privacy budgets of multiple battery recycling
collaborators. In the homogeneous setting, we obtain a 1% cathode material
sorting error; in the heterogeneous setting, we obtain a 3% cathode material
sorting error, thanks to our Wasserstein-distance voting strategy. Such a
level of accuracy is achieved by (1) automatically exploring the unique
patterns in the salient features without assuming any prior knowledge of
historical operation conditions and (2) using our proposed
Wasserstein-distance voting strategy to correct for the heterogeneous data
distribution among recycling collaborators. An economic evaluation showcases
the relevance and necessity of accurate retired battery sorting to a
profitable battery recycling industry using direct recycling. In general, our
approach can complement the existing first-principle-based recycling route
research paradigms on actual battery recycling practice,
where sorting retired batteries is necessary yet challenging. Broadly
speaking, our work highlights the possibilities of leveraging existing data
from multiple data owners, rather than extra time-consuming and expensive
data generation, to develop and optimize complex decision-making procedures
such as the battery recycling route design in a collaborative yet
privacy-preserving fashion.
Methods
Privacy-informed data augmentation
We perform data augmentation with two considerations: (1) more diversified
data for training a generalized model and (2) protecting data privacy by
preventing adversarial reconstruction (reverse engineering to eavesdrop on
the private data). Consider a feature matrix $F_{N \times M}$ and a class
label vector $C_{N \times 1} = \{c_i\}$, $i \in \{1, 2, \dots, 9\}$, where $N$
and $M$ are the numbers of measured batteries (known as observations) and
associated features, respectively. The data augmentation includes the
following three steps. First, we index the feature matrix $F$ for a subset
$F^{i}$ using each unique class label $c_i$. Second, we augment $F^{i}$ into
$F^{i}_{\mathrm{Aug}}$ by resampling with replacement using bootstrapping. The
resampling size (or the number of bootstrapped observations) of
$F^{i}_{\mathrm{Aug}}$ is set to $S = 200$. For the class label of
$F^{i}_{\mathrm{Aug}}$, we have $\{\tilde{c}^{i}_{\mathrm{Aug}}\} = \{c_i\}$,
which means the augmented class labels $\{\tilde{c}^{i}_{\mathrm{Aug}}\}$ are
identical to the original class labels. Third, we add random Gaussian noise
to each observation of $F^{i}_{\mathrm{Aug}}$ to evaluate the robustness and
the privacy budget of the trained model, which gives a noisy feature matrix
subset $\tilde{F}^{i}_{\mathrm{Aug}}$. A hyperparameter NSR, i.e., the
noise-to-signal ratio in percentage, controls the noise intensity, defined as
the ratio of noise power to signal power. Intuitively, the model performance
$A$ can be deteriorated by increasing NSR, denoted by a function
$A(\mathrm{NSR})$. The privacy budget (PB) is defined as:

$$\mathrm{PB} = 100\% \times \max\left(\mathrm{NSR} \mid A(\mathrm{NSR}) \ge \underline{A}\right) \tag{1}$$

where $\underline{A}$ denotes the lower bound of acceptable accuracy of the
model, depending on specific application requirements.
Finally, we stack the augmented data of each class to get the augmented
feature matrix $\tilde{F}_{9S \times M} = \{\tilde{F}^{i}_{\mathrm{Aug}}\}$
and the class label vector
$\tilde{C}_{9S \times 1} = \{\tilde{c}^{i}_{\mathrm{Aug}}\}$, where
$i \in \{1, 2, \dots, 9\}$. We use 80% (the primary split) of the augmented
data to generate the client samples as the training set. We use 40% (the
secondary split) of the remaining augmented data as the testing set. Both
primary and secondary splits are performed in a stratified manner to ensure
that the samples required in each client are sampled. The detailed data split
setting here is for illustration and can be modified to further investigate
the minimum data sample requirement for collaborators.
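The augmentation steps and the privacy budget of Eq. (1) can be sketched in a
few lines of Python. This is a minimal illustration under stated assumptions,
not the authors' implementation: the helper names are hypothetical, the noise
power is taken relative to the mean squared value of each observation, the
toy feature values and the accuracy readings are made up, and NSR is assumed
to be expressed in percent already, so the 100% factor of Eq. (1) is
implicit.

```python
import math
import random


def bootstrap(rows, size, rng):
    """Draw `size` rows with replacement (bootstrapping)."""
    return [list(rng.choice(rows)) for _ in range(size)]


def add_gaussian_noise(row, nsr_percent, rng):
    """Add zero-mean Gaussian noise whose power is NSR times the row's signal power."""
    signal_power = sum(x * x for x in row) / len(row)
    sigma = math.sqrt(nsr_percent / 100.0 * signal_power)
    return [x + rng.gauss(0.0, sigma) for x in row]


def augment(features_by_class, size=200, nsr_percent=5.0, seed=0):
    """Bootstrap each class to `size` rows, add noise, and stack the results."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, rows in features_by_class.items():
        for row in bootstrap(rows, size, rng):
            data.append(add_gaussian_noise(row, nsr_percent, rng))
            labels.append(label)  # augmented labels equal the original labels
    return data, labels


def privacy_budget(accuracy_by_nsr, min_accuracy):
    """Largest NSR (in percent) whose accuracy still meets the lower bound."""
    feasible = [nsr for nsr, acc in accuracy_by_nsr.items() if acc >= min_accuracy]
    return max(feasible) if feasible else None


# Toy input: 9 classes, 3 measured batteries each, 4 features (made-up values).
toy = {c: [[c + 0.1 * r + f for f in range(4)] for r in range(3)] for c in range(1, 10)}
data, labels = augment(toy)

n_total = len(data)                       # 9 classes x S = 200
n_train = n_total * 80 // 100             # primary split: 80% for client samples
n_test = (n_total - n_train) * 40 // 100  # secondary split: 40% of the rest
print(n_total, n_train, n_test)

# Hypothetical accuracy-versus-NSR readings, only to exercise privacy_budget().
print(privacy_budget({0: 0.99, 5: 0.97, 10: 0.90}, min_accuracy=0.95))
```

With S = 200 and nine classes, the split sizes work out to 1440 training and
144 testing samples, which matches the 144 test samples mentioned in the
heterogeneous sorting results.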
Client simulation
The federated machine learning framework involves multiple collaborators,
known as clients, to train a global model jointly. In our setting, each
client serves as a data contributor for the battery type sorting task. In
this work, we simulate 10 clients, each possessing different classes and
different observations of battery data. To be specific, each
$\mathrm{Client}_k$ is defined over a triplet, i.e.,
$\mathrm{Client}_k(lb_k, ub_k, NC_k)$, $k \in \{1, 2, \dots, 10\}$, where
$lb_k$ and $ub_k$ are the minimum and maximum number of observations in
$\mathrm{Client}_k$. The values of $lb_k$ and $ub_k$ are set to 100 and 200
for all clients, respectively. $NC_k$ stands for the minimum number of
classes in each $\mathrm{Client}_k$, quantifying the level of client-wise
heterogeneity (namely, the heterogeneity index in the main manuscript). Then,
random observations are subsequently drawn from the augmented data
$\{\tilde{F}, \tilde{C}\}$ for the client based on the as-defined triplet.
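One possible way to draw a client from such a triplet is sketched below in
Python. The exact sampling procedure is not spelled out in this section, so
the details are assumptions made for illustration: the function name
`simulate_client` is hypothetical, the number of classes is drawn uniformly
between NC and the total number of classes, and every chosen class is
guaranteed at least one observation.

```python
import random


def simulate_client(labels, lb, ub, nc, rng):
    """Return indices of the observations assigned to one simulated client.

    The client receives a random subset of at least `nc` classes and between
    `lb` and `ub` observations, with every chosen class represented.
    """
    classes = sorted(set(labels))
    n_classes = rng.randint(nc, len(classes))
    chosen = sorted(rng.sample(classes, n_classes))
    by_class = {c: [i for i, lab in enumerate(labels) if lab == c] for c in chosen}

    # Seed the client with one observation per chosen class ...
    picked = [rng.choice(by_class[c]) for c in chosen]
    seeded = set(picked)
    rest = [i for c in chosen for i in by_class[c] if i not in seeded]

    # ... then fill up to a random size within [lb, ub].
    n_obs = rng.randint(max(lb, len(picked)), min(ub, len(picked) + len(rest)))
    picked += rng.sample(rest, n_obs - len(picked))
    return picked


rng = random.Random(0)
labels = [c for c in range(1, 10) for _ in range(160)]  # stand-in training labels
clients = [simulate_client(labels, lb=100, ub=200, nc=3, rng=rng) for _ in range(10)]
for k, idx in enumerate(clients, start=1):
    print(k, len(idx), sorted({labels[i] for i in idx}))
```

A larger `nc` forces every client to cover more classes, so in this sketch
smaller values correspond to more heterogeneous clients.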
Client model
The random forest is a decision-tree-based machine-learning algorithm, with
each tree defined over a collection of random variables. Formally, for an
$m$-dimensional feature vector $X = [x_1, \dots, x_m]^{T}$ from $\tilde{F}$
and a response $Y$ from $\tilde{C}$, the goal is to learn a prediction
function $g(X)$ for predicting $Y$. The prediction function $g(X)$ is
determined by minimizing the expectation of the loss function $L$:

$$\mathbb{E}_{XY}\left[L\left(Y, g(X)\right)\right] \tag{2}$$

where the subscripts denote expectation over the joint distribution of $X$
and $Y$.
The $j$-th decision tree, or the $j$-th base learner, is denoted as
$h_j(X; \Theta_j)$, where $\Theta_j$ parameterizes a random collection of the
variables of $X$. In sorting, a class assignment rule for every terminal
(leaf) node $t$ under a zero-one loss function gives:

$$h_j(X; \Theta_j) = \arg\max_{y \in \tilde{C}} \; p(y \mid t) \tag{3}$$

where we pick the class with the maximum posterior probability.
The random forest constructs $g$ by learning a series of base learners,
$h_1(X; \Theta_1), \dots, h_J(X; \Theta_J)$. These base learners are combined
to give an ensemble prediction function $g$, determined by the most
frequently predicted class:

$$g(X) = \arg\max_{y \in \tilde{C}} \sum_{j=1}^{J} I\left(y = h_j(X; \Theta_j)\right) \tag{4}$$

where $I$ is the indicator function: $I(y = h_j(X; \Theta_j)) = 1$ if
$y = h_j(X; \Theta_j)$ and 0 otherwise. The number of trees in each random
forest is fixed at ten, i.e., $J = 10$, for a balance between classification
accuracy and computation cost (Supplementary Figure 13). We deliberately let
the collaborators (clients) learn the most suitable random forest structure,
i.e., the model parameters, by themselves, rather than fixing the parameters,
since each collaborator could have very different battery numbers and cathode
material types. By only presetting the number of trees in the random forest,
the collaborators have enough flexibility to train the best model that suits
their own data distribution. The bottom-level random forest algorithm (client
model) is implemented using readily available MATLAB packages, more
specifically, the TreeBagger function in the Statistics and Machine Learning
Toolbox. The MATLAB version is R2022a, and the code runs on a personal
computer with an Intel(R) Core(TM) i5-10400 CPU @ 2.90 GHz and 8 GB of RAM.
Federated machine learning
In the proposed federated machine learning framework, local random
forests are first trained on each client with its own local data in a
parallel fashion. Then the local client models are aggregated into a
global model by means of a proper voting strategy over the local
sorting results. In our work, the battery class distribution across
clients can be heterogeneous, which makes it difficult to aggregate the
biased client models. To this end, we propose a Wasserstein distance
voting method to aggregate the client models rather than the traditional
majority voting. Our model aggregation method is robust to heterogeneous
class distributions across clients. The core idea of the Wasserstein
distance voting is to reduce the weightings of clients whose
observations are similar to ones in the global model. The Wasserstein
distance measure is defined as:
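$$W_q(\cdot,\cdot) = \inf_{\gamma \in \mathrm{MP}} \left( \int_{\Omega_1 \times \Omega_2} |x_1 - x_2|^{q} \, \mathrm{d}\gamma(x_1, x_2) \right)^{1/q} \tag{5}$$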
where $\gamma$ is a transport operator, referring to the transport of
arbitrary attribute pairs, i.e., $(x_1, x_2)$, from the global feature
space $\Omega_1$ to the client feature space $\Omega_2$. MP stands for a
measure-preserving transport.
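For intuition, Eq. (5) has a simple closed form in one special case. The
sketch below assumes one-dimensional features, two samples of equal size
with equal weight per observation, and $q \geq 1$; under those
assumptions the optimal transport plan pairs the sorted values. The
function name and the toy numbers are illustrative only, and the
multi-feature setting of this work would need a general optimal
transport solver.

```python
def wasserstein_1d(xs, ys, q=1):
    """Empirical q-Wasserstein distance between two equal-size 1-D samples.

    Assumes q >= 1 and uniform weights, so the optimal coupling matches
    the i-th smallest value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if q < 1:
        raise ValueError("q must be at least 1")
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(a - b) ** q for a, b in pairs) / len(xs)
    return mean_cost ** (1.0 / q)


# Hypothetical feature values for the global set and one client.
global_feature = [0.0, 1.0, 2.0]
client_feature = [3.0, 1.0, 2.0]  # same values shifted by one, unordered

print(wasserstein_1d(global_feature, client_feature, q=1))  # 1.0
print(wasserstein_1d(global_feature, global_feature, q=2))  # 0.0
```

A distance of zero means the two empirical distributions coincide; the
distance grows as the client's feature distribution drifts away from the
global one.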
The Wasserstein distance voting term $\omega$ is defined as:
$$\omega_k = \alpha^{\lambda \left(1 - M_k(W_q)\right)} \tag{6}$$
where $\alpha > 0$ and $\lambda > 0$ are voting hyperparameters. $M_k$ is
the average operator on the pairwise Wasserstein distance between the
feature space of Client $k$ and the global feature space, i.e., the
recycler.
The aggregated global model $G(x)$ can be obtained as:
$$G(x) = \arg\max_{y \in \tilde{C}} \sum_{k=1}^{K} \omega_k \, I\big(y = g_k(x)\big) \tag{7}$$
where $K$ is the number of clients and $I$ is the indicator function:
$I(y = g_k(x)) = 1$ if $y = g_k(x)$ and 0 otherwise. Finally, to
maximally protect data privacy, the local votes are properly encrypted
before being uploaded to the recycler. Most encryption methods, e.g.,
Secure Hash Algorithms, should suffice without compromising the sorting
accuracy. The high-level federated machine learning framework,
including the Wasserstein-distance voting and the transfer of
parameters, is implemented from scratch.
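The weighted aggregation of Eq. (7) can be sketched in a few lines of
Python. This is an illustration rather than the authors' from-scratch
framework: the per-client weights are passed in as plain numbers (how
they are derived from the Wasserstein distance is outside this sketch),
and the function name, the class labels, and the weight values are
hypothetical.

```python
from collections import defaultdict


def weighted_vote(client_predictions, weights):
    """Return the label with the largest summed client weight (Eq. 7).

    client_predictions[k] is the label g_k(x) voted by client k and
    weights[k] is its voting weight omega_k.
    """
    if len(client_predictions) != len(weights) or not weights:
        raise ValueError("need one weight per client prediction")
    scores = defaultdict(float)
    for label, weight in zip(client_predictions, weights):
        scores[label] += weight  # adds omega_k * I(y = g_k(x)) for y = label
    # On an exact tie, the label that appeared first is returned.
    return max(scores, key=scores.get)


votes = ["NMC", "NMC", "LCO"]  # hypothetical local sorting results

# Equal weights reduce Eq. (7) to ordinary majority voting.
print(weighted_vote(votes, [1.0, 1.0, 1.0]))  # NMC

# Unequal weights can overturn the majority.
print(weighted_vote(votes, [0.2, 0.2, 0.9]))  # LCO
```

The second call shows why the weighting matters under heterogeneous
class distributions: a single heavily weighted client can outvote two
lightly weighted ones.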
Feature importance
We use permutation importance to measure the feature importance in
the client model, i.e., the random forest algorithm. The core idea of
permutation importance is to use out-of-bag data to examine the effect
of feature permutation on the trained random forest model. In the first
step, a prediction is made on several observations of the out-of-bag
data. In the second step, the feature $\theta_m$, $m = 1, \ldots, M$
(with $M$ the number of features), is randomly permuted across the
out-of-bag observations, and the modified out-of-bag data is passed down
each tree in the random forest. The first and second steps thus yield
two predictions from the trained random forest model, i.e., $\hat{C}$
and $\hat{C}^{*}$. The permutation feature importance of $\theta_m$ is
defined as:
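$$\mathrm{Imp}_n^m = \frac{1}{J_n} \sum_{j \in I_n} I\big(y_n \neq \hat{C}^{*}_{n,j}\big) - \frac{1}{J_n} \sum_{j \in I_n} I\big(y_n \neq \hat{C}_{n,j}\big) \tag{8}$$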
where $I_n$ indexes the trees for which the $n$-th observation is
out-of-bag, and $J_n$ is the number of such trees in the random forest.
The feature importance $\mathrm{Imp}_n^m$ is averaged over all
observations to give a global importance in the client model. Similarly,
feature importance in the federated machine learning framework is
averaged over all clients.
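Eq. (8) amounts to a difference of two error rates, which the following
Python sketch makes explicit for a single observation. The per-tree
predictions are hypothetical inputs supplied by hand; no forest is
trained here, and the function names are illustrative.

```python
def importance_for_observation(y_true, preds_original, preds_permuted):
    """Permutation importance of one feature for one observation (Eq. 8).

    preds_original[j] and preds_permuted[j] are the class predicted by the
    j-th tree, before and after the feature is permuted, taken only over
    the trees for which this observation is out-of-bag.
    """
    if len(preds_original) != len(preds_permuted) or not preds_original:
        raise ValueError("need matching, non-empty per-tree predictions")
    n_trees = len(preds_original)
    error_permuted = sum(p != y_true for p in preds_permuted) / n_trees
    error_original = sum(p != y_true for p in preds_original) / n_trees
    return error_permuted - error_original


def mean_importance(values):
    """Average per-observation (or per-client) importances."""
    return sum(values) / len(values)


# Hypothetical votes of four trees for one battery whose true class is LFP.
before = ["LFP", "LFP", "LFP", "NMC"]  # 1 of 4 trees wrong
after = ["NMC", "LFP", "NMC", "NMC"]   # 3 of 4 trees wrong after permuting

print(importance_for_observation("LFP", before, after))  # 0.5
print(mean_importance([0.5, 0.0, 0.25]))                 # 0.25
```

A value near zero indicates that shuffling the feature barely changes
the error, i.e., the model does not rely on it; larger positive values
indicate more important features.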
Evaluation metric
We use a one-vs-all prediction strategy for the multi-class
classification problem, such that the original problem is converted into
several binary classification problems. The accuracy of each binary
classification sub-problem is defined as follows:
$$\mathrm{accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + FN + FP + TN} \tag{9}$$
where $TP$, $FP$, $FN$, and $TN$ refer to the number of true positive,
false positive, false negative, and true negative predictions.
The prediction accuracy for the multi-class classification problem
gives:
$$\mathrm{Accuracy} = \frac{1}{N_C} \sum_{i=1}^{N_C} \mathrm{accuracy}_i \tag{10}$$
where $i$ is the class label and $N_C$ is the number of battery classes.
Accuracy alone cannot provide an adequate model evaluation when the
classification samples are imbalanced, i.e., under a heterogeneous data
distribution. Thus, the F1-score is used, which is defined as follows:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$
where $\mathrm{Precision} = TP / (TP + FP)$ and $\mathrm{Recall} = TP / (TP + FN)$.
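Eqs. (9) to (11) can be checked on a toy example with the Python sketch
below. The labels and predictions are made up for illustration, the
function names are not from this work, and returning 0 when precision or
recall is undefined is a convention assumed here rather than something
stated in the text.

```python
def one_vs_all_counts(y_true, y_pred, positive):
    """Count (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_accuracy(tp, fp, fn, tn):
    """Eq. (9): share of correct predictions in one binary sub-problem."""
    return (tp + tn) / (tp + fn + fp + tn)


def multiclass_accuracy(y_true, y_pred):
    """Eq. (10): mean of the one-vs-all accuracies over all classes."""
    classes = sorted(set(y_true))
    total = sum(
        binary_accuracy(*one_vs_all_counts(y_true, y_pred, c)) for c in classes
    )
    return total / len(classes)


def f1_score(tp, fp, fn):
    """Eq. (11); returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical cathode labels for six retired batteries.
y_true = ["LFP", "LFP", "NMC", "NMC", "LCO", "LCO"]
y_pred = ["LFP", "NMC", "NMC", "NMC", "LCO", "LFP"]

tp, fp, fn, tn = one_vs_all_counts(y_true, y_pred, "LFP")
print(tp, fp, fn, tn)                       # 1 1 1 3
print(multiclass_accuracy(y_true, y_pred))  # 14/18, about 0.778
print(f1_score(tp, fp, fn))                 # 0.5
```

Note how the one-vs-all accuracy for LFP (4/6) is inflated by true
negatives, whereas its F1-score (0.5) is not; this is the motivation for
reporting the F1-score under imbalanced classes.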
Reporting summary
Further information on research design is available in the Nature
Portfolio Reporting Summary linked to this article.
Data availability
The Center for Advanced Life Cycle Engineering (CALCE), Hawaii
Natural Energy Institute (HNEI), University of Michigan (MICH),
University of Oxford (OX), Sandia National Laboratories (SNL), and
Underwriters Laboratories-Purdue University (UL-PUR) datasets used
in this study are available at www.batteryarchive.org. For the full
details of the datasets and policies for data reuse, please refer to
their respective websites. Source data are provided with this paper.
Code availability
Code for the modeling work is available from the corresponding
authors upon request.
Acknowledgements
This work was supported by the Shenzhen Science and Technology
Program (Grant No. KQTD20170810150821146) [X.Z.], the Tsinghua
Shenzhen International Graduate School Interdisciplinary Innovative
Fund (JC2021006) [X.Z. and G.Z.], the Key Scientific Research Support
Project of Shanxi Energy Internet Research Institute (SXEI2023A002)
[X.Z.], and the Tsinghua Shenzhen International Graduate School-
Shenzhen Pengrui Young Faculty Program of Shenzhen Pengrui
Foundation (SZPR2023007) [G.Z.]. The first author would like to thank
Xin Qin from the University of Cambridge, Zihao Zhou from the University
of Oxford, Tsaijou Wu from Jinan University, Tingwei Cao, and Zixi Zhao
from Tsinghua University for their helpful discussions in making the
manuscript accessible to a broad readership. The authors would like to
thank Prof. Qiang Yang from WeBank, as well as Ms. Yaxin Wang, for their
constructive insights on federated learning. The authors would like to
thank Dr. Yuliya Preger, from Sandia National Laboratories, co-founder
of batteryarchive.org, for providing the first publicly available
repository for easy comparison of lithium-ion battery degradation data
across institutions.
Author contributions
S.T. conceptualized, designed, and performed the numerical
experiments and prepared the first manuscript draft; H.L. discussed
the experiments and prepared the response letter to the reviewers;
C.S. contributed to the techno-environmental analysis by specifying
the details of battery recycling; H.J., G.J., Z.H., R.G., and J.M.
contributed to identifying the scientific issues of cathode sorting in
battery recycling; R.M. and Y.C. reviewed and edited the first and
revised manuscript drafts; S.F., Y.W., and Y.S. contributed to the data
curation and discussions; Y.R. contributed to machine learning
experiment design in the revised manuscript and discussions; X.Z., G.Z.,
and H.S. conceptualized, reviewed, discussed, and supervised this work
and acquired funding.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41467-023-43883-y.
Correspondence and requests for materials should be addressed to
Xuan Zhang, Guangmin Zhou or Hongbin Sun.
Peer review information Nature Communications thanks Zongguo
Wang, and the other, anonymous, reviewer(s) for their contribution to
the peer review of this work. A peer review file is available.
Reprints and permissions information is available at
http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if
changes were made. The images or other third party material in this
article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not
included in the article's Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this license, visit http://creativecommons.org/
licenses/by/4.0/.
© The Author(s) 2023
... Numerous battery diagnostic and prognostic methods have been proposed [10][11][12][13][14][15], including model-based methods, data-driven methods, and hybrid methods. Recently, thanks to the rapid development of artificial intelligence [16,17] and the open source of numerous battery datasets [18][19][20][21][22], data-driven methods combining machine learning (ML) algorithms and feature engineering have achieved superior performance in battery diagnosis and prognosis, showing great application potential in battery material development [23], manufacturing processes improvement [24,25], aging mechanism revelation [26], fast-charging protocol selection [27], management strategy optimization [28,29], recycling [30,31], etc. However, most data-driven methods and their feature engineering rely on laboratory data at specific stages under specific conditions, such as charge/discharge data [32,33], impedance data [34,35], and relaxation data [36]. ...
... Note that, to improve the prediction accuracy of 1 and 2 , we first divided the field SOC into multiple independent intervals. Specifically, the value of is divided into ten independent intervals, namely Specifically, in Dataset 1, according to the numerical range of 1 corresponding to 1 =10 Hz in the training set, the value of 1 is divided into four independent intervals, namely [14,18), [18,22), [22,26), and [26,30]. Therefore, a total of forty ML models are trained for the prediction of 1 corresponding to 1 =10 Hz. ...
... Therefore, a total of forty ML models are trained for the prediction of 1 corresponding to 1 =10 Hz. According to the numerical range of 2 corresponding to 2 =312.5 Hz in the training set, the value of 2 is divided into four independent intervals, namely [14,18), [18,22), [22,26), and [26,30]. Therefore, a total of forty ML models are trained for the prediction of 2 corresponding to 2 =312.5 Hz. ...
Preprint
Aiming at the dilemma that most laboratory data-driven diagnostic and prognostic methods cannot be applied to field batteries in passenger cars and energy storage systems, this paper proposes a method to bridge field data and laboratory data using machine learning. Only two field real impedances corresponding to a medium frequency and a high frequency are needed to predict laboratory real impedance curve, laboratory charge/discharge curve, and laboratory relaxation curve. Based on the predicted laboratory data, laboratory data-driven methods can be used for field battery diagnosis and prognosis. Compared with the field data-driven methods based on massive historical field data, the proposed method has the advantages of higher accuracy, lower cost, faster speed, readily available, and no use of private data. The proposed method is tested using two open-source datasets containing 249 NMC cells. For a test set containing 76 cells, the mean absolute percentage errors of laboratory real impedance curve, charge curve, and discharge curve prediction results are 0.85%, 4.72%, and 2.69%, respectively. This work fills the gap between laboratory data-driven diagnostic and prognostic methods and field battery applications, making all laboratory data-driven methods applicable to field battery diagnosis and prognosis. Furthermore, this work overturns the fixed path of developing field battery diagnostic and prognostic methods based on massive field historical data, opening up new research and breakthrough directions for field battery diagnosis and prognosis.
... As the most expensive part of the vehicle, the battery pack plays a key role in performance, range, and on-the-road costs. Therefore, precise estimation of the capacity of the battery is vital for energy management optimization, enhanced longevity, and reliability assurance within the EV [1][2][3]. One of the critical aspects of EV management is the battery capacity estimation, especially under load -this greatly impacts the range, efficiency and performance of the vehicle as a whole. ...
... New ML methods are emerging as strong tools to model these complexities. Dealing with this issue has also been tackled in several other works, such as using ANN, RF and recurrent architectures like LSTM [1,4]. ANNs were inspired by a combination of computational frameworks based on how human brains are structured. ...
... The proposed framework used voltage and current data that were obtained when the cells were connected to different loads and can accurately classify the five types of batteries under testing, including lithium nickel cobalt aluminium oxide, lithium iron phosphate, nickel-metal hydride and lithium titanate oxide. In ref. [32], the authors proposed a federated machine learning approach for battery recycling purposes, where low cathode sorting errors are achieved utilising features extracted from the end-of-life cycle among five different cathode materials, such as the peak intensity of the dV dQ curve during charging or discharging, the skewness statistics of the voltage, the kurtosis statistics of the capacity, and so on. In ref. [33], the authors proposed a supervised machine learning framework for accurately classifying different lithium-sulfur battery electrolytes. ...
Article
Full-text available
Lithium-ion batteries (LIBs) are widely used in diverse applications, ranging from portable ones to stationary ones. The appropriate handling of the immense amount of spent batteries has, therefore, become significant. Whether recycled or repurposed for second-life applications, knowing their chemistry type can lead to higher efficiency. In this paper, we propose a novel machine learning-based approach for accurate chemistry identification of the electrode materials in LIBs based on their temperature dynamics under constant current cycling using gated recurrent unit (GRU) networks. Three different chemistry types, namely lithium nickel cobalt aluminium oxide cathode with silicon-doped graphite anode (NCA-GS), nickel cobalt aluminium oxide cathode with graphite anode (NCA-G), and lithium nickel manganese cobalt oxide cathode with graphite anode (NMC-G), were examined under four conditions, 0.2 C charge, 0.2 C discharge, 1 C charge, and 1 C discharge. Experimental results showed that the unique characteristics in the surface temperature measurement during the full charge or discharge of the different chemistry types can accurately carry out the classification task in both experimental setups, where the model is trained on data under different cycling conditions separately and jointly. Furthermore, experimental results show that the proposed approach for chemistry type identification based on temperature dynamics appears to be more universal than voltage characteristics. As the proposed approach has proven to be efficient in the chemistry identification of the electrode materials LIBs in most cases, we believe it can greatly benefit the recycling and second-life application of spent LIBs in real-life applications.
... Ineffective management of this surplus could lead to significant resource wastage and environmental harm. Therefore, the development of advanced battery recycling technologies is crucial 3 . The imperative for recycling is further underscored by the necessity of reclaiming valuable materials. ...
Article
Full-text available
In recent years, deep learning techniques have been extensively used for the identification and classification of lithium-ion batteries. However, these models typically require a costly and labor-intensive labeling process, often influenced by commercial or proprietary concerns. In this study, we introduce RecyBat24, a publicly accessible image dataset for the detection and classification of three battery types: Pouch, Prismatic, and Cylindrical. Our dataset is designed to support both academic research and industrial applications, closely replicating real-world scenarios during the acquisition process and employing data augmentation techniques to simulate various external conditions. Additionally, we demonstrate how the RecyBat24’s detection-oriented annotations can be used to create a second version of RecyBat24for instance-segmentation tasks. Finally, we demonstrate that recent lightweight machine learning models achieve high accuracy, highlighting their potential for classification and segmentation applications where computational resources are constrained.
... Lithium-ion batteries (LIBs), due to their high energy density, long cycle life, and environmental friendliness, are widely used in EVs power batteries and energy storage applications [6][7][8][9]. The vast market has accelerated the technological development of LIBs, posing stricter demands on their efficiency, health, safety, and environmental friendliness [10][11][12]. ...
Article
Full-text available
The accurate state of health (SOH) estimation of lithium-ion batteries is crucial for efficient, healthy, and safe operation of battery systems. Extracting meaningful aging information from highly stochastic and noisy data segments while designing SOH estimation algorithms that efficiently handle the large-scale computational demands of cloud-based battery management systems presents a substantial challenge. In this work, we propose a quantum convolutional neural network (QCNN) model designed for accurate, robust, and generalizable SOH estimation with minimal data and parameter requirements and is compatible with quantum computing cloud platforms in the Noisy Intermediate-Scale Quantum. First, we utilize data from 4 datasets comprising 272 cells, covering 5 chemical compositions, 4 rated parameters, and 73 operating conditions. We design 5 voltage windows as small as 0.3 V for each cell from incremental capacity peaks for stochastic SOH estimation scenarios generation. We extract 3 effective health indicators (HIs) sequences and develop an automated feature fusion method using quantum rotation gate encoding, achieving an R2 of 96%. Subsequently, we design a QCNN whose convolutional layer, constructed with variational quantum circuits, comprises merely 39 parameters. Additionally, we explore the impact of training set size, using strategies, and battery materials on the model’s accuracy. Finally, the QCNN with quantum convolutional layers reduces root mean squared error by 28% and achieves an R2 exceeding 96% compared to other three commonly used algorithms. This work demonstrates the effectiveness of quantum encoding for automated feature fusion of HIs extracted from limited discharge data. It highlights the potential of QCNN in improving the accuracy, robustness, and generalization of SOH estimation while dealing with stochastic and noisy data with few parameters and simple structure. 
It also suggests a new paradigm for leveraging quantum computational power in SOH estimation.
... Rechargeable batteries are ubiquitous in modern industry, including electric vehicles, power grids, and portable devices [12,25,36]. Nevertheless, batteries inevitably degrade with cyclic operation due to intrinsic electrochemical mechanisms [4,7,34]. ...
Preprint
Full-text available
Battery Life Prediction (BLP), which relies on time series data produced by battery degradation tests, is crucial for battery utilization, optimization, and production. Despite impressive advancements, this research area faces three key challenges. Firstly, the limited size of existing datasets impedes insights into modern battery life data. Secondly, most datasets are restricted to small-capacity lithium-ion batteries tested in labs under a narrow range of conditions, raising concerns about the generalizability of findings. Thirdly, inconsistent and limited benchmarks across studies obscure the effectiveness of baselines and leave it unclear whether models popular in other time series fields are effective for BLP. To address these challenges, we propose BatteryLife, a comprehensive dataset and benchmark for BLP. BatteryLife integrates 16 datasets, offering 2.5 times the sample size of the previous largest dataset, and provides the most diverse battery life resource with batteries from 8 formats, 59 chemical systems, 9 operating temperatures, and 421 charge/discharge protocols, including both laboratory and industrial tests. Notably, BatteryLife is the first to release battery life datasets of zinc-ion batteries, sodium-ion batteries, and industry-tested large-capacity lithium-ion batteries. With the comprehensive dataset, we revisit the effectiveness of baselines popular in this and other time series fields. Furthermore, we propose CyclePatch, a plug-in technique that can be employed in various neural networks. Extensive benchmarking of 18 methods reveals that models popular in other time series fields can be unsuitable for BLP, and CyclePatch consistently improves model performance, establishing state-of-the-art benchmarks. Moreover, BatteryLife evaluates model performance across aging conditions and domains. BatteryLife is available at https://github.com/Ruifeng-Tan/BatteryLife.
Article
Full-text available
Photocatalysis offers an energy‐efficient and sustainable solution to environmental pollution and energy shortages. The core of this process lies in photocatalysts. However, establishing a clear relationship between their structure and performance through traditional experimental methods is often time‐ and labor‐intensive. Machine learning (ML) has recently gained traction in guiding photocatalyst synthesis, though it is often challenged by limited data availability. This study introduces a dynamic ML‐guided approach that iteratively optimizes experimental parameters through successive cycles of ML analysis and experimentation, effectively circumventing the need for large datasets. Applied to the synthesis of photocatalysts via microwave heating of quercetin, this method yielded optimal performance after three iterations, achieving a high hydrogen peroxide production rate. This ML approach demonstrates an effective few‐shot ML optimization strategy for catalyst synthesis.
Article
Full-text available
In the development of battery science, machine learning (ML) has been widely employed to predict material properties, monitor morphological variations, learn the underlying physical rules and simplify the material-discovery processes. However, the widespread adoption of ML in battery research has encountered limitations, such as incomplete and unfocused databases, low model accuracy and difficulty in realizing experimental validation. It is important to construct datasets containing domain-specific knowledge, paired with suitable ML models, for battery research from an application-oriented perspective. We outline five key challenges in the field and highlight potential research directions that can unlock the full potential of ML in advancing battery technologies.
Preprint
Full-text available
Artificial intelligence (AI) has emerged as a powerful tool in the discovery and optimization of novel battery materials. However, the adoption of AI in battery cathode representation and discovery is still limited due to the complexity of optimizing multiple performance properties and the scarcity of high-fidelity data. In this study, we present a comprehensive machine-learning model (DRXNet) for battery informatics and demonstrate the application in discovery and optimization of disordered rocksalt (DRX) cathode materials. We have compiled the electrochemistry data of DRX cathodes over the past five years, resulting in a dataset of more than 30,000 discharge voltage profiles with 14 different metal species. Learning from this extensive dataset, our DRXNet model can automatically capture critical features in the cycling curves of DRX cathodes under various conditions. Illustratively, the model gives rational predictions of the discharge capacity for diverse compositions in the Li–Mn–O–F chemical space and high-entropy systems. As a universal model trained on diverse chemistries, our approach offers a data-driven solution to facilitate the rapid identification of novel cathode materials, accelerating the development of next-generation batteries for carbon neutralization.
Article
Full-text available
The continued market growth for electric vehicles globally is accelerating the transformational shift to a low-carbon transportation future. However, the sustainability of this transition hinges to a large extent on the management of waste, including end-of-life batteries where strategic elements such as lithium (Li) and cobalt (Co) are present. Different from the existing pyrometallurgical and hydrometallurgical recycling methods that involve heavy energy inputs and the use of hazardous chemicals, here we show a feasible single-step process that not only reclaims lithium cobalt oxide (LiCoO2) from waste Li-ion batteries but also upgrades it to a cathode with enhanced electrochemical properties. Our recycling process is based on a direct reaction between spent LiCoO2 and added mixture of Al2O3, MgO and Li2CO3, during which the Li vacancies aid the diffusion of Al and Mg to yield dual-doped LiCoO2. The upgraded LiCoO2 cathode possesses even better structural stability and sustains 300 cycles retaining 79.7% of its initial capacity at a voltage of 4.6 V. As evidenced by the technoeconomic analysis, the current circularity approach exhibits cost benefits and could catalyse further progress in the upcycling of different materials for batteries.
Article
Accurate capacity estimation is essential in the management of lithium-ion batteries, as it guarantees the safety and dependability of battery-powered systems. However, direct measurement of battery capacity is challenging due to the unpredictable working conditions and intricate electrochemical characteristics, which complicates the identification of battery degradation. In this work, through in-depth analysis of battery aging data, an incremental slope (IS) aided feature extraction method is proposed to obtain universal multidimensional features that adapt to different working conditions. With the extracted features, a simple multilayer perceptron (MLP) is used to achieve high-precision capacity estimation. Furthermore, a feature matching based transfer learning (FM-TL) method is proposed to automatically adapt the capacity estimation across different types of batteries that are cycled under various working conditions. 158 batteries covering five material types and 15 working conditions are used to validate the proposed method. Results suggest that the MLP model can provide an accurate capacity estimation, where the overall mean absolute percentage error (MAPE) and root mean square percentage error (RMSPE) are limited to 1.22% and 1.61%, respectively. Furthermore, compared with the traditional fine-tuning method, the overall MAPE and RMSPE under various transfer learning application scenarios respectively decrease by up to 78.23% and 75.31%, indicating that the FM-TL method is promising to construct a reliable transfer learning path, which improves the accuracy and reliability of capacity estimation when applied to various target domains.
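The abstract above reports MAPE and RMSPE without defining them. As a minimal sketch, assuming the standard definitions (mean of absolute relative errors, and root of the mean squared relative error, both expressed in percent), the two metrics can be computed as follows. The capacity values are hypothetical and are not data from the study.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (standard definition)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def rmspe(y_true, y_pred):
    """Root mean square percentage error, in percent (standard definition)."""
    n = len(y_true)
    mean_sq = sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / n
    return 100.0 * math.sqrt(mean_sq)


# Hypothetical measured and estimated capacities in Ah (illustrative only).
measured = [2.0, 1.0, 4.0]
estimated = [1.9, 1.02, 4.1]

print(f"MAPE  = {mape(measured, estimated):.2f}%")
print(f"RMSPE = {rmspe(measured, estimated):.2f}%")
```

Because RMSPE squares each relative error before averaging, it is never smaller than MAPE on the same data and is more sensitive to a few large misses, which is why the two are usually reported together.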
Article
We develop an adaptive machine-learning framework that addresses cross-operation-condition battery lifetime prediction, particularly under extreme conditions. This framework uses correlation alignment to correct feature divergence under fast-charging and extremely fast-charging conditions. We report a linear correlation between feature adaptability and prediction accuracy. Higher adaptability generally leads to better prediction accuracy, aiding efficient feature engineering. Our analysis shows that the first 120 cycles provide sufficient information for lifetime prediction, and extending data to the first 320 cycles only marginally improves prediction accuracy. An early prediction using only one feature at the 20th cycle produces a 93.3% accuracy, saving up to 99.4% computation time and repetitive tests. Our quantitative adaptability evaluation enhances prediction accuracy while reducing information redundancy via proper feature and cycle selections. The proposed framework is validated under another unseen complex operation condition with a 90.3% accuracy without prior knowledge.
Lithium-ion batteries (LIBs) have been broadly deployed in consumer electronics [1], electric vehicles [2], battery energy storage systems [3], and smart grid applications [4] due to their high energy density [5], wide working temperature range [6], and mature technology ecology. However, such batteries continuously degrade during cycling, leading to severe issues such as capacity drop [7], temperature rise [8], cell-to-cell inconsistency [9,10], and shortened lifetime. For safety concerns, it is therefore essential for battery management systems to accurately predict the state of health (SOH) and the remaining useful life (RUL) of batteries. In addition, accurate knowledge of the SOH and RUL helps to evaluate the batteries for next-stage decision-making, such as repurposing in second life [11] and recycling routine selection [12] at the end of life.
Therefore, prediction of the SOH and RUL is critically important throughout a battery's life, while it remains challenging due to the constantly changing operation conditions. Much previous research reported mechanism-driven and semiempirical prediction methods. For the mechanism-informed methods, a pseudo-two-dimensional model [13], a single-particle model [14], electrochemical impedance spectroscopy [15,16], distribution of relaxation time [17,18], an equivalent circuit model [19], incremental capacity analysis [20], and differential voltage analysis [21] are advantageous in accurately predicting microscopic degradation, such as lithium plating [22], solid-electrolyte-interphase (SEI) formation [23], loss of lithium inventory (LLI) [24], and loss of active materials (LAM) [25]. However, the diverse operation conditions, such as dynamic charging and discharging protocols [26], state of charge [27], and ambient temperatures [28], can cause significant divergence in the primary degradation mechanisms, leading to poor performance in practical use. In contrast to the mechanism-driven method, semiempirical methods are developed by assuming equivalent circuit model [29] and empirical battery degradation patterns by deliberately fitting the historical usage parameters into the
Article
This paper proposes a boosted multi-task learning framework for inter-district collaborative load forecasting. The proposed framework involves two consecutive stages: in the first stage, districts collaborate under a seamlessly integrated federated learning scheme to capture the global load pattern; in the second stage, districts withdraw and perform local training to capture the local load patterns. The probabilistic Gradient-Boosted Regression Tree (GBRT) is applied as the bottom-level machine learning algorithm, which allows for an easy and intuitive embodiment of the generalized multi-task learning framework. We further propose two candidate district withdrawal mechanisms to connect the two stages: the simultaneous withdrawal, which prioritizes prediction accuracy, and the dynamic withdrawal, which prioritizes training efficiency and district incentivization. The follow-up performance analyses and the case study on 11 districts of Zhuhai city confirm the superiority of the proposed framework and district withdrawal mechanisms.