biquality-learn: a Python library for Biquality
Learning
Pierre Nodet
pierre.nodet@orange.com
Orange Innovation, Université Paris-Saclay
Châtillon, France
Vincent Lemaire
vincent.lemaire@orange.com
Orange Innovation
Lannion, France
Alexis Bondu
alexis.bondu@orange.com
Orange Innovation
Châtillon, France
Antoine Cornuéjols
antoine.cornuejols@agroparistech.fr
AgroParisTech, INRAe, Université Paris-Saclay
Palaiseau, France
Abstract
The democratization of Data Mining has been widely suc-
cessful thanks in part to powerful and easy-to-use Machine
Learning libraries. These libraries have been particularly
tailored to tackle Supervised Learning. However, strong su-
pervision signals are scarce in practice, and practitioners
must resort to weak supervision. In addition to weaknesses
of supervision, dataset shifts are another kind of phenome-
non that occurs when deploying machine learning models in
the real world. That is why Biquality Learning has been pro-
posed as a machine learning framework to design algorithms
capable of handling multiple weaknesses of supervision and
dataset shifts without assumptions on their nature and level
by relying on the availability of a small trusted dataset com-
posed of cleanly labeled and representative samples. Thus, we propose biquality-learn: a Python library for Biquality Learning with an intuitive and consistent API to learn machine learning models from biquality data. It provides well-proven algorithms, is accessible and easy to use for everyone, and enables researchers to experiment in a reproducible way on biquality data.
Keywords: Python, Biquality Learning, Weakly Supervised
Learning, Dataset Shift
1 Introduction
The democratization of Data Mining has been widely successful thanks in part to powerful and easy-to-use Machine Learning libraries such as scikit-learn [22], weka [29], or caret [14].
These libraries have been at the core of enforcing good practices in Machine Learning and providing efficient solutions to complex problems. These libraries have been particularly tailored to tackle Supervised Learning and occasionally Semi-Supervised Learning and Unsupervised Learning. However, strong supervision signals are scarce in practice, and practitioners must resort to weak supervision. Learning with weak supervision, or Weakly-Supervised Learning [33], is a diverse field, as diverse as the identified weaknesses of supervision. Usually, weaknesses of supervision are divided into three groups, namely inaccurate supervision when samples are mislabeled, inexact supervision when labels are not adapted to the classification task, or incomplete supervision when labels are missing, which reflects the inadequacy of the available labels in the real world [14]. For each weakness of supervision, algorithms have to be specifically hand-designed to alleviate it. In addition to weaknesses of supervision, dataset shifts are another kind of phenomenon that occurs when deploying machine learning models in the real world [23]. Dataset shifts happen when the data distribution observed at training time differs from the data distribution expected at testing time [16]. Shifts in the joint distribution of features and targets can be further divided into four subgroups: covariate shift for shifts in the feature distribution, prior shift for shifts in the target distribution, concept drift for shifts in the decision boundary, and class-conditional shift for shifts in the feature distribution for a given target. Again, designing algorithms to handle dataset shifts usually requires assumptions on the nature of the shift [7]. Because of the diverse nature of possible weaknesses of supervision and dataset shifts, and the assumptions associated with robust algorithms, it is impossible for practitioners to choose the approach suited to their problem.
Biquality Learning is a machine learning framework that has been proposed to design algorithms capable of handling multiple weaknesses of supervision and dataset shifts without assumptions on their nature [18]. It relies on the availability of a small trusted dataset composed of cleanly labeled and representative samples for the targeted classification task, in addition to the usual untrusted dataset composed of potentially corrupted and biased samples. Even though the trusted dataset is not big or rich enough to properly learn the targeted classification task, it is sufficient to learn a mapping function from the untrusted distribution to the trusted distribution in order to train machine learning models on corrected untrusted samples.
Leveraging trusted data has proven to be particularly efficient at combating distribution shifts [9, 12], especially for the most challenging corruptions such as instance-dependent label noise [19]. In many real-world scenarios, these trusted data
are available or can easily be made available, making it possible to use Biquality Learning algorithms to train robust machine learning models. One occurrence is when annotating an entire dataset is expensive to the point of being prohibitive, but labeling a small part of the dataset is doable. In Fraud Detection and Cyber Security, labeling samples requires complex forensics from domain experts, limiting the number of clean samples. However, the rest of the dataset can be labeled by hand-engineered rules [24], with labels that cannot properly be trusted. Another scenario arises when data shifts happen during the labeling process over time. It occurs in MLOps [13], when a model is first learned on clean data and then deployed in production, or when past predictions are used to learn an updated model [26]. Finally, when multiple annotators are responsible for dataset labeling, which happens in NLP, the annotators' efficiency in following the labeling guidelines may vary. However, if one annotator can be trusted, all the other annotators can be considered untrusted, and pairing each untrusted annotator with the trusted one can be viewed as a Biquality Learning task [31].
Multiple libraries have been developed recently for the purpose of handling covariate shift, especially for Domain Adaptation [8], or for dealing with weak supervision [4]. However, Biquality Learning lacks an accessible library with an intuitive and consistent API to learn machine learning models from Biquality Data, with well-proven algorithms. Thus, we propose biquality-learn: a Python library for Biquality Learning.
2 biquality-learn
We designed the biquality-learn library following the general design principles of scikit-learn, meaning that it provides a consistent interface for training and using biquality learning algorithms with an easy way to compose building blocks provided by the library with other blocks from libraries sharing these design principles [3]. It includes various reweighting algorithms, plugin correctors, and functions for simulating label noise and generating sample data to benchmark biquality learning algorithms.
biquality-learn and its dependencies can be easily in-
stalled through pip:
pip install biquality-learn
Overall, the goal of biquality-learn is to make well-
known and proven biquality learning algorithms accessible
and easy to use for everyone and to enable researchers to
experiment in a reproducible way on biquality data.
• Source Code: https://github.com/biquality-learn/biquality-learn
• Documentation: https://biquality-learn.readthedocs.io/
• License: BSD 3-Clause
3 Design of the API
Scikit-learn [22] is a machine learning library for Python with a design philosophy emphasizing consistency, simplicity, and performance. The library provides a consistent interface for various algorithms, making it easy for users to switch between models. It also aims to make machine learning easy to get started with through a user-friendly API and precise documentation. Additionally, it is built on top of efficient numerical libraries (NumPy [11] and SciPy [27]) to ensure that models can be trained and used on large datasets in a reasonable amount of time.
In biquality-learn, we followed the same principles, implementing a similar API with fit, transform, and predict methods. In addition to passing the input features 𝑋 and the labels 𝑌 as in scikit-learn, in biquality-learn, we need to provide information regarding whether each sample comes from the trusted or the untrusted dataset. We thus require an additional sample_quality parameter that specifies from which dataset each sample originates, where a value of 0 indicates an untrusted sample and a value of 1 a trusted one.
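As an illustration of this convention (the variable names X_trusted and X_untrusted below are illustrative placeholders, not part of the library's API), the sample_quality vector is simply a one-dimensional array aligned with 𝑋 and 𝑌:

import numpy as np

# 1 flags a trusted sample, 0 an untrusted one, in the same order
# as the rows of X and the entries of y.
sample_quality = np.concatenate([
    np.ones(len(X_trusted)),
    np.zeros(len(X_untrusted)),
])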
4 Algorithms Implemented
In biquality-learn, we purposely implemented only a specific class of algorithms centered on approaches for tabular data and classifiers, thus restricting ourselves to approaches that are genuinely classifier-agnostic or implementable within scikit-learn's API. We did so in order not to break the design principles shared with scikit-learn and not to impose a particular deep learning library, such as PyTorch [20] or TensorFlow [1], on the user.
We summarize all implemented algorithms and the kind of corruption they can handle in Table 1.
Algorithms                           Dataset Shifts   Weaknesses of Supervision
EasyAdapt [6]                        ✓                ×
TrAdaBoost [5]                       ✓                ×
Unhinged (Linear/Kernel) [25]        ×                ✓
Backward [17, 21] (with GLC [12])    ×                ✓
IRLNL [15, 28] (with GLC [12])       ×                ✓
Plugin [32] (with GLC [12])          ×                ✓
𝐾-KMM [9]                            ✓                ✓
IRBL [19]                            ×                ✓
𝐾-PDR                                ✓                ✓

Table 1. Algorithms implemented in biquality-learn
5 Training Biquality Classifiers
Training a biquality learning algorithm using biquality-learn is the same procedure as training a supervised algorithm with scikit-learn, thanks to the design presented in Section 3. The features 𝑋 and the targets 𝑌 of samples belonging to the trusted dataset 𝐷_𝑇 and the untrusted dataset 𝐷_𝑈 must be provided as one global dataset 𝐷. Additionally, the indicator representing if a sample is trusted or not has to be provided: sample_quality = 1_{𝑋 ∈ 𝐷_𝑇}.
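As a concrete illustration, a toy biquality dataset can be assembled as follows (a minimal sketch using only NumPy and scikit-learn; the corruption API of Section 8 offers dedicated helpers for this):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data: a small trusted part and a larger untrusted part
# whose labels are randomly flipped.
X_all, y_all = make_classification(n_samples=1000, random_state=0)
X_trusted, X_untrusted, y_trusted, y_untrusted = train_test_split(
    X_all, y_all, train_size=0.1, random_state=0
)
rng = np.random.RandomState(0)
flip = rng.rand(len(y_untrusted)) < 0.3
y_untrusted[flip] = 1 - y_untrusted[flip]

# One global dataset D and the trusted/untrusted indicator.
X = np.concatenate([X_trusted, X_untrusted])
y = np.concatenate([y_trusted, y_untrusted])
sample_quality = np.concatenate(
    [np.ones(len(y_trusted)), np.zeros(len(y_untrusted))]
)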
Here is an example of how to train a biquality classifier using the 𝐾-KMM (𝐾-Kernel Mean Matching) [9] algorithm from biquality-learn:
from sklearn.linear_model import LogisticRegression
from bqlearn.kdr import KKMM

# Base classifier passed via the final_estimator parameter (cf. Section 7).
kkmm = KKMM(final_estimator=LogisticRegression(), kernel="rbf")
kkmm.fit(X, y, sample_quality=sample_quality)
kkmm.predict(X_new)
6 scikit-learn’s metadata routing
scikit-learn's metadata routing is a Scikit-learn Enhancement Proposal (SLEP006) describing a system that can be used to seamlessly incorporate various metadata, in addition to the required features and targets, into scikit-learn estimators, scorers, and transformers. biquality-learn uses this design to integrate the sample_quality property into the training and prediction process of biquality learning algorithms. It allows one to use biquality-learn's algorithms in a similar way to scikit-learn's algorithms by passing the sample_quality property as an additional argument to the fit, predict, and other methods.
Currently, the main components provided by scikit-learn support this design, and it is already usable with cross-validators. However, it will be extended to all components in the future, and biquality-learn will significantly benefit from many "free" features. Once https://github.com/scikit-learn/scikit-learn/pull/24250 is merged, it will be possible to make a bagging ensemble of biquality classifiers thanks to the BaggingClassifier implemented in scikit-learn, without overriding its behavior on biquality data.
from sklearn.ensemble import BaggingClassifier

bag = BaggingClassifier(kkmm).fit(X, y, sample_quality=sample_quality)
7 Cross-Validating Biquality Classifiers
Any cross-validator working for usual Supervised Learning also works in the case of Biquality Learning. However, when splitting the data into a train and a test set, untrusted samples need to be removed from the test set to avoid computing supervised metrics on corrupted labels. That is why make_biquality_cv is provided by biquality-learn to post-process any scikit-learn compatible cross-validator.
Here is an example of how to use scikit-learn's RandomizedSearchCV to perform hyperparameter validation for a biquality learning algorithm in biquality-learn:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.utils.fixes import loguniform

from bqlearn.model_selection import make_biquality_cv

param_dist = {"final_estimator__C": loguniform(1e3, 1e5)}
n_iter = 20

random_search = RandomizedSearchCV(
    kkmm,
    param_distributions=param_dist,
    n_iter=n_iter,
    cv=make_biquality_cv(X, sample_quality, cv=3),
)
random_search.fit(X, y, sample_quality=sample_quality)
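The same post-processed cross-validator can also be used for plain scoring. The snippet below is a sketch that forwards sample_quality through cross_val_score's fit_params argument rather than relying on metadata routing:

from sklearn.model_selection import cross_val_score

# Scores are computed only on trusted test samples thanks to
# make_biquality_cv.
scores = cross_val_score(
    kkmm, X, y,
    cv=make_biquality_cv(X, sample_quality, cv=3),
    fit_params={"sample_quality": sample_quality},
)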
8 Simulating Corruptions with the
Corruption API
The corruption module in biquality-learn provides several functions to artificially create biquality datasets by introducing synthetic corruption. These functions can be used to simulate various types of label noise or imbalances in the dataset. We hope to ease the benchmarking of biquality learning algorithms thanks to the corruption API, with particular attention to the reproducibility and standardization of these benchmarks for researchers.
Here is a brief overview of the functions available in the corruption module (a short usage sketch follows the list):
• make_weak_labels: Adds weak labels to a dataset by learning a classifier on a subset of the dataset and using its predictions as new labels.
• make_label_noise: Adds noisy labels to a dataset by randomly corrupting a specified fraction of the samples according to a given noise matrix.
• make_instance_dependent_label_noise: Adds instance-dependent noisy labels by corrupting samples with a probability depending on the sample and a given noise matrix.
• uncertainty_noise_probability: Computes the probability of corrupting a sample based on the prediction uncertainty of a given classifier [19].
• make_feature_dependent_label_noise: Adds instance-dependent noisy labels by corrupting a specified fraction of the labels with a probability given by a random linear map between the feature space and the label space [30].
• make_imbalance: Creates an imbalanced dataset by oversampling or undersampling the minority classes [2].
• make_sampling_biais: Creates a sampling bias by sampling a subset of the dataset not at random from the original dataset. The sampling scheme follows a Gaussian distribution with a shifted mean and scaled variance computed from the first principal component of a PCA learned on the dataset [10].
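For instance, a synthetic biquality dataset could be built as in the sketch below; the module path and the exact signature of make_label_noise are assumptions based on the descriptions above, not the library's documented API:

import numpy as np
from sklearn.datasets import make_classification

# Assumed import path for the corruption module.
from bqlearn.corruptions import make_label_noise

X, y = make_classification(n_samples=1000, random_state=0)

# Keep 10% of the samples as the trusted part.
trusted = np.random.RandomState(0).rand(len(y)) < 0.1
sample_quality = trusted.astype(int)

# Corrupt the untrusted labels with a row-stochastic noise matrix
# (assumed signature: make_label_noise(y, noise_matrix, random_state=...)).
noise_matrix = np.array([[0.7, 0.3],
                         [0.3, 0.7]])
y_corrupted = y.copy()
y_corrupted[~trusted] = make_label_noise(
    y[~trusted], noise_matrix, random_state=0
)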
9 Conclusion
We presented biquality-learn, a Python library for Biquality Learning. We exposed the design behind its API to make it easy to use and consistent with scikit-learn. We notably showed that our design is future-proof by showing how well it integrates with the upcoming design of scikit-learn. In the future, biquality-learn could be complemented with deep learning capabilities through a twin library with a principled design, committed to a particular deep learning library. Finally, the capabilities of biquality-learn could be extended to cover features particularly needed in real-world scenarios, such as evaluating machine learning models on untrusted data.
References
[1]
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng
Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean,
Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp,
Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz
Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat
Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster,
Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul
Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol
Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and
Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning
on Heterogeneous Systems. https://www.tensorflow.org/ Software
available from tensorflow.org.
[2]
Mateusz Buda, Atsuto Maki, and Maciej A Mazurowski. 2018. A
systematic study of the class imbalance problem in convolutional
neural networks. Neural Networks 106 (2018), 249–259.
[3]
Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, An-
dreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexan-
dre Gramfort, Jaques Grobler, et al. 2013. API design for machine
learning software: experiences from the scikit-learn project. arXiv
preprint arXiv:1309.0238 (2013).
[4]
Andrea Campagner, Julian Lienen, Eyke Hüllermeier, and Davide
Ciucci. 2022. Scikit-Weak: A Python Library for Weakly Supervised
Machine Learning. In Rough Sets, JingTao Yao, Hamido Fujita, Xi-
aodong Yue, Duoqian Miao, Jerzy Grzymala-Busse, and Fanzhang Li
(Eds.). Springer Nature Switzerland, Cham, 57–70.
[5]
Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. 2007. Boosting
for transfer learning. In International Conference on Machine Learning.
193–200.
[6]
Hal Daumé III. 2009. Frustratingly easy domain adaptation. arXiv
preprint arXiv:0907.1815 (2009).
[7]
Shai Ben David, Tyler Lu, Teresa Luu, and Dávid Pál. 2010. Impossibil-
ity theorems for domain adaptation. In Proceedings of the Thirteenth
International Conference on Artificial Intelligence and Statistics. JMLR
Workshop and Conference Proceedings, 129–136.
[8]
Antoine de Mathelin, François Deheeger, Guillaume Richard, Mathilde
Mougeot, and Nicolas Vayatis. 2021. ADAPT: Awesome Domain
Adaptation Python Toolbox. arXiv preprint arXiv:2107.03049 (2021).
[9]
Tongtong Fang, Nan Lu, Gang Niu, and Masashi Sugiyama. 2020. Re-
thinking Importance Weighting for Deep Learning under Distribution
Shift, In Neural Information Processing Systems. Advances in Neural
Information Processing Systems 33 (2020), 11996–12007.
[10]
Arthur Gretton, Alex Smola, Jiayuan Huang, Marcel Schmittfull,
Karsten Borgwardt, and Bernhard Schölkopf. 2009. Covariate shift by
kernel mean matching. Dataset shift in machine learning 3, 4 (2009),
5.
[11]
Charles R. Harris, K. Jarrod Millman, Stéfan J van der Walt, Ralf
Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian
Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Pi-
cus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Al-
lan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peter-
son, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren
Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant.
2020. Array programming with NumPy. Nature 585 (2020), 357–362.
https://doi.org/10.1038/s41586-020-2649-2
[12]
Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel.
2018. Using Trusted Data to Train Deep Networks on Labels Corrupted
by Severe Noise. In Advances in Neural Information Processing Systems,
Vol. 31. 10456–10465.
[13]
Dominik Kreuzberger, Niklas Kühl, and Sebastian Hirschl. 2022. Ma-
chine Learning Operations (MLOps): Overview, Definition, and Archi-
tecture. arXiv preprint arXiv:2205.02302 (2022).
[14]
Max Kuhn. 2008. Building Predictive Models in R Using the caret
Package. Journal of Statistical Software, Articles 28, 5 (2008), 1–26.
https://doi.org/10.18637/jss.v028.i05
[15]
Tongliang Liu and Dacheng Tao. 2015. Classification with noisy labels
by importance reweighting. IEEE Transactions on pattern analysis and
machine intelligence 38, 3 (2015), 447–461.
[16]
Jose G Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V
Chawla, and Francisco Herrera. 2012. A unifying view on dataset shift
in classification. Pattern Recognition 45, 1 (2012), 521–530.
[17]
Nagarajan Natarajan, Inderjit S Dhillon, Pradeep Ravikumar, and Am-
buj Tewari. 2017. Cost-Sensitive Learning with Noisy Labels. J. Mach.
Learn. Res. 18, 1 (2017), 5666–5698.
[18]
Pierre Nodet, Vincent Lemaire, Alexis Bondu, Antoine Cornuéjols, and
Adam Ouorou. 2021. From Weakly Supervised Learning to Biquality
Learning: an Introduction. In International Joint Conference on Neural
Networks (IJCNN). IEEE.
[19]
Pierre Nodet, Vincent Lemaire, Alexis Bondu, Antoine Cornuéjols, and
Adam Ouorou. 2021. Importance Reweighting for Biquality Learning.
In International Joint Conference on Neural Networks (IJCNN). IEEE.
[20]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James
Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia
Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward
Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chil-
amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala.
2019. PyTorch: An Imperative Style, High-Performance Deep Learn-
ing Library. In Advances in Neural Information Processing Systems 32.
8024–8035.
[21]
Giorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, and
Lizhen Qu. 2017. Making Deep Neural Networks Robust to Label
Noise: a Loss Correction Approach. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
[22]
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent
Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Pret-
tenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Ma-
chine learning in Python. Journal of Machine Learning Research 12
(2011), 2825–2830.
[23]
Joaquin Quinonero-Candela, Masashi Sugiyama, Anton Schwaighofer,
and Neil D Lawrence. 2008. Dataset shift in machine learning. Mit
Press.
[24]
Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen
Wu, and Christopher Ré. 2020. Snorkel: Rapid training data creation
with weak supervision. The VLDB Journal 29, 2 (2020), 709–730.
[25]
Brendan van Rooyen, Aditya Menon, and Robert C Williamson. 2015.
Learning with Symmetric Label Noise: The Importance of Being Un-
hinged. In Neural Information Processing Systems, C. Cortes, N. D.
Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). 10–18.
[26]
Kalyan Veeramachaneni, Ignacio Arnaldo, Vamsi Korrapati, Constanti-
nos Bassias, and Ke Li. 2016. AI2: Training a Big Data Machine to
Defend. In 2016 IEEE 2nd International Conference on Big Data Security
on Cloud (BigDataSecurity). 49–54.
[27]
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland,
Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson,
Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew
Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew
R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat,
Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold,
Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris,
Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van
Mulbregt, and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental
Algorithms for Scientific Computing in Python. Nature Methods 17
(2020), 261–272. https://doi.org/10.1038/s41592-019-0686-2
[28]
Ruxin Wang, Tongliang Liu, and Dacheng Tao. 2017. Multiclass learn-
ing with partially corrupted labels. IEEE transactions on neural net-
works and learning systems 29, 6 (2017), 2568–2580.
[29]
Ian H Witten and Eibe Frank. 2002. Data mining: practical machine
learning tools and techniques with Java implementations. Acm Sigmod
Record 31, 1 (2002), 76–77.
[30]
Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong,
Haifeng Liu, Gang Niu, Dacheng Tao, and Masashi Sugiyama. 2020.
Part-dependent label noise: Towards instance-dependent label noise.
Advances in Neural Information Processing Systems 33 (2020), 7597–
7610.
[31]
Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. 2011. A survey
of crowdsourcing systems. In 2011 IEEE third international conference
on privacy, security, risk and trust and 2011 IEEE third international
conference on social computing. IEEE, 766–773.
[32]
Mingyuan Zhang, Jane Lee, and Shivani Agarwal. 2021. Learning from
noisy labels with no change to the training process. In International
Conference on Machine Learning. PMLR, 12468–12478.
[33]
Zhi-Hua Zhou. 2017. A brief introduction to weakly supervised learn-
ing. National Science Review 5, 1 (08 2017), 44–53.