
Garbage Collection and Sorting with a Mobile Manipulator using Deep Learning and Whole-Body Control


Jingyi Liu1, Pietro Balatti2,3, Kirsty Ellis1, Denis Hadjivelichkov1,
Danail Stoyanov1, Arash Ajoudani2, and Dimitrios Kanoulas1
Abstract— Domestic garbage management is an important
aspect of a sustainable environment. This paper presents a
novel garbage classification and localization system for grasping
and placement in the correct recycling bin, integrated on
a mobile manipulator. In particular, we first introduce and
train a deep neural network (namely, GarbageNet) to detect
different recyclable types of garbage. Secondly, we use a grasp
localization method to identify a suitable grasp pose to pick
the garbage from the ground. Finally, we perform grasping
and sorting of the objects by the mobile robot through a
whole-body control framework. We experimentally validate the
method, both on visual RGB-D data and indoors on a real full-
size mobile manipulator for collection and recycling of garbage
items placed on the ground.
I. Introduction
Rapid urbanization over the past several years has resulted in an excessive increase of waste generation per capita, of which a third is not managed in an environmentally friendly manner [1]. In domestic environments, a large amount of garbage is thrown or left on the ground daily, polluting the environment heavily and preventing it from being sustainable and pleasant. Garbage collection and recycling (i.e., sorting garbage into different types) is a common solution that addresses this issue. Garbage separation is essential in this process; however, it is a labor-intensive job that might also affect the workers' health. There are two different types of garbage sorting: 1) centralized classification, where a large amount of garbage is dumped on a conveyor and workers sort out the recyclable waste, and 2) piecemeal sorting, which often happens outdoors, such as in parks and streets, where sanitation workers pick up different garbage items and place them into the corresponding bins. In this paper, we focus on the second type, which significantly reduces the need for extra sorting in the factory and reduces hazardous contact between workers and garbage. Our intention is to allow mobile robots to collect and sort garbage, thereby protecting workers from physical health issues and improving the recycling efficiency.
Garbage collection from the ground (Fig. 1-left) for the purpose of recycling remains a challenging problem for robots, since it involves the integration of several subsystems. Firstly, visual or another type of perceptual sensing is required to detect garbage in the environment and to localize it. Moreover, the type of the garbage must be identified, given that the collection is for recycling, so that each item can be placed in the right bin. Secondly, the grasp pose of the garbage object needs to be extracted. Lastly, a planning and control method is needed for the robot to grasp the garbage and place it into the right bin. This whole process needs to be carried out for all garbage items in the scene in the most efficient way.
Fig. 1: Typical garbage on the ground [2] (left); the IIT-MOCA/UCL-MPPL mobile robot (right).
1 Department of Computer Science, University College London, Gower Street, WC1E 6BT, London, UK. {j.liu.19, kirsty.ellis, dennis.hadjivelichkov, danail.stoyanov,
2 HRI² Lab, Istituto Italiano di Tecnologia, via Morego 30, 16163 Genova, Italy {pietro.balatti,arash.ajoudani}
3 Department of Information Engineering, University of Pisa, Pisa, Italy
equal contribution
In this paper, we introduce a novel integration of the aforementioned scheme, in order to allow a mobile manipulator to collect garbage items from the ground, after identifying their location, grasping pose, and type. An overview of the system is shown in Fig. 2, while the mobile robot that was used is shown in Fig. 1-right and has been modified to carry three different recycling bins (paper, metal, plastic). The process is as follows. First, RGB-D data are acquired from the visual sensor on the robot. These data are fed to a deep learning network (we call it GarbageNet) that is trained to segment, classify (based on material type), and localize all garbage items in the 2D RGB scene. The 3D location of each garbage object can then be extracted from the associated depth data, as well as the grasp pose of the closest one. Last, the robot starts approaching the closest garbage item and a whole-body controller enables the robot to grasp the target object and place it in the right recycling bin.
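The step of extracting 3D locations from the depth data can be illustrated with a minimal sketch. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, a depth image in metres, and a boolean instance mask; the function name and all parameters are hypothetical and are not taken from the system described here.

```python
import numpy as np

def backproject_mask(depth_m, mask, fx, fy, cx, cy):
    """Back-project the masked pixels of a depth image to camera-frame points.

    depth_m: (H, W) depth in metres (0 marks invalid readings).
    mask:    (H, W) boolean instance mask.
    Returns an (N, 3) array of [x, y, z] points (pinhole model, an assumption).
    """
    valid = np.asarray(mask, dtype=bool) & (depth_m > 0)
    v, u = np.nonzero(valid)          # pixel rows (v) and columns (u)
    z = depth_m[v, u]
    x = (u - cx) * z / fx             # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.column_stack((x, y, z))
```

The mean of the returned points gives an approximate object position that can be compared across detections to find the closest one.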
Next, we review related work on garbage collection and sorting robots (Sec. I-A). Then, we present our novel integration of three subsystems, namely the deep garbage recognition and localization, the grasp pose extraction, and the whole-body mobile manipulation (Sec. II). Moreover, we demonstrate the performance of our introduced system through experimental results (Sec. III). Finally, we conclude with future directions (Sec. IV).
Fig. 2: Software architecture of the whole system. Diagram labels: vision data; GarbageNet (Fig. 3); GPD (Fig. 4); perception pipeline (Fig. 5); garbage type; grasp pose; robot state; arm/base priority.
A. Related Work
The vast majority of autonomous garbage sorting robots focus on centralized classification, i.e., an automated conveyor, one or more arms, and a visual detection system are combined to sort garbage in the factory. The most representative one is the ZenRobotics Recycler sorting station developed in Finland [3]. A high-resolution 3D sensor is used to get an isometric 2D height map of the conveyor, then a machine learning method is employed for object recognition and manipulation. The sorting efficiency of this system is as high as 98%, and the average sorting speed is 3000 picks per hour. Another successful commercial product is the Waste Robotics system developed by FANUC [4], where convolutional neural networks are employed to classify data that are collected by RGB-D cameras. After the model is trained successfully, the robot arm uses suction grippers to pick the recyclable waste. A similar approach has been recently investigated using a fast parallel manipulator with a suction gripper, for sorting items on a conveyor [5]. Several other similar systems have been developed recently [6], [7], [8]. The difference from our approach is that they usually classify items against a known background (a conveyor) in the factory, while we are looking into sorting items during their collection, by grasping them from the ground and placing them in the right bin.
The second type of garbage sorting (i.e., piecemeal), in which we are interested in this paper, still remains an active research area in robotics, with several open challenges. For instance, the environment in which garbage lies may be unstructured, and a robot operating robustly and efficiently in such a task must combine many capabilities, such as object recognition, grasp pose estimation, grasp control, path planning, etc. Even though there has been work on garbage detection [9], [10], the only mobile manipulation robotic system that has developed a method for picking up garbage on grass is the one presented by Bai et al. [11]. In particular, a deep learning method is deployed to classify objects on the grass (i.e., as waste or not) and a novel navigation algorithm is presented based on grass segmentation. However, this system does not work in real-time and is not able to classify garbage by type for the purpose of recycling. In this paper, we propose a novel integration of systems that detect the type and pose of garbage items on the floor and use state-of-the-art whole-body control to collect them and sort them into the right bin, based on their type.
II. Methods
In this section, we discuss the approaches we employ to realize the garbage recycling robot, including finding what and where the garbage is and how the robot can grasp it.
A. GarbageNet: Deep Garbage Recognition & Localization
While object detection methods satisfy the demands of garbage classification and localization, by providing class labels and bounding boxes, instance segmentation methods have the advantage of also providing pixel-level masks. These masks can then be projected onto a depth image and significantly simplify the robot grasp search for a given target.
Given the need to detect and localize garbage in real-time with the mobile robot, we decided to use the YOLACT framework introduced in [12] and train it for garbage objects. We have named the newly trained network GarbageNet. Using this type of network structure it is possible to infer the bounding box and type of an object, as well as to acquire pixel-level object masks that could better help the robot comprehend its surrounding environment. The real-time performance and high accuracy contribute to its advantage over other types of object segmentation methods, such as Mask R-CNN [13], SOLO [14] and TensorMask [15]. We have integrated the original network in a ROS wrapper, where the robot visual sensor is used as input and the garbage object segmentation, bounding box, type, and grasping pose messages are generated. Our framework produces mask prototypes and per-instance mask coefficients, which are combined into instance masks. Non-Maximum Suppression (NMS) is applied to the detections to ensure there is no overlap between instances while retaining useful information. The core structure is shown in Fig. 3.
Fig. 3: GarbageNet: Convolutional image features are produced and passed onto two branches: the Protonet branch produces mask prototypes, while the other estimates their coefficients. Both are combined into an instance-level mask [12].
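The combination of mask prototypes and coefficients can be sketched in a few lines. This follows the linear-combination-plus-sigmoid assembly described for YOLACT in [12]; the array shapes, the function name, and the 0.5 binarization threshold are assumptions made for illustration, and cropping to the bounding box is omitted.

```python
import numpy as np

def assemble_masks(prototypes, coefficients, threshold=0.5):
    """Combine mask prototypes with per-instance coefficients.

    prototypes:   (H, W, K) prototype masks from the Protonet branch.
    coefficients: (N, K) one coefficient vector per detected instance.
    Returns (N, H, W) boolean instance masks.
    """
    # Linear combination of the K prototypes for each of the N instances.
    logits = np.einsum("hwk,nk->nhw", prototypes, coefficients)
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    return probs > threshold                 # threshold value is an assumption
```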
1) Dataset: The original YOLACT network is trained on the COCO [16] dataset, which was designed for general image recognition and does not fulfill the requirements of garbage type characterization and segmentation. Thus, a different dataset was needed to train GarbageNet for garbage identification. For this reason, we used the newly introduced TACO dataset [2], which is specialized for garbage segmentation and classification. The dataset uses an object taxonomy that can be directly used for garbage sorting purposes. In particular, it includes 1500 images with 4784 annotations and 60 categories, which belong to 28 super-categories (e.g., paper, glass, metal, carton, plastic, polypropylene, etc.). Moreover, the objects' backgrounds include both indoor and outdoor environments, such as tiles, pavements, grass, roads, etc. In this way, even deformed garbage objects in the wild can be classified and segmented.
2) Training: To train our framework, we randomly split the TACO dataset into training (80%), cross-validation (10%), and testing (10%) sets. We used an ImageNet [17] pre-trained model of YOLACT and fine-tuned the weights on the TACO dataset, using a batch size of 8 on two Titan Xp GPUs for 1 day and 40,000 iterations (learning rate: $10^{-3}$, weight decay: $5 \times 10^{-4}$, momentum: 0.9). Using ResNet-50 as backbone, we achieved a mAP75 of 40.43 (mean Average Precision with an IoU threshold of 0.75) at roughly 30 frames per second (i.e., almost the speed of the input RGB-D sensing). This is higher than the mAP75 originally reported for YOLACT on the COCO dataset, which is 31.2, or for Mask R-CNN, which is around 37.8. Notice here that the exact mask segmentation of the object is not particularly important in this stage, since the grasping pose is extracted from a different process, as described in the next section.
Fig. 4: Grasps produced by GPD [19]: (a) candidate pool and (b) axes defining each grasp.
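To make the IoU threshold behind mAP75 concrete, the following sketch computes the intersection-over-union of two binary masks and checks it against 0.75. It only illustrates the matching criterion for a single prediction; the full mAP computation (ranking by confidence, precision-recall integration, averaging over classes) is not shown, and the function names are hypothetical.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty: treated as no overlap (a convention chosen here)
    return float(np.logical_and(pred, gt).sum() / union)

def matches_at_75(pred, gt, iou_threshold=0.75):
    """True if the predicted mask overlaps the ground truth with IoU >= threshold."""
    return mask_iou(pred, gt) >= iou_threshold
```

For example, two 4-pixel masks that share 3 pixels have an IoU of 3/5 = 0.6 and would not count as a match at this threshold.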
3) Implementation: To allow the system to be integrated into our ROS-based architecture, a wrapper was used to interact easily with the other components and the real robot through ROS topics. In particular, an interface node subscribes to the input point cloud and the GarbageNet-produced masks, and projects the masks onto the point cloud. The approximate position of the closest garbage piece is computed from these projections. The interface also maps the detected garbage category to one of three super-categories (paper, metal, and plastic), based on keyword search. Finally, the interface publishes the approximate position of the nearest object, its projected mask points, and its super-category.
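The keyword-based filtering and nearest-object selection performed by the interface can be sketched as follows. The keyword lists, label strings, and function names are hypothetical placeholders chosen for illustration; the actual keywords used in the interface node are not given in the text.

```python
# Hypothetical keyword lists; the real interface may use different ones.
SUPER_CATEGORY_KEYWORDS = {
    "paper": ("paper", "carton", "cardboard"),
    "metal": ("metal", "can", "aluminium"),
    "plastic": ("plastic",),
}

def to_super_category(label):
    """Map a fine-grained label to paper/metal/plastic, or None if unmatched."""
    words = label.lower().replace("-", " ").split()
    for super_category, keywords in SUPER_CATEGORY_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return super_category
    return None

def nearest_sortable(detections):
    """Pick the nearest detection that maps to a known super-category.

    detections: list of (label, (x, y, z)) with positions in the sensor frame.
    Returns (label, position, super_category) or None.
    """
    best = None
    best_dist = float("inf")
    for label, position in detections:
        super_category = to_super_category(label)
        if super_category is None:
            continue  # not one of the three bins carried by the robot
        dist = sum(c * c for c in position) ** 0.5
        if dist < best_dist:
            best, best_dist = (label, position, super_category), dist
    return best
```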
B. GPD: Grasp Pose Detection
Traditional grasp pose generation methods [18] require either the geometric properties or an exact 3D model of the targeted object. However, litter thrown on the ground often has a non-rigid structure with varying textures and shapes. Providing precise models or establishing a large garbage grasping database is impractical. Moreover, a mobile robot dealing with cluttered scenes would only have access to RGB-D information from a single view.
A more general solution that deals with these challenges is to generate grasps directly from a voxelized point cloud. That is the principle on which Grasp Pose Detection (GPD) [19] is based. GPD has successfully been integrated with object detectors in cluttered environments.
1) Method: The GPD algorithm follows the steps briefly outlined in Algorithm 1.
Algorithm 1: Grasp Pose Detection
input: point cloud C; subset of points where the grasps are to occur S; grasp filtering parameters Θ
output: grasp configurations G
1) H = HandSearch(C, S);
2) G = SelectGraspConfigurations(H, C, Θ);
In Step 1, the received point cloud data C is voxelized and filtered. Points uniformly sampled from the subset of points S are used to produce hand candidates (see Fig. 4a) at the axes aligned with the points' normals. Each hand candidate is defined by axes for approach, hand binormal, and object axis, as shown in Fig. 4b. Filtering is applied to reject any candidates that would collide with the point cloud or that do not contain at least one point in the closing region of the hand.
In Step 2, grasp candidates are produced from the hand candidates, given some allowable angle deviation and approach restrictions Θ. The candidates are encoded into several image embeddings, which are passed through a trained convolutional neural network based on LeNet [20]. The network assigns each candidate a score that indicates how likely it is to be a successful grasp. Finally, the grasp configurations G with the highest scores are selected.
Fig. 5: Perception pipeline: The input image is passed through GarbageNet to detect garbage. In the interface, masks of detected objects are projected onto the point cloud. The approximate position of the nearest garbage is output, while its mask projection is used as sampling points for GPD. A garbage type label is also produced. Finally, GPD produces a grasp.
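The filtering rule of Step 1 (reject hands that collide with the cloud or whose closing region is empty) can be illustrated with a simplified geometric sketch. It assumes the points are already expressed in the hand frame (x: approach, y: closing direction, z: hand axis) and models the gripper as two box-shaped fingers; the dimensions and the function name are hypothetical and this is not the GPD implementation.

```python
import numpy as np

def is_valid_hand(points_hand, aperture=0.08, finger_width=0.01,
                  depth=0.06, height=0.02):
    """Check one hand candidate against points given in the hand frame.

    A candidate is kept only if (a) no point lies inside either finger box
    (collision) and (b) at least one point lies between the fingers.
    All dimensions are in metres and are illustrative assumptions.
    """
    p = np.asarray(points_hand, dtype=float).reshape(-1, 3)
    # Points within the hand's extent along the approach and hand axes.
    in_slab = (p[:, 0] >= 0.0) & (p[:, 0] <= depth) & (np.abs(p[:, 2]) <= height / 2)
    lateral = np.abs(p[:, 1])
    in_closing_region = in_slab & (lateral < aperture / 2)
    in_fingers = in_slab & (lateral >= aperture / 2) & (lateral <= aperture / 2 + finger_width)
    return bool(in_closing_region.any() and not in_fingers.any())
```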
2) Implementation: The original pre-trained implementation of the GPD package [21] is used within a GPD ROS wrapper. The input to GPD is the RGB-D view received from a camera, along with sampling points based on the detected garbage instance masks, which provide a region of interest. The output grasp with the highest score is selected and transformed into a ROS pose message.
Following the aforementioned framework, a unified garbage detection, classification, localization, and grasp generation pipeline is created by connecting GarbageNet and GPD through an interface node, as shown in Fig. 5.
C. Whole-Body Mobile Manipulation Grasping
With the aim of localizing and collecting garbage items from the ground with a robotic system, we introduce in this section the control module that has been implemented on the research platform IIT MOCA/UCL MPPL [22]. This versatile cobot is composed of a Robotnik SUMMIT-XL STEEL mobile platform (3 Degrees of Freedom (DoFs)) and a Franka Emika Panda robotic arm (7 DoFs). Since the mobile platform is controlled through admittance control while the robotic arm is torque-controlled, a Whole-Body Impedance Controller has been developed to deal with their different causalities, extending our methods introduced in [23], [24]. The implementation of such a control system makes it possible both to achieve the desired end-effector behavior and to exploit the redundant DoFs of the robot. This is a fundamental requirement to successfully execute autonomous and complex manipulation tasks.
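Considering the mobile manipulator with 3 DoFs (rigid body motion) at the mobile base and $n$ DoFs at the manipulator, we can define the generalised coordinates $q = [q_v^T \; q_r^T]^T \in \mathbb{R}^{3+n}$, with $q_v$ and $q_r$ the coordinates of the mobile base and the manipulator. We describe the dynamics equations of the combined system as follows, taking into account the admittance causality of the mobile base that is velocity controlled:
$$\begin{bmatrix} M_{adm} & 0 \\ 0 & M_r \end{bmatrix} \begin{bmatrix} \ddot{q}_v \\ \ddot{q}_r \end{bmatrix} + \begin{bmatrix} D_{adm} & 0 \\ 0 & C_r \end{bmatrix} \begin{bmatrix} \dot{q}_v \\ \dot{q}_r \end{bmatrix} + \begin{bmatrix} 0 \\ g_r \end{bmatrix} = \begin{bmatrix} \Gamma_v^{vir} \\ \Gamma_r \end{bmatrix} + \begin{bmatrix} \Gamma_v^{ext} \\ \Gamma_r^{ext} \end{bmatrix} \quad (1)$$
where $M_{adm} \in \mathbb{R}^{3 \times 3}$ and $D_{adm} \in \mathbb{R}^{3 \times 3}$ represent the virtual inertial and virtual damping terms for the admittance control of the mobile base, $\dot{q}_v \in \mathbb{R}^{3}$ is the velocity of the generalised motion of the mobile platform, and $\Gamma_v^{vir} \in \mathbb{R}^{3}$ and $\Gamma_v^{ext} \in \mathbb{R}^{3}$ are the virtual and external torques. $M_r \in \mathbb{R}^{n \times n}$ is the symmetric and positive definite inertial matrix, $C_r \in \mathbb{R}^{n \times n}$ is the Coriolis and centrifugal matrix, $g_r \in \mathbb{R}^{n}$ is the gravity vector, and $\Gamma_r \in \mathbb{R}^{n}$ and $\Gamma_r^{ext} \in \mathbb{R}^{n}$ are the joint torque vector and external torque vector of the robotic manipulator, respectively.
Let us consider $x \in \mathbb{R}^{6}$ as the task coordinates in Cartesian space. It follows that the desired task-space dynamics behaviour in response to the external wrench $F^{ext} \in \mathbb{R}^{6}$ (leading to the external torques $\Gamma^{ext} = [\Gamma_v^{ext\,T} \; \Gamma_r^{ext\,T}]^T$ in (1)) can be obtained as:
$$F^{ext} = \Lambda(q)\,\ddot{\tilde{x}} + (\mu(q) + D)\,\dot{\tilde{x}} + K\,\tilde{x} \quad (2)$$
where $\tilde{x} = x - x_d$ is the Cartesian error from the desired task $x_d$, and $K \in \mathbb{R}^{6 \times 6}$ and $D \in \mathbb{R}^{6 \times 6}$ are the desired Cartesian stiffness and damping matrices, respectively. $\Lambda(q) \in \mathbb{R}^{6 \times 6}$ represents the Cartesian inertia and $\mu(q) \in \mathbb{R}^{6 \times 6}$ the Cartesian Coriolis and centrifugal matrix. For more details, please see our previous work [22].
Fig. 6: Example output of input point clouds (left), GarbageNet mask and classification of the closest garbage (middle, zoomed), mask projected onto the point cloud (middle), and grasps generated via GPD (right).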
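The admittance behaviour of the mobile base (virtual inertia $M_{adm}$ and damping $D_{adm}$ driven by the virtual and external torques) can be sketched as a discrete-time velocity update. The explicit Euler integration, the time step, and the function name are assumptions made for illustration; they are not taken from the controller implementation.

```python
import numpy as np

def admittance_velocity_step(qd_v, gamma_vir, gamma_ext, M_adm, D_adm, dt):
    """One explicit-Euler step of M_adm * qdd_v + D_adm * qd_v = gamma_vir + gamma_ext.

    qd_v: current base velocity (3,); returns the next velocity command (3,).
    """
    qdd_v = np.linalg.solve(M_adm, gamma_vir + gamma_ext - D_adm @ qd_v)
    return qd_v + dt * qdd_v
```

Under a constant input torque, the commanded velocity converges to the steady state $D_{adm}^{-1}(\Gamma_v^{vir} + \Gamma_v^{ext})$, which is why a larger virtual damping makes the base slower to respond to the same input.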
In order to navigate through unstructured environments and to grasp garbage items from the ground, it is crucial to selectively assign different mobility priorities to the mobile base or to the robotic arm when a desired trajectory is executed at the end-effector level. Specifically, during the exploration of the environment, the robot movements must be performed mostly by the mobile base, while when collecting objects from the ground the priority needs to be given to the arm movements.
To this end, we implemented a weighted dynamically-consistent pseudo-inverse to achieve such behaviours. This is done by applying the desired motion constraints through real-time variable weighting factors. The weighted dynamically-consistent pseudo-inverse is defined as
where $\Lambda_W = J^{-T} M W M J^{-1}$ represents the weighted Cartesian inertia, $J \in \mathbb{R}^{6 \times (3+n)}$ denotes the whole-body Jacobian matrix, $M \in \mathbb{R}^{(3+n) \times (3+n)}$ is the whole-body inertial matrix, and $W \in \mathbb{R}^{(3+n) \times (3+n)}$ is the diagonal and positive-definite weight matrix defined by $W = \mathrm{diag}[w_1 \; w_2 \; \cdots \; w_{3+n}]$, with $w_i > 0$. Therefore, a higher value of $w_i$ at the $i$-th joint will impede the motion of that joint, and $W = I_{3+n}$ will have no effect on the motion mapping.
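The effect of the weights can be illustrated with a purely kinematic weighted pseudo-inverse, $J_W^{\#} = W^{-1} J^T (J W^{-1} J^T)^{-1}$, which minimizes $\dot{q}^T W \dot{q}$ for a given task velocity. This is a simpler stand-in chosen for illustration, not the dynamically-consistent version used in the controller (which also involves the inertia matrix $M$); the function name is hypothetical.

```python
import numpy as np

def weighted_pseudo_inverse(J, w):
    """Kinematic weighted pseudo-inverse W^-1 J^T (J W^-1 J^T)^-1.

    J: (m, k) Jacobian with full row rank; w: (k,) positive joint weights.
    Joints with larger weights contribute less motion to the task.
    """
    W_inv = np.diag(1.0 / np.asarray(w, dtype=float))
    return W_inv @ J.T @ np.linalg.inv(J @ W_inv @ J.T)

# Two joints that both move a 1D task coordinate equally.
J = np.array([[1.0, 1.0]])
q_dot = weighted_pseudo_inverse(J, [1.0, 3.0]) @ np.array([1.0])
# The joint with weight 3 moves less than the joint with weight 1,
# while the task velocity J @ q_dot is still achieved exactly.
```

In this toy case the resulting joint velocities are 0.75 and 0.25, so the more heavily weighted joint carries a quarter of the motion, and equal weights recover the even split of the ordinary pseudo-inverse.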
Finally, the commanded torques of the whole-body Cartesian impedance controller for the main task are calculated as:
$$\Gamma_{imp} = g + \dots$$
The desired robot poses are provided by the Trajectory planner unit, which, once it receives a target pose as input, computes the intermediate waypoints by means of a classical fifth-order polynomial law.
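A fifth-order polynomial law can be sketched as below for the position part of a pose. The sketch assumes rest-to-rest motion (zero velocity and acceleration at both ends), which gives the time scaling $s(\tau) = 10\tau^3 - 15\tau^4 + 6\tau^5$; the boundary conditions, the function name, and the omission of orientation interpolation are assumptions made for illustration.

```python
import numpy as np

def quintic_waypoints(x0, xf, duration, n):
    """Waypoints from x0 to xf along a rest-to-rest fifth-order polynomial.

    Returns (times, positions) with n samples; positions has shape (n, dim).
    """
    x0 = np.asarray(x0, dtype=float)
    xf = np.asarray(xf, dtype=float)
    tau = np.linspace(0.0, 1.0, n)                  # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5      # smooth 0 -> 1 scaling
    positions = x0 + s[:, None] * (xf - x0)
    return tau * duration, positions
```

For example, `quintic_waypoints([0, 0, 0.5], [1, 0, 0.1], duration=4.0, n=5)` starts and ends exactly at the two poses and passes through their midpoint at half the duration, since $s(0.5) = 0.5$.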
III. Experimental Results
In this section, we present a brief experimental analysis of the garbage segmentation and classification (GarbageNet), the grasp pose proposal (GPD), and the overall performance of the system, which identifies and collects for recycling three different types of garbage (paper, metal, plastic) using the whole-body controlled mobile manipulation robot.
Fig. 7: GarbageNet classification and segmentation: images with single items are classified correctly with high confidence scores (top). Images containing multiple items are classified with smaller confidence scores due to occlusions (bottom).
A. GarbageNet: Garbage Segmentation and Classification
To test the quality of the GarbageNet segmentation and classification introduced in Sec. II-A, we first validated it on the TACO test set (see Sec. II-A.2), with a resultant mAP75 of 40.43 at 30 frames per second. We further segmented several unseen test images (roughly 1 h of recorded data, including objects from the categories into which we will be sorting), both from a handheld RGB-D RealSense camera and from the visual sensor of the mobile robot. We found that instance segmentation of spread-out pieces of garbage is successful (Fig. 7-top), while in some scenes containing a cluster of many pieces of garbage it is less successful and needs further investigation (Fig. 7-bottom). This localization failure in cluttered scenes has been identified as one of two typical errors encountered in mask generation by GarbageNet, the second being leakage, i.e., noise that is included in the instance mask when a bounding box is not accurate [12]. The success of the classification of garbage provided by GarbageNet is influenced by the quality of the images that are provided to the system. We found that in overexposed images, the algorithm struggles to detect features that differentiate the garbage item from the surrounding environment.
B. GPD: Garbage Grasp Proposal
An advantage of our introduced system is that the precision of the garbage mask segmentation and bounding-box estimation does not strongly influence the grasping pose extraction, since the latter is estimated by the GPD method introduced in Sec. II-B. Items of garbage to be picked are provided to the GPD node sequentially, in order of proximity to the robot. Some example grasp generation sequences are shown in Fig. 6. The quality of the grasps generated by GPD depends on the number of sample points on the item, e.g., a sparse point cloud can result in no grasp candidates. This was observed in some scenes, but it was quickly rectified by capturing new RGB-D data. With a well-populated point cloud, GPD produces very good grasp proposals, with a grasping success rate of almost 90%, tested with 50 grasps on the robotic manipulator.
Notice that we had to restrict all grasps to be from the top of the object, to respect the reachability constraints of the robot manipulator. GPD parameters allow for easy selection of the approach direction as well as the allowable angle deviation from it. We found that when generating grasps on objects that were seen only from the side, GPD, as expected, struggles to produce grasps from above, and the data need to be recaptured from a different pose. Generated grasps have been successfully transferred from simulation to the real robot, a mobile manipulator with a two-fingered gripper.
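Restricting grasps to a top-down approach with an allowable angle deviation can be sketched as a simple filter on the approach vectors. The frame convention (z pointing up, so a top-down approach is along -z), the 30 degree default, and the function name are assumptions for illustration and do not correspond to actual GPD parameter names.

```python
import numpy as np

def approach_within_cone(approach_vectors, desired=(0.0, 0.0, -1.0), max_angle_deg=30.0):
    """Boolean mask of grasps whose approach axis is close to the desired direction.

    approach_vectors: (N, 3) approach axes, not necessarily normalized.
    """
    a = np.asarray(approach_vectors, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    d = np.asarray(desired, dtype=float)
    d = d / np.linalg.norm(d)
    # Angle test via the cosine: a . d >= cos(max_angle).
    return (a @ d) >= np.cos(np.radians(max_angle_deg))
```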
C. Whole-Body Grasping Results
Exploiting the Whole-Body impedance controller introduced in Sec. II-C, we performed a set of experiments with the IIT MOCA/UCL MPPL robotic platform (Fig. 1-right). To describe the phases of such experiments, we follow
the control flow of the Finite State Machine (FSM) (see
Fig. 2). As in a real world scenario, the mobile robot explores
the environment, until an acknowledgment (ack) message is
provided by the visual perception module. Fig. 8 shows all
the phases taking place after this ack is triggered for three
different materials: metal (a), paper (b), and plastic (c). In the
garbage detected state (light red), the robot halts its motion,
so that GarbageNet identifies the garbage type, and GPD ex-
tracts the grasp pose. These data are sent to the FSM, that can
move on to the next phases. The grasp pose (visualized inside the garbage detection image in Fig. 8) is shown in the plots with point markers at the moment of detection, and with dashed lines until the grasp takes place. Next, in the garbage reach state (light blue), the robot moves towards this grasp pose. During this process we can distinguish two sub-phases. In the first one, the robot reaches the vicinity of the goal pose, assigning a higher priority to the mobile base through (3), i.e., setting $w_i = 1$ for the mobile base joints and $w_i = 3$ for the arm joints, with the impedance parameters set to a compliant value, $K = \mathrm{diag}(500\ \mathrm{N/m})$. In this way, the mobile robot can approximately reach the item pose in a compliant way, avoiding unnecessary movements of the arm out of the mobile base support polygon. This guarantees a safe interaction in case of an unexpected collision with the environment. Subsequently, the priority is switched to the arm through (3), i.e., setting $w_i = 5$ for the mobile base joints and $w_i = 1$ for the arm joints, and the impedance parameters are set to be stiffer, with $K = \mathrm{diag}(1000\ \mathrm{N/m})$. In this way, the robotic arm can reach the ground towards the grasp pose in a precise manner. From Fig. 8, it is possible to notice that the robot end-effector reaches the grasp pose with high accuracy, so that the garbage grasp state (light green) can be performed successfully. In this state, the robot gripper closes its fingers until a force of 3 N is sensed, to ensure the object is firmly grasped. Lastly, in the garbage trash state (light yellow), the robot takes the garbage item to the corresponding trash bin placed on its back, selecting it through the garbage type message received previously.
Fig. 8: The grasping results performed by the IIT MOCA/UCL MPPL robotic platform exploiting the Whole-Body impedance controller. Images of garbage detection (with the grasp pose in the embedded image), reach, grasp, and disposal in the correct type of trash bin are visualized. Three different items were identified and collected: a tomato juice can, classified as metal (a); a lentils carton box, classified as paper (b); and a plastic water bottle, classified as plastic (c).
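The control flow described above can be summarized as a small finite state machine sketch. The weights (1/3 and 5/1) and stiffness values (500 and 1000 N/m) are the ones reported in the text; the state names, event names, and the assumption of 3 base plus 7 arm joints laid out in that order are hypothetical choices made for this illustration.

```python
from enum import Enum, auto

class State(Enum):
    EXPLORE = auto()
    GARBAGE_DETECTED = auto()
    REACH_BASE_PRIORITY = auto()   # first reach sub-phase
    REACH_ARM_PRIORITY = auto()    # second reach sub-phase
    GRASP = auto()
    TRASH = auto()

# Hypothetical event names; unknown events leave the state unchanged.
NEXT_STATE = {
    (State.EXPLORE, "ack"): State.GARBAGE_DETECTED,
    (State.GARBAGE_DETECTED, "grasp_pose_received"): State.REACH_BASE_PRIORITY,
    (State.REACH_BASE_PRIORITY, "near_goal"): State.REACH_ARM_PRIORITY,
    (State.REACH_ARM_PRIORITY, "at_grasp_pose"): State.GRASP,
    (State.GRASP, "force_reached"): State.TRASH,
    (State.TRASH, "item_dropped"): State.EXPLORE,
}

# Joint weights and Cartesian stiffness reported for the two reach sub-phases.
REACH_SETTINGS = {
    State.REACH_BASE_PRIORITY: {"w_base": 1.0, "w_arm": 3.0, "stiffness_n_per_m": 500.0},
    State.REACH_ARM_PRIORITY: {"w_base": 5.0, "w_arm": 1.0, "stiffness_n_per_m": 1000.0},
}

def step(state, event):
    """Return the next state for an event, or the same state if it does not apply."""
    return NEXT_STATE.get((state, event), state)

def weight_diagonal(state, n_base=3, n_arm=7):
    """Diagonal of the weight matrix W for a reach sub-phase (base joints first)."""
    settings = REACH_SETTINGS[state]
    return [settings["w_base"]] * n_base + [settings["w_arm"]] * n_arm
```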
IV. Conclusions
In this work, we presented a novel garbage identification and sorting system, integrated on a mobile robot, using whole-body control. This approach works in real-time, identifying, localizing, and sorting garbage.
In the future, we aim at validating the integrated system outdoors in the wild, under various weather conditions, and at working further on the path planning and exploration part of the method. In particular, our next steps are to address the problems of where to look for garbage in a large outdoor space and how to collect it in an energy- and time-efficient way.
This work was supported by the UCL Global Engagement
Funds 2020/21 and the EU H2020 SOPHIA project (no
871237). The Titan Xp GPUs were donated by the NVIDIA
References
[1] S. Kaza, L. C. Yao, P. Bhada-Tata, and F. Van Woerden, “What a Waste 2.0: A Global Snapshot of Solid Waste Management to 2050,” The World Bank, Washington DC, Tech. Rep., 2018.
[2] P. F. Proença and P. Simões, “TACO: Trash Annotations in Context for Litter Detection,” arXiv preprint arXiv:2003.06975, 2020.
[3] D. T. J. Lukka, D. T. Tossavainen, D. J. V. Kujala, and D. T. Raiko,
“ZenRobotics Recycler – Robotic Sorting using Machine Learning,”
ZenRobotics Recycler, Helsinki, Finland, Tech. Rep., 2014.
[4] W. Liu, H. Qian, and Z. Pan, “Dispersion multi-object robot sorting
method in material frame based on deep learning,” China Patent
2 017 111 944 941, June 08, 2018.
[5] F. Raptopoulos, M. Koskinopoulou, and M. Maniadakis, “Robotic
Pick-and-Toss Facilitates Urban Waste Sorting,” in 2020 IEEE 16th
International Conference on Automation Science and Engineering
(CASE), 2020, pp. 1149–1154.
[6] I. Vegas, K. Broos, P. Nielsen, O. Lambertz, and A. Lisbona, “Upgrading the quality of mixed recycled aggregates from construction and demolition waste by using near-infrared sorting technology,” Construction and Building Materials, vol. 75, pp. 121–128, 2015.
[7] A. Shaukat, Y. Gao, J. A. Kuo, B. A. Bowen, and P. E. Mort, “Visual classification of waste material for nuclear decommissioning,” Robotics and Autonomous Systems, vol. 75, pp. 365–378, 2016.
[8] S. P. Gundupalli, S. Hait, and A. Thakur, “Multi-material classification of dry recyclables from municipal solid waste based on thermal imaging,” Waste Management, vol. 70, pp. 13–21, 2017.
[9] R. Sultana, R. D. Adams, Y. Yan, P. M. Yanik, and M. L. Tanaka,
“Trash and Recycled Material Identification using Convolutional Neu-
ral Networks (CNN),” in 2020 SoutheastCon, 2020, pp. 1–8.
[10] X. Li, M. Tian, S. Kong, L. Wu, and J. Yu, “A modified YOLOv3
detection method for vision-based water surface garbage capture
robot,” International Journal of Advanced Robotic Systems, vol. 17,
no. 3, p. 1729881420932715, 2020.
[11] J. Bai, S. Lian, Z. Liu, K. Wang, and D. Liu, “Deep Learning Based
Robot for Automatically Picking Up Garbage on the Grass,” IEEE
Transactions on Consumer Electronics, vol. 64, no. 3, pp. 382–389,
[12] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT: Real-Time
Instance Segmentation,” in 2019 IEEE/CVF International Conference
on Computer Vision (ICCV), 2019, pp. 9156–9165.
[13] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988.
[14] X. Wang, T. Kong, C. Shen, Y. Jiang, and L. Li, “SOLO: Segmenting
Objects by Locations,” 2019.
[15] X. Chen, R. Girshick, K. He, and P. Dollár, “TensorMask: A Foundation for Dense Object Segmentation,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 2061–2069.
[16] T.-Y. Lin, Y. Cui, G. Paterr, et al., “COCO: Common Objects in Context,” [EB/OL], accessed September 14, 2020.
[17] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[18] D. Kanoulas, J. Lee, D. G. Caldwell, and N. G. Tsagarakis, “Visual
Grasp Affordance Localization in Point Clouds Using Curved Contact
Patches,” International Journal of Humanoid Robotics, vol. 14, no. 01,
p. 1650028, 2017.
[19] A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt, “Grasp Pose
Detection in Point Clouds,” The International Journal of Robotics
Research, vol. 36, no. 13-14, pp. 1455–1473, 2017. [Online].
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
[21] A. ten Pas, “Grasp Pose Estimation,” [EB/OL],
atenpas/gpd Accessed September 1, 2020.
[22] Y. Wu, P. Balatti, M. Lorenzini, F. Zhao, W. Kim, and A. Ajoudani, “A teleoperation interface for loco-manipulation control of mobile collaborative robotic assistant,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3593–3600, 2019.
[23] P. Balatti, D. Kanoulas, G. F. Rigano, L. Muratore, N. G. Tsagarakis,
and A. Ajoudani, “A Self-Tuning Impedance Controller for Au-
tonomous Robotic Manipulation,” in IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), 2018, pp. 5885–5891.
[24] P. Balatti, D. Kanoulas, N. G. Tsagarakis, and A. Ajoudani, “Towards
Robot Interaction Autonomy: Explore, Identify, and Interact,” in
International Conference on Robotics and Automation (ICRA), 2019,
pp. 9523–9529.