Fig 4 - uploaded by Jônatas Wehrmann
Results after training on the no-background dataset. Top row: real images (manually censored for protecting the reader). Middle row: results using 9-Blocks ResNet generator. Bottom row: results using a U-Net 256 generator (blurring applied to unsatisfactory results). 


Source publication
Conference Paper
Full-text available
The easy access and widespread reach of the Internet make it easier than ever to find content of any kind at any moment. While that brings several advantages, sensitive audiences may also be inadvertently exposed to nudity they did not ask for. Virtually every work on nudity and pornography censorship focuses solely on p...

Contexts in source publication

Context 1
... built such a dataset version by segmenting the people in all images with the aid of Mask R-CNN [53], the state-of-the-art approach for semantic and instance segmentation. In a nutshell, Mask R-CNN's basic structure is quite similar to Faster R-CNN's, the difference being that it predicts a binary mask for each RoI (Region of Interest) to allow pixel-level segmentation. In most cases, this background removal strategy successfully removed the backgrounds of the images in our dataset. However, we noticed some error cases in which Mask R-CNN was unable to find any person, or performed incorrect segmentation. Given that such mis-segmented instances introduce a controlled amount of noise for both image classes, we decided to keep those imperfect images in this dataset. Figure 4 shows images generated by our approach trained over the no-background version of the dataset. Note that these results are arguably more consistent than those provided by models trained with the original dataset in Figure 3. Once again, one can observe that the ResNet-based model outperformed the U-Net one, by generating images with the sensitive parts properly covered with real-looking bikinis. In addition, it introduced much less distortion than its ...
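The background-removal step described in this context can be sketched as follows. This is a minimal illustration, assuming the per-person binary masks have already been produced by a segmentation model such as Mask R-CNN; the toy image, mask, and the white `fill_value` are illustrative choices, not the paper's actual pipeline.

```python
import numpy as np

def remove_background(image, person_masks, fill_value=255):
    """Keep only pixels covered by at least one person mask.

    image: (H, W, 3) uint8 array.
    person_masks: list of (H, W) boolean arrays, one per detected person.
    fill_value: value written to background pixels (255 = white).
    """
    if not person_masks:
        # No person found: leave the image untouched (the paper keeps
        # such mis-segmented cases as controlled noise).
        return image.copy()
    combined = np.logical_or.reduce(person_masks)   # union of all persons
    out = np.full_like(image, fill_value)           # blank background
    out[combined] = image[combined]                 # copy foreground pixels
    return out

# Toy example: a 4x4 image with a 2x2 "person" in the top-left corner.
img = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
result = remove_background(img, [mask])
```

Keeping the no-detection case as a pass-through mirrors the paper's decision to retain mis-segmented images as a controlled source of noise.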
Context 2
... that such mis-segmented instances introduce a controlled amount of noise for both image classes, we decided to keep those imperfect images in this dataset. Figure 4 shows images generated by our approach trained over the no-background version of the dataset. Note that these results are arguably more consistent than those provided by models trained with the original dataset in Figure 3. Once again, one can observe that the ResNet-based model outperformed the U-Net one, by generating images with the sensitive parts properly covered with real-looking bikinis. ...

Citations

... Consider another example, from the field of computer vision. Oft-used tasks have included applying makeup to images of female faces [47,117,144], changing women's clothes from pants to mini-skirts [164,228], and censoring nude women's bodies by, e.g., covering breasts with a bikini top [169,200]. Such tasks are ethically problematic because they perpetuate gendered biases and stereotypes, thus reinforcing harmful systems of sexism and misogyny [154]. ...
Article
Full-text available
Benchmarks are seen as the cornerstone for measuring technical progress in artificial intelligence (AI) research and have been developed for a variety of tasks ranging from question answering to emotion recognition. An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the ‘ethicality’ of an AI system. In this paper, drawing upon research in moral philosophy and metaethics, we argue that it is impossible to develop such a benchmark. As such, alternative mechanisms are necessary for evaluating whether an AI system is ‘ethical’. This is especially pressing in light of the prevalence of applied, industrial AI research. We argue that it makes more sense to talk about ‘values’ (and ‘value alignment’) rather than ‘ethics’ when considering the possible actions of present and future AI systems. We further highlight that, because values are unambiguously relative, focusing on values forces us to consider explicitly what the values are and whose values they are. Shifting the emphasis from ethics to values therefore gives rise to several new ways of understanding how researchers might advance research programmes for robustly safe or beneficial AI.
... Current ANN-based anonymization techniques have not been developed with platform constraints in mind; that is, edge devices like those we use for this research cannot run state-of-the-art anonymization models, since their weights do not fit into device memory. As an example, similar approaches such as those of Yang et al. [2] and More et al. [3] use graphics cards with 11 GB of memory for their experiments, while we manage to reduce this requirement to the 4 GB of a Jetson Nano. Furthermore, edge devices may not come with hardware implementations of the employed codecs, adding to the latency of ANN anonymization. ...
... Leaving the field of compression shows that there has already been progress made considering the task of hiding sensitive data in an image. One use-case is nudity censorship using adversarial training, as shown by Simões et al. [22] and More et al. [3]. The task of face anonymization has thus far been achieved by face generation and replacement or inpainting [23], [24], [25], [2]. ...
Preprint
Full-text available
The use of AI in public spaces continually raises concerns about privacy and the protection of sensitive data. An example is the deployment of detection and recognition methods on humans, where images are provided by surveillance cameras. This results in the acquisition of great amounts of sensitive data, since images taken by such cameras are captured and transmitted unaltered to a server on the network. However, many applications do not explicitly require the identity of a given person in a scene; an anonymized representation containing the person's position while preserving their context in the scene suffices. We show how using a customized loss function on regions of interest (ROIs) can achieve sufficient anonymization, such that human faces become unrecognizable while persons remain detectable, by training an end-to-end optimized autoencoder for learned image compression that exploits the flexibility of the learned analysis and reconstruction transforms to mutate parts of the compression result. This approach enables compression and anonymization in one step on the capture device, instead of transmitting sensitive, non-anonymized data over the network. Additionally, we evaluate how this anonymization impacts the average precision of pre-trained foundation models on detecting faces (MTCNN) and humans (YOLOv8) in comparison to non-ANN-based methods, while considering compression rate and latency.
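The ROI-aware training objective this abstract alludes to can be illustrated with a small numeric sketch. This is a hedged toy version under stated assumptions: the squared-error distortion, the `lam` trade-off weight, and the split into "fidelity outside the ROI" versus "dissimilarity inside the ROI" are illustrative choices, not the authors' actual loss.

```python
import numpy as np

def anonymizing_loss(original, reconstruction, roi_mask, lam=1.0):
    """Toy ROI-aware objective: stay faithful outside the ROI while
    encouraging the reconstruction to differ inside it (e.g., faces).

    original, reconstruction: (H, W) float arrays.
    roi_mask: (H, W) boolean array marking the sensitive region.
    """
    err = (original - reconstruction) ** 2
    fidelity = np.mean(err[~roi_mask])   # background should be preserved
    anonymity = np.mean(err[roi_mask])   # ROI should become unrecognizable
    # Minimizing this keeps the scene intact but rewards ROI distortion.
    return float(fidelity - lam * anonymity)

rng = np.random.default_rng(0)
orig = rng.random((8, 8))
recon = rng.random((8, 8))
roi = np.zeros((8, 8), dtype=bool)
roi[2:6, 2:6] = True  # hypothetical face region
loss = anonymizing_loss(orig, recon, roi)
```

A perfect reconstruction scores zero, while a reconstruction that scrambles only the ROI scores lower (better), which is the behavior the abstract describes: detectable persons, unrecognizable faces.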
... The field of pornography detection encompasses a wide and varied set of applications and use cases, ranging from image and video content filtering (Kelly et al., 2008;Behrad et al., 2012;Moreira et al., 2016), automatic censoring of pornographic material (via removing, obfuscating, or blurring the offending material) (More et al., 2018;de Freitas et al., 2019;Mallmann et al., 2020), to pornography type classification (Oronowicz-Jaskowiak, 2018). ...
Article
The detection and ranking of pornographic material is a challenging task, especially when it comes to videos, due to factors such as the definition of what is pornographic and its severity level, the volumes of data that need to be processed, as well as temporal ambiguities between the benign and pornographic portions of a video. In this paper we propose a video-based pornographic detection system consisting of a convolutional neural network (CNN) for automatic feature extraction, followed by a recurrent neural network (RNN) in order to exploit the temporal information present in videos. We describe how our system can be used for both video-level labelling as well as for localising pornographic content within videos. Given pornographic video segments, we describe an efficient method for finding sexual objects within the segments, and how the types of detected sexual objects can be used to generate an estimate of the severity (‘harmfulness’) of the pornographic content. This estimate is then utilised for ranking videos based on their severity, a common requirement of law enforcement agencies (LEAs) when it comes to categorising pornographic content. We evaluate our proposed system against a benchmark dataset, achieving results on par with the state of the art, while providing additional benefits such as ranking videos according to their severity level, something which to the best of our knowledge has not been attempted before. We perform further investigations into model generalisability by performing an out-of-distribution (o.o.d.) test, investigate whether our model is making use of shortcut learning, and address the issue of explainability. The results obtained indicate that our model is using strong learning, thus further validating our proposed approach and the results obtained.
... Consider another example, from the field of computer vision. Oft-used tasks have included applying makeup to images of female faces (Jiang et al., 2020;Li et al., 2018;Chang et al., 2018), changing women's clothes from pants to mini-skirts (Mo et al., 2018;Yang et al., 2014), and censoring nude women's bodies by, e.g., covering breasts with a bikini top (Simões et al., 2019;More et al., 2018). Such tasks are ethically problematic because they perpetuate gendered biases and stereotypes, thus reinforcing harmful systems of sexism and misogyny (Manne, 2018). ...
Preprint
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research and have been developed for a variety of tasks ranging from question answering to facial recognition. An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system. In this paper, drawing upon research in moral philosophy and metaethics, we argue that it is impossible to develop such a benchmark. As such, alternative mechanisms are necessary for evaluating whether an AI system is 'ethical'. This is especially pressing in light of the prevalence of applied, industrial AI research. We argue that it makes more sense to talk about 'values' (and 'value alignment') rather than 'ethics' when considering the possible actions of present and future AI systems. We further highlight that, because values are unambiguously relative, focusing on values forces us to consider explicitly what the values are and whose values they are. Shifting the emphasis from ethics to values therefore gives rise to several new ways of understanding how researchers might advance research programmes for robustly safe or beneficial AI. We conclude by highlighting a number of possible ways forward for the field as a whole, and we advocate for different approaches towards more value-aligned AI research.
... One major challenge is the collection of a representative dataset. This has been approached by collecting data from multiple domains, i.e., selecting positive data from various media platforms and negative data from different sources [2,3,4,5]. This strategy allows for collecting large datasets, but leads to a mismatch of training and test distributions owing to differences in scene environments and camera views [6,7]. ...
... Although there have been several attempts to address the problem of pornography recognition (Caetano et al., 2016; Geng et al., 2016; Moreira et al., 2016; Nian et al., 2016; Zhou et al., 2016; Jin et al., 2018; More et al., 2018; Nurhadiyatna et al., 2018; Shen et al., 2018), almost all of them have utilized visual content to automate the target task of sensitive content detection. ...
Article
Full-text available
The main objective of this paper is pornography recognition using audio features. Unlike most previous attempts, which have concentrated on the visual content of pornographic images or videos, we propose to take advantage of sound. Using sound is particularly important in cases in which the visual features are not adequately informative of the contents (e.g., cluttered scenes, dark scenes, scenes with a covered body). To this end, our hypothesis is grounded in the assumption that scenes with pornographic content encompass audio with features specific to those scenes; these sounds can be in the form of speech or voice. More specifically, we propose to extract two types of features, (I) pitch and (II) mel-frequency cepstrum coefficients (MFCC), in order to train five different variations of the k-nearest neighbor (KNN) supervised classification model based on the fusion of these features. The correctness of our hypothesis was then investigated by conducting a set of evaluations on a porno-sound dataset derived from an existing pornography video dataset. The experimental results confirm the feasibility of the proposed acoustic-driven approach by demonstrating an accuracy of 88.40%, an F-score of 85.20%, and an area under the curve (AUC) of 95% in the task of pornography recognition.
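The fusion-plus-KNN pipeline described above can be sketched as follows. To keep the example self-contained, the pitch and MFCC features are random placeholders standing in for real extracted features (in practice a library such as librosa would compute them), and the classifier is a minimal hand-rolled KNN rather than the authors' five model variants; the class separation is artificially injected for the demonstration.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Majority-vote k-nearest-neighbor classification (Euclidean)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest samples
    votes = train_y[nearest]
    return int(np.bincount(votes).argmax())  # most common label among them

rng = np.random.default_rng(42)
n = 40
pitch = rng.normal(size=(n, 4))       # placeholder pitch features
mfcc = rng.normal(size=(n, 13))       # placeholder MFCC features
labels = np.repeat([0, 1], n // 2)    # 0 = benign, 1 = pornographic
# Artificially separate the classes so the toy classifier has signal.
pitch[labels == 1] += 3.0
mfcc[labels == 1] += 3.0

# Early fusion: concatenate both feature types into one vector per clip.
fused = np.concatenate([pitch, mfcc], axis=1)
prediction = knn_predict(fused, labels, np.full(17, 3.0))
```

The fusion here is simple concatenation of the two feature vectors; the paper's contribution lies in showing that such acoustic features alone carry enough signal for the task.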
... In an attempt to develop a non-intrusive approach for pornography censorship, More et al. [3] have addressed the problem as an image-to-image translation task, where images from domain A (naked women) are converted to another domain B (women wearing bikinis). Such a method has the advantage of translating images with no explicit supervision (bounding boxes or segmentation masks) and does not require paired training examples (e.g., the same person with and without a bikini). ...
... That method addresses the lack of instance-level supervision by using two domain sets, each one representing the concepts of A and B. Thus, one needs to train a generator to learn the mapping G : A → B, which will then be capable of transforming naked-women images into their counterparts (women in bikinis). Another contribution of [3] is the construction of a novel unaligned dataset containing either nude women or women wearing bikinis. ...
... The main motivation of the work of More et al. [3] is to avoid ruining the user experience while consuming content that may occasionally contain nudity. Their solution workflow was inspired by CycleGAN [4], though the authors had to remove the background of the input images to focus the generator on the specific subject and achieve better-looking images. ...
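The cycle-consistency idea behind the G : A → B mapping mentioned above can be illustrated numerically. This toy sketch uses scalar affine "generators" in place of neural networks; it only demonstrates the loss structure that CycleGAN-style training optimizes, not the paper's actual model.

```python
import numpy as np

# Toy "generators": G maps domain A -> B, F maps B -> A.
# Here both are affine maps on feature vectors instead of CNNs.
def G(x):  # A -> B
    return 2.0 * x + 1.0

def F(y):  # B -> A (chosen as the exact inverse of G)
    return (y - 1.0) / 2.0

def cycle_consistency_loss(a_batch, b_batch):
    """L1 cycle loss: F(G(a)) should recover a, and G(F(b)) recover b.

    This term lets the mapping be learned from two unpaired domain sets,
    with no aligned (before, after) image pairs.
    """
    loss_a = np.mean(np.abs(F(G(a_batch)) - a_batch))
    loss_b = np.mean(np.abs(G(F(b_batch)) - b_batch))
    return loss_a + loss_b

a = np.linspace(-1.0, 1.0, 8)  # stand-in for domain-A samples
b = np.linspace(0.0, 3.0, 8)   # stand-in for domain-B samples
loss = cycle_consistency_loss(a, b)
```

Because F was chosen as the exact inverse of G, the cycle loss is (numerically) zero here; during real training this term is minimized jointly with the adversarial losses of the two domain discriminators.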
Conference Paper
Full-text available
The amount of digital pornographic content on the Internet grows daily, and accessing such content has become increasingly easy. Hence, there is a real need for mechanisms that can protect particularly vulnerable audiences (e.g., children) while browsing the web. Recently, object detection methods based on deep neural networks such as CNNs have improved the effectiveness and efficiency of identifying and blocking pornographic content. Even though improvements in detecting intimate parts have been significant, the occlusion of the content is still primarily done by either blurring or removing regions of the image in an intrusive fashion. A recent study has addressed the problem of censoring pornographic content in a non-intrusive way by generating the so-called seamless censorship via cycle-consistent generative adversarial networks. Such an approach has managed to automatically add bikinis to naked women without explicit supervision or paired training data. In this paper, we extend that method by designing a novel cycle-consistency framework that leverages sensitive information from an attention-based multi-label convolutional neural network. We evaluate the quality of our novel generative model by conducting a web survey with over 1000 opinions regarding the resulting images from our method and from baseline approaches. Results of the survey show that our method considerably improves the state of the art on the seamless censorship task.
Chapter
In the information age, massive amounts of Internet data bring us convenience, but they also carry inappropriate visual content (pornography, violence, politics, terrorism, etc.), among which the dissemination of pornographic content has a particularly adverse influence, especially on children and minors. Therefore, we present an inappropriate-visual-content detection method based on a joint training strategy in an end-to-end manner, which identifies and locates inappropriate visual content while retaining base-class detection (the 80 categories of the COCO dataset). To address the difficulty of sample labeling, we propose a combined training strategy of detection and classification, and the focal loss is used to mitigate the sample imbalance in the shared network training. The algorithm achieves multi-label output and good recognition accuracy. Finally, we propose INVC, a more challenging dataset of inappropriate visual content, which includes three types of sample data in complex backgrounds at different scales, such as indoor, beach, and street scenes. Keywords: Inappropriate visual content detection; Joint training; Focal loss