
Lifang Wu, Beijing University of Technology
164
Publications
12,842
Reads
1,170
Citations
Publications (164)
The prevalent recommendation techniques explore the graph structure of interactions to alleviate the interaction sparsity issue for inferring users’ interests. These graph models focus on extracting local structural signals to model users’ interests, introducing grid-like distortion and ignoring the hierarchical tree-like structure when learning fr...
Existing weakly supervised group activity recognition methods rely on object detectors or attention mechanisms to capture key areas automatically. However, they overlook the semantic information associated with captured areas, which may adversely affect the recognition performance. In this paper, we propose a novel framework named Visual Conceptual...
Existing methods for detecting anomalies in digital light processing (DLP) 3D printing and performing in-situ repairs can reduce most defects and improve success rates. However, since printing control parameters cannot adapt to real-time printing conditions, anomalies may persist across successive layers, and continuous repairs could ultimately lea...
Recent Face Anti-Spoofing (FAS) methods have improved generalization to unseen domains by leveraging domain generalization techniques. However, they overlooked the semantic relationships between local features, resulting in suboptimal feature alignment and limited performance. To this end, pixel-wise supervision has been introduced to offer context...
Group activity recognition is a challenging task because it involves diverse individual actions and complex relations. Most existing methods enhance individual representation by introducing relation inference using appearance features. Some methods utilize extra knowledge, such as action labels, to enhance relation inference and refine the individu...
Binary-based feature representation methods have received increasing attention in palmprint recognition due to their high efficiency and great robustness to illumination variation. However, most of them are hand-designed descriptors that generally require much prior knowledge in their design. On the other hand, conventional single-view palmprint re...
Face Anti-Spoofing (FAS) plays a critical role in safeguarding face recognition systems, while previous FAS methods suffer from poor generalization when applied to unseen domains. Although recent methods have made progress via domain generalization technology, they are still sensitive to variations in face quality caused by task-irrelevant factors...
Previous recommendation models build interest embeddings relying heavily on the observed interactions and optimize the embeddings with a contrast between the interactions and randomly sampled negative instances. To our knowledge, negative interest signals remain unexplored in interest encoding and merely serve the losses for backpropagation. Be...
Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a tremendous training objective...
Recently, recommender systems have evolved rapidly along with Internet services. However, they suffer greatly from inherent bias and sparsity issues in interactions. The conventional uniform embedding learning policies fail to utilize the imbalanced interaction clue and produce suboptimal representations of users and items for recommendation. To...
Weakly supervised group activity recognition addresses the dependence on individual-level annotations when understanding scenes involving multiple individuals, which is a challenging task. Existing methods either use trained detectors to extract individual features or utilize attention mechanisms for partial context encoding, followed b...
Due to their high reliability, security, and anti-counterfeiting, finger-based biometrics (such as finger vein and finger knuckle print) have recently received considerable attention. Despite recent advances in finger-based biometrics, most of these approaches leverage much prior information and are non-robust for different modalities or different...
Recommender systems play a crucial role in providing personalized services but face significant challenges from data sparsity and long-tail bias. Researchers have sought to address these issues using self-supervised contrastive learning. Current contrastive learning primarily relies on self-supervised signals to enhance embedding quality. Despite p...
The information era brings both opportunities and challenges to information services. Confronting information overload, recommendation technology is dedicated to filtering personalized content to meet users’ requirements. The extremely sparse interaction records and their imbalanced distribution become a big obstacle to building a high-quality reco...
Recently, large-scale pre-trained models have achieved notable success in various downstream tasks by relying on contrastive image-text pairs to learn high-quality general visual representations from natural language supervision. However, these models typically disregard sentiment knowledge during the pre-training phase, subsequently h...
With the outbreak of COVID-19 and various influenza diseases, it is necessary to wear masks properly in crowded public places to prevent the spread of the virus. Therefore, detecting mask-wearing efficiently and accurately is essential for people’s physical health and safety. In this paper, we present a novel one-stage mask detection method, named...
The objective of group activity recognition is to identify behaviors performed by multiple individuals within a given scene. However, current weakly supervised approaches often rely on object detectors or use self-attention mechanisms. The former approach is susceptible to background clutter and entails high computational costs, while the latter me...
Group activity recognition refers to the process of comprehending the activity performed by multiple people in a video. However, most methods need predefined individual labels during training or testing, which is impractical and lacks intelligence. Moreover, they only consider visual features and ignore corresponding semantic information. To address t...
Personalized recommendation already plays a crucial role in online services by alleviating information overload. However, most existing works focus on user interest modeling with a uniform embedding, which inevitably results in suboptimal recommendations. We argue that users’ diverse and mixed interests are positively related to...
The conventional uniform embeddings lack diversity to infer users’ interests and make suboptimal recommendations for users. Fortunately, users’ interactions imply a complex and hybrid composition of users’ interests with multiple compatible intents. Therefore, this work strives to investigate fine-grained interest modeling from the diversified comp...
On the users’ interaction graph, neighbors have been widely explored in the embedding function of collaborative filtering to address the sparsity issue. However, the embedding learning models are highly subject to the following pairwise interaction function on interest prediction. We argue that the core of personalized recommendation locates intera...
Most existing group activity recognition methods construct spatial-temporal relations merely based on visual representation. Some methods introduce extra knowledge, such as action labels, to build semantic relations and use them to refine the visual representation. However, the knowledge they explore stays at the semantic level, which is insuffi...
In recent years, a series of continuous fabrication technologies based on digital light processing (DLP) 3D printing have emerged, which have significantly improved the speed of 3D printing. However, limited by the resin filling speed, those technologies are only suitable for printing hollow structures. In this paper, an optimized protocol for developi...
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems. Previous approaches usually learn spoofing features from a single perspective, in which only universal cues shared by all attack types are explored. However, such single-perspective based approaches ignore the differences among various attacks and commonness between c...
The core of recommendation systems is to explore users’ preferences from users’ historical records and accordingly recommend items to meet users’ interests. Previous works explore the interaction graph to capture multi-order collaborative signals and derive high-quality representations of users and items, which effectively alleviates the interaction sp...
Behaviorally similar neighbors in the interaction graph have been actively explored to facilitate the collaboration between users and items and address the interaction sparsity issue. We investigate homogeneous neighbors between users or items to mine collaborative signals for embedding learning. In the case of multiple and complex composition of us...
Group activity recognition that infers the activity of a group of people is a challenging task and has received a great deal of interest in recent years. Different from individual action recognition, group activity recognition needs to model not only the visual cues of individuals but also the relationships between them. The existing approaches inf...
In anchor-free object detection, the center regions of bounding boxes are often highly weighted to enhance detection quality. However, the central area may become less significant in some situations. In this paper, we propose a novel dual attention-based approach for the adaptive weight assignment within bounding boxes. The proposed improved dual a...
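The fixed center weighting this abstract argues against is commonly realized as the "center-ness" score of the FCOS anchor-free detector. As a point of reference only (this is the standard FCOS formula, not the dual attention scheme proposed in the paper), here is a minimal sketch assuming `l`, `t`, `r`, `b` are the distances from a location to the left, top, right and bottom edges of its ground-truth box; the function name is illustrative.

```python
import math


def centerness(l: float, t: float, r: float, b: float) -> float:
    """FCOS-style center-ness weight of a location inside a box.

    l, t, r, b are distances from the location to the box's left, top,
    right and bottom edges (all must be > 0). The score is 1.0 at the
    exact center and decays toward 0 as the location nears an edge.
    """
    if min(l, t, r, b) <= 0:
        raise ValueError("location must lie strictly inside the box")
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


# A location at the center of a 10x10 box receives the full weight.
print(centerness(5, 5, 5, 5))              # 1.0
# A location 1 unit from the left edge receives about one third.
print(round(centerness(1, 5, 9, 5), 3))    # 0.333
```

Because a score like this always peaks at the geometric center, it cannot adapt when the most informative part of an object lies elsewhere, which is the situation the abstract targets.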
Visual sentiment is subjective and abstract, and it is very challenging to locate the sentiment features from images accurately. Some researchers devote themselves to extracting visual features but ignore the relation features. However, sentiment reaction is a comprehensive action of visual content, and regions may express different emotions and co...
Recent works for personalized recommendation typically emphasize their efforts on learning users’ interests from interactions. However, users make decisions depending on multiple factors, especially various attributes of items like appearance, reviews, price, etc. Therefore, in the case of image recommendation, we strive to unveil users’ interests...
The core of cross-modal hashing methods is to map high dimensional features into binary hash codes, which can then efficiently utilize the Hamming distance metric to enhance retrieval efficiency. Recent development emphasizes the advantages of the unsupervised cross-modal hashing technique, since it only relies on relevant information of the paired...
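The retrieval step this abstract describes (binary hash codes compared by Hamming distance) can be illustrated with a minimal sketch. It assumes, hypothetically, that image and text features have already been projected into a shared real-valued space by some cross-modal encoder; each vector is binarized with the sign function and packed into an int, and database items are ranked by Hamming distance computed with XOR and a bit count. Learning the projection, which is the actual subject of unsupervised cross-modal hashing, is not shown, and all names and values are illustrative.

```python
from typing import List, Sequence


def to_hash_code(features: Sequence[float]) -> int:
    """Sign-binarize a feature vector and pack the bits into an int.

    Bit i is set to 1 when features[i] > 0, otherwise it stays 0.
    """
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed hash codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Linear-scan retrieval: database indices sorted by Hamming distance."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 4-d features: one image query and three text items.
image_query = to_hash_code([0.8, -0.2, 0.5, -0.9])        # 0b0101
texts = [to_hash_code(v) for v in ([0.7, -0.1, 0.4, -0.3],   # 0b0101, distance 0
                                   [-0.5, 0.6, 0.2, 0.1],     # 0b1110, distance 3
                                   [0.9, 0.3, -0.2, -0.4])]   # 0b0011, distance 2
print(rank_by_hamming(image_query, texts))                 # [0, 2, 1]
```

XOR followed by a bit count is what makes Hamming distance cheap compared with Euclidean distance on the original float features.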
Image emotion classification is an important computer vision task to extract emotions from images. The state-of-the-art methods for image emotion classification are primarily based on proposing new architectures and fine-tuning them on pre-trained Convolutional Neural Networks. Recently, learning transferable visual models from natural language sup...
Key frame extraction is an important means of video summarization. It can be used to interpret video content quickly. Existing approaches first partition the entire video into video clips by shot boundary detection, and then extract key frames by frame clustering. However, in most team-sport videos, a video clip usually includes many events, and...
Sentiment is a high-level abstraction, and it is a challenging task to accurately extract sentimental features from visual contents due to the “affective gap”. Previous works focus on extracting more concrete sentimental features of individual objects by introducing saliency detection or instance segmentation into their models, neglecting the inter...
The increasingly complex user intention gap and information overload are obstacles to users accessing the desired content. User interactions and the involved content indicate rich evidence of users’ interests. It is necessary to investigate interaction characteristics with respect to user interest and information distribution, which alleviates information overload f...
Recommender systems help users filter items they may be interested in from massive multimedia content to alleviate information overload. Collaborative filtering-based models perform recommendation relying on users’ historical interactions, which face great difficulty in modeling users’ interests with extremely sparse interactions. Fortunately, the...
Information overload poses a major obstacle to multimedia services. To alleviate the burden, collaborative filtering has been actively studied in the recommendation field to help users find satisfactory content. However, current methods fail to comprehensively predict users’ interactions since users’ interests are complex and multifaceted. We argue...
In ultrasonic nondestructive testing, the low resolution of ultrasound images may lead to misinterpretation of defects in the image. At present, there is no dedicated super-resolution dataset of ultrasonic nondestructive testing images, and the performance of many existing models depends on training with general datasets. In this pap...
Deep supervised hashing has been widely utilized in large-scale image retrieval due to its lightweight storage and fast search speed. The distribution of features from existing hashing methods is inevitably distorted from the original feature distribution, which degrades image retrieval performance. With the constra...
Contrastive Language-Image Pre-training (CLIP) represents the latest incarnation of pre-trained vision-language models. Although CLIP has recently shown its superior power on a wide range of downstream vision-language tasks like Visual Question Answering, it is still underexplored for Image Emotion Classification (IEC). Adapting CLIP to the IEC tas...
Hamming space retrieval is a hot area of research in deep hashing because it is effective for large-scale image retrieval. Existing hashing algorithms have not fully used the absolute boundary to discriminate the data inside and outside the Hamming ball, and their performance is unsatisfactory. In this paper, a boundary-aware contrastive loss is desi...
Image emotion classification is an important computer vision task to extract emotions from images. Methods for image emotion classification (IEC) primarily rely on labels or distributions as supervision signals, which offer neither sufficient accessibility nor diversity, limiting the development of IEC research. Inspired by psychology research an...
Group activity recognition aims to recognize behaviors characterized by multiple individuals within a scene. Existing schemes rely on individual relation inference and usually take the individuals as tokens. Essentially they select the most relevant region of the group activity from the entire image while filtering out irrelevant background noises....
Face detection has been deployed on edge devices as the basis for face applications, but the devices cannot store large-scale models and have low computing power. The existing anchor-based face detection schemes cannot cover face images over a continuous size range, and their performance is not satisfactory. Obviously, good performances are accompa...
Group activity recognition has received significant interest due to its widely practical applications in sports analysis, intelligent surveillance and abnormal behavior detection. In a complex multi-person scenario, only a few key actors participate in the overall group activity and others may bring irrelevant information for recognition. However,...
This paper describes the system proposed by the BIT-Event team for NLPCC 2021 shared task on Subevent Identification. The task includes two settings, and these settings face less reliable labeled data and the dilemma about selecting the most valid data to annotate, respectively. Without the luxury of training data, we propose a hybrid system based...
Personalized recommendation refers to identifying items that satisfy users’ interests from large-scale item databases according to users’ habits and preferences. The task is very challenging due to the complexity of user interests. Previous works use a uniform representation to model user interests, neglecting the diversity of user preferences when...
Fused deposition modeling (FDM) 3D printing technology, which directly manufactures physical objects from digital 3D models, has become a research hotspot in the manufacturing field in recent years. Current 3D printing suffers from problems such as a single printing direction and monotonous printing color. These problems usually waste more human and material...
Face anti-spoofing technology is a vital part of the face recognition system. For a quick response, many single-frame-based methods have been studied and made remarkable progress. However, some researchers improve performance by learning temporal features from video sequences without considering efficiency. Although the additional temporal features...
Hamming space retrieval enables efficient constant-time search through hash table lookups constructed by hash codes, where in response to each query, all data points within a small given Hamming radius are returned as relevant data. However, in Hamming space retrieval, the search performance of existing hashing schemes based on linear scan dropp...
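The hash-table lookup described in this abstract can be sketched as follows, assuming b-bit codes packed into Python ints. Instead of linearly scanning every database code, the query probes every code inside the Hamming ball of the given radius and collects the items stored under each probe. The function names and toy codes are hypothetical, and the sketch does not reproduce the hashing scheme proposed in the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def build_table(codes: List[int]) -> Dict[int, List[int]]:
    """Hash table mapping each code to the ids of items that carry it."""
    table: Dict[int, List[int]] = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[code].append(item_id)
    return table


def hamming_ball_search(table: Dict[int, List[int]], query: int,
                        n_bits: int, radius: int) -> List[int]:
    """Return ids of all items whose code is within `radius` bits of `query`.

    Every code in the Hamming ball is generated by flipping k of the
    n_bits bits for k = 0..radius, then looked up directly. The number of
    probes depends on n_bits and radius, not on the database size.
    """
    hits: List[int] = []
    for k in range(radius + 1):
        for positions in combinations(range(n_bits), k):
            probe = query
            for p in positions:
                probe ^= 1 << p          # flip bit p
            hits.extend(table.get(probe, []))   # .get does not insert keys
    return sorted(hits)


# Toy 4-bit database.
codes = [0b0000, 0b0001, 0b0011, 0b1111, 0b0100]
table = build_table(codes)
print(hamming_ball_search(table, query=0b0000, n_bits=4, radius=1))  # [0, 1, 4]
print(hamming_ball_search(table, query=0b0000, n_bits=4, radius=2))  # [0, 1, 2, 4]
```

The probe count, the sum of C(b, k) for k from 0 to the radius, grows quickly with the code length b and the radius, so this lookup is practical mainly for short codes and small radii; beyond that, schemes fall back on linear scan.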
Purpose
To address the problem of uncertain product quality in digital light processing (DLP) three-dimensional (3D) printing, this paper proposes a scheme to qualitatively estimate whether a layer is printed with qualified quality or is not cured.
Design/methodology/approach
A thermochromic pigment whose color fades at 45°C is prepared...
Motion information has been widely exploited for group activity recognition in sports video. However, in order to model and extract the various motion information between the adjacent frames, existing algorithms only use the coarse video-level labels as supervision cues. This may lead to the ambiguity of extracted features and the omission...
As the Internet confronts the multimedia explosion, it becomes urgent to investigate personalized recommendation for alleviating information overload and improving users’ experience. Most personalized recommendation approaches pay their attention to collaborative filtering over users’ interactions, which suffers greatly from the highly sparse inter...
Deep supervised hashing takes prominent advantages of low storage cost, high computational efficiency and good retrieval performance, which draws attention in the field of large-scale image retrieval. However, similarity-preserving, quantization errors and imbalanced data are still great challenges in deep supervised hashing. This paper proposes a...
In object detection of remote sensing images, anchor-free detectors often suffer from false boxes and sample imbalance, due to the use of single oriented features and the key point-based boxing strategy. This paper presents a simple and effective anchor-free approach, RatioNet, with fewer parameters and higher accuracy for remote sensing images, which assign...
Mingui Wang, Di Cui, Lifang Wu, [...], Xu Liu
Weakly-supervised video object localization is a challenging yet important task. The system should spatially localize the object of interest in videos, where only the descriptive sentences and their corresponding video segments are given in the training stage. Recent efforts propose to apply image-based Multiple Instance Learning (MIL) theory in th...
Motion information used in existing video action recognition schemes is a mixture of global motion (GM) and local motion (LM). In fact, GM and LM have their respective semantic concepts. Thus, it is promising to decouple GM and LM from the mixed motions. Numerous efforts have been made on the design of global motion models for video encoding, video dej...
With the popularity of online opinion expressing, automatic sentiment analysis of images has gained considerable attention. Most methods focus on effectively extracting the sentimental features of images, such as enhancing local features through saliency detection or instance segmentation tools. However, as a high-level abstraction, the sentiment i...
Face presentation attack detection (PAD) has become a key component in face-based application systems. Typical face de-spoofing algorithms estimate the noise pattern of a spoof image to detect presentation attacks. These algorithms are device-independent and have good generalization ability. However, the noise modeling is not very effective because...
Online social curation networks owe their popularity to the convenience of retrieving, collecting, sorting and sharing multimedia content among users. With the growing gap between content and user intent, effective recommendation becomes highly desirable for their further development. In this paper, we propose a content-based bipartite graph for image re...
Content curation social networks (CCSNs), such as Pinterest and Huaban, are interest driven and content centric. On CCSNs, user interests are represented by a set of boards, and a board is composed of various pins. A pin is an image with a description. All entities, such as users, boards, and categories, can be represented as a set of pins. Therefo...
Many semantic events in team sport activities (e.g., basketball) often involve both group activities and the outcome (score or not). Motion patterns can be an effective means to identify different activities. Global and local motions have their respective emphasis on different activities, which are difficult to capture from the optical flow due to the...
With the development of visual social networks, the sentiment analysis of images has quickly emerged for opinion mining. Based on the observation that the sentiments conveyed by some images are related to salient objects in them, we propose a scheme for visual sentiment analysis that combines global and local information. First, the sentiment is pr...
Dezhong Xu, Heng Fu, Lifang Wu, [...], Xu Liu
Group activity recognition has received a great deal of interest because of its broad applications in sports analysis, autonomous vehicles, CCTV surveillance systems and video summarization systems. Most existing methods typically use appearance features and seldom consider underlying interaction information. In this work, a technology of no...