Figure 11 - uploaded by Shuo Jiang
Evaluation scores of different image groups.


Source publication
Conference Paper
Full-text available
The patent database is often used in searches for inspirational stimuli for innovative design opportunities because of its large size, extensive variety, and the rich design information in patent documents. However, most patent mining research focuses only on textual information and ignores visual information. Herein, we propose a convolutional neural network...

Similar publications

Preprint
Full-text available
Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes. Existing approaches to solving this challenging task learn a mapping from the (reference image, modification text)-pair to an image embedding that is then matched against a large...

Citations

... Early approaches used Fisher vectors [32] with linear classifiers [8]. Recent methods use deep learning techniques such as CNNs [18,20]. For instance, Jiang [18] classifies patent figures by type and IPC class using a Dual VGG19 (Visual Geometry Group) network [41]. ...
Preprint
Full-text available
Patent figure classification facilitates faceted search in patent retrieval systems, enabling efficient prior-art search. Existing approaches have explored patent figure classification for only a single aspect and for aspects with a limited number of concepts. In recent years, large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks; however, they remain unexplored for patent figure classification. Our work explores the efficacy of LVLMs in patent figure visual question answering (VQA) and classification, focusing on zero-shot and few-shot learning scenarios. For this purpose, we introduce new datasets, PatFigVQA and PatFigCLS, for fine-tuning and evaluation regarding multiple aspects of patent figures (i.e., type, projection, patent class, and objects). For computationally efficient handling of a large number of classes with an LVLM, we propose a novel tournament-style classification strategy that leverages a series of multiple-choice questions. Experimental results and comparisons of multiple classification approaches based on LVLMs and Convolutional Neural Networks (CNNs) in few-shot settings show the feasibility of the proposed approaches.
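The tournament-style strategy sketched in this abstract can be illustrated as follows. This is a minimal sketch, not the paper's implementation: `ask_lvlm` is a hypothetical callable standing in for an actual LVLM multiple-choice prompt, and the group size and toy labels are invented for illustration.

```python
from typing import Callable, List

def tournament_classify(
    image,
    classes: List[str],
    ask_lvlm: Callable[[object, List[str]], str],
    group_size: int = 4,
) -> str:
    """Narrow a large label set via rounds of multiple-choice questions.

    Each round partitions the surviving candidates into small groups,
    asks the model to pick one label per group, and advances the winners
    until a single label remains.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        winners = []
        for i in range(0, len(candidates), group_size):
            group = candidates[i:i + group_size]
            winners.append(group[0] if len(group) == 1 else ask_lvlm(image, group))
        candidates = winners
    return candidates[0]

# Toy stand-in for the LVLM: always prefers "technical drawing" when offered.
def fake_lvlm(image, options):
    return "technical drawing" if "technical drawing" in options else options[0]

label = tournament_classify(
    None, [f"class_{i}" for i in range(15)] + ["technical drawing"], fake_lvlm
)
print(label)  # technical drawing
```

With 16 candidate labels and groups of four, the model answers five short multiple-choice questions instead of one question with 16 options, which is the computational appeal of the tournament framing.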
... Some studies leverage the efficient feature learning and representation capabilities of neural networks to extract feature vectors for tasks such as image retrieval and patent recommendation. Jiang et al. [51] used a novel neural network architecture named Dual-VGG to perform two tasks on patent images: visual material type prediction and IPC category label prediction. Higuchi and Yanai [24] applied an architecture combining a Transformer with metric learning to a patent image dataset for retrieval. ...
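As a rough illustration of a two-task setup like the Dual-VGG mentioned above, the sketch below pushes a toy image through a shared feature extractor and two softmax heads (one for visual material type, one for IPC labels). The random projections merely stand in for trained VGG weights, and the class counts are invented; the actual Dual-VGG architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_backbone(x: np.ndarray) -> np.ndarray:
    # Stand-in for a convolutional feature extractor (e.g., a VGG trunk):
    # a fixed random projection followed by ReLU.
    W = rng.standard_normal((x.size, 64))
    return np.maximum(x.ravel() @ W, 0.0)

def linear_head(features: np.ndarray, n_classes: int, seed: int) -> np.ndarray:
    # One classification head: linear layer + softmax over its own classes.
    W = np.random.default_rng(seed).standard_normal((features.size, n_classes))
    logits = features @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

image = rng.random((32, 32))               # toy grayscale patent figure
feats = shared_backbone(image)
type_probs = linear_head(feats, 5, seed=1)  # 5 toy visual material types
ipc_probs = linear_head(feats, 8, seed=2)   # 8 toy IPC sections

print(type_probs.shape, ipc_probs.shape)
```

The point of the sketch is only the shape of the computation: one shared representation feeding two independent distributions, one per prediction task.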
Article
Full-text available
Patent infringement analysis (PIA) is a critical task in patent circumvention design. It aims to identify the likelihood of infringement for target technologies to enhance product innovation, serving as a crucial measure for technical protection. Traditional PIA processes rely on examiners' subjective judgments of a patent's technical advantages and textual similarity, leaving the infringement results heavily dependent on personal experience. Previous similarity calculation models based on keywords or textual content do not exploit the knowledge of functions, structures, and features within patents and their interrelations, resulting in inaccurate assessments of technological infringement for similar patents. Furthermore, the potential of patent images in infringement analysis remains underutilized. To overcome these issues, a patent knowledge graph (PKG) driven patent similarity calculation model fusing graph similarity (GS) and image similarity (IS) is proposed to predict the probability of patent infringement using patent text and structure images as multimodal data. First, an ontology model based on requirement-function-structure-location (RFSL) features is constructed. Four entity types are extracted from patent texts and images using a fine-tuned named entity recognition (NER) model combined with semantic relation analysis. Second, syntactic matching rules for eight types of entity relationships are constructed to extract triples, mapping patent texts into graph networks via the PKG. Finally, a Graph Neural Network (GNN) and a Convolutional Neural Network (CNN) are integrated to calculate the overall similarity between a newly filed patent and a comparison patent and to output the infringement probability. A case study of steel pipe welding device design is used to validate the proposed approach, and the comparison results confirm the potential of fusing GS and IS in PIA applications.
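A minimal sketch of the final fusion step described in this abstract, combining a graph similarity and an image similarity into an infringement probability, might look as follows. The weights, steepness, and threshold are illustrative assumptions, not values from the paper, and the logistic squashing is just one plausible way to map a fused score to a probability.

```python
import math

def infringement_probability(graph_sim: float, image_sim: float,
                             w_graph: float = 0.6, w_image: float = 0.4,
                             steepness: float = 6.0, threshold: float = 0.5) -> float:
    """Fuse two similarity scores (each in [0, 1]) into a probability.

    A weighted sum combines the graph-level (GNN) and image-level (CNN)
    similarities; a logistic function centered at `threshold` turns the
    fused score into a value that behaves like a probability.
    """
    fused = w_graph * graph_sim + w_image * image_sim
    return 1.0 / (1.0 + math.exp(-steepness * (fused - threshold)))

p = infringement_probability(0.9, 0.8)   # high similarity on both channels
```

Under these toy settings a patent pair that is similar on both channels lands well above 0.5, while a pair dissimilar on both lands well below it, which is all the fusion needs to convey.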
... Sarada et al., 2019; Balsmeier et al., 2018). Regarding the matching of patent families, automated approaches are used with patent data in similar tasks such as measuring patent similarity in patent text (Kelly et al., 2021; Seegmiller et al., 2023; Helmers et al., 2019) or patent drawings (Jiang et al., 2020; on historical image analysis in general, see Wevers and Smits, 2019). Patent drawings are also language-independent, which helps avoid translation problems. ...
... Details of the inventions proposed in patents are typically presented using text and images [14]. Different visualization types are used to efficiently convey information, e.g., block diagrams, graphs, and technical drawings [9]. In some cases, technical drawings are illustrated in more than one perspective (e.g., top or front view) to depict details [32]. ...
... According to a recent survey paper on patent analysis [14], there has been a lot of progress in tasks like patent retrieval [24,35,30,23] and patent image classification [9,32,15] due to the advancements in deep learning. We mainly focus on image classification since visualizations contain important information about patents [14,5,11]. ...
... So far, there have been some approaches for image type classification in scientific documents [19,10] but the application domain and images differ in terms of style, structure, etc. compared to patent images. For patents specifically, Jiang et al. [9] suggested a deep learning model for image type classification and applied it to the CLEF-IP 2011 dataset [23]. However, existing datasets [23,9,15] on image type classification contain different classes that miss some important types used in patents. ...
Preprint
Full-text available
Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification miss some important visualization types for patents. Furthermore, related work does not make use of recent deep learning approaches including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.
... Kwon et al. (2019) explored the use of image-based search to find visually similar examples to aid alternative-use concept generation. Visual information, along with topic-level International Patent Classification (IPC) labels, has also been used to retrieve images from patent documents (Jiang et al., 2020, 2021). Jiang et al. used a convolutional neural network-based method to perform image-based search using visual similarity and shared domain knowledge. ...
Article
Full-text available
Inspirational stimuli are known to be effective in supporting ideation during early-stage design. However, prior work has predominantly constrained designers to using text-only queries when searching for stimuli, which is not consistent with real-world design behavior where fluidity across modalities (e.g., visual, semantic, etc.) is standard practice. In the current work, we introduce a multi-modal search platform that retrieves inspirational stimuli in the form of 3D-model parts using text, appearance, and function-based search inputs. Computational methods leveraging a deep-learning approach are presented for designing and supporting this platform, which relies on deep-neural networks trained on a large dataset of 3D-model parts. This work further presents the results of a cognitive study (n = 21) where the aforementioned search platform was used to find parts to inspire solutions to a design challenge. Participants engaged with three different search modalities: by keywords, 3D parts, and user-assembled 3D parts in their workspace. When searching by parts that are selected or in their workspace, participants had additional control over the similarity of appearance and function of results relative to the input. The results of this study demonstrate that the modality used impacts search behavior, such as in search frequency, how retrieved search results are engaged with, and how broadly the search space is covered. Specific results link interactions with the interface to search strategies participants may have used during the task. Findings suggest that when searching for inspirational stimuli, desired results can be achieved both by direct search inputs (e.g., by keyword) as well as by more randomly discovered examples, where a specific goal was not defined. Both search processes are found to be important to enable when designing search platforms for inspirational stimuli retrieval.
... Here, the extraction of salient features is especially cumbersome, and using deep learning to learn suitable representations is extremely beneficial. The raw pixels of images with sizes between 100 × 100 and 300 × 300 serve as input [48,60,41]. ...
... CNNs have fewer parameters to train than more complex architectures such as RNNs, which makes them attractive for textual data as well. In the context of patent analysis, CNNs were deployed to solve different tasks related to image data [48,60,41] as well as text data [61,58,59,67,57,2,75,105,24,63]. ...
... Even when solving the problem of the non-differentiable selection of the next token to generate a sentence with an RNN [100], the output of these GAN models is far from genuine-looking text. Besides generating texts, GANs can also be used to generate other types of data. One approach uses GANs to generate the features of artificial samples to create more training data for standard machine learning approaches in the patent domain [104]. ...
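The feature-space augmentation idea referenced above ([104]) can be caricatured as fitting a generative model to real feature vectors and sampling artificial ones to enlarge the training set. In this sketch a per-dimension Gaussian sampler stands in for the GAN generator, and all data are synthetic; the point is only the augmentation pattern, not the GAN itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "real" feature vectors, standing in for features extracted
# from patent samples by some upstream model.
real_features = rng.normal(loc=0.5, scale=0.1, size=(200, 16))

# Fit a simple per-dimension Gaussian to the real features and
# sample artificial feature vectors from it (GAN generator stand-in).
mu, sigma = real_features.mean(axis=0), real_features.std(axis=0)
synthetic = rng.normal(mu, sigma, size=(100, 16))

# Augmented training set: real and artificial samples stacked together.
augmented = np.vstack([real_features, synthetic])
print(augmented.shape)  # (300, 16)
```

A trained GAN generator would replace the Gaussian sampler here, but the downstream classifier consumes the stacked feature matrix the same way in either case.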
Article
Full-text available
Patent document collections are an immense source of knowledge for research and innovation communities worldwide. The rapid growth of the number of patent documents poses an enormous challenge for retrieving and analyzing information from this source in an effective manner. Based on deep learning methods for natural language processing, novel approaches have been developed in the field of patent analysis. The goal of these approaches is to reduce costs by automating tasks that previously only domain experts could solve. In this article, we provide a comprehensive survey of the application of deep learning for patent analysis. We summarize the state-of-the-art techniques and describe how they are applied to various tasks in the patent domain. In a detailed discussion, we categorize 40 papers based on the dataset, the representation, and the deep learning architecture that were used, as well as the patent analysis task that was targeted. With our survey, we aim to foster future research at the intersection of patent analysis and deep learning and we conclude by listing promising paths for future work.
... Correctly pairing problems with their corresponding solutions has great value for engineers seeking to capture the hidden inventive details in patents. In recent years, most researchers have focused on making use of images [13,9], tabulations [13], or novel proposals [19] in patent documents to facilitate TRIZ or R&D activities. Nevertheless, only a few research works have noticed the hidden value of the relation between problems and corresponding solutions in IDM-related knowledge. ...
Chapter
Full-text available
The Inventive Design Method (IDM) mostly relies on the presence of exploitable knowledge. It was elaborated to formalize aspects of TRIZ that are expert-dependent. Patents are appropriate candidates since they contain problems and their corresponding partial solutions. When associated with patents from different fields, problems and partial solutions constitute a potential inventive solution scheme for a target problem. Nevertheless, our study found that the links between these two major components are worth studying further. We postulate that effective problem-solution matching holds hidden value for automating solution retrieval and uncovering inventive details in patents in order to facilitate R&D activities. In this paper, we cast this challenge as a question answering problem rather than using traditional syntactic analysis approaches, and we propose a model called IDM-Matching. Technically, XLNet, a state-of-the-art neural network model from the Natural Language Processing field, is incorporated into IDM-Matching to capture the corresponding partial solution for a given query that we mask using the related problem. We then construct links between these problems and solutions. Final experimental results on a real-world U.S. patent dataset illustrate our model's ability to effectively match IDM-related knowledge. A detailed case study demonstrates the usage and latent potential of our proposal in the TRIZ field.
Chapter
We introduce a new large-scale patent dataset termed PDTW150K for patent drawing retrieval. The dataset contains more than 150,000 patents associated with text metadata and over 850,000 patent drawings. We also provide a set of bounding box positions of individual drawing views to support constructing object detection models. We design some experiments to demonstrate the possible ways of using PDTW150K, including image retrieval, cross-modal retrieval, and object detection tasks. PDTW150K is available for download on GitHub [1].
Article
Design artifacts provide a mechanism for illustrating design information and concepts, but their effectiveness relies on alignment across design agents in what these artifacts represent. This work investigates the agreement between multi-modal representations of design artifacts by humans and artificial intelligence (AI). Design artifacts are considered to constitute stimuli designers interact with to become inspired (i.e., inspirational stimuli), for which retrieval often relies on computational methods using AI. To facilitate this process for multi-modal stimuli, a better understanding of human perspectives of non-semantic representations of design information, e.g., by form or function-based features, is motivated. This work compares and evaluates human and AI-based representations of 3D-model parts by visual and functional features. Humans and AI were found to share consistent representations of visual and functional similarities, which aligned well to coarse, but not more granular, levels of similarity. Human-AI alignment was higher for identifying low compared to high similarity parts, suggesting mutual representation of features underlying more obvious than nuanced differences. Human evaluation of part relationships in terms of belonging to same or different categories revealed that human and AI-derived relationships similarly reflect concepts of “near” and “far”. However, levels of similarity corresponding to “near” and “far” differed depending on the criteria evaluated, where “far” was associated with nearer visually than functionally related stimuli. These findings contribute to a fundamental understanding of human evaluation of information conveyed by AI-represented design artifacts needed for successful human-AI collaboration in design.