Pinar Duygulu

Hacettepe University · Department of Computer Engineering

PhD

About

128
Publications
31,326
Reads
5,502
Citations
Additional affiliations
May 2015 - present
Hacettepe University
Position
  • Professor (Associate)
February 2014 - July 2015
Carnegie Mellon University
Position
  • Research Associate
September 1996 - February 2003
Middle East Technical University
Position
  • Research Assistant

Publications (128)
Article
Full-text available
A significant bottleneck in building large-scale systems for image and video categorization is the requirement of labeled data. Manual labeling effort could be overcome by using the massive amount of web data. However, this type of data is collected through searching on the category names and is likely to inherit noise. In this study, (1) the prima...
Chapter
Assessing surgical skills is an essential part of medical performance evaluation and expert training. Since it is typically conducted as a subjective task by individuals, it may lead to misinterpretations of the skill performance and hence lead to suboptimal training and organization of the surgical activities. Therefore, objective assessment of su...
Preprint
With the introduction of large-scale datasets and deep learning models capable of learning complex representations, impressive advances have emerged in face detection and recognition tasks. Despite such advances, existing datasets do not capture the difficulty of face recognition in the wildest scenarios, such as hostile disputes or fights. Further...
Conference Paper
This paper is motivated from a young boy's capability to recognize an illustrator's style in a totally different context. In the book "We are All Born Free" [1], composed of selected rights from the Universal Declaration of Human Rights interpreted by different illustrators, the boy was surprised to see a picture similar to the ones in the "Winnie...
Article
Full-text available
This paper is motivated from a young boy's capability to recognize an illustrator's style in a totally different context. In the book "We are All Born Free" [1], composed of selected rights from the Universal Declaration of Human Rights interpreted by different illustrators, the boy was surprised to see a picture similar to the ones in the "Winnie...
Conference Paper
The automatic recognition of emotional states is a widely researched problem. In this work, the emotion recognition problem was investigated and a system was developed that can recognize happy, sad, surprised, angry, afraid, disgusted and neutral expressions from an image. Previously, a feature vector was extracted for the detected face and the...
Conference Paper
In this paper, a framework for human hand movement recognition is implemented for humans to play rock-paper-scissors game interactively against computer using a webcam. The game is the classic rock-paper-scissors game; however, first player is human and second player is computer. Human hand movements are recognized using a webcam, and the game is f...
Conference Paper
Recognizing sign language is an important interest area since there are many speech and hearing impaired people in the world. They need to be understood by other people and understand them as well. Unfortunately, the number of people who have the knowledge of sign language is not many. In order to communicate with handicapped people, existence of s...
Article
Full-text available
In this study, we address the problem of matching patterns in Kufic calligraphy images. Being used as a decorative element, Kufic images have been designed in a way that makes it difficult to be read by non-experts. Therefore, available methods for handwriting recognition are not easily applicable to the recognition of Kufic patterns. In this study...
Article
Tracking multiple players is crucial to analyze soccer videos in real time. Yet, rapid illumination changes and occlusions among players who look similar from a distance make tracking in soccer very difficult. Particle-filter-based approaches have been utilized for their ability in tracking under occlusion and rapid motions. Unlike the common pract...
Article
Full-text available
Motivated by the need for the automatic indexing and analysis of huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve and make alive this heritage, in this study we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without...
Conference Paper
Full-text available
We attack the problem of learning concepts automatically from noisy Web image search results. The idea is based on discovering common characteristics shared among subsets of images by posing a method that is able to organise the data while eliminating irrelevant instances. We propose a novel clustering and outlier detection method, namely Concept...
Conference Paper
Due to increasing hospital costs and traveling time, more and more patients decide to use medical devices at home without traveling to the hospital. However, these devices are not always very straight-forward for usage, and the recent reports show that there are many injuries and even deaths caused by the wrong use of these devices. Since human sup...
Data
Full-text available
Article
Full-text available
We attack the problem of learning face models for public faces from weakly-labelled images collected from web through querying a name. The data is very noisy even after face detection, with several irrelevant faces corresponding to other people. We propose a novel method, Face Association through Model Evolution (FAME), that is able to prune the da...
Article
Full-text available
Recognizing fonts has become an important task in document analysis, due to the increasing number of available digital documents in different fonts and emphases. A generic font-recognition system independent of language, script and content is desirable for processing various types of documents. At the same time, categorizing calligraphy styles in h...
Article
Full-text available
Unusual events are important as being possible indicators of undesired consequences. Moreover, unusualness in everyday life activities may also be amusing to watch as proven by the popularity of such videos shared in social media. Discovery of unusual events in videos is generally attacked as a problem of finding usual patterns, and then separating...
Conference Paper
Full-text available
We introduce ConceptVision, a method that aims for high accuracy in categorizing large number of scenes, while keeping the model relatively simpler and efficient for scalability. The proposed method combines the advantages of both low-level representations and high-level semantic categories, and eliminates the distinctions between different levels...
Article
This paper explains the approach proposed by Bilkent - RETINA team for the Retrieving Diverse Social Images task of MediaEval 2014 [1]. We develop a framework which first removes outliers using one-class support vector machines (SVM) to improve relevance. Second it clusters the eliminated set and retrieves the centroids to diversify the results. We t...
Article
Full-text available
We attack the problem of learning concepts automatically from noisy web image search results. Going beyond low level attributes, such as colour and texture, we explore weakly-labelled datasets for the learning of higher level concepts, such as scene categories. The idea is based on discovering common characteristics shared among subsets of images b...
Conference Paper
In this paper, we introduce a model to classify cooking activities using their visual and temporal coherence information. We fuse multiple feature descriptors for fine-grained activity recognition as we would need every single detail to catch even subtle differences between classes with low inter-class variance. Considering the observation that dai...
Conference Paper
In this study, we propose a system for organizing personal photo collections. Motivated with the fact that people related queries are the most desired ones, we propose a method for labeling faces in photographs. After representing the detected faces based on the descriptors extracted around facial features, the similarities between all faces in the...
Conference Paper
We propose a method to recognize the scene of an image by finding the objects and the colors it contains. We approach this problem by creating a binary vector of detected objects and a histogram of the colors that the image contains. We then use these features to train a random forest classifier in order to determine the scene of each image. For cl...
Conference Paper
Analyzing and interpreting human actions is an important and challenging area of computer vision. Different solutions are used for representing human actions; we prefer to use spatio-temporal interest points for motion descriptors. Besides, the space-time interest point feature space is considerably high-dimensional and it is hard to eliminate the...
Conference Paper
In this work, we study the task of recognizing human actions from noisy videos and effects of noise to recognition performance and propose a possible solution. Datasets available in computer vision literature are relatively small and could include noise due to labeling source. For new and relatively big datasets, noise amount would possible increas...
Article
Authorship attribution and identifying time period of literary works are fundamental problems in quantitative analysis of languages. We investigate two fundamentally different machine learning text categorization methods, Support Vector Machines (SVM) and Naive Bayes (NB), and several style markers in the categorization of Ottoman poems according t...
Conference Paper
This paper is motivated by a book in which artists and illustrators from all over the world offer their personal interpretations of the declaration of human rights in pictures [1]. It was exciting for a young reader to see an illustration by an artist whom he already knew from his books. The characters were different, the topic was irrelevant...
Conference Paper
This paper presents an approach for text line segmentation which combines connected component based and projection based information to take advantage of aspects of both methods. The proposed system finds baselines of each connected component. Lines are detected by grouping baselines of connected components belonging to each line by projection info...
Article
Computer vision based athlete tracking systems use different methods to segment players from the background and then track them automatically throughout the video. It is insufficient to know a player's position on the image plane if we want to extract performance analysis of the player. Furthermore, image plane coordinates need to be transformed to...
Article
Within Ottoman Text Archive Project a web interface to aid in uploading, binarization, line and word segmentation, labeling, recognition and testing of the Ottoman Turkish texts has been developed. It became possible to retrieve expert knowledge of scholars working with Ottoman archives through this interface, and apply this knowledge in developing...
Article
Many researchers and historians from all around the world are interested in historical Ottoman archives. However, translation of these documents requires competent historians, which is not feasible in terms of time and cost. Thus, automatic translation of these documents is required. In this paper, preprocessing steps of accessing the Ottom...
Article
In daily life, the selection of a hand tool for a job depends on the appearance of the tool and its effect on the objects. The effect determines the affordance of the chosen tool. The aim of this work is to determine the affordances of hand tools based only on their appearance and to build a basis for simple tool usage of humanoid robots. Towards this end,...
Conference Paper
In this study, we propose a weakly-supervised multiple instance learning (MIL) method to improve the results of text-based image search engines. In this approach, ranked image list of search engine for a keyword query is treated as weak-positive input data, and with additional negative input data, multiple instance learning bags are constructed. Th...
Article
In this study, we propose a new method for retrieving and recognizing words in historical documents. We represent word images with a set of line segments. Then we provide a criterion for word matching based on matching the lines. We carry out experiments on a benchmark dataset consisting of manuscripts by George Washington, as well as on Ottoman ma...
Article
Full-text available
In this paper, two image matching methods are adapted to retrieve words in Ottoman documents. The first method is based on Dynamic Time Warping (DTW) method proposed in [7], while the second method is based on the Shape Context descriptor [10]. Firstly, all sub-words in a given Ottoman document are extracted. In the first method, a 4-variant featur...
Article
We address the problem of recognizing actions from arbitrary views for a multi-camera system. We argue that poses are important for understanding human actions and the strength of the pose representation affects the overall performance of the action recognition system. Based on this idea, we present a new view-independent representation for human p...
Conference Paper
Millions of manuscripts and printed texts are available in the Ottoman language. The automatic categorization of Ottoman texts would make these documents much more accessible in various applications ranging from historical investigations to literary analyses. In this work, we use transcribed version of Ottoman literary texts in the Latin alphabet a...
Article
Full-text available
In this paper we present an automatic photo tag expansion method designed for photo sharing websites. The purpose of the method is to suggest tags that are relevant to the visual content of a given photo at upload time. Both textual and visual cues are used in the process of tag expansion. When a photo is to be uploaded, the system asks for a cou...
Conference Paper
Full-text available
Repeated patterns, rhymes and redifs, are among the fundamental building blocks of Ottoman Divan poetry. They provide integrity of a poem by connecting its parts and bring a melody to its voice. In Ottoman literature, poets wrote their works by making use of the rhymes and redifs of previous poems according to the nazire (creative imitation) tradit...
Conference Paper
In this paper, we explore the idea of using only pose, without utilizing any temporal information, for human action recognition. In contrast to the other studies using complex action representations, we propose a simple method, which relies on extracting “key poses” from action sequences. Our contribution is two-fold. Firstly, representing the pose...
Article
In this study, we propose a method for finding people in large news photograph and video collections. Our method exploits the multi-modal nature of these data sets to recognize people and does not require any supervisory input. It first uses the name of the person to populate an initial set of candidate faces. From this set, which is likely to incl...
Article
In this paper, as a first step to an easy and convenient way to access the manuscripts of Atatürk with a word based search engine, the preprocessing of digitalized documents and their line and word segmentation is studied. The techniques that are applied on printed documents may not yield satisfactory results. Due to this fact, more developed techn...
Conference Paper
Full-text available
In this study, we present a representation based on a new 3D search technique for volumetric human poses which is then used to recognize actions in three dimensional video sequences. We generate a set of cylinder like 3D kernels in various sizes and orientations. These kernels are searched over 3D volumes to find high response regions. The distribu...
Conference Paper
Full-text available
We propose a graph based method in order to recognize the faces that appear on the web using a small training set. First, relevant pictures of the desired people are collected by querying the name in a text based search engine in order to construct the data set. Then, detected faces in these photographs are represented using SIFT features extracted...
Article
We developed an inexpensive computer vision-based method utilizing an algorithm which differentiates drug-induced behavioral alterations. The mice were observed in an open-field arena and their activity was recorded for 100 min. For each animal the first 50 min of observation were regarded as the drug-free period. Each animal was exposed to only on...
Article
In this study, we are aiming to name faces of people in photographs which are collected from Web. In photographs from the Web, some people appear more frequently than some other people. In addition to that, since there are so many examples of more frequently appearing people, naming of these people is simpler than naming of other less frequently ap...
Article
In this paper, an overview of an application, which aims to make significant improvements on access methods to the online shopping catalogs, is presented. In current online shopping sites, only browsing and semantic based retrieval are provided to the users. In this work, a system is constructed on content based retrieval methods in order to allow...
Article
In pharmacological experiments, the behavior pattern of laboratory mice under the influence of psychotherapeutic drugs reveals important clues about the effects of the drug. Behavior analysis of laboratory mice by video processing saves both time and labor. In this work a method which was previously used to recognize human behaviors is adapted...
Article
In this paper, we propose a method to match similar faces even though photos taken from different sources on the Internet may differ in scene, illumination and pose. Interest points are used to recognize faces, and some points are eliminated in order to find the best matching points that pair similar faces. Difference between two matc...
Conference Paper
Full-text available
We present a compact representation for human action recognition in videos using line and optical flow histograms. We introduce a new shape descriptor based on the distribution of lines which are fitted to boundaries of human figures. By using an entropy-based approach, we apply feature selection to densify our feature representation, thus, minimiz...
Conference Paper
Full-text available
We propose a method to improve the results of image search engines on the Internet to satisfy users who desire to see relevant images in the first few pages. The method re-ranks the results of text based systems by incorporating visual similarity of the resulting images. We observe that, together with many unrelated ones, results of text based syst...
Conference Paper
Full-text available
In this paper, we approach the problem of understanding human actions from still images. Our method involves representing the pose with a spatial and orientational histogramming of rectangular regions on a parse probability map. We use LDA to obtain a more compact and discriminative feature representation and binary SVMs for classification. Our res...
Conference Paper
In this study, we are aiming to name faces of people in photographs which are collected from Web. In photographs from the Web, some people appear more frequently than some other people. In addition to that, since there are so many examples of more frequently appearing people, naming of these people is simpler than naming of other less frequently ap...
Conference Paper
Full-text available
In this paper, we propose an automatic photo tag expansion system for the community photo collections, such as Flickr. Our aim is to suggest relevant tags for a target photograph uploaded to the system by a user, by incorporating the visual and textual cues from other related photographs. As the first step, the system requires the user to add only...
Conference Paper
Full-text available
We propose a method for recognizing human actions in videos. Inspired from the recent bag-of-words approaches, we represent actions as documents consisting of words, where a word refers to the pose in a frame. Histogram of oriented gradients (HOG) features are used to describe poses, which are then vector quantized to obtain pose-words. As an a...
Conference Paper
We present new methods to retrieve words in historical handwritten documents. With the assumption that the words can be seen as images, we used the word spotting idea and search for the words in the documents using image retrieval techniques. Specifically, we proposed two methods, one based on the histogram of gradient orientations and one based on...
Article
Full-text available
Although one of the most common usages of Internet is searching, especially in image search, the users are not satisfied due to many irrelevant results. In this paper we present a method to identify irrelevant results of image search on the Internet and re-rank the results so that the relevant results will have a higher priority within the list. The...
Article
Although one of the most common usages of Internet is searching, especially in image search, the users are not satisfied due to many irrelevant results. In this paper we present a method to identify irrelevant results of image search on the Internet and re-rank the results so that the relevant results will have a higher priority within the list. Th...
Article
Semantic labeling of large volumes of image and video archives is difficult, if not impossible, with the traditional methods due to the huge amount of human effort required for manual labeling used in a supervised setting. Recently, semi-supervised techniques which make use of annotated image and video collections are proposed as an alternative to...
Conference Paper
Advertising is a powerful tool for promoting the products. Tracking of the commercials is important for planning marketing but performing this process manually is time consuming and error prone. In this study, we propose a method to detect and track commercials in news broadcasts. We classify the commercials with recall and precision values over 90...
Conference Paper
Full-text available
Large archives of Ottoman documents are challenging to many historians all over the world. However, these archives remain inaccessible since manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, character recognition based systems may not yield satisfactory...
Article
People are the most important subjects in news videos, and for proper retrieval of people images, face detection is a very crucial step. However, face detection and recognition in news videos is a very challenging task due to the huge irregularities and high noise level in the data. In addition to that, with different face detection algorithms, the...
Conference Paper
Full-text available
We describe our fourth participation, that includes two high-level feature extraction runs, and one manual search run, to the TRECVID video retrieval evaluation. All of these runs have used a system trained on the common development collection. Only visual information, consisting of color, texture and edge-based low-level features, was used.
Conference Paper
Full-text available
We describe a "bag-of-rectangles" method for representing and recognizing human actions in videos. In this method, each human pose in an action sequence is represented by oriented rectangular patches extracted over the whole body. Then, spatial oriented histograms are formed to represent the distribution of these rectangular patches. In order to...
Conference Paper
Full-text available
In this paper, we propose a new method to search different instances of a video sequence inside a long video and/or video collection. The proposed method is robust to view point and illumination changes which may occur since the sequences are captured in different times with different cameras, and to the differences in the order and the number of fra...
Conference Paper
Full-text available
Large archives of Ottoman documents are challenging to many historians all over the world. However, these archives remain inaccessible since manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, character recognition based systems may not yield satisfacto...
Article
Full-text available
Multimedia objects like video clips or captioned images contain data of various modalities such as image, audio, and transcript text. Correlations across different modalities provide information about the multimedia content, and are useful in applications ranging from summarization to semantic captioning. For discovering cross-modal correlations,...
Conference Paper
Full-text available
We propose a graph based method to improve the performance of person queries in large news video collections. The method benefits from the multi-modal structure of videos and integrates text and face information. Using the idea that a person appears more frequently when his/her name is mentioned, we first use the speech transcript text to limit o...
Conference Paper
We propose a method to associate names and faces for querying people in large news photo collections. On the assumption that a person’s face is likely to appear when his/her name is mentioned in the caption, first all the faces associated with the query name are selected. Among these faces, there could be many faces corresponding to the queried per...
Conference Paper
Full-text available
We present a new approach to the object recognition problem, motivated by the recent availability of large annotated image and video collections. This approach considers object recognition as the translation of visual elements to words, similar to the translation of text from one language to another. The visual elements represented in feature spa...
Conference Paper
Full-text available
There is a growing need to access historical Ottoman documents stored in large archives and therefore managing tools for automatic searching, indexing and transcription of these documents is required. In this paper, we present a method for the retrieval of Ottoman documents based on word matching. The method first successfully segments the docu...
Conference Paper
Full-text available
We propose a new approach to recognize objects and scenes in news videos motivated by the availability of large video collections. This approach considers the recognition problem as the translation of visual elements to words. The correspondences between visual elements and words are learned using the methods adapted from statistical machine transl...
Article
Full-text available
We propose a method to associate names and faces for querying people in large news photo collections. On the assumption that a person's face is likely to appear when his/her name is mentioned in the caption, first all the faces associated with the query name are selected. Among these faces, there could be many faces corresponding to the queried per...
Article
We propose a new approach to object recognition problem motivated by the availability of large annotated image and video collections. Similar to translation from one language to another, this approach considers the object recognition problem as the translation of visual elements to words. The visual elements represented in feature space are first c...
Article
We describe our third participation, that includes one high-level feature extraction run, and two manual and one interactive search runs, to the TRECVID video retrieval evaluation. All of these runs have used a system trained on the common development collection. Only visual and textual information were used where visual information consisted of co...
Conference Paper
Full-text available
In this study, we present a systematic evaluation of machine translation methods applied to the image annotation problem. We used the well-studied Corel data set and the broadcast news videos used by TRECVID 2003 as our dataset. We experimented with different models of machine translation with different parameters. The results showed that the s...
Conference Paper
Full-text available
In this study, we present a method to extensively reduce the number of retrieved images and increase the retrieval performance for the person queries on the broadcast news videos. A multi-modal approach which integrates face and text information is proposed. A state-of-the-art face detection algorithm is improved using a skin color based method to...
Conference Paper
Full-text available
In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval (IR). We propose a novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval fr...
Conference Paper
Full-text available
Detection and removal of commercials plays an important role when searching for important broadcast news video material. Two novel approaches are proposed based on two distinctive characteristics of commercials, namely, repetitive use of commercials over time and distinctive color and audio features. Furthermore, proposed strategies for combining t...
Article
Full-text available
Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects like video clips, with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-moda...
Conference Paper
Full-text available
We examine the problem of automatic image captioning. Given a training set of captioned images, we want to discover correlations between image features and keywords, so that we can automatically find good keywords for a new image. We experiment thoroughly with multiple design alternatives on large datasets of various content styles, and our propose...