ABSTRACT: In this paper, we analyze the performance of an end-to-end Mortgage Origination (MO) process. The process begins with the submission of a mortgage application by an applicant to a lender and ends with one of two outcomes: closing, i.e., the loan is approved by the lender and accepted by the applicant, or non-closing, i.e., the loan is either rejected by the lender or approved by the lender but not accepted by the applicant. Ranking mortgage applications by their predicted likelihood of closing at various steps in the process is useful for process efficiency and for identifying actionable insights that can convert applications likely to non-close into ones likely to close. To build models for ranking applications at any step of the MO process, we take into account customer- and product-specific attributes of the applications as well as environment attributes and the history of the applications, or workflow. The large state-space of the workflow makes the ranking problem challenging. We propose two workflow attributes, each with a one-dimensional state-space, based on the number of visits to any step and to a particular step (re-work), respectively. We find that incorporating these workflow attributes into the density modeling technique that we develop yields a 4.8 percent improvement in Average Precision over models that incorporate only customer, product, and environment attributes. The simple and scalable density modeling technique allows for easy identification of applications that are likely to non-close and consequent corrective action, such as a change in the attributes of the mortgage product being offered. Further, our results indicate that the model is comparable to Support Vector Machines and superior to Logistic Regression for ranking.
IEEE International Conference on Cloud Computing, CLOUD 2009, Bangalore, India, 21-25 September, 2009; 01/2009
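The two proposed workflow attributes each reduce the workflow history to a single number: total visits to any step, and visits to the re-work step. A minimal sketch of extracting them from an application's step history follows; the function and step names are hypothetical illustrations, not the paper's implementation.

```python
from collections import Counter

def workflow_attributes(step_history, rework_step):
    """Reduce a workflow history to the two one-dimensional attributes
    proposed in the paper: visits to any step, and visits to the
    designated re-work step."""
    visits = Counter(step_history)
    total_visits = sum(visits.values())   # number of visits to any step
    rework_visits = visits[rework_step]   # number of visits to re-work
    return total_visits, rework_visits

# Hypothetical step names for one application's history:
history = ["intake", "underwriting", "rework", "underwriting", "rework"]
print(workflow_attributes(history, "rework"))  # (5, 2)
```

These two scalars can then be appended to the customer, product, and environment attributes before fitting the density model, keeping the feature space small despite the large underlying workflow state-space.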
ABSTRACT: Among the various types of semantic concepts modeled, events pose the greatest challenge in terms of the computational power needed to represent an event and the accuracy that can be achieved in modeling it. We introduce a novel low-level visual feature that summarizes motion in a shot. This feature leverages motion vectors from MPEG-encoded video and aggregates local motion vectors over time in a matrix, which we refer to as a motion image. The resulting motion image is representative of the overall motion in a video shot, compressing the temporal dimension while preserving spatial ordering. Building motion models using this feature permits us to combine the power of discriminant modeling with the dynamics of the motion in video shots, which cannot be accomplished by building generative models over a time series of motion features from multiple frames in the shot. Evaluation of models built using several motion image features on the TRECVID 2005 dataset shows that use of this novel motion feature results in an average improvement in concept detection performance of 140% over existing motion features. Furthermore, experiments also reveal that when this motion feature is combined with static feature representations of a single keyframe from the shot, such as color and texture features, the fused detection results in an improvement of 4 to 12% over fusion across the static features alone.
Proceedings of the 6th ACM International Conference on Image and Video Retrieval, CIVR 2007, Amsterdam, The Netherlands, July 9-11, 2007; 01/2007
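The motion-image construction can be sketched as a temporal aggregation of per-frame motion-vector fields into one spatial matrix. Summing per-macroblock vector magnitudes is an assumption for illustration; the paper's exact aggregation of MPEG motion vectors may differ.

```python
import numpy as np

def motion_image(motion_fields):
    """Collapse a shot's per-frame motion-vector fields (each an
    H x W x 2 array of dx/dy per macroblock) into a single H x W
    motion image: the temporal dimension is compressed while the
    spatial ordering of macroblocks is preserved."""
    acc = np.zeros(motion_fields[0].shape[:2])
    for field in motion_fields:
        acc += np.hypot(field[..., 0], field[..., 1])  # per-block magnitude
    return acc

# Toy shot: 3 frames over a 4x4 macroblock grid, uniform rightward motion.
fields = [np.dstack([np.ones((4, 4)), np.zeros((4, 4))])] * 3
print(motion_image(fields))  # every macroblock accumulates magnitude 3.0
```

Because the result is a fixed-size 2-D array regardless of shot length, it can be fed to the same discriminative classifiers used for static keyframe features.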
ABSTRACT: We propose a greedy, performance-driven algorithm for learning how to fuse across multiple classification and search systems. We assume a scenario in which many such systems must be fused to generate the final ranking. The algorithm is inspired by Ensemble Learning but takes that idea further to improve generalization capability. Fusion learning is applied to leverage text, visual, and model-based modalities for the 2005 TRECVID query retrieval task. Experiments using the well-established retrieval effectiveness measure of mean average precision reveal that our proposed algorithm improves over a naive baseline (fusion with equal weights) and over Caruana's original algorithm (NACHOS) by 36% and 46%, respectively.
Image Processing, 2007. ICIP 2007. IEEE International Conference on; 01/2007
ABSTRACT: Fusion of multimedia streams for enhanced performance is a critical problem for retrieval. However, fusion performance tends to easily overfit the hill-climb set used to learn fusion rules. In this paper, we perform fusion learning for multimedia streams using a greedy, performance-driven algorithm. In our fusion learning paradigm, the fused output is a linear combination of multiple classifiers or ranked streams. The algorithm is inspired by Ensemble Learning [2] but takes that idea further to improve generalization capability. A key application of our fusion learning algorithm, described in this work, is semantics reinforcement using an ensemble of classifiers built on the same training dataset but with groundtruth corresponding to different concepts. We expect that classifiers built for semantically close concepts should reinforce each other's performance, and fusion learning is an excellent post-classification way to reinforce semantics and performance. Fusion learning experiments have been performed on the TRECVID 2005 test set. Experiments using the well-established retrieval effectiveness measure of mean average precision reveal that our proposed algorithm improves over the best classifier (oracle) by 3.8%. We also present and discuss some interesting and intuitive semantic reinforcement trends observed during fusion learning.
Proceedings of the 6th ACM International Conference on Image and Video Retrieval, CIVR 2007, Amsterdam, The Netherlands, July 9-11, 2007; 01/2007
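Both fusion papers build on ensemble-selection-style greedy search: grow a bag of systems with replacement, each round adding whichever system most improves the hill-climb-set metric of the averaged fused scores. Below is a sketch with mean average precision as the target metric; the uniform score averaging and function names are illustrative assumptions, not the papers' exact variant.

```python
import numpy as np

def average_precision(scores, labels):
    """AP of a ranked list: mean of precision values at each relevant hit."""
    rel = np.asarray(labels)[np.argsort(-np.asarray(scores))]
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)
    return float((hits[rel == 1] / (np.flatnonzero(rel) + 1)).mean())

def greedy_fusion(system_scores, labels, rounds=10):
    """Greedy, performance-driven fusion: repeatedly add (with
    replacement) whichever system's scores most improve the AP of the
    averaged fusion on the hill-climb set; selection counts become
    the linear fusion weights."""
    ensemble = []
    for _ in range(rounds):
        best = max(range(len(system_scores)),
                   key=lambda i: average_precision(
                       np.mean([system_scores[j] for j in ensemble + [i]],
                               axis=0),
                       labels))
        ensemble.append(best)
    return np.bincount(ensemble, minlength=len(system_scores)) / len(ensemble)

# Two toy ranked streams on a 4-item hill-climb set:
labels = [1, 0, 1, 0]
speech = np.array([0.9, 0.1, 0.8, 0.2])   # strong system
visual = np.array([0.1, 0.9, 0.2, 0.8])   # weak system
print(greedy_fusion([speech, visual], labels, rounds=4))  # [1. 0.]
```

Selecting with replacement is what distinguishes this from plain forward selection: a strong system can be picked repeatedly, which implicitly weights it higher in the final linear combination.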
ABSTRACT: The explosion in multimodal content availability underlines the necessity for content management at a semantic level. We have cast the problem of detecting semantics in multimedia content as a pattern classification problem and the problem of building models of multimodal semantics as a learning problem. Recent trends show increasing use of statistical machine learning, providing a computational framework for mapping low-level media features to high-level semantic concepts. In this chapter we expose the challenges that these techniques face. We show that if a lexicon of visual concepts is identified a priori, a statistical framework can be used to build visual feature models for the concepts in the lexicon. Using support vector machine (SVM) classification, we build models for 34 semantic concepts on the TREC 2002 benchmark corpus. We study how the number of examples available for training impacts detection. We also examine low-level feature fusion as well as parameter sensitivity with SVM classifiers.
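A minimal stand-in for the SVM concept models described in this chapter: a Pegasos-style subgradient-trained linear SVM over low-level feature vectors. The training scheme and toy data are illustrative assumptions; the chapter's actual models are built with standard SVM machinery over visual features such as color and texture.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.
    X holds low-level feature vectors; y is +1/-1 for presence or
    absence of a concept in the training clip."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for t in range(1, epochs + 1):
        i = rng.integers(len(X))
        step = lr / t                          # decaying step size
        if y[i] * (X[i] @ w + b) < 1:          # hinge-loss violation
            w = (1 - step * lam) * w + step * y[i] * X[i]
            b += step * y[i]
        else:                                  # only regularize
            w = (1 - step * lam) * w
    return w, b

# Toy "concept" that is linearly separable in feature space:
X = np.array([[2.0, 2.0], [1.8, 2.2], [-2.0, -2.0], [-1.9, -2.1]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))  # predicted labels for the toy concept
```

The chapter's study of training-set size can be reproduced in this setting by varying how many rows of `X` are used and tracking detection quality on held-out examples.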
ABSTRACT: We present methods for improving text search retrieval of visual multimedia content by applying a set of visual models of semantic concepts from a lexicon of concepts deemed relevant for the collection. Text search is performed via queries of words or fully qualified sentences, and results are returned in the form of ranked video clips. Our approach involves a query expansion stage, in which query terms are compared to the visual concepts for which we independently build classifier models. We leverage a synonym dictionary and WordNet similarities during expansion. Results over each query are aggregated across the expanded terms and ranked. We validate our approach on the TRECVID 2005 broadcast news data with 39 concepts specifically designed for this genre of video. We observe that concept models improve search results by nearly 50% after model-based re-ranking of text-only search. We also observe that purely model-based retrieval significantly outperforms text-based retrieval on non-named entity queries.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, ICME 2006, July 9-12 2006, Toronto, Ontario, Canada; 01/2006
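The expansion-and-re-ranking pipeline can be sketched as follows. The similarity table stands in for the synonym dictionary and WordNet similarities, and all names, thresholds, and the linear blend are hypothetical illustrations rather than the paper's configuration.

```python
# Toy term->concept similarity table standing in for WordNet similarities.
SIMILARITY = {
    ("vehicle", "car"): 0.9,
    ("vehicle", "truck"): 0.8,
    ("vehicle", "face"): 0.0,
}

def expand_query(query_terms, concepts, threshold=0.5):
    """Query expansion: map each query term onto lexicon concepts whose
    similarity exceeds a threshold, weighted by that similarity."""
    weights = {}
    for term in query_terms:
        for concept in concepts:
            sim = SIMILARITY.get((term, concept), 0.0)
            if sim >= threshold:
                weights[concept] = max(weights.get(concept, 0.0), sim)
    return weights

def rerank(text_scores, concept_scores, weights, alpha=0.5):
    """Model-based re-ranking: blend text-search scores with the
    similarity-weighted aggregate of concept-detector scores."""
    fused = {}
    for clip, t in text_scores.items():
        m = sum(w * concept_scores[clip].get(c, 0.0)
                for c, w in weights.items())
        fused[clip] = alpha * t + (1 - alpha) * m
    return sorted(fused, key=fused.get, reverse=True)

print(rerank({"clip1": 0.5, "clip2": 0.5},
             {"clip1": {"car": 0.9}, "clip2": {"face": 0.9}},
             expand_query(["vehicle"], ["car", "truck", "face"])))
# ['clip1', 'clip2']
```

With `alpha` at 0, this reduces to the purely model-based retrieval that the paper reports outperforming text-based retrieval on non-named entity queries.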
ABSTRACT: The popularity of digital media (images, video, audio) is growing in all segments of the market, including consumer, media enterprise, traditional enterprise, and Web. Its tremendous growth is a result of the convergence of many factors, including the pervasive increase in bandwidth to users, the general affordability of multimedia-ready devices throughout the digital media value chain (creation, management, and distribution), the growing ease and affordability of creating digital media content, and the growing expectation of the value of digital media in enhancing traditional unstructured and structured information. However, while digital media content is being created and distributed in far greater amounts than ever before, significant technical challenges remain for realizing its full business potential. This paper examines some of the research challenges for industry towards harnessing the full value of digital media.
ABSTRACT: In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task involves answering queries with a few examples. The other involves learning models for semantic concepts, also with a few examples. In our view these two tasks are identical, the only difference being the number of examples available for training. Once we adopt this unified view, we apply identical techniques to both problems and evaluate performance using the NIST TRECVID benchmark evaluation data. We propose a combination hypothesis of two complementary classes of techniques: a nearest-neighbor model using only positive examples, and a discriminative support vector machine model using both positive and negative examples. In the case of queries, where negative examples are rarely provided to seed the search, we create pseudo-negative samples. We then combine the ranked lists generated by evaluating the test database with both methods to create a final ranked list of retrieved multimedia items. We evaluate this approach for rare concept and query topic modeling using the NIST TRECVID video corpus. In both tasks we find that applying the combination hypothesis across both modeling techniques and a variety of features results in enhanced performance over any of the baseline models, as well as improved robustness with respect to training examples and visual features. In particular, we observe an improvement of 6% for rare concept detection and 17% for the search task.
Proceedings of the 13th ACM International Conference on Multimedia, Singapore, November 6-11, 2005; 01/2005
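The combination hypothesis (a positives-only nearest-neighbor model plus a discriminative model trained with pseudo-negatives, fused at the rank level) can be sketched as follows. The difference-of-means linear scorer is a deliberately simplified stand-in for the paper's SVM, and all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_scores(test, positives):
    """Nearest-neighbor model: uses positive examples only; score is
    the negated distance to the closest positive."""
    d = np.linalg.norm(test[:, None, :] - positives[None, :, :], axis=2)
    return -d.min(axis=1)

def linear_scores(test, positives, pool):
    """Stand-in for the paper's discriminative SVM: a difference-of-means
    linear scorer trained against pseudo-negatives sampled from the
    unlabeled pool."""
    pseudo_neg = pool[rng.choice(len(pool), size=len(positives),
                                 replace=False)]
    w = positives.mean(axis=0) - pseudo_neg.mean(axis=0)
    return test @ w

def fuse_ranked_lists(*score_lists):
    """Combination hypothesis: average the normalized ranks produced
    by each model (higher fused value = better rank)."""
    ranks = [np.argsort(np.argsort(-s)) for s in score_lists]
    return -np.mean(ranks, axis=0)

positives = np.array([[1.0, 1.0], [1.2, 0.9]])
unlabeled = np.array([[0.0, 0.0], [0.1, -0.1], [0.0, 0.1], [-0.1, 0.0]])
queries = np.array([[1.1, 1.0], [0.0, 0.05]])
fused = fuse_ranked_lists(nn_scores(queries, positives),
                          linear_scores(queries, positives, unlabeled))
# The clip near the positives should outrank the far one.
```

Rank-level fusion sidesteps the fact that the two models emit scores on incompatible scales, which is why the paper combines ranked lists rather than raw scores.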
ABSTRACT: This paper describes multimodal systems for ad-hoc search constructed by IBM for the TRECVID 2003 benchmark of search systems for broadcast video. These systems all use a late fusion of independently developed speech-based and visual content-based retrieval systems, and they outperform our individual retrieval systems on both manual and interactive search tasks. For the manual task, our best system used a query-dependent linear weighting between the speech-based and image-based retrieval systems; it achieves mean average precision (MAP) 20% above our best unimodal system for manual search. For the interactive task, where the user has full knowledge of the query topic and the performance of the individual search systems, our best system used an interlacing approach. The user determines the (subjectively) optimal weights A and B for the speech-based and image-based systems, and the multimodal result set is aggregated by combining the top A documents from system A, followed by the top B documents from system B, repeating this process until the desired result set size is achieved. This multimodal interactive search has MAP 40% above our best unimodal interactive search system.
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on; 06/2004
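The interlacing scheme is concrete enough to sketch directly: take the next A unseen documents from system A, then the next B from system B, and repeat until the result set is full. Function and variable names are illustrative.

```python
def interlace(results_a, results_b, a, b, size):
    """Interactive multimodal interlacing: alternately take up to `a`
    unseen documents from system A and `b` from system B until `size`
    results are collected (duplicates count only once)."""
    fused, seen = [], set()
    sources = [("A", list(results_a), a), ("B", list(results_b), b)]
    pos = {"A": 0, "B": 0}
    while len(fused) < size:
        before = len(fused)
        for name, docs, quota in sources:
            taken = 0
            while taken < quota and pos[name] < len(docs) and len(fused) < size:
                doc = docs[pos[name]]
                pos[name] += 1
                if doc not in seen:
                    seen.add(doc)
                    fused.append(doc)
                    taken += 1
        if len(fused) == before:  # both result lists exhausted
            break
    return fused

# Speech-based and image-based result lists, user-chosen A=2, B=1:
speech = ["s1", "s2", "s3", "s4"]
image = ["i1", "s1", "i2", "i3"]
print(interlace(speech, image, 2, 1, 6))
# ['s1', 's2', 'i1', 's3', 's4', 'i2']
```

Note that already-seen documents (here "s1" in the image list) are skipped rather than counted against the quota, so each system still contributes its chosen share of novel results.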
ABSTRACT: Semantic understanding of multimedia content is critical in enabling effective access to all forms of digital media data. By making large media repositories searchable, semantic content descriptions greatly enhance the value of such data. Automatic semantic understanding is a very challenging problem and most media databases resort to describing content in terms of low-level features or using manually ascribed annotations. Recent techniques focus on detecting semantic concepts in video, such as indoor, outdoor, face, people, nature, etc. This approach works for a fixed lexicon for which annotated training examples exist. In this paper we consider the problem of using such semantic concept detection to map the video clips into semantic spaces. This is done by constructing a model vector that acts as a compact semantic representation of the underlying content. We then present experiments in the semantic spaces leveraging such information for enhanced semantic retrieval, classification, visualization, and data mining purposes. We evaluate these ideas using a large video corpus and demonstrate significant performance gains in retrieval effectiveness.
Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004; 01/2004
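The model-vector construction can be sketched directly: per-concept detector confidences are stacked into a fixed-order vector, and retrieval, classification, or visualization then operate in that semantic space. The lexicon, scores, and cosine-similarity retrieval below are illustrative assumptions.

```python
import numpy as np

# Hypothetical fixed lexicon of concept detectors.
LEXICON = ["indoor", "outdoor", "face", "people", "nature"]

def model_vector(detector_scores):
    """Stack per-concept detector confidences into a compact semantic
    representation of a clip."""
    return np.array([detector_scores[c] for c in LEXICON])

def semantic_retrieval(query_vec, clip_vecs):
    """Rank clips by cosine similarity in the model-vector space."""
    q = query_vec / np.linalg.norm(query_vec)
    M = clip_vecs / np.linalg.norm(clip_vecs, axis=1, keepdims=True)
    return np.argsort(-(M @ q))  # indices, most similar first

q = model_vector({"indoor": 0.9, "outdoor": 0.1, "face": 0.8,
                  "people": 0.7, "nature": 0.0})
clips = np.array([
    [0.1, 0.9, 0.0, 0.1, 0.95],  # outdoor/nature clip
    [0.8, 0.2, 0.7, 0.6, 0.1],   # indoor/people clip, close to the query
])
print(semantic_retrieval(q, clips))  # [1 0]
```

Because the representation is a short dense vector independent of the underlying low-level features, the same space supports clustering and data-mining operations alongside retrieval.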
ABSTRACT: Semantic multimedia management is necessary for the effective and widespread utilization of multimedia repositories and for realizing the potential that lies untapped in their rich multimodal information content. This challenge has driven researchers to devise new algorithms and systems that enable automatic or semi-automatic tagging of large-scale multimedia content with rich semantics. An emerging research area is the detection of a predetermined set of semantic concepts that can act as semantic filters and aid in search and manipulation. The NIST TRECVID benchmark has responded by creating a task that evaluates the performance of concept detection. Within the scope of this benchmark task, this paper studies trends in the emerging concept detection systems, architectures, and algorithms. It also analyzes strategies that have yielded reasonable success, and the challenges and gaps that lie ahead.
Proceedings of the 12th ACM International Conference on Multimedia, New York, NY, USA, October 10-16, 2004; 01/2004
ABSTRACT: Media analysis for video indexing is witnessing an increasing influence of statistical techniques. Examples of these techniques include the use of generative models as well as discriminant techniques for video structuring, classification, summarization, indexing, and retrieval. There is increasing emphasis on reducing the amount of supervision and user interaction needed to construct and utilize the semantic models. This paper highlights the statistical learning techniques in semantic multimedia indexing and retrieval. In particular the gamut of techniques from supervised to unsupervised systems will be demonstrated.
Journal of Visual Communication and Image Representation. 01/2004;