
Liang Zhang - Doctor of Philosophy in Statistics
- LinkedIn
About
28 Publications
6,635 Reads
556 Citations
Publications (28)
Many search systems work with large amounts of natural language data, e.g., search queries, user profiles, and documents. Building a successful search system requires a thorough understanding of textual data semantics, where deep learning based natural language processing techniques (deep NLP) can be of great help. In this paper, we introduce a com...
Many search systems work with large amounts of natural language data, e.g., search queries, user profiles and documents, where deep learning based natural language processing techniques (deep NLP) can be of great help. In this paper, we introduce a comprehensive study of applying deep NLP techniques to five representative tasks in search engines. T...
Cluster-and-aggregate techniques such as Vector of Locally Aggregated Descriptors (VLAD), and their end-to-end discriminatively trained equivalents like NetVLAD have recently been popular for video classification and action recognition tasks. These techniques operate by assigning video frames to clusters and then representing the video by aggregati...
Ranking is the most important component in a search system. Most search systems deal with large amounts of natural language data, hence an effective ranking system requires a deep understanding of text semantics. Recently, deep learning based natural language processing (deep NLP) models have generated promising results on ranking systems. BERT is one o...
Search and recommender systems share many fundamental components including language understanding, retrieval and ranking, and language generation. Building powerful search and recommender systems requires processing natural language effectively and efficiently. Recent rapid growth of deep learning technologies has presented both opportunities and c...
Web-based ranking problems involve ordering different kinds of items in a list or grid to be displayed in mediums like a website or a mobile app. In most cases, there are multiple objectives or metrics like clicks, viral actions, job applications, advertising revenue and others that we want to balance. Constructing a serving algorithm that achieves...
The LinkedIn Salary product was launched in late 2016 with the goal of providing insights on compensation distribution to job seekers, so that they can make more informed decisions when discovering and assessing career opportunities. The compensation insights are provided based on data collected from LinkedIn members and aggregated in a privacy-pre...
The recently launched LinkedIn Salary product has been designed with the goal of providing compensation insights to the world's professionals and thereby helping them optimize their earning potential. We describe the overall design and architecture of the statistical modeling system underlying this product. We focus on the unique data mining challe...
Online professional social networks such as LinkedIn serve as a marketplace, wherein job seekers can find the right career opportunities and job providers can reach out to potential candidates. LinkedIn's job recommendations product is a key vehicle for efficient matching between potential candidates and job postings. However, we have observed in pract...
The recently launched LinkedIn Salary product has been designed to realize the vision of helping the world's professionals optimize their earning potential through salary transparency. We describe the overall design and architecture of the salary modeling system underlying this product. We focus on the unique data mining challenges in designing and...
Generalized linear model (GLM) is a widely used class of models for statistical inference and response prediction problems. For instance, in order to recommend relevant content to a user or optimize for revenue, many web companies use logistic regression models to predict the probability of the user's clicking on an item (e.g., ad, news article, jo...
This paper describes Scout, a statistical modeling driven approach to automatically recommend new Point of Presence (PoP) centers for web sites. PoPs help reduce a website’s page download time dramatically. However, where to build the new PoP centers given the current assets of existing ones is a problem that has rarely been studied in a quantitat...
LinkedIn dynamically delivers update activities from a user's interpersonal network to more than 300 million members in the personalized feed that ranks activities according to their "relevance" to the user. This paper discloses the implementation details behind this personalized feed system at LinkedIn which cannot be found in related work, and ad...
This paper considers an application of showing promotional widgets to web users on the homepage of a major professional social network site. The types of widgets include address book invitation, group join, friends' skill endorsement and so forth. The objective is to optimize user engagement under certain business constraints. User actions on each...
Embodiments are directed towards clustering cookies for identifying unique mobile devices for associating activities over a network with a given mobile device. The cookies are clustered based on a Bayes Factor similarity model that is trained from cookie features of known mobile devices. The clusters may be used to determine the number of unique mo...
Users on an online social network site generate a large number of heterogeneous activities, ranging from connecting with other users, to sharing content, to updating their profiles. The set of activities within a user's network neighborhood forms a stream of updates for the user's consumption. In this paper, we report our experience with the proble...
Embodiments presented herein provide methods, systems and computer program products for determining a count of network users. One method identifies one or more login access requests, from one or more server logs. Each of the one or more login access requests comprises a login cookie, and a user identifier. The method then forms one or more connecte...
Content recommendation on a webpage involves recommending content links (items) on multiple slots for each user visit to maximize some objective function, typically the click-through rate (CTR) which is the probability of clicking on an item for a given user visit. Most existing approaches to this problem assume user's response (click/no click) on...
We describe LASER, a scalable response prediction platform currently used as part of a social network advertising system. LASER enables the familiar logistic regression model to be applied to very large scale response prediction problems, including ones beyond advertising. Though the underlying model is well understood, we apply a whole-system appr...
We consider the problem of building online machine-learned models for detecting auction frauds in e-commerce web sites. Since the emergence of the world wide web, online shopping and online auction have gained more and more popularity. While people are enjoying the benefits from online trading, criminals are also taking advantage to conduct fraudu...
Predicting user affinity to items is an important problem in applications like content optimization, computational advertising, and many more. While bilinear random effect models (matrix factorization) provide state-of-the-art performance when minimizing RMSE through a Gaussian response model on explicit ratings data, applying it to imbalanced bina...
Many large Internet websites are accessed by users anonymously, without requiring registration or logging-in. However, to provide personalized service these sites build anonymous, yet persistent, user models based on repeated user visits. Cookies, issued when a web browser first visits a site, are typically employed to anonymously associate a websi...
We consider the problem of algorithmically recommending items to users on a Yahoo! front page module. Our approach is based on a novel multilevel hierarchical model that we refer to as a User Profile Model with Graphical Lasso (UPG). The UPG provides a personalized recommendation to users by simultaneously incorporating both user covariates and his...
Online auction and shopping are gaining popularity with the growth of web-based eCommerce. Criminals are also taking advantage of these opportunities to conduct fraudulent activities against honest parties with the purpose of deception and illegal profit. In practice, proactive moderation systems are deployed to detect suspicious events for further...
Predicting user "ratings" on items is a crucial task in recommender systems. Matrix factorization methods that compute a low-rank approximation of the incomplete user-item rating matrix provide state-of-the-art performance, especially for users and items with several past ratings (warm starts). However, it is a challenge to generalize such methods...
Multi-level hierarchical models provide an attractive framework for incorporating correlations induced in a response variable that is organized hierarchically. Model fitting is challenging, especially for a hierarchy with a large number of nodes. We provide a novel algorithm based on a multi-scale Kalman filter that is both scalable and easy to imple...