# IBM Research

• Yorktown Heights, New York, United States
## Recent publications
Conditional statements are an important part of procedural knowledge, as they determine the decision points in the control flow. To bootstrap conversation bots and automation tools automatically from natural-language procedure documents, it is important to classify conditional statements accurately and to separate the condition from the effect correctly. This paper explores three techniques for classifying and analyzing conditional statements, discusses the advantages and drawbacks of each, and builds on that analysis to construct models with better performance.
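As a baseline illustration of the condition/effect separation described above (a hedged rule-based sketch, not one of the paper's three techniques; the marker list and regex are assumptions):

```python
import re

# Hypothetical rule-based splitter: detect a conditional sentence by a
# discourse marker ("if"/"when"/"unless") and split it into condition and
# effect at a comma or "then". Learned models would replace this heuristic.
CONDITION_PATTERN = re.compile(
    r"^(?:if|when|unless)\s+(?P<condition>.+?)"
    r"(?:,\s*then\s+|,\s*|\s+then\s+)(?P<effect>.+)$",
    re.IGNORECASE,
)

def split_conditional(sentence: str):
    """Return (condition, effect) if the sentence is conditional, else None."""
    match = CONDITION_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("condition"), match.group("effect")
```

Such a splitter fails on nested or implicit conditionals, which is one motivation for the learned approaches the paper compares.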
Binary polynomial optimization is equivalent to the problem of minimizing a linear function over the intersection of the multilinear set with a polyhedron. Many families of valid inequalities for the multilinear set are available in the literature, though giving a polyhedral characterization of the convex hull is not tractable in general as binary polynomial optimization is NP-hard. In this paper, we study the cardinality constrained multilinear set in the special case when the number of monomials is exactly two. We give an extended formulation, with two more auxiliary variables and exponentially many inequalities, of the convex hull of solutions of the standard linearization of this problem. We also show that the separation problem can be solved efficiently.
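For context, the standard linearization referenced above replaces each monomial by an auxiliary variable; for a monomial indexed by a set $I$ it reads (a textbook formulation, not the paper's extended formulation with two extra auxiliary variables):

```latex
% Standard linearization of one monomial z_I = \prod_{i \in I} x_i, x binary
\begin{aligned}
  z_I &\le x_i && \text{for all } i \in I,\\
  z_I &\ge \sum_{i \in I} x_i - |I| + 1,\\
  z_I &\ge 0, \qquad x_i \in \{0,1\}.
\end{aligned}
```

The paper studies the case of exactly two such monomials together with a cardinality constraint on the $x$ variables.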
Non-equilibrium quasiparticles are possible sources of decoherence in superconducting qubits because they can lead to energy decay or dephasing upon tunneling across Josephson junctions (JJs). Here, we investigate the impact of the intrinsic properties of two-dimensional transmon qubits on quasiparticle tunneling (QPT) and discuss how quasiparticle dynamics can be used to gain critical information about the quality of the JJ barrier. We find the tunneling rate of non-equilibrium quasiparticles to be sensitive to the choice of shunting capacitor material and geometry in the qubits. In some devices, we observe an anomalous temperature dependence of the QPT rate below 100 mK that deviates from the constant background associated with non-equilibrium quasiparticles. We speculate that this behavior is caused by high-transmission sites/defects within the oxide barriers of the JJs, leading to spatially localized subgap states. We model this by assuming that such defects generate regions with a smaller effective gap. Our results present a unique in situ characterization tool to assess the uniformity of tunnel barriers in qubit junctions and shed light on how quasiparticles can interact with various elements of the qubit circuit.
We study a family of line segment visibility problems, related to classical art gallery problems, which are motivated by monitoring requirements in commercial data centers. Given a collection of non-overlapping line segments in the interior of a rectangle, and a requirement to monitor the segments from one side or the other, we examine the problem of finding a minimal set of point guards. Guards may be placed anywhere in the interior of the rectangle but not on a line segment. We consider combinatorial bounds for problem variants where the problem solver gets to decide which side of the segments to guard, where the problem poser gets to decide which side to guard, and many others. We show that virtually all variants are NP-hard to solve exactly, and then provide heuristics and experimental results to give insight into the associated practical problems. Finally, we describe a program for using experiments to guide the search for optimal combinatorial bounds.
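One natural heuristic for such NP-hard guarding problems (a sketch under assumptions, not necessarily one of the paper's heuristics) is to discretize candidate guard positions, precompute which segment sides each candidate can see, and then run greedy set cover:

```python
def greedy_guards(coverage):
    """Greedy set-cover heuristic for guard placement.

    coverage: dict mapping a candidate guard position to the set of segment
    sides it can monitor (visibility is assumed precomputed elsewhere).
    Returns a small, not necessarily optimal, list of guards covering
    every side; the greedy choice is the guard seeing the most
    still-unguarded sides.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        best = max(coverage, key=lambda g: len(coverage[g] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError("remaining sides cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen
```

Greedy set cover carries a logarithmic approximation guarantee, which is often the practical fallback when exact solutions are out of reach.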
The performance of next-generation, nanoelectronic devices relies on a precise understanding of strain within the constituent materials. However, the increased flexibility inherent to these three-dimensional device geometries necessitates direct measurement of their deformation. Here we report synchrotron x-ray diffraction-based non-destructive nanoscale mapping of Si/SiGe nanosheets for gate-all-around structures. We identified two competing mechanisms at different length scales contributing to the deformation. One is consistent with the in-plane elastic relaxation due to the Ge lattice mismatch with the surrounding Si. The second is associated with the out-of-plane layering of the Si and SiGe regions at a length scale of film thickness. Complementary mechanical modeling corroborated the qualitative aspects of the deformation profiles observed across a variety of nanosheet sample widths. However, greater deformation is observed in the SiGe layers of the nanosheets than the predicted distributions. These insights could play a role in predicting carrier mobilities of future devices.
The firing rates of individual neurons displaying mixed selectivity are modulated by multiple task variables. When mixed selectivity is nonlinear, it confers an advantage by generating a high-dimensional neural representation that can be flexibly decoded by linear classifiers. Although the advantages of this coding scheme are well accepted, the means of designing an experiment and analyzing the data to test for and characterize mixed selectivity remain unclear. With the growing number of large datasets collected during complex tasks, mixed selectivity is increasingly observed and can be challenging to interpret correctly. We review recent approaches for analyzing and interpreting neural datasets and clarify the theoretical implications of mixed selectivity in the variety of forms that have been reported in the literature. We also aim to provide a practical guide for determining whether a neural population has linear or nonlinear mixed selectivity and whether this mixing leads to a categorical or category-free representation.
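One common way to probe linear versus nonlinear mixed selectivity (an illustrative sketch with simulated data, not the review's analysis pipeline) is to compare a linear regression of firing rate on the task variables with one that adds their interaction term:

```python
import numpy as np

# Simulated neuron whose rate has a genuine nonlinear (multiplicative)
# component; the interaction term should then fit markedly better.
rng = np.random.default_rng(0)
n = 500
x = rng.integers(0, 2, n).astype(float)   # task variable 1 (e.g., stimulus)
y = rng.integers(0, 2, n).astype(float)   # task variable 2 (e.g., context)
rate = 1.0 * x + 0.5 * y + 2.0 * x * y + rng.normal(0.0, 0.1, n)

def residual_ss(design, target):
    """Sum of squared residuals of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.sum((target - design @ coef) ** 2))

ones = np.ones(n)
ss_linear = residual_ss(np.column_stack([ones, x, y]), rate)          # linear mixing only
ss_nonlinear = residual_ss(np.column_stack([ones, x, y, x * y]), rate)  # adds interaction
```

A large drop in residual error when the interaction is included is evidence for nonlinear mixed selectivity; in practice this comparison is done with cross-validation or a nested-model test rather than raw residuals.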
An alternative method to fabricate T- and Γ-gates used for special-geometry compound semiconductor high electron mobility transistors is presented. This method utilizes an acrylate/methylstyrene triple resist stack, a single ternary developer consisting of an acetate/alcohol/water mixture, and a proximity effect correction (PEC) image superposition approach that treats the exposed regions in the different resists as independent images and combines them afterward with weighted factors. In the past, most available options required multiple developers or e-beam exposures to form the resist structure of the gate. In this paper, we present a single developer capable of discriminating among three different resists to form the optimal structure for T- and Γ-gates. The PEC image superposition approach approximates that the exposed regions in each resist layer (or image) can be PEC corrected independently from the other images. The use of a gap between images allows for critical dimension control, as image edges are not double exposed due to beam spread. Following gap formation and PEC, the corrected images are superimposed on each other after selectively removing areas of common exposure, using the highest dose as the determining dose. This provides a flexible means of accurately applying PEC to complex structures beyond “simple” T-gates and Γ-gates, as demonstrated in this paper.
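The superposition rule described above — keep the highest dose wherever corrected images overlap — reduces to an element-wise maximum over the per-resist dose maps. A minimal sketch, assuming each corrected image is a 2-D dose array on a common grid:

```python
import numpy as np

def superimpose(dose_maps):
    """Combine independently PEC-corrected dose maps, one per resist layer.

    dose_maps: list of 2-D arrays of the same shape. In regions of common
    exposure, only the highest dose is retained, which is an element-wise
    maximum across the stack.
    """
    stacked = np.stack(dose_maps)
    return stacked.max(axis=0)
```

The real workflow additionally inserts gaps between images before this step so that image edges are not double exposed by beam spread.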
Patents show how technology evolves in most scientific fields over time. The best way to use this valuable knowledge base is efficient and effective information retrieval and search for related prior art. Patent classification, i.e., assigning a patent to one or more predefined categories, is a fundamental step towards synthesizing the information content of an invention. To this end, architectures based on Transformers, especially those derived from the BERT family, have already been proposed in the literature, and they have shown remarkable results by setting a new state-of-the-art performance for the classification task. Here, we study how domain adaptation can push the performance boundaries in patent classification by rigorously evaluating and implementing a collection of recent transfer learning techniques, e.g., domain-adaptive pretraining and adapters. Our analysis shows how leveraging these advancements enables the development of state-of-the-art models with increased precision, recall, and F1-score. We base our evaluation on both standard patent classification datasets derived from patent-office-defined code hierarchies and more practical real-world use-case scenarios containing labels from the agrochemical industrial domain. The application of these domain-adapted techniques to patent classification in a multilingual setting is also examined.
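Since a patent can belong to several categories at once, the precision, recall, and F1-score mentioned above are typically micro-averaged over label sets. An illustrative computation (metric definitions only; not the paper's evaluation code):

```python
def precision_recall_f1(true_labels, predicted):
    """Micro-averaged precision, recall, and F1 over multi-label predictions.

    true_labels, predicted: parallel lists of label sets, one pair per patent.
    """
    tp = sum(len(t & p) for t, p in zip(true_labels, predicted))  # correct labels
    fp = sum(len(p - t) for t, p in zip(true_labels, predicted))  # spurious labels
    fn = sum(len(t - p) for t, p in zip(true_labels, predicted))  # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```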
We establish $L^{\mathfrak{q}}$ convergence for Hamiltonian Monte Carlo (HMC) algorithms. More specifically, under mild conditions on the associated Hamiltonian motion, we show that the outputs of the algorithms converge (strongly for $2\leq\mathfrak{q}<\infty$ and weakly for $1<\mathfrak{q}<2$) to the desired target distribution. In addition, we establish a general convergence rate for $L^{\mathfrak{q}}$ convergence given a convergence rate at a specific $q^{*}$, and apply this result to conclude geometric convergence in Euclidean space for HMC with uniformly strongly logarithmically concave target and auxiliary distributions. We also present the results of experiments to illustrate convergence in $L^{\mathfrak{q}}$.
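For readers unfamiliar with the algorithm under study, a minimal HMC sampler for a one-dimensional standard Gaussian target (step size, path length, and target are illustrative assumptions, not the paper's setting):

```python
import numpy as np

def hmc_sample(n_samples, step=0.2, n_leapfrog=20, seed=0):
    """Vanilla HMC for the target N(0, 1), i.e. potential U(q) = q^2 / 2."""
    rng = np.random.default_rng(seed)
    U = lambda q: 0.5 * q * q
    grad_U = lambda q: q
    q = 0.0
    samples = []
    for _ in range(n_samples):
        p = rng.normal()                 # resample auxiliary momentum
        q_new, p_new = q, p
        # Leapfrog integration of the Hamiltonian dynamics
        p_new -= 0.5 * step * grad_U(q_new)
        for _ in range(n_leapfrog - 1):
            q_new += step * p_new
            p_new -= step * grad_U(q_new)
        q_new += step * p_new
        p_new -= 0.5 * step * grad_U(q_new)
        # Metropolis correction keeps the target distribution invariant
        dH = (U(q) + 0.5 * p * p) - (U(q_new) + 0.5 * p_new * p_new)
        if np.log(rng.uniform()) < dH:
            q = q_new
        samples.append(q)
    return np.array(samples)
```

This Gaussian case falls under the uniformly strongly logarithmically concave setting for which the paper concludes geometric convergence.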
VM consolidation has been proposed as an effective solution to improve resource utilization and energy efficiency through VM migration. Improper VM placement during consolidation may cause frequent VM migrations and constant on–off switching of PMs, which can significantly hurt QoS and increase energy consumption. Most existing algorithms for efficient VM placement are heuristics that tend to fall into a sub-optimum prematurely. They also fail to strike a good balance among multiple goals, such as resource utilization, QoS, and energy efficiency. To address these problems, we propose an effective and efficient VM placement approach, MOEA/D-based VM placement, with the goal of optimizing energy efficiency and resource utilization. We develop an improved MOEA/D algorithm to search for a Pareto-compromise solution for VM placement. Our experimental results demonstrate that the proposed multi-objective optimization (MOO) model and VM placement solution have immense potential, as they offer significant cost savings and significant improvements in energy efficiency and resource utilization under dynamic workload scenarios.
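At the core of MOEA/D is the decomposition of a multi-objective problem into scalar subproblems, commonly via the Tchebycheff approach. A sketch of that scalarization (the objective names are illustrative; the paper's exact model and improvements are not reproduced):

```python
def tchebycheff(objectives, weights, ideal):
    """Tchebycheff scalarization used by MOEA/D-style decomposition.

    objectives: objective values of one placement, e.g. (energy, utilization slack)
    weights:    one weight vector defining a scalar subproblem
    ideal:      best value seen so far for each objective (the reference point)
    MOEA/D minimizes this value for each weight vector in its population.
    """
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))
```

Sweeping many weight vectors yields a spread of solutions along the Pareto front, from which a compromise placement can be picked.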
Chemical reactions can be classified into distinct categories that encapsulate concepts for how one molecule is transformed into another. One can encode these concepts in rules specifying the set of atoms and bonds that change during a transformation, which is commonly known as a reaction template. While there exist multiple possibilities to represent a chemical reaction in a vector representation, or fingerprint, this is not the case for reaction templates. As a consequence, methods to navigate the space of reaction templates are limited. In this work, we introduce the first reaction template fingerprint. To this end, we follow a data-driven approach relying on a masked language modelling task on SMIRKS strings. We combine unsupervised pre-training with fine-tuning on the classification of templates according to the RXNO ontology, for which we achieve up to 98.4% classification accuracy. We highlight how the learned embeddings can be extracted and used in downstream applications.
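The masked language modelling objective mentioned above hides a fraction of the input tokens and trains the model to reconstruct them. An illustrative masking step (the tokenization of SMIRKS strings and the mask rate are assumptions):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Randomly mask tokens for masked-language-model pre-training.

    Returns the masked sequence and a parallel label list: the original
    token where masking occurred (the model's reconstruction target),
    None elsewhere (ignored in the loss).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

After this unsupervised pre-training, the encoder's pooled representation of a template serves as the fingerprint and is fine-tuned for RXNO classification.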
Large-scale storage systems employ erasure-coding redundancy schemes to protect against device failures. The adverse effect of latent sector errors on the Mean Time to Data Loss (MTTDL) and the Expected Annual Fraction of Data Loss (EAFDL) reliability metrics is evaluated. A theoretical model capturing the effect of latent errors and device failures is developed, and closed-form expressions for the metrics of interest are derived. The MTTDL and EAFDL of erasure-coded systems are obtained analytically for (i) the entire range of bit error rates, (ii) the symmetric, clustered, and declustered data placement schemes, and (iii) arbitrary device failure and rebuild time distributions under network rebuild bandwidth constraints. The range of error rates that deteriorates system reliability is derived analytically. For realistic sector error rates, the results demonstrate that MTTDL degrades, whereas EAFDL remains practically unaffected for moderate erasure codes. In the range of typical sector error rates and for very powerful erasure codes, EAFDL degrades as well. It is also shown that the declustered data placement scheme offers superior reliability.
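To give a flavor of such closed-form expressions (a textbook two-way-mirror special case without latent errors, not one of the paper's derived formulas), with device failure rate $\lambda$ and rebuild rate $\mu$:

```latex
\mathrm{MTTDL}_{\text{mirror}}
  \;=\; \frac{3\lambda + \mu}{2\lambda^{2}}
  \;\approx\; \frac{\mu}{2\lambda^{2}}
  \qquad (\mu \gg \lambda).
```

Incorporating latent sector errors, erasure-code strength, placement schemes, and rebuild bandwidth constraints is precisely what makes the paper's general expressions substantially more involved than this baseline.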
Conversational interfaces require two types of curation: data curation by data science workers and content curation by domain experts. Recent years have seen growing possibilities for content curators to instruct conversational machines in the customer service domain (i.e., Machine Teaching). Curating specialized data is time-consuming: it has a learning curve for the domain expert, and it relies on collaborators beyond the domain experts, including product owners, technology-expert curators, management, marketing, and communication employees. Recent research has looked at making this task easier for domain experts who lack knowledge of the Machine Learning system, but few papers have investigated the work practices and collaborations involved in this role. This paper aims to fill this gap, presenting practices extracted from eleven semi-structured interviews and four design workshops with experts in the banking, technical support, human resources, telecommunications, and automotive sectors. First, we investigate the articulation work of content curators and tech curators in training conversational machines. Second, we inspect the curatorial and collaboration strategies they use, which are not afforded by current conversational platforms. Third, we draw design implications and possibilities for supporting individual and collaborative curating practices. We reflect on how those practices rely on the curators themselves and on collaboration with others for curation, trust, and data tracking and ownership.
Background: Health care and well-being are 2 main interconnected application areas of conversational agents (CAs). There is a significant increase in research, development, and commercial implementations in this area. In parallel to the increasing interest, new challenges in designing and evaluating CAs have emerged. Objective: This study aims to identify key design, development, and evaluation challenges of CAs in health care and well-being research. The focus is on very recent projects with their emerging challenges. Methods: A review study was conducted with 17 invited studies, most of which were presented at the ACM (Association for Computing Machinery) CHI 2020 conference workshop on CAs for health and well-being. Eligibility criteria required the studies to involve a CA applied to a health or well-being project (ongoing or recently finished). The participating studies were asked to report on their projects’ design and evaluation challenges. We used thematic analysis to review the studies. Results: The findings include a range of topics from primary care to caring for older adults to health coaching. We identified 4 major themes: (1) Domain Information and Integration, (2) User-System Interaction and Partnership, (3) Evaluation, and (4) Conversational Competence. Conclusions: CAs proved their worth during the pandemic as health screening tools, and are expected to stay to further support various health care domains, especially personal health care. Growth in investment in CAs also shows their value as personal assistants. Our study shows that while some challenges are shared with other CA application areas, safety and privacy remain the major challenges in the health care and well-being domains. An increased level of collaboration across different institutions and entities may be a promising direction to address some of the major challenges that otherwise would be too complex to be addressed by projects with limited scope and budget.
With the growing amount of chemical data stored digitally, it has become crucial to represent chemical compounds consistently. Harmonized representations facilitate the extraction of insightful information from datasets, and are advantageous for machine learning applications. Compound standardization is typically accomplished using rule-based algorithms that modify undesirable descriptions of functional groups, resulting in a consistent representation throughout the dataset. Here, we present the first deep-learning model for molecular standardization. We enable custom schemes based solely on data, which also support standardization options that are difficult to encode into rules. Our model achieves >98% accuracy in learning two popular rule-based protocols. When fine-tuned on a relatively small dataset of catalysts (for which there is currently no automated standardization practice), the model predicts the expected standardized molecular format with a test accuracy of 62% on average. We show that our model learns not only the grammar and syntax of molecular representations, but also the details of atom ordering, types of bonds, and representations of charged species. In addition, we demonstrate the model's ability to reproduce a canonicalization algorithm with a 95.6% success rate.
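To illustrate the kind of rule the learned model replaces, a toy string-level standardization in the spirit of such protocols (a single hypothetical rewrite rule; real toolkits apply rules on the molecular graph, not on raw strings):

```python
def standardize_nitro(smiles: str) -> str:
    """Rewrite the uncharged nitro-group SMILES notation into the
    charge-separated form, a typical rule-based standardization step.
    String replacement is used here purely for illustration."""
    return smiles.replace("N(=O)=O", "[N+](=O)[O-]")
```

A rule set like this must be written and maintained by hand for every undesirable pattern, which is the gap the data-driven model aims to close.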
295 members
• Cognitive Healthcare and Life Sciences
• Soft Matter Science
• Thomas J. Watson Research Center