November 2024
Constraints
Solving combinatorial optimization problems involves a two-stage process that follows the model-and-run approach. First, a user is responsible for formulating the problem at hand as an optimization model; then, given the model, a solver is responsible for finding the solution. While optimization technology has enjoyed tremendous theoretical and practical advances, this process has remained unchanged for decades. To date, transforming problem descriptions into optimization models remains a barrier to entry. To relieve users of the cognitive task of modeling, we study named entity recognition to capture components of optimization models, such as the objective, variables, and constraints, from free-form natural language text, and coin this problem Ner4Opt. We show how to solve Ner4Opt using classical techniques based on morphological and grammatical properties and modern methods that leverage pre-trained large language models (LLMs) and fine-tune transformer architectures on optimization-specific corpora. For best performance, we present a hybrid of the two, combined with feature engineering and data augmentation that exploit the language of optimization problems. We improve on the state of the art for annotated linear programming word problems. LLMs are not yet versatile enough to turn text into optimization models or extract optimization entities. Still, when augmented with Ner4Opt annotations, the compilation accuracy of LLM-generated models improves significantly. We open-source our Ner4Opt library, release our training and fine-tuning procedures, and share our trained artifacts. We identify several next steps and discuss important open problems toward automated modeling.
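To make the task concrete, the sketch below mimics the kind of token-level entity annotations Ner4Opt produces over a linear programming word problem. This is a toy rule-based illustration only: the cue lists and label names (`B-OBJ_DIR`, `B-CONST_DIR`) are hypothetical stand-ins, not the library's actual tag schema or API, and a real system would use the trained models described in the abstract.

```python
# Toy illustration of Ner4Opt-style output: BIO tags over an LP word problem.
# Labels and keyword cues here are hypothetical, not the library's schema.

OBJECTIVE_CUES = {"maximize", "minimize"}       # objective-direction words
CONSTRAINT_CUES = {"most", "least", "exceed"}   # constraint-direction words


def toy_tag(tokens):
    """Assign a coarse BIO entity tag to each token via simple keyword cues."""
    tags = []
    for tok in tokens:
        low = tok.lower().strip(".,")
        if low in OBJECTIVE_CUES:
            tags.append("B-OBJ_DIR")
        elif low in CONSTRAINT_CUES:
            tags.append("B-CONST_DIR")
        else:
            tags.append("O")  # outside any entity
    return tags


sentence = "Maximize profit such that production does not exceed 40 units".split()
for token, tag in zip(sentence, toy_tag(sentence)):
    print(f"{token}\t{tag}")
```

A trained Ner4Opt model plays the role of `toy_tag` here, and its annotations can then be fed alongside the raw text into an LLM prompt, which is the augmentation the abstract reports as improving compilation accuracy.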