Conference Paper

Auto-Suggestive Real-Time Classification of Driller Memos into Activity Codes for Invisible Lost Time Analysis


Abstract

Activity codes recorded by drillers are very useful for quantifying invisible lost time (ILT). However, accurately and consistently classifying more than 100 activity codes across various rig operations is infeasible for human operators. We propose an auto-suggestive system that guides drillers to the correct codes based on the memos they enter into the system. This aims both to eliminate manual classification errors and to improve memo entry. The method for extracting activity codes from memos can be broken into the following steps. The first step filters unnecessary text and vectorizes the memos. The vectors are then re-weighted using the term frequency-inverse document frequency (TF-IDF) statistical measure. Next, data resampling creates a uniform set of labels for the training data, because several important activity codes appear infrequently relative to others. Finally, a classifier is trained. It is shown that the finalized model can be used as a real-time auto-suggestive mechanism during the drillers' data input process. Moreover, its use for cleaning up historical datasets is also explored. The method was applied to a large historical dataset of 150 wells, and ILT analysis was performed on both the original and the auto-classified dataset. Comparing these results clearly showed that performing analysis on a dataset that has not been properly classified can lead to incorrect and misleading conclusions. In addition, the method did not require manual re-labeling of the dataset for model training, which makes the algorithm readily applicable for any end user, irrespective of the number of activity codes used. Various classifiers, including logistic regression, support vector machine, random forests, naïve Bayes, and multi-layered perceptron, were implemented and tested.
Given comparable performances, we conclude that a simple and interpretable logistic regression model is best for real-time classification. Tests were also performed to see how many typed words in a memo would be needed before the correct activity code was identified. The results are detailed in this paper. This is the first body of work that has taken drillers’ memos and converted them into activity codes, without the need for a human-classified training dataset. The real-time classifier is very powerful in ensuring clean data at the source and will be particularly useful when implemented on reporting systems for classifying rig activities by IADC activity codes. We further demonstrate the use of the classifier for cleansing historical datasets such that ILT analysis can be done more accurately.
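The full text is not available here, but the pipeline the abstract describes (filter text, vectorize, re-weight with TF-IDF, train a classifier, suggest codes as the driller types) can be sketched in plain Python. Everything below is an illustrative assumption, not the authors' implementation: the memos, the activity codes, the stop-word list, and the nearest-centroid stand-in for logistic regression are all invented for demonstration.

```python
import math
import re
from collections import Counter

STOP = {"the", "a", "to", "of", "and", "on", "with"}  # illustrative stop-word list

def tokenize(memo):
    # filter unnecessary text: lowercase, keep alphabetic tokens, drop stop words
    return [w for w in re.findall(r"[a-z]+", memo.lower()) if w not in STOP]

class MemoClassifier:
    """TF-IDF + nearest-centroid sketch of the memo-to-activity-code idea."""

    def fit(self, memos, codes):
        docs = [tokenize(m) for m in memos]
        self.n = len(docs)
        self.df = Counter(t for d in docs for t in set(d))  # document frequency
        self.centroids = {}
        for d, c in zip(docs, codes):
            cen = self.centroids.setdefault(c, Counter())
            for t, w in self._vec(d).items():
                cen[t] += w
        return self

    def _vec(self, tokens):
        # smoothed TF-IDF weighting of a token list
        if not tokens:
            return {}
        tf = Counter(tokens)
        return {t: (c / len(tokens)) * math.log((1 + self.n) / (1 + self.df[t]))
                for t, c in tf.items()}

    def suggest(self, partial_memo, k=2):
        # rank activity codes by cosine similarity to the partially typed memo
        q = self._vec(tokenize(partial_memo))

        def cos(u, v):
            dot = sum(u[t] * v.get(t, 0.0) for t in u)
            nu = math.sqrt(sum(x * x for x in u.values()))
            nv = math.sqrt(sum(x * x for x in v.values()))
            return dot / (nu * nv) if nu and nv else 0.0

        ranked = sorted(self.centroids,
                        key=lambda c: cos(q, self.centroids[c]), reverse=True)
        return ranked[:k]

# toy training memos with hypothetical activity codes
clf = MemoClassifier().fit(
    ["trip out of hole to change bit", "drilling ahead on bottom",
     "circulate bottoms up", "trip in hole new bit"],
    ["TRIP", "DRILL", "CIRC", "TRIP"])
```

Calling `clf.suggest("trip out of hole")` ranks `"TRIP"` first, which mirrors the paper's real-time auto-suggestion idea: re-scoring after every few typed words and surfacing the top codes.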


... They address common rig operations, such as drilling, reaming, and coring, and common rig activities, such as pickup, lay-down, and connection. These codes were primarily used for manual reporting, and recent papers have focused on the natural language processing of these digitized reports (Ucherek et al. 2020) to improve usability (and standardization). ...
Article
Full-text available
Automation and digitalization of drilling require shared knowledge about the state of the drilling process: Is the bit on-bottom drilling or is the string in-slips; is there an overpull or is there a formation fluid influx? The research question addressed here is whether it is possible to define clear, sharable, and usable definitions of what a drilling process state is, and an agreed method to calculate it. The method to define the drilling process state originates from the fact that a drilling operation can be described by a set of partial differential equations, respecting boundary conditions. Therefore, the set of possible discrete changes of boundary conditions defines the set of all possible drilling process states. The possible state values for each of these boundary conditions can be clearly defined by a set of logical expressions on the boundary values of the partial differential equations. Each boundary condition is called a microstate. If the set of microstates is linearly independent and complete, then the overall state of the drilling process is uniquely described by the state of each of the microstates. The boundary values are either measured or estimated using a digital twin of the drilling process. In either case, an uncertainty is associated with the boundary value. It is therefore possible to estimate the probability of being in one state or another for each of the microstates. This is an important property, as often the actual state of the drilling process is uncertain. If several digital twins or measurements are available, it is also possible to use sensor fusion to update the uncertainty of the boundary value. A common drilling process interpretation engine and well-defined drilling process states may help with the coordination of multiple advisors participating in the control of the drilling process.
An example is given showing how an event-based drill-a-stand procedure involving several external advisors is automatically executed using a common source for the interpretation of the drilling process state. A shared definition and method of calculation of the drilling process state is a fundamental element of an infrastructure to enable interoperability at the rigsite. This work is part of the Drilling and Wells Interoperability Standard (D-WIS) initiative. D-WIS is a cross-industry work group providing the industry with solutions facilitating interoperability of computer systems at the rigsite.
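The abstract's point that each microstate carries a probability, because the underlying boundary value is measured or estimated with uncertainty, can be illustrated with a one-line Gaussian model. The weight-on-bit microstate, the numbers, and the inverse-variance fusion rule below are illustrative assumptions, not the paper's actual method.

```python
import math

def p_greater(mean, sd, threshold=0.0):
    """P(X > threshold) for X ~ N(mean, sd): the probability that a noisy
    boundary value exceeds a limit, i.e. of being in the associated microstate."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def fuse(m1, s1, m2, s2):
    """Inverse-variance (sensor fusion) combination of two independent
    estimates of the same boundary value; the fused uncertainty shrinks."""
    w1, w2 = 1.0 / s1 ** 2, 1.0 / s2 ** 2
    return (w1 * m1 + w2 * m2) / (w1 + w2), math.sqrt(1.0 / (w1 + w2))

# hypothetical measurement: force at the bit is 5 kN with 3 kN (1-sigma) noise
p_on_bottom = p_greater(5.0, 3.0)    # probability of microstate "bit on-bottom"
m, s = fuse(5.0, 3.0, 4.0, 2.0)      # fuse a second, independent estimate
```

With these toy numbers the system would report roughly a 95% chance of being on-bottom rather than a hard yes/no, and fusing a second measurement tightens the uncertainty on the boundary value.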
Conference Paper
Automation and digitalization of drilling require shared knowledge about the state of the drilling process: is the bit on-bottom drilling or is the driller making a connection; is the borehole in good condition or is it sloughing? Yet there is no shared, clear, and usable definition of what a drilling process state is, nor an agreed method to calculate it. In this paper, we propose a method to clarify the concept of drilling process state. A set of partial differential equations, respecting boundary conditions, can describe drilling operations. The set of all possible discrete changes of boundary conditions, therefore, defines the set of all possible drilling process states. Equality or inequality of logical expressions of at most two boundary values characterizes a discrete change of a boundary condition. For instance, if the forces applied to the bit by the formation are zero, this corresponds to an off-bottom condition, while forces greater than zero mean that the bit is on-bottom. Such simple logical conditions are microstates, and an orthogonal set of microstates defines a drilling process state. An analysis of the drilling process from the perspective of these microstates defines an orthogonal basis of microstates, in which any drilling process state can be uniquely defined. There are a finite number of possibilities to move from one state to a different state by changing only a single microstate, which leads to the construction of an implicit graph of possible states. In this implicit state graph, a change from one state to another that modifies more than one microstate corresponds to a path in the graph. However, the microstate basis depends on the type of drilling process. The paper provides examples of different microstate bases for conventional drilling, backpressure managed pressure drilling, and dual-gradient managed pressure drilling.
Microstates also cover abnormal drilling conditions, such as hanging on a ledge, or flow obstruction in the annulus by a pack-off. They are, therefore, more powerful descriptors than "rig activity codes". The required fidelity of the drilling process state depends on its use, for example for controlling drilling equipment (process control), for calculating key performance indicators (process statistics), or for user feedback (human factors engineering). This work is part of the D-WIS initiative (Drilling and Wells Interoperability Standard). D-WIS is a cross-industry workgroup providing the industry with solutions facilitating interoperability of computer systems at the rig site. The definition of a microstate is a simple logical statement, easily implemented in computer software. The paper provides an example of a simple algorithm, which will enable others to leverage the work in the commercial, interoperable, environment.
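The implicit state graph described above, in which adjacent states differ in exactly one microstate and multi-microstate changes are paths, is easy to sketch. The three boolean microstates below are an invented illustrative basis, not one of the paper's worked examples.

```python
# illustrative (not the paper's) orthogonal basis of three boolean microstates
MICROSTATES = ("bit_on_bottom", "pumps_on", "string_rotating")

def neighbors(state):
    """States adjacent in the implicit graph: toggle exactly one microstate."""
    return [tuple(not v if i == j else v for j, v in enumerate(state))
            for i in range(len(state))]

def path_length(a, b):
    # a transition touching several microstates is a path whose length is
    # the number of microstates that differ (Hamming distance)
    return sum(x != y for x, y in zip(a, b))

drilling = (True, True, True)     # on-bottom, circulating, rotating
in_slips = (False, False, False)  # string hung off, pumps off, no rotation
```

Each state has exactly as many neighbors as there are microstates, so the graph never needs to be stored explicitly; moving from drilling to in-slips is a path of three single-microstate changes.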
... For these tasks, it is necessary to analyze manually entered activity memos and codes to determine the source of inefficiency. Here, ML techniques are very helpful in cleaning this error-prone, human-entered data so that automated analysis can provide accurate insights (Ucherek et al. 2020). ML models are also available to automatically detect drilling dysfunctions and enable higher ROP (Ambrus et al. 2017) (Saini et al. 2020a). ...
Conference Paper
Deep closed-loop geothermal systems (DCLGS) are introduced as an alternative to traditional enhanced geothermal systems (EGS) for green energy production that is globally scalable and dispatchable. Recent modeling work shows that DCLGS can generate an amount of power that is similar to EGS, while overcoming many of the downsides of EGS (such as induced seismicity, emissions to air, mineral scaling etc.). DCLGS wells can be constructed by leveraging and extending oil and gas extended reach drilling (ERD) and high-pressure high-temperature (HPHT) drilling expertise in particular. The objectives of this paper are two-fold. First, we demonstrate that DCLGS wells can generate power / electricity on a scale that is comparable to EGS, i.e. on the order of 40-55 MW per well. To this extent, we have developed a coupled hydraulic-thermal model, validated using oil and gas well cases, that can simulate various DCLGS well configurations. Secondly, we highlight the technology gaps and needs that still exist for economically drilling DCLGS wells, showing that it is possible to extend oil and gas technology, expertise and experience in ERD and HPHT drilling to construct complex DCLGS wells. Our coupled hydraulic-thermal sensitivity analyses show that there are key well drilling and design parameters that will ultimately affect DCLGS operating efficiency, including strategic deployment of managed pressure drilling / operation (MPD/MPO) technology, the use of vacuum-insulated tubing (VIT), and the selection of the completion in the high-temperature rock zones. Results show that optimum design and execution can boost initial geothermal power generation to 50 MW and beyond. In addition, historical ERD and HPHT well experience is reviewed to establish the current state-of-the-art in complex well construction and highlight what specific technology developments require attention and investment to make DCLGS a reality in the near-future (with a time horizon of ∼10 years). 
A main conclusion is that DCLGS is a realistic and viable alternative to EGS, with effective mitigation of many of the (potentially show-stopping) downsides of EGS. Oil and gas companies are currently highly interested in green, sustainable energy to meet their environmental goals. DCLGS well construction allows them to actively develop a sustainable energy field in which they already have extensive domain expertise. DCLGS offers oil and gas companies a new direction for profitable business development while meeting environmental goals, and at the same time enables workforce retention, retraining and re-deployment using the highly transferable skills of oil and gas workers.
Conference Paper
Novel methods are presented that update a real-time cloud-based kick-detection system introduced in SPE-208770-MS to handle false kick identifications caused by rig operations. A common weakness in kick-detection systems is false indications of kicks due to rig operations and drilling practices that cause changes in tank volumes. In this work, we discuss the modifications made to the existing real-time kick-detection system to handle rig operational practices and reduce false positives. The existing kick-detection system analyzes trends in drilling data such as tank volumes, flow rates, and pump rates to detect well control events. Extensive field use of this system showed that rig operations such as transfers between tanks, tank swaps, and adding material to the active tanks have a severe impact on the false-positive rate. Two approaches were developed to handle such operational practices:
- Transfer identification: identify transfers between monitored tanks.
- Comment watcher: evaluate the rig memos to check if they might identify an operation that explains the variation in the tank volumes.
These approaches were tested with historical wells and live wells. Transfers were identified in several historical wells with help from the operator subject matter experts (SMEs). Thresholds such as the rate of transfer and the window size were tuned to optimally identify transfers. The tuned algorithms correctly identified transfers between monitored tanks with more than 85% accuracy. This workflow was added to the existing kick-detection framework. The efficiency of the kick detection logic depends on dynamically adjusting various thresholds. If any transfers were identified, the thresholds were reset, which helped further reduce the false positives by 20-25%. For the comment watcher, a keyword library was developed with help from the operator SMEs.
This library contained a list of keywords that the rig crew frequently uses in the rig memos to describe the operations. Each keyword from the library was mapped to an alarm type to be suppressed. A workflow was implemented to identify if a rig memo contains a keyword and suppress the respective alarm. The comment watcher feature was then implemented on historical wells along with the transfer identification. These updates resulted in a 40% reduction in false positives while maintaining a 100% true positive identification rate. This work improves the accuracy and efficacy of a previously presented (SPE 208770) real-time cloud-based kick-identification system by detecting and avoiding the impact of rig operations. Features such as transfer identification and comment watcher are added to determine if the changes in the tank volumes can be attributed to rig operations. This update was tested on historical wells and live wells. Working together, these features helped reduce the false positives by up to 40%.
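The comment-watcher mechanism, a keyword library in which each memo keyword maps to an alarm type to suppress, can be sketched with a small lookup table. The keywords and alarm names below are invented for illustration; the actual library was built with operator SMEs.

```python
# hypothetical keyword library: memo keyword -> alarm type to suppress
KEYWORD_LIBRARY = {
    "transfer": "pit_gain",
    "trip tank": "pit_gain",
    "add barite": "pit_gain",
    "dump": "pit_loss",
}

def alarms_to_suppress(rig_memo):
    """Return the set of alarm types that a rig memo explains away,
    so the kick-detection logic can skip those false positives."""
    memo = rig_memo.lower()
    return {alarm for kw, alarm in KEYWORD_LIBRARY.items() if kw in memo}
```

A memo like "Transfer from tank 3 to active" would then suppress a pending pit-gain alarm, while an unrelated memo suppresses nothing and the alarm fires normally.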
Conference Paper
The well completion planning process can be made more effective by moving to a digital platform that enables automation of many frequent decisions, repetitive tasks, and calculations performed by completion engineers. This can potentially reduce man-hours, improve the quality of completion programs, and improve HSE and performance in operations. As the industry moves toward automation of rig equipment, digitalizing the well planning process can further aid rig automation by coupling the digital completion program and procedures with the rig equipment execution platform. The initial step in the project was to translate the basis of design and completion operational decisions into programmed machine logic. This logic is based on digital experience developed from standards and industry best practices. This work shows how digital and automated well completion planning can be applied to improve the well completion selection workflow.
Conference Paper
Digital transformation is becoming a major goal for oil and gas (O&G) companies, and one major component of digital transformation initiatives is utilizing artificial intelligence (AI) and machine learning (ML) algorithms/techniques to automate critical business processes for the sake of consistency and objectivity. Drilling operations classification and coding are always challenging due to the subjectivity of end users; however, using machine learning and big data to automate operations classification based on natural language operations descriptions provided by drilling personnel can ensure objective and consistent classification. In this paper, a new approach is introduced for using ML algorithms to classify drilling operations based on the operation description provided by rig personnel in their morning reports, with time broken down using natural language. The new ML predictive model learns from historical data how to define the proper coding of drilling operations based on the operation description, minimizing human interaction with the coding system for drilling operations. The approach utilizes a set of prediction models to predict the proper code combinations that classify the activity carried out on the rig floor. Each of these models feeds its prediction into the next model to define the next level of code predictions. The model was trained using around 800,000 records to define the coding patterns based on operation remarks, and then predicted multi-level operational coding for any given operational remarks. The model has been tested and validated with around 800,000 records from datasets spanning multiple years, and showed significant results when compared with recent reports and subject-matter-expert evaluations, with a very good level of consistency and an accuracy of more than 80%.
The result of this work is an ML classification model that can reduce daily morning report data entry by 40% and improve the operational classification quality and consistency significantly. This approach also improves internal processes such as services invoicing verification, bit selection and end-of-well reporting.
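The chained multi-level prediction described above, where each model feeds its predicted code into the next level's model, can be sketched generically. The rule-based stand-in models and the code names below are hypothetical placeholders for the trained classifiers the paper uses.

```python
def predict_chain(remark, models):
    """Chained multi-level coding: each model sees the remark plus all
    codes predicted so far, and emits the next level's code."""
    codes = []
    for model in models:
        codes.append(model(remark, tuple(codes)))
    return codes

# stand-in "models" (real ones would be trained classifiers)
def level1(remark, prior):
    # top-level operation class from the remark text
    return "DRILLING" if "drill" in remark.lower() else "TRIPPING"

def level2(remark, prior):
    # sub-code, conditioned on the level-1 prediction
    if prior[0] == "DRILLING":
        return "ROTARY" if "rotat" in remark.lower() else "SLIDE"
    return "TRIP_IN" if " in " in remark.lower() else "TRIP_OUT"
```

For example, `predict_chain("Drilling ahead, rotating", [level1, level2])` yields a two-level code, and deeper hierarchies are just longer model lists; the conditioning on `prior` is what lets each level narrow the code space the next level must choose from.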
Conference Paper
Full-text available
The daily drilling report (DDR) contains the daily activities and parameters during drilling and completion (D&C) operations that can be used to identify the bottlenecks and improve efficiency. However, the datasets are large, unstructured, text heavy, not correlated to other datasets, and contain numerous gaps and errors. Thus, conducting any meaningful drilling analytics becomes cumbersome. In this paper, an innovative method is introduced to automatically clean the data and extract intelligent analytics and opportunities from these reports. Natural language processing (NLP) and deep neural network (DNN) models are developed to extract information from unstructured DDRs. Numbers of interest (such as depths, hole sizes, casing sizes, setting depths, etc.) are extracted from text. Drilling phase, non-productive time (NPT) and the associated types are predicted with DNN models. With 30% of the dataset for training, accuracies achieved on the remaining data include 87.5% for drilling phase, 90.7% for time classifications (productive or non-productive), and 89% for associated NPT types. Then, the D&C datasets are integrated with other data sources such as production, geology, reservoir, etc. to generate a set of crucial drilling and reservoir management metrics. The proposed method was successfully applied to several major oil fields (with total of more than 2,000 wells) in the Middle East, North America, and South America. Here, a case study is presented in which the developed method was applied to more than 200 wells drilled from 2012 to 2016 in a major oil field. By using the proposed method, the data processing and aggregation time that used to take months to accomplish was reduced to only a few days. As a result, major types of NPT were rapidly identified, which include rig-related issues such as repair and maintenance (30%), followed by stuck pipe (23%), hole/mud related issues (such as wellbore stability, mud loss, shale swelling, etc.) 
(20%), and downhole equipment failures and maintenance (14%). Drilling solutions such as contractual advices, improving the mud formulations, and drilling with a rotary steerable system (RSS) were proposed to possibly mitigate the NPT and improve drilling efficiency. Implementation of the proposed solutions eventually resulted in reducing the drilling time and improving capital efficiency. Novel technologies such as NLP, data mining, and machine learning are applied to rapidly QC, mine, integrate and analyze large volumes of D&C data. In addition, this novel approach assists D&C obstacles identification and future plan optimization with evident benefits for improving performance and capital efficiency from a reservoir management perspective.
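Extracting numbers of interest (depths, hole sizes, and the like) from free-text DDR lines is typically pattern-driven before any model sees the data. A minimal regex sketch follows; the patterns, units, and field names are assumptions for illustration, and the paper itself pairs this kind of extraction with NLP and DNN models.

```python
import re

# illustrative patterns for numbers of interest in DDR free text
DEPTH_RE = re.compile(r"(\d[\d,]*)\s*ft\b", re.I)          # e.g. "8,500 ft"
HOLE_RE = re.compile(r'(\d+(?:[- ]\d+/\d+)?)"\s*hole', re.I)  # e.g. '12-1/4" hole'

def extract(ddr_line):
    """Pull depths (in ft) and hole sizes out of one DDR text line."""
    depths = [int(m.replace(",", "")) for m in DEPTH_RE.findall(ddr_line)]
    holes = HOLE_RE.findall(ddr_line)
    return {"depths_ft": depths, "hole_sizes": holes}
```

Running `extract('Drilled 12-1/4" hole from 8,500 ft to 9,120 ft')` recovers both depths and the hole size as structured fields, which is the precondition for the aggregation and NPT statistics the abstract reports.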
Article
Full-text available
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Conference Paper
Stopword removal has traditionally been an integral step in information retrieval pre-processing. In this paper, we question the utility of this step in retrieving relevant documents for verbose queries on standard datasets. We show that stopword removal does not lead to a noticeable difference in retrieval performance as opposed to not removing them. We observe this phenomenon in 7 FIRE test collections for 4 Indian languages, Bangla, Hindi, Gujarati and Marathi, as well as for European languages such as Czech (CLEF 2007) and Hungarian (CLEF 2005 to 2007). Since these languages are inflectional, the stopword lists are not significant. More interestingly, for languages such as English (TREC678 Ad Hoc) and French (CLEF 2005 to 2007), stopword removal leads to a statistically significant drop in performance. This is due to using a generic stopword list that is not suited to many document retrieval tasks.
Article
We consider the long-standing problem of the automatic generation of regular expressions for text extraction, based solely on examples of the desired behavior. We investigate several active learning approaches in which the user annotates only one desired extraction and then merely answers extraction queries generated by the system. The resulting framework is attractive because it is the system, not the user, which digs out the data in search of the samples most suitable to the specific learning task. We tailor our proposals to a state-of-the-art learner based on Genetic Programming and we assess them experimentally on a number of challenging tasks of realistic complexity. The results indicate that active learning is indeed a viable framework in this application domain and may thus significantly decrease the amount of costly annotation effort required.
Article
Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this article, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalizing lexical variants. Our method uses a classifier to detect lexical variants, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for the word. The proposed method doesn't require any annotations, and achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.
Article
This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within an MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributional similarity measures), which gives us clusters of semantically related nouns. Next, a number of statistical measures, based on selectional preferences, are developed that formalize the intuition of non-compositionality. Our approach has been tested on Dutch and automatically evaluated using Dutch lexical resources.
Article
Although the majority of concept-learning systems previously designed usually assume that their training sets are well-balanced, this assumption is not necessarily correct. Indeed, there exist many domains for which one class is represented by a large number of examples while the other is represented by only a few. The purpose of this paper is 1) to demonstrate experimentally that, at least in the case of connectionist systems, class imbalances hinder the performance of standard classifiers, and 2) to compare the performance of several approaches previously proposed to deal with the problem.