Activity codes recorded by drillers are a key input for quantifying invisible lost time (ILT). However, classifying rig operations accurately and consistently into more than 100 activity codes is infeasible for human operators. We propose an auto-suggestive system that guides drillers to the correct code based on the memos they enter. The system aims both to eliminate manual classification errors and to improve memo entry.
The method for extracting activity codes from memos consists of four steps. First, unnecessary text is filtered out and the memos are vectorized. The vectors are then re-weighted using the term frequency-inverse document frequency (TF-IDF) statistic. Next, the training data are resampled to balance the labels, because several important activity codes appear far less frequently than others. Finally, a classifier is trained. We show that the resulting model can serve as a real-time auto-suggestive mechanism during the drillers’ data entry, and we also explore its use for cleaning up historical datasets.
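The steps above can be sketched in a few dozen lines. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the stopword list, the example memos, the codes `DRILL`, `CIRC`, and `TRIP`, and all function names are hypothetical, random oversampling stands in for the unspecified resampling scheme, and a small softmax (multinomial logistic) regression trained by stochastic gradient descent stands in for the classifier.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Assumed stopword list; the paper does not specify its text filter.
STOPWORDS = {"a", "an", "and", "at", "for", "from", "in", "of", "on", "the", "to", "with"}

def tokenize(memo):
    """Lowercase the memo, keep alphabetic tokens only, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", memo.lower()) if t not in STOPWORDS]

def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the training memos."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    vec = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def oversample(rows, labels, rng):
    """Random oversampling: duplicate rows of rare codes up to the largest class size."""
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    target = max(len(items) for items in by_label.values())
    out = []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((row, label) for row in items + extra)
    rng.shuffle(out)
    return out

def softmax_scores(vec, weights, bias):
    """Class probabilities from a linear model followed by a softmax."""
    z = {c: bias[c] + sum(weights[c].get(t, 0.0) * v for t, v in vec.items()) for c in bias}
    top = max(z.values())
    exp = {c: math.exp(s - top) for c, s in z.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}

def train(samples, classes, epochs=200, lr=0.5):
    """Fit softmax regression with plain stochastic gradient descent."""
    weights = {c: defaultdict(float) for c in classes}
    bias = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for vec, label in samples:
            probs = softmax_scores(vec, weights, bias)
            for c in classes:
                grad = probs[c] - (1.0 if c == label else 0.0)
                bias[c] -= lr * grad
                for t, v in vec.items():
                    weights[c][t] -= lr * grad * v
    return weights, bias

def suggest(memo, idf, weights, bias, k=3):
    """Return the k most probable activity codes for a (possibly partial) memo."""
    probs = softmax_scores(tfidf(tokenize(memo), idf), weights, bias)
    return sorted(probs, key=probs.get, reverse=True)[:k]

# Hypothetical memos and codes, deliberately imbalanced.
memos = [
    "Drilled 12.25 in hole from 3200 to 3350 m",
    "Drilling ahead to 3500 m",
    "Drilled formation with 8.5 in bit",
    "Drilling 8.5 in section, rotating",
    "Circulated bottoms up",
    "Tripped out of hole to change bit",
]
codes = ["DRILL", "DRILL", "DRILL", "DRILL", "CIRC", "TRIP"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
token_lists = [tokenize(m) for m in memos]
idf = fit_idf(token_lists)
vectors = [tfidf(tokens, idf) for tokens in token_lists]
balanced = oversample(vectors, codes, rng)
weights, bias = train(balanced, sorted(set(codes)))
print(suggest("Tripped out to surface", idf, weights, bias))
```

Because `suggest` accepts partial text, the same function could be called on every keystroke to rank candidate codes while the driller is still typing. A production system would more likely rely on an established machine-learning library than on hand-written training code.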
The method was applied to a large historical dataset of 150 wells, and ILT analysis was performed on both the original dataset and the auto-classified dataset. Comparing the two sets of results showed that analysis of a dataset that has not been properly classified can lead to incorrect and misleading conclusions. The method also required no manual re-labeling of the dataset for model training, which makes the algorithm readily applicable for any end user, irrespective of the number of activity codes used. Several classifiers were implemented and tested: logistic regression, support vector machine, random forest, naïve Bayes, and multilayer perceptron. Because their performance was comparable, we conclude that a simple and interpretable logistic regression model is best suited to real-time classification. Tests were also performed to determine how many typed words of a memo are needed before the correct activity code is identified. The results are detailed in this paper.
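The typed-words test can be illustrated as follows. This is a sketch under stated assumptions rather than the evaluation protocol used in the paper: the keyword weights in `TERM_WEIGHTS`, the `predict` stand-in for a trained classifier, the example memos, and the codes are all hypothetical, and "identified" is taken to mean the first prefix whose top prediction matches the true code.

```python
from collections import Counter

# Hypothetical per-code term weights standing in for a trained model.
TERM_WEIGHTS = {
    "DRILL": {"drilled": 2.0, "drilling": 2.0, "hole": 0.5, "bit": 0.3},
    "TRIP": {"tripped": 2.0, "pulled": 1.5, "out": 0.8, "hole": 0.5, "bit": 0.5},
    "CIRC": {"circulated": 2.0, "bottoms": 1.5, "up": 0.5},
}

def predict(text):
    """Return the highest-scoring code, or None if no known term has been typed yet."""
    words = text.lower().split()
    scores = {code: sum(w.get(t, 0.0) for t in words) for code, w in TERM_WEIGHTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def words_needed(memo, true_code, predict_fn):
    """Number of leading words after which the prediction first matches true_code."""
    words = memo.split()
    for n in range(1, len(words) + 1):
        if predict_fn(" ".join(words[:n])) == true_code:
            return n
    return None  # the full memo was never classified correctly

# Hypothetical labeled memos used to build a distribution of words needed.
examples = [
    ("Pulled out of hole to change bit", "TRIP"),
    ("Hole circulated bottoms up", "CIRC"),
    ("Made up BHA", "DRILL"),
]
distribution = Counter(words_needed(m, c, predict) for m, c in examples)
print(distribution)
```

Aggregating `words_needed` over a labeled set gives a distribution of how early a suggestion becomes correct, with `None` counting the memos that are never classified correctly.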
This is the first body of work to convert drillers’ memos into activity codes without the need for a human-classified training dataset. The real-time classifier helps ensure clean data at the source and will be particularly useful when implemented in reporting systems that classify rig activities by IADC activity codes. We further demonstrate the use of the classifier for cleansing historical datasets so that ILT analysis can be performed more accurately.