
Jon Patrick- BSc, MSc, PhD, Dip LS, Grad Dip Health Psych
- CEO at innovative Clinical Information Management Systems (iCIMS)
About
143
Publications
32,109
Reads
1,803
Citations
Introduction
In 2012 I left the University to start up two companies working in clinical IT: Health Language Analytics and iCIMS.
Current institution
innovative Clinical Information Management Systems (iCIMS)
Current position
- CEO
Additional affiliations
January 1981 - June 1994
July 2012 - present
Health Language Analytics
Position
- CEO
Description
- I set up HLA to bring my ideas on clinical language processing into use in health organisations
July 2012 - present
innovative Clinical Information Management Systems (iCIMS)
Position
- CEO
Description
- I set up iCIMS to demonstrate that there is a very different way to create clinical information systems, one that yields systems far more useful than those built with current methodologies.
Education
January 1974 - December 1977
Publications
Publications (143)
Background:
Data, particularly 'big' data, are increasingly being used for research in health. Using data from electronic medical records optimally requires coded data, but not all systems produce coded data.
Objective:
To design a suitable, accurate method for converting large volumes of narrative diagnoses from Australian general practice recor...
Objective This project examined and produced a general practice (GP) based decision support tool (DST), namely POLAR Diversion, to predict a patient's risk of emergency department (ED) presentation. The tool was built using both GP/family practice and ED data, but is designed to operate on GP data alone.
Methods GP data from 50 practices during a d...
Data from health records have always been used for research. However, attention is increasingly turning to the potential of interrogating pre-existing large data sets. These come with a particular set of challenges, as the data are usually collected for purposes other than research, and potentially from different sou...
Background:
Every day, patients are admitted to hospital with conditions that could have been effectively managed in the primary care sector. These admissions are expensive and in many cases avoidable if early intervention occurs. General practitioners are in the best position to identify those at risk of imminent hospital presenta...
Text mining in the clinical domain is usually more difficult than in general domains (e.g. newswire reports and scientific literature) because of the high level of noise in both the corpus and the training data for machine learning (ML). A large number of unknown words, non-words and ungrammatical sentences make up the noise in the clinical corpus. Unknown w...
The application of Natural Language Processing (NLP) methods and resources to clinical and biomedical text has received increased attention over the past years, but progress has been limited by difficulties in accessing shared tools and resources, caused in part by clinical data confidentiality requirements. Efforts to increase sharing and...
Automatic Coding and content extraction from imaging reports for cancer registries by TumourTeXtract
Jon Patrick, Pooyan Asgari, Min Li, Health Language Analytics
Introduction
Automation of the process of discovering cancer reports at a radiology service is a precursor to delivery and extraction of content from imaging reports from imaging service...
This paper shows that bilingual health records are achievable by means of a prototype. We developed the prototype by creating a Clinical Care Information System using software designed for that purpose. We created the terminology servers for Basque and Spanish that are the base of the system, and we adapted a Basque spell-checker...
To detect negations of medical entities in free-text pathology reports with different approaches, and evaluate their performance.
Three different approaches were applied for negation detection: the lexicon-based approach was a rule-based method, relying on trigger terms and termination clues; the syntax-based approach was also a rule-based method,...
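The lexicon-based approach above, which relies on trigger terms and termination clues, can be sketched roughly as follows. The trigger and termination lists here are small illustrative assumptions, not the lexicon the study actually used.

```python
import re

# Illustrative trigger terms and termination clues (assumptions,
# not the paper's actual lexicon).
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without", "negative for"]
TERMINATION_CLUES = ["but", "however", "except"]

def is_negated(text: str, entity: str) -> bool:
    """True if `entity` falls in the scope of a negation trigger.

    Scope runs from a trigger to the next termination clue
    (or to the end of the text if no clue intervenes).
    """
    text, entity = text.lower(), entity.lower()
    ent_pos = text.find(entity)
    if ent_pos == -1:
        return False
    # Find the last trigger that ends before the entity starts.
    trigger_end = -1
    for trig in NEGATION_TRIGGERS:
        for m in re.finditer(r"\b" + re.escape(trig) + r"\b", text):
            if m.end() <= ent_pos and m.end() > trigger_end:
                trigger_end = m.end()
    if trigger_end == -1:
        return False
    # A termination clue between trigger and entity closes the scope.
    between = text[trigger_end:ent_pos]
    return not any(re.search(r"\b" + re.escape(t) + r"\b", between)
                   for t in TERMINATION_CLUES)
```

For example, `is_negated("No evidence of malignancy.", "malignancy")` is true, while the clue "but" blocks negation of "nausea" in "denies chest pain but reports nausea".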
Purpose:
To elevate the level of care to the community it is essential to provide usable tools for healthcare professionals to extract knowledge from clinical data. In this paper a generic translation algorithm is proposed to translate a restricted natural language query (RNLQ) to a standard query language like SQL (Structured Query Language).
Me...
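The idea of translating a restricted natural language query (RNLQ) into SQL can be pictured with a toy grammar and a hypothetical field mapping; the paper's actual algorithm handles a far richer grammar and maps terms through the schema model rather than a fixed dictionary.

```python
import re

# Hypothetical schema mapping for illustration only.
FIELD_MAP = {"age": "patient.age", "gender": "patient.gender"}

def rnlq_to_sql(query: str) -> str:
    """Translate e.g. 'show patients where age > 65' into SQL."""
    m = re.match(r"show patients where (\w+) (>|<|=) (\w+)", query.lower())
    if not m:
        raise ValueError("query not in the restricted grammar")
    field, op, value = m.groups()
    column = FIELD_MAP[field]  # raises KeyError for unknown fields
    return f"SELECT * FROM patient WHERE {column} {op} {value}"
```

A real translator would also quote string literals and validate types; this sketch only shows the grammar-to-template step.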
Study objective:
This investigation was initiated after the introduction of a new information system into the Nepean Hospital Emergency Department. A retrospective study determined that the problems introduced by the new system led to reduced efficiency of the clinical staff, demonstrated by deterioration in the emergency department's (ED's) perfo...
Objective:
This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry....
Patient safety is the buzzword in health care. An Incident Information Management System (IIMS) is software that stores clinical mishap narratives in places where patients are treated. It is estimated that in one state alone over one million electronic text documents are available in IIMS. In this paper we investigate the data density ava...
A Natural Language processing (NLP) classifier has been developed for the Victorian and NSW Cancer Registries with the purpose of automatically identifying cancer reports from imaging services, transmitting them to the Registries and then extracting pertinent cancer information. Large scale trials conducted on over 40,000 reports show the sensitivi...
The proposal of a special purpose language for Clinical Data Analytics (CliniDAL) is presented along with a general model for expressing temporal events in the language. The temporal dimension of clinical data needs to be addressed from at least five different points of view. Firstly, how to attach the knowledge of time based constraints to queries...
This paper reports on the issues in mapping the terms of a query to the field names of the schema of an Entity Relationship (ER) model or to the data part of the Entity Attribute Value (EAV) model using similarity based Top-K algorithm in clinical information system together with an extension of EAV mapping for medication names. In addition, the de...
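A similarity-based Top-K match of query terms to schema field names, of the kind this mapping step describes, might be sketched like this; the field list and the use of difflib's ratio as the similarity measure are illustrative assumptions, not CliniDAL's actual algorithm.

```python
from difflib import SequenceMatcher

# Toy schema field names; the real system maps against ER/EAV schemas.
SCHEMA_FIELDS = ["patient_age", "admission_date",
                 "medication_name", "diagnosis_code"]

def top_k_fields(term: str, k: int = 3) -> list:
    """Rank schema fields by string similarity to a query term (Top-K)."""
    scored = [(SequenceMatcher(None, term.lower(), f).ratio(), f)
              for f in SCHEMA_FIELDS]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best ratio first, name as tiebreak
    return [f for _, f in scored[:k]]
```

So a query term like "medication" ranks `medication_name` first even though the strings are not identical.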
Extracting knowledge from data is essential in clinical research, decision making and hypothesis testing. So, providing a general solution to create analytical tools is of prime importance. The objective of this paper is to introduce a special purpose query language, Clinical Data Analytics Language (CliniDAL), based on features in an earlier Clini...
A complete system of Cancer Information Extraction for a population based Cancer Registry is introduced. The analysis involves the classification and annotation of radiology imaging reports to identify the components needed to complete cancer staging and recurrence extraction. Besides traditional supervised learning methods such as Conditional Rand...
When processing a noisy corpus such as clinical texts, the corpus usually contains a large number of misspelt words, abbreviations and acronyms while many ambiguous and irregular language usages can also be found in training data needed for supervised learning. These are two frequent kinds of noise that can affect the overall performance of machine...
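Normalising misspelt tokens against a lexicon of well-formed terms is one simple way to picture the noise-handling step; the tiny lexicon and the difflib-based matcher below are assumptions for illustration, as the paper's approach works at a much larger scale.

```python
from difflib import get_close_matches

# Small illustrative lexicon of well-formed clinical terms (an assumption;
# real systems draw on much larger terminological resources).
LEXICON = ["haemoglobin", "tachycardia", "hypertension", "abdominal"]

def normalise(token: str) -> str:
    """Map a possibly misspelt token to its closest lexicon entry.

    Tokens with no sufficiently close match are returned unchanged,
    so unknown words are preserved rather than corrupted.
    """
    matches = get_close_matches(token.lower(), LEXICON, n=1, cutoff=0.8)
    return matches[0] if matches else token
```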
A method for automatic extraction of clinical temporal information would be of significant practical importance for deep medical language understanding, and a key to creating many successful applications, such as medical decision making and medical question answering. This paper proposes a rich statistical model for extracting temporal inform...
This paper is a study of patient notes from an emergency department and a determination of their consistency for correct SNOMED CT encoding of the diagnosis, with comparison to a clinical language processing (CLP) system that determined the SCT diagnoses. Three sets of notes were reviewed by a clinician, with 500 records in each, where: the clinician...
Data that has been annotated by linguists is often considered a gold standard on many tasks in the NLP field. However, linguists are expensive so researchers seek automatic techniques that correlate well with human performance. Linguists working on the ScamSeek project were given the task of deciding how many and which document classes existed in t...
This paper proposes an approach to sentence-level paraphrase identification by text canonicalization. The source sentence pairs are first converted into surface text that approximates canonical forms. A decision tree learning module which employs simple lexical matching features then takes the output canonicalized texts as its input for...
Recent studies of sentiment classification (determining whether a text is "positive" or "negative") using Appraisal theory have provided mixed results. While some good results have been obtained, it is difficult to tell what aspects of Appraisal are particularly useful for this task. In this paper, we present a series of experiments to isolate feat...
In this study we develop some linguistic bases for mapping between terminologies and demonstrate their application on mapping ICPC-2 PLUS to SNOMED CT (SCT). The Unified Medical Language System (UMLS) metathesaurus mapping, which utilises the links between ICPC-2 PLUS and SCT terms in the UMLS library, mapped 46.5% of ICPC-2 PLUS terms to SCT. Lex...
Many studies have been completed on question classification in the open domain; however, only limited work focuses on the medical domain. Moreover, to the best of our knowledge, most of these medical question classifications were designed for literature-based question answering systems. This paper focuses on a new direction, which is to design a...
Information extraction and classification of clinical data are current challenges in natural language processing. This paper presents a cascaded method to deal with three different extractions and classifications in clinical data: concept annotation, assertion classification and relation classification.
A pipeline system was developed for clinical...
Medical scores and measurements are a very important part of clinical notes, as clinical staff infer a patient's state by analysing them, especially their variation over time. We have devised an active learning process for rapid training of an engine for detecting regular patterns of scores, measurements, and people and places in clinical texts. Ther...
Misspellings, abbreviations and acronyms are very common in clinical notes and can be an obstacle to high quality information extraction and classification. In addition, another important part of narrative reports is clinical scores and measurements, as doctors infer a patient's status by analyzing them. We introduce a knowledge discovery process t...
This paper presents a rationale, created from first principles, for the design criteria for the architecture of clinical information systems. The criteria are developed according to the heuristic axiom of Ockham's Razor, presented here for the first time and operationalised in the form of three principles; Generalization, Minimalization and Coverag...
This paper reports on the results of research which is based originally on the 2009 criteria and corpus of "The Obesity Challenge", defined by Informatics for Integrating Biology and the Bedside (i2b2), a National Center for Biomedical Computing. In the original task, i2b2 asked participants to build software systems that could process a corpus of...
Medication information comprises a most valuable source of data in clinical records. This paper describes use of a cascade of machine learners that automatically extract medication information from clinical records.
The authors developed a novel supervised learning model that incorporates two machine learning algorithms and several rule-based engines....
The assumption that tacit knowledge cannot be articulated remains dominant in knowledge elicitation. This paper, however, claims that linguistic theory does not support such a position and that language should not be factored out of accounts of tacit knowledge. We argue that Polanyi's (1966, p. 4) widely cited notion that ‘we know more than we can...
Information retrieval and information extraction are significant issues in the medical and health care domains where the accuracy of the retrieved information and obtaining it in a time critical situation are extremely important. In this paper, we propose a novel system, intelligent clinical notes system (ICNS) to help doctors extract needed inform...
There is a great demand for highly accurate and timely Information Retrieval and Information Extraction in medicine and health care. To meet this need, we have developed a novel system, the Intelligent Clinical Notes System (ICNS), to assist doctors in retrieving clinical notes based on concept searching. This has required dealing with both the soft...
The process of identifying specific content in prose is known as Information Extraction and falls in the field of Natural Language Processing (NLP). The process of adopting structured reports throughout pathology could be eased if the required content could be automatically extracted from the prose reports. A study under the aegis of the Quality Us...
There have been few studies of large corpora of narrative notes collected from health clinicians working at the point of care. This chapter describes the principal issues in analysing a corpus of 44 million words of clinical notes drawn from the Intensive Care Service of a Sydney hospital. The study identifies many of the processing difficultie...
Clinical named entities convey a great deal of knowledge in clinical notes. This paper investigates named entity recognition from clinical notes using machine learning approaches. We present a cascading system that uses a Conditional Random Fields model, a Support Vector Machine and a Maximum Entropy model to reclassify the identified entities in order to...
Information Extraction, from the electronic clinical record is a comparatively new topic for computational linguists. In order to utilize the records to improve the efficiency and quality of health care, the knowledge content should be automatically encoded; however this poses a number of challenges for Natural Language Processing (NLP). In this pa...
The fast growing content of online articles of clinical case studies provides a useful source for extracting domain-specific knowledge for improving healthcare systems. However, current studies are more focused on the abstract of a published case study which contains little information about the detailed case profiles of a patient, such as symptoms...
The emphasis in information systems research is typically on converting tacit knowledge into explicit knowledge (Hershel, Nemati, & Steiger, 2001). Attention is also given to setting up a dichotomy of tacit and explicit knowledge in terms of articulation (can it be carried in language?), codification (can it be turned into an artifact?) or judgment...
A great challenge in sharing data across information systems in general practice is the lack of interoperability between different terminologies or coding schema used in the information systems. Mapping of medical vocabularies to a standardised terminology is needed to solve data interoperability problems.
We present a system to automatically map a...
In order to achieve the full potential of information technology in healthcare, information systems must have the ability to share and exchange data, which requires the support of standardised medical terminology, such as Systematised Nomenclature of Medicine--Clinical Terms (SNOMED-CT). The Royal Prince Alfred Intensive Care Service and the School...
SNOMED CT (SCT) has been designed and implemented in an era when health computer systems generally required terminology representations in the form of singular pre-coordinated concepts. Consequently, much of SCT content represents pre-coordinated concepts and their relationships. In this conceptual paper the role of pre- and post-coordinated termi...
Clinicians write reports in natural language, which contains a large number of informal medical terms. Automating the conversion of text into clinical terminologies allows reliable retrieval and analysis of the clinical notes. We have created an algorithm that maps medical expressions in clinical notes into a medical terminology. This algorithm index...
Unlike abstracts, full articles of clinical case studies provide more detailed profiles of a patient, such as signs and symptoms, and important laboratory test results of the patient from the diagnostic and treatment procedures. This paper proposes a novel mark-up tag set to cover a wide variety of semantics in the description of clinical case stu...
The automatic conversion of free text into a medical ontology can allow computational access to important information currently locked within clinical notes and patient reports. This system introduces a new method for automatically identifying medical concepts from the SNOMED Clinical Terminology in free text in near real time. The system presented...
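A greedy longest-match scan over a terminology index is one simple way to picture the identification step. The toy dictionary and the placeholder concept identifiers below are illustrative assumptions; the actual system indexes the full SNOMED Clinical Terminology with much richer matching.

```python
# Minimal longest-match lookup against a toy terminology.
# "C00x" identifiers are placeholders, not real SNOMED CT codes.
TERMINOLOGY = {
    "myocardial infarction": "C001",
    "chest pain": "C002",
    "pain": "C003",
}

def find_concepts(text: str) -> list:
    """Greedy longest-match scan returning (phrase, concept id) pairs."""
    tokens = text.lower().split()
    found, i = [], 0
    max_len = max(len(t.split()) for t in TERMINOLOGY)
    while i < len(tokens):
        # Try the longest candidate phrase first, shrinking to one token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in TERMINOLOGY:
                found.append((phrase, TERMINOLOGY[phrase]))
                i += n
                break
        else:
            i += 1
    return found
```

Preferring the longest match means "chest pain" is recognised as one concept rather than falling back to the shorter entry "pain".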
This paper proposes a machine learning approach to the task of assigning the international standard on classification of diseases ICD-9-CM codes to clinical records. By treating the task as a text categorisation problem, a classification system was built which explores a variety of features including negation, different strategies of measuring glos...
Achieving interoperability in sharing and exchanging data between health information systems requires the support of standard medical terminology. To integrate standardised terminology into information systems, there is a need to map legacy interface terminology to a reference terminology. In this study, we mapped ICPC-2 PLUS, the interface termino...
This paper discusses the manner in which SNOMED CT (SCT) has confused the metonymic role of some class labels as holonyms and has inappropriately assigned property inheritance down a holonymic chain due to its transitiveness. The notion of emergent properties is introduced as the only form of property that can exist on a holonym and its use in a hy...
This chapter uses Systemic Functional Linguistic (SFL) theory as a basis for extracting semantic features of documents. We focus on the pronominal and determination system and the role it plays in constructing interpersonal distance. By using a hierarchical system model that represents the author's language choices, it is possible to construct a ri...
A new wrapper induction algorithm WTM for generating rules that describe the general web page layout template is presented. WTM is mainly designed for use in weblog crawling and indexing system. Most weblogs are maintained by content management systems and have similar layout structures in all pages. In addition, they provide RSS feeds to describe...
The Scamseek project, as commissioned by ASIC, has the principal objective of building an industrially viable system that retrieves potential scam candidate documents from the Internet and classifies them as to their potential risk of containing an illegal investment proposal or advice. The project was operated in two stages over 15 months and pro...
Automatic mapping of key concepts from clinical notes to a terminology is an important task to achieve for extraction of the clinical information locked in clinical notes and patient reports. The present paper describes a system that automatically maps free text into a medical reference terminology. The algorithm utilises Natural Language Pro...
A logical database model for openEHR was developed to attest to the feasibility of whether openEHR's architecture is practical for an operational information system. A variant of the Entity-Attribute-Value (EAV) modelling technique and entity relationship diagrams (ERDs) are used to design the generic data structures. It was found that the generic...
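The Entity-Attribute-Value (EAV) variant mentioned above stores data as generic triples rather than wide typed tables. This sqlite sketch of a single EAV table is an assumption for illustration, not the model's actual table design.

```python
import sqlite3

# A minimal EAV table: one row per (entity, attribute, value) triple,
# so new attributes need no schema change. Values stored as text here
# for simplicity; real designs also carry type and version metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    ("patient:1", "systolic_bp", "120"),
    ("patient:1", "diastolic_bp", "80"),
    ("patient:2", "systolic_bp", "145"),
])

def attribute_of(entity, attribute):
    """Fetch one attribute value for an entity, or None if absent."""
    cur = conn.execute(
        "SELECT value FROM eav WHERE entity = ? AND attribute = ?",
        (entity, attribute))
    row = cur.fetchone()
    return row[0] if row else None
```

The trade-off the paper examines follows directly: the structure is fully generic, but every typed read requires a self-join or pivot over the triple table.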
This research aims to extract detailed clinical profiles, such as signs and symptoms, and important laboratory test results of the patient from descriptions of the diagnostic and treatment procedures in journal articles. This paper proposes a novel mark-up tag set to cover a wide variety of semantics in the description of clinical case studies...
We consider the task of automatic classification of clinical incident reports using machine learning methods. Our data consists of 5448 clinical incident reports collected from the Incident Information Management System used by 7 hospitals in the state of New South Wales in Australia. We evaluate the performance of four classification algorithms: d...
We propose a machine learning approach, using a Maximum Entropy (ME) model to construct a Named Entity Recognition (NER) classifier to retrieve biomedical names from texts. In experiments, we utilize a blend of various linguistic features incorporated into the ME model to assign class labels and location within an entity sequence, and a post-proces...
The Scamseek project, as commissioned by the Australian Securities & Investment Commission (ASIC), had the principal objective of building an industrially viable system that retrieves scam candidate texts from the Internet and classifies them as to their potential risk of containing an illegal investment proposal or advice. The value of the system...
This paper presents a named entity classification system that utilises both orthographic and contextual information. The random subspace method was employed to generate and refine attribute models. Supervised and unsupervised learning techniques were used in the recombination of models to produce the final results.
We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is...
In this paper we study the effectiveness of using a phrase-based representation in e-mail classification, and the effect this approach has on a number of machine learning algorithms. We also evaluate various feature selection methods and reduction levels for the bag-of-words representation on several learning algorithms and corpora. The results...
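Frequency-based selection over a bag-of-words representation, one of the simpler reduction schemes evaluated in work of this kind, can be sketched as follows; the corpus and cut-off are toy assumptions.

```python
from collections import Counter

def select_features(docs, top_n):
    """Keep the top_n most frequent tokens across the corpus."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [tok for tok, _ in counts.most_common(top_n)]

def vectorise(doc, features):
    """Bag-of-words count vector over the selected feature set."""
    counts = Counter(doc)
    return [counts[f] for f in features]
```

Raising `top_n` trades a richer representation against higher dimensionality, which is exactly the reduction-level axis such experiments vary.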
A verb particle construction (VPC) classification scheme gleaned from linguistic sources has been used to assess its usefulness for identifying issues in decomposability. Linguistic sources have also been used to inform the features suitable for use in building an automatic classifier for the scheme with a series of good performance results. The no...
This paper describes a system developed for the transformation of English sentences into a first order logical form representation. The methodology is centered on the use of a dependency grammar based parser. We demonstrate the suitability of applying a dependency parser based solution to the given task and in turn explain some of the limitat...
Systemic features use linguistically-derived language models as a basis for text classification. The graph structure of these models allows for feature representations not available with traditional bag-of-words approaches. This paper explores the set of possible representations, and proposes feature selection methods that aim to produce the m...
We present a new collection of training corpora for evaluation of language-independent named entity recognition systems. For the five languages included in this initial release, Basque, Dutch, English, Korean, and Spanish, we provide an analysis of the relative difficulty of the NER task for both the language in general, and as a supervised task us...
This paper presents a case study in which discourse analysis was used to suggest process improvements in two knowledge management services (KMS). It employs linguistic analysis as the means for assessing the degree to which each KMS is aligned to the users' needs. Areas requiring improvement are identified linguistically based on the differences be...
A proposal is presented to create a tagset for linguistic features that constitute the meaningful elements of one particular theory of psychotherapeutic intervention, that is Grinder & Bandler's Metamodel. These features include lexical, lexicogrammatical and semantic phenomena. It is proposed that determining the effectiveness of therapy could i...
This paper addresses the problem of deriving distance measures between parent and daughter languages with specific relevance to historical Chinese phonology. The diachronic relationship between the languages is modelled as a Probabilistic Finite State Automaton. The Minimum Message Length principle is then employed to find the complexity of thi...
Orthographic tries are an efficient way of storing information for strings with shared prefixes. By storing probabilistic occurrence data, orthographic tries have been shown to be well-suited to the task of named entity recognition.
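The data structure described here, a trie whose nodes carry occurrence counts so prefix probabilities can be estimated, can be sketched minimally as follows; this is an illustration of the structure, not the full recognition system.

```python
class Trie:
    """Orthographic trie storing an occurrence count on each prefix node,
    so relative frequencies of shared prefixes can be read off directly."""

    def __init__(self):
        self.children = {}
        self.count = 0

    def add(self, word):
        """Insert a word, incrementing the count on every prefix node."""
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
            node.count += 1

    def prefix_count(self, prefix):
        """How many inserted words start with this prefix (0 if none)."""
        node = self
        for ch in prefix:
            if ch not in node.children:
                return 0
            node = node.children[ch]
        return node.count
```

Dividing a node's count by its parent's gives the conditional probability of the next character, the kind of probabilistic occurrence data the abstract refers to.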
The task of creating a lexical knowledgebase has been defined in our work as extracting appropriate semantic phenomena from what are essentially print dictionaries stored in a desktop publishing format. The aim of this work is to achieve automatic identification of structural elements in the dictionary's stream of text that are isomorphic to seman...
Computer programs for the analysis of human behaviour captured in multimedia data format commonly provide mechanisms to describe the behaviour recorded. Yet these programs do not satisfactorily fulfil the need for a description mechanism which allows the production of rich descriptions of behaviour in a flexible way and which facilitates the correc...
The Sydney Language Independent Named Entity Recogniser and Classifier (SLINERC) is a multi-stage system for the recognition and classification of named entities. Each stage uses a decision graph learner to combine statistical features with results from prior stages. Earlier stages are focused upon entity recognition, the division of non-entity terms...
This paper reports the implementation of the AdaBoost algorithm on decision graphs, optimized using the Minimum Message Length Principle. The AdaBoost algorithm, which we call 1-Stage Boosting, is shown to improve the accuracy of decision graphs, along with another technique, which we combine with AdaBoost and call 2-Stage Boosting, which shows t...
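The AdaBoost re-weighting round at the heart of 1-Stage Boosting can be sketched over toy decision stumps standing in for the decision graphs; the stumps, data and round count below are illustrative assumptions, not the paper's learners.

```python
import math

def stump(threshold):
    """A 1-D threshold classifier returning +1 or -1."""
    return lambda x: 1 if x >= threshold else -1

def adaboost(xs, ys, thresholds, rounds=3):
    """Return a weighted ensemble [(alpha, classifier), ...]."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = min((stump(t) for t in thresholds),
                   key=lambda h: sum(wi for wi, x, y in zip(w, xs, ys)
                                     if h(x) != y))
        err = sum(wi for wi, x, y in zip(w, xs, ys) if best(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Re-weight: mistakes up, correct examples down, then normalise.
        w = [wi * math.exp(-alpha * y * best(x))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the ensemble."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

The same weight-update loop applies unchanged when the weak learner is a decision graph instead of a stump, which is the substitution the paper makes.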
Expert commentary of videotaped expertise (EXCOVE) is an approach to knowledge elicitation (KE) where expert performance is videotaped and then these videotapes are used to assist the elicitation of knowledge from these and other experts. It was initially formulated to assist in the understanding of the complex area of psychotherapy. Psychodrama wa...
This paper presents a semantic processing framework that offers a new approach to the traditionally problematic knowledge acquisition bottleneck. The model presented here elucidates the advantages of adopting an interchangeable modular pipeline design of language engineering systems. We argue that a modular design more readily facilitates the auto...
An architecture for federating heterogeneous dictionary databases is described. It proposes a common description language and query language to provide for the exchange of information between databases with different organizations, on different platforms and in different DBMSs. The common query language has an SQL-like structure. The first version of the...
Entity-relationship (E-R) models continue to be the most common means of documenting the data requirements of information systems. Whether used as the basis for relational database design or to record organisational conceptual data structures, it is essential that the information content (the semantics) of such models is clearly understood by both...
ERP Systems are now ubiquitous in large businesses and the current move by vendors is to re-package them for small to medium enterprises (SMEs). This migration has many consequences that have to be addressed through understanding the history and evolution of ERP systems and their current architectures. The advantages and disadvantages of the ERP sy...
Anticipating the use of ERP systems among small to medium enterprises (SMEs) to be the future area of growth, ERP vendors such as SAP, Oracle, PeopleSoft, J.D. Edwards and Bann are introducing ERP software that appeals to the market segment of the SMEs. Introduction of the ERP systems for SMEs includes compact packages, flexible pricing policies, n...
In the past, many methods have been proposed for the inference of probabilistic and non-probabilistic finite state automata from positive examples of their behaviour. In this paper, we introduce a search method guided by the information-theoretic Minimum Message Length principle to infer Probabilistic Finite State Automata (PFSA). The method is a...