Michael F. Lynch

The University of Sheffield | Sheffield · Department of Information

BSc, PhD

About

121
Publications
1,754
Reads
2,105
Citations
Citations since 2016
0 Research Items
115 Citations
[Chart: citations per year, 2016-2022]

Publications (121)
Article
Much attention has been paid to translating isolated chemical names into forms such as connection tables, but less effort has been expended on identifying substance names in running text so that they can be processed. Automatic name identification has become a more urgent priority today, not least in light of the inhere...
Article
Methods for automatically isolating and extracting bibliographic references from the full texts of patents are described and evaluated; these include citations both to patents and to other bibliographic sources. Patents are unusual as citing documents in that citations occur principally in the text of the abstracts or description parts of the doc...
Article
One of the main problems involved in the use of free text for indexing and retrieval is the variation in word forms that is likely to be encountered. The most common types of variation are spelling errors, alternative spellings, multi-word concepts, transliteration, affixes and abbreviations. One way to alleviate this problem is to use a conflation...
Article
The problems posed by the requirements for storage and manipulation of generic chemical structure definitions in patents are reviewed. Chemists and patent agents have developed an armory of linguistic devices over many decades so that a generic structure description can describe large and often unlimited numbers of substances as a result of the co...
Article
Results are presented for the atom-level search, called the refined search, for matching components of generic chemical structures. The refined search is the last and most discriminating search strategy used by the Sheffield generic structures system and is performed after the faster screening stages of bit screening and reduced graph screening. It...
Article
A description is given of an atom-level search strategy for matching components of generic chemical structures, called the refined search. The components used are those represented by nodes of a reduced graph. These nodes describe aggregates of atoms of the original chemical graph which are similar in structure or chemistry. The reduced graph scree...
Article
An evaluation is given of the search strategies used in the screening stages of the Sheffield generic structure system. The results of several searches are presented; a detailed account is given of their performance together with a discussion of outstanding problems and weaknesses. The screening stages are of two types, a fast bitscreening stage us...
Book
A hierarchy of screening methods applicable to full generic structures, including generic radical terms, is described. The most general level of description is fragment screening, which includes atom- and bond-centred fragments, ‘bubbled-up’ from full generic structures, where the logical relationships between fragments are retained in MUST and POSS...
Article
Methods for storing and retrieving records of individual chemical substances have been available and in use for several decades; dealing with generic chemical structures has called for much greater attention to theory, and much more complex software. The basis for viable solutions is identified and the main systems discussed.
Article
The generation of topological fragments from generically expressed components of generic structures is described. Fragments derived wholly from within a component or partial structure (PS) are termed intra-PS fragments, while those which span partial structures are termed inter-PS fragments. The generation of fragments from specific PS's and the li...
Article
A semiautomatic method for converting to GENSAL those parts of Derwent Publications Ltd. Documentation Abstracts which specify generic structures is reported in this paper and that which follows. Techniques of natural language processing (NLP) applied in a prototype system are discussed. This paper deals with the lexical isolation and categorizatio...
Article
The generation of topological fragments from generic structures for full structure and substructure searching is described; these include fragments from components described either in specific or in generic terms, and those which overlap them. Fragments derived wholly from within partial structures (PS's) are termed intra-PS fragments, while those...
Article
Part 1 of this series described the lexical isolation and categorization of the text tokens of statements describing generic structures in the texts of Documentation Abstracts from Derwent Publications Ltd. [1]; this paper describes the syntactic and semantic processing of the tokens with a view to producing the corresponding GENSAL expressions. The s...
Article
A rational basis for discussion of issues relating to the storage and retrieval of generic chemical structures is developed in this paper and those which follow. It rests on well-known logical and linguistic foundations, and seeks to establish a consistent conceptual framework for considering generic structures as they occur in patents and as repre...
Article
Criteria for creating reduced graph representations of full generic structures for screening in full structure and substructure searching are compared; ring/non-ring reduction is identified as the principal criterion. The current form of the Extended Connection Table Representation (ECTR), the internal representation of generic structures in the GE...
Article
This paper continues the establishment of a consistent framework for discussing and treating representations of generic structures described in Part II of this series (see the preceding paper in this issue). In this part, the nature of search operations and the use of parameter lists representing both specifically and generically described parts within...
Article
This issue of the Journal of Chemical Information and Computer Sciences is dedicated to the memory of George Vladutz, a friend of many in chemical information, who passed away on September 3, 1990.
Conference Paper
The technology for storing and searching large databases of specific chemical substances is well established; public and private systems and services have operated successfully for a decade or more. Attention has now turned to more complex database types: to 3-D chemical structures, to biological macromolecules, and to generic structures in pate...
Article
Computer support for chemists in visualising complex chemical structures, searching databases and assessing reaction information has expanded over the past five years, in the variety of both the available databases and the software systems. Such developments have provided access not only to public information but also to integrated in-house data management s...
Article
This paper reports how Zamora's smallest set of smallest rings algorithm has been modified and extended to provide an algorithm that will find the extended set of smallest rings (ESSR) for specific and structurally explicit generic structures. Modifications are necessary to find the ESSR rings within a partial structure connection table, while exte...
Article
There are many unresolved issues concerning the definition of an optimum ring set for retrieval purposes. This paper considers the problems associated with processing planar (two-dimensional) representations of three-dimensional structures. To overcome the ambiguity of such representations, a new ring set is defined in terms of simple faces and cut...
Article
The rings perceived within and across the partial structures of structurally explicit generics are analyzed to produce ring-screen information that complements the existing fragment screens. This paper presents and discusses the format of these ring screens. The resultant bit vectors for each partial structure are accumulated to retain their logica...
Article
Current ring perception algorithms for use on chemical graphs concentrate on processing specific structures. In this review, the various published ring perception algorithms are classified according to the initial ring set obtained, and each algorithm or method of perception is described in detail. The final ring sets obtained are discussed in term...
Chapter
This paper summarises recent work at Sheffield University on the use of parallel computer hardware for the processing of chemical structure databases. The Distributed Array Processor, or DAP, has been used for the clustering of the fragment bit strings representing 2-D molecules (for chemical structure-property applications) and the ranking of outp...
Article
Two chemical substructure searching algorithms, the relaxation algorithm and the set reduction algorithm, are introduced and described. Transputer-based serial implementations of both are compared for performance; the relaxation algorithm is shown to be both more effective and more efficient. Strategies are discussed for multi-transputer implementa...
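The relaxation idea can be illustrated as iterative pruning of candidate atom sets. This is a minimal serial sketch under our own assumptions (a toy graph given as atom labels plus adjacency sets, no bond types, hypothetical function name `relax`); it is not the transputer implementation described in the paper. Relaxation gives a necessary condition only, so surviving candidates would still need a final atom-by-atom check.

```python
from typing import List, Set


def relax(query_labels: List[str], query_adj: List[Set[int]],
          target_labels: List[str], target_adj: List[Set[int]]) -> List[Set[int]]:
    """Return, for each query atom, the target atoms it could still map to.

    Candidates start as label-compatible atoms; a candidate t for query atom q
    is discarded when some neighbour of q has no candidate among t's
    neighbours. Pruning repeats until nothing changes. An empty set means the
    query cannot be embedded in the target.
    """
    candidates = [
        {t for t, t_label in enumerate(target_labels) if t_label == q_label}
        for q_label in query_labels
    ]
    changed = True
    while changed:
        changed = False
        for q, q_neighbours in enumerate(query_adj):
            for t in list(candidates[q]):  # copy: we discard while iterating
                if any(not (candidates[qn] & target_adj[t]) for qn in q_neighbours):
                    candidates[q].discard(t)
                    changed = True
    return candidates
```

For a C-O query against ethanol (C-C-O), the terminal carbon is pruned because it has no oxygen neighbour, leaving `[{1}, {2}]`.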
Article
This paper discusses the use of networks of transputers for the matching of the labelled graphs which are used to represent chemical structures in computer-based chemical information systems; in particular, the implementation of a relaxation algorithm for chemical substructure searching is described. Tests with a doubly-linked chain of transputers...
Chapter
The achievements of the project are reviewed briefly. They include: a) the design, implementation and testing of a representation which closely mirrors the characteristics of generic structures in patents, and which enables an internal representation to be created, b) the application of formal grammar theory to some of the problems caused by generi...
Chapter
It is a great honour for me to be invited to give the keynote address at this conference, not least since it gives every appearance of being a worthy successor to the earlier conference held in 1973 here at Noordwijkerhout, which was quite seminal in its effects on R&D in chemical information sciences. It brought together, for the first time, a num...
Article
This paper discusses research into chemical information and document retrieval systems at the Department of Information Studies, University of Sheffield. The research includes the use of cluster analysis methods for document retrieval and drug design, the representation and searching of files of generic chemical structures, substructure searching a...
Article
Reduced chemical graphs for specific chemical substances comprise summary descriptions of the gross structural features of these substances; an example is summarization in terms of only the ring and nonring components, giving a tree structure in which each node is either a cyclic or an acyclic component. Other bases for graph reduction are possible...
Article
This paper discusses research which was carried out at the Department of Information Studies, University of Sheffield in the period 1965 to 1985 into storage and retrieval techniques for databases of textual and chemical structure data. The research includes the development of methods for the automatic production of printed subject indexes and for...
Article
The provision of graphic retrieval facilities for the generic chemical structures which are especially characteristic of chemical patents calls for more than simple extensions to the technology which already exists for searching files of specific chemical substances. A research project in progress at Sheffield University has brought novel insights t...
Article
A relaxation algorithm for chemical substructure search is simulated for implementation on general-purpose multiprocessors. An improved relaxation algorithm is described and the inherent parallelism detailed. The general-purpose simulation package PASSIM is described, and the methods used to simulate the algorithm are given. A variably sized pool o...
Article
The role of generic structures in the chemical knowledge base is described, with particular reference to patents. Operational information systems providing access to generic structures are reviewed, and past and present research leading to improved services is described.
Article
A new method of external distribution sorting called tree partitioning is suggested. It involves the use of a binary tree to split an incoming file into successively smaller partitions until these are small enough to be sorted internally. The tree is generated from a sample of data of the same type as those to be sorted using that tree. The method...
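The tree-partitioning idea can be sketched in memory as follows. Assumptions not taken from the paper: the split keys are evenly spaced quantiles of the sample, the tree is stored as a sorted list of splitters (binary search over a sorted list follows the same comparison path as descending a balanced binary search tree), and the partitions are plain lists rather than external files. The function names are hypothetical.

```python
import bisect
from typing import Iterable, List


def build_splitters(sample: List[str], depth: int) -> List[str]:
    """Pick 2**depth - 1 split keys from a non-empty sample (assumed: quantiles).

    Kept in sorted order, these keys are the in-order traversal of a
    balanced binary tree with 2**depth leaves (one leaf per partition).
    """
    ordered = sorted(sample)
    n_leaves = 2 ** depth
    return [ordered[(i * len(ordered)) // n_leaves] for i in range(1, n_leaves)]


def tree_partition_sort(records: Iterable[str], splitters: List[str]) -> List[str]:
    """Route each record to a leaf partition, then sort each partition internally."""
    partitions: List[List[str]] = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        # bisect_right finds the leaf whose key range [s[i-1], s[i]) holds rec
        partitions[bisect.bisect_right(splitters, rec)].append(rec)
    result: List[str] = []
    for part in partitions:          # partitions are already in key order,
        result.extend(sorted(part))  # so concatenating sorted parts sorts all
    return result
```

Because the splitters come from a sample, partition sizes are only balanced if new data resemble the sample, which is the assumption the abstract states.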
Article
A computer program is described that carries out syntactic and semantic analysis of generic structures encoded in GENSAL, a formal language for the description of such structures, simultaneously generating an extended connection table representation of the structure. Desirable enhancements to the program in the areas of structure diagram input, nom...
Article
Considerations for the use of limited-environment screens for screening generic chemical structures are discussed. The general strategy and detailed procedures for the automatic generation of screens from the extended connection table representation (ECTR) of generic chemical structures are described. A bitscreen record for generic database structu...
Article
An experimental best match retrieval system is described based on the serial file organisation. Documents and queries are characterised by fixed length bit strings and the time-consuming character-by-character term match is preceded by a bit string search to eliminate large numbers of documents which cannot possibly satisfy the query. Two methods,...
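A bit-string prefilter of this kind can be sketched as follows. The signature length, the number of bits per term and the CRC32-based hashing are our assumptions, not the paper's, and the exact match is a set intersection standing in for the character-by-character comparison. The key property is that the bit test never rejects a document that actually shares enough terms with the query, because every term present in a document sets all of its bits in that document's signature.

```python
import zlib
from typing import Dict, List, Set

SIG_BITS = 64       # assumed signature length
BITS_PER_TERM = 2   # assumed number of bits set per term


def term_bits(term: str) -> List[int]:
    # Hypothetical hashing: CRC32 of the term with a different salt per bit.
    return [zlib.crc32(f"{k}:{term}".encode()) % SIG_BITS for k in range(BITS_PER_TERM)]


def signature(terms: Set[str]) -> int:
    sig = 0
    for term in terms:
        for b in term_bits(term):
            sig |= 1 << b
    return sig


def best_match(query: Set[str], docs: Dict[str, Set[str]], min_shared: int) -> Dict[str, int]:
    """Return documents sharing at least min_shared terms with the query."""
    query_masks = [signature({t}) for t in query]
    scores: Dict[str, int] = {}
    for name, terms in docs.items():
        dsig = signature(terms)
        # Upper bound on shared terms: query terms whose bits are all set.
        bound = sum(1 for m in query_masks if (dsig & m) == m)
        if bound < min_shared:
            continue  # cannot reach the threshold: skip the expensive match
        shared = len(query & terms)  # exact term match on surviving documents
        if shared >= min_shared:
            scores[name] = shared
    return scores
```

False drops (documents that pass the bit test but fail the exact match) are possible; false dismissals are not.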
Article
This paper addresses the problems associated with chemical structure searching of the patent literature. The deficiencies of existing services are reviewed, and the results of continuing research aimed at overcoming these problems are presented. These take the form of a systematic structure description language (GENSAL) for generic structure input,...
Article
A data structure for the unambiguous representation of generic structures at the machine level is described. It is designed for automatic generation from structures encoded in the formal language GENSAL and is based on connection tables. Its relationship with other forms of representation is discussed.
Article
Many methods have been suggested for representing text for storage on magnetic media or for transmission down telecommunication channels with fewer bits than are required by a conventional fixed-length character representation. These methods are reviewed, and attention is drawn to the advantages of techniques in which variable-length character stri...
Article
The use of variety generation techniques in the production of author-title search codes for files of monograph records is compared with methods based on division hashing. The latter perform better, and evidence is presented to suggest that the reason for this is the lack of statistical independence between the assignments of variety generation symb...
Article
The strategy of an approach to representing and searching the generic chemical formulas (Markush formulas) typical of chemical patents is outlined. The methods under development involve the following stages: (a) the description of generic chemical expressions by means of a formal language, GENSAL; (b) an approach to the generation and recognition o...
Article
A formal language, GENSAL, is described which is designed for the concise and unambiguous representation of generic structures from chemical patents ("Markush formulas") in a manner which is intelligible to a chemist, yet sufficiently formalized for automatic analysis by a computer. GENSAL contains a number of facilities for showing the alternative...
Article
A simple topological chemical grammar is developed, and its possible applications to the computer manipulation of chemical structures are discussed. The generative and recognitive capabilities of the grammar are illustrated by examples. The paper concludes by identifying the role of such capabilities in a generic (Markush) structure search system.
Article
A topological search code has been found to have high discriminatory power within large sets of disparate structures. The technique has been implemented in a pharmaceutical company's computerized chemical information system, for interactive registration and structure search.
Article
This paper presents a formal linguistic approach to the representation of generic chemical formulae in chemical patents, within the context of use of the ALWIN line-formula notation. The objective of the representation is to permit searches for specific structures and for substructures which are included within the generic expression. The relevance...
Article
The use of variety generation techniques in the production of fixed-length degenerate representations for search terms is compared with methods based on division-hashing. For files of words taken from INSPEC data, the latter perform better, almost certainly because of dependence between assignments of symbol sets. Attempts to overcome the problem p...
Article
A method of sorting large textual data-bases by computer using external storage is proposed. The range of sort-keys in a sample of data to be sorted is divided into a fixed set of partitions, which should also give an adequate representation of new data from a similar source. The partitions are composed of ordered key ranges. An incoming data strea...
Article
The use of variety-generation techniques for text compression depends on the selection of symbol sets, or sets of variable-length character strings occurring approximately equifrequently in the text in question. In order that the method perform efficiently in a variety of situations, the symbol set must be reasonably independent of the particular t...
Article
A study has been made of the effect of controlled variations in indexing vocabulary size on retrieval performance using the Cranfield 200 and 1400 test collections. The vocabularies considered are sets of variable-length character strings chosen from the fronts of document and query terms so as to occur with approximate equifrequency. Sets containi...
Article
The use of variety generation for reversible text compression is described briefly, and it is shown how the technique may be applied to compress Wiswesser Line Notations. The notations may be compressed, using 8-bit codes to represent variable-length character strings, to occupy an average of just under 3.6 bits per original character, an improveme...
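The coding step, and the bits-per-character arithmetic, can be sketched with a given symbol set. The symbol set below is a hypothetical toy and the greedy longest-match parse is our assumption; how variety generation actually selects approximately equifrequent strings is not reproduced here. Each selected string becomes one 8-bit code, so the rate is 8 × (number of codes) / (number of original characters).

```python
from typing import List


def encode(text: str, symbols: List[str]) -> List[int]:
    """Greedy longest-match parse; each matched symbol becomes one 8-bit code."""
    if len(symbols) > 256:
        raise ValueError("8-bit codes allow at most 256 symbols")
    index = {s: i for i, s in enumerate(symbols)}
    longest = max(len(s) for s in symbols)
    codes: List[int] = []
    pos = 0
    while pos < len(text):
        for length in range(min(longest, len(text) - pos), 0, -1):
            code = index.get(text[pos:pos + length])
            if code is not None:
                codes.append(code)
                pos += length
                break
        else:  # no symbol of any length matched at this position
            raise ValueError(f"no symbol covers {text[pos]!r}")
    return codes


def decode(codes: List[int], symbols: List[str]) -> str:
    return "".join(symbols[c] for c in codes)


def bits_per_char(text: str, symbols: List[str]) -> float:
    return 8 * len(encode(text, symbols)) / len(text)
```

With the toy set `["1", "2", "Q", "V", "R", "G", "L6TJ", "QV"]`, the string `"QVRQV"` parses as `QV`, `R`, `QV`: three codes for five characters, or 4.8 bits per character. Decoding restores the original exactly, so the scheme is reversible.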
Article
Two methods of retrieving chemical reaction information are compared. One involves the generation of reaction descriptors automatically by an analysis of the Wiswesser line notation of the reacting molecules. The other, Derwent's Chemical Reaction Documentation Service (CRDS), involves manual indexing and uses a bond-change code to describe the rea...
Article
Variety Generation involves the selection of sets of character strings, or symbols, which are intended to occur with equal probabilities in bodies of text or sets of text units from a particular source. It is important that the sample used to generate the symbol set should be representative of the data with which the set will be used. An assessment...
Article
A method has been developed for the automatic analysis of chemical reactions by a consideration of the changes in the Wiswesser Line Notations of the reacting molecules. The notations are broken down by a multilevel fragmentation process which yields descriptions for all parts of the molecules. The two fragment lists are compared, duplicates elimin...
Article
A topological index has been developed which can discriminate between isomers in a molecular formula group. This index could be used, in combination with the molecular formula, to provide rapid access to those few compounds in a large chemical structure file which must be compared with a query structure at registration.
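The abstract does not define the index, so as a stand-in this sketch uses the Wiener index (the sum of shortest-path distances over all atom pairs in the hydrogen-suppressed graph), a well-known distance-based topological index; it is not the index from the paper. Combined with the molecular formula it gives a registration key that separates, for example, the two C4H10 isomers. Such indices narrow the set of candidates rather than guaranteeing uniqueness.

```python
from collections import deque
from typing import List, Set


def wiener_index(adj: List[Set[int]]) -> int:
    """Sum of shortest-path distances over all unordered pairs of atoms."""
    total = 0
    for source in range(len(adj)):
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for nb in adj[atom]:
                if nb not in dist:
                    dist[nb] = dist[atom] + 1
                    queue.append(nb)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end
```

n-Butane (a four-carbon chain) gives 10 and isobutane (one carbon bonded to three others) gives 9, so the keys `("C4H10", 10)` and `("C4H10", 9)` differ.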
Article
An approximate structure-matching algorithm is described which rapidly identifies substructures common to the reactants and products of a chemical reaction. The deletion of these features results in the identification of those parts of the reacting molecules that have been changed in the course of the reaction; at the same time it is possible to lo...
Article
A simple categorization of reaction types, based on comparisons of WLN symbol strings, is reported. This permits the development of simple analyses which provide WLN descriptions of 70% of the reactions. A prototype index is described.
Article
The conventional interpretation of Shannon's mathematical theory of communication in relation to textual material is unduly restrictive and unhelpful. A reinterpretation which is based on the definition of new symbol sets, comprising approximately equally-frequent strings of characters, is presented. It is shown to have wide applicability in comput...
Article
The application of the variety-generation technique to the construction of truncated author-title search keys for data bases of monograph records is described. Instead of the usual fixed-length keys (e.g. three characters of the author's surname, and the first three filing characters of the title) the method uses strings of characters which vary in...
Article
Ayers’ recent suggestions for a Universal Standard Book Number, logically generated from a catalogue entry, and therefore applicable retrospectively to bibliographic files, have been implemented and tested on two one-year cumulations of BNB MARC files. The proportion of unique entries provided by the USBN was found to be about 91%. Revisions to th...
Article
Keys consisting of variable-length character strings from the front and rear of surnames, derived by analysis of author names in a particular data base, are used to provide approximate representations of author names. When combined in appropriate ratios, and used together with keys for each of the first two initials of personal names, they provide a...
Article
Using direct access computer files of bibliographic information, an attempt is made to overcome one of the problems often associated with information retrieval, namely, the maintenance and use of large dictionaries, the greater part of which is used only infrequently. A novel method is presented, which maps the hyperbolic frequency distribution of...
Article
Conventional approaches to processing records of linguistic origin for storage and retrieval tend to regard the data as immutable. The data generally exhibit great variety and disparate frequency distributions, which are largely ignored and which entail either the storage of extensive lists of items or the use of complex numerical algorithms such a...
Article
Two analyses of the distributions of representations of chemical compounds in terms of simple structural characteristics have been carried out. The compounds were sampled from the Chemical Abstracts Service Registry System; the structural characteristics consist of a simple hierarchy of bond-centered fragments - simple, augmented, and bonded pairs....
Article
Simple algorithms designed to identify ring changes in records of chemical reactions are described. They operate on the WLN notations of reactant and product molecules sampled from Current Abstracts of Chemistry and Index Chemicus. They enable summaries of ring changes, including formation, cleavage, and interconversion, to be produced, and account...
Article
The application of a variable to fixed-length compression coding technique to two bibliographic data bases (MARC and INSPEC) is described. By appropriate transformation of characters or digrams into bit patterns reflecting more accurately the distributions of characters in the data bases, and application of the encoding process, varying degrees of...
Article
A substructure search screening system based on bond-centered fragments has been evaluated using 108 queries derived from user SDI profiles. The average screenout value obtained was 98.42%. Simple, augmented, and bonded pairs are used as a hierarchy of structural descriptors giving easy coding and good performance for both general and specific quer...
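Screenout, the percentage of the file eliminated before atom-by-atom search, can be illustrated with simple bond-centred fragments only. Assumptions: a toy molecule format of atom labels plus `(atom, atom, bond symbol)` triples, and hypothetical function names; augmented fragments and bonded pairs from the hierarchy are not modelled.

```python
from typing import FrozenSet, List, Tuple

Bond = Tuple[int, int, str]  # (atom index, atom index, bond symbol), assumed format


def bond_fragments(labels: List[str], bonds: List[Bond]) -> FrozenSet[str]:
    """Simple bond-centred fragments such as 'C=O' or 'C-C' (labels sorted)."""
    frags = set()
    for i, j, order in bonds:
        a, b = sorted((labels[i], labels[j]))
        frags.add(f"{a}{order}{b}")
    return frozenset(frags)


def screenout(query: FrozenSet[str], file_frags: List[FrozenSet[str]]) -> float:
    """Percentage of file structures rejected because a query fragment is missing."""
    passed = sum(1 for frags in file_frags if query <= frags)
    return 100.0 * (len(file_frags) - passed) / len(file_frags)
```

Against a four-structure file of ethanol, acetone, methanol and formaldehyde, a C=O query passes only acetone and formaldehyde, a screenout of 50%; the passing structures would then go on to atom-by-atom search.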
Article
The distributions of bond-centered fragments which form a simple hierarchy have been investigated for a sample of substructure search queries. They are found to conform with the general pattern of fragment incidences in a file which might be used in a substructure search system. This result has been assumed in previous work on the design of screeni...
Article
A major problem in the design of screening systems for substructure searches of chemical structure files is the development of a methodology for selection of an optimal set of structural characteristics to act as screens. The set chosen for a particular application will depend on the characteristics of the collection, as well as on its size and gro...
Article
The distributions of a variety of structural characteristics, including bond-centered, atom-centered, and ring fragments, have been investigated for a specialized file of chemical structures. The results are compared with those obtained for a random sample of the Chemical Abstracts Registry System and are found to reflect the biomedical bias of the...
Article
A high degree of constancy has been found to exist in the microstructure of titles of samples of the INSPEC data-base taken over a 3-year period. Character and digram frequencies are shown to be relatively stable, while variable-length character-strings characterizing samples separated by 3 years in time show close similarities.
Article
A set of programs is being developed for the purpose of producing printed indexes of chemical reactions from a simple reactant/product data base. A program is described which identifies functional group interconversion reactions, hydrogenations, and dehydrogenations in a data base containing structures encoded as Wiswesser Line Notations. These re...