Outcomes from Indexing Initiatives of Medical Imaging DICOM Metadata Repositories. A Secondary Analysis

Abstract

To optimize the use of the enormous amount of data that results from medical imaging studies (e.g., Digital Imaging and Communications in Medicine – DICOM – metadata), which is relevant for characterizing healthcare provision, it is pertinent to identify the main advantages and challenges related to the indexing of DICOM metadata for secondary analyses. In the study reported in this paper, the authors performed a secondary analysis of the results of research studies based on the indexing of DICOM metadata from the Picture Archiving and Communication Systems (PACS) of different healthcare facilities. The analysis followed two perspectives: i) advantages of indexing and analyzing DICOM metadata from the PACS of different healthcare facilities; and ii) challenges associated with indexing and managing large volumes of DICOM metadata. The studies analyzed revealed the potential of aggregating and consolidating huge amounts of DICOM metadata to characterize healthcare provision.
ScienceDirect
Available online at www.sciencedirect.com
Procedia Computer Science 138 (2018) 203–208
1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the scientific committee of the CENTERIS - International Conference on ENTERprise
Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and
Social Care Information Systems and Technologies.
10.1016/j.procs.2018.10.029
CENTERIS - International Conference on ENTERprise Information Systems /
ProjMAN - International Conference on Project MANagement / HCist - International
Conference on Health and Social Care Information Systems and Technologies,
CENTERIS/ProjMAN/HCist 2018
Milton Santos a,*, Nelson Pacheco Rocha b
a University of Aveiro, School of Health Sciences / IEETA, Campus Universitário de Santiago, Agras do Crasto, Edificio 30, 3810-193 Aveiro,
Portugal
b University of Aveiro, Medical Sciences Department / IEETA, Campus Universitário de Santiago, Agras do Crasto, Edificio 30, 3810-193
Aveiro, Portugal
Keywords: DICOM metadata, Big data, Medical imaging, PACS.
* Tel.: +351234401558; fax: +351234370089.
E-mail address: mrs@ua.pt
1. Introduction
Clinical environments currently host many different information systems, as well as imaging equipment of multiple modalities from different manufacturers, which together produce huge amounts of data [1].
In the medical imaging arena, the Digital Imaging and Communications in Medicine (DICOM) standard has been used in multiple scenarios [2,3,4], including Content-Based Image Retrieval initiatives [5]. In addition, DICOM metadata stored in medical image repositories can be a valuable source of information for characterizing professional practice, and can be used in efficiency metrics for productivity assessment [6] or to provide data for other initiatives, such as the development of medical imaging performance indicators [7,8] or biomedical research [9].
However, the enormous amount of data, as well as the diversity of informational environments, makes it difficult to access, index and use DICOM metadata [10,11]. This makes relevant the development of tools and methods for the extraction, processing, delivery and analysis of large volumes of DICOM metadata. In this scenario, tools like Dicoogle [12] can be important assets.
The objective of the study reported in this paper was to identify the potential and the challenges associated with indexing and analyzing the information contained in the DICOM headers of medical imaging studies stored in the Picture Archiving and Communication Systems (PACS) of different healthcare facilities. The Dicoogle application was used for this purpose.
2. Background
Interest in the potential of the tremendous amount of information produced daily in the hospital environment is not new. The reasons for this interest have changed over time, largely due to the constant differentiation of healthcare. Along with the information from Electronic Health Records (EHR), multidimensional data from genomic, transcriptomic, proteomic, metabolomic, and microbiomic measurements, as well as from medical imaging modalities, have been contributing to the Big Data phenomenon [13].
The management of large volumes of information is useful if good practices are adopted when using new technologies according to quality standards, combined with interoperable storage strategies and the adoption of advanced computing solutions for data analysis [14].
EHR data can be used for different purposes. However, extracting data from EHR systems for clinical and translational research is not a trivial process, which has promoted the development of multiple strategies and technological solutions over time [15].
Some of these solutions were developed to extract data from medical reports, including unstructured information, often supported by Natural Language Processing (NLP) [16,17]. NLP has been used in multiple scenarios, namely in the identification of cancer cases [18] and within medical imaging departments [19]. In the medical imaging environment, NLP can be useful to identify clinical syndromes and common biomedical concepts related to the clinical situation under investigation, namely through data capture from free-text radiology reports [20].
It is also possible to retrieve images based on radiology reports [21], as well as to store and retrieve preclinical molecular imaging data in PACS [22].
Also in the scope of collecting information stored in PACS, it is possible to index DICOM metadata from scattered and unstructured files [23]. In this context, Dicoogle [24] is a software framework that enables developers and researchers to quickly prototype and deploy new functionality, taking advantage of its embedded DICOM services. It supports two types of data indexing: hierarchical content indexing of the DICOM metadata (patient, study, series, image) and text content indexing (free-text query). The results of the indexing process are stored in an index file, on which the user can perform queries and extract the DICOM metadata of interest.
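The two kinds of indexing described above can be illustrated with a minimal sketch. It is not Dicoogle's implementation or API: the records, the function names and the in-memory structures are hypothetical, and the header fields are assumed to have been extracted already.

```python
from collections import defaultdict

# Hypothetical, already-extracted DICOM header fields for three images.
RECORDS = [
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "E1",
     "SOPInstanceUID": "I1", "Modality": "CT", "StudyDescription": "Head CT"},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "E1",
     "SOPInstanceUID": "I2", "Modality": "CT", "StudyDescription": "Head CT"},
    {"PatientID": "P2", "StudyInstanceUID": "S2", "SeriesInstanceUID": "E2",
     "SOPInstanceUID": "I3", "Modality": "MG",
     "StudyDescription": "Mammography screening"},
]


def build_hierarchy(records):
    """Hierarchical index: nest image identifiers under patient, study, series."""
    tree = {}
    for r in records:
        series = (tree.setdefault(r["PatientID"], {})
                      .setdefault(r["StudyInstanceUID"], {})
                      .setdefault(r["SeriesInstanceUID"], []))
        series.append(r["SOPInstanceUID"])
    return tree


def build_text_index(records):
    """Text index: map each lower-cased token of each value to the images holding it."""
    index = defaultdict(set)
    for r in records:
        for value in r.values():
            for token in str(value).lower().split():
                index[token].add(r["SOPInstanceUID"])
    return index


def search(index, query):
    """Free-text query: return the images that match every token of the query."""
    hits = [index.get(token, set()) for token in query.lower().split()]
    return sorted(set.intersection(*hits)) if hits else []


hierarchy = build_hierarchy(RECORDS)
text_index = build_text_index(RECORDS)
```

In this sketch, `hierarchy["P1"]["S1"]["E1"]` lists the images of one series, while `search(text_index, "head ct")` answers a free-text query without knowing which attribute holds the text.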
The development of new methods for the access, extraction and analysis of DICOM metadata [12,25], namely metadata belonging to medical imaging studies stored in multiple PACS [26], can be a complementary strategy for characterizing the practices of medical imaging departments, namely their medical imaging procedures [27].
3. Materials and Methods
A secondary analysis was performed on the results of a set of research studies based on the indexing of DICOM metadata from the PACS of different healthcare facilities, all of which used the Dicoogle application as the DICOM metadata indexing tool. The analysis followed two perspectives: i) advantages of indexing and analyzing DICOM metadata from PACS; and ii) challenges associated with indexing and managing large volumes of DICOM metadata.
4. Results
Eight research studies reporting the indexing of DICOM metadata from the PACS of four different healthcare facilities were analyzed [10,11,26,27,28,29,30,31]. The number of healthcare facilities covered by each study varied between one [28] and four [11].
The metadata indexing strategies began with the installation of Dicoogle on a personal computer [26,27,28,30] or on virtual machines belonging to the healthcare facilities [10,11,27,31]. The indexed metadata volume ranged between 250 GB [28] and 34.2 TB [11], which had an impact on indexing time, ranging from 16.5 h [28] to 86 days [11].
Some of the studies compared the DICOM metadata stored in two PACS [26,30]. Regarding the amount of data used, the studies were based either on all the metadata resulting from the indexing process [10,11,26,28,29] or on a subset of the initial DICOM metadata sample, defined according to the objectives of each study [28,30,31]. For example, in [31], only 26,233 images were selected from an initial sample of 1,274,927 images; in [30], from an initial sample of 69,041 patients with 210,582 studies and 351,248 images, a final sample of 1,757 patients was selected, and the metadata of 2,047 studies and 8,087 images were analyzed.
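The sample sizes above imply that the analyzed subsets were small fractions of what was indexed, roughly between 1% and 3% depending on the unit counted. A short worked calculation, using only the counts quoted above (the dictionary keys and the helper name are illustrative):

```python
# (initial sample, final sample) as reported for studies [31] and [30].
SAMPLES = {
    "[31] images": (1_274_927, 26_233),
    "[30] patients": (69_041, 1_757),
    "[30] studies": (210_582, 2_047),
    "[30] images": (351_248, 8_087),
}


def retained_fraction(initial, final):
    """Fraction of the initial sample kept for analysis."""
    return final / initial


FRACTIONS = {name: retained_fraction(*pair) for name, pair in SAMPLES.items()}

for name, fraction in FRACTIONS.items():
    print(f"{name}: {fraction:.2%} of the indexed sample was analyzed")
```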
4.1. Advantages Associated with the Indexing and Analysis of DICOM Metadata from PACS
Regarding the characterization of medical imaging stakeholders and populations using DICOM metadata, it is possible to identify patients and characterize them by age and gender [27,28,31] across different imaging modalities [28,30,31], and also to characterize healthcare provision [26,29]. Identifying the different medical imaging stakeholders can be significant for increasing the effectiveness and efficiency of procedures. The exchange of information between doctors and other professionals, as well as with patients, presupposes that they are correctly identified, namely to clarify the clinical situation that triggers the performance of imaging procedures. In addition, knowledge of the population that has undergone imaging studies may be useful for identifying its predisposition to a particular pathology or clinical situation, according to age, gender or clinical history.
Concerning the characterization of individual and population exposure, the DICOM metadata indexed from PACS seem to allow exposure analysis in multiple scenarios. In a patient-oriented approach, it is possible to characterize the patient's radiation exposure history [27] and to identify situations representing inappropriate exposure [28,30,31]. In fact, knowledge of the history of exposure to ionizing radiation may be useful for defining alternative strategies, such as favoring studies in modalities that do not require ionizing radiation (e.g., ultrasound or magnetic resonance studies).
In a population-oriented approach, it is possible to characterize the population with imaging studies performed in different modalities [26,28] and to analyze the average number of imaging studies performed per patient, age group and modality [27,30].
Considering the analysis of material and human resources, it seems possible to identify the professionals who carry out the imaging studies [29], the studies carried out on a specific piece of equipment [26,30], and the imaging studies carried out in different modalities and over time [10,26,28,30]. This enables the analysis of professional practice and can contribute to optimizing the use of human and material resources. In terms of performance analysis and optimization, it seems possible to identify contributions to the improvement of professional practice regarding the optimization of resource utilization [15,25], procedures [28,30,31] and individual and population radiation protection [27].
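The population-oriented analyses mentioned above (patients by gender and age group, and the average number of studies per patient in each modality) can be sketched as follows. The records, the ten-year age bands and the function names are assumptions made for illustration; the included studies do not specify how these aggregations were computed.

```python
from collections import defaultdict

# Hypothetical study-level records: (PatientID, PatientSex, age in years, Modality).
STUDIES = [
    ("P1", "F", 54, "MG"),
    ("P1", "F", 54, "MG"),
    ("P2", "M", 67, "CT"),
    ("P3", "F", 31, "CT"),
    ("P3", "F", 31, "CT"),
    ("P3", "F", 32, "CR"),
]


def age_group(age, width=10):
    """Label the age band that contains `age`, e.g. 54 -> '50-59'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def mean_studies_per_patient(studies):
    """Average number of studies per patient, computed separately per modality."""
    counts = defaultdict(lambda: defaultdict(int))  # modality -> patient -> count
    for patient_id, _sex, _age, modality in studies:
        counts[modality][patient_id] += 1
    return {m: sum(p.values()) / len(p) for m, p in counts.items()}


def patients_by_sex_and_age(studies):
    """Count distinct patients per (sex, age group), using the first study seen."""
    first_seen = {}
    for patient_id, sex, age, _modality in studies:
        first_seen.setdefault(patient_id, (sex, age_group(age)))
    groups = defaultdict(int)
    for key in first_seen.values():
        groups[key] += 1
    return dict(groups)


MEANS = mean_studies_per_patient(STUDIES)
GROUPS = patients_by_sex_and_age(STUDIES)
```

Counting patients rather than studies in `patients_by_sex_and_age` avoids over-representing patients with repeated examinations, which matters when the goal is to describe the population rather than the workload.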
In addition, the analysis of DICOM metadata can contribute to a better knowledge of information acquisition and storage needs, which can support better PACS information management, as well as a better knowledge of the infrastructure required to store the enormous amount of information produced by medical imaging modalities in daily practice [10], particularly modalities such as computed tomography and magnetic resonance imaging.
4.2. Challenges Associated with Indexing and Managing Large Volumes of Metadata
The use of Dicoogle in a clinical setting does not appear to limit PACS usage during daily activity. At two healthcare facilities, the metadata were indexed without significant constraints, even in situations where the PACS supplier gave no cooperation [25]. This scenario demonstrated, in a paradigmatic way, the potential and advantages of indexing metadata for DICOM data mining initiatives. The possibility of accessing DICOM metadata without resorting to the manufacturers of equipment and information systems, and without altering the stored data, can be an advantage for the analysis of metadata stored in different PACS, regardless of their geographic location.
In addition, the existence of information systems and equipment from multiple manufacturers did not constrain access to, or indexing of, the metadata [28]. Depending on the study, large amounts of metadata were indexed, from 265 GB [28] to 34.2 TB [11]. Indexing PACS of different sizes produced large index files [10,11], with indexing times from 16.5 h [28] to 86 days [11]. However, the adoption of efficient storage, merging, and information extraction strategies mitigated the computational and storage limitations associated with managing large amounts of data.
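As a rough plausibility check on the figures quoted in this section, the implied indexing throughput can be computed for the smallest and largest cases. The calculation assumes decimal units (1 TB = 1000 GB) and continuous, uninterrupted indexing, neither of which is stated in the source studies.

```python
def throughput_gb_per_hour(volume_gb, hours):
    """Average indexing throughput in GB per hour."""
    return volume_gb / hours


# Smallest case [28]: 265 GB indexed in 16.5 hours.
small_case = throughput_gb_per_hour(265, 16.5)

# Largest case [11]: 34.2 TB indexed in 86 days (assumes 1 TB = 1000 GB, 24 h per day).
large_case = throughput_gb_per_hour(34.2 * 1000, 86 * 24)

print(f"small PACS: {small_case:.1f} GB/h, large PACS: {large_case:.1f} GB/h")
```

Under these assumptions both cases come out at roughly 16 GB per hour, which would suggest that indexing time grew approximately in proportion to the indexed volume.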
In some of the included studies, it was necessary to merge the information resulting from the indexing of metadata stored on the different disks that make up the PACS [10,11,26], namely using a spin-off application of Dicoogle that supports metadata aggregation and normalization [11].
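The aggregation and normalization step can be pictured with the following minimal sketch. It does not reproduce the spin-off application used in [11], whose internals are not described here; the normalization rule (trimming and upper-casing text) and the use of SOPInstanceUID as the deduplication key are assumptions chosen for illustration.

```python
def normalize(record):
    """Trim and upper-case string values so that equivalent entries compare equal."""
    return {key: value.strip().upper() if isinstance(value, str) else value
            for key, value in record.items()}


def merge_indexes(*partial_indexes):
    """Merge per-disk record lists, keeping the first record seen per SOPInstanceUID."""
    merged = {}
    for partial in partial_indexes:
        for record in partial:
            clean = normalize(record)
            merged.setdefault(clean["SOPInstanceUID"], clean)
    return list(merged.values())


# Hypothetical index extracts from two disks of the same PACS.
disk_a = [{"SOPInstanceUID": "1.2.3", "InstitutionName": " Hospital A "}]
disk_b = [
    {"SOPInstanceUID": "1.2.3", "InstitutionName": "hospital a"},
    {"SOPInstanceUID": "1.2.4", "InstitutionName": "Hospital A"},
]

merged = merge_indexes(disk_a, disk_b)
```

Here the image indexed from both disks appears only once in `merged`, and the three spellings of the institution name collapse into a single value.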
The quality of the metadata seems to depend heavily on the DICOM attributes available in the different medical imaging modalities [10], as well as on how often they are used [10,28,29]. For example, the number of DICOM attributes publicly available for characterizing medical imaging stakeholders (e.g., institution, manufacturer, or healthcare professionals) can range from 20 in Computed Radiography to ten in the Radio Fluoroscopy modality [29]. This may limit healthcare delivery, namely the communication between different stakeholders (e.g., between physicians, other health professionals and patients) and the definition of the best imaging strategy to clarify the clinical problem to be resolved.
Moreover, some DICOM attributes, although used, contain unexpected values, namely the Patient ID and Referring Physician Name attributes [28,29] and the Institution Name attribute. As a result, the DICOM attributes that characterize patients or healthcare professionals have different completion levels [10,29].
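A completion level of this kind can be estimated as the share of records in which an attribute is present and non-empty. The sketch below uses hypothetical records and a hypothetical helper; it treats missing, empty and whitespace-only values as not completed, which is an assumption rather than the definition used in [10,29].

```python
def completion_levels(records, attributes):
    """Percentage of records in which each attribute is present and non-empty."""
    total = len(records)
    levels = {}
    for attribute in attributes:
        filled = sum(1 for r in records if str(r.get(attribute) or "").strip())
        levels[attribute] = 100.0 * filled / total if total else 0.0
    return levels


# Hypothetical header extracts with missing, empty and blank values.
RECORDS = [
    {"PatientID": "P1", "ReferringPhysicianName": "DOE^JANE",
     "InstitutionName": "Hospital A"},
    {"PatientID": "P2", "ReferringPhysicianName": "",
     "InstitutionName": "Hospital A"},
    {"PatientID": "P3", "InstitutionName": "  "},
    {"PatientID": "", "ReferringPhysicianName": "ROE^RICHARD",
     "InstitutionName": "Hospital A"},
]

LEVELS = completion_levels(
    RECORDS, ["PatientID", "ReferringPhysicianName", "InstitutionName"]
)
```

A check like this only detects absent values; unexpected but non-empty values, such as placeholder identifiers, would need attribute-specific validation rules.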
5. Conclusion
The research studies analyzed revealed the potential of using DICOM metadata, but also challenges associated with indexing and managing large volumes of metadata. The topics covered include the identification of medical imaging stakeholders, computational needs and information management, the quality of DICOM metadata, the analysis of individual and population exposure, and the use of human and material resources.
Furthermore, it was possible to adopt efficient mechanisms to acquire and process the DICOM metadata without affecting the performance of the PACS during daily activities. However, there are constraints in terms of the quality of the available DICOM metadata.
The possibility of acquiring information on the imaging studies carried out on a population, in a way that does not constrain the provision of healthcare, can be a valuable asset for the continuous improvement of medical imaging practice. In addition, the possibility of statistically analyzing DICOM metadata makes their use feasible and potentially advantageous in multiple scenarios, such as the definition of patient-centered care strategies, as well as medical translational and multidimensional research.
The studies included in the analysis reported in this paper do not provide evidence of the indexing of metadata embedded in DICOM Structured Reports (DICOM SR). This specific question will be the object of future research.
References
[1] Kansagra, Akash P., John Paul J. Yu, Arindam R. Chatterjee, Leon Lenchik, Daniel S. Chow, Adam B. Prater, and others. (2016) ‘Big Data and the Future of Radiology Informatics’, Academic Radiology, 23: 30–42.
[2] Marques Godinho, Tiago, Rui Lebre, Luís Bastião Silva, and Carlos Costa. (2017) ‘An Efficient Architecture to Support Digital Pathology in Standard Medical Imaging Repositories’, Journal of Biomedical Informatics, 71: 190–97.
[3] Singh, Rajendra, Lauren Chubb, Liron Pantanowitz, and Anil Parwani. (2011) ‘Standardization in Digital Pathology: Supplement 145 of the DICOM Standards’, Journal of Pathology Informatics, 2: 23.
[4] Pandit, Ravi R., and Michael V. Boland. (2015) ‘Impact of Digital Imaging and Communications in Medicine Workflow on the Integration of Patient Demographics and Ophthalmic Test Data’, Ophthalmology, 122: 227–32.
[5] Akgül, Ceyhun Burak, Daniel L. Rubin, Sandy Napel, Christopher F. Beaulieu, Hayit Greenspan, and Burak Acar. (2011) ‘Content-Based Image Retrieval in Radiology: Current Status and Future Directions’, Journal of Digital Imaging, 24: 208–22.
[6] Hu, Mengqi, William Pavlicek, Patrick T. Liu, Muhong Zhang, Steve G. Langer, Shanshan Wang, and others. (2011) ‘Informatics in Radiology: Efficiency Metrics for Imaging Device Productivity’, Radiographics, 31: 603–16.
[7] Rubin, Daniel L. (2011) ‘Informatics in Radiology: Measuring and Improving Quality in Radiology: Meeting the Challenge with Informatics’, Radiographics, 31: 1511–27.
[8] Prieto, C., E. Vano, J. I. Ten, J. M. Fernandez, A. I. Iñiguez, N. Arevalo, and others. (2009) ‘Image Retake Analysis in Digital Radiography Using DICOM Header Information’, Journal of Digital Imaging, 22: 393–99.
[9] Freymann, John, Justin Kirby, John Perry, David Clunie, and C. Jaffe. (2012) ‘Image Data Sharing for Biomedical Research – Meeting HIPAA Requirements for De-Identification’, Journal of Digital Imaging, 25: 14–24.
[10] Santos, Milton, Luis Bastião, Nuno Neves, Dulce Francisco, Augusto Silva, and Nelson Pacheco Rocha. (2015) ‘DICOM Metadata Access, Consolidation and Usage in Radiology Department Performance Analysis. A Non-Proprietary Approach’, Procedia Computer Science, 64: 651–58.
[11] Santos, Milton, João Pavão, Tiago Godinho, and Nelson Rocha. (2017) ‘DICOM Metadata Aggregation from Multiple Healthcare Facilities’, in ENBENG 2017 - 5th Portuguese Meeting on Bioengineering, Proceedings.
[12] Valente, Frederico, Luís A. Bastião Silva, Tiago Marques Godinho, and Carlos Costa. (2016) ‘Anatomy of an Extensible Open Source PACS’, Journal of Digital Imaging, 29: 284–96.
[13] Meldolesi, Elisa, Johan van Soest, Andrea Damiani, Andre Dekker, Anna Rita Alitto, Maura Campitelli, and others. (2015) ‘Standardized Data Collection to Build Prediction Models in Oncology: A Prototype for Rectal Cancer’, Future Oncology, 12: 119–36.
[14] Auffray, Charles, Rudi Balling, Inês Barroso, László Bencze, Mikael Benson, Jay Bergeron, and others. (2016) ‘Making Sense of Big Data in Health Research: Towards an EU Action Plan’, Genome Medicine, 8: 71.
[15] Wang, Yanshan, Liwei Wang, Majid Rastegar-Mojarad, Sungrim Moon, Feichen Shen, Naveed Afzal, and others. (2018) ‘Clinical Information Extraction Applications: A Literature Review’, Journal of Biomedical Informatics, 77: 34–49.
[16] Dreyer, Keith J., Mannudeep K. Kalra, Michael M. Maher, Autumn M. Hurier, Benjamin A. Asfaw, Thomas Schultz, and others. (2005) ‘Application of Recently Developed Computer Algorithm for Automatic Classification of Unstructured Radiology Reports: Validation Study’, Radiology, 234: 323–29.
[17] Lacson, Ronilda, Katherine P. Andriole, Luciano M. Prevedello, and Ramin Khorasani. (2012) ‘Information from Searching Content with an Ontology-Utilizing Toolkit (ISCOUT)’, Journal of Digital Imaging, 25: 512–19.
[18] Yetisgen M, Harris WP, Kwan SW. (2016) ‘Natural Language Processing in Oncology: A Review’, JAMA Oncol., 2: 797–804.
[19] Pons, Ewoud, Loes M. M. Braun, M. G. Myriam Hunink, and Jan A. Kors. (2016) ‘Natural Language Processing in Radiology: A Systematic Review’, Radiology, 279: 329–43.
[20] Flynn, Robert, Thomas M. Macdonald, Nicola Schembri, Gordon D. Murray, and Alexander S. F. Doney. (2010) ‘Automated Data Capture from Free-text Radiology Reports to Enhance Accuracy of Hospital Inpatient Stroke Codes’, Pharmacoepidemiology and Drug Safety, 19: 843–47.
[21] Gerstmair, Axel, Philipp Daumke, Kai Simon, Mathias Langer, and Elmar Kotter. (2012) ‘Intelligent Image Retrieval Based on Radiology Reports’, European Radiology, 22: 2750–58.
[22] Lee, Jasper, Yue Liu, and Brent Liu. (2011) ‘A Solution for Archiving and Retrieving Preclinical Molecular Imaging Data in PACS Using a DICOM Gateway’, Proc. of SPIE.
[23] Costa, Carlos, Filipe Freitas, Marco Pereira, Augusto Silva, and José Oliveira. (2009) ‘Indexing and Retrieving DICOM Data in Disperse and Unstructured Archives’, International Journal of Computer Assisted Radiology and Surgery, 4: 71–77.
[24] Costa, Carlos, Carlos Ferreira, Luís Bastião, Luís Ribeiro, Augusto Silva, and José Luís Oliveira. (2011) ‘Dicoogle - an Open Source Peer-to-Peer PACS’, Journal of Digital Imaging, 24: 848–56.
[25] Wang, Shanshan, William Pavlicek, Catherine C. Roberts, Steve G. Langer, Muhong Zhang, Mengqi Hu, and others. (2011) ‘An Automated DICOM Database Capable of Arbitrary Data Mining (Including Radiation Dose Indicators) for Quality Monitoring’, Journal of Digital Imaging, 24: 223–33.
[26] Santos, Milton, Silvia de Francesco, Luis Bastião, Augusto Silva, Carlos Costa, and N. Rocha. (2013) ‘Multi Vendor DICOM Metadata Access: A Multi-Site Hospital Approach Using Dicoogle’, in Information Systems and Technologies (CISTI), 2013 8th Iberian Conference On, pp. 1–7.
[27] Santos, Milton, Luis Bastião, Augusto Silva, and Nelson Rocha. (2016) ‘DICOM Metadata Analysis for Population Characterization: A Feasibility Study’, Procedia Computer Science, 100: 355–61.
[28] Santos, Milton, Luis Bastião, Carlos Costa, Augusto Silva, and Nelson Rocha. (2011) ‘DICOM and Clinical Data Mining in a Small Hospital PACS: A Pilot Study’, in Communications in Computer and Information Science, ed. by Maria Manuela Cruz-Cunha, João Varajão, Philip Powell, and Ricardo Martinho, Springer Berlin Heidelberg, 221: 254–63.
[29] Santos, Milton, Augusto Silva, and Nelson Rocha. (2017) ‘Characterization of the Stakeholders of Medical Imaging Based on an Image Repository’, in Recent Advances in Information Systems and Technologies: Volume 2, ed. by Álvaro Rocha, Ana Maria Correia, Hojjat Adeli, Luís Paulo Reis, and Sandra Costanzo. Springer International Publishing. DLXX, 805–14.
[30] Santos, Milton, Pedro Couto, Nelson Rocha, and Augusto Silva. (2014) ‘DICOM Metadata-Mining in PACS for Computed Radiography X-Ray Exposure Analysis. A Mammography Multisite Study’, in European Congress of Radiology, Insights Imaging, p. B : Scient.
[31] Santos, Milton, Luis Bastião, Nelson Rocha, and Augusto Silva. (2017) ‘A Multi-Site Head CT Topogram Acquisition Protocol Analysis Based on DICOM Metadata’, in European Congress of Radiology, Insights Imaging, p. B : Scient.