
A large-scale solar image dataset with labeled event regions

September 2013


DOI:10.1109/ICIP.2013.6738896

Conference: International Conference on Image Processing (ICIP) (2013)

Authors:

Michael Schuh

Georgia State University

Rafal A. Angryk

Georgia State University

Karthik Ganesan Pillai

Cerner Corporation

Juan M. Banda

Stanford University




A LARGE-SCALE SOLAR IMAGE DATASET WITH LABELED EVENT REGIONS

Michael A. Schuh†, Rafal A. Angryk†, Karthik Ganesan Pillai†, Juan M. Banda†, Petrus C. Martens‡⋆

† Dept. Computer Science, Montana State University, Bozeman, MT 59717, USA

‡ Dept. Physics, Montana State University, Bozeman, MT 59717, USA

⋆ Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138, USA

ABSTRACT

This paper introduces a new public benchmark dataset of solar image data from the Solar Dynamics Observatory (SDO) mission. This is the first release, which contains over 15,000 images and nearly 24,000 solar events, spanning the first six months of 2012. It combines region-based event labels from six automated detection modules, ten pre-computed image parameters for each cell over a grid-based segmentation of the full resolution images, and a lower resolution version of the images for further analysis and visualization. Together, these components serve as a standardized, ready-to-use solar image dataset for general image processing research, without requiring the background knowledge otherwise needed to properly prepare it. We present here the fundamental dataset creation details and outline future improvements and opportunities as data collection continues for the coming years.

Index Terms: computer vision, image processing, data mining, machine learning, dataset benchmark

1. INTRODUCTION

The Solar Dynamics Observatory (SDO) mission has ushered in the era of big data for solar physics. Capturing over 70,000 high-resolution images of the Sun per day, NASA’s SDO mission will produce more data than all previous solar data archives combined [1]. This overwhelming amount of data is impossible to analyze manually, image-by-image, as was common practice in earlier years. Through necessity, automated analysis is becoming the norm, utilizing algorithms from computer vision, image processing, machine learning, and more. As a result, large-scale solar data analysis is fast emerging as a novel interdisciplinary research area with ample opportunities for a wide variety of research interests.

The dataset presented here (available at http://dmlab.cs.montana.edu/solar/data/) combines the metadata of several automated analysis modules that run continuously in a dedicated data pipeline. We use these metadata catalogs to prune the massive data archive to a more usable form for researchers interested in a hassle-free scientific image dataset intended for single-image, region-based event recognition.

Fig. 1. An example SDO image with HEK labeled event regions.

The first release covers a six-month period from January 1, 2012 to July 1, 2012, and contains over 15,000 images in two separate (but visually similar) wavebands and almost 24,000 event instances of six different event types. An example image with matching event instances is shown in Figure 1.

Along with public access and free use, a benchmark dataset such as this can offer value to potential researchers. Given the complexity of the image data and event labels, this brand new real-life dataset presents many opportunities (and challenges) for new knowledge discovery from big data in solar physics. We also pose several initial questions to explore with the dataset, motivating research in classification and clustering, image similarity indexing and retrieval, region-based object detection, and frequent pattern discovery.

It is our hope to establish an interesting, reputable, and freely available dataset of practical use for the community. As data continues to be collected and analyzed over the coming years, we plan to release updated and expanded versions and alternative variations of the growing dataset based on research directions and community feedback.

2. BACKGROUND

Large-scale image datasets are steadily growing in availability and variety thanks to modern technology. This can be seen in all facets of life, from a personal level, with the popular crowd-sourced Flickr dataset, to a national level, such as satellite imagery, and sometimes even on an international level, such as the SDO mission upon which our dataset is founded.

Benchmark datasets are crucially important to allow unbiased comparisons of independent research and development efforts across entire communities. They standardize data that is often otherwise inexactly reproduced, causing unfavorable variability in published results. At best, describing such data preparation takes up valuable space in publications; at worst, it goes altogether undocumented.

Increasingly, novel research is encouraged to present results on highly popular datasets supported by the community, which makes it easier to validate the novelty of results and claims. Exemplary datasets have hundreds of academic citations, such as the medical ImageCLEF datasets [2] and the natural scene PASCAL Visual Object Classes (VOC) datasets [3]. All of these datasets offer unique characteristics regarding the source and context of the images and labels, and this solar image dataset is no exception.

Launched on February 11, 2010, the SDO mission is the first mission of NASA’s Living With a Star (LWS) program, a long-term project dedicated to studying aspects of the Sun that significantly affect human life, with the goal of eventually developing a scientific understanding sufficient for prediction [4]. The SDO is a 3-axis stabilized spacecraft in geosynchronous orbit designed to continuously capture full-disk images of the Sun [5]. It contains three independent instruments (AIA, HMI, and EVE), but our dataset is currently only from the Atmospheric Imaging Assembly (AIA), which captures images in ten separate wavebands across the ultraviolet and extreme ultraviolet spectrum, selected to highlight specific elements of solar activity [6].

An international consortium of independent groups, named the SDO Feature Finding Team (FFT), was selected by NASA to produce a comprehensive set of automated feature recognition modules [1]. The SDO FFT modules operate through the SDO Event Detection System (EDS) at the Joint Science Operations Center (JSOC) of Stanford and Lockheed Martin Solar and Astrophysics Laboratory (LMSAL), as well as the Harvard-Smithsonian Center for Astrophysics (CfA), and NASA’s Goddard Space Flight Center (GSFC). Some modules are provided with specialized access to the raw data pipeline for stream-like data analysis and event detection. Even though data is made publicly accessible in a timely fashion, the overall size means that only a small window of data is available for on-demand access, while tapes provide long-term archival storage.

As one of the 16 SDO FFT modules (http://solar.physics.montana.edu/sol_phys/fft/), our interdisciplinary research group at Montana State University (MSU) is building a “Trainable Module” for use in the first ever Content-Based Image Retrieval (CBIR) system for solar images, now online and publicly available. We operate at a static six-minute cadence in the data pipeline on all 10 AIA wavelengths. Each 4096 × 4096 pixel image is segmented by a fixed-size grid, independent of any dynamic characteristics of the specific image due to long-term, real-time, stream-processing constraints. The 64 × 64 grid creates 4096 cells per image, and our 10 image parameters (listed in Table 1) are calculated for each cell. This results in roughly 240 images (nearly one million image cells with 10 parameter values each) per waveband per day, or over 800,000 images per year across all 10 wavebands.

Label   Name                Equation
P1      Entropy             $E = -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i)$
P2      Mean                $m = \frac{1}{L} \sum_{i=0}^{L-1} z_i$
P3      Std. Deviation      $\sigma = \sqrt{\frac{1}{L} \sum_{i=0}^{L-1} (z_i - m)^2}$
P4      Fractal Dim.        $D_0 = \lim_{\epsilon \to 0} \frac{\log N(\epsilon)}{\log (1/\epsilon)}$
P5      Skewness            $\mu_3 = \sum_{i=0}^{L-1} (z_i - m)^3 p(z_i)$
P6      Kurtosis            $\mu_4 = \sum_{i=0}^{L-1} (z_i - m)^4 p(z_i)$
P7      Uniformity          $U = \sum_{i=0}^{L-1} p^2(z_i)$
P8      Rel. Smoothness     $R = 1 - \frac{1}{1 + \sigma^2(z)}$
P9      T. Contrast         see Tamura [7]
P10     T. Directionality   see Tamura [7]

Table 1. Image parameters, where $L$ is the number of pixels in the cell, $z_i$ is the $i$-th pixel value, $m$ is the mean, and $p(z_i)$ is the grayscale histogram representation of $z$ at $i$. The fractal dimension is calculated with the box-counting method, where $N(\epsilon)$ is the number of boxes of side length $\epsilon$ required to cover the image cell.
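The fixed-grid segmentation and the histogram-based parameters of Table 1 can be illustrated with a short sketch. This is not the pipeline code used to build the dataset: the function names are hypothetical, the histogram sums are assumed to run over the distinct gray levels present in a cell, and the fractal dimension and the two Tamura parameters (P4, P9, P10) are left out for brevity.

```python
import math
from collections import Counter

# Throughput arithmetic from the text (six-minute cadence, 64 x 64 grid).
IMAGES_PER_DAY = 24 * 60 // 6               # 240 images per waveband per day
CELLS_PER_DAY = IMAGES_PER_DAY * 64 * 64    # 983,040 cells per waveband per day
IMAGES_PER_YEAR = IMAGES_PER_DAY * 10 * 365  # 876,000 images over 10 wavebands


def split_into_cells(image, grid=64):
    """Split a square 2-D image (list of rows) into grid x grid equal cells.

    Returns a dict mapping (row, col) of the cell to its flattened pixel list.
    """
    size = len(image)
    if size % grid != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square and divisible by the grid size")
    step = size // grid  # e.g. 4096 // 64 = 64 pixels per cell side
    cells = {}
    for r in range(grid):
        for c in range(grid):
            cells[(r, c)] = [
                image[y][x]
                for y in range(r * step, (r + 1) * step)
                for x in range(c * step, (c + 1) * step)
            ]
    return cells


def cell_parameters(pixels):
    """Compute P1-P3 and P5-P8 of Table 1 for one cell of pixel values."""
    n = len(pixels)
    # Normalized grayscale histogram p(z) over the levels present in the cell.
    p = {z: count / n for z, count in Counter(pixels).items()}
    m = sum(pixels) / n
    variance = sum((z - m) ** 2 for z in pixels) / n
    return {
        "entropy": -sum(pz * math.log2(pz) for pz in p.values()),
        "mean": m,
        "std_dev": math.sqrt(variance),
        "skewness": sum((z - m) ** 3 * pz for z, pz in p.items()),
        "kurtosis": sum((z - m) ** 4 * pz for z, pz in p.items()),
        "uniformity": sum(pz ** 2 for pz in p.values()),
        "rel_smoothness": 1 - 1 / (1 + variance),
    }
```

Calling `split_into_cells` on a 4096 × 4096 image with the default grid yields 4096 cells of 64 × 64 pixels each; an array library would be the natural choice at that size, and plain lists are used here only to keep the sketch dependency-free.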

In previous work, we evaluated a variety of possible image parameters to extract from the solar images. Given the volume and velocity of the data stream, the best ten parameters were chosen based on not only their classification accuracy, but also their processing time [8, 9]. Preliminary event classification was performed on a limited set of human-labeled partial-disk images from the TRACE mission [10] to determine which image parameters best represented the phenomena [11, 12]. A later investigation of solar filament classification in H-alpha images from the Big Bear Solar Observatory (BBSO) showed similar success, even with noisy region labels and a small subset of our ten image parameters [13].

3. THE DATA

Here we discuss in detail the steps taken to create this dataset. While this is beneficial for reproducibility, our intent is more so to give researchers confidence in our data curation methodologies and decisions. It also provides the reader with practical knowledge of working with the several underlying large-scale data repositories.

The solar CBIR system is available at http://cbsir.cs.montana.edu/sdocbir

Fig. 2. Example heatmap plots of an extracted image (in 64 × 64 cells) for each of our ten image parameters.

3.1. Collection

All data comes from SDO FFT modules, either already available in-house at MSU or through the Heliophysics Event Knowledgebase (HEK), which is a centralized archive of metadata accessible online [14]. The HEK is an all-encompassing, cross-mission metadata repository of solar event reports and related information. This metadata can be downloaded manually through the official web interface (http://www.lmsal.com/isolsearch), but after finding several limitations for large-scale event retrieval, we instead developed our own open source and publicly available software application named “Query HEK”, or simply QHEK (available at http://dmlab.cs.montana.edu/qhek).

We retrieved only event reports from automated SDO FFT modules for six types of solar events: active region (AR), coronal hole (CH), filament (FI), flare (FL), sigmoid (SG), and sunspot (SS). These specific events were chosen for several reasons. From a science perspective, all of these events are identifiable in static images, without requiring spatial or temporal features. These events are also, generally speaking, traditionally well-studied and frequently occurring, at least enough so that a dedicated module was created for the sole purpose of detecting such solar phenomena. From a practical perspective, this meant a larger possible set of reported event instances from well-known and well-performing automated modules that never waver or tire, unlike graduate students.

A summary of the event types can be found in Table 2, which states the number of event instances and the reported waveband. The dataset contains images in the 131 Å and 193 Å wavebands that match any unique event timestamp. So the 23,517 total events represent 47,034 labels (each applied twice, once per waveband), but only 17,785 are true labels: AR, CH, FL, and SG events in their given wave. Note that FL and SG do not include event outlines, or chain codes (CC), and that FI and SS are reported from entirely different instrumentation. These events are included because of their abundance and importance to solar physics, and for the potential of novel knowledge discovery from data. Future dataset releases will likely include images from other wavebands and instruments.

Event   Name            Reported   CC    Instances
AR      Active Region   193 Å      Yes   7108
CH      Coronal Hole    193 Å      Yes   4702
FI      Filament        H-alpha    Yes   4218
FL      Flare           131 Å      No    4316
SG      Sigmoid         131 Å      No    1659
SS      Sunspot         HMI        Yes   1514

Table 2. A summary of the different event types in the dataset, where CC denotes having a detailed event boundary outline.
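The label counts quoted in the text follow directly from the instance counts in Table 2. The short check below reproduces them; the variable names are illustrative only.

```python
# Instance counts per event type, copied from Table 2.
instances = {"AR": 7108, "CH": 4702, "FI": 4218, "FL": 4316, "SG": 1659, "SS": 1514}

# Event types reported in one of the two wavebands included in the dataset
# (193 or 131 Angstrom); FI and SS come from other instrumentation.
reported_in_dataset_waves = {"AR", "CH", "FL", "SG"}

total_events = sum(instances.values())   # 23,517 events
total_labels = 2 * total_events          # 47,034 labels, one per waveband
true_labels = sum(
    count for event, count in instances.items()
    if event in reported_in_dataset_waves
)                                        # 17,785 labels in their reported wave
```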

3.2. Transformation

We first had to standardize attributes across all event types due to independent reporting styles, such as the wavelength attribute values. For simplicity, we then discarded event instances reported in other waves, retaining the majority of total instances while using only the two most popular waves.

Three spatial attributes define the event location on the solar disk: the center point, the minimum bounding rectangle (MBR), and the chain code, of which only the first two are required. These attributes are given as geometric object strings, encapsulated by the words “POINT()” and “POLYGON()”, where a point contains a single (x, y) pair of pixel coordinates and a polygon contains any number of pairs listed sequentially, e.g., $(x_1, y_1, x_2, y_2, \ldots, x_n, y_n)$. When the polygon is used for the MBR attribute, it always contains five vertices, where the first and last are identical, while a polygonal chain code can contain any number of points. We convert all spatial attributes from the helioprojective cartesian (HPC) coordinate system to pixel-based coordinates based on image-specific metadata [15, 16]. This process removes the need for any further spatial positioning transformations.
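As an illustration of the two steps described above, the sketch below parses a geometry string into coordinate pairs and applies a linear HPC-to-pixel mapping. It is an assumption-laden example rather than the conversion used for the dataset: the function names are hypothetical, the exact delimiters inside the strings are not specified in the text (so any numeric tokens are accepted), and the mapping assumes the usual FITS reference-pixel, reference-value, and plate-scale header values (CRPIX, CRVAL, CDELT) with no image rotation or axis flip.

```python
import re

# Matches signed integers and decimals, with an optional exponent.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_geometry(text):
    """Return the (x, y) pairs found inside a POINT(...) or POLYGON(...) string."""
    values = [float(v) for v in _NUMBER.findall(text)]
    if not values or len(values) % 2:
        raise ValueError(f"expected (x, y) pairs in {text!r}")
    return list(zip(values[0::2], values[1::2]))


def hpc_to_pixel(x_arcsec, y_arcsec, crpix1, crpix2, cdelt1, cdelt2,
                 crval1=0.0, crval2=0.0):
    """Map helioprojective cartesian arcseconds to FITS-style pixel coordinates.

    Assumes a linear, unrotated image: crpix* is the reference pixel,
    crval* its HPC coordinate, and cdelt* the plate scale in arcsec/pixel.
    """
    x_pix = crpix1 + (x_arcsec - crval1) / cdelt1
    y_pix = crpix2 + (y_arcsec - crval2) / cdelt2
    return x_pix, y_pix


def polygon_to_pixels(text, **header):
    """Convert every vertex of a geometry string with the same header values."""
    return [hpc_to_pixel(x, y, **header) for x, y in parse_geometry(text)]
```

For example, `polygon_to_pixels("POLYGON((0 0,4 0,4 2,0 2,0 0))", crpix1=2048.5, crpix2=2048.5, cdelt1=0.5, cdelt2=0.5)` returns five pixel vertices whose first and last entries coincide, matching the MBR convention described in the text. The header values here are illustrative, not taken from a real image.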

For each event instance, we find the midpoint of the event’s duration and round to the nearest minute. We then record all events that cover each unique time, and find the nearest image for each wave. This is combined to form a list of events for each unique image in each wave. We note that this is a many-to-many relationship, i.e., an image may have many associated events, and an event may be associated with many images. This means a single event instance might derive more than just two labels (one per wave as previously stated). Also, because we round times to the nearest minute, we buffer the event durations by ±2 minutes so instantaneous events are not lost when matching times.
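The matching procedure above can be sketched for a single waveband as follows. This is one possible reading of the description, not the authors' code: the function names and the in-memory data layout (a dict of event IDs to start and end times, and a sorted, non-empty list of image timestamps) are assumptions made for the example.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def round_to_minute(t):
    """Round a datetime to the nearest minute (30 seconds rounds up)."""
    floored = t.replace(second=0, microsecond=0)
    if t - floored >= timedelta(seconds=30):
        return floored + timedelta(minutes=1)
    return floored


def event_midpoint(start, end):
    """Midpoint of an event's duration, rounded to the nearest minute."""
    return round_to_minute(start + (end - start) / 2)


def nearest_image(image_times, t):
    """Return the timestamp in the sorted list image_times closest to t."""
    i = bisect_left(image_times, t)
    candidates = image_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda c: abs(c - t))


def match_events_to_images(events, image_times, buffer=timedelta(minutes=2)):
    """Build the image -> event ID list mapping for one waveband.

    events maps an event ID to its (start, end) datetimes.
    """
    unique_times = {event_midpoint(s, e) for s, e in events.values()}
    matches = {}
    for t in sorted(unique_times):
        # All events whose buffered duration covers this unique time.
        covering = [eid for eid, (s, e) in events.items()
                    if s - buffer <= t <= e + buffer]
        if covering:
            image = nearest_image(image_times, t)
            matches.setdefault(image, set()).update(covering)
    return {image: sorted(ids) for image, ids in matches.items()}
```

The buffer matters for instantaneous events: an event reported at 00:06:40 has a rounded midpoint of 00:07, which falls outside its own zero-length duration unless the duration is widened.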

Fig. 3. Frequency of reports for all six event types over 90 days.

3.3. Formats

The dataset is released as raw text and image files. All events can be found in the events.csv file, with the first line as the column headers. Thumbnail image and parameter data files end in th.png and .txt respectively. Labels are provided in the labels.txt file, which contains one line for each image in the dataset: the first value is the image file name, followed by a list of event IDs matched to the image. The entire dataset is available online (http://dmlab.cs.montana.edu/solar/data/) in a lossless compressed archive. Alternative formats available include SQL files for database usage and direct image cell feature vectors (as ARFF files for Weka [17]). Variations of the dataset will also be available, such as a class-balanced dataset for standard use by the community.
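A reader for the two text files might look like the sketch below. Only the layout stated above is taken from the text (a header line in events.csv, and one line per image in labels.txt with the file name first). The field delimiter in labels.txt, the name of the ID column, and all sample values are assumptions; the function names are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict


def read_events(handle, id_column):
    """Index the rows of events.csv by the given ID column (name assumed)."""
    return {row[id_column]: row for row in csv.DictReader(handle)}


def read_labels(handle):
    """Map each image file name to the list of event IDs matched to it.

    Fields are split on commas or whitespace, since the delimiter is not
    specified in the text.
    """
    labels = {}
    for line in handle:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if fields:
            labels[fields[0]] = fields[1:]
    return labels


def images_per_event(labels):
    """Invert the mapping to expose the many-to-many relationship."""
    inverse = defaultdict(list)
    for image, event_ids in labels.items():
        for event_id in event_ids:
            inverse[event_id].append(image)
    return dict(inverse)


# Tiny in-memory stand-ins for the real files (made-up values).
demo_events = io.StringIO("event_id,event_type\nE1,AR\nE2,CH\n")
demo_labels = io.StringIO("image_a E1 E2\nimage_b E1\n")
events = read_events(demo_events, id_column="event_id")
labels = read_labels(demo_labels)
by_event = images_per_event(labels)
```

With the real files, the `io.StringIO` objects would be replaced by `open("events.csv", newline="")` and `open("labels.txt")`, after checking the actual header names and delimiter.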

3.4. Uses

There are a variety of directions and domains in which this dataset can facilitate research. At MSU, we use these data products to deliver and better develop our solar CBIR system [18]. It is also useful for benchmarking classification [13] and indexing effectiveness [19], but plenty of other interesting applications in image processing, machine learning, and data mining are possible.

Additionally, several science-related questions can now be investigated through exploratory analysis as beneficial introductory work with this dataset. For example:

• What combination of image parameters (and algorithms) works best for which types of events?

• How well can certain types of events be recognized in images they were not reported in (e.g., FI and SS)?

• Are there any event interdependencies in multi-label regions? Can this extend to spatial relationships (overlap, envelope, neighbor, etc.)?

• Quiet Sun analysis: is there really such a thing? Investigating regions of low/no event activity, much like an event type of its own.

4. CONCLUSION AND FUTURE WORK

This paper introduced the first version of a new large-scale solar image dataset, featuring full-disk images of the Sun, pre-computed grid-based image cell signatures, and multi-class region-based event labels. As an extension of previous work [13], an upcoming publication using this dataset includes a detailed statistical analysis of image parameters and events, as well as basic data mining and machine learning of multi-event regions. In related work, this dataset is also being extended for use as an “event tracking” dataset, introducing dependencies on spatial and temporal attributes and exploring the possibility of mining spatio-temporal co-occurrence patterns (STCOPs) in solar physics [20, 21].

By introducing this ready-to-use dataset to the public, we hope to interest more researchers from various backgrounds (computer vision, machine learning, data mining, etc.) in the domain of solar physics, further bridging the gap between many interdisciplinary and mutually beneficial research domains. In the future, we plan to extend the dataset with: (1) a longer timeframe of up-to-date and labeled data, (2) more observations from other instruments onboard SDO and elsewhere, and (3) more types of events and additional event-specific attributes for extended analysis of event “sub-type” characteristics. Further information and news about dataset updates and uses will be maintained online with the dataset for timely dissemination. We welcome and encourage the community to provide feedback about the dataset, including ideas for alternative formats and future improvements.

5. ACKNOWLEDGEMENTS

This work was supported in part by two NASA Grant Awards: (1) No. NNX09AB03G, and (2) No. NNX11AM13A.

6. REFERENCES

[1] P. C. H. Martens, G. D. R. Attrill, A. R. Davey, A. Engell, S. Farid, P. C. Grigis, et al., “Computer vision for the Solar Dynamics Observatory (SDO),” Solar Physics, Jan 2011.

[2] W. Hersh, H. Müller, and J. Kalpathy-Cramer, “The ImageCLEFmed medical image retrieval task test collection,” Journal of Digital Imaging, vol. 22, pp. 648–655, 2009.

[3] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, pp. 303–338, 2010.

[4] G. L. Withbroe, “Living With a Star,” in AAS/Solar Physics Division Meeting #31, vol. 32 of Bulletin of the American Astronomical Society, p. 839, May 2000.

[5] W. Pesnell, B. Thompson, and P. Chamberlin, “The Solar Dynamics Observatory (SDO),” Solar Physics, vol. 275, pp. 3–15, 2012.

[6] J. Lemen, A. Title, D. Akin, P. Boerner, C. Chou, et al., “The Atmospheric Imaging Assembly (AIA) on the Solar Dynamics Observatory (SDO),” Solar Physics, vol. 275, pp. 17–40, 2012.

[7] H. Tamura, S. Mori, and T. Yamawaki, “Texture features corresponding to visual perception,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 8, no. 6, pp. 460–472, 1978.

[8] J. M. Banda and R. A. Angryk, “Selection of image parameters as the first step towards creating a CBIR system for the Solar Dynamics Observatory,” in International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 528–534, 2010.

[9] J. M. Banda and R. A. Angryk, “An experimental evaluation of popular image parameters for monochromatic solar image categorization,” in The 23rd Florida Artificial Intelligence Research Society Conference (FLAIRS), pp. 380–385, 2010.

[10] B. Handy, L. Acton, C. Kankelborg, C. Wolfson, D. Akin, et al., “The Transition Region and Coronal Explorer,” Solar Physics, vol. 187, pp. 229–260, 1999.

[11] J. M. Banda, R. A. Angryk, and P. C. H. Martens, “On the surprisingly accurate transfer of image parameters between medical and solar images,” in 18th IEEE Int. Conf. on Image Processing (ICIP), pp. 3669–3672, 2011.

[12] J. M. Banda, R. A. Angryk, and P. C. H. Martens, “Steps toward a large-scale solar image data analysis to differentiate solar phenomena,” Solar Physics, pp. 1–28, 2013.

[13] M. A. Schuh, J. M. Banda, P. N. Bernasconi, R. A. Angryk, and P. C. H. Martens, “A comparative evaluation of automated solar filament detection,” under review, 2013.

[14] N. Hurlburt, M. Cheung, C. Schrijver, L. Chang, S. Freeland, et al., “Heliophysics Event Knowledgebase for Solar Dynamics Observatory (SDO) and beyond,” Solar Physics, 2010.

[15] W. Thompson, “Coordinate systems for solar image data,” Astronomy and Astrophysics, vol. 449, no. 2, pp. 791–803, 2006.

[16] W. D. Pence, “CFITSIO, v2.0: A new full-featured data interface,” in Astronomical Data Analysis Software and Systems, (California), 1999.

[17] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten, “The WEKA data mining software: An update,” SIGKDD, 2009.

[18] J. M. Banda, M. A. Schuh, T. Wylie, P. McInerney, and R. A. Angryk, “When too similar is bad: A practical example of the Solar Dynamics Observatory content-based image-retrieval system,” in 17th East-European Conf. on Advances in Databases and Information Systems (ADBIS), 2013.

[19] M. A. Schuh, T. Wylie, and R. A. Angryk, “Improving performance of high-dimensional kNN retrieval through localized dataspace segmentation and hybrid indexing,” in 17th East-European Conf. on Advances in Databases and Information Systems (ADBIS), 2013.

[20] K. G. Pillai, R. A. Angryk, J. M. Banda, M. A. Schuh, and T. Wylie, “Spatio-temporal co-occurrence pattern mining in data sets with evolving regions,” in ICDM Workshops, 2012, in press.

[21] K. G. Pillai, R. A. Angryk, J. M. Banda, T. Wylie, and M. A. Schuh, “Spatio-temporal co-occurrence rules,” in 17th East-European Conf. on Advances in Databases and Information Systems (ADBIS), 2013.

Spatiotemporal event detection: a review

Article

Full-text available

Mar 2020

The advancements of sensing technologies, including remote sensing, in situ sensing, social sensing, and health sensing, have tremendously improved our capability to observe and record natural and social phenomena, such as natural disasters, presidential elections, and infectious diseases. The observations have provided an unprecedented opportunity to better understand and respond to the spatiotemporal dynamics of the environment, urban settings, health and disease propagation, business decisions, and crisis and crime. Spatiotemporal event detection serves as a gateway to enable a better understanding by detecting events that represent the abnormal status of relevant phenomena. This paper reviews the literature for different sensing capabilities, spatiotemporal event extraction methods, and categories of applications for the detected events. The novelty of this review is to revisit the definition and requirements of event detection and to layout the overall workflow (from sensing and event extraction methods to the operations and decision-supporting processes based on the extracted events) as an agenda for future event detection research. Guidance is presented on the current challenges to this research agenda, and future directions are discussed for conducting spatiotemporal event detection in the era of big data, advanced sensing, and artificial intelligence.

Tracking Solar Phenomena from the SDO

Article

Full-text available

Dec 2018
ASTROPHYS J

This paper focuses on the problem of tracking solar phenomena by creating spatiotemporal trajectories from solar event detection reports. Though tracking of multiple objects in video sequences has seen much research and improvement in recent years, there has been relatively little focus in the domain of tracking solar phenomena (events). In this work, we improve upon our previous endeavors by eliminating offline model training requirements and utilizing crowd-sourced human labels to evaluate our performance. We apply our method to the metadata of two solar event types spanning four years of detection reports from the automated detection modules for the Solar Dynamics Observatory (SDO) mission. We compare our results with those produced by the detection module for active regions and coronal holes by using a crowd-sourced trajectory database as the ground truth. We show that our results are as good or better than the event-specific detection module for these two event types. This is especially promising because our tracking algorithm is a generalized module for all solar events, and not specific to a single event type allowing it to be applied to other solar event types reported to Heliophysics Event Knowledgebase that do not contain tracking information.

Roadmap for Reliable Ensemble Forecasting of the Sun-Earth System

Preprint

Full-text available

Oct 2018

The authors of this report met on 28-30 March 2018 at the New Jersey Institute of Technology, Newark, New Jersey, for a 3-day workshop that brought together a group of data providers, expert modelers, and computer and data scientists, in the solar discipline. Their objective was to identify challenges in the path towards building an effective framework to achieve transformative advances in the understanding and forecasting of the Sun-Earth system from the upper convection zone of the Sun to the Earth's magnetosphere. The workshop aimed to develop a research roadmap that targets the scientific challenge of coupling observations and modeling with emerging data-science research to extract knowledge from the large volumes of data (observed and simulated) while stimulating computer science with new research applications. The desire among the attendees was to promote future trans-disciplinary collaborations and identify areas of convergence across disciplines. The workshop combined a set of plenary sessions featuring invited introductory talks and workshop progress reports, interleaved with a set of breakout sessions focused on specific topics of interest. Each breakout group generated short documents, listing the challenges identified during their discussions in addition to possible ways of attacking them collectively. These documents were combined into this report-wherein a list of prioritized activities have been collated, shared and endorsed.

Sparse Coding for Event Tracking and Image Retrieval

Thesis

Full-text available

Jul 2018

Dustin Kempton

Comparing regions of images is a fundamental task in both similarity based object tracking as well as retrieval of images from image datasets, where an exemplar image is used as the query. In this thesis, we focus on the task of creating a method of comparison for images produced by NASA’s Solar Dynamic Observatory mission. This mission has been in operation for several years and produces almost 700 Gigabytes of data per day from the Atmospheric Imaging Assembly instrument alone. This has created a massive repository of high-quality solar images to analyze and categorize. To this end, we are concerned with the creation of image region descriptors that are selective enough to differentiate between highly similar images yet compact enough to be compared in an efficient manner, while also being indexable with current indexing technology. We produce such descriptors by pooling sparse coding vectors produced by spanning learned basis dictionaries. Various pooled vectors are used to describe regions of images in event tracking, entire image descriptors for image comparison in content based image retrieval, and as region descriptors to be used in a content based image retrieval system on the SDO AIA image pipeline.

Method G: Uncertainty Quantification for Distributed Data Problems using Generalized Fiducial Inference

Preprint

May 2018

It is not unusual for a data analyst to encounter data sets distributed across several computers. This can happen for reasons such as privacy concerns, efficiency of likelihood evaluations, or just the sheer size of the whole data set. This presents new challenges to statisticians as even computing simple summary statistics such as the median becomes computationally challenging. Furthermore, if other advanced statistical methods are desired, novel computational strategies are needed. In this paper we propose a new approach for distributed analysis of massive data that is suitable for generalized fiducial inference and is based on a careful implementation of a "divide and conquer" strategy combined with importance sampling. The proposed approach requires only small amount of communication between nodes, and is shown to be asymptotically equivalent to using the whole data set. Unlike most existing methods, the proposed approach produces uncertainty measures (such as confidence intervals) in addition to point estimates for parameters of interest. The proposed approach is also applied to the analysis of a large set of solar images.

A Survey of Event Detection Techniques in Intelligent IoT System

Conference Paper

Jul 2023

Method G: Uncertainty Quantification for Distributed Data Problems Using Generalized Fiducial Inference

Article

May 2021

It is not unusual for a data analyst to encounter data sets distributed across several computers. This can happen for reasons such as privacy concerns, efficiency of likelihood evaluations, or just the sheer size of the whole data set. This presents new challenges to statisticians as even computing simple summary statistics such as the median becomes computationally challenging. Furthermore, if other advanced statistical methods are desired, novel computational strategies are needed. In this paper we propose a new approach for distributed analysis of massive data that is suitable for generalized fiducial inference and is based on a careful implementation of a “divide and conquer” strategy combined with importance sampling. The proposed approach requires only small amount of communication between nodes, and is shown to be asymptotically equivalent to using the whole data set. Unlike most existing methods, the proposed approach produces uncertainty measures (such as confidence intervals) in addition to point estimates for parameters of interest. The proposed approach is also applied to the analysis of a large set of solar images.

The Supatlantique Scanned Documents Database for Digital Image Forensics Purposes

Conference Paper

Oct 2020

The ease of use and the capabilities of image editing tools have raised the challenges in the emerging field of digital image forensics related to scanned documents. Unfortunately, the universality of current methods and their applicability in real-world scenarios have not yet been proven due to the absence of a standardized image database. In this paper, we introduce a novel test collection of more than 4500 images annotated with respect to each scanner as a useful tool for forensics investigators to test and compare scanner-based forensic techniques. It is an image database that contains documents of various content scanned at more than one resolution with 11 different scanner instances of widely known brands. This selection is based on our latest work on the identification of scanners at the origin of digitized documents and is adapted to fit any source scanner identification technique. The SUPATLANTIQUE database is available for free to the research community and is intended to become a useful reference resource for researchers in this field.

A Gentle Introduction to Spatiotemporal Data Mining

Chapter

Oct 2018

Spatiotemporal data mining refers to the extraction of knowledge, regularly repeating relationships, and interesting patterns from data with spatial and temporal aspects. In recent years, many spatiotemporal frequent pattern mining algorithms were developed for spatiotemporal event instances represented by a series of region objects that evolves over time. These algorithms focus on the discovery of spatiotemporal co-occurrence patterns and event sequences by inspecting the spatiotemporal overlap and follow relationships. Before moving on to these relationships, we will demonstrate different types of spatiotemporal knowledge to place the relationships and methods in the greater context. This chapter provides a bird's-eye view of the output of spatiotemporal data mining techniques in the literature, gives the rationale for mining spatiotemporal patterns from evolving regions, and explains the challenges of mining patterns from evolving region data.

A large-scale solar dynamics observatory image dataset for computer vision applications

Article

Jul 2017

The National Aeronautics and Space Administration (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun’s activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA’s solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate wider adoption and interest, from the computer vision community to the solar physics community.

The Solar Dynamics Observatory (SDO)

Article

Nov 2012
SOL PHYS

The Solar Dynamics Observatory (SDO) was launched on 11 February 2010 at 15:23 UT from Kennedy Space Center aboard an Atlas V 401 (AV-021) launch vehicle. A series of apogee-motor firings lifted SDO from an initial geosynchronous transfer orbit into a circular geosynchronous orbit inclined by 28° about the longitude of the SDO-dedicated ground station in New Mexico. SDO began returning science data on 1 May 2010. SDO is the first space-weather mission in NASA's Living With a Star (LWS) Program. SDO's main goal is to understand, driving toward a predictive capability, those solar variations that influence life on Earth and humanity's technological systems. The SDO science investigations will determine how the Sun's magnetic field is generated and structured, how this stored magnetic energy is released into the heliosphere and geospace as the solar wind, energetic particles, and variations in the solar irradiance. Insights gained from SDO investigations will also lead to an increased understanding of the role that solar variability plays in changes in Earth's atmospheric chemistry and climate. The SDO mission includes three scientific investigations (the Atmospheric Imaging Assembly (AIA), Extreme Ultraviolet Variability Experiment (EVE), and Helioseismic and Magnetic Imager (HMI)), a spacecraft bus, and a dedicated ground station to handle the telemetry. The Goddard Space Flight Center built and will operate the spacecraft during its planned five-year mission life; this includes: commanding the spacecraft, receiving the science data, and forwarding that data to the science teams. The science investigations teams at Stanford University, Lockheed Martin Solar Astrophysics Laboratory (LMSAL), and University of Colorado Laboratory for Atmospheric and Space Physics (LASP) will process, analyze, distribute, and archive the science data. We will describe the building of SDO and the science that it will provide to NASA.

Spatiotemporal Co-occurrence Rules

Chapter

Jan 2014

Spatiotemporal co-occurrence rules (STCORs) discovery is an important problem in many application domains such as weather monitoring and solar physics, which is our application focus. In this paper, we present a general framework to identify STCORs for continuously evolving spatiotemporal events that have extended spatial representations. We also analyse a set of anti-monotone (monotonically non-increasing) and non anti-monotone measures to identify STCORs. We then validate and evaluate our framework on a real-life data set and report results of the comparison of the number of candidates needed to discover actual patterns, memory usage, and the number of STCORs discovered using the anti-monotonic and non anti-monotonic measures.

Chapter

Jan 2014

Measuring interest and relevance has always been one of the main concerns when analyzing the results of a Content-Based Image-Retrieval (CBIR) system. In this work, we present a unique problem that the Solar Dynamics Observatory (SDO) CBIR system encounters: too many highly similar images. Producing over 70,000 images of the Sun per day, the problem of finding similar images is transformed into the problem of finding similar solar events based on image similarity. However, the most similar images of our dataset are temporal neighbors capturing the same event instance. Therefore a traditional CBIR system will return highly repetitive images rather than similar but distinct events. In this work we outline the problem in detail and present several approaches tested to solve this important image data mining and information retrieval issue.

Spatio-temporal Co-occurrence Pattern Mining in Data Sets with Evolving Regions

Conference Paper

Dec 2012

Spatio-temporal co-occurring patterns represent subsets of event types that occur together in both space and time. In comparison to previous work in this field, we present a general framework to identify spatio-temporal co-occurring patterns for continuously evolving spatio-temporal events that have polygon-like representations. We also propose a set of measures to identify spatio-temporal co-occurring patterns and propose an Apriori-based spatio-temporal co-occurrence mining algorithm to find prevalent spatio-temporal co-occurring patterns for extended spatial representations that evolve over time. We evaluate our framework on real-life data to demonstrate the effectiveness of our measures and the algorithm. We present results highlighting the importance of our measures in identifying spatio-temporal co-occurrence patterns.

Steps Toward a Large-Scale Solar Image Data Analysis to Differentiate Solar Phenomena

Article

May 2013
SOL PHYS

We detail the investigation of the first application of several dissimilarity measures for large-scale solar image data analysis. Using a solar-domain-specific benchmark dataset that contains multiple types of phenomena, we analyzed combinations of image parameters with different dissimilarity measures to determine the combinations that will allow us to differentiate between the multiple solar phenomena from both intra-class and inter-class perspectives, where by class we refer to the same types of solar phenomena. We also investigate the problem of reducing data dimensionality by applying multidimensional scaling (MDS) to the dissimilarity matrices that we produced using the previously mentioned combinations. As an early investigation into dimensionality reduction, we examine how many MDS components are needed to maintain a good representation of our data (in a new artificial data space) and how many can be discarded to enhance our querying performance. Finally, we present a comparative analysis of several classifiers to determine the quality of the dimensionality reduction achieved with this combination of image parameters, similarity measures, and MDS.

A Comparative Evaluation of Automated Solar Filament Detection

Conference Paper

May 2012

We present a comparative evaluation for automated filament detection in H-alpha solar images. By using metadata produced by the Advanced Automated Filament Detection and Characterization Code (AAFDCC) module, we adapted our Trainable Feature Recognition (TFR) component to accurately detect regions in solar images containing filaments. We first analyze the module's metadata and then transform it into labeled datasets for machine learning classification. Visualizations of data transformations and classification results are presented and accompanied by statistical findings. Our results confirm the reliable event reporting of the AAFDCC module as well as our ability to effectively detect solar filaments with our TFR component.

The Solar Dynamics Observatory

Article

Nov 2012

This volume is dedicated to the Solar Dynamics Observatory (SDO), which was launched 11 February 2010. The articles focus on the spacecraft and its instruments: the Atmospheric Imaging Assembly (AIA), the Extreme Ultraviolet Variability Experiment (EVE), and the Helioseismic and Magnetic Imager (HMI). Articles within also describe calibration results and data processing pipelines that are critical to understanding the data and products, concluding with a description of the successful Education and Public Outreach activities. This book is geared towards anyone interested in using the unprecedented data from SDO, whether for fundamental heliophysics research, space weather modeling and forecasting, or educational purposes. Previously published in Solar Physics journal, Vol. 275/1-2, 2012.

Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing

Conference Paper

Sep 2013

Efficient data indexing and nearest neighbor retrieval are challenging tasks in high-dimensional spaces. This work builds upon our previous analyses of iDistance partitioning strategies to develop the backbone of a new indexing method using a heuristic-guided hybrid index that further segments congested areas of the dataspace to improve overall performance for exact k-nearest neighbor (kNN) queries. We develop data-driven heuristics to intelligently guide the segmentation of distance-based partitions into spatially disjoint sections that can be quickly and efficiently pruned during retrieval. Extensive tests are performed on k-means derived partitions over datasets of varying dimensionality, size, and cluster compactness. Experiments on both real and synthetic high-dimensional data show that our new index performs significantly better on clustered data than the state-of-the-art iDistance indexing method.

The Transition Region and Coronal Explorer

Article

Dec 1998

The Transition Region and Coronal Explorer (TRACE), launched 1 April 1998, will have at the time of this meeting been in orbit for just over 8 months. In that time, the instrument will have taken over 500,000 exposures of the Sun in ultraviolet and extreme ultraviolet wavelengths, will have completed three-fourths of the nominal mission and will be approaching the end of the first eclipse season. The TRACE telescope is unique in its ability to observe in UV and EUV wavelengths at high cadence with unprecedented resolution. We present a review of the TRACE instrument and show current observations and results. We discuss the performance of the instrument in terms of observational capabilities, sensitivity, calibration, effects of aging on the instrument, CCD effects, and contamination effects.

Living With a Star

Article

Apr 2000

George L. Withbroe

NASA has proposed a new initiative, Living With a Star (LWS), a research and development program involving studying solar variability as it affects human technology, humans in space, and terrestrial climate. The goal of the initiative is to develop a capability to observe, understand, and predict the aspects of the connected Sun-Earth system that affect life and society. The initiative includes the following elements: (a) expanded utilization of the Solar Terrestrial Probe missions, (b) establishing a Space Weather Research Network with solar and geospace missions designed to address scientific research problems relevant to the above goal, (c) data analysis/modeling targeted on scientific problems relevant to the goal of the program, (d) Orbiting Environmental Testbeds for testing rad-hard and rad-tolerant systems, and (e) partnering with other agencies and industry.
