
Finding Exoplanets Using NASA's Kepler Telescope

Impact Factor-4.013 e-ISSN: 2581-6667
International Journal of Engineering and Creative Science, Vol. 5, No. 4, 2022
www.ijecs.net
Om Rachalwar1, Tanmay Kapure2, Aasiktee Patil3, Nakul Lekurwale4, Dr. Sharda A. Chhabria5
1,2,3,4, Students at G H Raisoni Institute of Engineering and Technology, Nagpur (M.S)
5Associate Professor, Department of Artificial Intelligence
BOT LAB INCHARGE [Center of Excellence]
G H Raisoni Institute of Engineering & Technology, Nagpur (M.S.)
Abstract Humans have always been fascinated by what lies beyond Earth: what makes up outer space, and could we
live on another planet? Many space missions have been launched to answer these questions, and the most successful of
them is the Kepler Space Telescope mission. For years, scientists have used data from NASA’s Kepler Space Telescope
to look for and discover thousands of transiting exoplanets. This project extends that search with machine learning
techniques. Machine learning and deep learning techniques have proven to be broadly applicable in various scientific
research areas. This project aims to exploit some of these methods to improve the conventional algorithm-based
approach used in astrophysics today to detect exoplanets. Our study indicates that machine learning will facilitate the
characterization of exoplanets in future analysis of large astronomy data sets.
Keywords- Exoplanets, NASA, Kepler Telescope, Machine Learning, Dataset.
I - INTRODUCTION
Planets outside of our solar system are known as
extrasolar planets or exoplanets. The discovery of the
first exoplanets in 1992 (Wolszczan & Frail 1992) opened
our minds to the possibility of life beyond Earth. In the
last three decades, planet detection has become a major
research area in astrophysics, and astronomers have
developed various methods to detect exoplanets. As of
July 2020, astronomers had discovered 4281 confirmed
planets, the majority of them detected by the transit
method.[5]
There are various methods for detecting exoplanets,
which are discussed below. In our project we found that
the most effective way of finding an exoplanet is the
transit method. Depending on the observer’s position, a
planet may move in front of its host star, blocking part
of the star’s light and causing a dip in its brightness.
In the transit method, we continually observe a star and
look for such ’dips’ in its brightness.[6]
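The dip search described above can be illustrated with a minimal sketch. It assumes an idealised, noise-free, box-shaped transit and uses the standard approximation that the fractional transit depth is (planet radius / star radius) squared. The function names, the detection threshold and the example values are hypothetical choices for illustration and are not taken from the project's code.

```python
def transit_depth(planet_radius, star_radius):
    """Fractional brightness drop for a dark planet crossing a uniform stellar disc."""
    return (planet_radius / star_radius) ** 2


def simulate_light_curve(n_points, period, duration, depth):
    """Noise-free relative flux: 1.0 out of transit, 1.0 - depth during a transit."""
    flux = []
    for t in range(n_points):
        in_transit = (t % period) < duration
        flux.append(1.0 - depth if in_transit else 1.0)
    return flux


def find_dips(flux, threshold):
    """Return the start index of every run of samples that falls below `threshold`."""
    starts = []
    previous_below = False
    for i, value in enumerate(flux):
        below = value < threshold
        if below and not previous_below:
            starts.append(i)
        previous_below = below
    return starts


# Illustrative values: a planet with one tenth of the star's radius
# (roughly a Jupiter-sized planet around a Sun-like star).
depth = transit_depth(planet_radius=0.1, star_radius=1.0)
flux = simulate_light_curve(n_points=100, period=30, duration=3, depth=depth)
dips = find_dips(flux, threshold=1.0 - depth / 2)
print("transit depth:", depth)
print("dip start indices:", dips)
```

Equally spaced dip indices are what point to an orbiting body rather than a one-off fluctuation. Real light curves are noisy, so a fixed threshold like this one would only be a starting point.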
In the project we first determined which method to use
to find exoplanets. Our research showed that many
methods exist; the most popular ones are listed below:
1. Transit Method: Depending on the observer’s
position, a planet may move in front of its host star,
blocking part of the star’s light and causing a dip
in its brightness. In the transit method, we
continually observe a star and look for such ’dips’ in
its brightness. Because the telescope observes the
star continually, it records each dip in light, and a
dip that repeats periodically is noted as a possible
exoplanet.[12]
2. Radial Velocity: The radial-velocity method for
detecting exoplanets relies on the fact that a star
does not remain completely stationary when it is
orbited by a planet. The star moves, ever so slightly,
in a small circle or ellipse, responding to the
gravitational tug of its smaller companion. When
viewed from a distance, these slight movements
affect the star's normal light spectrum, or color
signature. The spectrum of a star that is moving
towards the observer appears slightly shifted toward
bluer (shorter) wavelengths. If the star is moving
away, then its spectrum will be shifted toward
redder (longer) wavelengths.[13]
Fig 2. Radial Velocity [2]
3. Gravitational Microlensing: Albert Einstein’s
Theory of General Relativity taught us that gravity
causes a distortion of space-time. Thus, a large
mass, such as a star, causes the fabric of space to
bend around it. Gravity can therefore bend and focus
light, just like the lens of a magnifying glass. The
gravitational microlensing method allows planets to
be found using light from a distant star. The path of
the light from this star is altered by the presence
of a massive lens, in our case a star and a planet.
Thus, for a short period of time, the distant star
appears brighter.[14]
Fig 3. Gravitational Microlensing [3]
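The blue and red shifts described under the radial-velocity method above can be made concrete with a small worked example. The sketch below uses the non-relativistic Doppler relation (observed wavelength = rest wavelength × (1 + v/c)), which is adequate for the small stellar velocities involved. The function name and the velocity of 12.5 m/s are illustrative assumptions, not values from the project.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def doppler_shifted_wavelength(rest_wavelength_nm, radial_velocity_m_per_s):
    """Non-relativistic Doppler shift.

    A positive velocity means the star moves away from the observer
    (redshift); a negative velocity means it approaches (blueshift).
    """
    return rest_wavelength_nm * (1.0 + radial_velocity_m_per_s / SPEED_OF_LIGHT_M_PER_S)


rest = 656.28  # H-alpha spectral line, in nanometres
receding = doppler_shifted_wavelength(rest, +12.5)     # assumed wobble speed
approaching = doppler_shifted_wavelength(rest, -12.5)

print(f"shift when receding:    {receding - rest:+.2e} nm")
print(f"shift when approaching: {approaching - rest:+.2e} nm")
```

The shifts are tiny compared with the wavelength itself, which is why the radial-velocity method needs very precise spectrographs.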
II - OVERVIEW
After establishing what exoplanets are and how they can
be found, the project moved on to choosing its detection
method.
We came across many methods, but the ones that caught
our attention were the transit method, radial velocity
and gravitational microlensing.
After running some Python code, we arrived at a bar
chart (shown in Fig 4), which made it clear that the
most effective way of finding an exoplanet is the
transit method.
Fig 4. The horizontal bar chart shows total confirmed
exoplanets by discovery method. The time period for the
graph is 1989 to 2021. The planet transit method is, by
far, the most productive technique used to discover
exoplanets.[4]
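The kind of tally behind Fig 4 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The records below are invented placeholders and not Exoplanet Archive data, and the field name `discovery_method` is an assumption made for this example.

```python
from collections import Counter


def count_by_method(records):
    """Count how many planets were found with each discovery method."""
    return Counter(record["discovery_method"] for record in records)


def text_bar_chart(counts, width=30):
    """Render the counts as a horizontal text bar chart, largest first."""
    largest = max(counts.values())
    lines = []
    for method, n in counts.most_common():
        bar = "#" * max(1, round(width * n / largest))
        lines.append(f"{method:<16} {bar} {n}")
    return "\n".join(lines)


# Placeholder records for illustration only (not real archive entries).
placeholder_records = [
    {"name": "toy-1", "discovery_method": "Transit"},
    {"name": "toy-2", "discovery_method": "Transit"},
    {"name": "toy-3", "discovery_method": "Radial Velocity"},
    {"name": "toy-4", "discovery_method": "Transit"},
    {"name": "toy-5", "discovery_method": "Microlensing"},
    {"name": "toy-6", "discovery_method": "Radial Velocity"},
    {"name": "toy-7", "discovery_method": "Transit"},
]

counts = count_by_method(placeholder_records)
chart = text_bar_chart(counts)
print(chart)
```

With a real table of confirmed planets, the same counting step could feed a plotting library instead of the text rendering used here.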
III - METHODOLOGY
The entire project is written in Python and was run in
different environments, such as Jupyter Notebook
(Anaconda) and Google Colab.
The project makes extensive use of libraries such as
pandas, NumPy, Matplotlib, TensorFlow, Keras and
scikit-learn.
We used several algorithms in order to get the best
possible outcome.
The algorithms we used are:
KNN (K-Nearest Neighbor):
K-Nearest Neighbor is one of the simplest machine
learning algorithms based on the supervised learning
technique. The K-NN algorithm assumes similarity
between the new case and the available cases and puts
the new case into the category to which it is most
similar. It stores all the available data and classifies
a new data point based on this similarity, so when new
data appears it can easily be assigned to a well-suited
category.
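As a minimal sketch of the idea described above, the following pure-Python function classifies a new point by a majority vote among its k nearest stored points. The two-feature toy data, the labels and the function name are hypothetical. The project itself lists scikit-learn among its libraries, so this is an illustration of the technique rather than the project's implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` with the most common label among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, already-scaled features (invented values for illustration only).
train_points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),   # cluster labelled "planet"
    (3.0, 3.2), (3.1, 2.9), (2.9, 3.0),   # cluster labelled "false positive"
]
train_labels = ["planet"] * 3 + ["false positive"] * 3

prediction_a = knn_predict(train_points, train_labels, query=(1.1, 1.0))
prediction_b = knn_predict(train_points, train_labels, query=(3.0, 3.0))
print(prediction_a, "|", prediction_b)
```

Because K-NN compares raw distances, features usually need to be brought to a common scale before it is applied.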
KNN accuracy in the project: 98% (see Fig. 5)
XGBoost:
XGBoost stands for eXtreme Gradient Boosting. It is a
boosting algorithm based on the ensemble learning
technique. Its Python package provides wrapper classes
that allow models to be treated as classifiers or
regressors. The parameters to be set fall into three
groups: general parameters, booster parameters and
learning task parameters.[11]
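The boosting idea can be sketched without any library. The example below is plain gradient boosting with one-split regression trees (stumps) on a single feature and squared-error loss. It is not XGBoost itself, which adds regularization and other refinements, and all names and data in it are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature, minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the residuals left by the previous rounds."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def boosted_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Toy data: the target switches from 0 to 1 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = fit_boosted_stumps(xs, ys)
print(boosted_predict(model, 2), boosted_predict(model, 5))
```

Each added stump corrects part of the remaining error, which is the sense in which boosting builds a strong model out of many weak ones.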
XGBoost accuracy in the project: 99% (see Fig. 5)
Random Forest:
Random Forest is a flexible, easy-to-use machine
learning algorithm that produces a great result most of
the time, even without hyperparameter tuning. It is
also one of the most widely used algorithms because of
its simplicity and versatility (it can be used for both
classification and regression tasks). A random forest
trains many decision trees on random samples of the data
and combines their predictions, typically by majority
vote for classification.
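A heavily simplified sketch of the random forest idea follows. Each "tree" here is only a one-split stump trained on a bootstrap sample with one randomly chosen feature, and the forest predicts by majority vote. Real random forests grow much deeper trees. The data, labels, seed and function names are assumptions made for this illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """One-split tree on a single feature, choosing the threshold with fewest errors."""
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback: no split at all, always predict the majority label.
    best = (sum(lab != majority for lab in labels), feature, float("inf"), majority, majority)
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def stump_predict(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        indices = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))               # random feature choice
        forest.append(fit_stump([rows[i] for i in indices],
                                [labels[i] for i in indices], feature))
    return forest


def forest_predict(forest, row):
    votes = [stump_predict(stump, row) for stump in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy, well-separated data (invented values for illustration only).
rows = ([(1 + 0.1 * i, 1 - 0.1 * i) for i in range(6)]
        + [(5 + 0.1 * i, 5 - 0.1 * i) for i in range(6)])
labels = ["candidate"] * 6 + ["false positive"] * 6
forest = fit_forest(rows, labels)
print(forest_predict(forest, (1.2, 0.8)), "|", forest_predict(forest, (5.2, 4.8)))
```

Averaging over many randomised trees is what makes the ensemble more stable than any single tree.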
Random Forest accuracy in our project: 99% (see Fig. 5)
Decision Tree: A Decision Tree is constructed by
asking a series of questions about a record of the
dataset. Each time an answer is received, a follow-up
question is asked until a conclusion about the class
label of the record is reached. The series of questions
and their possible answers can be organized in the form
of a decision tree, which is a hierarchical structure
consisting of nodes and directed edges.
Decision Trees (DTs) are a non-parametric supervised
learning method used for classification and regression.
The goal is to create a model that predicts the value of a
target variable by learning simple decision rules inferred
from the data features. A tree can be seen as a piecewise
constant approximation.
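How a tree picks its next "question" can be sketched as follows. The example scores every candidate threshold on every feature by the weighted Gini impurity of the two resulting groups and keeps the best one. Gini impurity is one common splitting criterion and is assumed here. The toy rows, labels and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for more mixed groups."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_question(rows, labels):
    """Return (impurity, feature, threshold) of the best 'is feature <= threshold?' question."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            yes = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            no = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            impurity = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best


# Toy data: feature 0 is uninformative, feature 1 separates the classes.
rows = [(2.0, 0.1), (1.0, 0.2), (3.0, 0.3), (1.5, 0.9), (2.5, 1.0), (3.5, 1.1)]
labels = ["false positive"] * 3 + ["planet"] * 3

impurity, feature, threshold = best_question(rows, labels)
print(f"ask: is feature {feature} <= {threshold:.2f}?  (impurity {impurity:.2f})")
```

A full tree would apply the same search again to each of the two groups until the groups are pure or a depth limit is reached.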
Decision Tree accuracy in the project: 93% (see Fig. 5)
CNN (Convolutional Neural Network):
A CNN is a particular type of feed-forward neural
network. It is widely used for image recognition. A CNN
represents the input data in the form of multidimensional
arrays and works well when a large amount of labeled
data is available. It processes each portion of the
input, known as a receptive field, and learns weights
for each neuron according to how significant its
receptive field is, so that it can discriminate the
importance of neurons from one another.
Architecture of the Convolutional Network in our
project:
Reshape layer, input layer;
1D convolutional layer with 10 filters of size 3,
L2 regularization and ReLU activation function;
1D max pooling layer, window size 2, stride 2;
Dropout (20%);
Fully connected layer with 48 neurons and
ReLU activation function;
Dropout (20%);
Fully connected layer with 18 neurons and
ReLU activation function;
Output layer with sigmoid activation function.
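To make the layer list above easier to follow, the sketch below works out output shapes and trainable-parameter counts for such a stack using plain arithmetic. Several details are assumptions because the paper does not state them: an input of 40 values with one channel after the reshape, no padding in the convolution, a flatten step before the first fully connected layer, and a single sigmoid output unit. Dropout layers are omitted because they change neither shapes nor parameter counts.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding."""
    return (length - kernel_size) // stride + 1


def pool1d_output_length(length, pool_size, stride):
    """Output length of 1D max pooling without padding."""
    return (length - pool_size) // stride + 1


def summarize_network(input_length, channels=1):
    """Return (layer name, output shape, trainable parameters) for each layer."""
    layers = []

    conv_len = conv1d_output_length(input_length, kernel_size=3)
    conv_params = (3 * channels + 1) * 10          # weights plus one bias per filter
    layers.append(("conv1d (10 filters, size 3)", (conv_len, 10), conv_params))

    pool_len = pool1d_output_length(conv_len, pool_size=2, stride=2)
    layers.append(("max_pool1d (2, stride 2)", (pool_len, 10), 0))

    flat = pool_len * 10                           # assumed flatten step
    layers.append(("flatten", (flat,), 0))

    layers.append(("dense (48, ReLU)", (48,), (flat + 1) * 48))
    layers.append(("dense (18, ReLU)", (18,), (48 + 1) * 18))
    layers.append(("output (1, sigmoid)", (1,), 18 + 1))
    return layers


summary = summarize_network(input_length=40)       # 40 is a hypothetical input size
total_params = sum(params for _, _, params in summary)
for name, shape, params in summary:
    print(f"{name:<30} {str(shape):<12} {params}")
print("total trainable parameters:", total_params)
```

Under these assumptions most of the parameters sit in the first fully connected layer, so the input length largely determines the model size.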
CNN accuracy in the project: 99.37% on test data (see Section V)
IV - COMPARISON OF METHODS BASED ON ACCURACY
Fig 5. Accuracy Comparison
Here, in Fig. 5:
0 is the accuracy of Decision Tree, i.e. 93%
1 is the accuracy of Random Forest, i.e. 99%
2 is the accuracy of KNN, i.e. 98%
3 is the accuracy of XGBoost, i.e. 99%
Plotting the Accuracy Comparison for a better
visualization:
Fig 6. Line Plot
Fig. 7 Dot Plot
The results are clear from the graphs using two different
plotting methods.
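The comparison can also be reproduced as text. The short sketch below ranks the four accuracies reported for Fig. 5 and prints a simple bar for each. The function name and the choice to start the bars at 90% are illustrative assumptions, made so that the small differences remain visible.

```python
# Accuracies as reported for Fig. 5, in percent.
accuracies = {
    "Decision Tree": 93,
    "Random Forest": 99,
    "KNN": 98,
    "XGBoost": 99,
}


def rank_models(scores):
    """Sort models by accuracy (highest first), breaking ties alphabetically."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_models(accuracies)
for name, score in ranking:
    bar = "#" * (score - 90)   # bars start at 90% (assumed display choice)
    print(f"{name:<14} {bar:<10} {score}%")
```

Accuracy alone can be misleading when one class is much rarer than the other, so measures such as precision and recall are a useful complement.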
For the implementation of these models, we recommend
that readers visit the Google Colab file:
Model Building for Exoplanets
V - RESULTS AND DISCUSSION
Results:
The result of our project lies in the accuracies
obtained with the different algorithms trained on
the NASA Kepler dataset.
The most useful algorithm, with an accuracy of
99.37% on test data, turned out to be the CNN. The
neural network is very efficient at finding
exoplanets when it is defined correctly and applied
to good, pre-processed data.
Discussion:
Neural networks are an extremely versatile tool
when it comes to pattern recognition. Deep nets can
be trained to characterize planetary emission spectra
as a means of narrowing the initial parameter space
for atmospheric retrieval codes.[8]
Automated detection and characterization will pave
the way for future planet finding surveys by
eliminating human interaction that can create a
bottleneck in the analyses and introduce error.
Observations of exoplanets from different platforms
contain separate observational limitations that can
produce an incomplete set of measurements or
sample the data differently. Accounting for each is
best done using a CNN.
The Exoplanet Archive is designed and operated to
facilitate exoplanet research by serving as a
repository for planetary and stellar physical and
orbital properties, and by providing tools to work
with these data along with light curves from
Kepler.[9]
The Exoplanet Archive also hosts Kepler pipeline
data, including planet candidate lists that are
updated as often as weekly, pipeline-identified
threshold-crossing events (TCEs), data validation
documentation and target stellar data.[10]
In the future, space exploration is only going to
become more advanced, with new telescopes and newly
designed algorithms. The scope for finding
exoplanets will therefore only increase, and we feel
grateful to have explored this domain of machine
learning application.
VI - SUMMARY AND CONCLUSIONS
Summary:
The primary objectives of this project were to create
a machine learning model to automate the
classification of Kepler cumulative object of interest
data and deploy that model to the outside world.
To achieve this goal, a comprehensive machine
learning pipeline was created to engineer the data,
train, and test models. This process attempted to
produce a set of candidate features after accounting
for missingness, statistical inconsistencies,
correlations, and bias.
These candidate features were used to train
K-Nearest Neighbor (KNN), Convolutional Neural
Network, XGBoost, Decision Tree and Random Forest
classifiers. Based on model performance and
explainable feature importance, the Random Forest
classifier and the Convolutional Neural Network were
chosen as the primary models for testing and
deployment.
This project used the transit method to find
exoplanets. Depending on the observer’s position, a
planet may move in front of its host star, blocking
part of the star’s light and causing a dip in its
brightness. In the transit method, we continually
observe a star and look for such ’dips’ in its
brightness. Because the telescope observes the star
continually, it records each dip in light and notes
it as a possible exoplanet.
Conclusions:
Machine learning methods have seen very active
development in the last decade and are now an
essential part of our way of working. In fact,
anyone who uses a computer or smartphone is highly
likely to interact with some sort of machine
learning program. These methods are also widely used
in the sciences for use cases such as detecting
diseases, creating new samples without long
simulations, and finding string models in a string
theory landscape.
In astronomy, we have seen several applications,
such as Galaxy Zoo (predicting galaxy morphology)
and the identification of gravitational waves and
gravitational lenses. With new and advanced
telescopes, data in astronomy are growing at a fast
pace. Conventional methods that involve human
judgement are inefficient and prone to variability
depending on the investigating expert.[7]
In the era of ‘big data’, manual interpretation of
potential exoplanet candidates is a labor-intensive
effort and difficult to do with small transit signals
(e.g., Earth-sized planets). Exoplanet transits have
different shapes, as a result of, e.g., the stellar
activity. Thus, a simple template does not suffice to
capture the subtle details, especially if the signal is
below the noise or strong systematics are present.
In this paper we proposed exoplanet detection
methods based on classical machine learning and a
convolutional neural network.
ACKNOWLEDGMENT
We thank the developers of Lightkurve, NumPy,
Matplotlib, Scikit-learn and TensorFlow and Flourish
Data Visualization Website for their very useful free
software.
We are grateful to Rahul Jana for his GitHub Code file;
we would also like to thank all the authors and
publishers of the Research Papers used in our literature
review for publicly sharing their code and training data.
This enabled us to quickly test our model on different
datasets and provided a benchmark to compare our
results.
We gratefully acknowledge the contributions of NASA
and many in the community who have provided data on
Exoplanet archive and who work to maintain resources
for the exoplanet community.
REFERENCES
[1] [5][6]Abhishek Malik, Ben Moster and Christian
Obermeier. ‘Exoplanet Detection using Machine
Learning’. Published in astrophysics paper in year 2015.
[2] [7]Christopher J. Shallue and Andrew Vanderburg.
‘Identifying Exoplanets with Deep Learning: A Five-
planet Resonant Chain around Kepler-80 and an Eighth
Planet around Kepler-90’. Published in The Astronomical
Journal in the year 2018.
[3] [1][2][4] Sturrock, George Clayton; Manry, Brychan; and
Rafiqi, Sohail (2019) "Machine Learning Pipeline for
Exoplanet Classification," SMU Data Science Review:
Vol. 2 : No. 1 , Article 9.
[4] [8]Anne Dattilo, Andrew Vanderburg and Christopher J.
Shallue. ‘IDENTIFYING EXOPLANETS WITH DEEP
LEARNING II: TWO NEW SUPER-EARTHS
UNCOVERED BY A NEURAL NETWORK IN K2 DATA’.
[5] [11]Kyle A. Pearson, Leon Palafox and Caitlin A. Griffith.
‘Searching for exoplanets using artificial intelligence’.
Advance Access publication 2017 October 25.
[6] [9][10]By Marcy Harbut, Stephen Kane and 33 others of
California Institute of Technology. ‘The NASA Exoplanet
Archive: Data and Tools for Exoplanet Research’.
Publications of the Astronomical Society of the Pacific
July 2013.
[7] [3]https://sciencesprings.wordpress.com/2022/05/21/from-
aas-nova-a-massive-reanalysis-of-microlensing-events/
[8] [12][13][14]https://exoplanets.nasa.gov/alien-worlds/ways-
to-find-a-planet/
[9] [12][13][14] Ziqi Dai et al 2021 J. Phys.: Conf. Ser. 2012
012135
[10] Rice, K. The Detection and Characterization of Extrasolar
Planets. Challenges 2014, 5, 296-323.
https://doi.org/10.3390/challe5020296.
[11] Wang, Mutian, Peter Tuthill, and Barnaby Norris.
"Finding exoplanets in the habitable zone with light
echoes." Adaptive Optics Systems VII. Vol. 11448.
International Society for Optics and Photonics, 2020.
[12] Stanislav Poddaný, Luboš Brát, Ondřej Pejcha, Exoplanet
Transit Database. Reduction and processing of the
photometric data of exoplanet transits, New Astronomy,
Volume 15, Issue 3, 2010, Pages 297-301, ISSN 1384-1076.