
Research and Development of AI-aided Software-Defined Radio GNSS Receiver


Abstract

This is an endorsed individual research project under the URIS scheme, supervised by Dr. Li-ta Hsu and co-supervised by Dr. Guohao Zhang, Department of Aeronautical and Aviation Engineering, PolyU. The labeled dataset is also attached; please visit https://github.com/Thorkee/GNSS_SDR/tree/main/labeled%20training%20dataset/v1 for an explanation of the dataset.
Author: Ju LIN
Supervisor: Dr. Li-ta HSU
Co-supervisor: Dr. Guohao ZHANG
Duration: September 2022 – September 2023
Research Activities
a) Progress made during the reporting period
1. Systematic Literature Review and Knowledge Development
A comprehensive review of academic articles and research papers on the integration of machine learning techniques with Software-Defined Radio (SDR) was undertaken. This review built a solid understanding of the field and provided an initial validation of the proposed concepts. In parallel, independent study was carried out in related subject areas, including navigation positioning, multipath phenomena, error analysis, data analysis, and machine learning, drawing on educational resources from institutions such as Stanford University and a range of online platforms.
2. Acquisition of Training & Testing Dataset
Using an open-source SDR receiver algorithm and the fundamental principles of GNSS signal acquisition, relevant features were extracted from raw static datasets collected in open-sky environments at multiple locations, including Hotel Icon, the Jockey Club Auditorium on campus, and Mong Kok Stadium. These datasets support the prediction of potential positioning errors and their magnitudes, and lay the groundwork for the subsequent training and evaluation of the machine learning models.
3. Determination of Variables and Labels
The identification of variables, encompassing various attributes, is central to characterizing the data under scrutiny. Based on a review of pertinent research papers, the candidate variables comprise quantitative and categorical descriptors tailored to satellite performance and correlator outputs. These variables were selected after examining their relevance and significance to the research objectives.
Concurrently, the labels, which serve as the outputs of the models, are of paramount importance in the training process. They vary with the predictive performance expected of each model, embody the target variable under investigation, and play a pivotal role in prediction and classification tasks.
4. Classification & Labeling of Acquired Data
Leveraging the gathered GNSS features and an extensive review of research papers and reports, the key features and their corresponding outputs were classified and labeled, then organized into a dedicated matrix integrated into a designated model. The features' performance was also analyzed, including with Principal Component Analysis (PCA), which aids dimensionality reduction, provides insight into the most influential features, and further improves model performance.
5. Model Selection, Training & Optimization
A comprehensive investigation was conducted to determine the optimal model for this research. The literature shows that researchers in this domain have employed various machine learning algorithms, including Random Forests, Support Vector Machines (SVM), boosting trees, ensemble boosting algorithms, and neural networks.
A substantial pre-labeled dataset was imported for training. Before training, feature selection and weighting were performed to prioritize different features, and some models also employed optimization algorithms such as Bayesian optimization. The models were evaluated using performance metrics such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R-squared). RMSE and MAE quantify prediction errors, while R-squared represents the model's explanatory power with respect to the target variable; together, these metrics provide the essential criteria for assessing model accuracy and interpretability.
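To make these criteria concrete, the sketch below computes all three metrics in Python with scikit-learn; the arrays are illustrative placeholders rather than project data.

```python
# Minimal sketch: computing RMSE, MAE, and R-squared for regression
# predictions. The y_true / y_pred arrays are placeholders, not data
# from this project.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([47.2, 51.8, 39.5, 60.1])  # ground-truth labels (e.g. 3D error, m)
y_pred = np.array([45.0, 55.3, 41.2, 57.9])  # model predictions (m)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors more
mae = mean_absolute_error(y_true, y_pred)           # average absolute error
r2 = r2_score(y_true, y_pred)                       # explanatory power of the model

print(f"RMSE = {rmse:.3f} m, MAE = {mae:.3f} m, R^2 = {r2:.3f}")
```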
6. Explorational Study on Future Research Directions
This exploratory study investigates possibilities for model improvement and future research directions for the current positioning error prediction models.
b) Significant results and deliverables achieved
1. Variable Determination
After evaluation, several signal-level and correlator-level features were selected for use in the model training process.
I. Signal-level Variables
a. Feature I: Signal Strength
Signal strength is characterized by the ratio of carrier power to noise power per unit of bandwidth, measured in decibel-hertz (dB-Hz). In NLOS/multipath scenarios, where positioning accuracy is severely degraded and positioning errors are correspondingly larger, the carrier-to-noise ratio (C/NR) is consistently lower than in LOS conditions, according to Gu et al. [1]. C/NR therefore serves as a measure of multipath contamination and can help predict navigation errors. However, accurate prediction requires additional variables alongside C/NR: a comprehensive approach that accounts for multiple factors and the interplay between them is vital for building a robust predictive model.
b. Feature II: Satellite Elevation Angle (EA)
The satellite elevation angle (EA) is calculated by:
$\theta = \arcsin\left(\frac{h}{D}\right)$
where
$\theta$: elevation angle of the satellite in degrees;
$h$: altitude of the satellite above the observer's altitude;
$D$: distance between the observer's location and the satellite.
Generally, the larger the EA, the lower the chance of the signal being contaminated by NLOS/multipath effects. An integrated approach that leverages C/NR alongside EA therefore holds promise for improved accuracy in navigation error prediction.
c. Feature III: Satellite Azimuth Angle (AZ)
The satellite azimuth angle (AZ) is calculated by:
$\theta = \operatorname{atan2}(Y, X)$
where
$\theta$: azimuth angle of the satellite in degrees;
$Y$: eastward distance between the observer's location and the satellite;
$X$: northward distance between the observer's location and the satellite.
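As a concrete illustration of Features II and III, the following minimal Python sketch computes both angles from a satellite-minus-observer offset, assuming the offset has already been expressed in local east-north-up (ENU) coordinates; the function name and sample values are hypothetical.

```python
# Minimal sketch: elevation (EA) and azimuth (AZ) from a satellite-minus-
# observer offset in the local east-north-up (ENU) frame. The ECEF-to-ENU
# conversion is assumed to happen upstream; all values are placeholders.
import numpy as np

def elevation_azimuth_deg(e: float, n: float, u: float) -> tuple[float, float]:
    """Return (elevation, azimuth) in degrees from an ENU offset in meters."""
    d = np.sqrt(e**2 + n**2 + u**2)           # slant distance D to the satellite
    elevation = np.degrees(np.arcsin(u / d))  # theta = arcsin(h / D)
    azimuth = np.degrees(np.arctan2(e, n))    # theta = atan2(Y_east, X_north)
    return elevation, azimuth % 360.0         # azimuth clockwise from north

ea, az = elevation_azimuth_deg(e=12.0e6, n=8.0e6, u=18.0e6)
print(f"EA = {ea:.1f} deg, AZ = {az:.1f} deg")
```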
d. Feature IV: Pseudorange Residuals
Pseudorange residuals represent the discrepancies between measured and predicted pseudorange values in positioning. Various factors contribute to these residuals, including atmospheric delays, clock bias, multipath effects, and receiver noise, which makes them significant in the analysis and mitigation of positioning errors.
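A sketch of how such residuals can be formed once a position and clock-bias estimate are available follows; all names and inputs here are hypothetical, and in the receiver the residuals come from the least-squares navigation solution.

```python
# Hypothetical sketch: pseudorange residuals as the gap between measured
# pseudoranges and those predicted from an estimated position and
# receiver clock bias. Inputs are placeholders.
import numpy as np

def pseudorange_residuals(pr_meas, sat_pos, rx_pos, clk_bias_m):
    """pr_meas: (N,) measured pseudoranges [m]; sat_pos: (N, 3) satellite
    ECEF positions [m]; rx_pos: (3,) estimated receiver ECEF position [m];
    clk_bias_m: estimated receiver clock bias expressed in meters."""
    geometric_range = np.linalg.norm(sat_pos - rx_pos, axis=1)
    pr_pred = geometric_range + clk_bias_m  # predicted pseudorange per satellite
    return pr_meas - pr_pred                # residual per satellite
```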
II. Correlator-level Variables
Based on various publications and performance analyses, two additional correlator-level variables were adopted in some of the model training processes.
a. Feature V: Noise Ratio [2]
The noise ratio captures the relationship between the in-phase (I) and quadrature (Q) components of the signal and is a contributive variable in error prediction. It is calculated by:
$\frac{I}{Q}$
b. Feature VI: Normalized Early-minus-late Power [2]
The normalized early-minus-late power refers to the difference between the power levels of the early and late correlation components in a positioning system. The discriminator that processes these power differences possesses a useful property when the error magnitude exceeds the duration of a chip; this helps the Delay-Locked Loop (DLL) maintain accurate signal tracking even under noisy and challenging signal conditions.
The feature is calculated by:
$\frac{(I_E^2 + Q_E^2) - (I_L^2 + Q_L^2)}{(I_E^2 + Q_E^2) + (I_L^2 + Q_L^2)}$
where $(I_E, Q_E)$ and $(I_L, Q_L)$ are the early and late correlator outputs, respectively.
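A sketch of both correlator-level features computed from correlator outputs is shown below; treating Feature V as the ratio of the prompt in-phase and quadrature components is this report's reading of [2], and all variable names are illustrative.

```python
# Sketch of the two correlator-level features. Assumes prompt (I_P, Q_P)
# and early/late (I_E, Q_E, I_L, Q_L) correlator outputs are available
# from the tracking loop; names are illustrative.
import numpy as np

def noise_ratio(i_p: np.ndarray, q_p: np.ndarray) -> np.ndarray:
    """Feature V: ratio of in-phase to quadrature prompt components, I/Q."""
    return i_p / q_p

def normalized_early_minus_late_power(i_e, q_e, i_l, q_l):
    """Feature VI: (E - L) / (E + L), with early power E = I_E^2 + Q_E^2
    and late power L = I_L^2 + Q_L^2."""
    early = i_e**2 + q_e**2
    late = i_l**2 + q_l**2
    return (early - late) / (early + late)
```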
2. Error Labeling
a. Labeling I: 3D Error of Positioning¹
The 3D error of positioning is computed by:
$\varepsilon = \sqrt{\delta x^2 + \delta y^2 + \delta z^2}$
where
$\varepsilon$: 3D positioning error of the least-squares solution;
$\delta x$: distance between the estimated position and the ground truth in the x direction;
$\delta y$: distance between the estimated position and the ground truth in the y direction;
$\delta z$: distance between the estimated position and the ground truth in the z direction.
¹ For the model where the label is the 3D positioning error, the mean and standard deviation are used to represent the overall performance of the satellites within each epoch.
b. Labeling II: Single Difference Error
The Single Difference Error in pseudorange measurements refers to the discrepancy between the range measurements of two satellites observed by the same GNSS receiver; differencing them cancels the receiver clock bias, which cannot be obtained directly. The error arises from various factors, including atmospheric delays, satellite orbit inaccuracies, clock bias, and multipath effects. Detecting the Single Difference Error, and thereby reducing it, is vital for achieving precise positioning solutions.
The Single Difference Error is computed by:
$\varepsilon = (Pr_1 - Pr_2) - (R_1 - R_2)$
where
$\varepsilon$: Single Difference Error;
$Pr_1$: pseudorange measurement from the first satellite;
$Pr_2$: pseudorange measurement from the second satellite;
$R_1$: true distance between the first satellite and the receiver;
$R_2$: true distance between the second satellite and the receiver.
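Both labels are simple to compute once ground truth is available; the sketch below illustrates them, with all inputs as placeholders.

```python
# Sketch of the two labels. Ground-truth positions and true ranges are
# placeholders; in the project they come from surveyed reference points
# and satellite ephemeris.
import numpy as np

def error_3d(est_pos: np.ndarray, true_pos: np.ndarray) -> float:
    """Labeling I: root-sum-of-squares of the position error components."""
    dx, dy, dz = est_pos - true_pos
    return float(np.sqrt(dx**2 + dy**2 + dz**2))

def single_difference_error(pr1: float, pr2: float, r1: float, r2: float) -> float:
    """Labeling II: (Pr1 - Pr2) - (R1 - R2); the receiver clock bias
    cancels in the between-satellite difference."""
    return (pr1 - pr2) - (r1 - r2)
```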
3. Dimensionality Reduction Analysis
Principal Component Analysis (PCA) was conducted for dimensionality reduction. The methodology begins with feature normalization to ensure consistent scaling across all features, establishing a uniform basis for the subsequent analysis. PCA is then applied to reduce the dimensionality of the dataset.
To determine the number of principal components to retain, a contribution ratio threshold is set. The principal components surpassing this threshold are extracted by computing the eigenvalues and corresponding eigenvectors of the covariance matrix, and the data are projected onto the chosen components, reducing the dimensionality to three and facilitating further analysis.
Scatter plots are used to visualize the processed dataset. Distinct colors are assigned to represent the magnitudes of the error values, allowing clear differentiation of the effects of the principal components and aiding the identification of relationships between the principal components and the errors.
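A minimal scikit-learn sketch of this pipeline follows (normalization, PCA, threshold-based component retention, and an error-colored 3D scatter); the feature matrix, error vector, and the 90% contribution threshold are illustrative assumptions, not the project's actual settings.

```python
# Sketch of the PCA pipeline: normalize features, fit PCA, retain
# components by a contribution-ratio threshold, and visualize the first
# three components colored by error magnitude. X and errors are
# random placeholders.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((500, 8))   # placeholder feature matrix (n_samples x n_features)
errors = rng.random(500)   # placeholder error values used for coloring

X_std = StandardScaler().fit_transform(X)  # consistent scaling across features
pca = PCA().fit(X_std)

# Retain the leading components whose cumulative contribution ratio
# first exceeds the chosen threshold (90% here, as an assumption).
cum_ratio = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cum_ratio, 0.90) + 1)
n_keep = max(n_keep, 3)    # the report keeps three dimensions for plotting
X_proj = pca.transform(X_std)[:, :n_keep]

# 3D scatter of the first three components; red = larger error,
# blue = smaller error, mirroring Figures 1 and 3.
ax = plt.figure().add_subplot(projection="3d")
sc = ax.scatter(X_proj[:, 0], X_proj[:, 1], X_proj[:, 2], c=errors, cmap="coolwarm")
ax.set_xlabel("PC 1"); ax.set_ylabel("PC 2"); ax.set_zlabel("PC 3")
plt.colorbar(sc, ax=ax, label="error magnitude")
plt.show()
```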
a. Model I: 3D Error of Positioning as Label
After conducting principal component analysis (PCA) on the dataset of Model I, the dimensionality of the original eight-dimensional dataset² was successfully reduced to three dimensions³.
As Figure 2 shows, the resulting three dimensions exhibit a high degree of explanatory power. However, the preprocessing of some data may have been suboptimal because the mean and standard deviation processing introduced correlated features. Consequently, the visualization does not reveal clear differentiation within the dataset.
² The eight parameters are the mean and standard deviation of the EA of the satellites in an epoch, the mean and standard deviation of the C/NR of the satellites in an epoch, and the means and standard deviations of the two correlator-level variables of the satellites in an epoch.
³ In Figure 1, which shows the error projected into the new coordinate system, deeper red represents larger error values after transformation, while stronger blue indicates smaller error values in the projected data.
Figure 1: 3D Visualization Result of PCA of Model I
Figure 2: Cumulative Explained Variance of Model I
b. Model II: Single Difference Error of Satellites as Label
After conducting principal component analysis (PCA) on the dataset of Model II, the dimensionality of the original four-dimensional dataset was successfully reduced to three dimensions⁴.
The PCA of Model II reveals distinct differentiating features, indicating significant correlations between each principal component and the prediction outcomes. As Figure 3 shows, large Single Difference Errors are predominantly concentrated at higher values of Principal Component 3 and relatively low values of Principal Components 1 and 2. Conversely, regions with lower values of Principal Component 3 exhibit a concentration of smaller Single Difference Errors, accompanied by higher values of Principal Components 1 and 2.
4. Ensemble Bagged Regression Tree-based Predictive Models
a. Model Introduction
Prediction accuracy and training time were the two main factors considered in model selection. Ensemble decision trees are a machine learning technique known for providing accurate predictions across various applications, according to Adler et al. [3] and Mishra et al. [4]. As shown in Figure 5, the ensemble bagged regression tree is a specific variation of this method that combines multiple regression decision trees, each built on a bootstrap subset of the training data.
⁴ As in Figure 1, for the error projected into the new coordinate system, deeper red represents larger error values after transformation, while stronger blue indicates smaller error values in the projected data.
Figure 3: 3D Visualization Result of PCA of Model II
Figure 4: Cumulative Explained Variance of Model II
This ensemble model, consisting of multiple regression trees, brings together a group of relatively simple learners to create a more robust and accurate overall model. An important advantage is its resilience to minor variations or uncertainties in the training data, making it suitable for tasks such as pseudorange error prediction.
Figure 5: Flowchart of the Ensemble Bagged Regression Tree Model
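For illustration, a minimal scikit-learn sketch of a bagged regression tree ensemble evaluated with 5-fold cross-validation follows; the placeholder data, tree count, and default hyperparameters are assumptions, not the project's original implementation.

```python
# Sketch: bagged regression trees with 5-fold cross-validation. Each
# tree is fit on a bootstrap subset of the training data and the
# ensemble averages their predictions. Data are random placeholders.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 8))     # placeholder features (signal/correlator level)
y = rng.random(1000) * 100.0  # placeholder labels (e.g. 3D error, m)

model = BaggingRegressor(
    estimator=DecisionTreeRegressor(),  # base learner: a regression tree
    n_estimators=30,                    # number of bootstrap-trained trees
    random_state=0,
)

rmse = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print(f"5-fold RMSE = {rmse.mean():.2f} m, MAE = {mae.mean():.2f} m")
```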
b. Dataset Overview
Two separate labeled datasets were used to train the two models. Table 1 summarizes each dataset, including the variables and label used, the mean label value, the validation method, the sample size, and the data type.
Table 1: Summary of the Datasets

                       Model I                          Model II
Variables              Signal Level + Correlator Level  Signal Level
Label                  3D Error                         Single Difference Error
Mean Label Value (m)   47.698                           17.919
Validation             5-fold Cross-Validation          5-fold Cross-Validation
Sample Size            15,916                           23,953
Data Type              Static                           Dynamic
c. Model I: 3D Error Predictive Model
The training features for this model include both signal-level and correlator-level features, while the output label is the root mean square of the 3D positioning error.
Several regression models were created, and their performance is compared in Table 2. Notably, the proposed model exhibits the highest prediction accuracy: applied to a large dataset with a mean label value of 47.698 m, it achieves an average prediction error of 8.64 m on the cross-validation set, indicating that the model provides a reasonably accurate estimate of the output.
Table 2: Comparison of Ensemble Bagged Regression Trees and Other Candidate Models

Model                      R-Squared  RMSE (m)  MAE (m)  Training Time (s)
Ensemble Bagged Tree       0.57       12.289    8.6437   28.881
Fine Tree                  0.37       14.823    10.439   11.297
Cubic SVM                  -3127.42   1045.7    649.29   1057.8
Trilayered Neural Network  0.38       14.77     10.515   121.17
Figure 6 and Figure 7 show the model's response plot and its validation prediction versus actual plot, respectively. The regression model effectively captures the trends across the data and demonstrates a considerable level of accuracy in its predictive performance.
Figure 6: Response Plot of Model I
Figure 7: Validation Prediction versus Actual Plot of Model I
d. Model II: Single Difference Error Predictive Model
Signal-level features are employed as training features, while the Single Difference Error is designated as the output label.
Several regression models were created, and their performance is compared in Table 3.
In cross-validation, the proposed regression model exhibits the highest prediction accuracy and a remarkable fit between predicted and actual values: the average error is only 4.76 m, and the plots show strong alignment, indicating excellent predictive accuracy.
Table 3: Comparison of Ensemble Bagged Regression Trees and Other Candidate Models

Model                      R-Squared  RMSE (m)  MAE (m)  Training Time (s)
Ensemble Bagged Tree       0.97       8.2567    4.76     42.115
Fine Tree                  0.96       9.7831    5.8054   8.1977
Cubic SVM                  0.96       9.7619    6.1322   1115.5
Trilayered Neural Network  0.97       8.9608    5.44     256.3
Figure 8 and Figure 9 illustrate the model's response plot and validation prediction versus actual plot, respectively. The plots show that the predicted Single Difference Error closely aligns with the actual values; the high degree of fit between predicted and observed values demonstrates the model's strong predictive performance.
e. Conclusion
Through dataset labeling, pre-processing, and training, two distinct predictive models have been developed that exhibit satisfactory prediction accuracy and performance, particularly in predicting pseudorange errors. By focusing on the prediction of pseudorange and positioning errors, these models have the potential to significantly reduce the magnitude of positioning errors.
In future research, a key focus should be on improving the quality of the dataset, particularly by enhancing the calibration of ground truth during the collection of static data. There is also an opportunity to refine the handling of principal component analysis (PCA), specifically by selecting and optimizing different candidate features. Exploring advanced algorithms such as deep learning and neural networks is recommended as well, and model parameters and hyperparameters should be fine-tuned meticulously. These endeavors will contribute to further advances in the field and enhance the overall performance of the predictive models.
Figure 8: Response Plot of Model II
Figure 9: Validation Prediction versus Actual Plot of Model II
References
[1] Y. Gu, L.-T. Hsu, and S. Kamijo, "GNSS/Onboard Inertial Sensor Integration With the Aid of 3-D Building Map for Lane-Level Vehicle Self-Localization in Urban Canyon," IEEE Transactions on Vehicular Technology, vol. 65, no. 6, pp. 4274–4287, 2016, doi: 10.1109/TVT.2015.2497001.
[2] K. Borre, A Software-Defined GPS and Galileo Receiver: A Single-Frequency Approach, 1st ed. Netherlands: Springer Nature, 2007.
[3] W. Adler, S. Potapov, and B. Lausen, "Classification of repeated measurements data using tree-based ensemble methods," Computational Statistics, vol. 26, no. 2, pp. 355–369, 2011, doi: 10.1007/s00180-011-0249-1.
[4] P. K. Mishra, A. Yadav, and M. Pazoki, "A Novel Fault Classification Scheme for Series Capacitor Compensated Transmission Line Based on Bagged Tree Ensemble Classifier," IEEE Access, vol. 6, pp. 27373–27382, 2018, doi: 10.1109/ACCESS.2018.2836401.