Conference Paper

FTRFS: A Fault-Tolerant Radiation-Robust Filesystem for Space Use

Christian M. Fuchs (1,2), Martin Langer (2), and Carsten Trinitis (1)
(1) Technical University Munich, Chair for Computer Architecture and Organization
(2) Technical University Munich, Institute for Astronautics
christian.fuchs@tum.de, martin.langer@tum.de, carsten.trinitis@tum.de
Abstract. A satellite's on-board computer must guarantee integrity and recover degraded or damaged data over the entire duration of the spacecraft's mission in an extreme radiation environment. While redundancy and hardware-side voting can protect Magnetoresistive RAM (MRAM) well against device failure, more sophisticated software-side storage concepts are required if advanced operating systems are used. A combination of hardware and filesystem measures can thus drastically increase system dependability, even for missions with a very long duration. We present a novel POSIX-compatible filesystem implementation offering memory protection, checksumming and forward error correction.
Keywords: Dependability, Data Storage, Filesystem, Spacecraft, Satellite, Radiation, MRAM, Memory Protection, Failure Tolerance, EDAC
1 Introduction
Recent small- and nanosatellite development has shown a rapid increase in available compute power and storage capacity, but also in system complexity. CubeSats [1] are currently the most popular nanosatellite form factor due to their cost efficiency and ever-increasing system performance. The authors are involved in developing such a satellite, MOVE-II, whose predecessor, First-MOVE, was launched into Low Earth Orbit (LEO) in 2013.
More challenging quality requirements, limitations in energy consumption and heat dissipation, and the generally extreme environmental conditions make spaceflight software and hardware evolution considerably more time-consuming and slower-paced. Ultimately, nanosatellite computing will evolve away from federated clusters of specialized microcontrollers [2], a development that could also be observed with larger spacecraft over the past decades. Instead, more powerful, hardened, centralized general-purpose computers will cover a wider range of responsibilities [3, 4]. Thereby, overall spacecraft complexity can be reduced and efficiency improved, while each individual computer's complexity increases [2]. An increased compute burden also requires more sophisticated operating system (OS) software, which in turn results in increased code complexity and size [5].
For very simple computers, custom-tailored OSs offer an excellent balance of size and functionality. However, development of proprietary OSs for unique custom computers has been abandoned in most of the IT industry in favor of reusing standard software and hardware. This is still an ongoing process in spaceflight, though it has already produced a focus on a few types of radiation-hardened processor platforms (e.g. LEON3, PPC750, RAD6000, see [6]) running common OSs [7, 8]. The same evolution has begun in nanosatellite computing, albeit much faster.
OSs popular in spaceflight such as RTEMS can consume less than 256 KB of non-volatile (nv) memory [9], whereas Linux requires at least 2 MB. If such a larger OS is used aboard a satellite, more sophisticated storage concepts are needed. Data must be stored permanently and consistently throughout the mission lifetime. Space missions often last between 5 and 10 years [10], but can reach 25 years or longer, as with the Voyager probes. Thus a satellite's command and data handling (CDH) subsystem, the on-board computer, must guarantee integrity and recover degraded or damaged data (error detection and correction, EDAC) over a prolonged period of time in a hostile environment. We consider a filesystem (FS) the most resource-conserving and efficient approach, which also allows dynamically adjustable protection for individual data structures. As Magnetoresistive Random-Access Memory (MRAM) [11] is widely used for radiation-resistant data storage in nanosatellites, we designed FTRFS for this technology.
This paper is organized as follows: Section 2 introduces the specific requirements and hazards of computing in orbit and deep space, as well as the properties of different memories. Section 3 analyzes existing FSs and related research to determine whether an implementation from scratch can be avoided. Section 4 presents our FS, offering memory protection, forward error correction (FEC) and checksumming for both data and metadata. First results of our FS implementation are provided in Section 4.4 and its limitations are elaborated in Section 4.5. Finally, Section 5 contains potential solutions and our next steps in development.
2 Impact of the Spaceflight Use Case
Besides extreme temperature variations and the absence of an atmosphere for heat dissipation, the impact of the near-Earth radiation environment must be considered in space computing. About 20% of all anomalies aboard satellites [12] can be attributed to high-energy particles from the sources depicted in Figure 1. Particles originating from Earth's radiation belts, the Van Allen belts, consist mostly of trapped protons and electrons. Galactic Cosmic Rays from beyond our solar system are mostly protons [13, 14], whereas various other high-energy particles are ejected by the Sun during Solar Particle Events (SPEs).
Therefore, depending on the orbit of the spacecraft and the occurrence of SPEs, an on-board computer will be penetrated by a mixture of high-energy protons, electrons and heavy ions. Physical shielding using aluminum or other material can reduce certain radiation effects. However, sufficient protection would require a spacecraft to dedicate an unreasonable amount of additional mass to shielding.
Fig. 1. The sources of radiation affecting a satellite (panel labels: Galactic Cosmic Rays, Solar Energetic Particles, Van Allen Belts, SAA, MOVE-II in LEO). Figure is not to scale.
Furthermore, in LEO, the radiation bombardment increases while transiting the South Atlantic Anomaly (SAA). Earth's magnetic field experiences a local, height-dependent dip within the SAA, due to an offset of the spin axis from the magnetic axis. In this zone, a satellite and its electronics will experience an increase in proton flux of up to 10^4 times (energies >30 MeV) [14]. This flux increase results in a rapid growth of bit errors and other upsets in a satellite's CDH. In the case of MOVE-II, full functionality of the CDH subsystem is required at all times, because scientific measurements are conducted by one of the satellite's possible payloads, even though brief outages (e.g. reboots) are acceptable. This scientific payload is intended to measure the anti-proton flux within the SAA, as its physical properties are a subject of scientific debate.
Different storage technologies vary regarding the energy threshold necessary to induce an effect and the type of effect caused. The most important radiation-induced phenomena affecting memory are:
- Single Event Effects (SEE): local ionization from protons or heavy ions
- Total Ionizing Dose (TID): the cumulative effect of charge trapping in the oxide of electronic devices
- Displacement Damage: structural displacement in crystalline components of electronic hardware
Other types of SEEs, the destructive ones being the most relevant, are well described in [15]. Some novel memory technologies (e.g. MRAM [11], PCRAM [16]) have shown inherent radiation tolerance against bit-flips (Single Event Upsets, SEUs) due to their data storage mechanism [17, 18].
Due to a shifting voltage threshold in floating gate cells caused by TID, commercial flash memories become increasingly susceptible to bit errors. Highly scaled flash memories are also prone to SEUs causing shifts in the threshold voltage profile of one or more storage cells, referred to as Multiple Bit Upset [19].
All these memory technologies are sensitive to Single Event Functional Interrupts (SEFIs) [20], which can affect blocks, banks or entire circuits due to particle strikes in the peripheral circuitry.
3 Related Work and Preexisting File Systems
Filesystems often include performance optimizations like disk head tracking, exploitation of data locality, and caching. However, most of these enhancements do not apply to storage technologies used in spaceflight. In fact, such optimizations add significant code overhead, possibly resulting in a more error-prone FS, and may even reduce performance.
Next-generation FSs, e.g. BTRFS and ZFS, are designed to handle multi-terabyte devices and RAID pools. Silent data corruption has become a practical issue with such large volumes [21]. Thus, these FSs can maintain checksums for data blocks and metadata. Due to their intended use in large disk pools, they also offer integrated multi-device functionality.
Multi-device functionality would certainly be advantageous, but neither ZFS nor BTRFS scales down to small storage volumes. Their minimum volume sizes are far beyond what current nanosatellite CDHs can offer. Also, future development of these FSs will likely involve design decisions that do not favor spaceflight applications.
FSs for flash devices, like the memory technology itself, have evolved considerably over the past decade [22, 23]. Recent FSs already handle challenges like potentially negative compression rates [24] or erase/write-block abstraction, offer proper wear leveling, and interact with device EDAC functionality (checksumming, spare handling and recovery). UFFS even offers integrity protection for data and metadata using erasure codes.
Most new flash FSs interact directly with the memory¹ and are thereby incompatible with other memory technologies unless flash properties are emulated. This emulation introduces further IO and may result in unnecessary data loss, as flash memory is inherently block-oriented.
RAM filesystems are usually optimized for throughput or simplicity, often resulting in a relatively slim codebase. If designed for volatile RAM, these FSs are optimized for simplicity and do not necessarily require a nondestructive unmount procedure. Some non-volatile RAM FSs perform direct memory access to optimize for throughput; others utilize compression to increase storage capacity [25].
Except for PRAMFS [26], none of these FSs consider memory protection to
increase dependability. PRAMFS offers execute-in-place (XIP) support [27] and
is POSIX-compatible, but offers no data integrity protection.
In contrast to flash memories, RAM filesystems are not block-based but benefit from the ability to access data arbitrarily. Thereby, no intermediate block management is required and read-erase-update cycles are unnecessary. While simple block-layer EDAC would certainly be possible, structures within a RAM filesystem can be protected individually, allowing for stronger protection.
Open-source space engineering and CDH research is directed mainly towards testing radiation-related properties of memory technologies [20, 28], and of NAND flash in particular [29, 30]. At the time of this writing, we are unaware of advanced software-side storage concepts for space use that are not flash-driven.
¹ In the case of Linux, through the Memory Technology Device (MTD) subsystem.
4 FTRFS
FTRFS (fault-tolerant radiation-robust filesystem for space use) operates efficiently with small volumes (4 MB), but also scales to larger volumes and is bootable.
Regarding the FS's threat model, ECC is applied to all CPU caches and volatile SRAM, so faults in these devices are considered detectable and possibly correctable at runtime. A CPU running FTRFS must be equipped with a memory management unit (MMU) whose page table resides in volatile memory. All other elements (e.g. peripherals and ALUs), other memories (e.g. registers and buffers) and in-transit data are considered potential error sources, see Section 2.
Memory protection has been largely ignored in RAM-FS design. In part, this can be attributed to a misconception of memory protection as a pure security measure against malware. However, for directly mapped nv-memory, memory protection introduces the memory management unit as a safeguard against data corruption due to upsets in the system [32]. Thus, only in-use memory pages are writable, even from kernel space, whereas the vast majority of memory is kept read-only and protected from misdirected write access, e.g. due to an SEU in a register used for addressing during a store operation.
While data compression has been popular in size-constrained FSs, it would offer low compression rates here, as well-compressible data, e.g. log data, will not be kept in the same memory as the OS core components. Thus, it would offer little capacity gain but entail severe code overhead.
After a detailed OS evaluation, we chose the Linux kernel as the base for our FS due to its adaptability, extensive software and hardware support, and vast community. We decided against utilizing RTEMS mainly due to our limited software development manpower.
A loss of components has to be compensated at the software or hardware level through voting or simple redundancy. Multi-device capability was considered for this FS; however, it is better implemented below the FS level (e.g. via majority voting in hardware [33]) or as an overlay, e.g. RAIF [34].
The capability to detect and correct metadata and data errors was considered crucial during development. Depending on the mission duration, the destination or the orbit a spacecraft operates in, different levels of protection will be necessary. The protective guarantees offered can be adjusted at format time or later through the use of additional tools.
Our satellite's CDH offers 32 MB of ECC-SRAM and is driven by an ARM Cortex-A5 CPU, but could be upgraded to a Cortex-A7-MP. Due to the relatively restricted system resources aboard a nanosatellite, cryptographic checksums do not offer a significant benefit. Instead, CRC32 is utilized for performance reasons in tandem with Reed-Solomon (RS) encoding [31].
4.1 Metadata Integrity Protection
For proper protection at the FS level, in addition to the stored filesystem objects (inodes) and their data, all other metadata must be protected. Figure 2 depicts the basic layout. Although it is similar to ext2 and PRAMFS [26], data addressing and bad block handling work fundamentally differently. We borrow memory protection from the wprotect component of PRAMFS, as well as the superblock and inode layout. PRAMFS is licensed under GPLv2 and based upon ext2.
Fig. 2. The basic layout of the presented FS. EDAC data is appended or prepended to each FS structure. PSB and SSB refer to the primary and secondary super blocks.
The Super Block (SB) is kept redundantly, as depicted in Figure 2. Every update to the SB also refreshes the secondary SB; hence, hereafter no explicit reference to the secondary SB will be made. The SB also contains the EDAC parameters for blocks, inodes and the bitmap.
The SB is the most critical structure within our FS and is static after volume creation. Its content is copied to system memory at mount time, so it is sufficient to assure SB consistency the first time it is accessed.
In addition, as the SB contains critical FS information, we avoid accumulating errors over time through scrubbing: the CRC checksum is re-evaluated each time certain filesystem API functions (e.g. directory traversal) are performed.
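A minimal Python sketch of redundant superblock handling with CRC32 follows, assuming a hypothetical on-media format in which each copy is the SB payload followed by its little-endian CRC32. It deliberately omits the RS codes FTRFS also keeps for the SB (see Table 1), and the `psb`/`ssb` dictionary entries stand in for the two on-media copies. Calling `load_superblock` from hooks such as directory traversal would give the scrubbing behavior described above.

```python
import struct
import zlib

# Hypothetical layout: each SB copy is payload || CRC32(payload), little endian.
CRC = struct.Struct("<I")


def seal(payload: bytes) -> bytes:
    """Serialize one SB copy with its checksum appended."""
    return payload + CRC.pack(zlib.crc32(payload))


def is_valid(copy: bytes) -> bool:
    (crc,) = CRC.unpack(copy[-CRC.size:])
    return zlib.crc32(copy[:-CRC.size]) == crc


def load_superblock(media: dict) -> bytes:
    """Return the SB payload. The primary copy is preferred; a copy that fails
    its CRC, or a stale secondary, is rewritten from the copy that passed."""
    primary, secondary = media["psb"], media["ssb"]
    if is_valid(primary):
        if secondary != primary:
            media["ssb"] = primary      # refresh the secondary copy
        return primary[:-CRC.size]
    if is_valid(secondary):
        media["psb"] = secondary        # restore the primary copy
        return secondary[:-CRC.size]
    raise OSError("both superblock copies failed their CRC check")
```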
A block-usage bitmap is dynamically allocated based on the data-block count remaining after subtracting overhead, and is appended to the secondary SB. The bitmap EDAC data is also dynamically sized and must therefore be stored outside the compile-time static SB, even though placing it there would be convenient. Thus, the protection data is located in the first block after the end of the bitmap, see Figure 2. In case the bitmap is extended, the new part of the bitmap is initialized and then the error correction data is recomputed at its new location. We refrain from recomputing and re-checking the EDAC data upon each access; instead, FEC data is checked before and updated after each relevant operation has been concluded.
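The check-before, update-after pattern for the bitmap can be sketched as below. This is an illustration under assumptions: the class and method names are hypothetical, the EDAC is reduced to a single CRC32 (FTRFS uses RS codes, cf. Table 1), and the on-media placement of the protection data after the bitmap is not modeled.

```python
import zlib


class ProtectedBitmap:
    """Hypothetical block-usage bitmap. Its EDAC (reduced here to a CRC32) is
    verified once before and refreshed once after each operation."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)
        self.crc = zlib.crc32(self.bits)

    def consistent(self):
        return zlib.crc32(self.bits) == self.crc

    def _check(self):
        if not self.consistent():
            # FTRFS would attempt RS correction here; the sketch just fails.
            raise OSError("bitmap EDAC mismatch")

    def _update(self):
        self.crc = zlib.crc32(self.bits)

    def _used(self, blk):
        return self.bits[blk // 8] >> (blk % 8) & 1

    def allocate(self, count):
        """Mark `count` free blocks as used; EDAC checked once, updated once."""
        self._check()
        free = [b for b in range(self.nblocks) if not self._used(b)][:count]
        if len(free) < count:
            raise OSError("no space left on device")
        for blk in free:
            self.bits[blk // 8] |= 1 << (blk % 8)
        self._update()
        return free

    def grow(self, new_nblocks):
        """Extend the bitmap: initialize the new part, then recompute EDAC."""
        self._check()
        extra = max(0, (new_nblocks + 7) // 8 - len(self.bits))
        self.bits.extend(bytes(extra))
        self.nblocks = max(self.nblocks, new_nblocks)
        self._update()
```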
Inodes are kept as an array. Their consistency is of paramount importance, as they define the logical structure of the filesystem. The array's length is determined upon FS initialization and can change only if the volume is resized. As each inode is an independent entity, an EDAC spanning the entire inode table is unnecessary. Instead, we extend and protect each inode individually.
4.2 Data Consistency and Organization
To optimize the FS for both larger (e.g. a kernel image, a database) and very small (e.g. scripts) files, direct and double-indirect data addressing are supported, as depicted in Figure 3. The FS automatically selects which method is used. Data protection requirements vary depending on block size and use case. Thus, FTRFS allows the user to adjust the protection strength for data blocks, as described in the next section.
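How a file's logical block numbers map to pointer slots under the two addressing modes can be sketched as follows. The parameters (1 KiB blocks, 32-bit pointers, 8 direct pointers per inode) and the size-based selection rule are hypothetical; the text only states that the FS selects the mode automatically.

```python
BLOCK_SIZE = 1024                       # hypothetical payload bytes per block
PTR_SIZE = 4                            # hypothetical 32-bit block pointers
N_DIRECT = 8                            # hypothetical direct slots per inode
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE
MAX_DOUBLE = PTRS_PER_BLOCK ** 2 * BLOCK_SIZE  # largest double-indirect file


def choose_mode(file_size: int) -> str:
    """Direct addressing for small files, double indirection otherwise."""
    if file_size <= N_DIRECT * BLOCK_SIZE:
        return "direct"
    if file_size <= MAX_DOUBLE:
        return "double-indirect"
    raise ValueError("file too large for double indirection")


def resolve(mode: str, logical_block: int) -> tuple:
    """Pointer-slot path from the inode to a logical block."""
    if mode == "direct":
        return (logical_block,)                   # slot inside the inode
    return (logical_block // PTRS_PER_BLOCK,      # slot in first-level block
            logical_block % PTRS_PER_BLOCK)       # slot in second-level block
```

With these hypothetical parameters, double indirection reaches 256 * 256 blocks of 1 KiB, i.e. 64 MiB, comfortably more than the 8 MB volume targeted in Section 4.3.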
Fig. 3. Each inode can either utilize direct addressing or double indirection. Extended
attributes are always addressed directly.
The usable data block size cannot simply be reduced to make room for EDAC data, as some Linux kernel subsystems assume blocks to be sized to a power of two. Instead, the FS internally utilizes larger blocks that include the EDAC data, see Figure 4.
Extended attributes (xattrs) are deduplicated and referenced by one or more inodes, as depicted in Figure 3. Like in PRAMFS, xattrs are stored as data blocks, so we can treat them identically to regular data.
Nanosatellites, at least the non-classified ones, are not yet considered security-critical devices. However, the application area of nanosatellites will expand considerably in the future [3]. Increasing professionalization will introduce enhanced requirements regarding dependability and security. Shared-satellite usage scenarios as well as technology testing satellites will certainly also require stronger security measures, which can be implemented using xattrs.
An xattr block's integrity is verified once its reference is resolved. Once all (bulk) write access has concluded, the EDAC data is updated.
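Deduplicated xattr blocks with verify-on-resolve can be modeled as a reference-counted store. The text does not say how FTRFS detects duplicates; keying an in-memory index by the attribute payload is an assumption, as are all names below. The bulk EDAC update after writes is reduced to computing a CRC32 when a block is created.

```python
import zlib


class XattrStore:
    """Hypothetical deduplicating xattr store: identical attribute blobs share
    one block, reference-counted by the inodes that point at it."""

    def __init__(self):
        self.blocks = {}      # block number -> [payload, crc32, refcount]
        self.by_content = {}  # payload -> block number (in-memory dedup index)
        self.next_blk = 0

    def attach(self, payload: bytes) -> int:
        """Return the block for `payload`, reusing an identical one if present."""
        blk = self.by_content.get(payload)
        if blk is None:
            blk, self.next_blk = self.next_blk, self.next_blk + 1
            self.blocks[blk] = [payload, zlib.crc32(payload), 0]
            self.by_content[payload] = blk
        self.blocks[blk][2] += 1
        return blk

    def is_intact(self, blk: int) -> bool:
        payload, crc, _ = self.blocks[blk]
        return zlib.crc32(payload) == crc

    def resolve(self, blk: int) -> bytes:
        """Integrity is verified when an inode's reference is resolved."""
        if not self.is_intact(blk):
            raise OSError(f"xattr block {blk} failed its checksum")
        return self.blocks[blk][0]

    def detach(self, blk: int) -> None:
        """Drop one reference; free the block when none remain."""
        self.blocks[blk][2] -= 1
        if self.blocks[blk][2] == 0:
            del self.blocks[blk]
            self.by_content = {k: v for k, v in self.by_content.items() if v != blk}
```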
4.3 Algorithm Details and Performance
Our primary goal was to develop an FS that can safely store a full, size-optimized Linux root FS including a kernel image over a long period of time within an 8 MB volume. There are numerous erasure codes that could be used to protect our FS. After careful consideration, RS was chosen mainly for the following reasons:
- The algorithm is well analyzed and widely used in various embedded scenarios, including spacecraft. Optimized software implementations, IP cores and hardware accelerators are available.
- MRAM, while being SEU-immune, is still prone to stray writes, controller errors and in-transit data corruption. RS relies upon symbol-level error correction, which is precisely the kind of corruption the FS must correct. Misdirected access within a page evades memory protection and corrupts the FS, so corrupted runs of 1, 2, 4 and 8 bytes will occur.
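To make the symbol-level argument concrete, the following Python sketch implements a textbook Reed-Solomon codec over GF(2^8): systematic encoding, Berlekamp-Massey, Chien search and Forney's formula. It is not FTRFS's code (FTRFS uses the Linux kernel's RS library), and the primitive polynomial 0x11d and the root placement are assumptions. Because one symbol is one byte, a corrupted run of k bytes costs at most k symbol errors, and nsym parity bytes per codeword correct up to nsym/2 of them.

```python
# Textbook Reed-Solomon codec over GF(2^8), a sketch. Assumptions: primitive
# polynomial 0x11d, generator roots alpha^0 .. alpha^(nsym-1), codeword <= 255 B.
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[255 - LOG[a]]


def eval_high(p, x):
    """Evaluate p (highest-degree coefficient first) at x via Horner's rule."""
    y = 0
    for c in p:
        y = mul(y, x) ^ c
    return y


def eval_low(p, x):
    """Evaluate p (lowest-degree coefficient first) at x."""
    return eval_high(p[::-1], x)


def encode(msg, nsym):
    """Systematic encoding: returns msg followed by nsym parity bytes."""
    gen = [1]
    for i in range(nsym):  # gen *= (x - alpha^i); minus equals plus in GF(2^8)
        gen = [a ^ mul(b, EXP[i]) for a, b in zip(gen + [0], [0] + gen)]
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):  # long division by gen; the remainder is parity
        if buf[i]:
            for j in range(1, len(gen)):
                buf[i + j] ^= mul(gen[j], buf[i])
    return bytes(msg) + bytes(buf[len(msg):])


def decode(codeword, nsym):
    """Correct up to nsym // 2 corrupted bytes and return the message part."""
    cw, n = list(codeword), len(codeword)
    synd = [eval_high(cw, EXP[j]) for j in range(nsym)]
    if not any(synd):
        return bytes(cw[:n - nsym])
    # Berlekamp-Massey: error locator Lambda(x), lowest degree first.
    lam, prev, L, m, b = [1], [1], 0, 1, 1
    for k in range(nsym):
        d = synd[k]
        for i in range(1, min(L, len(lam) - 1) + 1):
            d ^= mul(lam[i], synd[k - i])
        if d == 0:
            m += 1
            continue
        new = lam + [0] * max(0, len(prev) + m - len(lam))
        for i, p in enumerate(prev):
            new[i + m] ^= mul(mul(d, inv(b)), p)
        if 2 * L <= k:
            prev, L, b, m = lam, k + 1 - L, d, 1
        else:
            m += 1
        lam = new
    # Chien search: byte p has degree n-1-p, so its locator is X = alpha^(n-1-p).
    xinv = [EXP[(255 - (n - 1 - p)) % 255] for p in range(n)]
    pos = [p for p in range(n) if eval_low(lam, xinv[p]) == 0]
    if 2 * L > nsym or len(pos) != L:
        raise ValueError("uncorrectable codeword")
    # Forney: Omega = S * Lambda mod x^nsym; Y_p = Omega(1/X_p) / prod(1 + X_q/X_p).
    omega = [0] * nsym
    for i, s in enumerate(synd):
        for j, c in enumerate(lam[:nsym - i]):
            omega[i + j] ^= mul(s, c)
    for p in pos:
        denom = 1
        for q in pos:
            if q != p:
                denom = mul(denom, 1 ^ mul(inv(xinv[q]), xinv[p]))
        cw[p] ^= mul(eval_low(omega, xinv[p]), inv(denom))
    if any(eval_high(cw, EXP[j]) for j in range(nsym)):
        raise ValueError("uncorrectable codeword")
    return bytes(cw[:n - nsym])
```

For example, with 16 parity bytes a 128-byte `msg` is recovered by `decode` even after 8 consecutive bytes of the 144-byte codeword have been overwritten, which corresponds to the 8-byte runs listed above.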
RS decoding is computationally expensive, so the protected data is subdivided into subblocks sized to 128 B plus the user-specified number of correction roots, which simplifies addressing and guarantees data alignment for power-of-two correction-root counts. Inodes and SBs each fit into a single RS code, while for data blocks the subdivision prevents extreme checking times. To skip the expensive RS decoding step during regular operation, a CRC32 checksum allows high-performance checking. The RS code is only read in case the checksum is invalid.
Data blocks are divided into subblocks so the FS can make optimal use of the RS code length. For common block sizes and error correction strengths, 5 to 19 RS codes are necessary; see Table 1 for the expected overhead. The correction data is accumulated at the end of the data block. Checksums across the entire block's data, each subblock and the error correction data are also retained. The resulting data format is depicted in Figure 4. Protection can be enhanced further by performing symbol interleaving for the RS codes and the block data, at the cost of performance.
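The text does not spell out how the three checksum levels of Fig. 4 are combined. One plausible reading, sketched here as an assumption, is that the whole-block CRC answers the common case in one pass, while the per-subblock CRCs narrow RS decoding down to the damaged subblocks. The subblock size is a parameter, and all names are hypothetical.

```python
import zlib


def split(block: bytes, size: int) -> list:
    return [block[i:i + size] for i in range(0, len(block), size)]


def build_checksums(block: bytes, parity: bytes, sub_size: int) -> dict:
    """Hypothetical checksum set kept with a data block: whole block,
    each subblock, and the accumulated RS parity area."""
    return {"block": zlib.crc32(block),
            "subs": [zlib.crc32(s) for s in split(block, sub_size)],
            "parity": zlib.crc32(parity)}


def locate_damage(block: bytes, parity: bytes, sums: dict, sub_size: int):
    """Return (indices of subblocks needing RS decoding, parity intact?)."""
    parity_ok = zlib.crc32(parity) == sums["parity"]
    if zlib.crc32(block) == sums["block"]:
        return [], parity_ok            # common case: one pass over the data
    bad = [i for i, s in enumerate(split(block, sub_size))
           if zlib.crc32(s) != sums["subs"][i]]
    return bad, parity_ok
```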
FS traversal and data access will eventually slow down for strongly degraded storage volumes. As we immediately commit corrected data to memory, this performance degradation is only temporary, assuming soft faults.
4.4 Results and Current Status
FTRFS has been implemented for the Linux kernel and is currently undergoing testing. Due to its POSIX compliance, it could easily be ported to other platforms. The memory protection functionality has been inherited from PRAMFS, the FS structure from ext2. We utilize the RS implementation of the Linux kernel, as its API also supports hardware acceleration.
Several components of the FS should undergo an optimization process, which we expect to result in a drastic performance increase. Even though we have not yet conducted long-term benchmarking and performance analysis, the throughput degradation we have observed during regular operations is minimal. Modern CPUs can compute CRC32 within a few cycles due to hardware acceleration. We intend to publish additional performance and energy consumption metrics once testing has been concluded, basic optimizations have been applied and the on-board computer (OBC) has been finalized.
Data is read and written once per access. It is good practice in critical scenarios, and especially in spaceflight, to read and write data multiple times, or to deploy more advanced consistency checking techniques [35]. These changes could be applied in bulk, through a macro, or on the compiler side.
Fig. 4. A data block subdivided into 5 subblocks. Separate checksums for the entire data block, the EDAC data and each subblock are depicted in red.
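Two simple forms of the practices mentioned above, write-then-verify and read-twice-compare, are sketched below. FTRFS does not currently do this, and the read/write callbacks are hypothetical stand-ins for memory access. On real hardware, the read-back would need to bypass CPU caches to actually reach the MRAM.

```python
from typing import Callable

# Hypothetical memory-access callbacks standing in for MRAM reads and writes.
Read = Callable[[int, int], bytes]
Write = Callable[[int, bytes], None]


def verified_write(write: Write, read: Read, addr: int, data: bytes,
                   retries: int = 3) -> int:
    """Write, read back and compare; retry on mismatch. Returns attempts used."""
    for attempt in range(1, retries + 1):
        write(addr, data)
        if read(addr, len(data)) == data:
            return attempt
    raise OSError(f"write at {addr:#x} not verified after {retries} attempts")


def agreed_read(read: Read, addr: int, length: int, retries: int = 3) -> bytes:
    """Read twice and accept the value only if both reads agree."""
    for _ in range(retries):
        first, second = read(addr, length), read(addr, length)
        if first == second:
            return first
    raise OSError(f"reads at {addr:#x} did not agree after {retries} attempts")
```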
The level of protection offered by the FS can be adjusted at format time, or later by using a proprietary FS-tuning tool. RS has a long record of space use in CDH and communications. Thus, the algorithm is known to offer efficient protection for our threat scenario. Once testing has been concluded, we will perform long-term performance analysis in a degraded environment. To benchmark the FS, data degradation can be introduced using fault injection; we will perform these tests after optimizations have been applied. However, artificial fault injection is usually not considered sufficient to prove the efficiency of a fault-tolerance concept for space use. Like every satellite, our satellite's CDH computer including the FS will have to undergo testing using various radiation sources before launch. Results will be made available once these tests have been carried out.
4.5 Conceptual Limitations and Restrictions
It is debatable whether journaling would increase FTRFS's reliability, as it usually helps safeguard FS consistency on slow storage media [36] against power loss or disconnects. However, all access in our FS happens synchronously, and MRAM is only slightly slower than regular DRAM. Thus, journaling is currently not implemented.
Loss of power can also happen in our spaceflight use case, but depending on the event it can be handled differently. Spacecraft are battery-backed and utilize on-PCB components that provide relatively abundant hold-up time after the electrical power subsystem (EPS) and the battery are disconnected due to latch-up protection. The FS can thus either conclude a pending write operation within the remaining active time, or, if the system has sufficient warning time, the OS can cancel pending writes.
The FS cannot protect itself from device or memory bank failure. However, as MRAM access is deterministic, majority voting can be implemented in hardware to compensate for device failure [33]. This would also further increase protection against SEFIs, as upsets within one chip would be compensated by voting.
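The bitwise two-out-of-three majority that such a hardware voter computes can be modeled in a few lines. This is a software illustration of the principle cited in [33], not part of FTRFS, which leaves voting below the FS level; the function names are hypothetical.

```python
def vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Hypothetical model of a bitwise 2-of-3 majority over three replicas
    read in lockstep (possible because MRAM access is deterministic)."""
    if not len(a) == len(b) == len(c):
        raise ValueError("replicas must have equal length")
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def outvoted(a: bytes, b: bytes, c: bytes) -> list:
    """Indices of replicas that differ from the voted result, e.g. a failed chip."""
    result = vote(a, b, c)
    return [i for i, r in enumerate((a, b, c)) if r != result]
```

Because the vote is taken per bit, even scattered upsets in different replicas are masked as long as no single bit position is wrong in two of them.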
Structure | Data Size (B) | Correction Symbols/Code | # Codes | Correction Total (B) | Overhead (B) | Overhead (%)
Super Block | 128 | 32 | 1 | 32 | 68 | 53.13%
Inode | 160 | 32 | 1 | 32 | 68 | 42.50%
Data Block | 1024 | 4 | 5 | 20 | 68 | 5.86%
Data Block | 1024 | 16 | 5 | 80 | 188 | 17.58%
Data Block | 4096 | 4 | 17 | 68 | 212 | 4.98%
Data Block | 4096 | 16 | 19 | 304 | 692 | 16.70%
Bitmap | 1773 | 32 | 10 | 320 | 688 | 38.80%
Table 1. EDAC overhead for FS structures. Bitmap: 16 MB FS, 5% inodes, 1024 B block size.
If data is stored with RS-symbol interleaving, an XIP mapping would technically be impossible. XIP could still perform mappings for non-interleaved data, but then only the clear-text part of each RS code would be mapped and read. Via such a memory mapping, integrity protection for stored file data would be bypassed: program code could be loaded and executed without any integrity checking. Thereby, the integrity assumptions upon which FTRFS's concept is based would be violated, and integrity could not be guaranteed for any executed program stored on the FS. Theoretically, data integrity could also be checked each time a mapping is established for a block. To perform these checks, however, this data would have to be read in full, negating the performance advantage and RAM-conserving properties of XIP. XIP and FS-level data integrity protection can thus be considered mutually exclusive.
5 Outlook and Future Work
Permanent defects will require FEC upon every access to an object. If such a hard fault occurred in a frequently accessed object (e.g. the root inode or a populated directory), we would want to avoid repeated re-checks. The current FS implementation offers no functionality to avoid this behavior, but it could be added later on.
Bad-block relocation is already implemented within the FS, but is only used during write, truncate and allocation operations, not during other accesses. The root inode is exempt, as it is currently assumed to reside at a fixed location, like in PRAMFS. Relocation during other accesses could also be implemented in a future version and would certainly increase storage reliability and performance and reduce data degradation.
FTRFS could theoretically also operate on different memory technologies; however, most of its advantages are enabled by RAM properties. For flash, protection at the FS layer would be rather complex and unwieldy, and could still not offer proper protection against device failure. Thus, the authors are working on a different protective concept for flash memory.
In contrast to RAM, flash access times vary depending on block integrity. Thus, full voting-based majority decisions would require very complex control logic. If voting were conducted using hardware-side flash controllers, a delayed response from one controller would stall access to the entire voting circuit. Even if the result has already been determined, the circuit would still be busy.
A transparent protective layer utilizing RAID1, FEC and checksumming could, however, be implemented as an MTD middleware layer. MTD striping [37] has been proposed as a middleware function in the past, but has never been included in the Linux kernel. Nevertheless, the existence of the MTD-striping code demonstrates the feasibility of a mirroring and protection MTD layer.
6 Conclusions
We presented a novel filesystem implementation enabling a software-side protective scheme against data degradation caused by the space environment, as described in Section 2. We have shown the feasibility of a bootable, POSIX-compatible FS which can efficiently protect an OS image from device failure and software flaws according to the threat model outlined at the beginning of Section 4.
With respect to our spaceflight use case, neither component-level measures nor hardware- or software-side measures alone can guarantee sufficient system consistency. Traditionally, radiation effects in space systems are compensated for with stronger hardware EDAC and component redundancy, which do not scale to complex systems and result in increased energy consumption. While redundancy and hardware-side voting can protect well against device failure, data integrity protection is difficult at this level. A combination of hardware and software measures can thus drastically increase system dependability, even for missions with a very long duration.
References
1. H Heidt et al. Cubesat: A new Generation of Picosatellite for Education and Industry Low-Cost Space Experimentation. In 14th AIAA/USU Conference on Small Satellites, Proc., 2000.
2. S Busch and K Schilling. UWE-3: A Modular System Design for the Next Generation of Very Small Satellites. In Proceedings of Small Satellites Systems and Services - The 4S Symposium, Slovenia, 2012.
3. D Evans and M Merri. OPS-SAT: An ESA Nanosatellite for Accelerating Innovation in Satellite Control. SpaceOps, 2014.
4. C Bridges et al. Smartphone Qualification & Linux-based Tools for Cubesat Computing Payloads. In Aerospace Conference, 2013 IEEE, pages 1–10. IEEE, 2013.
5. M Stringfellow, N Leveson, and B Owens. Safety-Driven Design for Software-Intensive Aerospace and Automotive Systems. Proceedings of the IEEE, 98(4):515–525, 2010.
6. K Ryu, E Shin, and V Mooney. A Comparison of Five Different Multiprocessor
SoC Bus Architectures. In Digital Systems Design, 2001. Proceedings. Euromicro
Symposium on, pages 202–209. IEEE, 2001.
7. D McComas. NASA/GSFC’s Flight Software Core Flight System. 2012.
8. J Williams and N Bergmann. Reconfigurable Linux for Spaceflight Applications.
Proc. Military and Aerospace Programmable Logic Devices (MAPLD 04), 2004.
9. D Atienza et al. Systematic Dynamic Memory Management Design Methodology
for Reduced Memory Footprint. ACM-TODAES, 11(2):465–489, 2006.
10. J Saleh, D Hastings, and D Newman. Weaving Time into System Architecture: Satellite Cost per Operational Day and Optimal Design Lifetime. Acta Astronautica, 54(6):413–431, 2004.
11. R Katti, H Stadler, and J Wu. High Speed Magneto-resistive Random Access
Memory, December 22 1992. US Patent 5,173,873.
12. S Bourdarie and M Xapsos. The Near-Earth Space Radiation Environment. IEEE
Trans. on Nuclear Science, 55:1810–1832, 2008.
13. M Xapsos, P O’Neill, and T O’Brien. Near-Earth Space Radiation Models. IEEE
Transactions on Nuclear Science, 60:1691–1705, June 2013.
14. J Schwank, M Shaneyfelt, and P Dodd. Radiation Hardness Assurance Testing of
Microelectronic Devices and Integrated Circuits. IEEE Transactions on Nuclear
Science, 60:2074–2100, June 2013.
15. ESA/ESTEC Requirements and Standards Division. ECSS: Calculation of Radiation and its Effects and Margin Policy Handbook. ECSS-E-HB-10-12A, 2010.
16. F Chen. Phase-Change Memory, February 26 2014. US Patent App. 14/191,016.
17. G Tsiligiannis et al. Testing a Commercial MRAM Under Neutron and Alpha
Radiation in Dynamic Mode. IEEE Trans. on Nuclear Science, 60, 2013.
18. J Maimon et al. Results of Radiation Effects on a Chalcogenide Non-Volatile
Memory array. In Aerospace Conference, 2004. Proceedings. 2004 IEEE, volume 4,
pages 2306–2315. IEEE, 2004.
19. S Gerardin et al. Radiation Effects in Flash Memories. IEEE Transactions on
Nuclear Science, 60:1953–1969, June 2013.
20. D Nguyen and F Irom. Radiation Effects on MRAM. In Radiation and Its Effects
on Components and Systems, pages 1–4. IEEE, 2007.
21. M Baker et al. A Fresh Look at the Reliability of Long-Term Digital Storage. In
ACM SIGOPS Operating Systems Review, volume 40, pages 221–234. ACM, 2006.
22. J Engel and R Mertens. LogFS – Finally a Scalable Flash File System. In 12th
International Linux System Technology Conference, 2005.
23. S Qiu and N Reddy. NVMFS: A Hybrid File System for Improving Random Write
in NAND-Flash SSD. In Mass Storage Systems and Technologies (MSST), 2013
IEEE 29th Symposium on, pages 1–5. IEEE, 2013.
24. W Liangzhu. The Investigation of JFFS2 Storage. In Microcomputer Information
8, 030, 2008.
25. N Edel et al. MRAMFS: a Compressing File System for Non-Volatile RAM. In
MASCOTS 2004. Proceedings. The IEEE Computer Society’s 12th Annual International
Symposium on, IEEE, 2004.
26. M Stornelli. Protected and Persistent RAM Filesystem. pramfs.sourceforge.net.
27. J Hulbert. The Advanced XIP File System. In Linux Symposium, page 211, 2008.
28. M Elghefari et al. Radiation Effects Assessment of MRAM Devices. 2008.
29. M Cassel et al. NAND-Flash Memory Technology in Mass Memory Systems for
Space Applications. In DASIA 2008, volume 665, page 25, 2008.
30. H Herpel et al. Next Generation Mass Memory Architecture. In DASIA 2010.
31. SB Wicker et al. Reed-Solomon Codes and their Applications. Wiley & Sons, 1999.
32. S Suzuki and K Shin. On Memory Protection in Real-Time OS for Small Embedded
Systems. In Real-Time Computing Systems and Applications, 1997. Proceedings.,
Fourth International Workshop on, pages 51–58. IEEE, 1997.
33. S Su et al. A Hardware Redundancy Reconfiguration Scheme for Tolerating
Multiple Module Failures. Computers, IEEE Transactions on, 100(3):254–258, 1980.
34. N Joukov et al. RAIF: Redundant Array of Independent Filesystems. In Mass
Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE, pages 199–214.
35. B Cagno et al. Verifying Data Integrity of a Non-Volatile Memory System during
Data Caching Process. US Patent 8,037,380.
36. V Prabhakaran, A Arpaci-Dusseau, and R Arpaci-Dusseau. Analysis and Evolution
of Journaling File Systems. In USENIX Annual Technical Conference, General
Track, pages 105–120, 2005.
37. A Belyakov. Linux-MTD Striping Middle Layer. Linux-MTD mailing list, 03 2006.
... EDAC and device independence could also be provided by an FS directly as has already been shown for MRAM in [12]. A Flash FS such as UFFS could be extended to handle multiple memory devices and EDAC, or FTRFS [12] could be modified to handle flash memory. Even though possible to implement, such an all-in-one FS would be complex and error prone. ...
Conference Paper
Future space missions will require vast amounts of data to be stored and processed aboard spacecraft. While satisfying operational mission requirements, storage systems must guarantee data integrity and recover damaged data throughout the mission. NAND-flash memories have become popular for space-borne high-performance mass memory scenarios, though future storage concepts will rely upon highly scaled flash or other memory technologies. With modern flash memory, single-bit erasure coding and RAID-based concepts are insufficient. Thus, we present a fully run-time configurable, high-performance, dependable storage concept requiring a minimal set of logic or software. The solution is based on composite erasure coding and can be adjusted for altered mission duration or changing environmental conditions.
... We describe the implementation of FTRFS, a fault-tolerant radiation-robust filesystem for space use. It was published [Fuchs18] in the proceedings of the International Conference on Architecture of Computing Systems (ARCS). Furthermore, a protective concept for flash memory and phase change memory is described in the second part of this chapter. ...
Thesis
Miniaturized satellites enable a variety of space missions which were in the past infeasible, impractical or uneconomical with traditionally designed, heavier spacecraft. CubeSats in particular can be manufactured and launched rapidly at low cost from commercial components, even in academic environments. However, due to their low reliability and brief lifetime, they are usually not considered suitable for life- and safety-critical services, complex multi-phased solar-system-exploration missions, and missions with a longer duration. Commercial electronics are key to satellite miniaturization, but also responsible for their low reliability: until 2019, there existed no reliable or fault-tolerant computer architectures suitable for very small satellites. To overcome this deficit, a novel on-board-computer architecture is described in this thesis. Robustness is assured without resorting to radiation hardening, but through software measures implemented within a robust-by-design multiprocessor-system-on-chip. This fault-tolerant architecture is component-wise simple and can dynamically adapt to changing performance requirements throughout a mission. It can support graceful aging by exploiting FPGA reconfiguration and mixed criticality. Experimentally, we achieve 1.94 W power consumption at 300 MHz with a Xilinx Kintex UltraScale+ proof-of-concept, which is well within the power-budget range of current 2U CubeSats. To our knowledge, this is the first COTS-based, reproducible on-board-computer architecture that can offer strong fault coverage even for small CubeSats.
... Without further measures, they are still susceptible to misdirected read or write accesses and SEFIs. We showed in [37] that these issues can be mitigated in software through ECC and redundancy. We also showed that this can be achieved with minimal overhead through the use of a bootable file system with Reed-Solomon erasure coding. ...
Conference Paper
In this contribution, we present a CubeSat-compatible on-board computer (OBC) architecture that offers strong fault tolerance to enable the use of such spacecraft in critical and long-term missions. We describe in detail the design of our OBC's breadboard setup, and document its composition from the component level all the way down to the software level. Fault tolerance in this OBC is achieved without resorting to radiation hardening, but through intelligent software. The OBC ages gracefully, and makes use of FPGA reconfiguration and mixed criticality. It can dynamically adapt to changing performance requirements throughout a space mission. We developed a proof-of-concept with several Xilinx UltraScale and UltraScale+ FPGAs. With the smallest Kintex UltraScale+ KU3P device, we achieve 1.94 W total power consumption at 300 MHz, well within the power-budget range of current 2U CubeSats. To our knowledge, this is the first scalable, COTS-based, widely reproducible OBC solution which can offer strong fault coverage even for small CubeSats. To reproduce this OBC architecture, no custom-written, proprietary, or protected IP is needed, and the required design tools are available free of charge to academics. All COTS components required to construct this architecture can be purchased on the open market, and are affordable even for academic and scientific CubeSat developers.
... Within the MOVE-II satellite project [8] of the Technical University of Munich, several novel concepts are investigated. A fault-tolerant, radiation-robust filesystem [9], autonomous chip-level debugging [10], dependable data storage on miniaturized satellites [11], a novel communication protocol for miniaturized satellites [12] and MicroPython as an application-layer programming language [13] have been studied so far. Results of a statistical analysis of 178 launched CubeSats [14] show that many CubeSats fail due to insufficient testing in flight configuration. ...
Conference Paper
Software development for space applications is characterized by historically grown structures and conservative methods derived from traditional project management. Many of these methods are not easily transferable from normal product development to software development. Project risk is high and delays are the rule due to the many uncertainties regarding the planned cost and time budget, possible requirement changes in later project phases as well as unforeseeable complications. Furthermore, these methods have very limited flexibility and come with highly time-consuming planning, implementation and, if necessary, problem solving. Agile software development does not require that all requirements are known and well-defined at the beginning of the project. The development is incremental and generates a usable and testable software product with every new iteration. This makes the development more flexible, and problems can be detected earlier and solved with less effort. Due to the frequent integration into the existing system, close collaboration is possible across subsystems as well as with the customer or the project partners. This increased flexibility and improved cooperation reduce project risk, cost and time until delivery. This paper shows the application of agile software development in the space sector applied to a CubeSat project. Within the student satellite project Munich Orbital Verification Experiment II (MOVE-II) at the Technical University of Munich (TUM), the concept of agile software development was successfully applied to develop the software of the on-board computer within a few months. The agile methods presented in this article demonstrate software development that does not require the final requirements at the beginning of the development process. These methods allow a new version of the software to be tested and operated after every iteration of the process. The launch of our CubeSat MOVE-II is scheduled for early 2018.
... Within the MOVE-II satellite project [5] of the Technical University of Munich, several novel computing concepts are investigated. Amongst others, this includes a fault-tolerant, radiation-robust filesystem [6], autonomous Chip Level debugging [7], dependable data storage on miniaturized satellites [8] and a novel communication protocol for miniaturized satellites [9]. This paper will investigate the ongoing research on using MicroPython [10] as application layer programming language on CubeSats. ...
Conference Paper
Since the dawn of the space age, software has always been a critical aspect of any space mission launched. Over the decades, more complexity, autonomy and functionality were added to both unmanned and manned missions, yielding exponential growth in the lines of code used in space projects over the years. Although a lot of effort was put into ensuring reliable software on those missions, some of them failed. Still, as the space industry is a risk-averse business, novel approaches cannot be tested in space programs on a large scale. To overcome this limitation, this paper investigates the potential use of MicroPython, an implementation of Python for constrained systems, on CubeSats by analyzing the language and tools in practical examples from the MOVE-II CubeSat project.
... To achieve this, three different software-based concepts for assuring data consistency were developed within the project. These make it possible both to protect volatile memory [26] and to ensure the operability of the operating system [27]. MRAM [28] is to be used as persistent storage due to its good tolerance against the effects of high-energy radiation. ...
... Due to its POSIX compliance, the FS could easily be ported to other platforms. An in-depth description and analysis of FTRFS has been published in the proceedings of the computer science conference ARCS 2015 [51]. While the FS has been tested and logically validated, the code should be optimized, which will result in a drastic performance increase. ...
Conference Paper
We present storage integrity concepts developed for the CubeSat MOVE-II over the past two years, enabling dependable computing without relying solely upon hardened special-purpose hardware. Neither component-level nor hardware- or software-side measures individually can guarantee sufficient system consistency with modern highly scaled components. Instead, a combination of hardware and software measures can drastically increase system dependability, even for missions with a very long duration. Dependability in the most basic sense can only be assured if program code and required supplementary data can be stored consistently and reliably aboard a spacecraft. Thus, to enable any form of meaningful dependable computing, storage integrity must be assured first and foremost. We present three software-driven concepts to assure storage consistency, each specifically designed towards protecting key components: a system for volatile memory protection, the filesystem FTRFS to protect system software, and MTD-mirror to safeguard payload data. All described solutions can be applied to on-board computer design in general and do not require systems to be specifically designed for them. Hence, simplicity can be maintained, error sources minimized, testability enhanced, and survival rates of miniaturized satellites increased drastically.
Conference Paper
A common rootfs option for Linux mobile phones is the XIP-modified CramFS which, because of its ability to eXecute-In-Place, can save lots of RAM, but requires extra Flash memory. Another option, SquashFS, saves Flash by compressing files but requires more RAM, or it delivers lower performance. By combining the best attributes of both with some original ideas, we've created a compelling new option in the Advanced XIP File System (AXFS). This paper will discuss the architecture of AXFS. It will also review benchmark results that show how AXFS can make Linux-based mobile devices cheaper, faster, and less power-hungry. Finally, it will explore how the smallest and largest of Linux systems benefit from the changes made in the kernel for AXFS.
Conference Paper
Modern computers are now far in advance of satellite systems, and leveraging these technologies for space applications could lead to cheaper and more capable spacecraft. Together with NASA Ames's PhoneSat, the STRaND-1 nanosatellite team has been developing and designing new ways to include smartphone technologies into the popular CubeSat platform whilst mitigating numerous risks. Surrey Space Centre (SSC) and Surrey Satellite Technology Ltd. (SSTL) have led in qualifying state-of-the-art COTS technologies and capabilities, contributing to numerous low-cost satellite missions. The focus of this paper is to answer whether 1) modern smartphone software is compatible with fast and low-cost development as required by CubeSats, and 2) the components utilised are robust to the space environment. The STRaND-1 smartphone payload software explored in this paper is assembled from various open-source Linux tools and generic interfaces found in terrestrial systems. A major result from our developments is that many existing software and hardware processes are more than sufficient to provide autonomous and operational payload object-to-object and file-based management solutions. The paper will provide methodologies on the software chains and tools used for the STRaND-1 smartphone computing platform, the hardware built with space qualification results (thermal, thermal vacuum, and TID radiation), and how they can be implemented in future missions.
Conference Paper
In this paper, we design a storage system consisting of Nonvolatile DIMMs (as NVRAM) and a NAND-flash SSD. We propose a file system, NVMFS, that exploits the unique characteristics of these devices to simplify and speed up file system operations. We use the higher-performance NVRAM as both a cache and permanent space for data. Hot data can be permanently stored on NVRAM without writing back to the SSD, while relatively cold data can be temporarily cached by NVRAM with another copy on the SSD. We also reduce the erase overhead of the SSD by reorganizing writes on NVRAM before flushing to the SSD. We have implemented a prototype NVMFS within a Linux kernel and compared it with several modern file systems such as ext3, btrfs and NILFS2. We also compared it with another hybrid file system, Conquest, which was originally designed for NVRAM and HDD. The experimental results show that, compared to other file systems, NVMFS improves IO throughput by an average of 98.9% when segment cleaning is not active, and by an average of 19.6% under high disk utilization (over 85%). We also show that our file system can reduce the erase operations and overheads on the SSD.
Article
This document describes the radiation environments, physical mechanisms, and test philosophies that underpin radiation hardness assurance test methodologies. The natural space radiation environment is presented, including the contributions of both trapped and transient particles. The effects of shielding on radiation environments are briefly discussed. Laboratory radiation sources used to simulate radiation environments are covered, including how to choose appropriate sources to mimic environments of interest. The fundamental interactions of radiation with materials via direct and indirect ionization are summarized. Some general test considerations are covered, followed by in-depth discussions of physical mechanisms and issues for total dose and single-event effects testing. The purpose of this document is to describe why the test protocols we use are constructed the way they are. In other words, to answer the question: “Why do we test it that way”?
Article
Academic and industrial research interest in terrestrial radiation effects on electronic devices has expanded over the last years from avionics and military applications to commercial applications as well. At the same time, the need for faster and more reliable memories has given rise to new memory technologies such as Magnetic (magneto-resistive) Random Access Memories (MRAM), a promising new non-volatile memory technology that will probably replace current SRAM- and flash-based memories in the future. In this paper, we evaluate the soft error resilience of a commercial toggle MRAM in static and dynamic test mode, under neutron radiation with energies of 25, 50 and 80 MeV as well as under a Californium (Cf-252) alpha source.
Article
This document gives detailed test guidelines for single-event upset (SEU), single-event latchup (SEL), single-event burnout (SEB), and single-event gate rupture (SEGR) hardness assurance testing. It includes guidelines for both heavy-ion and proton environments. The guidelines are based on many years of testing at remote site facilities and our present understanding of the mechanisms for single-event effects.
Article
We review ionizing radiation effects in Flash memories, the current dominant technology in the commercial non-volatile memory market. A comprehensive discussion of total dose and single event effects results is presented, concerning both floating gate cells and peripheral circuitry. The latest developments, including new findings on the mechanism underlying upsets due to heavy ions and destructive events, are illustrated.
Article
Review of models of the near-Earth space radiation environment is presented, including recent developments in trapped proton and electron, galactic cosmic ray and solar particle event models geared toward spacecraft electronics applications.