A Low Power 16-bit RISC with Lossless Compression
Accelerator for Body Sensor Network System
Hyejung Kim, Sungdae Choi and Hoi-Jun Yoo
Semiconductor System Laboratory
Department of Electrical Engineering and Computer Science, KAIST
Daejeon, Korea
seeseah@eeinfo.kaist.ac.kr
Abstract—A low power 16-bit RISC is proposed for body
sensor network systems. The proposed IPEEP scheme provides
zero overhead for the wakeup operation. A lossless
compression accelerator is embedded in the RISC to support
low energy data compression. The accelerator consists of a
16x16-bit storage array with vertical and horizontal
access paths. Using the accelerator reduces the energy
consumption of the lossless compression operation by 93.8%. The
RISC is implemented in a 1-poly 6-metal 0.18um CMOS
technology with 16k gates. It operates at 4MHz and consumes
24.2uW at a 0.6V supply voltage.
I. INTRODUCTION
Recently, as interest in healthcare has grown, people want
to check their vital signals and health condition anytime
and anywhere. To meet this demand, wireless sensor network
(WSN) systems have been applied to the human body as body
sensor network (BSN) systems [1]. A BSN offers health
condition monitoring, vital signal collection, analysis of
the collected data, and diagnostics. To provide these
operations, a BSN should consist of many sensors, a data
processing unit, and an efficient network system, and
various BSN systems have been proposed [1]-[6].
A BSN system requires ultra low energy operation for
stable long-term operation and a small footprint for
wearability. To meet these requirements, hardware
components such as the power source, processing block, and
data memory have to be small. They therefore have a limited
power supply, communication bandwidth, processing speed,
and memory space. Much research has focused on how to make
maximum use of these limited resources. Data compression is
one of the most effective methods [7]. Since transmission
consumes much more power than data processing, minimizing
the data size before transmission can reduce the total
system power consumption. Moreover, low power hardware
design and efficient algorithms are also important for
operation with limited resources.
This paper presents a low power RISC with two proposed
schemes for a low energy BSN system. The first is a zero
overhead wakeup scheme and the second is an efficient
compression algorithm for bio signals. We verify the low
energy operation of the proposed RISC with fabricated
silicon.
II. ARCHITECTURE OF THE 16-BIT RISC
A. Architecture of The Base Station
A star topology network, which consists of one base station
as the master node and many sensor nodes as slave nodes, is
selected for the BSN system. The base station is designed
to manage the sensor nodes, receive and analyze sensor
data, and execute the prescribed user program with the
lowest power consumption [1]. The base station contains the
schedule director (SD), the 16-bit general purpose RISC,
three kinds of memory, and the radio block, as shown in
figure 1. The SD manages sending and receiving packets to
and from up to 254 sensor nodes and wakes up the RISC when
complex jobs are requested [1]. The radio block handles
wireless communication with the distributed sensor nodes
[8]. The RISC is a general purpose processor which executes
the system initialization, the data compression/decompression
operation, or the user program. The code memory (CM) stores
the system initialization and user programs, the temporary
memory (TM) stores the raw data from the sensor nodes, and
the data memory (DM) stores the processed/compressed data.
When data arrives from the sensor nodes, the SD receives it
from the radio block, stores the raw data to TM, and wakes
up the RISC. The RISC executes the compression program and
then stores the compressed data to DM.
Figure 1. The Block Diagram of Base Station
0-7803-9735-5/06/$20.00 ©2006 IEEE
B. 16-Bit RISC Architecture
The RISC is based on a basic 3-stage pipeline with a
Harvard architecture, which is an optimal choice for low
power operation [3]. Figure 2 shows the pipeline flow
diagram of the proposed 16-bit RISC. The first stage fetches
the instruction from code memory, and the second stage
decodes the fetched instruction. The last stage executes ALU
operations, memory accesses, and write-back to the register
file. Since the register file is both read and written in
the same stage, data hazards are eliminated. A branch
incurs a 2-cycle penalty.
The RISC performs event-driven computation with sleep and
wakeup modes. When the wakeup signal is asserted, the
PCGen block in the fetch stage generates the appropriate PC
value. The detailed scheme is described in section III.
The RISC has two kinds of register files: the 16 general
purpose registers and the compression accelerator for the
proposed lossless compression algorithm described in
section IV. A bitwise XOR block is also implemented for
the proposed compression operation.
The processor implements a 16-bit Instruction Set
Architecture (ISA). The RISC has 30 instructions, including
some special instructions for the sleep mode and the
compression algorithm.
Figure 2. Top Architecture of the Processor
III. INSTANTANEOUS PROGRAM EXECUTION WITH
EXTERNAL PROGRAM COUNTER (IPEEP)
The RISC normally sleeps by clock gating until an event
occurs. The proposed IPEEP scheme has two start modes:
reset mode and wakeup mode. Figure 3 shows the proposed
IPEEP scheme. The reset mode, triggered by the reset
signal, begins at address zero for system initialization or
the boot-up operation. The wakeup mode is entered when an
external event occurs. When the event occurs, the SD sends
an active clock together with the wakeup signal and an
external PC value. The RISC then wakes up and starts
executing from the specified PC value.
Figure 3. Program Execution with IPEEP Scheme (a) Block Diagram of
IPEEP scheme (b) The Waveform by IPEEP operations
With the IPEEP scheme the RISC does not need an ISR
(Interrupt Service Routine), which reduces the wakeup time
and power consumption. After starting in either mode, the
RISC operates in normal mode using the internally generated
PC value. Each program has a sleep instruction at the end
of its block. After all the scheduled program code has been
processed, the RISC sends a sleep command to the SD. On
receiving the sleep command, the SD gates the RISC clock
and the RISC returns to sleep mode.
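The start-up behaviour described in this section can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not the chip's RTL: the class name PCGen, the method names, the 16-bit PC width, and the address 0x500 are all hypothetical choices made for the example.

```python
class PCGen:
    """Toy behavioural model of PC generation under the IPEEP scheme.

    Assumptions (not from the paper): class and method names,
    a 16-bit PC width, and one call to step() per clock edge.
    """

    def __init__(self, width=16):
        self.mask = (1 << width) - 1
        self.pc = 0
        self.sleeping = True  # clock is gated until an event arrives

    def step(self, reset=False, wakeup=False, ext_pc=0):
        """Return the PC fetched this cycle, or None while clock-gated."""
        if reset:
            # Reset mode: start from address zero (system initialization).
            self.pc, self.sleeping = 0, False
        elif wakeup:
            # Wakeup mode: start directly from the external PC, no ISR.
            self.pc, self.sleeping = ext_pc & self.mask, False
        elif self.sleeping:
            return None
        else:
            # Normal mode: internally generated PC.
            self.pc = (self.pc + 1) & self.mask
        return self.pc

    def sleep(self):
        """Model the sleep command: the SD gates the RISC clock."""
        self.sleeping = True


gen = PCGen()
trace = [gen.step(reset=True), gen.step(), gen.step()]  # 0, 1, 2
gen.sleep()
trace.append(gen.step())                                # None (gated)
trace.append(gen.step(wakeup=True, ext_pc=0x500))       # 0x500
trace.append(gen.step())                                # 0x501
```

The point of the sketch is that the first fetch after a wakeup already targets the requested routine, so no cycles are spent in a dispatch routine.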
IV. PROPOSED LOSSLESS COMPRESSION ALGORITHM
The sensor nodes gather bio signal data, process the
collected data, and transfer the processed data to the base
station. Since processing consumes much less power than
wireless transmission, it is necessary to minimize the data
size before transmission to reduce the total system power
consumption. Therefore, a data compression algorithm is
needed for the body sensor network system.
There are some constraints on applying a compression
algorithm to a BSN system. First, since the sensor node
has limited resources such as battery energy, CPU
performance, and memory capacity, the algorithm must be as
small as possible. The second is the operating frequency.
The processor of a sensor node usually operates under 4MHz,
so a low complexity algorithm that can run at a low
frequency is required. The proposed compression algorithm
targets small code size and low energy operation. The
hardwired accelerator and the special instructions are
designed to reduce the operating cycles and the code size.
The proposed compression algorithm has three features.
First, it offers lossless compression. Lossy compression
algorithms are efficient at minimizing data size, but all
lossy algorithms have some degree of quantization error,
resulting in a possible loss of diagnostic information.
Thus, a lossless compression algorithm is adopted to
preserve all the information in the original data. Second,
the proposed algorithm is optimized for continuous signal
data because most bio signals consist of continuous values.
The last is variable precision coding. Each kind of sensor
data has a different precision. The leading zeros of data
shorter than 16 bits are redundant. The proposed algorithm
simply removes these leading zeros, so it is suitable for
compressing data of any precision.
Figure 4. The Lossless Compression Algorithm
Figure 5. Example of Proposed Compression Algorithm (a) Original Data
(b) Data After Wordwise Subtraction (c) Data After Bitwise XOR
Operation (d) Data After Column Access
A. Lossless Compression Algorithm
Figure 4 shows the block diagram of the proposed
compression algorithm and figure 5 shows a coding
example. The algorithm is explained in detail below.
Step1. Load the first row data (d0) from TM, and store it
to DM. The d0 data is the first header data (h0).
Step2. Move zero data to the first row of the accelerator.
Step3. Load the second row data (d1) from TM, compute d1-d0
in a general purpose register, and move the result to the
second row of the accelerator.
Step4. Load the next row data (d2), and repeat step3 until
one block of data is finished. One block consists of 16
rows of 16-bit data (d0-d15).
Step5. XOR each bit with the bit to its left, and replace
the row data with the results.
Step6. The second header data (h1) is generated
automatically by dedicated hardware. If there is a non-zero
bit in the i-th column of the accelerator, the i-th bit of
h1 (h1[i]) will be one. Otherwise, h1[i] will be zero.
Step7. After the 16-bit header data is complete, store the
second header data (h1) to DM.
Step8. Store the i-th column data to DM if h1[i] is non-zero.
Step9. Repeat step1-step8 until TM is empty.
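The steps above can be sketched in software for one block as follows. This is a minimal illustration, not the authors' implementation, and several details are assumptions that the text does not fix: differences are taken modulo 2^16, the MSB (which has no left neighbour) is XORed with zero, columns are emitted from bit 15 down to bit 0, and bit r of a column word holds the bit from row r. The function names and the decompressor, which is simply the inverse of these choices, are hypothetical.

```python
MASK = 0xFFFF  # 16-bit words


def compress_block(d):
    """Compress 16 samples into [h0, h1, non-zero column words...]."""
    assert len(d) == 16
    h0 = d[0] & MASK                                    # Step 1
    rows = [0]                                          # Step 2
    rows += [(d[i] - d[i - 1]) & MASK                   # Steps 3-4
             for i in range(1, 16)]
    rows = [r ^ (r >> 1) for r in rows]                 # Step 5
    h1, columns = 0, []
    for i in range(15, -1, -1):                         # assumed order
        col = 0
        for r in range(16):
            col |= ((rows[r] >> i) & 1) << r            # column access
        if col:
            h1 |= 1 << i                                # Step 6
            columns.append(col)                         # Step 8
    return [h0, h1] + columns                           # Step 7: h1 first


def decompress_block(words):
    """Inverse of compress_block under the same assumptions."""
    h0, h1 = words[0], words[1]
    cols = iter(words[2:])
    rows = [0] * 16
    for i in range(15, -1, -1):
        if (h1 >> i) & 1:
            col = next(cols)
            for r in range(16):
                rows[r] |= ((col >> r) & 1) << i
    d = [h0]
    for r in range(1, 16):
        g, diff = rows[r], 0
        while g:                 # undo the XOR-with-left-bit step
            diff ^= g
            g >>= 1
        d.append((d[-1] + diff) & MASK)
    return d


ramp = [1000 + k for k in range(16)]
packed = compress_block(ramp)    # [1000, 0x0001, 0xFFFE]
```

In this example a slowly rising signal leaves only the LSB column non-zero, so the 16-word block is stored as three words. The XOR step also turns the sign-extension bits of small negative differences into zeros, which keeps most upper columns empty.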
B. Lossless Data Compression Accelerator
The data compression accelerator is embedded in the RISC
to support low energy data compression/decompression, and
it consists of a 16x16 storage array, shown in figure 6.
Its vertical and horizontal accessibility reduces the
required execution cycles to 95% compared with conventional
RISC operation. The interface of the accelerator is the
same as that of the register file, so every data transfer
takes 1 cycle using the new accelerator instructions. In
addition, it can be used as general purpose registers when
the compression program is not running.
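To make the two access paths concrete, the following sketch models the storage array as 16 words that can be read by row or by column. It is a behavioural illustration only; the class and method names are hypothetical, and the column bit order (bit r of a column word comes from row r) is an assumption. In software a column read needs a loop over all rows, which is the work the hardware column path avoids.

```python
class StorageArray16:
    """Behavioural model of a 16x16-bit array with row and column access."""

    def __init__(self):
        self.rows = [0] * 16

    def write_row(self, r, word):
        # Horizontal access: one 16-bit word per row, like a register file.
        self.rows[r] = word & 0xFFFF

    def read_row(self, r):
        return self.rows[r]

    def read_column(self, i):
        # Vertical access: gather bit i of every row into one word.
        col = 0
        for r in range(16):
            col |= ((self.rows[r] >> i) & 1) << r
        return col

    def header(self):
        # h1[i] is one when column i contains any non-zero bit.
        h1 = 0
        for i in range(16):
            if self.read_column(i):
                h1 |= 1 << i
        return h1


arr = StorageArray16()
arr.write_row(1, 0x0005)
arr.write_row(3, 0x0005)
col0 = arr.read_column(0)   # rows 1 and 3 set -> 0x000A
h1 = arr.header()           # columns 0 and 2 non-zero -> 0x0005
```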
Figure 6. Lossless Compression Accelerator
The proposed algorithm compresses one block of data
(16x16-bit) into 2 headers and the compressed data, and it
takes up to 114 cycles. This corresponds to 1.425
instructions/bit. Figure 7 compares the number of
instructions required to remove a single bit. With the
proposed algorithm, the performance is improved by up to
83 times compared with other conventional algorithms [7].
Figure 7. Required Instructions per Removed Bit (y-axis: Instruction / Bit;
bars: Bzip2, PPMd, zlib, Compress, Lzo, This Work; bar labels: 116, 76, 74,
10, 7, 1.4; annotated ratios: x83, x53, x5)
V. RESULTS
With the proposed architecture, the RISC is implemented
on a chip using a 1-poly 6-metal 0.18um CMOS technology. A
chip photograph is shown in figure 8. The RISC, including
the compression accelerator, occupies 400um x 1000um. The
sizes of the code memory, data memory, and temporary memory
are 128kb, 512kb, and 128kb, respectively. The RISC
operates at a maximum frequency of 200MHz at 1.8V and
22MHz at 0.6V. The RISC normally operates at 4MHz with a
0.6V supply voltage for low energy and reliable operation.
The power consumption is 24.2uW at 4MHz with a 0.6V supply
voltage.
ECG records from the MIT/BIH database [9] are used to
verify the proposed compression algorithm. The sampling
rate and the resolution are 360 samples/s and 12 bits,
respectively. Figure 9(a) shows the ECG record waveform
and figure 9(b) shows the simulation result of the
compression operation. The data memory write enable signal
goes high if the column data are not zero. The compression
rate is good if the number of enable signals is small. The
compression rate is 25% and 56.3% on steep and slow slopes,
respectively. A better compression rate is obtained with
stable signals. The average compression rate is 38.7% for
10 sec of data. Compressing 1 sec of ECG data takes only
641us and 0.69nJ at a 4MHz operating frequency. Table I
shows that the energy consumption of the accelerator is
much smaller than that of conventional low-power
processors [4], [5], [6] when 16x16-bit data compression is
executed. The proposed compression algorithm can perform
real-time lossless compression/decompression with ultra low
energy.
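As a rough consistency check, the short calculation below combines figures quoted in this paper: up to 114 cycles per block, a 4MHz clock, 24.2uW power consumption, 360 samples/s, and 16 samples per block. The interpretation is an assumption rather than something the text states: the worst-case cycle count is used for every block, and power is taken as constant while compressing. The variable names are hypothetical.

```python
CYCLES_PER_BLOCK = 114      # worst case, from section IV-B
F_CLK_HZ = 4e6              # operating frequency
POWER_W = 24.2e-6           # power at 4MHz, 0.6V
SAMPLES_PER_S = 360         # ECG sampling rate
SAMPLES_PER_BLOCK = 16      # one block is 16 words

# Time and energy to compress one 16x16-bit block.
t_block_s = CYCLES_PER_BLOCK / F_CLK_HZ          # 28.5 us
e_block_j = POWER_W * t_block_s                  # about 0.69 nJ

# Busy time needed for one second of ECG data.
blocks_per_s = SAMPLES_PER_S / SAMPLES_PER_BLOCK # 22.5 blocks
t_busy_s = blocks_per_s * t_block_s              # about 641 us

print(f"per block: {t_block_s * 1e6:.1f} us, {e_block_j * 1e9:.2f} nJ")
print(f"per second of ECG: {t_busy_s * 1e6:.2f} us busy")
```

Under these assumptions the 641us figure matches the busy time for 1 sec of data, while 0.69nJ matches the energy of a single 16x16-bit block, which is the quantity compared in Table I.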
Figure 8. Chip Photograph
TABLE I. ENERGY CONSUMPTION FOR 16X16B DATA COMPRESSION
Processor   VDD     Clock    Energy
[4]         1.0V    500kHz   32.4nJ
[5]         0.66V   4MHz     64.8nJ
[6]         0.23V   833kHz   11.2nJ
Proposed    0.6V    4MHz     0.69nJ
Figure 9. Simulation Result of Proposed Compression Algorithm
(a) ECG record data (b) Memory Write Enable of RISC: (b1) 25%, (b2) 56.3%
VI. CONCLUSION
A 16-bit RISC for the base station of a body sensor
network system is proposed. The RISC is based on a basic
3-stage pipeline with a Harvard architecture and two major
schemes for low energy operation. The IPEEP scheme
provides zero overhead for the wakeup operation. The
lossless compression accelerator is embedded for low energy
data compression. By using the accelerator, which consists
of a 16x16-bit storage array with vertical and horizontal
access paths, the energy consumption of the lossless
compression operation is reduced by 83 times. The RISC is
implemented in a 1-poly 6-metal 0.18um CMOS technology with
16k gates. It operates at 4MHz and consumes 24.2uW at a
0.6V supply voltage. The evaluation results clearly
indicate that the proposed RISC is suitable for body sensor
network systems.
REFERENCES
[1] Sungdae Choi, Seong-Jun Song, Kyomin Sohn, Hyejung Kim,
Jooyoung Kim, Namjun Cho, Jeong-Ho Woo, Jerald Yoo and
Hoi-Jun Yoo, “A Multi-Nodes Human Body Communication Sensor
Network Control Processor,” Proc. CICC, Sep. 2006.
[2] Benton H. Calhoun, Denis C. Daly, Naveen Verma, Daniel
F. Finchelstein, David D. Wentzloff, Alice Wang, Seong-Hwan Cho,
and Anantha P. Chandrakasan, “Design Considerations for Ultra-Low
Energy Wireless Microsensor Nodes,” IEEE Trans. Computers, vol. 54,
no. 6, pp. 727-740, Jun. 2005.
[3] Mark Hempstead, Nikhil Tripathi, Patrick Mauro, Gu-Yeon Wei,
David Brooks, “An Ultra Low Power System Architecture for Sensor
Network Applications,” Proc. ISCA, 2005.
[4] Brett A. Warneke, Kristofer S. J. Pister, “An Ultra-Low Energy
Microcontroller for SmartDust Wireless Sensor Networks,”
Proc. ISSCC, Feb. 2004.
[5] Virantha N. Ekanayake, Clinton Kelly, IV, and Rajit Manohar,
“BitSNAP: Dynamic Significance Compression For a Low-Energy
Sensor Network Asynchronous Processor,” Proc. ASYNC, pp. 144-154,
Mar. 2005.
[6] Leyla Nazhandali, et al., “A Second-Generation Sensor Network
Processor with Application-Driven Memory Optimizations and
Out-of-Order Execution,” Proc. ACM CASES, pp. 249-256, Sep. 2005.
[7] Naoto Kimura and Shahram Latifi, “A Survey on Data Compression
in Wireless Sensor Networks,” Proc. ITCC, 2005.
[8] S. Song, et al., “A 2Mb/s Wideband Pulse Transceiver with
Direct-Coupled Interface for Human Body Communication,” Proc. IEEE
ISSCC, 2006.
[9] http://www.physionet.org/physiobank/database/mitdb/