CONNECTING THE DOTS BETWEEN LASER WAVEFORMS AND

HERBACEOUS BIOMASS FOR ASSESSMENT OF LAND DEGRADATION

USING SMALL-FOOTPRINT WAVEFORM LIDAR DATA

J. Wu (a), J.A.N. van Aardt (a), G.P. Asner (b), R. Mathieu (c), T. Kennedy-Bowdoin (b), D. Knapp (b), K. Wessels (c),

B.F.N. Erasmus (d), and I. Smit (e)

(a) Chester F. Carlson Center for Imaging Science, Rochester Institute of Technology, Rochester, NY

(b) Carnegie Institution for Science, Stanford, CA

(c) Council for Scientific and Industrial Research, Pretoria, South Africa

(d) School of Animal, Plant and Environmental Science, University of the Witwatersrand, Johannesburg,

South Africa

(e) Kruger National Park Scientific Services, Skukuza, South Africa

ABSTRACT

Measurement and management of vegetation biomass

accumulation in ecosystems typically involves extensive

field data collection, which can be expensive and time

consuming, while leaving the user with relatively crude

inputs to intricate biomass models. Light detection and

ranging (LiDAR) remote sensing, which provides extensive

height measurements of terrain and vegetation, has become

an effective alternative to characterize vegetation structure.

In this paper, we report on ongoing efforts at developing

signal processing approaches to model herbaceous biomass

using a new generation of airborne laser scanners, namely

full-waveform LiDAR systems. Structural and statistics-based

feature metrics are directly derived from LiDAR waveforms

at the pixel level and related to plot-level field data. Initial

results reveal a positive correlation between LiDAR waveform

metrics and herbaceous biomass, with the regression model

explaining almost 60% of the biomass variability. Ongoing research

focuses on the links between fractional cover estimated from

imaging spectroscopy and woody biomass.

Index Terms—Biomass, LiDAR, waveform, signal

processing

1. INTRODUCTION

Information regarding global carbon sources (e.g.,

emissions) and sinks (e.g., carbon sequestration) is essential

to our understanding of global energy flows and general

carbon stock fluctuations. Such information also plays an

important role in fine-scale dynamics, specifically those

related to vegetation biomass and its link to land

degradation, i.e., the loss of an ecosystem’s capability to

provide services to communities. However, measurement

and management of vegetation biomass (carbon)

accumulation typically involves extensive field data

collection, which includes parameters such as foliar area,

crown volume, bare soil coverage, and vegetation height.

Acquisition of these data can be expensive and time

consuming. Traditional remote sensing technology, such as

multi-spectral data (e.g., 1 km2 pixels in NOAA AVHRR data

or 250 m x 250 m pixels in MODIS data), has been applied to

develop regional indicators of vegetation production.

However, these spectrally and spatially coarse-resolution

data cannot unravel changes in the land surface at the scale

at which fine scale plant physiological processes actually

occur (a few meters). Nor can they identify vegetation

composition and structure, especially in the vertical

dimension. Light detection and ranging (LiDAR) remote

sensing, which provides extensive height measurements of

terrain and vegetation, has created novel opportunities for

accurate characterization of vegetation structure. A LiDAR

sensor typically emits a laser pulse and registers the round-trip

travel time between the sensor and a reflective target,

thereby enabling range measurements. A novel type of

LiDAR sensor, called waveform LiDAR, capable of

recording and digitizing the full-backscattered signal at high

vertical resolution (~1ns), holds much promise for detailed

vertical characterization of vegetation structure.

Full-waveform LiDAR data have been widely used for

estimating forest parameters, e.g., canopy height, stem

diameter, woody biomass, etc. [1-5]. However, these studies

are constrained to tree characterization. In this paper, we

explored the possibility of above-ground herbaceous

biomass estimation via a signal-processing approach applied

to small-footprint waveform LiDAR data. The entire

waveform processing workflow consists of de-noising,

signal deconvolution, Gaussian decomposition, statistical

feature extraction, and regression model development. The

research goal is to eventually link woody-herbaceous

biomass assessment and lidar-imaging spectroscopy

approaches, even though this paper focuses mainly on

herbaceous biomass modeling.

2. DATA

2.1. Study area

The study area is bounded by (22°8’00” S; 30°34’52” E) and

(25°32’48” S; 32°2’50” E) in South Africa (Figure 1) and

spans a conservation-subsistence farming land use gradient.

This gradient is defined along a transect from

Bushbuckridge (communal rangelands; high rural

population density) through the Sabie Sands game reserve (private

conservation area) to Kruger National Park (state-owned

conservation area).

Figure 1: Location of the study area, which spans a land use

gradient (west to east) from heavily exploited range- and

farmland, through a private game reserve, to the Kruger National

Park, South Africa

2.2. Remote sensing and field data

Waveform LiDAR data (pixel size: 0.56 x 0.56 m; vertical

resolution: 1 ns) were acquired by the Carnegie Airborne

Observatory (CAO) during April 2008. Each scene pixel is

represented by an incoming (received) waveform

distribution with 256 bands at 1 ns (0.15 m) spacing. The

associated waveform of the outgoing pulse was also

available.
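As a quick check of the 1 ns (0.15 m) bin spacing quoted above, the following minimal sketch converts waveform bin indices to one-way range from the two-way travel time. The function names are illustrative only and are not part of the CAO processing chain; the speed of light in vacuum is used as an approximation for air.

```python
C = 299_792_458.0  # speed of light in vacuum (m/s); assumed close enough for air


def bin_spacing_m(sample_interval_ns=1.0):
    """One-way range covered by one waveform bin (two-way travel, hence / 2)."""
    return C * sample_interval_ns * 1e-9 / 2.0


def bin_to_range_m(bin_index, sample_interval_ns=1.0):
    """Range offset of a bin relative to the first recorded bin."""
    return bin_index * bin_spacing_m(sample_interval_ns)


if __name__ == "__main__":
    print(round(bin_spacing_m(), 3))       # spacing of a single 1 ns bin
    print(round(bin_to_range_m(256), 1))   # vertical extent of 256 bins
```

Under these assumptions a 256-bin waveform covers roughly 38 m of range, which comfortably spans the ground and the vegetation above it.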

Field data for this research were collected from 36 sites in the study

area, each 50 x 50 m in size. A total of 36 plots were laid out

within each site at a 10 m spacing, resulting in a grid-like

pattern (Figure 2). For each plot, the herbaceous biomass was weighed within a

0.5 x 0.5 m grid, along with assessment of other variables,

e.g., woody biomass and canopy density.

Figure 2: Site-level sampling design with 36 plots/site (50 m x 50 m site, 10 m plot spacing)

3. HERBACEOUS BIOMASS MODELING

Figure 3 shows the workflow of the waveform LiDAR-based

herbaceous biomass modeling procedure. It consists of two

parts, namely signal preprocessing and modeling.

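Figure 3: Workflow of LiDAR waveform-based herbaceous

biomass modeling. Signal preprocessing: raw waveform, de-noising (FFT),

deconvolution (Richardson-Lucy). Modeling: Gaussian decomposition (EM),

feature metrics extraction, linear regression against field data, biomass model.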

3.1. Signal preprocessing

The raw incoming (received) waveform typically exhibits a

stretched and featureless character, attributed to a fixed time

span allocated for detection, the sensor’s variable outgoing

pulse signal, the receiver impulse response, and system

noise. Signal preprocessing therefore is necessary to recover

the true response distribution of optically active targets

along the path of the LiDAR waveform.

First, system noise is typically present in the form of

high frequency components of the raw signal in the

frequency domain. Therefore, we smoothed the raw

waveform by setting a cut-off frequency threshold for

removal of noise components in the frequency domain (a

similar effect as a low pass filter), followed by conversion

back to the time domain. The subsequent noise-filtered

waveform can be mathematically modeled as:

Pr(t) = Pt(t)*σ(t)*Γ(t) (1)

where * denotes convolution, Pt(t) refers to the outgoing waveform (known), σ(t)

represents the cross-section (the true response distribution of

the target), and Γ(t) is the receiver impulse response

(estimated by the return signal from a flat ground area). The

true response of the target can be recovered by sequentially

deconvolving the outgoing waveform and the receiver impulse

response from the incoming waveform. We applied the

Richardson-Lucy algorithm [6] for this purpose. Richardson-

Lucy is an iterative deconvolution procedure, which is based

on Bayes’ statistical theorem. The mathematical solution of

σ (t) can be expressed as:

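$$\hat{\sigma}_{i+1}(t) = \hat{\sigma}_i(t)\left[h(t) * \frac{P_r(t)}{\hat{\sigma}_i(t) * h(t)}\right] \qquad (2)$$

where h(t) = Pt(t)*Γ(t), and i denotes the iteration. The

residual for each iteration is computed as:

$$r_i(t) = P_r(t) - h(t) * \hat{\sigma}_i(t) \qquad (3)$$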

The residual will converge as the iteration progresses. The

user can terminate the iteration, either by selecting a specific

residual threshold or by setting a constant iteration number.

Harsdorf and Reuter [7] claimed that the one-dimensional

Richardson-Lucy algorithm resulted in the most stable

results when compared to Fourier transform and non-

negative least squares approaches. This processing step

enhances the vertical signal resolution, which facilitates

extraction of target information from the waveform.
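The two preprocessing steps of this section can be sketched as follows. This is a minimal, pure-Python illustration under stated assumptions, not the processing code used in this study: the function names, the cut-off bin, the flat starting estimate, and the iteration count are all hypothetical choices. A naive DFT keeps the example self-contained. The Richardson-Lucy correction step uses the time-reversed kernel, as in the standard formulation, which coincides with the update written with h(t) when h(t) is symmetric. The kernel is normalized to unit area, so the estimate is recovered up to that scale factor.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; adequate for short (e.g., 256-bin) waveforms."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out


def lowpass(waveform, cutoff_bin):
    """Zero every frequency bin above cutoff_bin, then return to the time domain."""
    n = len(waveform)
    spectrum = dft(waveform)
    for k in range(n):
        if min(k, n - k) > cutoff_bin:  # bins k and n-k share one |frequency|
            spectrum[k] = 0
    return [v.real for v in dft(spectrum, inverse=True)]


def convolve_same(signal, kernel):
    """Linear convolution cropped to len(signal); kernel is centered (odd length)."""
    half = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        for j, kv in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < n:
                out[i] += signal[idx] * kv
    return out


def richardson_lucy(received, h, iterations=50, eps=1e-12):
    """Iterate the multiplicative update; return (estimate, final residual)."""
    area = sum(h)
    h = [v / area for v in h]   # unit-area system response (assumption)
    h_rev = h[::-1]             # time-reversed kernel for the correction step
    start = max(sum(received) / len(received), eps)
    estimate = [start] * len(received)  # flat, strictly positive first guess
    for _ in range(iterations):
        blurred = convolve_same(estimate, h)
        ratio = [max(r, 0.0) / max(b, eps) for r, b in zip(received, blurred)]
        correction = convolve_same(ratio, h_rev)
        estimate = [e * c for e, c in zip(estimate, correction)]
    residual = [r - b for r, b in zip(received, convolve_same(estimate, h))]
    return estimate, residual
```

In use, `lowpass` would be applied to the raw received waveform first, and `richardson_lucy` would then be called with a sampled estimate of h(t). Stopping could equally be driven by a residual threshold rather than a fixed iteration count.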

3.2. Modeling


Figure 4: Raw waveform (left) and Gaussian decomposition

of the deconvolved waveform (right)

The last component of a return LiDAR waveform typically

corresponds to the ground-level response, which may be

composed of bare soil, grass, leaves, stones, etc. We

hypothesized that the herbaceous biomass, directly

associated with the grass abundance, can be linked to the

properties of the last waveform component (e.g. width,

height, area). Figure 4 (left) shows the raw return waveform

(single peak) where there is no tree or shrub. Figure 4 (right)

reveals a dual-peak intensity distribution after deconvolution

of the raw waveform; this was hidden in the raw signal (left)

due to the existence of an imperfect system response and

variable outgoing “pulse”. An expectation-maximization

(EM) algorithm was subsequently employed to decompose

this deconvolved waveform into two individual Gaussian

curves [8]. It is evident from Figure 4 that the second

Gaussian is mainly due to the asymmetric trailing edge,

relative to the leading edge in the raw waveform. This

asymmetric trailing edge typically results from the late

return photons due to the structure of the ground layer (e.g.,

grass), leading to multiple scattering of the return signal. On

the other hand, the first Gaussian was seen as corresponding

mainly to the single scattering from the ground material (e.g.,

bare soil, grass, stone). The mathematical description of

this waveform as a mixed Gaussian model is expressed as:

$$g(x) = a_1 e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + a_2 e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \qquad (4)$$

where a1 and a2 are the amplitudes of the Gaussian peaks

and σ1 and σ2 are the standard deviations (related to width) of

each Gaussian (x and µi are the input and mean variables,

respectively). The next step involved extraction of waveform

metrics (independent variables) and linking these to the field

biomass data. Since we have parameterized the waveform in

terms of Gaussian components, feature metrics can be

directly extracted from Eq. 4 (e.g., a1, a2, σ1, and σ2). We

also added two metrics, namely s1 and s2, which

correspond to the integral (area) of each of the two Gaussian

curves. These six metrics are not necessarily

uncorrelated, which led to the exclusion of highly correlated

(> 0.8) metrics after calculation of correlation coefficients.

The herbaceous biomass model was finally retrieved based

on a linear regression fit between the N selected

feature metrics and field data in the form of:

$$H_{biomass} = \sum_{n=1}^{N} c_n p_n + k \qquad (5)$$

where pn refers to the nth feature metric, cn represents the

associated coefficient, and k is the regression intercept.
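One possible reading of the decomposition and metric extraction steps is sketched below. It treats the deconvolved waveform as an intensity-weighted histogram over time bins and fits two Gaussians with a weighted EM iteration. This is an illustrative assumption and not necessarily how the EM algorithm [8] was applied here. The function names, starting values, and iteration count are hypothetical. The area of each component follows from the identity s = a σ sqrt(2π).

```python
import math


def gauss_pdf(x, mu, sigma):
    """Unit-area Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_two_gaussians(t, w, mu_init, sigma_init=1.0, iterations=200):
    """Weighted EM for a two-component mixture.

    t: bin positions; w: non-negative waveform intensities used as weights.
    Returns (means, sigmas, mixing weights, total intensity).
    """
    total = sum(w)
    mus = list(mu_init)
    sigmas = [sigma_init, sigma_init]
    pis = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each bin.
        resp = []
        for x in t:
            p = [pis[k] * gauss_pdf(x, mus[k], sigmas[k]) for k in range(2)]
            norm = sum(p) or 1e-300
            resp.append([pk / norm for pk in p])
        # M-step: intensity-weighted parameter updates.
        for k in range(2):
            nk = max(sum(wi * r[k] for wi, r in zip(w, resp)), 1e-12)
            mus[k] = sum(wi * r[k] * x for wi, r, x in zip(w, resp, t)) / nk
            var = sum(wi * r[k] * (x - mus[k]) ** 2 for wi, r, x in zip(w, resp, t)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-3)
            pis[k] = nk / total
    return mus, sigmas, pis, total


def feature_metrics(mus, sigmas, pis, total, bin_width=1.0):
    """Amplitude a, width sigma, and area s of each fitted Gaussian."""
    metrics = []
    for mu, sigma, pi in zip(mus, sigmas, pis):
        area = pi * total * bin_width                        # s: integral of the component
        amplitude = area / (sigma * math.sqrt(2 * math.pi))  # a = s / (sigma * sqrt(2*pi))
        metrics.append({"mu": mu, "a": amplitude, "sigma": sigma, "s": area})
    return metrics
```

The six metrics used in the text correspond to the `a`, `sigma`, and `s` entries of the two returned dictionaries.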

4. EXPERIMENTAL RESULTS

The proposed model was tested on six sites, the only

ones covered by waveform LiDAR data. Herbaceous

biomass in these sites ranged from 0 to 90 g/plot (216

plots in total). We only considered waveforms (before

deconvolution) with a single peak, i.e., waveforms that did

not exhibit multiple peaks due to tree canopy returns. This

reduced the number of sample plots to 159. We also

assumed that the GPS locations of the pixel-based

(0.56x0.56m) waveform and the plot center (field sample)

were both representative of the same plot. Herbaceous

biomass samples were then grouped into 5g classes for the

purposes of this study, which led to 18 weight-based

biomass classes (e.g. 0~5, 5~10, …85~90) in the 0-90g

range. Waveform-derived metrics and measured biomass

were averaged within each class.
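The class-averaging step can be illustrated with the short sketch below. The data layout (a list of biomass and metric pairs) and the function name are hypothetical. The treatment of class boundaries is also an assumption: each class is taken as closed at its lower edge, and a 90 g sample is placed in the last class.

```python
from collections import defaultdict


def average_by_class(samples, class_width=5.0, max_biomass=90.0):
    """Group (biomass_g, metrics_dict) samples into weight classes and average.

    Returns {class_index: (mean_biomass, mean_metrics)}.
    """
    n_classes = int(max_biomass / class_width)  # 18 classes for 0-90 g
    groups = defaultdict(list)
    for biomass, metrics in samples:
        idx = min(int(biomass // class_width), n_classes - 1)
        groups[idx].append((biomass, metrics))
    result = {}
    for idx, members in sorted(groups.items()):
        mean_b = sum(b for b, _ in members) / len(members)
        keys = members[0][1].keys()
        mean_m = {k: sum(m[k] for _, m in members) / len(members) for k in keys}
        result[idx] = (mean_b, mean_m)
    return result
```

Averaging within classes reduces plot-level scatter before regression, at the cost of fitting at most 18 points rather than the 159 individual plots.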

Table 1 shows the correlation coefficient matrix for the

field data and waveform-derived metrics, used to optimize

the variable selection. All the metrics in Table 1 have been

converted into “natural log” space to minimize the

nonlinearity between the parameters. It is evident that pairs

(a1, s1) and (a2, s2) exhibited high correlations. We therefore

discarded a1 and s2 to ensure model robustness, since these

correlated metrics also exhibited a lower correlation to the

biomass, when compared with s1 and a2, respectively.
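The selection rule described above can be reproduced with a few lines of code. The sketch below uses the correlation values as read from Table 1 and a hypothetical helper, `select_metrics`. For every pair of metrics correlated above the 0.8 threshold, it drops the one that is less correlated with biomass.

```python
NAMES = ["a1", "sigma1", "s1", "a2", "sigma2", "s2"]

# Pairwise correlations between metrics, as read from Table 1 (log space).
CORR = [
    [1.00, -0.12, 0.98, 0.79, 0.35, 0.59],
    [-0.12, 1.00, 0.07, 0.00, -0.05, -0.09],
    [0.98, 0.07, 1.00, 0.80, 0.36, 0.58],
    [0.79, 0.00, 0.80, 1.00, 0.52, 0.93],
    [0.35, -0.05, 0.36, 0.52, 1.00, 0.57],
    [0.59, -0.09, 0.58, 0.93, 0.57, 1.00],
]

# Correlation of each metric with field biomass (last row of Table 1).
BIO = [0.69, 0.21, 0.75, 0.67, 0.35, 0.50]


def select_metrics(names, corr, bio, threshold=0.8):
    """Drop, from each highly correlated pair, the metric less related to biomass."""
    dropped = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if i in dropped or j in dropped:
                continue
            if abs(corr[i][j]) > threshold:
                dropped.add(i if abs(bio[i]) < abs(bio[j]) else j)
    return [n for k, n in enumerate(names) if k not in dropped]


print(select_metrics(NAMES, CORR, BIO))  # ['sigma1', 's1', 'a2', 'sigma2']
```

With a strict "greater than 0.8" test, the (s1, a2) pair at 0.80 is retained, which matches the four metrics kept in the text.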


Figure 5 shows the results of herbaceous biomass

estimation using feature metrics σ1, s1, a2, and σ2 (Eq. 6),

where the coefficients were solved by least squares linear

regression. We conclude that the waveform approach

has potential for estimating above-ground herbaceous

biomass, given the model’s ability to explain almost 60% of the

variability in herbaceous biomass. However, we believe that the

small range in herbaceous biomass field values, limited

structural information, and senescent state of the vegetation

were detrimental to model performance.

$$\ln(H) = 6.3\ln(a_1) + 5.2\ln(s_1) + 0.3\ln(a_2) + 0.4\ln(\sigma_2) - 41.6 \qquad (6)$$

Table 1: Correlation coefficients between field data and

waveform-derived feature metrics

Figure 5: Herbaceous biomass estimation using LiDAR

waveform-derived metrics
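A least-squares fit in natural-log space, of the kind that yields a model shaped like Eq. 6, can be sketched as follows. This is a minimal illustration using the normal equations and Gaussian elimination in pure Python. The function names are hypothetical, and in practice a numerical library's least-squares routine would normally be preferred.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_linear(metrics, biomass):
    """Least-squares fit of ln(H) = sum_n c_n ln(p_n) + k; returns (coeffs, k)."""
    rows = [[math.log(p) for p in sample] + [1.0] for sample in metrics]
    y = [math.log(h) for h in biomass]
    cols = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(cols)] for i in range(cols)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(cols)]
    beta = solve(xtx, xty)
    return beta[:-1], beta[-1]


def predict(sample, coeffs, k):
    """Back-transform a log-space prediction to biomass units."""
    return math.exp(sum(c * math.log(p) for c, p in zip(coeffs, sample)) + k)
```

Because the fit is performed on logarithms, all metrics and biomass values must be strictly positive, so classes with zero mean biomass would have to be excluded or offset before fitting.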

5. CONCLUSIONS

We successfully extracted waveform LiDAR feature metrics

from the deconvolved waveform’s Gaussian responses to

model plot-level herbaceous biomass - the coefficient of

determination (R2) indicated that our model could explain

60% of the variation in herbaceous biomass. Although this

could be considered as relatively low, it is clear that

significant potential exists for assessment of herbaceous

biomass in savanna ecosystems at fine scales using

waveform LiDAR. We mainly attributed the relatively poor

model performance to a narrow range of field biomass

values. Future research will focus on biomass estimation

during the wet season, linking woody-herbaceous biomass

assessment, and applying spectral-based mixture mapping to

further explore the relative variation of LiDAR returns

across different vegetation species, structures, biomass, etc.,

at the sub-pixel level.

6. ACKNOWLEDGEMENTS

The Carnegie Airborne Observatory is supported by the

W.M. Keck Foundation and William Hearst III; the study

was funded by the Andrew Mellon Foundation. We are

grateful for field campaign funding from the Council for

Scientific and Industrial Research (SA) and PhD student

funding from the Rochester Institute of Technology (USA).

7. REFERENCES

[1] H.-E. Andersen, R. McGaughey, and S. Reutebuch,

“Estimating forest canopy fuel parameters using LIDAR data”,

Remote Sensing of Environment 94(6), pp. 441-449, 2005.

[2] J. Anderson, M. Martin, M.-L. Smith, R. Dubayah, M.

Hofton, P. Hyde, B. Peterson, J. Blair, and R. Knox, “The use of

waveform lidar to measure northern temperate mixed conifer and

deciduous forest structure in New Hampshire”, Remote Sensing of

Environment 105(3), pp. 248-261, 2006.

[3] J. Drake, R. Dubayah, D. Clark, R. Knox, J. Blair, M. Hofton,

R. Chazdon, J. Weishampel, and S. Prince, “Estimation of tropical

forest structural characteristics using large-footprint lidar”, Remote

Sensing of Environment 79(2-3), pp. 305-319, 2002.

[4] B. Koetz, F. Morsdorf, G. Sun, K. Ranson, K. Itten, and B.

Allgower, “Inversion of a lidar waveform model for forest

biophysical parameter estimation”, IEEE Geoscience and Remote

Sensing Letters 3(1), pp. 49-53, 2006.

[5] M. Lefsky, D. Harding, W. Cohen, G. Parker, and H. Shugart,

“Surface lidar remote sensing of basal area and biomass in

deciduous forests of eastern Maryland, USA”, Remote Sensing of

Environment 67(1), pp. 83-98, 1999.

[6] L.B. Lucy, “An iterative technique for the rectification of

observed distributions”, The Astronomical Journal 79(6), pp. 745-754, 1974.

[7] S. Harsdorf and R. Reuter, “Stable deconvolution of noisy

lidar signals”, Proceedings of EARSeL-SIG-Workshop LIDAR,

Dresden, June 16-17, 2000.

[8] A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood

from incomplete data via the EM algorithm”, Journal of the Royal

Statistical Society 39(1), pp. 1-38, 1977.

Table 1 (correlation matrix; all metrics in natural-log space):

          a1      σ1      s1      a2      σ2      s2      Bio (H)
a1        1      -0.12    0.98    0.79    0.35    0.59    0.69
σ1       -0.12    1       0.07    0      -0.05   -0.09    0.21
s1        0.98    0.07    1       0.80    0.36    0.58    0.75
a2        0.79    0       0.80    1       0.52    0.93    0.67
σ2        0.35   -0.05    0.36    0.52    1       0.57    0.35
s2        0.59   -0.09    0.58    0.93    0.57    1       0.50
Bio (H)   0.69    0.21    0.75    0.67    0.35    0.50    1