ClustalG: Software for analysis of activities and sequential events
Clarke Wilson
Canada Mortgage and Housing Corporation
C5-318 700 Montreal Road
Ottawa, Canada K1A 0P7
Phone 613 748 4670
Fax 613 748 4865
email cwilson@cmhc-schl.gc.ca
Andrew Harvey
Department of Economics
St. Mary’s University
Halifax, Nova Scotia B3H 3C3
Phone 902 420 5676
email andrew.harvey@stmarys.ca
Julie Thompson
Institut de Génétique et de Biologie Moléculaire et Cellulaire
Strasbourg, France
julie@igbmc.u-strasbg.fr
1999
Paper presented at the Workshop on Longitudinal Research in Social Science: A Canadian Focus,
Windermere Manor, London, Ontario, Canada, October 25-27, 1999.
ABSTRACT
The paper describes a new software package for sequence alignment analysis called ClustalG. The
package is a rewrite of the well-known Clustal series of alignment packages. The main new
feature of ClustalG is the recognition of input sequences made up of words of up to six characters.
This effectively eliminates the 20-letter constraint implemented in biological software on the number
of event categories available to the researcher.
The essential ClustalG windows are shown and the meaning of the most common variable settings
is discussed. Some elementary alignments and clustering trees are illustrated. The software
package, including the help file and a number of time use sample files, is freely available to any user
from the St. Mary’s University ftp site.
Key words: activity patterns, sequence alignment, software
Acknowledgements
The authors acknowledge the original work on the Clustal program group by Des Higgins and
Paul Sharp, and we thank Toby Gibson of the European Molecular Biology Laboratory for
agreeing to allow us to redesign the program for use outside molecular biological applications.
Their work represents a huge investment of research funds and talent that will now be applied to
subjects far beyond those for which Clustal was originally designed.
This work is part of the Activity Settings, Sequencing and the Measurement of Time Allocation
Patterns project, funded by the Social Sciences and Humanities Research Council of Canada.
1. The ClustalG project
The Activity Settings: Design, Measurement, and Analysis research project, funded by the Social
Sciences and Humanities Research Council of Canada in 1994, produced a number of papers and
publications that have illustrated the application of sequence alignment methods and software as
developed in molecular biology to time use and transportation research [1, 2, 3]. These have all
used versions of the Clustal programs maintained at the European Molecular Biology Laboratory.
The results of the applications suggest that alignment methods hold great promise for examining
social processes that consist of sequences of activities. Abbott [4] has reviewed a variety of
research into sequential processes based on alignment, or optimal matching as the methods are
sometimes called. However, the biological software contains a number of features that have no
place in social science research, and available packages generally limit the eligible alphabet to just
over 20 characters.
A subsequent SSHRCC project, Activity Settings, Sequencing and the Measurement of Time
Allocation Patterns, has contracted the Clustal programmer, Julie Thompson, to amend the
windows version, ClustalX, for research in any discipline that deals with sequential processes.
The product is called ClustalG (for general) and is available from the ftp site at St. Mary’s
University, Halifax.
The properties of ClustalW and ClustalX have been published [5,6]. Briefly, the packages
implement a two stage process of calculating the pairwise similarities in a set of sequences then
constructing a tree from transformations of the similarities. The tree is used to guide the
progressive multiple alignment of the set of sequences.
ClustalG has deleted the explicitly biochemical features of ClustalX, has expanded the input
routines to accept multiple-letter words of up to six characters, and has created a new output file
that lists the members of the group formed at each step as the program clusters individuals into
progressively larger and more general pattern groupings. The key feature is the introduction of
multiple-letter words, because this permits analysts to use the complex coding schemes that are
common in many sciences. Analysts may use different positions in the word to indicate different dimensions
of events. In our example data, the first two positions indicate an activity, the third indicates
location, and the fourth who else was present.
2. Sequence alignment methods
Sequence alignment methods, or optimal string matching as they are also called, employ
combinatorial algorithms to calculate measures of either similarity or distance between character
sequences. See Waterman [7] for a comprehensive treatment of alignment mathematics and
biological applications. When stages of processes or activities are represented by characters, these
measurements can form the basis of taxonomies of the behaviour being examined. Alignment
methods provide the most rigorous basis available for classifying groups of character sequences.
The general process can be illustrated by writing the elements of two sequences in the margins of
a comparison table and placing an asterisk in cells for which marginal elements match. Consider
the comparison of letters of [mississippi] and [missouri] shown in Figure 1.
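Figure 1: Comparison table for [mississippi] and [missouri]
  m i s s i s s i p p i
m * . . . . . . . . . .
i . * . . * . . * . . *
s . . * * . * * . . . .
s . . * * . * * . . . .
o . . . . . . . . . . .
u . . . . . . . . . . .
r . . . . . . . . . . .
i . * . . * . . * . . *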
The degree of similarity of the two names is established in the first syllable as shown by the
downward sloping diagonal pattern of stars. The [iss] substring is repeated in [mississippi] and
this is illustrated by the second diagonal, translated three positions to the right. The remaining
letter matches are more or less random.
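The comparison table in Figure 1 can be generated mechanically. The following is a minimal sketch, not part of ClustalG; the function name comparison_table and the use of a dot for non-matching cells are choices made for this illustration only.

```python
def comparison_table(rows: str, cols: str) -> list[str]:
    """Return the comparison table as text lines, one per row letter.

    A cell holds '*' where the row letter equals the column letter
    and '.' otherwise (the dot is only a placeholder for an empty cell).
    """
    lines = ["  " + " ".join(cols)]  # header line with the column letters
    for r in rows:
        cells = ["*" if r == c else "." for c in cols]
        lines.append(r + " " + " ".join(cells))
    return lines


table = comparison_table("missouri", "mississippi")
print("\n".join(table))
```

Runs of asterisks along a downward diagonal correspond to shared substrings such as [miss] and the repeated [iss].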
The alignment algorithms are based on calculation of a cumulative score beginning at the upper
left cell and proceeding to the lower right. A cell’s score is based on the preceding score plus its
own value. Values are determined by weighting systems related to the substance of the problem in
question. A path can be found that leads backwards from the lower right cell through the highest
value cells to the upper left. The order in which letters are included in the path, and in particular
whether a letter matches another letter or is placed against a gap, determines the pairwise
alignment. Gaps may be inserted in either sequence to allow identical letters to match. Optimal
paths and alignments are often not unique. One option for the alignment of [mississippi] and
[missouri] is shown below:
m i s s - - - i s s i p p i
m i s s o u r i - - - - - -
The exact pattern of letters in positions five and following is determined by the system of scoring
weights and gap penalties used.
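The cumulative scoring and backward path tracing described above can be sketched in a few lines. This is an illustrative global alignment in the style of Needleman and Wunsch, not the ClustalG code. It assumes a match weight of 10, a mismatch weight of 0, and a single linear gap penalty of 5 per gap position, whereas ClustalG distinguishes gap opening from gap extension. The function name align and its parameters are hypothetical.

```python
def align(a, b, match=10, mismatch=0, gap=-5):
    """Globally align sequences a and b; return (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best cumulative score for a[:i] against b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # pair two elements
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace one optimal path back from the lower right to the upper left.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], row_a[::-1], row_b[::-1]


best, top, bottom = align("mississippi", "missouri")
print(" ".join(top))
print(" ".join(bottom))
```

Because optimal paths are often not unique, the alignment printed here need not coincide with the one shown above; changing the weights or the gap penalty changes which paths are optimal. The function also accepts lists of multiple-letter words in place of strings.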
Pairwise alignment may be generalized to multiple alignments by defining comparison tables and
paths in N dimensions. However, for N greater than about 10 sequences, the algorithms are
prohibitively costly in time and memory space. Multiple alignments are usually implemented
using approximate methods based on pairwise measures. This is the case with the Clustal program
family.
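The approximate strategy, pairwise measures first and then a tree that groups the sequences, can be sketched as follows. This is only an illustration under stated assumptions. The similarity is a raw global alignment score (match 10, mismatch 0, linear gap penalty 5) with no normalization for sequence length, and the tree is built by simple average-linkage merging, used here as a stand-in because the tree construction used by Clustal is not reproduced. The names similarity and guide_tree, and the sequence labelled "variant", are hypothetical.

```python
from itertools import combinations


def similarity(a, b, match=10, gap=-5):
    """Global alignment score of a and b (mismatches score 0)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * gap]
        for j, y in enumerate(b, start=1):
            cur.append(max(prev[j - 1] + (match if x == y else 0),
                           prev[j] + gap,
                           cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def guide_tree(seqs):
    """Return a nested tuple of labels built by average-linkage merging."""
    # Stage 1: pairwise similarities for every pair of sequences.
    sim = {frozenset(pair): similarity(seqs[pair[0]], seqs[pair[1]])
           for pair in combinations(seqs, 2)}
    # Stage 2: repeatedly merge the two most similar groups.
    clusters = {name: [name] for name in seqs}  # tree node -> member labels

    def average(pair):
        left, right = pair
        scores = [sim[frozenset((p, q))]
                  for p in clusters[left] for q in clusters[right]]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        left, right = max(combinations(clusters, 2), key=average)
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))


sequences = {
    "1346wda": "rewaeawreamr",
    "variant": "rewaeawrekmr",       # hypothetical near-copy of 1346wda
    "1444sna": "rrkcdecdacemckkmr",
}
tree = guide_tree(sequences)
print(tree)
```

A progressive multiple alignment would then follow the nesting of the tree, aligning the most similar sequences first and adding the more distant groups later.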
3. ClustalG screen
The ClustalG screen is shown in Figure 1 as it is displayed when the program is executed. Seven
menus control the loading of sequence files, editing, alignments, preparation of trees calculated as
a result of the clustering process performed progressively on the sequence file, coloring, depiction
of special sequences or segments (quality), and the help screens. This presentation deals only with
file manipulation and specification of alignment parameters. The ClustalG online help facility
covers the other items.
ClustalG operates in Multiple Alignment or Profile Alignment modes, which are selected from the
first of the drop down boxes. Multiple Alignment mode uses a single screen. Profile Alignment
mode uses two screens because a profile alignment is an alignment of two previously constructed
alignments. Multiple alignment mode is normally used first to find useful arrangements of
sequence data. The researcher may later want to combine various alignments.
The Alphabet Size, from one to six characters, must be chosen before sequences are loaded into
ClustalG. All elements of all sequences are treated as having a constant size.
The Windows menu bar at the bottom is not part of the ClustalG screen.
4. File menu options
Load sequences:
This is the first step and is mandatory. Selecting the load sequence option invokes a Windows
Open screen that allows the user to specify drive, folder and filename. Sequence labels are written in
the left-hand box and the sequence elements are written to the right.
Many single-letter sequence formats are used in biology. ClustalG accepts all those that have been
implemented in ClustalX, in addition to the multiple-letter words. The simplest is the Pearson or
Fasta format, which has been used in the example files. In this format each sequence record begins
with a greater-than symbol, and the remaining characters on that line are treated as a label. The
lines that follow are treated as sequence data and are read until another greater-than symbol
or the end-of-file character is found. For example:
> 1346wda 12e 12
rewaeawreamr
> 1444sna 17e 17
rrkcdecdacemckkmr
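The reading rule just described can be sketched as a small parser. This is an illustration only and does not reproduce the ClustalG input routines; the function name read_fasta is hypothetical, and the whole of the line after the greater-than symbol is kept as the label.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse Pearson/Fasta text into a {label: sequence} dictionary."""
    records: dict[str, str] = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            label = line[1:].strip()      # rest of the line is the label
            records[label] = ""
        elif label is not None:
            records[label] += line        # data lines are concatenated
    return records


sample = """> 1346wda 12e 12
rewaeawreamr
> 1444sna 17e 17
rrkcdecdacemckkmr
"""
records = read_fasta(sample)
for name, seq in records.items():
    print(name, len(seq), seq)
```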
An example of a sequence that uses four-letter words is:
> 2011mna 15e 15
ZzhaPchaPchaTrtaWkwaWkwaWkwaTrtaZzhaTvhaEthfZzhaZzhaFchfZzha
The diary reads: asleep, home, alone; personal care, home, alone; personal care, home, alone;
travel, location is travel, alone; work, at workplace, alone; etc...
Append sequences:
Additional sequences can be added to a file previously loaded.
Save sequences as:
Permits the user to specify a new file location for edited sequence files or for new alignments using
different sets of parameter values.
Profile options:
Similar to the multiple alignment options except that two files are specified.
Write as Postscript:
Creates a Postscript graphic output file.
5. Alignment menu options
The first subgroup of options launches and controls the output of ClustalG files, including the
alignment file (*.aln) and the dendrogram file (*.dnd). The second group allows amendment of
existing alignments, and the third group sets the parameters for the alignment. Group three is
described first.
Reset:
These are selected with the mouse and display a check when selected.
Save log file:
This is also selected with the mouse and controls the writing of the pairwise similarity values
(*.lg1) and the grouping steps (*.lg2) to disk.
Pairwise alignment parameter screen
This option invokes a dialogue box that allows the user to control the pairwise alignments, which
are computed first. The guide tree is calculated from them and in turn controls the
multiple alignment step. Gap open and gap extension parameters should be set.
A file containing weights for element matches and substitutions may be input. The default is to use
an identity matrix with diagonal values set to 10. Gap penalty parameters should relate to weight
matrix values.
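The default weighting can be pictured as a table over the words that occur in the data. The sketch below builds such an identity matrix in memory. It is an illustration only; the layout of the weight file read by ClustalG is not described here and is not reproduced, and the name identity_weights is hypothetical.

```python
def identity_weights(alphabet, match=10, mismatch=0):
    """Return weights[a][b]: `match` on the diagonal, `mismatch` elsewhere."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


alphabet = ["Zzha", "Pcha", "Trta", "Wkwa"]
weights = identity_weights(alphabet)
for a in alphabet:
    print(a, [weights[a][b] for b in alphabet])
```

Under these weights and a simple linear gap penalty, the arithmetic linking the two is direct: with a match worth 10 and a penalty of 5 per gap position, one additional matched pair pays for two gap positions, while a penalty of 10 would require one match for every gap inserted. This is one sense in which gap penalties need to be chosen in proportion to the matrix values.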
Multiple Alignment Parameter screen
This option invokes another dialogue box that allows the user to control the set of parameters
used by ClustalG in conjunction with pairwise similarity scores to calculate a guide tree, and from
there to assemble the multiple alignment. The weight matrix usage is the same as for the pairwise
screen.
Gap parameter option screen
A dialogue box allows further specification of gap parameters.
Output format option screen
The user selects one or more formats for the output files.
The line width may be set to control alignment appearance. The alignment is written as a series of
blocks of fixed width. Where output lines are comparatively short, they may fit on letter or legal
paper in portrait or landscape orientation. Where lines are too long the user can control block
width up to 1000 characters.
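Writing an alignment as blocks of fixed width can be sketched as follows. This illustrates the idea only and does not reproduce the layout of the *.aln file; the function name write_blocks and the labels s1 and s2 are hypothetical.

```python
def write_blocks(labels, rows, width=60):
    """Format aligned rows as blocks of at most `width` columns each."""
    pad = max(len(label) for label in labels) + 2   # label column width
    length = len(rows[0])                           # rows share one length
    blocks = []
    for start in range(0, length, width):
        lines = [label.ljust(pad) + row[start:start + width]
                 for label, row in zip(labels, rows)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)                      # blank line between blocks


text = write_blocks(["s1", "s2"], ["miss---issippi", "missouri------"], width=7)
print(text)
```

For sequences of multiple-letter words it would be natural to choose a width that is a multiple of the word size, so that no word is split across two blocks.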
The aligned sequences may be written in input order or in an order that roughly follows their
grouping order.
The user may choose to have the parameter set written to a file for future reference.
Do Complete Alignment Screen
This screen allows the user to name the output alignment and dendrogram files. The default is to
use the same name as the sequence file that was loaded with extensions of *.aln and *.dnd. An
alignment file is shown later. The guide tree or dendrogram is written as a text file of nested
parentheses containing sequence labels and the branch lengths of the tree which can be drawn from
the nesting pattern. No tree is drawn. However, the file format is recognized by biological
graphics software.
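A nested-parenthesis tree of this kind, which appears to correspond to what is commonly called the Newick format, can be read with a short recursive parser. The sketch below is an illustration under that assumption and is not taken from ClustalG. The example string is hypothetical, as are the names parse_tree and leaves, and labels are assumed to contain no spaces or punctuation.

```python
def parse_tree(text):
    """Parse nested-parenthesis text into (name, length, children) tuples."""
    text = "".join(text.split())          # drop spaces and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]            # empty for internal nodes
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaves(tree):
    """List the sequence labels at the tips of the tree, left to right."""
    name, _, children = tree
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


example = "(a:0.1,(b:0.2,c:0.3):0.4);"   # hypothetical three-sequence tree
tree = parse_tree(example)
print(leaves(tree))
```

In this example the inner parentheses group b and c, and the number after each colon is the length of the branch leading to that label or group.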
Produce guide tree only
This generates only the *.dnd file. This may be used with tree drawing software (for example
Treeview by Rod Page, University of Glasgow, or Phylip by Joe Felsenstein, University of
Washington) to display the dendrogram graphically. A Treeview [8] screen is shown later.
Alignment from guide tree screen
Specifies an existing guide tree to control the multiple alignment. New multiple alignment
parameters may be selected.
6. ClustalG Alignment Screen
ClustalG has identified three primary behavioural groups in the test data file. Their multiple
alignment is shown above and the groupings are shown in the tree diagram on the next page. The
order of output in the alignment is roughly but not precisely that in the tree diagram and the two
should be used together to identify the membership of the primary behavioural groupings that
occur in the alignment.
7. Treeview illustration of the ClustalG guide tree file
The tree identifies similar groups of sequences precisely. The alignment describes what the
activity patterns are. The middle group in the tree diagram consists of employed people who had several
work episodes in their diaries.
References
1. Wilson W.C. 1998 Activity pattern analysis by means of sequence alignment methods,
Environment and Planning A, volume 30, pp. 1017-1038.
2. Wilson W.C. 1998 Analysis of travel behaviour using sequence alignment methods,
Transportation Research Record, number 1645, pp. 52-59.
3. Harvey A.S. and Wilson W.C. 1998, Evolution of daily activity patterns: a study of the Halifax
panel survey, paper presented at Thematic Group 1, Time-Use, World Congress of Sociology (in
conjunction with Association, International Association for Time Use Research) University of
Quebec, Montreal, July 26-August 1, 1998.
4. Abbott A. 1999 the review paper. citation to come
5. Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. 1997, The
ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality
analysis tools. Nucleic Acids Research, 25:4876-4882.
6. Thompson, J.D., Higgins, D.G. and Gibson, T.J. 1994 CLUSTAL W: improving the sensitivity
of progressive multiple sequence alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680.
7. Waterman, M. 1995, Introduction to Computational Biology, Chapman and Hall, London.
8. Page, R. D. M. 1996 TREEVIEW: An application to display phylogenetic trees on personal
computers, Computer Applications in the Biosciences, 12:357-358.
... Programs using this kind of sequence analysis produce "trees", which divide sequences taxonomically. The second type of sequence analysis, less frequently employed, is used to match and detect patterns of behaviour in some or all of the sequences scrutinised (Wilson, 1999). The first type of utilisation is more relevant to our research. ...
... The first type of utilisation is more relevant to our research. Early studies (Wilson, 1999) of sequence alignments, including "Activity Settings, Sequencing, and Measurement of Time Allocation Patterns", were based on software called ClustalG. ...
... gov/Blast .cgi). The phylogenetic and molecular evolutionary analysis was conducted using ClustalG software (Wilson et al. 1999), and the results were further analyzed with Mega6 software (Tamura et al. 2013). The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model (Jones et al. 1992). ...
Article
Full-text available
The signal transducer and activator of transcription 3 (STAT3) gene plays a crucial role in leptin-mediated energy metabolism, upon which the growth and development of animals depend. Nevertheless, no studies have reported the effects of STAT3 gene polymorphisms on body weight and fatness modulation in sheep. This study aimed to illustrate STAT3 mRNA expression across tissues and various developmental stages of sheep and to highlight the association of STAT3 gene polymorphisms with body weight and fat-related traits in sheep, in order to identify a genetic marker that may conceivably be of value for marker-assisted selection (MAS). This study revealed that STAT3 was differentially expressed across age and sex (p < 0.05), with higher expression in the ram liver. The abundant expression of STAT3 in the liver of male sheep and increased expression in the hypothalamus and longissimus dorsi muscle from birth to six months of age may indicate the vital role of the STAT3 gene in animal growth and development. Moreover, SNP association analysis also revealed that the novel SNPs of the STAT3 gene detected in this study showed a significant association with body weight and fatness traits (p < 0.05). In conclusion, the significant genetic effects of the STAT3 gene polymorphisms on sheep growth and development revealed that STAT3 could be a marker gene for the selection of growth-related traits in sheep.
... SAM compares two strings of sequences and makes the two strings identical by adding or deleting characters and/or switching the order of certain characters. The idea is to count how many operations it takes to make one string of sequences identical to another string: the more operations needed, the larger the distance between the two sequences and vice versa (Wilson, Harvey, & Thompson, 1999). This operation of comparing two sequences ("pairwise alignment") is the basis for the algorithm of multiple alignments that compares more than three sequences. ...
Article
Full-text available
Although migration trajectories over people's life courses seem to be associatedwith mental health outcomes, previous studies have considered migration at onlyone point in time when correlating migration with mental health. However, peoplecan migrate multiple times during their life courses. The decision to migrate canbe triggered by several life course development events, such as education, entryto the labour market, marriage, or retirement. The present study addressed thisresearch gap by focusing on the trajectories of migration and their relationship tomental health among internal migrants in China. Data were collected from across-sectional survey (N= 534) in Shenzhen, China, in 2017. People's migrationtrajectories were aligned into migration groups using sequence alignment method.Binary logistic regression models were estimated to assess the associationsbetween each migration trajectory group and the prevalence of mental healthproblems, controlling for sociodemographics and self-reported physical health. Theresults show that migration trajectories—namely, the sequence of multiple migra-tions between migrants' places of origin and their final destinations—are signifi-cantly related to mental health outcomes. Our findings suggest that treatingmigration as a one-time transition could be problematic because many migrantsundertake multiple migration trips.
... (1) compute distance matrix for similarity between all sequences pairs, (2) construct guide tree from the distance matrix, a hierarchical data structure that groups sequences by similarity, and (3) progressively align sequences according to the guide tree. Wilson et al. (2005) created ClustalG to align generic activity sequences in the social sciences. It works on user-defined symbols in addition to symbols that represent nucleic and amino acids. ...
Article
Full-text available
Our objective was to model process variation of Emergency Medical Service teams responding to simulated pediatric emergencies and determine if sequence alignment distinguishes performance quality. We performed a retrospective process analysis by watching and coding activities in videos from standardized simulations of 42 Emergency Medical Service teams. Teams were classified into high- or low-performing groups based on the Clinical Teamwork Scale™. Activities were coded according to resuscitation tasks, performer, and times. We used ClustalG to align task sequences within and between groups, and measured similarity. Teams within and between performance levels had an average sequence similarity of 52 ± 7% and 50 ± 7%. Teams performed clinically appropriate tasks that varied in prioritization, for example, performing compressions or connecting the EKG monitor early. There was no statistical difference in gross similarity between groups but specific differences in prioritization may have had clinically meaningful implications. Alignment could improve by accounting for task duration and concurrency.
... Sequences were compared with the use of the Optimal Matching Analysis tool (Chan, 1995). Fabrikant, Rebich-Hespanha, Andrienko, Andrienko, and Montello (2008) analysed eye-movement data recorded in controlled experiments on small-multiple map (a series of similar maps using the same scale, allowing them to be easily compared) displays with the use of ClustalG software (Wilson, Harvey, & Thompson, 1999). Clustal software packages are widely used for analysing gene sequences in DNA and proteins. ...
Article
Full-text available
The paper is dealing with scanpath comparison of eye-tracking data recorded during case study focused on the evaluation of 2D and 3D city maps. The experiment contained screenshots from three map portals. Two types of maps were used - standard map and 3D visualization. Respondents’ task was to find particular point symbol on the map as fast as possible. Scanpath comparison is one group of the eye-tracking data analyses methods used for revealing the strategy of the respondents. In cartographic studies, the most commonly used application for scanpath comparison is eyePatterns that output is hierarchical clustering and a tree graph representing the relationships between analysed sequences. During an analysis of the algorithm generating a tree graph, it was found that the outputs do not correspond to the reality. We proceeded to the creation of a new tool called ScanGraph. This tool uses visualization of cliques in simple graphs and is freely available at www.eyetracking.upol.cz/scangraph . Results of the study proved the functionality of the tool and its suitability for analyses of different strategies of map readers. Based on the results of the tool, similar scanpaths were selected, and groups of respondents with similar strategies were identified. With this knowledge, it is possible to analyse the relationship between belonging to the group with similar strategy and data gathered from the questionnaire (age, sex, cartographic knowledge, etc.) or type of stimuli (2D, 3D map).
... Sequences were compared with the use of the Optimal Matching Analysis tool (Chan, 1995). Fabrikant, Rebich-Hespanha, Andrienko, Andrienko, and Montello (2008) analysed eye-movement data recorded in controlled experiments on small-multiple map (a series of similar maps using the same scale, allowing them to be easily compared) displays with the use of ClustalG software (Wilson, Harvey, & Thompson, 1999). Clustal software packages are widely used for analysing gene sequences in DNA and proteins. ...
Article
Full-text available
The article describes a new tool for analyses of eye-movement data. Many different approaches to scanpath comparison exist. One of the most frequently used approaches is String Edit Distance, where the gaze trajectories are replaced by the sequences of visited Areas of Interest. In cartographic literature, the most commonly used software for scanpath comparison is eyePatterns. During the analysis of eyePatterns functionality, we have found that tree-graph visualization of its results is not reliable. Thus, we decided to develop a new tool called ScanGraph. Its computational algorithms are modified to work better with the sequences with different lengths. The output is visualized as a simple graph, and similar groups of sequences are displayed as cliques of this graph. The article describes ScanGraph’s functionality on the example of a simple cartographic eye-tracking study. Differences of the reading strategy of a simple map between cartographic experts and novices were investigated. The paper should serve to the researchers who would like to analyze differences between groups of participants, and who would like to use our tool - ScanGraph, available at www.eyetracking.upol.cz/scangraph.
... Various alignments can be built, but the one that well reflects the common letters of the cluster of sequences at certain important positions, is regarded as an accurate signature, from which a representative pHMM can be deduced. Most of the existing multiple alignment methods, such as ClustalG (Wilson, Harvey, & Thompson, 1999) and Dana (Joh et al., 2001), use a matrix of pairwise alignment scores between the sequences to automatically build the multiple alignment, by means of a progressive alignment approach. In this process, sequences are added to the alignment incrementally, beginning with the most similar pair of sequences and finishing with the most distant ones, according to the distance of the sequences measured by the pairwise alignment scores. ...
Article
In literature, activity sequences, generated from activity-travel diaries, have been analyzed and classified into clusters based on the composition and ordering of the activities using Sequence Alignment Methods (SAM). However, using these methods, only the frequent activities in each cluster are extracted and qualitatively described; the infrequent activities and their related travel episodes are disregarded. Thus, to quantify the occurrence probabilities of all the daily activities as well as their sequential orders, we develop a novel process to build multiple alignments of the sequences and subsequently derive profile Hidden Markov Models (pHMMs). This process consists of 4 major steps. First, activity sequences are clustered based on a pre-defined scheme. The frequent activities along with their sequential orders are then identified in each cluster, and they are subsequently used as a template to guide the construction of a multiple alignment of the cluster of sequences. Finally, a pHMM is employed to convert the multiple alignment into a position-specific scoring system, representing the probability of each frequent activity at each important position of the alignment as well as the probabilities of both insertion and deletion of infrequent activities.
Thesis
Full-text available
New forms of transportation demand models use an activity based approach which requires an activity pattern assigned to each individual. When all the finer details of activities are considered, there are infinite numbers of possible ways person can arrange his or her daily activity pattern. Therefore, for practical transport modelling, similar types of activity patterns need to be identified so the process can allocate a day pattern type and then select a specific day pattern for each individual. It appears that in much of the activity-based modelling that has occurred in practice to date, the segmentation is largely based on received wisdom and still follows the home-based-work, home-based-school and home-based-other thinking inherent in the trip-based paradigm, suggesting that little has been done to exploit the possibilities that are available. Much of the clustering done so far in activity research has used a crisp clustering approach where an object belongs only to a single cluster. In the other hand, fuzzy clustering provides a membership to all the clusters in some degree ranging between 0 and 1 making it more useful for practical modelling. A dataset of 17,740 individual daily diaries including 1501 for weekends collected from the Calgary Household Activity Survey (HAS) is used for analysis. A method known as sequential alignment method (SAM) first introduced to gene comparison in biological science is used such that the sequential effect of the data can be captured in the proximity method feeding into fuzzy clustering algorithm to find the membership to the clusters. The daily activity patterns were clustered into 10 unique activity pattern types with memberships to all the clusters. The basic design of a modelling process for assigning activity patterns to individuals was proposed. 
This process considers each individual in a population in turn, with three components, a) Cluster Membership Calculator, which assigns membership probabilities to the vector of available activity pattern clusters for the individual, b) Cluster Selector, which assign one activity pattern cluster to the individual using a Monte Carlo approach with the membership probabilities as the selection probabilities, and c) Activity Pattern Selector, which assigns a specific activity pattern to the individual.
Article
Abstract Task-Technology Fit theory and the Technology Acceptance Model identify system utilization as an important indicator for the performance of complex software systems. Yet, empirical evaluations of user interaction with group decision support systems are scarce and often methodologically underdeveloped. For this study we employed an exploratory evaluation of user interaction in the context of web-based group decision support systems. Specifically, we used information-rich server logs captured through a web-based platform for participatory transportation planning to identify groups of users with similar use patterns. The groups were derived through multiple sequence alignment and hierarchical cluster analysis based on varying user activity measures. Subsequently, we assessed the reliability of the classifications obtained from the two clustering methods. Our results indicate limited reliability of classifications of activity sequences through multiple sequence alignment analysis and robust groupings from hierarchical cluster analysis for user activity initiations and durations. The presented work contributes a novel methodological framework for the evaluation of complex software systems that extends beyond the common approach of soliciting user satisfaction.
Article
Full-text available
Sequence alignment methods are applied to daily activity data derived from the Statistics Canada 1992 General Social Survey on Time Use, with special emphasis on travel episodes and the activities that generate travel. Sequence alignment is a combinatorial procedure that gives a quantitative measure of the similarity of character sequences, which may be used to represent daily activity patterns. It accommodates all the details supplied from activity diaries including the ordering of activity episodes, their duration, and patterns of transitions from one activity to another. Analysis of daily activity patterns by using such methods offers a new way of improving understanding of travel behavior. Such an understanding is especially critical when public transport policy is being driven increasingly by budget constraints, and traffic management through congestion is considered an acceptable response to increasing travel demands. The method successfully identifies groupings of behavioral patterns, which then may be further described by using multivariate analysis of sociodemographic characteristics. A key issue in the application of the method is to determine the circumstances in which activity sequences should or should not reflect episode duration.
The author describes a method of comparing sequences of characters, called sequence alignment or string matching, and illustrates its use in the analysis of daily activity patterns derived from time-use diaries. It allows definition of measures of similarity or distance between complete sequences, called global alignment, or the evaluation of the best fit of short sequences within long sequences, called local alignment. Alignments may be done pairwise to develop similarity or distance matrices that describe the relatedness of individuals in the set of sequences being examined. Pairwise alignment methods may be extended to many individuals by using multiple alignment analysis. A number of elementary hand-worked examples are provided. The basic concepts are discussed in terms of the problems of time-use research and the method is illustrated by examining diary data from a survey conducted in Reading, England. The CLUSTAL software used for the alignments was written for molecular biological research. The method offers a powerful technique for analyzing the full richness of diary data without discarding the details of episode ordering, duration, or transition. It is also possible to extend the analysis to include the context of activities, such as the presence of other persons or the location, but such extensions would require software designed for social science rather than biochemical problems. The method also offers a challenge to researchers to begin to develop theories about the determinants of daily behavior as a whole, rather than about participation in single activities or about time-budget totals.
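The pairwise global alignment and distance matrix described above can be sketched in a few lines. This is a minimal illustration, not the CLUSTAL implementation: it uses a plain Needleman-Wunsch style dynamic program with unit substitution and gap costs, and the function names, cost values, and the three toy diaries are all assumptions made for the example. Activities are stored as word tokens rather than single letters.

```python
def global_alignment_distance(seq_a, seq_b, substitution_cost=1.0, gap_cost=1.0):
    """Global alignment distance between two sequences of activity tokens.

    Costs are illustrative assumptions, not CLUSTAL defaults.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per token.
    for i in range(1, rows):
        dist[i][0] = i * gap_cost
    for j in range(1, cols):
        dist[0][j] = j * gap_cost

    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0.0 if seq_a[i - 1] == seq_b[j - 1] else substitution_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + mismatch,  # match or substitute
                dist[i - 1][j] + gap_cost,      # gap inserted in seq_b
                dist[i][j - 1] + gap_cost,      # gap inserted in seq_a
            )
    return dist[-1][-1]


def pairwise_distance_matrix(sequences, **costs):
    """Symmetric matrix of global alignment distances for all sequence pairs."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = global_alignment_distance(sequences[i], sequences[j], **costs)
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical diaries: one token per activity episode.
diaries = [
    ["sleep", "eat", "travel", "work", "travel", "eat", "sleep"],
    ["sleep", "eat", "travel", "work", "eat", "sleep"],
    ["sleep", "eat", "home", "home", "eat", "sleep"],
]

distances = pairwise_distance_matrix(diaries)
for row in distances:
    print(row)
```

A matrix of this kind is the usual input to a clustering step that groups individuals with similar daily patterns. Whether episode duration should be encoded, for example by repeating a token once per time slot, is a separate modelling choice that this sketch leaves open.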
CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to downweight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.
Harvey A.S. and Wilson W.C. 1998, Evolution of daily activity patterns: a study of the Halifax panel survey, paper presented at Thematic Group 1, Time-Use, World Congress of Sociology (in conjunction with Association, International Association for Time Use Research) University of Quebec, Montreal, July 26-August 1, 1998.
Abbott A. 1999 the review paper. citation to come