A User-Friendly, Object-Oriented Multi-Media Mail Filterer

Authors:
  • Siliconglen Ltd

A USER-FRIENDLY, OBJECT-ORIENTED
MULTI-MEDIA MAIL FILTERER
CRAIG COCKBURN
A report submitted in partial fulfilment of the requirements of
Napier University for the degree of Master of Science in
Large Software Systems Development.
Department of Computer Studies
February 1994
Craig Cockburn, M.Sc. report, 1994 1
Abstract
The increasing use of electronic mail and the great diversity of materials that are sent
via electronic mail has resulted in people having problems managing the volume of mail
and identifying important messages in their mailboxes. Until recently, electronic mail
consisted of plain text. However, with proposed new standards it is now possible to
send and receive sound, graphics, compound messages and many other types of mail.
These new formats are likely to pose new problems to the user who wants to handle
mail efficiently.
This M.Sc. report describes the research, design, implementation and future
development issues for an innovative prototype mail filterer. An Object Oriented (OO)
Analysis and Design method that was used to implement the filterer is also presented.
The filterer is designed to be user-friendly and to handle electronic mail messages that
conform to the draft standard for Multi-Media mail written by Nathaniel Borenstein
and Ned Freed. This draft standard, called Multi-purpose Internet Mail Extensions, will
be referred to in this document by its usual abbreviation, MIME.
The filterer implemented is believed to be the first user-friendly mail filterer built
specifically for MIME format mail.
Abstract
Acknowledgements
1. Introduction and background
  1.1. Aims
2. Research
  2.1. Project History
  2.2. Prior knowledge
  2.3. Filterer research
  2.4. Existing Filterers
    2.4.1. Diagram of a mail system with filterer
    2.4.2. Text based mail filterers
    2.4.3. GUI based mail filterers
  2.5. MIME
  2.6. Development environment
3. System specification
  3.1. User requirements
  3.2. System requirements
4. Analysis and Design
  4.1. Research of Analysis and Design Methods
  4.2. Outcome of Analysis and Design Research
  4.3. Analysis
    4.3.1. Analysis diagram
  4.4. Design
    4.4.1. Overview design model
    4.4.2. Table of correspondences between design meta-classes
    4.4.3. Design model for problem domain and interface classes
    4.4.4. Design model for application interface classes
  4.5. HCI aspects
5. Implementation
  5.1. HCI aspects
    5.1.1. Examples showing user interface
  5.2. Coding issues
    5.2.1. Use of wxWindows demonstration programs
    5.2.2. Discussion of code
  5.3. C++ issues
6. Discussion
  6.1. Major problems encountered
  6.2. Evaluation of achievement
  6.3. Major Outcomes
    6.3.1. Departmental outcomes
    6.3.2. Personal outcomes
  6.4. Future Directions
    6.4.1. Development of Application
    6.4.2. Filtering trends
    6.4.3. MIME and Industry trends
  6.5. Summary
7. References
Appendices
  A. User Documentation
    1. Installation
    2. First time use
    3. Subsequent use
  B. Description of program menu items
  C. Sample program output
  D. Project Schedule
  E. Survey responses
  F. Points from December review
  G. Project Diary
  H. Overview of wxWindows
  I. Overview of procmail
  J. Overview of Pine
  K. Sample MIME message
  L. MIME types
  M. Code Samples
Acknowledgements
I would like to thank the following people for their assistance with this project:
My project supervisor Alison Crerar for her support, guidance and comments and
particularly for the thorough reviews and extensive comments on this report.
Julian Smart at the Dept of Artificial Intelligence, University of Edinburgh for help
with using his wxWindows tool.
Neil Rumney for his help with keeping the link between the Department of Computer
Studies and the Internet running so that I could access discussion lists and for installing
a considerable amount of software on the Suns.
Finally, all the people on the Internet who have replied to my mail messages and postings
to news with questions related to this project.
1. Introduction and background
The author of the project has been interested in the subject of Electronic Mail for some
time. For several years, he has jointly run a world-wide language teaching mailing list
(GAELIC-L). In 1992 he founded The UK Internet List, the first guide to the UK's
Internet providers, and has been using electronic mail in various forms for over 10
years. At Digital, he used world-wide E-mail on a daily basis as part of his job. It was
through seeing the usefulness of this powerful business medium that the motivation
arose to research future mail developments for this project. The project was the
author's own proposal.
As the result of being on several mailing lists, ranging from the purely technical and
work-related to others of a more recreational nature, the author receives about 40 mail
messages a day. Many of these are automatically generated by the software controlling
the GAELIC-L list. There is a danger, when receiving many messages of differing
priority, that some important ones may get lost in the volume. In just the same way
that many managers have a secretary to prioritise and sort their incoming letters, so it
is useful to have a program handle incoming electronic mail in a similar way. The
author has previously used a simple mail filterer and found it very useful, but it was
easy to enter the wrong command and find that all the incoming mail was being
automatically deleted. This accidental error could prove very costly in business and so
it was decided to implement a system that was much more fail-safe.
Although mail filterers have been in use for several years (e.g. Deliver [ER3], ELM-
Filter [ER4]), none have addressed issues concerned with the draft MIME standard. It
is likely that MIME will become a full standard by the end of 1994. MIME covers
more than just Multi-Media, however: it can also be used to send and receive structured
messages and foreign character sets.
Research into filterers also revealed that there were no known free filterers that
have a user friendly interface. To address this issue, the filterer implemented for this
project was designed to be easy to use. Two important considerations were that the
filterer should provide simple access to the most frequently used actions which users
require, whilst allowing more specialised users access to other, less frequently used
facilities.
Procmail (see Appendix I) was identified as a tool capable of supporting filtering based
on the draft MIME standard. This tool was chosen because it was recommended as
being very powerful, it was free, it runs on UNIX and is not tied to a particular mail
tool.
When the author started to research MIME based filterers, there was very little
information available. Indeed, one of the developers of a MIME compliant mail tool
wondered what the usefulness of such a tool would be. The idea of designing a filterer
with MIME in mind seemed to be novel. The author is fairly sure that when the project
started there were no other user-friendly free MIME filterers.
The project was undertaken by the author and was assisted by staff in the Department
of Computer Studies. In the early stages of the project, assistance was also provided
by the Computing and Communications department of the University of Washington in
Seattle, who developed the MIME based mail tool, Pine (see Appendix J for more
information). Pine was used to investigate Multi-media mail issues, including header
contents and message structure.
Various people on the world-wide electronic network, the Internet, assisted with the
project. Two discussion lists were particularly useful. One list was for the wxWindows
product by Julian Smart at Edinburgh University (see Appendix H) which was chosen
as the development environment, and the other for the procmail product by Stephen R.
van den Berg at RWTH-Aachen, Germany (see Appendix I) to understand how the
mail filtering worked. Access to these lists has proved invaluable, particularly as the
author introduced both wxWindows and procmail to the Department during the
project. This meant that there was no one in the Department who knew anything about
the products, and so support from experienced users elsewhere was essential. The
author also pioneered student access to usenet newsgroups and became the first
student at Napier University to access usenet. Access to the object oriented usenet
newsgroups has also proved a valuable source of information.
1.1. Aims
The following were identified as aims of the project:
i. To research traditional mail filtering.
ii. To research mail tools which support MIME.
iii. To use a MIME based mail tool and a traditional mail filterer to implement a new
prototype filterer capable of supporting MIME.
iv. To ensure the prototype implemented meets the needs of users, including
functionality and usability.
v. To apply the skills learnt during the M.Sc. to a large Software Development
project and to research and solve problems associated with implementing Object
Orientation.
2. Research
Much of the research for the project was carried out by non-conventional means. The
primary research was carried out by identifying key usenet newsgroups to ask
questions in, such as comp.object for Object Oriented (OO) related questions,
comp.lang.c++ for C++ related questions and comp.mail.mime for questions related to
MIME. By asking questions in these forums and having electronic discussions, it was
possible to research issues by locating relevant papers, files, key individuals and
research establishments. Recommended papers not available on-line were obtained by
inter-library loan.
To carry out research on subjects that are very new, it was apparent that using libraries
alone was not going to yield enough current information. Therefore access to usenet
newsgroups had to be obtained to ask questions, learn about announcements and
participate in discussions. As Napier University was not projected to have Internet
access available until Spring 94, facilities at Edinburgh University were identified as a
means of conducting this on-line research. The author became the first student in
Napier University to have access to such resources.
2.1. Project History
The project has undergone several major revisions before being accepted in its present
form. In October, the proposal was to port the MIME-based mailer Pine to a windows
environment, to make it easier to use. However, porting Pine proved to be too big and
difficult a problem, particularly as the underlying Pine code was still rapidly changing.
Four Pine update releases were issued during the development of the present project.
The Pine developers at the University of Washington in Seattle suggested that a filterer
for Pine would be a useful self-contained addition to the product. Having used filterers
before, the author could see the benefits of such a system and so agreed to take this on
as the project. This proposal was put to the Department and accepted as a project on
22nd October 1993, giving a total of 14 weeks for the project to be completed. As
time was so limited, it was crucial to thoroughly plan the project and identify
milestones at the outset. To ensure that these milestones were achieved, a schedule
was written, and is detailed in Appendix D. A copy of this schedule was given to
Alison Crerar, the project supervisor, so that she could monitor progress.
2.2. Prior knowledge
Although the author had used E-mail for some time before joining the course, this was
exclusively on VAX/VMS machines. When the author started the LSSD course, he
had not used any of the following systems before: PCs, UNIX, C++, Borland's
Resource Workshop, procmail, wxWindows or Microsoft's Word for Windows.
Although the author had previously used mail filterers, he had never heard of MIME
before - this was first encountered while researching for an option taken earlier on in
the M.Sc. In addition, the author had not met with Object Oriented methods previously
and only had a basic knowledge of traditional analysis and design. Virtually every
module on the course, and in particular the research project, has been undertaken with
no prior knowledge at all. Many of the resources used on the project (wxWindows,
procmail, MIME and Pine) were introduced by the author to the Department and so
there was no local access to any Departmental experts on these topics; all help had to
be sought via the external network.
As a result of coming across so many new subjects during this project, many of which
were not formally taught as part of the LSSD degree, the author has had to carry out a
great deal of personal research to identify issues, relevant papers and software.
Although the author wrote part of a windows application as part of the Post Graduate
Diploma part of the course, much of the windows programming was handled by other
team members. To overcome the considerable learning curve for so many different
aspects, a development environment was chosen which would allow the rapid creation
of a windows application without having to become an expert in Windows
programming first.
2.3. Filterer research
As Napier University is not fully on the Internet yet, many staff and students at Napier
University are not aware of the benefits or indeed pitfalls of full Internet connection.
For example, the volume of mail for most users within Napier University is
manageable. The problem of excessive volume and managing large quantities of mail
however becomes apparent when messages from outside the University are considered
too, and when additional messages arrive as the result of postings to news, then the
volume increases still further. These overloading problems are mainly due to the very
large volume of users, interest groups and discussion lists on the Internet. Conservative
estimates of Internet usage [ER7]¹ put the amount of Internet traffic at about 60Mb a
day of news. With the rapid growth of the Internet (9% a month), and its continual
development [Press93] the volume of material on the Internet is not only growing
rapidly, but is becoming more complex too. A few years ago, the information on the
Internet was simple text, whereas today it is possible to send sounds and graphics by
electronic mail, with ease.
Recently, the Internet has been getting considerable publicity and with the recent
publication of many books on the Internet it seems that the Internet is going to become
much more a way of life. E-mail addresses are becoming more common on business
cards and even non-computing publications are including articles about the Internet
and mentioning it on their covers². With increasing numbers of non-computing users
connecting to the network [Pope94], it seems the 9% growth per month is set to
increase considerably.
There is clearly a major problem of "information overload" that is getting worse, and
tools are required to help people to manage the huge volume of material which is
available. Moreover, the problem is not confined to mail: usenet news carries about
60Mb of news a day, and the only support most newsreaders have for filtering
messages is simply to delete ones which match certain criteria. It is apparent however,
that news and mail technologies are merging and some mail readers such as Pine offer
an interface to usenet newsgroups, as the MIME standard also applies to news articles
on usenet. Clearly any lessons and applications based around E-mail filtering could also
apply to usenet news.
______________________________
¹ References of the form [ERnn] can be found in the "Electronic References" section.
² The National Information Infrastructure (NII) is a future development for the Internet
and was on the cover of Time, 12-Apr-93 and Newsweek, 31-May-93.
A filterer may help to correct what has been called the "productivity paradox"
[Constant93]: despite the huge investments in Information Technology,
the expected huge productivity improvements have not been realised. A possible
reason for the paradox is stated as "IT is being used to provide managers with a
greater sense of control, without actually improving their decision making". This idea
is also mentioned in [Brynjolfsson93] who states:
A valuable heuristic in 1960 might have been "get all readily available
information before making a decision." The same heuristic today would lead
to information overload and chaos. Indeed the rapid speedup enabled by IT
can create unanticipated bottlenecks at each human in the information
processing chain. More money spent on IT will not help until these bottlenecks
are addressed.
The aim of a filterer is ultimately to help people decide which messages to read, their
priority and how they should be presented. This structuring of mail will go a long way
to improving its manageability and helping people at all levels of an organisation to be
better informed.
A filterer works by processing electronic mail and performing some action based on
various properties of the mail message. This action could include forwarding the
message to other users, filing the message in a mail folder, running an application,
printing the message or attaching fields to the message so that it can be prioritised or
handled by an advanced mail reader. Probably the most popular use for a mail filterer
though is to "answer" mail while the user is away on holiday or a business trip. Such a
facility sends back a standard reply containing a message, usually explaining that the
person is away and when they are likely to return.
A mail filterer is activated by running a set of rules on the mail messages. The rules are
composed of two parts, the matching criteria and actions to perform. The matching
algorithm compares information about the mail message, usually stored in the mail's
header fields, to decide whether a rule should apply to the mail message. The action to
perform specifies what the mail filterer should do with the message if the match criteria
hold true for the message. Some simple filterers are batch jobs which run at regular
intervals and process messages in the user's mailbox. However, such filterers have the
dual drawbacks of activating even if there is no mail to process and of causing delays
in the processing of messages. Therefore, the filterers presented in this document are
all of the kind which do not run at regular intervals, but are instead called on-demand
when a message arrives for the user.
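The two-part rule structure described above (matching criteria plus an action) can be illustrated with a minimal Python sketch. It is not the filterer built for this project, nor procmail's rule syntax. The names Rule and first_matching_action and the action labels such as "file:gaelic" are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from email import message_from_string
from email.message import Message
from typing import List


@dataclass
class Rule:
    """One filtering rule: matching criteria plus the action to perform."""
    header: str   # header field to inspect, e.g. "Subject"
    pattern: str  # regular expression tested against the header value
    action: str   # hypothetical action label, e.g. "file:gaelic"


def first_matching_action(msg: Message, rules: List[Rule],
                          default: str = "deliver:inbox") -> str:
    """Return the action of the first rule whose criteria hold for msg."""
    for rule in rules:
        value = msg.get(rule.header, "")  # empty string if the header is absent
        if re.search(rule.pattern, value, re.IGNORECASE):
            return rule.action
    return default  # no rule matched, so deliver normally


if __name__ == "__main__":
    raw = "From: list@example.org\nSubject: GAELIC-L digest\n\nbody text\n"
    rules = [Rule("Subject", r"gaelic-l", "file:gaelic")]
    print(first_matching_action(message_from_string(raw), rules))
```

An on-demand filterer of the kind discussed here would run such a function once per arriving message, rather than at regular intervals over the whole mailbox.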
To fully understand how the functionality of a filterer can help users, a few examples
will be given to illustrate the uses of a filterer.
The author has been working at home using a modem to keep in touch with Napier
University and mailing lists for the last 8 weeks of this project, and is amongst the
growing numbers of teleworkers³. A filterer is of use to teleworkers, particularly the
self employed, who have to pay their own phone bill. By having a mail filterer it is
possible to restrict the messages that are downloaded. The author has used a "kill file",
a simple form of filtering mechanism, to prevent very large messages from being
downloaded as these would tie up the phone for a long time and run up a large bill.
Many people have been using kill files for news articles for a long time, but some mail
systems now support kill files too. The author also inadvertently sent a 120K MIME
message containing graphics to an experimental MIME based list, without realising
that the list was based at a site that paid 9p per kilobyte for international E-mail. With
a dozen people on the list, this message could easily have cost 9p * 12 * 120 = £129.60 to
forward on to all recipients if it had not been detected. Clearly, many telecommuters
would welcome a facility which could save this much money automatically.
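As a rough illustration of the size-based "kill file" idea and the cost calculation above, the following Python sketch tests a message size against a limit and estimates the forwarding cost from the figures quoted in the text (9p per kilobyte, a dozen recipients, a 120K message). The function names and the 50K limit are hypothetical and are not taken from any particular mail system.

```python
def should_skip_download(size_kb: float, limit_kb: float) -> bool:
    """Kill-file style test: True when the message is larger than the limit."""
    return size_kb > limit_kb


def forwarding_cost_pounds(size_kb: float, recipients: int,
                           pence_per_kb: float) -> float:
    """Cost in pounds of sending one message of size_kb to every recipient."""
    return size_kb * recipients * pence_per_kb / 100.0


if __name__ == "__main__":
    # Figures from the text: 120K message, 12 recipients, 9p per kilobyte.
    print(should_skip_download(120, limit_kb=50))  # 50K limit is an assumption
    print(f"£{forwarding_cost_pounds(120, 12, 9):.2f}")
```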
Instead of simply deleting long messages, a more intelligent mechanism might be to
automatically redirect long messages to an account based at work or to extract just the
text portion from a message containing sound and graphics. The facilities people have
at home may also be less sophisticated than the facilities available at a University or
computer company. It unnecessarily adds expense to a user's phone bill if they are
paying to down-load a multi-media message (often over 500K bytes) containing sound
and graphics to a machine that does not have the capability to display the graphics or
play the sound in the message.
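Extracting just the text portion of a multi-part message can be sketched with Python's standard email package, as below. This is offered only as an illustration of the idea. It is not how the project's filterer or procmail handles MIME, and the function name text_parts_only is hypothetical.

```python
from email import message_from_string
from email.message import Message


def text_parts_only(msg: Message) -> str:
    """Collect the text/plain parts of a message, ignoring sound, graphics etc."""
    texts = []
    for part in msg.walk():  # visits the container and every nested part
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "us-ascii"
            texts.append(payload.decode(charset, errors="replace"))
    return "\n".join(texts)


if __name__ == "__main__":
    raw = (
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="XYZ"\n'
        "\n"
        "--XYZ\n"
        "Content-Type: text/plain; charset=us-ascii\n"
        "\n"
        "Meeting at noon.\n"
        "--XYZ\n"
        "Content-Type: audio/basic\n"
        "Content-Transfer-Encoding: base64\n"
        "\n"
        "AAAA\n"
        "--XYZ--\n"
    )
    print(text_parts_only(message_from_string(raw)))
```

A filterer could forward only this text to a home account and leave the full message at the office, which avoids paying to download parts the home machine cannot display or play.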
Another possible use for people with computers at home is to use a filterer at the office
to forward mail which is likely to be non-work related onto their electronic address at
home. This saves time in the office sorting the work related from the non-work related
messages and allows non-work related messages to be replied to at leisure in personal
time at home. In addition, messages that the teleworker is likely to want to read at
home as well as in the office can easily be copied by a filterer so that work can
continue at home if necessary.
Experienced users of mail filterers seem to adopt strategies for managing their mail.
[Mackay89] reports three main strategies, namely:
i. Keep it all
ii. Move unimportant messages
iii. Move important messages.
The strategy (i) is not explained in the article, but is likely to include appending
characteristics to mail messages (e.g. additional headers) or printing out particular
messages. Neither of these actions has been implemented in this project, although
procmail itself is capable of doing this.
______________________________
³ Working at home accounted for 45% of all new jobs from 1987 to 1992.
[Source: Deloitte & Touche. Printed in: Atlanta Constitution, 2-Jan-94]
Strategy (ii) moves unessential messages out of the in-box and uses the in-box as a
store for unprocessed messages and things to do. Rules are used to identify low
priority messages and to move them into folders or delete them. This prevents low
priority messages from building up and cluttering a user's in-box.
Strategy (iii) involves writing rules to recognise high priority messages and moving
them to a "priority" folder. Many people classify mail that is addressed personally to
them (as opposed to mail received from a distribution list) as important. All these users
disciplined themselves to read the "priority" folder first. The article reports that one
user who was initially distrustful of mail filtering on incoming mail decided to create
two rules to manage all his mail. One rule identified all personally addressed mail and
moved it to a "priority" folder and another identified all mail related to a conference he
was running and this rule sent all these messages to the conference administrator. This
user said that this strategy was "very, very useful", that his mail before was "out of
control" and that using these two rules "changed my life".
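The two-rule strategy reported for this user might look roughly like the Python sketch below. The order in which the rules are tried, the way conference mail is recognised (a keyword in the Subject field) and all of the names and addresses are assumptions made for illustration. The article itself does not give these details.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses
from typing import Tuple


def two_rule_strategy(msg: Message, my_address: str,
                      conference_keyword: str,
                      administrator: str) -> Tuple[str, str]:
    """Return (action, target) for a message under the two-rule strategy."""
    # Rule 1 (assumed to be tried first): conference mail goes to the administrator.
    subject = msg.get("Subject", "")
    if conference_keyword.lower() in subject.lower():
        return ("forward", administrator)
    # Rule 2: mail addressed personally to the user goes to the priority folder.
    recipients = [addr.lower()
                  for _name, addr in getaddresses(msg.get_all("To", []))]
    if my_address.lower() in recipients:
        return ("file", "priority")
    # Anything else (e.g. distribution-list mail) stays in the in-box.
    return ("leave", "inbox")


if __name__ == "__main__":
    personal = message_from_string(
        "To: Some User <user@example.org>\nSubject: Lunch?\n\nhello\n")
    print(two_rule_strategy(personal, "user@example.org",
                            "conference", "admin@example.org"))
```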
2.4. Existing Filterers
To find out what tools are available, questions were asked on a number of usenet
newsgroups and other on-line resources were located and searched. By querying the
HCI Bibliography Database [ER8] under the keywords "rules" and "filter", a useful
study was found on how people use mail filterers, the kind and number of rules they
define and the way they are grouped [Mackay89]. This paper mentions that moving
and deleting mail messages are the two most common uses for a filterer and that the
majority of users prefer to have rules execute automatically as mail arrives, rather than
applying rules retroactively to mail which has already been delivered.
In October 1993, the author spent two days in Seattle, Washington visiting the Pine
development team. During discussions, it became apparent that a mail filterer could be
considered as a completely different program to the Pine mailer itself. Rather than
making the filterer integral, a filterer could simply run between the incoming mail
daemon which accepts the messages on the system and the Pine mailer which allows
the users to read and reply to mail. With many filtering actions, it is possible that the
mail might never reach Pine at all. This would be the case if the message being filtered
was being forwarded to another system, piped into an application or deleted. It seemed
sensible therefore not to consider the filterer to be a part of Pine, but to write it as a
separate program which would generate output not only for Pine but for other mail
systems too. To understand the possible disadvantages of not integrating the filterer
with Pine, two mail systems which do have integrated filterers [ER5] [Mackay89] were
studied.
Comparing separate filterers with integrated filterers yielded the following key
points:
Advantages of a combined mail reader and filterer
i) The mail reader and the filterer are likely to have a consistent and integrated user
interface. This is the case with Lotus's cc:Mail V2.0 [ER5] which was evaluated
and which has filtering capabilities. The method of entering rules has very much
the same "look and feel" as the rest of the mail system.
ii) Many users think of rules whilst reading mail [Mackay89]. An integrated filterer is
therefore likely to be quicker and more convenient for users to access and add rules
to when reading messages.
iii) The mail reader is able to view "deleted" messages. In The Information Lens
[Malone87a] [Dix93], messages that have been deleted are still accessible by the
user, but they are presented with a line drawn through them. This allows the user
to verify that the correct messages are being deleted and allows "deleted" messages
to be retrieved if necessary. If the filterer is independent, then messages are nearly
always deleted before reaching the user's mailbox and the user never sees them.
iv) An integrated filterer is able to access the functionality in the mail reader that deals
with handling structured MIME messages and decomposing them. This allows
much more sophisticated handling (e.g. deleting part of a message).
Advantages of having the mail reader and filterer as separate applications
i) A separate filterer is not dependent on one mail system. It can work with
whatever mail tool the user prefers to read their mail with (e.g. Pine, ream, Elm).
ii) It is easier to build a separate mail filterer than it is to write an integrated
application, due to the increased dependencies of the latter. This is particularly true
when the mail reader has bugs and is still being developed.
iii) A separate filterer is usually more easily ported to other platforms, as it is smaller.
iv) A separate filterer means that the filtering can take place on a system that the mail
reader might not run on.
v) A separate filterer does not require integration with the existing code. This can be
particularly problematic if the two applications are written in different languages
and under different paradigms (e.g. OO and non-OO). Research conducted in this
area showed that many people who had tried to implement OO code on top of
existing non-OO code had to abandon the project and rewrite everything in OO.
This is a particular issue for this project as Pine was written in C and is non-OO.
Interestingly, the author does not know the language which procmail is written in
as it has never been necessary to know this.
vi) A separate filterer does not require the same learning time to write, as it is not
necessary to learn how to interface with the mail reading code.
Having considered the advantages and disadvantages, it was concluded that an integrated
mail filterer can potentially offer more functionality, but that this benefit was
outweighed by the need to minimise the difficulties of integrating an OO application
with a non-OO application and of having to understand the internals of a volatile mail
reader. The decision was therefore made to write the filterer as a separate application.
2.4.1. Diagram of a mail system with filterer
The following diagram shows where a separate mail filterer fits into the mail system
and demonstrates some of its capabilities.
[Diagram with two panels: "Conventional mail system" and "Mail system with filter".
Labelled elements include Network, Gateway, mail file, Mail tool, user, Filter, files,
Applications, Email and other users.]
The decision to separate the filterer from Pine resulted in the project proposal detailed
in section 1.1 being revised on the 8th of November. This revision was to make a
separate mail filter the key deliverable of the project.
Writing a separate filterer from Pine also meant being no longer dependent on the Pine
development team, and this lessened risks associated with the project. From the
logistical point of view, there was also insufficient Internet access from Napier
University to write extensions to Pine. Pine is approximately 4 Mb and the only ways
of receiving updates to the code would either have been to have them posted on disk,
sent by FTPmail or copied to Edinburgh University and downloaded to several floppy
disks from there. Having used FTPmail once to install Pine for evaluation purposes, it
was evident that this was an unacceptable method of working, as the mail gateway
split the files into dozens of mail messages that had to be manually edited and
reassembled in the correct order. It was therefore decided that it was not feasible to
use FTPmail again. Both of the disk options would also have been very time consuming.
A number of filterers exist already, and these have been categorised here by whether
they have a Graphical User Interface (GUI) or whether they are dependent on the user
editing a text file and writing the rules manually. A useful review of GUI E-mail
packages was published in [Collin94]. Three of the five products reviewed in this
article have integrated filterers.
2.4.2. Text based mail filterers
i. Deliver [ER3]. This is a tool that the author has used previously, but which only
runs on VAX/VMS. Filtering is via Boolean logic and is limited to the fields in
VAX/VMS mail, namely "from", "to" and "subject".
ii. Procmail (see Appendix I). This is the tool chosen for the project for reasons
mentioned earlier.
iii. Elm Filter [ER4]. This filterer is based on the popular Elm mail tool. It only
allows filtering on the "to", "from" and "subject" fields, the size of the
message and the message content. There is no facility to understand MIME.
2.4.3. GUI based mail filterers
i. The Information Lens [Malone87a], [Mackay89]. This provides a forms based
interface and allows messages to be filtered based on Boolean logic match criteria
on the following header fields: "from", "to", "cc" and "subject", as well as the
message contents.
ii. The Andrew Message System (AMS) [ER6]. Little information was obtained
about this system, however it is of particular note as it is the only mail tool known
to be capable of splitting messages and processing components of a composite
MIME mail message. If a user sends a message with non-text components to a
non-AMS recipient, AMS can cut out the non-text and replace it with a message
indicating what was removed. This intelligent processing of MIME messages could
prove very useful for instance to direct just the text components of a MIME
message to a home based mail address.
iii. BeyondMail 2.0 and BeyondRules. [Collin94] states "BeyondMail broke all the
rules in its first version. It included what everyone needed but didn't realise they
did- intelligent, programmable rules". This product has by far the most
sophisticated rule handling of any application studied, and includes system wide
rules that apply to all users, rules that become active or inactive over time and
reminder rules. The filtering interface to the program, BeyondRules, is the only
known commercial filtering application. BeyondRules offers filtering capability to
Microsoft Mail 3.2, which does not have rules. Mail filtering in BeyondMail is cited
in [Lindholm93] as one of the three key goals necessary for a commercially
successful product.
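The AMS behaviour described in item ii, removing non-text components and leaving a
note in their place, can be sketched with a present-day MIME library. The following
Python fragment is an illustration only and is not taken from AMS or from this project.
It assumes Python's standard email package; the function name text_only_copy and the
wording of the notice are hypothetical.

```python
from email import message_from_string
from email.message import EmailMessage


def text_only_copy(raw: str) -> EmailMessage:
    """Return a copy of a MIME message that keeps only its text parts."""
    original = message_from_string(raw)
    kept, removed = [], []
    for part in original.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "us-ascii"
            payload = part.get_payload(decode=True)
            kept.append(payload.decode(charset, errors="replace"))
        else:
            removed.append(part.get_content_type())

    reduced = EmailMessage()
    for name in ("From", "To", "Subject"):
        if original[name] is not None:
            reduced[name] = original[name]
    body = "\n".join(kept)
    if removed:
        # Hypothetical notice text; what AMS actually writes is not known here.
        body += "\n[Removed non-text parts: " + ", ".join(removed) + "]\n"
    reduced.set_content(body)
    return reduced
```

Applied to a MULTIPART/MIXED message holding one text part and one image, the result
is a plain text message whose body ends with a note naming the removed image type.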
A key point learnt from conducting research into filterers is that it is quickly becoming
the norm for commercial mail tools to have either a filterer built in, or have an optional
add-on filterer. Mail tools that do not have filterers are now regarded as commercially
inferior.
2.5. MIME
MIME is a new protocol designed to handle many of the shortcomings with existing
mail, which is limited to sending US-ASCII characters. The MIME standard was
officially published in June 1992 as RFC1341 [ER1].
For an example of a MIME encoded mail message, see Appendix K. The most
important header in the message is the one that explains the Content-Type. The
example in the appendix is MULTIPART/MIXED. This means that the type is
MULTIPART (meaning the message has more than one component) and the types of
those components (the subtypes of the message) are MIXED. For a full list of MIME
types and subtypes, see Appendix L.
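The type and subtype split can be seen by parsing a small message. The sketch below
uses Python's standard email package, which is an assumption of this illustration and
not a tool used in the project. The sample message is invented for the purpose and is
not the one in Appendix K; the function name describe is hypothetical.

```python
from email import message_from_string

# Invented two-part message: a text note followed by a sound clip.
SAMPLE = (
    "From: sender@example.org\n"
    "To: reader@example.org\n"
    "Subject: Two-part demonstration\n"
    "MIME-Version: 1.0\n"
    'Content-Type: MULTIPART/MIXED; boundary="frontier"\n'
    "\n"
    "--frontier\n"
    "Content-Type: TEXT/PLAIN; charset=US-ASCII\n"
    "\n"
    "A short covering note.\n"
    "--frontier\n"
    "Content-Type: AUDIO/BASIC\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAAA\n"
    "--frontier--\n"
)


def describe(raw):
    """Return the top-level (type, subtype) and the types of the component parts."""
    msg = message_from_string(raw)
    top_level = (msg.get_content_maintype(), msg.get_content_subtype())
    parts = [p.get_content_type() for p in msg.walk() if not p.is_multipart()]
    return top_level, parts


print(describe(SAMPLE))
```

MIME type names are not case sensitive, and this library reports them in lower case, so
the top-level type is returned as multipart with subtype mixed.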
Although the implementation of MIME does not require a great leap in technology,
there have been several failed attempts at introducing a MIME standard. MIME itself
was designed for graceful inclusion in the Internet protocol suite. It does this by not
building an entirely new protocol but by adding features to RFC822 mail. This is
called a bottom up approach by Nathaniel Borenstein, the author of the draft MIME
standard. Earlier experimental models for Multi-media mail (RFC767, RFC759) took a
different approach by building a new transport and document format that did not
behave compatibly with the existing mail protocol (RFC822) and would have required
disposing of a popular and working model for mail. Ensuring backwards compatibility
may not always result in the most academically pleasing implementation (compare the
evolutionary C++ with Smalltalk), however an evolutionary approach is usually more
likely to result in a working implementation and one that is widely accepted.
It was considered important to investigate mail tools that supported MIME so that one
could be installed in the Department to generate MIME mail messages and to provide
a means of testing out a MIME based filterer.
The main source of information for MIME based mail systems was the
comp.mail.mime Frequently Asked Questions list (FAQ) [ER9]. This is nearly 80Kb of
extremely useful information and has a section on commercial and freely available
MIME products, including mail systems and news readers. Another excellent source
was an M.Sc. report by Magnus Hedberg which covers Multi-media mail systems and
Asynchronous Computer Supported Cooperative work [Magnus92].
From the research carried out into MIME mailer systems, the Pine system was chosen
as there was no other system detailed which matched the hardware available in the
Department and which was easy to use. The installation kits for the PC version and
UNIX versions of Pine were obtained and Pine was installed on a PC in the
Department. Pine was then configured to send messages to the Departmental Suns and
the external network to evaluate the system more fully. Once Neil Rumney had
installed the required software on the Sun, the UNIX version was also configured.
Word of Pine's capabilities and user-friendly interface soon began to spread and it has
now become the preferred mail tool of many people in the Department, particularly
those who are new to electronic mail. The author now uses Pine on a daily basis, and
has configured it to send and receive accented characters such as á, è, í, etc. This has
made conversing in languages other than English much more convenient as accented
characters can be sent and received through the mail to other MIME users without
corruption.
2.6. Development environment
Four commercial packages for development under Microsoft Windows were available
in the Department. These were:
i. Visual Basic
ii. Visual C++
iii. Borland C++
iv. Asymmetrix Toolbook
Visual Basic and Asymmetrix Toolbook are ideal for rapid prototyping of user
interfaces, however neither supports Object Orientation and so neither was considered
suitable for a project of this size. It seemed that for the purposes of the project, there
was no difference in the suitability of Borland C++ or Visual C++ and so Borland C++
was chosen as the author had used Borland C++ to develop two applications earlier in
the course.
Rejecting Visual Basic and Asymmetrix Toolbook caused a major problem: the user
interface is a major part of the program, and Borland C++ does not provide a good
environment in which to develop a complex user interface quickly. Therefore, an
environment had to be found which would allow rapid development of the user
interface in the 14 weeks available to develop and document the project.
A request was posted to the Internet newsgroups comp.object and comp.lang.c++ to
see if there were any suitable applications that could be used with Borland to assist
with rapid prototyping. Although these groups are distributed world-wide, the only
reply received was from Julian Smart at Edinburgh University AI Department, and
mentioned his wxWindows tool. wxWindows is a multi-platform C++ development
environment designed to help users write portable code and to shield them from many of
the difficulties of windows programming. wxWindows also allows HCI based
applications to be developed quickly as it provides its own well-documented class
library. Although there would be a learning curve associated with wxWindows, the
demonstration programs showed that it could provide the required functionality and it
seemed an ideal choice. wxWindows also had the benefit of being compatible with the
Department's existing mail platform (UNIX) and the Department's proposed mail
platform (PCs). A further benefit is that plans are underway to port Pine from MS-DOS
to Microsoft Windows. With a MIME compatible filterer already available under
Windows, the filterer could integrate well with Pine when the Windows version of Pine
is complete.
A decision was therefore made that wxWindows would be used as the development
environment. However, there were two problems with wxWindows. The first was that
no one in the Department knew anything about it, so the author was dependent on the
Internet for help. The second was that wxWindows was developed for use with
Turbo C++. This resulted in the author becoming the first person to port
wxWindows for use with Borland C++.
It is certain that without wxWindows, the tool would not have been developed as
quickly as it has been.
3. System specification
To specify the system that was to be built, it was necessary to investigate the
requirements of potential users. This would ensure that the aim of designing a system
that provided the most useful functionality was fulfilled. From these user requirements,
the software and hardware required for such a system was then identified.
3.1. User requirements
Clearly with the information overload mentioned in section 2.3, some tools are
required to manage the volume of mail. Time is valuable, and the more a computer can
assist people to do their job, the more productive they are likely to be. However, it was
first necessary to establish the functionality that a mail processing tool should provide.
No one in the Department is known to use a mail filterer at the moment. Procmail has
not been announced to the Department and no other filterers in the Department are
known to exist. This is, however, likely to change soon, when Napier University
becomes fully connected to the Internet and more and more people start to use the
Internet and participate in newsgroups. Asking questions in newsgroups, and posting
notes to newsgroups can generate many replies via E-mail, some of a low-priority
personal nature and others of a high priority work related nature. It has been
interesting to note that while this project was underway, a great increase in usage of E-
mail within the Department and use of the Internet has taken place.
To fulfil the aim of giving users access to the most frequently used filtering actions, the
author conducted a survey in the Department to determine which features would be
the most useful to implement. Three replies were received from members of the
Department who receive large amounts of mail, and their replies are summarised in
Appendix E. The action "move message to a folder" was rated as priority 5 (the
highest) by all respondents. Other actions that were rated highly include auto-replying to
messages and forwarding messages to other users. Whilst the results for this survey
were useful, it was felt that as the people surveyed had never used a filterer before, it
was necessary to carry out additional research.
To anticipate the needs of Departmental users once they had become experienced
users, surveys of actual usage by experienced filterer users were sought. One
survey [Mackay89] agreed with the responses from the Department and showed that
moving messages was a popular feature. 57% of rules involve filing a message to a
folder based on the recipient field. The next most common rule in [Mackay89] was
deleting messages, with 28% of the rules in the sample being used to delete. This
contrasts with the Departmental responses which indicated that there was very little
demand for automatically deleting mail. This difference can perhaps be explained by
the fact that if people have never used a filterer before then they are probably reluctant
to trust a filterer to delete messages, whereas if they use a filterer frequently they can
more readily "trust" the filterer to delete genuinely unimportant messages.
In [Mackay89], it is also reported that a sample of 13 users generated 190 rules
between them, and that each user had between 2 and 35 rules. With as many as 35 rules,
it seemed
sensible to consider whether groups of rules would be related to a particular
"scenario", such as working in the office, being away on a business trip or being on
holiday. If a user was on holiday, then they might want to invoke a set of rules that
deleted all mail from certain distribution lists, forwarded certain work related mails to
other team members and auto-replied to others. However, if a user was working in the
office then they might want a different set of rules. It was therefore decided to group
rules into scenarios that could contain sets of rules. These scenarios could help users to
manage their rules more effectively, particularly since sets of rules could be quickly
activated and deactivated simply by activating or deactivating the scenario.
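The grouping of rules into scenarios can be sketched as a small data structure. The
following Python fragment is an illustration under stated assumptions and is not the
project's implementation (which was written in C++); the class names, field names and
example rule names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    name: str


@dataclass
class Scenario:
    name: str
    rules: List[Rule] = field(default_factory=list)
    active: bool = False  # switching this flag enables or disables every rule inside


def active_rules(scenarios: List[Scenario]) -> List[str]:
    """Names of the rules in force, in scenario order and then rule order."""
    return [rule.name for s in scenarios if s.active for rule in s.rules]


office = Scenario("Office", [Rule("file project mail")], active=True)
holiday = Scenario(
    "Holiday",
    [Rule("delete list mail"), Rule("forward work mail"), Rule("auto-reply")],
)
```

Setting office.active to False and holiday.active to True swaps the whole rule set in
one step, which is the convenience the scenario concept is intended to provide.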
As a result of research carried out into mail filterers, particularly Deliver [ER3], it was
realised that the order of rules and scenarios was important. Consider a scenario with
two rules, one which saves messages from a mailing list into a file and another rule
which auto-replies to mail while you are away on holiday. If you receive a message
which matches both rules, then it is likely that you would not want an auto-reply going
back to the mailing list and possibly hundreds of users on that list. Therefore it is
important to save the message to a file and to stop processing at that point so that the
auto-reply function is never called. This means that the "save to file" rule must come
before the "auto-reply" rule. Because rule ordering matters in this way, it was
important to implement a means of examining the order of rules and scenarios and of
changing the order if necessary.
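The effect of rule order can be shown with a minimal sketch in which processing stops
at the first rule that matches. This is a simplified model written in Python for
illustration, not the behaviour of any particular filterer; the names first_match,
save_list and auto_reply and the example addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class Rule:
    name: str
    matches: Callable[[Message], bool]


def first_match(message: Message, rules: List[Rule]) -> str:
    """Return the name of the first rule that matches; later rules never run."""
    for rule in rules:
        if rule.matches(message):
            return rule.name
    return "deliver to inbox"


save_list = Rule("save to file", lambda m: "some-list" in m.get("To", ""))
auto_reply = Rule("auto-reply", lambda m: True)  # on holiday: answer everything

list_mail = {"From": "member@example.org", "To": "some-list@example.org"}

print(first_match(list_mail, [save_list, auto_reply]))
print(first_match(list_mail, [auto_reply, save_list]))
```

With the "save to file" rule first, mailing list traffic is filed and the auto-reply is
never triggered. With the order reversed, the same message would be answered
automatically, which is the outcome the ordering is meant to prevent.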
3.2. System requirements
The hardware required to implement the project was identified as being:
i. The Departmental Suns, as this is the hardware platform which people
currently use for electronic mail
ii. A PC connected to the Suns via PC-NFS. The filterer was not implemented on
a Sun as there were indications that people would rather use their PCs for mail.
However, the filterer implemented is designed to be easily ported to a Sun.
The aim was to write a system which people could run from PCs to process mail on
the Suns as it arrives. However if the Department does not migrate to PCs for mail,
then the application could still be used on the Suns under X without major
modification.
The software requirements for this project were as follows:
i. The C++ development environment. Borland C++ V3.1 and wxWindows were
chosen for reasons mentioned in section 2.6.
ii. An underlying system capable of filtering mail. This was identified as procmail.
iii. Windows 3.1 running on a PC.
iv. A tool capable of generating icons and bitmaps for use with the application. This is
Borland's Resource Workshop V1.02.
v. A system capable of receiving mail messages from various sources. The
Departmental Suns were used to fulfil this.
vi. Network software to allow the PC where the application is running to modify files
on the Sun, where the mail is processed. PC-NFS was used to achieve this.
vii. Word for Windows V2.0 to generate this report.
viii. OMTool V2.0 from the GE Research and Development Center to generate the
diagrams for the object models.
ix. Paint Shop Pro V1.02a by JASC Inc, Minnetonka, Minnesota. This was used to
transfer the models in OMTool into this document by screen capture.
4. Analysis and Design
4.1. Research of Analysis and Design Methods
The Rumbaugh method [Rumbaugh91] was the main OO analysis and design method
taught on the LSSD course. This method seems to be one of the major methods in use,
and offers a useful way of decomposing the problem into an object model, a dynamic
model and a functional model. However, this approach has a major weakness in that it
results in three models that are difficult to integrate. Michael Blaha, one of the co-
authors of [Rumbaugh91] sent the current author the following message regarding
integrating the models:
Our integration of the three models in the book is incomplete and
unsatisfactory. We openly acknowledge this. We have improved integration in
our tutorials and will incorporate the new ideas in our future books. Our
current understanding of integration of the models is much better than in our
book, but quite honestly still has much room for improvement.
This weakness in the Rumbaugh method has resulted in several papers attempting to
integrate the Object, Dynamic and Functional models [ER10], [Hayes91],
[D'Souza93]. However, the author believes that building three models and only
mentioning objects in one of them is not the most suitable method for OO Analysis and
Design. In the case of this project, the Rumbaugh method was also not considered
suitable as the project can be viewed as a form of translator, taking input from the user
and translating it into procmail code. This means that once the objects have been
identified, much of the work is functional in nature, transferring data from one form to
another. Rumbaugh, however, places the functional model last in the analysis and
design phases and so places low importance on this model.
The process of combining three different models is noted in [Monarchi92], where it is
classified as the "combinative approach" of Object Oriented Analysis and Design. This
article also mentions the "pure" approach, which the author of this report favours. The
Booch method [Booch94] is an example of the "pure" OO approach, where instead of
developing three different models, the object is kept central to the analysis and design
and aspects of that object are added to the object at various stages.
4.2. Outcome of Analysis and Design Research
The key steps were taken from the Booch method and integrated with the steps
detailed in [Henderson-Sellers93]. These steps were then classified to produce the
following three stage method for OO Analysis & Design:
A) Object identification stage
i. Identify candidate object classes (usually nouns in problem domain)
ii. Identify class attributes
iii. Identify operations provided by and required of each class
B) Object relationship stage
i. Establish associations between objects. These can be established by running
through interaction scenarios or "use cases" [D'Souza93]
ii. Identify aggregations between objects
iii. Identify generalisations between objects
Repeat stages A and B twice. On the first repetition add meta-classes representing the
solution domain and their relationships with the problem domain. These meta-classes
are shown in section 4.4.1. On the second repetition add candidate objects in the meta-
classes.
C) Object definition stage
i. Evaluate the outcomes of stages A) and B) and redesign and optimise as necessary.
Any recurring patterns in the code should be identified and considered for
optimisation. For details of the kinds of patterns seen in Object Oriented code, see
[Coad92].
ii. Implement the object classes
As a rough guide, steps A) and B) define the code that will appear in the header file,
and step C) defines the code in the main code file associated with the class (usually .cc
or .cpp).
It was this stepwise development of objects which was used to perform the analysis
and design for this project. This method always keeps objects central and so does not
result in three unrelated models, as the Rumbaugh method does. It does, however,
make it more difficult to optimise the implementation from the functional or dynamic
view, although this is not considered to be a major problem.
A method consists of two parts, the "process" and the "notation". The author considers
the notation for Rumbaugh to be powerful and concise for the Object Model, whereas
Booch's method has been strong on process and this is taken further in the latest book
describing Booch's method [Booch94] which has approximately twice the space
devoted to describing the process as the previous edition. As the author is unfamiliar
with using the Booch notation [ER11], and as there are no tools in the Department
that support the Booch notation, the Rumbaugh notation has been used to develop the
graphical models illustrated in this report. As a full object model detailing every object,
attribute and operation would be far too complex to draw, "layering"
[Henderson-Sellers93] has been used instead to show the model at a comparatively high
level of abstraction.
4.3. Analysis
The identification of objects to model for analysis was accomplished quickly. The
analysis was achieved by a "bottom-up" approach to identify the objects and their
relationships. The final program is a means of generating a procmail script, and so it
was the elements of a procmail script that were taken as the first candidates for
analysis. A procmail script consists of match criteria and actions and so these were
identified as the first objects in the analysis phase. The rule object was introduced next
as a means of grouping together the match criteria and actions. Objects were then
considered which would have an association with rules. These objects were then
considered for associations with other objects and the process repeated until a stable
model was reached. This resulted in the first iteration of stages A and B detailed in
section 4.2 producing the following diagram representing the object model for the
problem domain.
4.3.1. Analysis diagram
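As a rough illustration of the objects identified in section 4.3, the sketch below
models match criteria, an action and the rule that groups them, and turns a rule into
a procmail recipe. It is written in Python for brevity, whereas the project itself was
written in C++. The class names, method names and the set of action kinds are
assumptions made for this example and do not describe the project's actual classes or
its generated output. The recipe layout (a ":0" line, "*" condition lines, then one
action line) follows standard procmail syntax, with locking flags omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchCriterion:
    header: str  # e.g. "Subject"
    text: str    # inserted verbatim; procmail treats it as a regular expression

    def to_condition(self) -> str:
        return f"* ^{self.header}:.*{self.text}"


@dataclass
class Action:
    kind: str         # "save", "forward" or "delete" (hypothetical set of kinds)
    target: str = ""  # folder name or mail address; unused for "delete"

    def to_line(self) -> str:
        if self.kind == "save":
            return self.target
        if self.kind == "forward":
            return "! " + self.target
        if self.kind == "delete":
            return "/dev/null"
        raise ValueError("unknown action kind: " + self.kind)


@dataclass
class Rule:
    criteria: List[MatchCriterion]
    action: Action

    def to_recipe(self) -> str:
        lines = [":0"]
        lines += [c.to_condition() for c in self.criteria]
        lines.append(self.action.to_line())
        return "\n".join(lines)


rule = Rule([MatchCriterion("Subject", "napier")], Action("save", "napier-folder"))
print(rule.to_recipe())
```

Keeping the translation inside the problem domain objects means the user interface only
has to fill in the objects, and the text of the script is produced in one place.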
4.4. Design
Whilst Analysis is concerned with modelling objects in the problem domain, design is
concerned with modelling objects in the solution domain. This solution domain
includes all the semantic classes in the problem domain, but also adds classes dealing
with "interface", "application" and "base/utility" [Monarchi92]. Design can also cause
the classes identified during analysis to be redesigned or extended if abstractions are
found.
Using the first iteration of the Analysis and Design method described in section 4.2, the
meta-classes for "application", "interface" and "base" classes were added. The outcome
of this stage is shown in the following diagram.
4.4.1. Overview design model
[Diagram: Overview Design Model. Labelled elements: Myform.h, Procapp.h and the
wxWindows library (list handling etc.).]
The advantage of separating the problem domain classes from Interface and Utility
classes is that the Interface and Utility classes are likely to be much more dependent on
the hardware or software platform used for the final implementation. As a result, by
having implementation specific code in these classes, it becomes easier to swap these
classes for other classes if the resulting application is to be ported to a different
hardware or software platform. The implementation of the problem domain classes
should be independent of the hardware or software platform chosen for the solution.
An interesting outcome of the overview design phase was that a 1-1-1 mapping
emerged between many of the classes in the problem domain, the classes in the user
interface domain and menu items and tool bar items in the application interface
domain, as shown in the following table.
4.4.2. Table of correspondences between design meta-classes
Problem domain class    User Interface domain class    Menu item
Rule                    EditRule                       Rule
Scenario                EditScenario                   Scenario
Rule Match Criteria     EditRuleMat                    Called via "rule" menu
Rule Actions            EditRuleAct                    Called via "rule" menu
Scenarios               EditScenarios                  Scenario "list" option
Rules                   EditRules                      Rule "list" option
The second iteration of the OO A&D method detailed in section 4.2 introduced abstract
classes such as "MyForm" to allow generic handling of forms for all input. This final
iteration produced the two detailed design diagrams that follow in sections 4.4.3 and
4.4.4.
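4.4.3. Design model for problem domain and interface classes
[Diagram: design model for problem domain and interface classes. Annotations on the
diagram: "Supplied by wxWindows library"; "All classes inherited from this point were
defined by the author in procapp.h"; "Interface boundary"; "Problem domain classes".]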
4.4.4. Design model for application interface classes
[Diagram showing implementation of the design model from the application
implementation perspective, with "Activates" associations between classes.]
Classes defined by the author are "MyFrame" and "Ribbon", and these are
declared in the application interface header, procapp.h.
wxFrame, wxToolBarTool and wxToolBar were inherited from the wxWindows library.
4.5. HCI aspects
Initial design of the user interface was achieved by testing various layout scenarios for
the forms and menus on paper. These ideas formed the basis for the initial forms to be
built into the first software prototype.
To design an effective user interface, Alison Crerar, a lecturer in Human Computer
Interfaces (HCI), was asked to take part in an expert walk-through of the first
prototype on 16th December. This first prototype was available for evaluation
approximately two weeks after coding started. It was considered important to obtain
comments from a potential user as soon as possible to ensure that the interface was
well designed from the user's point of view and provided suitable functionality.
The key points arising from this review are detailed in Appendix F together with the
resolutions reached. Although Alison is highly knowledgeable about User Interfaces,
she had never come across a mail filterer before and so she was representative of
future users in the Department.
The original prototype was designed with the standard "File" menu on the menu bar.
This was to try and present a consistent user interface for users who were already
familiar with Windows applications. However, as a result of the review, this menu was
changed to say "Scenario", as most of the functions on the menu were related to
scenarios. Later in development, the few functions on this menu not connected with
scenarios were moved onto the "Export/Quit" menu.
To fulfil the user requirements detailed in section 3.1, it had to be easy for the users to
access the most frequently used match criteria and the most frequently used actions
based on those criteria. As a result, predefined forms were designed for the main fields
used for matching. These are "to", "from" and "subject". In these cases, the user only
has to enter the text in a box next to the required field. These commonly used fields
were placed near the top of the form so that the user notices them first. Next, a field
was introduced which allows the users some flexibility. This is an open field that allows
the users to specify the header item to match on and the header text. Although these
are generally separated by a colon, there is no need for the user to remember to do this
as one is automatically inserted by the application. This isolates the user from