An expert system for general symbol recognition

Pattern Recognition 33 (2000) 1975-1988
Maher Ahmed, Rabab Kreidieh Ward*
Wilfrid Laurier University, Physics and Computer Department, Waterloo, Ont, N2L 3C5 Canada
University of British Columbia, Electrical and Computer Engineering Department, Vancouver, BC, V6T 1Z4 Canada
Received 9 July 1998; accepted 27 August 1999
Abstract
An expert system for analysis and recognition of general symbols is introduced. The system uses the structural pattern
recognition technique for modeling symbols by a set of straight lines referred to as segments. The system rotates, scales
and thins the symbol, then extracts the symbol strokes. Each stroke is transferred into segments (straight lines). The
system is shown to be able to map similar styles of the symbol to the same representation. When the system had some
stored models for each symbol (an average of 97 models/symbol), the rejection rate was 16.1% and the recognition rate
was 83.9% of which 95% was recognized correctly. The system is tested by 5726 handwritten characters from the Center
of Excellence for Document Analysis and Recognition (CEDAR) database. The system is capable of learning new
symbols by simply adding their models to the system knowledge base. © 2000 Pattern Recognition Society. Published
by Elsevier Science Ltd. All rights reserved.
Keywords: Expert systems; OCR; Structural; Pattern recognition; Models; Mapping
1. Introduction
Handwritten symbol recognition has received, and continues to receive, much attention
from researchers. A wealth of papers has been published and many studies carried out
on this subject. In this paper we introduce a method designed to recognize any
handwritten symbol, including numerals, Latin, Arabic, or Chinese characters, or electrical
symbols. It thus differs from the vast majority of these studies, each of which was
meant for a specific application, e.g. Arabic numerals or Latin letters.
We therefore do not survey many such studies here but give a quick overview of
different pattern recognition techniques before introducing our method.
Pattern recognition techniques fall under the following main categories: statistical pattern
recognition, structural pattern recognition, and hybrid systems. In most statistical pattern
recognition systems, the symbol features are first defined, then the decision boundaries in the
feature space are determined. Many such features are described by Trier et al. [1]. Many methods
based on statistical pattern recognition techniques have been developed and proved to be
effective. One example is the method suggested by Cao et al. [2], which uses the local histograms
in each of the different zones (grids) as features. Another statistical pattern recognition
system, described by H. Al-Yosef and S. Udpa [3], uses the normalized ratios of moments that
describe the vertical and horizontal projections as symbol features.
* Corresponding author. Tel.: +1-604-822-6894; fax: +1-604-822-9013.
E-mail addresses: [email protected] (M. Ahmed), [email protected] (R.K. Ward).
Artificial neural networks (ANNs) can be effectively used to extract and also to cluster the
features of symbols. A successful ANN for handwritten English character recognition, the
Neocognitron [4], is an attempt to imitate the human visual system for pattern recognition.
Statistical pattern recognition systems (including artificial neural networks) need a large
number of training samples to extract or cluster the features. If the features are not selected
(or extracted) properly, the symbol feature space regions will overlap and many symbols in the
overlapping regions will be misrecognized. Generally, each statistical pattern recognition system
is developed for a certain application and cannot learn new symbols easily. As an example, if
a system uses an artificial neural network for extracting English character features and it is
later required to add one or more new symbols representing, for example, Arabic numerals, then
the system must be retrained with the whole training data (old and new samples). In addition,
once the new symbols are added, the system may not function properly, especially if one or more
of the new symbol features are close to old symbol features. In this case, the artificial neural
network itself may need to be redesigned, e.g. by changing the number of neurons, the number of
layers, etc.
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
PII: S0031-3203(99)00191-0
In structural pattern recognition, complex patterns are
divided into simpler ones, and the relations between the
sub-patterns are described. Structural pattern recognition methods generally use rules and grammars to describe symbols as shown by Fu [5].
Four expert systems for recognizing unconstrained handwritten numerals were developed by
Suen et al. [6]. Different primitives are used. The first system obtains the skeleton by
thinning, then determines the end points and the junction points. The recognition rate is
86.05% and the substitution rate is 2.25%.
The second expert system performs the following steps:
1. thinning,
2. tracing the skeleton,
3. approximating the curves by lines,
4. size normalization and arrangement of the line segments into primitives.
The recognition rate is 93.1% and the substitution rate is 2.95%.
The third expert system uses different structural features. Examples of these features are
the locations of the end points and the junction points in each region of the digit, the width
of the stroke and the symmetry of the character. Over 14 other features are used. The system
recognition rate is 92.25% and the substitution rate is 2.15%.
The fourth expert system relies on features extracted
from the contours. The recognition rate is 93.9% and the
substitution rate is 1.6%.
Another structural method, also designed for numeral recognition, is described by Jianming
and Hong [7]. In this method, four kinds of feature points are defined: points for bends, points
for curves, terminal points, and intersections. A primitive is defined as a skeleton segment
which starts and ends at feature points. A feature code of 11 elements is used to describe the
local information of the numeral and a five-element vector is used to describe the global
information.
The recognition rate is 97.86% using prototype matching and 97.29% using neural networks,
and the substitution rate is 2.71%.
Another system that integrates the use of neural networks and expert systems is reported by
Amine et al. [8]. The original image is transformed into a binary image using a parallel
thinning algorithm. Then, the skeleton is traced. Primitives such as straight lines, curves and
loops are extracted. Finally, a five-layer artificial neural network is used for classification.
The neural network is trained with 2000 isolated Arabic characters and tested with another 1000
characters. The recognition rate is 92%.
A new technique integrating a neural network and a knowledge-based system for image
recognition was introduced by Kim et al. [9]. The system was designed to recognize handwritten
digits (numerals). The model is capable of inductive learning from example data and logical
inference from the rule base. The system has the ability to justify its answer (e.g., why
a numeral is recognized as 5 and not 6 or 9). This system is well suited for applications where
only partial knowledge is available. For a typical set of 200 handwritten digits (numerals), the
recognition rate ranges from 69.5% to 81.5%, depending on the number of rules (80 rules).
A system by Burel et al. [10] uses a combination of statistical and morphological features
and has proved to be successful in handwritten digit recognition. There are 20 regions. Hence,
20 features are defined as ratios of areas. The morphological features include cavities (west,
east, north, south and center) and the hole. A five-layer perceptron neural network is used for
classification. The neural network is trained with 1414 digits and the evaluation is performed
on 1175 digits. The recognition rate is 93.6%.
A structural feature extraction technique for English character recognition is described by
Starzyk et al. [11]. An effective thinning method is used that relies on 3×3 template windows
as well as pixels outside the windows. These windows are applied sequentially and can be
implemented by a hardware circuit. After thinning the image, critical points are marked; then
the segments are determined. These segments are scaled, matched, and classified.
Many character recognizers are based on mathematical formalisms that minimize a measure of
misclassification. Artificial neural networks employ mathematical minimization techniques and
are used in commercial OCR systems. Recognition rates for machine-printed characters can reach
over 99%, but handwritten character recognition rates are typically lower because every person
writes differently, as reported by S. Lam in the Center of Excellence for Document Analysis and
Recognition, State University of New York at Buffalo.
In this paper, we introduce a new structural pattern recognition system. While successful
existing pattern recognition methods are each designed for a specific application, e.g. Chinese
symbol or Arabic numeral recognition, our system is general in that it is designed to recognize
any symbol, whether it is an English character, an Arabic numeral, or an electrical symbol.
Our system has the following characteristics. It does not use features but describes the
symbol by different models. It does not use syntactic grammars but rather rules to rotate,
scale, and thin symbols, and other rules to model them. One advantage of our system is that it
is capable of recognition even when few model samples are used. It measures the similarities as
well as the differences between the representation of the symbol to be recognized and those of
the stored models. Our system can justify its answer. Another advantage is that if a new symbol
is added, our system can easily be updated by simply adding the models of the new symbol to the
system knowledge base.
Our system uses different stages for analyzing and recognizing symbols. Each stage reduces
the symbol details until the symbol is described by one or more representations which contain
the knowledge most necessary to enable the symbol recognition. Not many models of a symbol are
required to achieve a high recognition rate. These models are stored in the system knowledge
base.
A symbol can be handwritten in an enormous number of different ways in the same space. Our
system maps this enormous number of different representations of a handwritten symbol to
a smaller number of possible representations. We partition the symbol into a number of square
zones. There are $2^{W \times L}$ possible symbols for an image that is W pixels wide and
L pixels long. We map this large number ($2^{W \times L}$) of symbols to just
$16^N = 2^{4N}$ symbols, where N is the (integer) number of zones used to model the symbol.
This is achieved by allowing a limited number of sub-symbols in a zone. As an example, consider
a symbol which has one zone (N = 1) with L = 10 and W = 10 pixels. For this zone there exist
16 possible symbols, as shown in Fig. 1. As another example, assume that a symbol is written
within a 30×20 pixel area where each zone is 10×10 pixels. This area has 6 zones. Since each
zone has 16 possible sub-symbols, there are $16^6$ allowable models instead of $2^{600}$
possible ones. As an example, consider the letter 'B'. One possible model for the letter 'B' is
[inline figure: a six-zone model of the letter 'B'].
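The counting argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `count_representations` and its parameters are hypothetical, and it assumes square zones that tile the image exactly.

```python
def count_representations(width, length, zone_size=10, shapes_per_zone=16):
    """Return (number of zones, raw bi-level images, zone-based models).

    Assumes the image is tiled exactly by square zones of zone_size pixels.
    """
    zones = (width // zone_size) * (length // zone_size)   # N
    raw_images = 2 ** (width * length)                     # 2^(W x L)
    zone_models = shapes_per_zone ** zones                 # 16^N
    return zones, raw_images, zone_models


# The 30x20 pixel example from the text, with 10x10 pixel zones.
zones, raw_images, zone_models = count_representations(width=20, length=30)
print(zones)                              # 6
print(zone_models == 16 ** 6)             # True
print(raw_images == 2 ** 600)             # True
print(zone_models == 2 ** (4 * zones))    # True, since 16^N = 2^(4N)
```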
2. The developed system
Expert systems are the most suitable tools for implementing structural pattern recognition
techniques. Complex patterns are described in terms of simpler ones and simple patterns are
described by sub-patterns. Expert systems help solve difficult pattern recognition problems.
More rules and human experience can be added easily using rule-based systems. Hence, the system
performance can be improved without rebuilding or redesigning the system.
An expert system for 2-D symbol analysis and recognition is developed here. As mentioned
above, our system is not application-specific and can be used for the recognition of any
bi-level symbol. This includes the recognition of mathematical symbols, electrical circuit
symbols, and characters such as English, Chinese, Arabic, etc.
The basic idea behind our system is that a symbol can be constructed from smaller
components. Here, four basic components are used. These components are the horizontal
line '-', the vertical line '|', the 45° diagonal line '/', and the 135° diagonal line '\'.
These components are from now on referred to as "segments" or symbol primitives. The system
transforms the symbol into a set of these segments. All these segments have the same length.
The segments representing a symbol are partitioned into groups or zones, where each zone
belongs to one of 15 different possibilities. Fig. 1 shows the 16 different possible zones
(including the empty zone).
[Inline figures: the top two zones, the middle zones, and the bottom zones of the model of the letter 'B'.]
Our system stores each such model for each symbol as a vector in the system knowledge base.
For example, the vector representation of the model of the letter 'B' shown above is formed of
the shapes of the above six zones.
When a symbol is presented to the system for recognition, the system performs the steps
shown in Fig. 2. After rotating, scaling, and thinning, the system decomposes the symbol into
strokes. Each stroke is decomposed into short straight lines (segments). The segments are
grouped into zones. A vector is used to store the zone shapes and hence to represent the
symbol. The distance between two vectors enables the system to measure the differences and
similarities between the two symbol representations.
Fig. 1. The 16 possible images in a zone of 10×10 pixels.
Fig. 2. The structural pattern recognition steps.
2.1. The rotation stage
The objective of the first stage is to adjust tilted symbols. Symbols may be drawn tilted in
different directions. This problem is solved by rotating the symbols. The central point
(defined below) is considered as the origin of the symbol. The symbol direction is considered
to be zero when its principal axis (defined below) is vertical (or at a multiple of 20° to the
vertical direction). Each symbol is rotated (by an angle ≤ 20°) around its central point until
the symbol direction, i.e. its principal axis, is vertical or at a multiple of 20° to the
vertical axis. In our system, the tilt angle is assumed not to exceed 20°.
(a) The central point. The physical concept of the center of mass refers to that point in an
object that has the same amount of matter around it in any direction. If the origin for the
entire image is considered to be the pixel at location (0,0), then the center of mass of an
object C is $(C_x, C_y)$, where $C_x = \frac{1}{n}\sum x$ and $C_y = \frac{1}{n}\sum y$, for an
n×n image.
(b) The principal axis. The principal axis of a bi-level object is a line passing through the
object's center of mass having a minimum total distance from all pixels belonging to the object.
2.2. The scaling algorithm
There are two kinds of documents. The first kind includes documents that have symbols of
similar sizes (such as documents of English text, typed or printed by a machine). The second
kind includes documents that have symbols of different sizes (such as documents containing
graphs, mathematical symbols, electrical circuit elements and handwritten text).
If the document has symbols of similar sizes, our system scales all these symbols so that
they have the same dimensions. The new dimensions may be selected by the user or defaulted to
32 pixels in width and 48 pixels in length. The new dimensions are usually smaller than the
dimensions of the symbols in the original document. For a document that has symbols of
different sizes, the new dimensions will also be the same for all scaled symbols, but these
dimensions will be determined by the program. Hence, the system automatically chooses certain
values for the symbol length and width. Examples of these values are 32, 48, 64, 72, or 96
pixels for each of the width and the length.
Scaling is useful for mapping some of the different handwritten styles of the same symbol to
the same representation. A straight line and lines with some deformation [inline figures not
recoverable] will have the same representation after being scaled down to a certain length.
Consequently, this can be useful if these lines represent the same symbol.
Scaling down a symbol usually deletes some of its details but keeps the global shape. This
tends to increase the opportunity of matching the considered symbol with the stored models.
However, when the information in the small details is relevant, scaling down the symbol
decreases the opportunity of correct matching. Hence, the choice of the new symbol dimensions
is important.
Assume that the document contains symbols of different sizes. After rotation, the system
scales the symbols to some predefined size (as mentioned before). The scaling algorithm is
described by the following steps:
1. For each symbol, create a new empty (generally smaller) image with the predefined dimension
values as mentioned above (Fig. 3).
2. Find $(S_r, S_c)$, the row and column scale values, as the ratios of the original to the
new symbol dimensions.
3. Use a sliding window of size $(S_r, S_c)$ on the original symbol to determine the gray level
of each pixel in the new symbol image. The window used to transform Fig. 3(a) to (b) has
$(S_r, S_c) = (2.28, 1.31)$; the old dimensions are (73, 42) and the new ones are (32, 32).
4. The new gray level value of a pixel in the scaled version is equal to the area of all ink
(black) inside the window in the original version, normalized by the area of the window; thus
it is a value between 0 and 1 inclusive.
5. Since the value obtained so far falls between 0 and 1, we use a threshold value 'Th' to
determine whether the final value of the pixel will be black or white. If the calculated gray
level ≥ Th, the pixel will be black. Possible values of the threshold are in the range [0, 1].
Experiments show that Th = 0.2 is a good choice. Results when using threshold values of 0, 0.2,
0.4, 0.6, 0.8 and 1.0 are shown in Fig. 4. If the original symbol is thin, using a high
threshold value may cause discontinuity. However, if the original symbol is not thin, all
values in [0, 1] can be used. Increasing the threshold value tends to thin the pattern but
preserves its shape.
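The five scaling steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `overlap_lengths` and `scale_symbol` are hypothetical, the image is taken to be a list of rows with 1 for ink and 0 for background, and the sliding window is assumed to cover fractional parts of the original pixels in proportion to the overlapped area.

```python
import math


def overlap_lengths(old_size, new_size):
    """Along one axis, list for every new pixel the (old_index, covered_length)
    pairs of the original pixels that its window overlaps (assumed geometry)."""
    scale = old_size / new_size              # S_r or S_c from step 2
    spans = []
    for i in range(new_size):
        start, end = i * scale, (i + 1) * scale
        first = int(math.floor(start))
        last = min(old_size, int(math.ceil(end)))
        spans.append([(k, min(end, k + 1) - max(start, k))
                      for k in range(first, last)])
    return spans, scale


def scale_symbol(image, new_rows, new_cols, th=0.2):
    """Scale a bi-level image (1 = ink) to new_rows x new_cols (steps 1-5)."""
    row_spans, s_r = overlap_lengths(len(image), new_rows)
    col_spans, s_c = overlap_lengths(len(image[0]), new_cols)
    window_area = s_r * s_c
    scaled = []                              # step 1: the new image
    for rs in row_spans:
        out_row = []
        for cs in col_spans:
            # Step 4: ink area inside the window, normalised to [0, 1].
            ink = sum(image[r][c] * h * w for r, h in rs for c, w in cs)
            gray = ink / window_area
            # Step 5: the pixel is black when the gray level is >= Th.
            out_row.append(1 if gray >= th else 0)
        scaled.append(out_row)
    return scaled


image = [[1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(scale_symbol(image, 2, 2, th=0.2))   # [[1, 0], [0, 0]]
print(scale_symbol(image, 2, 2, th=0.5))   # [[0, 0], [0, 0]]
```

In the small example a single ink pixel fills a quarter of its 2×2 window, so it survives the threshold Th = 0.2 but not Th = 0.5, which matches the remark that a high threshold can break thin symbols.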
2.3. The thinning algorithm
Thinning is the process of reducing the thickness of
each line of patterns to just a single pixel. A comprehensive survey of thinning algorithms is described by Lam
et al. [12].
Fig. 3. A symbol after scaling to different dimensions and using a scaling threshold = 0.2: (a) the original symbol, (b) 32×32 size,
(c) 48×32 size, (d) 24×16 size, (e) 8×8 size, and (f) 8×4 size.
Fig. 4. Scaling a symbol with different scaling threshold values and using 32×32 size: (a) the original symbol, (b) threshold = 0,
(c) threshold = 0.2, (d) threshold = 0.4, (e) threshold = 0.6, (f) threshold = 0.8, and (g) threshold = 1.0.
A fast parallel algorithm for thinning digital patterns, which decides whether or not a pixel
can be eroded, is described by Zhang and Suen [13]. In our system we first apply this
algorithm. This is then followed by one sequential pass of our fast knowledge-based system [14]
so as to reduce the number of pixels in the final thinned pattern.
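For readers unfamiliar with the first of these two passes, the following is a minimal sketch of the standard two-subiteration Zhang and Suen thinning scheme. It assumes a list-of-rows image with 1 for ink and a one-pixel background border; the function name `zhang_suen_thin` is hypothetical, and the additional knowledge-based pass [14] used by our system is not shown.

```python
def zhang_suen_thin(image):
    """Thin a bi-level image (1 = ink) with the two-subiteration scheme.

    Assumes the outermost rows and columns contain no ink.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)                         # number of ink neighbours
                    a = sum(1 for i in range(8)        # 0 -> 1 transitions
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Pixels are erased together after the whole image is examined.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# A bar three pixels thick, surrounded by a background border.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
for row in zhang_suen_thin(bar):
    print("".join("#" if v else "." for v in row))
```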
2.4. Symbols representation
The objective of this stage is to represent a symbol by
a vector. The details required to describe the symbol are
proportional to the vector length. Similar symbols will be
closer to each other in the N-dimensional space.
After thinning, each symbol is described by its strokes. A stroke is a sequence of pixels
that starts and ends at "special" pixels. A "special" pixel is a pixel that has three or more
neighbors or exactly one neighbor. Each "special" pixel is marked by the letter 'A' as shown in
Fig. 6(e). After marking all the special pixels, the symbol is decomposed into strokes. Then,
each of the unmarked pixels in every stroke is marked by a code (an integer that takes a value
between 1 and 8 inclusive) as shown in Fig. 5. The code indicates the direction between the
present pixel and the following one. The marked pixels are shown in Fig. 6(f).
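The marking of special pixels can be sketched as below. This is an illustration only: the names `ink_neighbours` and `special_pixels` are hypothetical, and an 8-connected neighborhood is assumed because the text does not state which neighborhood is used. The direction codes of Fig. 5 are not reproduced here, since the figure's numbering is not given in the text.

```python
def ink_neighbours(image, r, c):
    """Count ink pixels among the (assumed) eight neighbours of (r, c)."""
    rows, cols = len(image), len(image[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc]:
                count += 1
    return count


def special_pixels(image):
    """Return the set of (row, col) positions marked 'A' in the text:
    ink pixels with exactly one neighbour or with three or more."""
    marked = set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if not value:
                continue
            n = ink_neighbours(image, r, c)
            if n == 1 or n >= 3:
                marked.add((r, c))
    return marked


# A thinned horizontal stroke: only its two end pixels are special.
line = [[1, 1, 1, 1, 1]]
print(sorted(special_pixels(line)))   # [(0, 0), (0, 4)]
```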
After isolating the symbol strokes and marking each pixel by a code as shown in Fig. 6(f),
a set of rules is applied to each symbol stroke to transfer it into segments (straight lines)
of fixed equal lengths. Depending on the chosen length for the segments, a stroke may be
converted to one or more segments or may vanish. These segments are the symbol primitives.
A vector of these primitives will be used to represent the symbol. Using many, and thus
shorter, segments to represent a stroke preserves all details of the stroke, including
unnecessary ones, while using long segments may delete important details. However, if very
short segments are used then many more stored models will be needed. The effect of using
different segment lengths (2-9 pixels) is shown in Fig. 7. The rules used to represent a stroke
by segments are described in the following section.
Fig. 5. The different direction possibilities.
2.5. The system mapping rules
In this section, we will see how different styles of a symbol, for example different
handwritten styles of the letter 'A' [inline figures not recoverable], are all mapped to the
same representation.
Fig. 6. Different processing of a symbol: (a) a symbol, (b) after rotation, (c) after scaling, (d) after thinning, (e) after marking the special
pixels, and (f) after isolating the symbol strokes and coding each pixel.
We will show how we transfer each stroke into one or more segments. All of these segments
will have the same length T. In forming a segment, consecutive pixels are processed one at
a time. To form a segment of length T pixels, T or more pixels are processed and converted to
a segment (except at the end of the stroke, where T/2 pixels are sufficient).
Fig. 7. Different mappings of a symbol using different segment lengths of: (a) 1 pixel, (b) 2 pixels, (c) 3 pixels, (d) 4 pixels, (e) 5 pixels,
(f) 6 pixels, (g) 7 pixels, (h) 8 pixels, and (i) 9 pixels.
A set of rules that map a symbol stroke into segments of fixed length T is developed. To find
the direction of the line segment, the consecutive pixels of each stroke are analyzed pixel by
pixel. For each pixel, a certain probability is assigned to each of the eight possible
directions. Then, the sum of the probabilities over the previous pixels and the present one is
calculated for each of the eight possible directions. If one of these sums reaches the
threshold value 'T' (the segment length), then a complete segment is formed and the segment
direction is taken to be the direction with the largest probability sum. Next, the same
analysis continues for the remaining pixels in the stroke so as to form new line segments, and
so on. For the remaining pixels between the end of the last formed segment and the end of the
stroke, if any of the eight sums reaches half the value of the threshold, a segment is formed;
otherwise, no segment is formed. The following algorithm describes the mapping rules in more
detail:
1. Define the segment length (integer), which is the same as the threshold 'T'.
2. Find the code of every pixel of the stroke as shown in Fig. 5.
3. Consider the first pixel after the special pixel of the stroke.
4. For i = 1 to 8, initialize s[i], the sum for each of the eight directions that the segment
can take, to 0.0.
5. Assign a probability = 1 to the direction of the considered pixel, a probability = 0.7 to
its two adjacent directions, a probability = 0.49 to the next two adjacent directions, and
a probability = 0 to the remaining directions. For example, if the code of the pixel is 5 then
p[5] = 1, p[4] = 0.7, p[3] = 0.49, p[6] = 0.7, p[7] = 0.49, and p[2] = p[1] = p[8] = 0. These
probability values are determined experimentally.
6. For i = 1 to 8, update s[i] by adding the new p[i] to the old s[i].
7. Calculate the maximum of s[1], ..., s[8]; if it is greater than or equal to the threshold
'T', then form a segment of length T pixels in that direction. The length of the segment formed
is always equal to T pixels. Then, consider the next pixel of the stroke, and go to step 4.
8. If the end of the stroke is not reached, consider the next pixel of the stroke and go to
step 5 (note that each of the s[i] is still less than 'T').
9. At this point, the end of the stroke is reached. Determine the maximum of s[1] through
s[8]; if it is greater than or equal to half of the threshold 'T', then form a segment in that
direction.
10. End of the stroke analysis. The result is a new stroke consisting of segments (straight
lines), each of which has one of the eight directions.
11. After processing all the symbol strokes and finding all the segments, the segments are
grouped so that each group corresponds to one of the 16 zones shown in Fig. 1.
12. The horizontal and vertical number of zones for the symbol is determined and the zone
shape is determined as one of the 16 zones shown in Fig. 1.
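Steps 1 to 10 above can be sketched for a single stroke as follows. This is a hedged illustration rather than the system's code: the function names are hypothetical, the stroke is assumed to be given as a list of direction codes (1 to 8), and the eight codes are assumed to be arranged cyclically so that code 8 is adjacent to code 1, which the text does not state explicitly. The zone grouping of steps 11 and 12 is not covered.

```python
def direction_probabilities(code):
    """Step 5: probabilities p[1..8] for one pixel code (index 0 is unused)."""
    p = [0.0] * 9
    for offset, weight in ((0, 1.0), (1, 0.7), (-1, 0.7), (2, 0.49), (-2, 0.49)):
        # Assumed cyclic layout of the eight directions.
        p[(code - 1 + offset) % 8 + 1] = weight
    return p


def stroke_to_segments(codes, t):
    """Map a stroke (list of pixel codes) to segment directions.

    t is the segment length and threshold 'T' (a positive integer).
    """
    segments = []
    s = [0.0] * 9                                   # step 4
    for code in codes:                              # steps 3 and 8
        p = direction_probabilities(code)           # step 5
        for i in range(1, 9):                       # step 6
            s[i] += p[i]
        best = max(range(1, 9), key=lambda i: s[i])
        if s[best] >= t:                            # step 7
            segments.append(best)
            s = [0.0] * 9                           # back to step 4
    best = max(range(1, 9), key=lambda i: s[i])
    if s[best] >= t / 2:                            # step 9
        segments.append(best)
    return segments                                 # step 10


print(direction_probabilities(5)[1:])   # [0.0, 0.0, 0.49, 0.7, 1.0, 0.7, 0.49, 0.0]
print(stroke_to_segments([5] * 6, 3))   # [5, 5]
print(stroke_to_segments([5] * 7, 3))   # [5, 5]     (one leftover pixel is dropped)
print(stroke_to_segments([5] * 8, 3))   # [5, 5, 5]  (two leftover pixels reach T/2)
```

The last two calls show the end-of-stroke rule: a remainder shorter than T/2 vanishes, while a remainder of at least T/2 still produces a full segment.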
2.6. The system models and recognition
Models for each symbol are stored in the system. Each model is represented by a vector and
may have a different number of zones. The number of zones is determined by the prior choice of
the segment (straight line) length T. The vector of a model contains the number of vertical and
horizontal zones followed by a series of integers. Each of these integers (0-15) represents one
of the 16 possible segment images (shown in Fig. 1).
Our system can use two methods for constructing the models. In the first method, the system
extracts models from the symbols and stores the models in a database. In the second method, the
human designer constructs the different shapes representing a symbol by accessing the database
directly, i.e. for each fixed number of vertical and horizontal zones, the different possible
representations of the symbol are found. Then, these models are stored as vectors in the system
knowledge base. In our present implementation, we used the second method. The number of
vertical and horizontal zones ranged between 1×1 and 8×8 zones. The size of any zone in the
models is irrelevant, as we only need the shape of its content for later comparison with the
vector representing the symbol to be recognized. The symbol to be recognized is compared with
the stored models with the same number of vertical and horizontal zones.
In this work, as mentioned above, we used the second method for constructing the system
models. In our present implementation, an average of 97 models for each symbol was used
(Fig. 8).
Fig. 8. (a) Some models used by the system and the number of models used for each symbol, (b) models used by the system for the
letter A.
Ideally, the system's stored models should include all possible shapes for each symbol. The
length of the segment controls the level of detail in the description of the symbol. Using
a long segment length, and hence few zones (such as 2×2 or 2×3), results in fewer models for
a symbol but may not preserve some important symbol details. On the other hand, using short
segments (hence many zones, such as 4×5 or 5×5) to model this symbol will preserve the symbol
details but results in a large number of models.
Some system models for different symbols are shown in Fig. 8. These models include Latin
letters, digits 0-9, the electrical symbol for a diode, an Arabic digit, a Greek letter, Arabic
letters, and a general symbol [inline glyphs not recoverable].
Models that use a large number of zones (i.e. short segments) are suitable for representing
symbols that have small strokes, as in the case of most Arabic characters or electrical
symbols.
In our present implementation, for each input symbol to be recognized we use three
representations (vectors). Each vector corresponds to using a segment length T equal to 6, 10,
or 14 pixels. Depending on the segment length T used, each tested vector will have a certain
number of vertical and horizontal zones. Each vector is compared only to the stored model
vectors which have the same number of vertical and horizontal zones. The distance between the
input vector and the stored ones is calculated by comparing each zone in the input symbol
vector to the corresponding zone in the stored symbol vectors. Finally, the stored vector with
the minimum distance to the input vector is selected, as long as the minimum distance is less
than a certain "rejection" threshold. If the minimum distance is larger than the threshold then
the system rejects the symbol as unrecognizable (the symbol does not correspond to any of its
stored models). In case of a tie, the model with the larger number of zones (i.e. with more
details) is chosen.
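The matching procedure just described can be sketched as below. Everything named here is hypothetical: the tuple layout of the vectors, the function names, and above all `zone_distance`, which is only a stand-in (it counts differing bits of two zone codes) because this section does not define how the distance between two zone shapes is computed.

```python
def zone_distance(a, b):
    """Stand-in distance between two zone codes (0-15).

    Assumption for illustration only: each code is read as four on/off
    flags and the distance is the number of flags that differ.
    """
    return bin(a ^ b).count("1")


def vector_distance(u, v):
    """Sum of zone distances over corresponding zones."""
    return sum(zone_distance(a, b) for a, b in zip(u, v))


def recognize(input_vectors, models, rejection_threshold):
    """input_vectors: list of (rows, cols, zones), one per segment length T.
    models: list of (label, rows, cols, zones).
    Returns the best label, or None when the symbol is rejected."""
    best = None                          # (distance, -number_of_zones, label)
    for rows, cols, zones in input_vectors:
        for label, m_rows, m_cols, m_zones in models:
            if (m_rows, m_cols) != (rows, cols):
                continue                 # compare equal zone grids only
            cand = (vector_distance(zones, m_zones), -(rows * cols), label)
            # Smaller distance wins; on a tie, the larger number of zones wins.
            if best is None or cand[:2] < best[:2]:
                best = cand
    if best is None or best[0] >= rejection_threshold:
        return None                      # rejected as unrecognizable
    return best[2]


models = [("B", 3, 2, [3, 5, 3, 5, 3, 5]),
          ("I", 3, 2, [2, 0, 2, 0, 2, 0])]
symbol = [(3, 2, [3, 5, 3, 5, 3, 4])]
print(recognize(symbol, models, rejection_threshold=3))   # B
print(recognize(symbol, models, rejection_threshold=1))   # None
```

Lowering the rejection threshold in the second call turns the same near match into a rejection, which is the trade-off between rejection and misrecognition that the threshold controls.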
3. Results
As mentioned above, an average of 97 models were stored per symbol. The system was tested
with 5726 handwritten English characters and digits. When a rejection threshold value of 15 was
used, the rejection rate was 16.1% and the recognition rate was 83.9%, of which 95% was
recognized correctly. When a higher rejection threshold value of 100 was used, the rejection
rate was 0% and the recognition rate was 100%, of which 87.6% was recognized correctly. The
test (input) data consisted of 5726 handwritten bi-level English characters from the Center of
Excellence for Document Analysis and Recognition (CEDAR) database as well as another 120
symbols representing Arabic letters, Chinese characters, and mathematical and electrical
symbols.
A subset (101 symbols) of this database is shown in Fig. 9(a). After rotating the symbols,
some similar symbols become closer in shape, as shown in Fig. 9(b). Next, the system scales
down the symbols to a predefined size as shown in Fig. 9(c). As mentioned earlier, the user has
the option of selecting the size; however, here we select the option where the system
determines the predefined size (according to the original symbol size). Then the symbols are
thinned as shown in Fig. 9(d). Then, the special pixels are marked and the strokes of the
symbols are isolated. At this stage, the system transfers the strokes into segments, where all
segments have one fixed length T. The symbol representations using segment lengths equal to 6,
10, and 14 pixels are shown in Fig. 9(e), (f), and (g), respectively. Most symbols are
recognized by the system, as shown in Fig. 9(h), when a "rejection" distance threshold of 100
is used. Using a small "rejection" distance value, as shown in Fig. 9(i), increases the
fraction of recognized symbols that are classified correctly; however, it also increases the
number of symbols rejected by the system.
Generally, if a symbol is misrecognized but a high recognition rate is required, then more
models with a larger number of zones for that symbol should be added.
After studying the misrecognized symbols, we found that a symbol may be recognized incorrectly
for one of the following reasons:
1. The symbol is written in such a way that it closely resembles another symbol. In this case,
a human will also be confused about the meaning of the symbol and has to study the syntax or
the semantics to understand the symbol and select the closest meaningful symbol.
2. The symbol model is not in the system. This model can be added to the system.
3. The final models for different symbols may be similar if we use long segments (as for some
handwritten u's and v's). In this case, we should use short segments to keep as much detail as
required to differentiate between them so as to increase the recognition rate.

4. The system limitations and future work
Our system has the main advantage of its capability
of recognizing any symbol in any language. It can also
justify the answer. On the other hand, in our system,
the following characters have the same models: (c, C),
(f, F), (m, M), (u, U), (v, V), (k, K), (p, P), (s, S), (t, T),
(w, W), (x, X), (y, Y), and (z, Z); i.e., for these characters it is
currently not possible to determine whether a character is written
in lower or upper case. This problem can be solved by
comparing the size and location of the symbols to determine whether the symbol is in lower or upper case.
Also, since our system performs the recognition in different stages, for some English characters the scaling stage
can be used to find whether a character is in upper case
or lower case.
In our system, we also use the same models for the
following symbols: (1, l), (2, z), (5, s), (0, o, O), (q, 9),
(g, 9), and (8, B). This problem can be overcome by having
some prior knowledge about the symbols. For example,
in the case of the Canadian postal code, the first symbol is
a letter, which is followed by a numeral, then a letter, then
the symbol '-', then a numeral, then a letter, which is
followed by a numeral.
Fig. 9. (a) A document, (b) The symbols after rotation, (c) The symbols after scaling, (d) After thinning, (e) After modeling using 6-pixel
segment length, (f) 10-pixel segment length, (g) 14-pixel segment length, (h) Recognition with rejection distance = 100, (i) rejection
distance = 15.
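The use of prior knowledge in the postal code example can be sketched as follows. The sketch assumes the recognizer returns one character per position and that the only ambiguities are the letter/digit pairs listed above. The names `PATTERN`, `DIGIT_FOR`, `LETTER_FOR` and `resolve_postal_code` are hypothetical and not part of the described system.

```python
# Positional format from the text: letter, numeral, letter, '-',
# numeral, letter, numeral. A = letter, N = numeral.
PATTERN = "ANA-NAN"

# Symbols that share a model according to the text (hypothetical tables).
# The digit 9 is left out of LETTER_FOR because it is shared by both
# q and g, so position alone cannot resolve it.
DIGIT_FOR = {"L": "1", "Z": "2", "S": "5", "O": "0",
             "Q": "9", "G": "9", "B": "8"}
LETTER_FOR = {"1": "L", "2": "Z", "5": "S", "0": "O", "8": "B"}


def resolve_postal_code(recognized):
    """Replace ambiguous symbols using the known letter/numeral positions."""
    if len(recognized) != len(PATTERN):
        raise ValueError("expected %d symbols" % len(PATTERN))
    resolved = []
    for symbol, kind in zip(recognized.upper(), PATTERN):
        if kind == "A" and symbol.isdigit():
            symbol = LETTER_FOR.get(symbol, symbol)
        elif kind == "N" and symbol.isalpha():
            symbol = DIGIT_FOR.get(symbol, symbol)
        resolved.append(symbol)
    return "".join(resolved)


code = resolve_postal_code("NZL-3CS")  # 'Z' and 'S' sit in numeral positions
```

Here the second and last symbols are reinterpreted as numerals because of their position, while the letters in letter positions are left unchanged.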
The current system is so general that some models for
A and R will also be similar. The same applies to O,
Q and D. This problem arises because our system may
omit small strokes and approximate small changes in the
strokes by straight segments. Hence, further checks may
be used to determine the final decision. For example, for
the case of A and R after recognition, we can examine the
direction and straightness of certain strokes. If the left
stroke is vertical with respect to the right stroke, then it
is R and not A.
Similarly, for O, D, and Q after recognition, if there is a
small stroke in the bottom-right part, then it is Q.
Otherwise, it is O or D. If there is a straight stroke in the
left part, then it is D and not O. The final decision will be
determined after these tests.
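These follow-up checks amount to a small decision procedure applied after the main recognition step. The sketch below assumes the stroke properties have already been measured and are passed in as booleans. The function and parameter names are hypothetical, and how each property would be measured is not specified here.

```python
def refine_o_d_q(has_small_bottom_right_stroke, has_straight_left_stroke):
    """Separate O, D and Q after they were matched to a shared model."""
    if has_small_bottom_right_stroke:
        return "Q"
    # No tail: a straight stroke on the left indicates D, otherwise O.
    return "D" if has_straight_left_stroke else "O"


def refine_a_r(left_stroke_is_vertical):
    """Separate A and R: a vertical left stroke indicates R."""
    return "R" if left_stroke_is_vertical else "A"
```

The Q test is applied first, so a symbol with both a tail and a straight left stroke is labelled Q, following the order of the rules in the text.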
In future work, the employment of basic components
which are different from the currently used 4 basic
straight line segments (/, \, |, -) will be considered to
recognize more complicated symbols. These components
will include symbols. Then, the symbols will be described
by sub-symbols. Examples of these complicated symbols
are maps, roads, and pictures.
For large symbols, the algorithm which measures the
distance can be improved by including the possibility of
deleting a row or a column (like deleting a character in
spell-checking algorithms).
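One possible reading of this proposal, restricted to rows, is an edit distance computed over the rows of the zone grid, in the same way that spell checkers compute an edit distance over characters. The sketch below is an assumption about how such a measure could look, not the authors' algorithm. The names `row_cost` and `grid_distance`, the unit deletion cost and the per-zone mismatch count are all illustrative.

```python
def row_cost(row_a, row_b):
    """Number of zones whose shape codes differ between two rows."""
    return sum(a != b for a, b in zip(row_a, row_b))


def grid_distance(rows_a, rows_b, delete_cost=1):
    """Edit distance over rows: a row may be matched or deleted.

    Each symbol is a list of rows, and each row is a tuple of zone
    shape codes of equal width. Standard dynamic programming, as used
    for string edit distance.
    """
    n, m = len(rows_a), len(rows_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * delete_cost
    for j in range(1, m + 1):
        dp[0][j] = j * delete_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + delete_cost,   # delete a row of rows_a
                dp[i][j - 1] + delete_cost,   # delete a row of rows_b
                dp[i - 1][j - 1] + row_cost(rows_a[i - 1], rows_b[j - 1]),
            )
    return dp[n][m]


model = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
sample = [(1, 2, 3), (0, 0, 0), (4, 5, 6), (7, 8, 9)]  # one extra row
distance = grid_distance(model, sample)
```

In this example the extra row is absorbed by a single deletion, whereas a position-by-position comparison would count every row after the insertion as a mismatch. Columns could be handled the same way by transposing the grids.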
5. System characteristics
As mentioned in Section 2, the number of all possible
handwritten symbols in a zone of N×N pixels is 2^(N×N).
This includes discontinuous symbols; for connected symbols, the number of possibilities is less than
2^(N×N), but it is still a very large number. Our system maps
this large number of possibilities into 16, since only 16
shapes are allowed in a zone. However, to represent a
symbol meaningfully, we need more than one zone.
Fig. 10 shows the reduction in the number of possible
handwritten symbols and the number of allowed symbols
versus the number of zones (the selected zone dimensions
are 10×10 pixels). We have found that using 3×3 and
4×4 zones is sufficient to achieve a high recognition rate
for the English characters and digits.
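The size of this reduction can be checked with a short calculation. The sketch assumes that zones do not overlap and that each zone independently takes one of the 16 allowed shapes, so that Z zones of 10×10 pixels give 2^(100·Z) possible bi-level patterns and 16^Z allowed representations. The function names are illustrative, and the values are compared on a log10 scale because the raw counts are very large.

```python
import math


def log10_possible(zones, zone_side=10):
    """log10 of the number of bi-level patterns in `zones` square zones."""
    return zones * zone_side * zone_side * math.log10(2)


def log10_allowed(zones, shapes_per_zone=16):
    """log10 of the number of representations with 16 shapes per zone."""
    return zones * math.log10(shapes_per_zone)


for zones in (1, 9, 16):  # a single zone, a 3x3 grid and a 4x4 grid
    print(zones,
          round(log10_possible(zones), 1),
          round(log10_allowed(zones), 1))
```

For a 3×3 grid this gives roughly 10^271 possible patterns against 16^9 = 68 719 476 736 (about 10^10.8) allowed representations.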
Using the C++ language running on a 120 MHz
Pentium PC and the 64 English letters shown in the
middle of the document in Fig. 9, the system, on average,
converts each symbol into its structural feature vector
representation (final representation of the symbol by line
segments) in 1.9 s. This includes rotating the symbol,
scaling, thinning, decomposing it into strokes, coding
each stroke, mapping each stroke into line segments of
three different lengths and finding the representation of
the line segments by zones. This results in three representations of the symbol. Each representation corresponds to using a certain length for the segments.
However, the time required to represent a thinned
symbol by a structural feature vector depends on (1) the
number of strokes in the symbol and (2) the length of the
predefined line segments.
Fig. 10. The number of allowed symbols and the number of
possible symbols versus the number of zones.
Fig. 11. Symbols with different numbers of strokes.
An experiment was conducted to find out how the
time required to represent a sample of thinned symbols
shown in Fig. 12 by their structural feature vectors varies
with the number of strokes. The original symbols are
shown in Fig. 11, where the first symbol has one stroke,
the second symbol has three strokes and each other
symbol has a different number of strokes. We have used
line segments of length 6, 10 and 14 pixels to represent a
symbol by three structural feature vectors. The experiment
indicates that the time required to represent a symbol by
a structural feature vector varies approximately linearly
with the number of strokes in the symbol, as shown in Fig.
13. It is not easy to find this relationship mathematically.
However, it is expected that symbols that have more
strokes would require more processing time to be recognized than symbols with fewer strokes.
To address the second point, another experiment was
conducted to show how the time required to represent a thinned symbol by a structural feature vector varies
with the predefined length of the line segment. The eight
thinned symbols shown in Fig. 12 were used simultaneously and only one line segment length was defined at
a time. It is shown that using short line segments to
represent a thinned symbol by a structural feature vector
is faster than using long line segments. The time varies
approximately linearly with the length of the line segment, as shown in Figs. 13 and 14.
Fig. 12. The symbols after thinning.
Fig. 13. The processing time for converting a thinned symbol
into a structural feature vector as the number of strokes in the
symbol increases.
Fig. 14. The time required to convert the thinned symbols
shown in Fig. 12 to structural feature vectors versus the length of
the line segments.
6. Conclusion
A rule-based system for the recognition of any 2-D
bi-level line symbol is introduced. Examples of these
symbols are typed or handwritten mathematical and
electrical symbols and characters such as Greek, Arabic,
English or Chinese.
The system performs the recognition in different steps.
The first step adjusts the tilted symbols by rotation, then
scales them to predefined sizes. This is followed by thinning. At this stage, each symbol is described by its strokes.
Then, we apply some rules to transform each symbol
stroke into a combination of straight lines (segments).
A vector representing these segments is used to model the
symbol. The distance between this vector and the vectors
of stored models is used to identify the input tested
symbol.
The system was tested with different symbols. The
test data consisted of all 5726 bi-level English
characters available from the Center of Excellence for
Document Analysis and Recognition (CEDAR) database. In addition, there were 120 other arbitrary symbols.
The rejection rate was 16.1% and the recognition rate
was 83.9%, of which 95% were recognized correctly. In
order to increase the recognition rate and decrease the
rejection rate, more models should be used for each
symbol.
7. Summary
An expert system for 2-D bi-level symbol analysis and
recognition is introduced. The proposed system is general in that it is not designed for a specific application,
but can be used for recognition of any symbol. The
system uses the structural pattern recognition technique
to represent each symbol by a set of short straight lines
that we call segments. To obtain a representation of
a symbol, the system performs four basic steps. First, the
system adjusts the symbol by rotating it around its central point until its principal axis makes a certain angle
with the vertical axis (0° or a multiple of 20°).
Secondly, the system scales the symbol to a predefined
size. The third step is thinning. After that, the system
extracts and describes the thinned symbol in terms of
strokes. Finally, each stroke is approximated by segments (short straight lines). The resulting representation
of the symbol is compared with different stored models
of the different symbols. For each symbol many models
are stored. Results and analysis of the recognition of a
document are described. The boundaries (surface of an N-dimensional sphere) for each symbol are determined by
a threshold (the radius of this sphere). Using a low threshold will decrease the space for this symbol, increase the
rejection rate and increase the proportion of accepted symbols that are recognized correctly.
After storing an average of 97 models/symbol, the
system was tested with 5726 bi-level handwritten English
letters and digits taken from the Center of Excellence for
Document Analysis and Recognition (CEDAR) database
and another 120 handwritten symbols of other
kinds.
For a low threshold (radius = 15) the rejection rate
was 16.1% and the recognition rate was 83.9%, of which
95% were correctly recognized. When the threshold was
high (radius = 100) the rejection rate was 0% and the
recognition rate was 100%, of which 87.6% were recognized correctly.
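The two operating points can be compared on a common basis by expressing all outcomes as fractions of the full test set. The short calculation below assumes that "of which 95% were correctly recognized" refers to the accepted (non-rejected) symbols. The function name `overall_rates` is illustrative.

```python
def overall_rates(accept_rate, correct_given_accept):
    """Return (correct, wrong, rejected) as fractions of all test symbols."""
    correct = accept_rate * correct_given_accept
    wrong = accept_rate - correct
    rejected = 1.0 - accept_rate
    return correct, wrong, rejected


low = overall_rates(0.839, 0.95)    # radius = 15
high = overall_rates(1.0, 0.876)    # radius = 100
```

Under this reading, the low threshold labels about 79.7% of all symbols correctly, mislabels about 4.2% and rejects 16.1%. The high threshold labels 87.6% correctly and mislabels 12.4%. The low threshold therefore gives up some correct answers in exchange for roughly a threefold reduction in wrong answers.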
The performance of our system can be improved by
simply storing more models. The system is capable of
learning new symbols by simply adding models for these
symbols to the system knowledge base. The system is
implemented in the C++ language and runs on a 120 MHz Pentium PC.
References
[1] O. Trier, A. Jain, T. Taxt, Feature extraction methods for
character recognition * a survey, Pattern Recognition
29 (4) (1996) 641}662.
[2] J. Cao, M. Ahmadi, M. Shridhar, Recognition of handwritten numerals with multiple feature and multistage
classi"er, Pattern Recognition 28 (2) (1995) 153}160.
[3] H. Al-Yosef, S. Udpa, Recognition of arabic characters,
IEEE Trans. Pattern Anal. Mach Intel. 14 (8) (1992)
853}857.
[4] K. Fukushima, Necognitron: a hierarchical neural
network capable of visual pattern recognition, Neural
Networks 1 (2) (1988) 119}130.
[5] K.S. Fu, Syntactic Pattern Recognition and Applications,
Prentice-Hall, Englewood cli!s, NJ, 1982.
[6] C. Suen, C. Nadal, R. Legault, T.A. Mai, L. Lam, Computer recognition of unconstrained handwritten numerals,
Proceedings IEEE 80 (7) (1992) 1162}1180.
[7] H. Jianming, Y. Hong, Structural primitive extraction and
coding for handwritten numeral recognition, Pattern Recognition 31 (5) (1998) 493}509.
[8] A. Amin, O! -line Arabic character recognition: the state
of the art, Pattern Recognition 31 (5) (1998) 517}530.
[9] H. Kim, H. Yang, A neural network capable of learning
and inference for visual pattern recognition, Pattern Recognition 27 (10) (1994) 1291}1302.
[10] G. Burel, I. Pottier, J. Catros, Recognition of handwritten
digits by image processing and neural network, International Conference on Neural Networks, Vol. 3, 1992,
pp. 666}671
[11] J. Starzyk, Y. Jan, Algorithm & architecture for feature
extraction in image recognition, Southeastern Symposium
on System Theory, 1994, pp. 448}452.
[12] L. Lam, S. Lee, C. Suen, Thinning methodologies * a
comprehensive survey, IEEE Trans. Pattern Anal. Mach
Intell. 14 (9) (1992) 869}885.
[13] Y.T. Zhang, C. Suen, A fast parallel algorithm for thinning
digital patterns, Commun. ACM 27 (3) (1984) 236}239.
[14] M. Ahmed, R. Ward, A fast one pass knowledge-based
system for thinning, Electronic Imaging 7 (1) (1998)
111}116.
About the Author: MAHER AHMED is an assistant professor at Wilfrid Laurier University, Waterloo, Canada. He received his Ph.D.
at the University of British Columbia, Vancouver, Canada (1999). He holds two M.Sc. degrees, one in Systems and Control from
Queen's University, Kingston, Ontario, Canada (1994), and the other in Computer Science from Cairo University,
Egypt (1988). He was with Ontario Hydro, Canada, from 1990 to 1991 and with the National Research Center, Egypt from 1987 to 1990.
His research interests include pattern recognition, artificial neural networks and expert systems.
About the Author: RABAB KREIDIEH WARD was born in Beirut, Lebanon. She received the B.E. degree from the University of
Cairo, Egypt (1966), and her Masters and Ph.D. degrees from the University of California, Berkeley (1969 and 1972, respectively). She is the
Director of the Centre for Integrated Computer Systems Research and Professor in the Electrical & Computer Engineering Dept. at the
University of British Columbia, Vancouver, Canada. Her research interests are mainly in the areas of signal processing and image
processing. She has made contributions in the areas of signal detection, image encoding, compression, recognition, restoration and
enhancement, and their applications to infant cry signals, cable TV, HDTV, medical images, and astronomical images. She holds five
patents related to cable television picture monitoring, measurement and noise reduction. Applications of her work have been transferred
to U.S. and Canadian industries. She is a fellow of the EIC, the IEEE and the Royal Society of Canada.