Chart Parser

by Jim Chapman
Home| Where we live | Jim| Isabel| James| Edward]

//
// Chart.h
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     15 May 97  Released.
//
// This file holds the declarations for the Chart Parser application's core classes,
// together with comments giving a general description of the program.
//
// How to Use the Program
// ======================
//
// User-Interface    The application displays a dialog box containing a text control
// --------------    labelled 'Sentence', a 'Parse' button, and a 'Quit' button.  Hit
//                   the Quit button to terminate the application, otherwise enter a
// sentence in the text control and then hit the Parse button.  The sentence text must
// must be in lower case, the words must be separated by space characters and the text
// should contain no punctuation characters.  The program will parse the sentence and
// will put up a dialog box displaying the results.  If it could not parse the sentence,
// the dialog box will contain a message to that effect.  Otherwise, it will contain a
// picture of the parse tree of the sentence.  If the sentence was ambiguous (i.e. it
// could be parsed in several different ways), then there will be a spinner control
// allowing you to switch between the various parse trees.  Dismiss the results dialog
// box by using its 'OK' button.
// The grammar and lexicon supplied will allow the system to parse sentences like
// "i saw the man in the park with an umbrella through my telescope" or
// "the cat sat on the green mat and ate the mouse".
//
// The Grammar File    The program loads its grammar from a file called 'grammar.txt' in 
// ----------------    its local directory.  This file contains grammar rules, one per
//                     line.  Blank lines and lines beginning with a ";" character are
// ignored.  Each rule begins with a head category, which is followed by 0, 1 or many
// sub-categories.  The grammar-loader does not understand meta-symbols like [, ], {, },
// (, ), +, *, etc.  The program assumes that the head category of the first rule is the
// 'sentence category' - i.e. it will look for parse trees for that category spanning
// the entire input sentence.  An example grammar file follows:
//
// <Cut here>
//     |
//     |
//     V
//      ; A trivial grammar for a subset of English:
//      
//      S               NP VP
//      S               S CONJ S
//      NP              DETopt ADJstar N PPstar
//      NP              NP CONJ NP
//      VP              V NPopt PPstar
//      VP              VP CONJ VP
//      PP              P NP
//      NPopt
//      NPopt           NP
//      DETopt
//      DETopt          DET
//      PPstar          PP PPstar
//      PPstar      
//      ADJstar         ADJ ADJstar
//      ADJstar
//
// If the program cannot find this file or cannot read it, it will display a message
// box complaining about this fact, and will then terminate.
//
// The Lexicon File    The program loads its lexicon from a file called 'lexicon.txt' in
// ----------------    its local directory.  This file contains lexical entries for all
//                     the words in the parser's vocabulary.  Blank lines and lines
// beginning with a ";" character are ignored.  Other lines are considered to be
// lexical entries.  A lexical entry consists of a word (which should be in lower-case),
// followed by the names of one or more lexical categories to which that word belongs.
// The items in a lexical entry should be separated by spaces.  An example lexicon file
// follows:
//
// <Cut here>
//     |
//     |
//     V
//      ; A lexicon for a trivial subset of English:
//      
//      the             DET
//      a               DET
//      an              DET
//      and             CONJ
//      or              CONJ
//      bob             N V
//      man             N
//      park            N
//      umbrella        N
//      telescope       N
//      mat             N
//      cat             N
//      mouse           N
//      i               N
//      car             N
//      saw             V N
//      sat             V
//      ate             V
//      park            V
//      went            V
//      gave            V
//      green           ADJ N
//      brown           ADJ
//      happy           ADJ
//      sad             ADJ
//      fat             ADJ
//      my              ADJ
//      your            ADJ
//      his             ADJ
//      through         P
//      on              P
//      in              P
//      with            P
//      of              P
//      to              P
//
// If the program cannot find the lexicon file, or it cannot read it, or if the file
// is badly formatted, it will display a message box complaining about this fact, and
// will then terminate.
//
// Overview of Program
// ===================
//
// The Algorithm    The program implements a standard bottom-up chart parsing 
// -------------    algorithm.  The detailed pseudo-code for this algorithm is presented
//                  in the comment text on the method Sentence::m_parse - the present
// section offers a conceptual overview that deliberately glosses over various points.
// The basis of the chart parsing algorithm is to hypothesise that certain
// sections of the input sentence are generated by particular rules from its grammar.
// These sections are identified by 'vertices' and the hypotheses are represented by
// 'edges'.  The following diagram indicates the position of an edge that suggests the
// presence of an NP (noun phrase) for the words "the mat" in the sentence below
//
// edges:                           <----NP----->
// vertex #:  0    1     2     3    4     5     6
// words:      the   cat   sat   on   the   mat
//
// This edge has leftmost vertex 4, and rightmost vertex 6.
// When the algorithm creates a new edge, it firstly places it in a data structure called
// the 'agenda'.  At a later time, it will get round to 'processing' the edge.  At this
// time, it will:
// 1) remove the edge from the agenda
// 2) see what further hypotheses it can deduce from the edge (see below).  For all such
//    hypotheses, it will create new edges and put them on the agenda.
// 3) put this edge on to the 'chart'.
// Once an edge has been put on the chart, it is visible to stage (2) of the process
// above, in the course of of processing later edges.
// There are two kinds of edge - 'complete' edges, and 'incomplete' edges.  A complete
// edge is a hypothesis that says "from vertex X to vertex Y there exists a string that
// definitely could be generated by rule X".  An incomplete edge is a hypothesis that
// says "from vertex X to vertex Y there exists a string that looks like it might be
// generated by rule Z, but I'm going to need to find some more of rule Z's symbols
// after vertex Y in order to confirm this hypothesis".
// Stage (2) above therefore consists of the following, for any edge e:
//     if e is complete
//        find any incomplete edges immediately to the left of e that are looking
//        for e's category as the next symbol in their rules.  Create new edges by
//        extending those incomplete edges over e.
//     else
//         find any complete edges immediately to the right of e that have the category
//         that e is looking for, and create new edges by extending e over them.
//
// The chart parsing algorithm consists, in essence, of repeatedly taking an edge from
// the agenda and processing it.  When there are no further edges on the agenda, the
// algorithm halts, and searches the chart to see if it has any complete edges that span
// all the way from the start vertex to the end vertex, and represent the
// category called "Sentence".
//
// The Data Structures    The program uses the MFC collection classes extensively
// -------------------    in the implementation of the classes needed for the chart
//                        parser algorithm.  Refer to Microsoft's documentation for the
// details of these collection classes.  They key points about the collection classes
// are (1) they are mostly defined using templates (2) they define non-intrusive
// collections (3) you iterate through the MFC collection classes by using a variable
// of type POSITION.
//
// CATEGORY:  In the grammar and lexicon files, and in the program's displayed output,
// categories are represenred by CString objects.  Internally, they are represented as
// integers.  I do this because integers can be more easily copied and compared than
// CStrings, and because they can be used as array indices.  An array of CStrings is used
// to map between category names and category numbers; this array is kept inside the
// Grammar class (q.v.).
//
// RULE:  A rule consists of information like "'S' generates 'NP VP'": it has a
// left-hand side (the category 'S'), and a right-hand side (the sequence of categories
// {NP, VP}).  Internally, this information is stored as a category number and a list
// of category numbers.
//
// GRAMMAR:  The grammar is represented by an object of class Grammar.  It contains all
// the grammar rules used by the chart parser.  The algorithm needs to access the
// grammar in two ways: (1) by asking for a list of all the grammar rules that have
// some particular category as the first item on the left hand side; (2) by asking for
// a list of all the grammar rules that have empty left hand sides.  The Grammar object
// supports this type of access, implementing (1) by storing an array of lists of rules
// - where the array-index is a category number, and implementing (2) by storing
// a list of rules.
// The Grammar object also stores an array containing the names of all the categories
// that appear in the grammar and lexicon.
//
// LEXICON:  The lexicon maps words to lists of category numbers.  It is implemented,
// therefore, as an MFC map structure, mapping CStrings to lists of integers.
//
// VERTEX:  A vertex indentifies a position within the sentence.  The start of the first
// word is vertex 0, the end of the Nth word is vertex N.  Within the program, vertices
// are represented as integers.
//
// EDGE:  There are two kinds of edge - lexical edges and non-terminal edges. The
// two kinds of edge share the same access semantics, so they inherit from a virtual base
// class, Edge.  An edge has a start vertex, an end vertex and a category number.  It
// also has a boolean value to say whether it is 'complete' (i.e. whether it is trying
// to find further subedges in order to substantiate the hypothesis that it represents).
// Lexical edges represent words.  Internally, they contain the index of the word in
// the sentence and the category number represented by the edge.  Lexical edges are
// always complete.
// Non-terminal edges represent the attempt to apply some grammar rule to some subsection
// of the input sentence.  Internally, they contain start and end vertex numbers, the
// category number of the edge, the rule represented by the edge, and a POSITION that
// indicates how much of the edge's rule has not yet been substantiated.  If that
// POSITION  is NULL, the edge is complete.  Otherwise, the edge is incomplete. and
// is seeking the category whose number is at the point indicated by the POSITION
// iterator in the rule.
//
// CHART:  This data structure stores all the edges that have been processed so far.  The
// chart parser algorithm searches the chart in two ways: (1) asking for a list of all
// complete edges that represent some category and start at some vertex; (2) asking for
// all the incomplete edges that are seeking some category and that end at some vertex.
// Internally, the chart stores the edges in two matrices - one for complete edges, the
// other for incomplete edges.  Each matrix is indexed by vertex number and category
// number; their elements are lists of edges.
//
// AGENDA:  All edges that have not yet been processed are kept on the agenda.  The
// core of the chart parsing algorithm consists of repeatedly removing the 'next' edge
// from the agenda, and processing it.  The algorithm itself is agnostic about what
// 'next' means in this context - we are free to experiment with implementing the agenda
// as a stack, as a queue, or as some clever data structure to implement intelligent
// scheduling that always processes the most promising-looking edge first.
// In this program, we always run the parse to completion before reporting what sentence
// parses have been found, so there is no benefit to be had from such intelligent
// scheduling.  Accordingly, the agenda is implemented as a queue.
//

#include "stdafx.h"

#include <afxtempl.h>
#include <assert.h>

class Grammar;
class Sentence;

//
// CatDescription
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// This class stores a <number,CString> pair, which is used for mapping category
// numbers to category names.  A list of pointers to these objects is stored in
// a Grammar object.
//
// Do not copy or assign CatDescription objects (there's no need to do so anyway).
//
// Methods:
// -------
// Contructor:  trivial.
//
// Data Members:
// ------------
// m_number:  the category number.
// m_name:    the category name.
//
class CatDescription {
public:
    CatDescription(int number, const CString &name)
        : m_number(number), m_name(name) {};
    CatDescription(const CatDescription &c) {assert(FALSE);};
    CatDescription &operator =(const CatDescription&that) {assert(FALSE);return *this;};

    int m_number;
    CString m_name;
};

//
// Rule
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// This class stores a grammar rule.  A rule object states some category can generate a
// (possibly empty) sequence of other categories.  For example the rule "S -> NP VP"
// states that the category S can generate "NP VP".  In this example, we would say that
// the rule's category was S, and its sub-categories were  the list "NP VP".
//
// Although I have included "->" in the text of the rule in the above example, note that
// in actual fact, when the program decodes a rule from a string, the string must not
// contain "->" - the rule's category is distinguished from its subcategories only by
// appearing first in the line.
//
// Do not copy or assign Rule objects (there's no need to do so anyway).
// Do not create rules on the stack, and do not delete them.  This is because when you
// construct a rule, you always do so by giving the rule a grammar to add itself to - and
// the rule then becomes the property of that grammar.
//
// Methods:
// -------
// Contructor:    builds a rule from a line like "S NP VP".  The ctor adds the
//                constructed rule to the Grammar that is passed as a parameter to the
//                ctor.
// m_getCat    :  gets the rule's category number.
// m_getSubcats:  gets the rule's sub-categories (as a list of category numbers).
// m_traceOut:    prints the rule to the debug window.
//
// Data Members:
// ------------
// m_cat:         the category number.
// m_subcats:     the list of subcategory numbers.
//
class Rule {
public:
    Rule(CString ruleText, Grammar &);
    Rule(const Rule &r) {assert(FALSE);};
    Rule &operator =(const Rule&that){assert(FALSE);return *this;};

    const CList<int, int> & m_getSubcats() {return m_subcats;};
    int m_getCat() {return m_cat;}

#ifdef _DEBUG
    void m_traceOut(Grammar &g);
#endif

private:
    int m_cat;
    CList<int,int> m_subcats;
};

//
// A RuleList is a list of pointers to Rule objects.
//
typedef CTypedPtrList<CPtrList, Rule *> RuleList;

//
// Grammar
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// A Grammar object stores:
// 1) a list of category descriptions (to map between category names and numbers).
// 2) a load of grammar rules, in a data-structure that is specialised for the kind of
//    lookup that chart parsers need to do.
//
// You construct a grammar by loading it from a file; the Grammar ctor does this (by
// default, from a file called "grammar.txt").
// 
// Do not copy or assign Grammar objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor:             reads the grammar in from the file whose name is given as the
//                          ctor's parameter.  If a problem occurs while reading the file
//                          the program displays a message box and then exits.
// Destructor:              deletes all the grammar's constituent rules.
// m_lookUpCatNumber:       gets a category number for a given category name.  If the
//                          Grammar hasn't seen the category before, it generates a new
//                          category number for it.
// m_lookUpCatName:         given a category number, look up the name of the category.
// m_getEmptyRules:         gets a list of all the rules that have no sub-categories
//                          (i,e. the empty sequence appears as the rule's left hand
//                          side).  The list is not valid after the Grammar is destroyed.
// m_getRulesStartingWith:  gets a list of all the rules with a particular category
//                          appearing as their first sub-category (i.e. as the first
//                          category on the rule's right hand side).
//                          The list is not valid after the Grammar is destroyed.
// m_addRule:               adds a rule to the grammar.  Do nothing with the rule
//                          afterwards; it is now the property of the Grammar object.
// m_getNumCats:            tells you how many categories there are in the grammar.
// m_traceOut:              prints the grammar to the debug window.
//
// Data Members:
// ------------
//  m_catNames:             a list of pointers to CatDescription objects, used for
//                          looking up category names from category numbers, and vice
//                          versa.
//  m_emptyRules:           a list of all the rules with null right hand sides.
//  m_rulesByFirstCat:      an array (indexed by category number) each element of which
//                          is a pointer to a list of rules.  The list contains all the
//                          rules that have a particular category first on their left
//                          hand sides.
class Grammar {
public:
    Grammar(const CString & fileName="grammar.txt");
    ~Grammar();
    Grammar(const Grammar &) {assert(FALSE);};
    Grammar & operator = (const Grammar &) {assert(FALSE); return *this;};

    int m_lookUpCatNumber(const CString &);
    CString m_lookUpCatName(int) const;
    const RuleList & m_getEmptyRules() const;
    const RuleList & m_getRulesStartingWith(int cat) const;
    void m_addRule(Rule *rule);
    int m_getNumCats() const {return m_catNames.GetCount();};


#ifdef _DEBUG
    void m_traceOut();
#endif

private:
    CTypedPtrList<CPtrList,CatDescription *> m_catNames;
    RuleList m_emptyRules;
    CTypedPtrArray<CPtrArray, RuleList*> m_rulesByFirstCat;
};

//
// Lexicon
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// A Lexicon object stores all the words in the lexicon, in a map that allows us to
// find, for any particular word, a list of all the categories for that word,
// described by their category numbers.
//
// You construct a lexicon by loading it from a file; the Lexicon ctor does this
// (by default, from a file called "grammar.txt").
// 
// Do not copy or assign Lexicon objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor:       reads the lexicon in from the file whose name is given as the
//                    parameter to the ctor.  If anything goes wrong while reading the
//                    file, the program displays a message box and exits.
// Destructor:        deletes all the entries in the map.
// m_getCatsForWord:  given a word, this method returns a list containing all the lexical
//                    categories for that word.  The list is not valid after the Lexicon
//                    is destroyed.
// m_traceOut:        prints the lexicon to the debug window.
//
// Data Members:
// ------------
// m_entries:         a map object that maps CString to pointer to list of integers.  The
//                    CString is a word, and the integers are that word's lexical
//                    categories.
//  
class Lexicon {
public:
    Lexicon(Grammar &, const CString & fileName="lexicon.txt");
    ~Lexicon();
    const CList<int,int> * m_getCatsForWord(const CString &word) const;

#ifdef _DEBUG
    void m_traceOut(Grammar &);
#endif

private:
    CMapStringToOb m_entries; // maps strings to CList<int,int>*
};

//
// Edge
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// This is a virtual base class for the two kinds of edge that are used by the chart
// parsing algorithm.
// Edges must implement the following methods:
//
// Virtual Methods:
// ---------------
// m_getLeftVertex:  the leftmost vertex of the edge
// m_getRightVertex: the rightmost vertex of the edge
// m_isComplete:     does the edge need any further sub-categories?
// m_getCat:         what category does the edge represent?
// m_getDepth:       how far above the bottom of the parse tree is this edge?  (to be
//                   precise, how long is the longest path from this node to a leaf
//                   node?)
// m_display:        draw this edge and its subedges on the device context, expanding
//                   the bounding rectangle to include the edge.  Note that edges
//                   which have leftmost vertex = rightmost vertex will not be drawn.
// m_traceOut:       print the edge to the debug window.
//
class Edge {
public:
    virtual ~Edge() {};

// for parsing:
    virtual int m_getLeftVertex() const =0;
    virtual int m_getRightVertex() const =0;
    virtual BOOL m_isComplete() const =0;
    virtual int m_getCat() const =0;

// for displaying:
    virtual int m_getDepth()=0;
    virtual CPoint m_display(CDC *pDC, int sentenceDepth, int parentDepth, Sentence &s,
                             Grammar &g, CRect &boundingRect)=0;

#ifdef _DEBUG
    virtual void m_traceOut(Grammar &)=0;
#endif
};

//
// EdgeList
//
// An EdgeList is a list of pointers to Edge objects.
typedef CTypedPtrList<CPtrList, Edge *> EdgeList;

//
// LexicalEdge
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// A LexicalEdge object stores lexical edges - i.e. edges that represent
// words - they are leaf nodes in the parse tree.
//
// Do not copy or assign LexicalEdge objects (there's no need to do so anyway).
//
// Methods:
// -------
// Comstructor:   given a category and a word number (i.e. the index of a word in the
//                sentence), constructs a corresponding lexical edge.
//
// Virtual Methods:
// ---------------
// Please refer to the documentation for the Edge class.
//
// Data Members:
// ------------
// m_wordNumber:  the index of the word in the sentence (0=first word).
// m_cat:         the category number, stating what category is represented by the edge.
// 
class LexicalEdge : public Edge {
public:
    LexicalEdge(int cat, int wordNum) : m_cat(cat), m_wordNumber(wordNum) {};
    LexicalEdge(const LexicalEdge &) {assert(FALSE);};
    LexicalEdge &operator=(const LexicalEdge &) {assert(FALSE); return *this;};

    virtual m_getLeftVertex() const {return m_wordNumber;};
    virtual m_getRightVertex() const {return m_wordNumber+1;};
    virtual m_isComplete() const {return TRUE;};
    virtual int m_getCat() const{return m_cat;};

    // for displaying
    virtual int m_getDepth() {return 0;};
    virtual CPoint m_display(CDC *pDC, int sentenceDepth, int parentDepth, Sentence &s,
                             Grammar &g, CRect &boundingRect);

#ifdef _DEBUG
    virtual void m_traceOut(Grammar &);
#endif

private:
    int m_wordNumber;
    int m_cat;
};

//
// NonTerminalEdge
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// A NonTerminalEdge object stores non-terminal edges - i.e. edges that represent
// non-terminal categories of the grammar.
//
// Do not copy or assign NonTerminalEdge objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor (1):   constructs an edge starting and ending at a given vertex, which
//                    needs the entire string of categories on the right-hand side of the
//                    given rule.
// Constructor (2):   constructs an edge by extending the given incomplete edge over the
//                    given complete edge.  Something horrible will happen if you pass
//                    a combination of edges such that this extension is not valid.
//
// Virtual Methods:
// ---------------
// Please refer to the documentation for the Edge class.
//
// Data Members:
// ------------
// m_rule:            a pointer to the grammar rule to which this edge relates.
// m_leftVertex:      the leftmost vertex of this edge.
// m_rightVertex:     the rightmost vertex of this edge.
// m_subedges:        a (possibly empty) list of all the edges contained in this one -
//                    such edges correspond to categories from m_rule that have been
//                    found between m_leftVertex and m_rightVertex, and explained by
//                    this edge.
// m_positionInRule:  an iterator that keeps track of where we have got to in this
//                    edge's rule.  'Extending' this edge equates to moving the
//                    iterator one place to the right.  When the iterator is NULL, that
//                    means that the edge is complete.
// m_depth:           used by NonTerminalEdge::m_getDepth to cache its result, and so
//                    to avoid recursing every time that method is called.
//
class NonTerminalEdge : public Edge {
public:
    NonTerminalEdge(Rule *r, int vertex);
    NonTerminalEdge(NonTerminalEdge *incompleteEdge, Edge *completeEdge);

    NonTerminalEdge(const NonTerminalEdge &) {assert(FALSE);};
    NonTerminalEdge &operator=(const NonTerminalEdge &) {assert(FALSE); return *this;};

    virtual m_getLeftVertex() const {return m_leftVertex;};
    virtual m_getRightVertex() const {return m_rightVertex;};
    virtual m_isComplete() const;
    virtual int m_getCat() const;

    int m_getNeededCat() const;

    // for displaying:
    virtual int m_getDepth();
    virtual CPoint m_display(CDC *pDC, int sentenceDepth, int parentDepth, Sentence &s,
                             Grammar &g, CRect &boundingRect);

#ifdef _DEBUG
    virtual void m_traceOut(Grammar &);
#endif

private:
    Rule *m_rule;
    int m_leftVertex;
    int m_rightVertex;
    EdgeList m_subedges;
    POSITION m_positionInRule;

    // for displaying;
    int m_depth;
};

//
// Chart
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// A Chart object stores the 'chart' of the chart parser, in a data-structure optimised
// for the kinds of access that the chart parsing algorithm needs.  The algorithm
// accesses entries in the chart in two ways:
// 1) "Give me a list of all complete edges whose category is X and which start on
//     vertex #Y"
// 2) "Give me a list of all incomplete edges that next need to find a subedge of
//     category X, and which end on vertex #Y".
// The Chart object contains two arrays of arrays (indexed [vertex][category]), each
// element of which is a pointer to a list of edges.  These data-structures give a
// quick and efficient way to perform the necessary types of access to the chart.
//
// Do not copy or assign Chart objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor:                      constructs a Chart object with data structures large
//                                   enough for parsing a sentence having the given
//                                   number of vertices, using a grammar with the given
//                                   number of categories.
// m_completeEdgesForCatStartingAt:  gets a list of all complete edges starting at some
//                                   vertex, with some particular category.
// m_incompleteEdgesNeedingCatAt:    gets a list of all incomplete edges ending at some
//                                   vertex and needing some particular category.
//
// Data Members:
// ------------
// m_completeEdges:                  the array of arrays that stores all the complete
//                                   edges.
// m_incompleteEdges:                the array of arrays that stores all the incomplete
//                                   edges.
//
class Chart {
public:
    Chart(int maxVertex, int maxCat);
    Chart(const Chart &) {assert(0);};
    Chart &operator=(const Chart &) {assert(0); return *this;};
    ~Chart(); // delete every edge in both edge arrays

    void m_addEdge(Edge *);
    EdgeList * m_completeEdgesForCatStartingAt(int cat, int vertex);
    EdgeList * m_incompleteEdgesNeedingCatAt(int cat, int vertext);
private:

    // These members cause a warning C4786 in axftempl.h, because they result in
    // class names that are over 255 characters long.  This doesn't seem to have any
    // pathological consequences.
    CArray<CArray<EdgeList*,EdgeList*>,CArray<EdgeList*,EdgeList*>&> m_completeEdges;
    CArray<CArray<EdgeList*,EdgeList*>,CArray<EdgeList*,EdgeList*>&> m_incompleteEdges;

};

//
// Agenda
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// The Agenda class stores the 'agenda' of the chart parser algorithm.  It is a store
// for the set of edges that have not been processed yet by the main loop of
// Sentence::m_parse.
// 
// In this implementation, the agenda is a queue; I achieve this by sub-classing it from
// type list-of-pointers-to-edges, and adding suitable definitions for the
// methods that the chart parsing algorithm needs.
//
// Note that you should not delete an Agenda object unless it is empty.  There is
// no circumstance in which this should happen, because the chart parsing algorithm
// will only terminate when the agenda is empty, and you shouldn't be deleting the
// agenda unless the algorithm has terminated.
//
// Methods:
// -------
// Destructor:  does nothing special - but it asserts(FALSE) if there are any edges
//              left on the agenda.
// m_addEdge:   adds an edge to the agenda.  Pass a pointer in to this method, and do
//              nothing to your pointer after that.
// m_nextEdge:  gets the next edge from the agenda.  The method returns a pointer to the
//              edge, or 0 if the agenda is empty.
//  
class Agenda : public CTypedPtrList<CPtrList, Edge*>
{
public:
    ~Agenda() {assert(IsEmpty());};
    void m_addEdge(Edge* e) {AddTail(e);};
    Edge* m_nextEdge() {return IsEmpty() ? NULL : RemoveHead();};
};

//
// Sentence
//
// Who             When       What
// ---             ----       ----
// Jim Chapman     13 May 97  Released.
//
// In order to parse a string of text, construct a Sentence from it, then call the
// Sentence's m_parse method.  This will return a list of (pointers to) edges.  Each
// element of that list represents a possible parsing of the sentence.
// Examine, or display, or whatever, the contents of the list, and then destroy the
// Sentence object.  Do not do anything with the edge pointers after you have destroyed
// the Sentence, because the Sentence dtor will have freed the memory they point at. 
//
// Do not copy or assign Sentence objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor:        given a string (which must not be empty), this method builds a
//                     Sentence object for it.  It fills in m_words with the individual
//                     words of the sentence, and m_verticesArray with the character
//                     positions of each of the vertices of the sentence.  It initalises
//                     m_chart and m_agenda to 0.
// Destructor:         deletes the chart (and therefore, every edge in it), and the
//                     agenda (which must be empty when we are deleting the sentence
//                     object).
// m_parse:            parses the sentence, and returns a list of possible parsings
//                     of it.
// m_vertexToCharNum:  given a vertex number, call this method to determine how many
//                     character positions appear to the left of that vertex.  This
//                     method is used for positioning edges when displaying them on the
//                     screen.
// m_getWord:          given a zero-based index, i, looks up the character string of the
//                     (i+1)th word of the sentence.
//
// Data members:
// ------------
// m_verticesArray:    this array lets us look up a vertex's character position, given
//                     vertex number.
// m_words:            the words of the sentence, stored in an array.
// m_chart:            the chart used for parsing the sentence, or 0 if m_parse has
//                     not been called yet.
// m_agenda:           the agenda used for parsing the sentence, or 0 if m_parse has
//                     not been called yet.
//
class Sentence {
public:
    Sentence(CString);
    Sentence(const Sentence &) {assert(FALSE);};
    Sentence & operator = (const Sentence&) {assert(FALSE); return *this;};
    ~Sentence();

    EdgeList * m_parse(Grammar &, Lexicon &);
    int m_vertexToCharNum(int) const;
    CString m_getWord(int) const;
private:
    CArray<int,int> m_verticesArray; // maps vertex numbers to character offsets.
    CStringArray m_words;
    Chart * m_chart;
    Agenda * m_agenda;
};



home@jim-chapman.net