%second part of the paper
\section{Horn Clauses
}
The PDA-based compilation approach has proved a fruitful
theoretical and experimental support for the analysis
and understanding of general CF parsing \`{a} la Earley.
In accordance with our strategy of uniform study of
the ``Horn continuum'', we extended this approach
to general Horn clauses, i.e. Definite Clause (DC) programs.
This led to the definition of the
{\em Logical Push-Down Automaton (LPDA)}
which is an operational engine intended to play for Horn clauses
the same role as the usual PDA for CF languages.
Space limitations prevent giving here a detailed presentation of LPDAs,
and we only sketch the underlying ideas.
More details may be found in \cite{lang-datalog,lang-lpda}.
\vspace{1ex}
As in the CF case, the evaluation of a DC program may be decomposed
into two phases:
\begin{itemize}
\item
a compilation phase that translates the DC program into an LPDA.
Independently of the subsequent execution strategy, the compilation
may be done according to a variety of evaluation schemata:
top-down, bottom-up, predictive bottom-up, ...
Specific optimization techniques may also be developed for
each of these compilation schemata.
\item
an execution phase that can interpret the LPDA according to some
execution technique: backtrack (depth-first), breadth-first,
dynamic programming, or some combination \cite{oldt}.
\end{itemize}
This separation of concerns leads to a better understanding of issues,
and should allow a more systematic comparison of the possible alternatives.
In the case of dynamic programming execution,
the LPDA formalism relies on very simple structures
that we believe are easier to analyze, prove,
and optimize than the corresponding direct constructions on DC
programs \cite{parsing-deduc,porter,oldt,vieille-logic},
while remaining independent of the computation schema, unlike
the direct constructions.
Note that predictive bottom-up compilation followed by dynamic programming
execution is essentially equivalent to Earley deduction as presented
in \cite{parsing-deduc,porter}.
\vspace{1ex}
The next sections include a presentation of LPDAs and their dynamic
programming interpretation, a compilation schema for building an LPDA from
a DC program, and an example applying this top-down construction to a
very simple DC program.
\subsection{Logical PDAs and their dynamic programming interpretation
}
\label{gen-dp}
An LPDA is essentially a PDA that stores logical atoms
(i.e. predicates applied to arguments)
and substitutions on its stack, instead of simple symbols.
The symbols of the standard CF PDA stack may be seen as
predicates with no arguments (or more accurately with two arguments
similar to those used to translate CF grammars into DC
in \cite{definite-clause}).
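This two-argument encoding can be made concrete with a small sketch (ours, in Python; the position-variable names {\tt P0}, {\tt P1}, \ldots{} are illustrative): each CF rule becomes a definite clause whose predicates relate string positions, in the style of \cite{definite-clause}.

```python
def cf_rule_to_clause(lhs, rhs):
    """Encode a CF rule lhs -> rhs as a definite clause whose predicates
    carry two string-position arguments (Pereira-Warren style encoding)."""
    p = [f"P{i}" for i in range(len(rhs) + 1)]        # positions P0 .. Pn
    head = f"{lhs}({p[0]},{p[-1]})"                   # lhs spans P0..Pn
    body = [f"{sym}({p[i]},{p[i+1]})" for i, sym in enumerate(rhs)]
    return f"{head} :- {', '.join(body)}." if body else f"{head}."

print(cf_rule_to_clause("s", ["np", "vp"]))
# -> s(P0,P2) :- np(P0,P1), vp(P1,P2).
```

An empty right-hand side yields a unit clause such as {\tt det(P0,P0).}, matching the usual treatment of empty rules.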
A technical point is that we consider
PDAs without ``finite state'' control:
this is possible without loss of generality
by having pop transitions that replace the top two atoms by only one
(this is standard in LR(k) PDA parsers~\cite{aho-ull}).
\vspace{1ex}
Formally, an LPDA $\pla$ is a 6-tuple:\hspace{2em}
\(\pla{} = (\plvars, \plfuncs, \plstacks,
\plstackini, \plstackfin, \pltransits) \)
%\hspace{2em}
\newline
where
$\plvars$ is a set of variables,
$\plfuncs$ is a set of function and constant symbols,
$\plstacks$ is a set of stack predicate symbols,
$\plstackini$ and $\plstackfin$ are respectively the initial and final
stack predicates,
and $\pltransits$ is a finite set of {\em transitions} having one of the
following three forms:
\vspace{-1.5mm}
\begin{tabbing}
\hspace{1.5cm} \=
{\em horizontal transitions:} \=
$\pltrans{\platB}{\platC}$ \hspace{2em} \=
-- -- replace $\platB$ by $\platC$ on top of stack \\
\> {\em push transitions:} \>
$\pltrans{\platB}{\platC\platB}$ \>
-- -- push $\platC$ on top of former stack top $\platB$\\
\> {\em pop transitions:} \>
$\pltrans{\platB\platD}{\platC}$ \>
-- -- replace $\platB\platD$ by $\platC$ on top of stack\\
\> \hspace{-2em}
where $\platB$, $\platC$ and $\platD$
are $\plstacks$-atoms, i.e. atoms built
with $\plstacks$, $\plfuncs$ and $\plvars$.
\end{tabbing}
%\vspace{-9mm}
Intuitively (and approximately)
a pop transition $\pltrans{\platB\platD}{\platC}$
is applicable to a stack configuration with atoms $\platA$ and $\platA'$
on top,
iff there is a substitution $s$ such that $\platB s = \platA s$ and
$\platD s = \platA' s$. Then $\platA$ and $\platA'$ are removed from the stack
and replaced by $\platC s$, i.e. the atom $\platC$ to which $s$ has been
applied. Things are similar for other kinds of transitions.
Of course an LPDA is usually non-deterministic w.r.t. the choice of the
applicable transition.
In the case of dynamic programming interpretations, all possible computation
paths are explored, with as much sub-computation sharing as possible.
The algorithm proceeds by building a collection of {\em items}
(analogous to those of Earley's algorithm) which are pairs of atoms.
An item $\plitemp{\platA}{\platA'}$ represents a stack fragment of two
consecutive atoms~\cite{lang-pap,lang-incomplete}.
If another item
$\plitemp{\platA'}{\platA''}$ was also created, this means that the sequence
of atoms $\platA \platA' \platA''$ is to be found in some possible stack
configuration, and so on
({\em up to the use of substitutions, not discussed here}).
The computation is initialized with an initial item
$\plitemini = \plitemp{\plstackini}{\plstackend}$.
New items are produced by applying the LPDA transitions to
existing items, until no new application is possible (an application
may often produce an already existing item).
The computation terminates under similar conditions as specialized
algorithms \cite{parsing-deduc,oldt,vieille-logic}.
If successful, the computation produces one or several {\em final items}
of the form
$\plitemp{\plstackfin}{\plstackini}$, where the arguments of $\plstackfin$
form an answer substitution for the initial DC program.
In a parsing context, one is usually interested in obtaining parse-trees
rather than ``answer substitutions''.
The counterpart of a CF parse tree
is here a proof tree corresponding to proofs with the original
DC program.
%\footnote{
%However, in the case of non-CF formalisms (e.g. Tree Adjoining Grammars)
%we should compare proof trees with TAG derivation trees, rather
%than the object parse-trees \cite{jvw87}.}
Such proof trees may be obtained by the same techniques that are
used in the case of CF parsing~\cite{lang-pap,lang-shared,billot-these},
and that actually interpret the items and their relations as a shared
parse forest structure.
Substitutions are applied to items as follows
(we give as example the most complex case):
a pop transition $\pltrans{\platB\platD}{\platC}$
is applicable to a pair of items $\plitemp{\platA}{\platA'}$
and $\plitemp{\platE}{\platE'}$, iff there is a unifier $s$ of
$\plitemp{\platA}{\platA'}$
and $\plitemp{\platB}{\platD}$, and a unifier
$s'$ of $\platA' s$ and $\platE$.
This produces the item $\plitemp{\platC ss'}{\platE' s'}$.
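For illustration, the dynamic programming interpretation can be sketched in Python (our sketch, not the authors' prototype; predicate names such as {\tt n0\_0} play the role of the position predicates $\plposit{k}{i}$ of the next subsection, and no subsumption test is included, so the example is a function-free program that terminates without one).

```python
class Var: pass  # a logical variable, compared by identity

def walk(t, s):
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that terms a, b become equal, or None."""
    a, b = walk(a, s), walk(b, s)
    if a is b or a == b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    """Apply substitution s to term t."""
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t

def freeze(t, m=None):
    """Canonical variant: variables become ("?", 0), ("?", 1), ... so that
    items equal up to variable renaming compare equal ("?" is reserved)."""
    m = {} if m is None else m
    if isinstance(t, Var):
        return m.setdefault(t, ("?", len(m)))
    return tuple(freeze(x, m) for x in t) if isinstance(t, tuple) else t

def thaw(t, m):
    """Replace ("?", i) markers by fresh variables, consistently via m."""
    if isinstance(t, tuple):
        if len(t) == 2 and t[0] == "?":
            return m.setdefault(t, Var())
        return tuple(thaw(x, m) for x in t)
    return t

def pop_combine(upper, lower, B, D, C):
    """The rule of the text: unify (B, D) with item <A, A'>, then A's with
    the first atom E of the lower item; produce <Css', E's'>."""
    A, A1 = thaw(upper, {})
    E, E1 = thaw(lower, {})
    s = unify((B, D), (A, A1), {})
    s2 = None if s is None else unify(resolve(A1, s), E, {})
    if s2 is None:
        return None
    return freeze((resolve(resolve(C, s), s2), resolve(E1, s2)))

def interpret(transitions, init, bottom, final):
    """Close the item set under the transitions (fixpoint, no subsumption)."""
    transitions = [freeze(t) for t in transitions]
    items = {freeze((init, bottom))}
    agenda = list(items)
    while agenda:
        item, new = agenda.pop(), set()
        for tr in transitions:
            kind, parts = tr[0], thaw(tr[1:], {})
            if kind == "pop":  # try the item in both stack positions
                B, D, C = parts
                pairs = [(item, o) for o in items] + [(o, item) for o in items]
                new |= {r for u, l in pairs
                        if (r := pop_combine(u, l, B, D, C)) is not None}
            else:  # push keeps A below the new atom, horizontal keeps A'
                B, C = parts
                A, A1 = thaw(item, {})
                s = unify(B, A, {})
                if s is not None:
                    new.add(freeze((resolve(C, s),
                                    resolve(A if kind == "push" else A1, s))))
        for it in new - items:
            items.add(it)
            agenda.append(it)
    return [it for it in items if it[0][0] == final]

# Hand-compiled top-down LPDA for the toy Datalog program
#   g0: ans(Y) :- path(a,Y).   g1: path(X,Y) :- edge(X,Y).   g2: edge(a,b).
X, Y, Z = Var(), Var(), Var()
transitions = [
    ("push", ("$",), ("n0_0", Y)),                       # initialization
    ("push", ("n0_0", Y), ("path", "a", Y)),             # call body of g0
    ("push", ("n1_0", X, Y), ("edge", X, Y)),            # call body of g1
    ("horiz", ("path", X, Y), ("n1_0", X, Y)),           # select clause g1
    ("horiz", ("edge", "a", "b"), ("n2_0",)),            # select clause g2
    ("pop", ("n2_0",), ("n1_0", X, Y), ("n1_1", X, Y)),  # return from g2
    ("pop", ("n1_1", X, Y), ("n0_0", Z), ("n0_1", Z)),   # return from g1
    ("horiz", ("n0_1", Y), ("answer", Y)),               # final predicate
]
print(interpret(transitions, ("$",), ("end",), "answer"))
# -> [(('answer', 'b'), ('$',))]
```

The single final item pairs the final predicate with the initial stack predicate, exactly as the final items $\plitemp{\plstackfin}{\plstackini}$ above.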
\subsection{Top-down compilation of DC programs into LPDAs
}
Given a DC program,
{\em many different compilation schemata may be used to build a
corresponding LPDA} \cite{lang-lpda}.
We give here a very simple and unoptimized top-down construction.
The DC program to be compiled is composed of
%a query which is a logical literal, and
a set of clauses
$\gamma_k$:\hspace{1em}
$\plrule{\plA_{k,0}}{\plA_{k,1},\ldots,\plA_{k,n_k}}$,
where each $\plA_{k,i}$ is a logical literal.
The query is assumed to be the head literal $\plA_{0,0}$
of the first clause $\gamma_0$.
\vspace{1ex}
\label{old-lpda}
The construction of the top-down LPDA is based
on the introduction of new predicate symbols $\plposit{k}{i}$,
corresponding to positions between the body literals of
each clause $\gamma_k$.
The predicate $\plposit{k}{0}$ corresponds to the position before the
leftmost literal, and so on.
Literals in clause bodies are refuted from left to right.
The presence of an instance of a position literal
$\plposat{k}{i}$ in the stack
indicates that the first $i$ subgoals corresponding to the body of some
instance of clause $\gamma_k$ have already been refuted.
The argument bindings of that position literal are the
partial answer substitution
computed by this partial refutation.
For every clause $\gamma_k$:\hspace{1em}
$\plrule{\plA_{k,0}}{\plA_{k,1},\ldots,\plA_{k,n_k}}$,
we denote by ${\rm t}_k$ the vector of variables occurring
in the clause.
Recall that $\plA_{k,i}$ is a literal using some of the
variables in $\gamma_k$,
while
$\plposit{k}{i}$ is only a predicate which needs to be given
the argument vector ${\rm t}_k$ to become the literal
$\plposat{k}{i}$.
Then we can define the top-down LPDA by the following transitions:
\label{td-pla}
\begin{enumerate}
\item
$\pltrans{\plstackini\:}{\plposat{0}{0}\,\plstackini}$
\item
$\pltrans{\plposat{k}{i}} {\plA_{k,i+1}\,\plposat{k}{i}}$
\hfill {\em ---
for every clause $\gamma_k$ and
}
\hfill {\em
for every position $i$ in its body: $0\leq i<n_k$}
\item
$\pltrans{\plA_{k,0}}{\plposat{k}{0}}$
\hfill {\em --- for every clause $\gamma_k$}
\item
$\pltrans{\plposat{k}{n_k}\:\plposat{k'}{i}} {\plposat{k'}{i+1}}$
\hfill {\em --- for every pair of clauses $\gamma_k$, $\gamma_{k'}$
and every position $i$ in the body of $\gamma_{k'}$: $0\leq i<n_{k'}$}
\end{enumerate}
\begin{figure}
\parbox[b]{6.5cm}{
\fbox{\parbox{6.3cm}{
\small
\tt
********* PUSH Transitions B->BC ***********
\vspace{1ex}
~~~predicate :nabla.2.0
\\nabla.2.0(X1) -> q(f(X1)) nabla.2.0(X1)
\vspace{1ex}
~~~predicate :nabla.0.0
\\nabla.0.0(X2) -> q(X2) nabla.0.0(X2)
\vspace{1ex}
~~~predicate :dollar0
\\dollar0() -> nabla.0.0(X2) dollar0()
\vspace{2ex}
********* Horizontal Transitions B->C ******
\vspace{1ex}
~~~predicate :q
\\q(f(f(a))) -> nabla.1.0()
\\q(X1) -> nabla.2.0(X1)
\vspace{1ex}
~~~predicate :query
\\query(X2) -> nabla.0.0(X2)
\vspace{1ex}
~~~predicate :nabla.0.1
\\nabla.0.1(X2) -> answer(X2)
\vspace{2ex}
********* POP Transitions BD->C ************
\vspace{1ex}
~~~predicate :nabla.2.1
\\nabla.2.1(X1) nabla.0.0(X2) -> nabla.0.1(X2)
\\nabla.2.1(X4) nabla.2.0(X1) -> nabla.2.1(X1)
\vspace{1ex}
~~~predicate :nabla.1.0
\\nabla.1.0() nabla.0.0(X2) -> nabla.0.1(X2)
\\nabla.1.0() nabla.2.0(X1) -> nabla.2.1(X1)
\vspace{1ex}
~~~predicate :nabla.0.1
\\nabla.0.1(X3) nabla.0.0(X2) -> nabla.0.1(X2)
\\nabla.0.1(X2) nabla.2.0(X1) -> nabla.2.1(X1)
\vspace{1ex}
}}
%nabla.0.1(X2) dollar0() -> query(X2)
\caption{
Transitions of the LPDA.
\label{ex1-transits}
}
%\end{figure}
}
%The final predicate is {\tt nabla.0.1} .
%
\hfill
%
\parbox[b]{6.5cm}{
%\begin{figure}
\fbox{\parbox{6.3cm}{
\small
\tt
\begin{tabbing}
Clauses: \= q(f(f(a))):-. \\
\> q(X1):-q(f(X1)).
\vspace{1ex}\\
Query: \> q(X2)
\end{tabbing}
}}
\caption{
The Definite Clause program.
\label{ex1-dcprog}
}
%\end{figure}
\vspace*{1.4cm}
%\begin{figure}
\fbox{\parbox{6.3cm}{
\small
\tt
\vspace{1ex}
dollar0() , ()()
\\nabla.0.0(X5) , dollar0()
\\q(X6) , nabla.0.0(X6)
\\nabla.2.0(X7) , nabla.0.0(X7)
\\nabla.1.0() , nabla.0.0(f(f(a)))
\\q(f(X8)) , nabla.2.0(X8)
\\nabla.0.1(f(f(a))) , dollar0()
\\nabla.2.0(f(X9)) , nabla.2.0(X9)
\\nabla.1.0() , nabla.2.0(f(a))
\\nabla.2.1(f(a)) , nabla.0.0(f(a))
\\nabla.0.1(f(a)) , dollar0()
\\q(f(f(X10))) , nabla.2.0(f(X10)) *
\\nabla.2.1(f(a)) , nabla.2.0(a)
\\nabla.2.1(a) , nabla.0.0(a)
\\nabla.0.1(a) , dollar0()
\\answer(a) , dollar0()
\\answer(f(a)) , dollar0()
\\answer(f(f(a))) , dollar0()
\vspace{1ex}
\\
\footnotesize
*
\rm subsumed by:
\tt q(f(X8)),nabla.2.0(X8)
%\vspace{1ex}
}}
\caption{
Items produced by the dynamic programming
interpretation.
\label{ex1-items}
}
%\end{figure}
}
%Actually 27 items were produced, but 4 were eliminated by the
%admissibility test to leave only the above 23.
\end{figure}
The following example has been produced with
a prototype implementation realized by Eric Villemonte de la Clergerie
and Alain Zanchetta~\cite{stagex}.
This example, as well as the top-down construction above, is taken from
\cite{lang-lpda}.
The definite clause program to be executed is given
in figure~\ref{ex1-dcprog}.
Note that a search for all solutions in a backtrack evaluator
would not terminate.
\vspace{1ex}
The solutions found by the computer are:~~
\parbox[t]{3cm}{
\tt
X2 = f(f(a)) \\
X2 = f(a) \\
X2 = a
}
\\
These solutions were obtained by first compiling the DC program
into an LPDA according to the schema defined in section~\ref{old-lpda},
and then interpreting this LPDA with the general dynamic programming
algorithm defined in section~\ref{gen-dp}.
The LPDA transitions produced by the compilation are
in figure~\ref{ex1-transits}.
%
The collection of items produced by the dynamic programming computation is
given in figure~\ref{ex1-items}.
\vspace{1ex}
In the transitions printout of figure~\ref{ex1-transits},
each predicate name {\tt nabla.i.j}
stands for our $\plposit{i}{j}$.
According to the
construction of section~\ref{old-lpda},
the final predicate should be {\tt nabla.0.1}.
For better readability
we have added a horizontal transition to a final predicate
noted {\tt answer}.
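The transition generator of section~\ref{old-lpda} can be sketched as follows (our reconstruction in Python, not the prototype of \cite{stagex}; transitions are built as plain strings, and the completed clause's variables are renamed apart with a prime where the prototype printout uses fresh names such as {\tt X4}).

```python
def compile_top_down(clauses):
    """Emit the top-down LPDA transitions for a DC program given as a list
    of (head, body, vars): head and body atoms are strings, vars is the
    comma-separated variable vector t_k of clause k ("" if none)."""
    def pos(k, i, vs):                       # position literal nabla.k.i(t_k)
        return f"nabla.{k}.{i}({vs})"
    def primed(vs):                          # rename t_k apart for pops
        return ",".join(v + "'" for v in vs.split(",")) if vs else ""
    ts = [f"dollar0() -> {pos(0, 0, clauses[0][2])} dollar0()"]   # init push
    for k, (head, body, vs) in enumerate(clauses):
        ts.append(f"{head} -> {pos(k, 0, vs)}")                   # selection
        for i, goal in enumerate(body):                           # calls
            ts.append(f"{pos(k, i, vs)} -> {goal} {pos(k, i, vs)}")
        done = pos(k, len(body), primed(vs))                      # pops
        for k2, (_, body2, vs2) in enumerate(clauses):
            for i in range(len(body2)):
                ts.append(f"{done} {pos(k2, i, vs2)} -> {pos(k2, i + 1, vs2)}")
    return ts

# The example DC program:  q(f(f(a))).   q(X1) :- q(f(X1)).   query q(X2).
clauses = [
    ("query(X2)", ["q(X2)"], "X2"),      # g0: the query clause
    ("q(f(f(a)))", [], ""),              # g1
    ("q(X1)", ["q(f(X1))"], "X1"),       # g2
]
for t in compile_top_down(clauses):
    print(t)
```

On this program the generator reproduces, up to variable renaming and ordering, the transitions of the prototype printout, e.g. {\tt q(X1) -> nabla.2.0(X1)} and {\tt nabla.1.0() nabla.2.0(X1) -> nabla.2.1(X1)}.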
\section{Other linguistic formalisms
}
Pereira and Warren have shown in their classical paper~\cite{definite-clause}
the link between CF grammars and DC programs.
A similar approach may be applied to more complex formalisms
than CF grammars, and we have done so for Tree Adjoining
Grammars (TAG)~\cite{lang-tag6}.
By encoding TAGs into DC programs, we can specialize to TAGs the
above results, and easily build TAG parsers (using at least the general
optimization techniques valid for all DC programs).
Furthermore, control mechanisms akin to the agenda of chart parsers,
together with some finer properties of LPDA interpretation,
allow precise control of the parsing process and yield
Earley-like left-to-right parsers with O($n^6$) complexity.
\vspace{1ex}
We expect that this approach can be extended to a variety of other
linguistic formalisms, with or without unification of feature structures,
such as head grammars, linear indexed grammars, and combinatory categorial
grammars. This is indeed suggested by the results
of Joshi, Vijay-Shanker and Weir that relate these formalisms
and propose CKY or Earley parsers for some of them
\cite{jvw87,vw89}.
\vspace{1ex}
The parse forests built in the CF case correspond to proof forests
in the Horn case. Such proof forests may be obtained by the
same techniques that we used for CF parsing \cite{lang-shared}.
However it is not yet fully clear how parse trees or derivation trees
may be extracted from the proof forest when DC programs are used
to encode non-CF syntactic formalisms.
On the basis of our experience with TAGs,
we conjecture that for non-CF formalisms, the proof forest obtained
corresponds to derivation forests (i.e. containing derivation trees
as defined in~\cite{jvw87}) rather than to forests representing the
possible superficial syntactic structure of object trees.
\section{Concluding Remarks
}
Our understanding of syntactic structures and parsing may be considerably
enhanced by comparing the various approaches in similar formal terms,
even though they may differ in power and/or superficial appearance.
Hence we attempt to formally unify the problems in two ways:
\begin{itemize}
\item
by considering all formalisms as special cases of Horn clauses;
\item
by expressing all parsing strategies with a unique operational
device: the pushdown automaton.
\end{itemize}
Systematic formalization of problems often considered to be pragmatic
issues (e.g. structure and construction of
parse forests) has considerably improved our
understanding and has been an important success factor.
It is our belief that such formalization is essential to
harness the algorithmic intricacies of language processing,
even if the expressive power of a formal system cannot
fully cover the richness of natural language constructions.
The links established with problems in other areas of
computer science (e.g. partial evaluation, database recursive queries)
could also be the source of interesting new approaches.
\vspace{5ex}
\noindent
{\Large \bf Acknowledgements}
\vspace{2ex}
This work has been partially supported by the Eureka Software Factory
(ESF) Project.