A Network-Based Approach to Text Handling for the Online Scientific Community

Randall Trigg

Department of Computer Science
University of Maryland
College Park MD 20742

November 1983

Ph.D. dissertation
University of Maryland Technical Report, TR-1346
University Microfilms #8429934

Copyright © Randall H. Trigg

This research was supported by the National Science Foundation, the Air Force Office of Scientific Research and NASA Goddard Space Flight Center under grants MCS-8219507, AFOSR-82-0303 and NAS-25764 respectively.

CHAPTER 4: A Taxonomy of Link Types

One of the most important features of the Textnet approach is the extraction of semantic content from text by making the relationships between nodes explicit. This is accomplished by joining the nodes with typed links. In this chapter, we outline a proposed taxonomy of primitive link types.

We should first note that there is an alternative to such a fixed set of link types. One can imagine a facility whereby users are allowed to define new link types. Though this does provide for extensibility, it seems unlikely that scientific writing is evolving at such a rate that new link types are regularly discovered. Rather, we feel that the set of link types is fairly static. For this reason and the following practical considerations, we decided not to provide such a facility.

(1) Explosion of link types: Without restrictions, users could flood the system with unmanageably many new link types.

(2) Reader confusion: It seems unlikely that the choice of link type name by the creator would be sufficient to convey the meaning of this new link to future readers. This in turn could lead to misuse of the new link type by later critiquers.

(3) System confusion: The semantics of some link types are partially understood by the system (see Section 9.7). Creators of new types would have to somehow define the type to the system at least in so far as such special features are concerned.

If our list of primitive types is sufficiently rich then users will find among these primitives the type they need in the vast majority of cases. At worst, users can propose new types (usually subtypes of existing ones) to the system for future inclusion. Usually, however, users simply annotate the link itself with further descriptive information (by building a chunk connected to the link in question).

In what follows, we first describe a simple division of link types into two categories: normal and commentary links. The second section discusses link directionality and its relationship to link type semantics. A proposed division of the major functions of a work is presented in the third section. Finally, the remainder of the chapter describes each link type in turn using these work functions.

4.1. Classifying Link Types

At the highest level, link types fall into two main categories: normal and commentary links. Normal links serve to connect nodes making up a scientific work as well as to connect nodes living in separate works. (Notice that the notion of a "separate" work loses much of its meaning in a Textnet environment. Works are quickly linked and intertwined into the network as they are read.) Commentary links connect statements about a node to the node in question. Table 4.1 shows the list of link types separated into these two categories. One special link type not shown is the Child link. As explained in Section 3.4, Child links connect toc nodes to their children (either tocs or chunks).

As an example of the difference between normal and commentary links, consider the link types Support and Supportive. A Support link, say, "A supports B," offers new information or arguments in A supporting the ideas and arguments presented in B. A Supportive link, on the other hand, simply connects a positive, affirming statement to the node on which it comments.

Recalling the types of motion described in Section 3.4, we find another difference between normal and commentary links. Almost invariably, commentary links serve as side links rather than train of thought links. (Of course readers can later build paths which include commentary nodes, but this will generally not be the case for the original author's intended path.) On the other hand, normal links tend to be along the train of thought with the notable exceptions of citations and certain special cases of other normal links (see Section 4.4).

For both normal and commentary link types, instances of links to particular nodes of a work can scale up naturally to groups of nodes, chapters of a document, entire works, or even groups of works. Consider, for example, the link type E-vacuum, used to suggest that a node should be linked to related work in the field. This same type can be used for critiquing a single chunk, a node corresponding to an entire work, or even an "area" node corresponding to a scientific field (see Section 9.2). For the most part, in the listing of link types below, we make no restrictions on the "size" of the linked nodes.

Normal link types
Citation
C-source
C-pioneer
C-credit
C-leads
C-epon

Background
Future

Refutation
Support

Methodology
Data

Generalization/Specialization
Abstraction/Example
Formalization/Application

Argument
A-deduction
A-induction
A-analogy
A-intuition

Solution

Summarization/Detail
Alternate-view
Rewrite

Simplification/Complication
Explanation

Correction
Update

Continuation

Commentary link types
Comment
Critical
Supportive

Environment
E-comment
E-misrepresent
E-vacuum
E-ignored
E-Isupersede
E-Irefute
E-Isupport
E-Irepeat

Problem Posing
P-comment
P-trivial
P-unimportant
P-impossible
P-ill-posed
P-solved
P-ambitious

Points
Pt-comment
Pt-trivial
Pt-unimportant
Pt-irrelevant
Pt-redherring
Pt-contradict
Pt-dubious
Pt-counter
Pt-inelegant
Pt-simplistic
Pt-arbitrary
Pt-unmotivated

Arguments
A-comment
A-invalid
A-insuff
A-immaterial
A-mislead
A-alternate
A-strawman

Data
D-comment
D-inadequate
D-dubious
D-ignored
D-irrelevant
D-inapplicable
D-misinterpreted

Style
S-comment
S-boring
S-unimaginative
S-incoherent
S-arrogant
S-rambling
S-awkward

Table 4.1: Link types.
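As an illustration, the two categories of Table 4.1 can be sketched as a fixed lookup structure. The following Python fragment is not part of Textnet. The names Category, LINK_TYPES and category_of are hypothetical, and only a sample of the types from the table is transcribed. Rejecting unknown names mirrors the decision above not to let users define new link types.

```python
from enum import Enum


class Category(Enum):
    NORMAL = "normal"
    COMMENTARY = "commentary"


# A partial transcription of Table 4.1 (a sample of each group only).
# The prefix "A-" is shared by normal argument links (A-deduction) and by
# commentary links on arguments (A-invalid), so the prefix alone does not
# determine the category; an explicit table is used instead.
LINK_TYPES = {
    Category.NORMAL: {
        "Citation", "C-source", "C-pioneer", "Background", "Future",
        "Refutation", "Support", "Methodology", "Data",
        "Argument", "A-deduction", "A-induction", "Solution",
        "Rewrite", "Explanation", "Correction", "Update", "Continuation",
    },
    Category.COMMENTARY: {
        "Comment", "Critical", "Supportive",
        "E-comment", "E-vacuum", "E-ignored",
        "P-comment", "P-trivial", "Pt-comment", "Pt-dubious",
        "A-comment", "A-invalid", "D-comment", "D-dubious", "S-comment",
    },
}


def category_of(link_type: str) -> Category:
    """Return the category of a known link type; raise KeyError otherwise."""
    for category, names in LINK_TYPES.items():
        if link_type in names:
            return category
    raise KeyError(f"unknown link type: {link_type}")
```

With this table, category_of("Support") yields the normal category while category_of("Supportive") yields the commentary category, matching the distinction drawn in Section 4.1.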

4.2. Link Directionality and Semantics

The notion of link directionality was first mentioned in Section 3.4. Recall that the physical direction of a link defines the manner in which readers are expected (by the link's author) to follow the link. Readers of a link directed from A to B, for example, are expected to proceed to B after having read A.

There is another kind of link direction dictated by the link's type. We call this the semantic direction of a link. For example, a Refutation link connecting A and B can be read "A refutes B" and thus the semantic direction is from A to B. However, the physical direction can differ. The refuting author may wish readers to see refutation A only after having read the refuted node B.

In fact, it is generally the case that for commentary links, the physical direction is opposite to the semantic direction. Consider, for example, a comment link of the form "C comments on N." Semantically, the link points from C to N, while the physical direction usually goes the other way. This is because it is rare (though possible) to read the commentary C before reading its object N.

Normal links, however, do not exhibit this uniformity. For example, the Solution link type's semantic direction goes from solver to problem; "A solves B." In some cases, the physical direction also goes from A to B, say, when the author notes that the point A just made provides a solution to the long-standing problem B. On the other hand, in the act of problem posing (described further below), problems usually precede their solutions.

In fact, for several normal link types, we provide companion types differing only in semantic direction. These pairs appear at the top of the second column of Table 4.1. For example, for the link pair Simplify and Complicate, we have that "A simplifies B" is equivalent to "B complicates A."

To summarize, the prospective author of a link must make two choices. One is the link's type (implying a semantic direction). The other is the physical direction of the link. The latter captures the manner in which readers are expected to follow the link.

We should note that the above formulation still leaves one possible source of confusion. Given, say, a Solution link connecting A to B, it may not be clear whether this is to be read "A solves B" or "A solved-by B." Though readers can probably disambiguate from context, the system has no such ability. In the future, Textnet may require an additional tag specifying whether the semantic direction is along the physical direction or reversed.
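The two directions can be made concrete with a small sketch. It assumes a link record that stores its endpoints in physical order plus a boolean tag of the kind suggested above as a possible future addition. The class Link, its field names and the method semantic_pair are hypothetical and are not taken from the Textnet implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    link_type: str   # e.g. "Solution" or "Comment"
    source: str      # node the reader starts from (physical direction)
    target: str      # node the reader is expected to visit next
    # Hypothetical tag: True when the semantic direction runs against
    # the physical direction.
    reversed_semantics: bool = False

    def semantic_pair(self):
        """Return (subject, object) so that 'subject <type> object' holds."""
        if self.reversed_semantics:
            return (self.target, self.source)
        return (self.source, self.target)


# "C comments on N": readers move from N to C, but semantically C points at N.
comment = Link("Comment", source="N", target="C", reversed_semantics=True)

# The author notes that point A, just made, solves long-standing problem B.
solution = Link("Solution", source="A", target="B")
```

Here comment.semantic_pair() gives ("C", "N") even though the reader travels from N to C, while for solution the two directions coincide. Storing the tag explicitly is one way to give the system the disambiguating information that readers otherwise recover from context.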

4.3. Functions of a Work

For the purposes of further classifying normal and commentary links, a simple division of the functions of a work is beneficial. We assume here that the goal of a work is to communicate information and/or beliefs to the reader. Each part of the work serves a different function in this regard. Among the possible functions of a work are the following.

(1) Specifying context. This involves connecting the nodes of a work to existing related literature. We call this related literature the work's environment. Links of this sort (citation links) specify the manner in which nodes in a work relate to external nodes and are a natural extension of the notion of bibliographic reference. For example, a citation link could point to an important landmark work in an area or to a relevant experimental study present in the literature.

(2) Problem posing. The posing of a problem may or may not be followed by the presentation of a solution. In technical papers such problem posing usually occurs early and the paper itself comprises the proposed solution. Often, however, authors will pose problems at the end of their paper for which they currently have no solution. Such "needed work" is often a form of solicitation to the community (see Section 8.4).

(3) Theory declaration. In this mode, the author states a work's thesis. Of course this occurs throughout the work and in parallel with the other functions.

(4) Arguments. Here, the author attempts to argue from a set of premises to a set of conclusions. There are several types of arguments, some of which can be differentiated using links.

(5) Data. Some works present data as evidence for a theory. To draw on data existing in a previous work, the author simply links to the appropriate node in that work.

It should be emphasized that these categories are not intended as clean divisions of activities found in a work. Often the activities occur in parallel during the course of the work. Furthermore, the same piece of text can serve several functions. For example, an argument can serve to specify some portion of the author's thesis. Also, as we have seen, presenting data as evidence for a theory may require a citation link, thus further delineating the work's context. Finally, imagine a work W1 commented on by node N. And suppose that N poses a problem which is solved by a new work W2. Then, one might form a link from N to W2. Such a link could be considered as specifying context, as a link from problem posing to solution, or as response to criticism.

In what follows, the above division of a paper's activities will be referred to when discussing both normal and commentary links.

4.4. Normal Links

In this section, the category of normal (or non-commentary) links is discussed. Some of these link types correspond to the functions of a work described in the last section while others are more generally applicable.

We start with link types serving to delineate different parts of a work. As described in Section 4.3, the main parts of a work include context specification, problem posing, theory specification, arguments and data. These categories overlap quite a bit, but provide an adequate framework for our purposes of link type classification. We first cover the link types used in context specification, i.e. those connecting a node, often a whole paper, to its external environment.

Most authors expend some effort casting their work within an environment of related work. Certain kinds of papers (e.g. surveys) in fact exhibit a preponderance of such external references. In the Textnet system, such references are simply links to nodes of other works and are called citation links. The list of citation link types given below was derived from a typology of reasons for citation given in [Garfield 64]. Note that some of the link types given below need not point to other works. For example, an author can compose a background section while writing the original work, rather than linking (via Background) to someone else's. In general, those link types prefaced by "C-" are most likely to point to external works.

Citation: A general purpose citation link.

C-source: Gives the source of concepts and ideas in order to enable checking and authenticating of data and claims of facts, physical constants, etc.

C-pioneer: Pays homage to pioneers. This is similar to C-source though broader in scope, i.e. one cites the work of a pioneer in a field though the cited work may not be directly relevant.

C-credit: Gives credit for related work (homage to peers).

C-leads: Provides leads to uncited or unpublicized work.

C-epon: Identifies original work describing an eponymic concept or term, e.g. Hodgkin's disease, Pareto's law, Friedel-Crafts reaction.

Background: Provides background, pointing to nodes by other authors (often entire works) or to nodes by the same author (often part of the present work, e.g. a toc labeled "Background").

Future: Alerts to forthcoming work. One way this can be done is to point to a childless toc node under which the future work will eventually be linked. (For another solution using "dangling" links, see Section 8.4.2.)

Refutation: Refutes the work or ideas of others (negative claims).

Support: Supports or substantiates the claims, ideas, and work of others.

Providing such a set of citation links makes the job of an automatic bibliography generator fairly straightforward. The bibliography for a work is simply the set of external works (or nodes within works) pointed at by citation links. Note, however, that it may be desirable for bibliographies to include other linked external nodes.
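A minimal sketch of such a bibliography generator follows. It assumes, purely for illustration, that node identifiers have the form "work/node" so that the owning work can be read off the identifier. The names Link, CITATION_TYPES, work_of and bibliography are hypothetical. The set of citation types is the list given above.

```python
from typing import Iterable, List, NamedTuple


class Link(NamedTuple):
    link_type: str
    source: str   # node id, written "work/node" (an assumed convention)
    target: str


# The citation link types listed in this section.
CITATION_TYPES = {
    "Citation", "C-source", "C-pioneer", "C-credit", "C-leads", "C-epon",
    "Background", "Future", "Refutation", "Support",
}


def work_of(node_id: str) -> str:
    """Return the work part of a 'work/node' identifier."""
    return node_id.split("/", 1)[0]


def bibliography(work: str, links: Iterable[Link]) -> List[str]:
    """Collect the external nodes pointed at by citation links leaving `work`."""
    cited = {
        link.target
        for link in links
        if link.link_type in CITATION_TYPES
        and work_of(link.source) == work
        and work_of(link.target) != work
    }
    return sorted(cited)


links = [
    Link("C-source", "w1/intro", "w2/results"),
    Link("Background", "w1/intro", "w1/background"),  # internal, so skipped
    Link("Continuation", "w1/intro", "w1/method"),    # not a citation link
    Link("Support", "w1/method", "w3/theorem"),
    Link("Data", "w1/method", "w4/table"),            # external, not a citation
]
print(bibliography("w1", links))
```

The Data link to w4 is deliberately left out by this sketch. Whether such non-citation external links should also appear in the bibliography is the open choice mentioned above.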

Note that not all links to external nodes are citations. An author can simply include (by linking) nodes from another work and/or composed by another author. Unlike citations, these correspond to direct quotes. The "lifted" nodes then function like any other nodes in the work.

Two other link types that sometimes point to external works are Methodology and Data.

Methodology: Identifies methodology, equipment, etc.

Data: A link connecting to a node containing data of some sort. If the author is drawing on data existing in a previous work, the link is to the appropriate node in that work.

We now discuss in turn the areas of thesis specification, arguments and problem posing.

The next set of link types consists of matched pairs of opposites often used in thesis specification. Such pairs were introduced earlier in Section 4.2. For example, we consider moving from point P1 to P2 by generalizing to be opposite to moving from one to the other by specializing. In most cases one can substitute for a link of one type from P1 to P2 a link of its opposite type from P2 to P1, and have changed only the direction of the train of thought.

Generalize & Specialize.

Abstraction & Example.

Formalize & Apply: This pair refers to the twin acts of formalizing a set of notions to make a theory and applying a theory to obtain practical results.
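The substitution of a link by its opposite can be sketched as follows. The table OPPOSITE holds only the three pairs just listed, and the function name flip and the tuple representation of a link are hypothetical conveniences for this illustration.

```python
# The three matched pairs listed above; each type maps to its opposite.
_PAIRS = [
    ("Generalize", "Specialize"),
    ("Abstraction", "Example"),
    ("Formalize", "Apply"),
]

OPPOSITE = {}
for first, second in _PAIRS:
    OPPOSITE[first] = second
    OPPOSITE[second] = first


def flip(link_type, source, target):
    """Return the link of the opposite type with its endpoints swapped."""
    return (OPPOSITE[link_type], target, source)


print(flip("Generalize", "P1", "P2"))
```

Applying flip twice returns the original link, which reflects the claim that the substitution changes only the direction of the train of thought and not the relationship expressed.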

Argument: This is a general link type meant to connect the premises of an argument to its conclusions. In fact, the argument will often require explanatory text (usually interspersed with the premises), thus the link may actually be from nodes containing premises together with the text of the argument to the conclusions.

A-deduction: This link type signifies that the argument uses deduction.

A-induction: Here, induction is the type of argument, usually implying that the argument is from examples to the general case.

A-analogy: The argument is by analogy to a similar situation.

A-intuition: Here, the author appeals to the reader's intuition.

Solution: This link connects the posing of a problem with the presentation of its solution. In some cases, however, authors will pose problems (often at the end of a paper) for which they currently have no solution. Such "needed work" is often a form of solicitation to the community (see Section 8.4). In such a case, a later author could continue the research by writing a paper containing a solution to this problem. A Solution link could then be used to point to the "problem posing" nodes in the earlier paper. (See also the description of the Future link.)

The next set of link types is used to connect pairs of nodes, only one of which normally need be read. For example, if chunk C2 is a rephrasing of C1, there is usually no need to read both.

Summarization & Detail: The ideas in one node are summarized (given in greater detail) in the other.

Alternate-view: The ideas are viewed in a new way.

Rewrite: Here, the ideas are the same, but the wording has been changed.

Finally, we list and describe the remaining normal link types.

Explanation: Provides an explanation for some part of the node's text.

Simplification & Complication: Offers a simplified look at part of the node or adds complicating factors to the points provided in the node.

Update: A link to new information bringing a node up to date, or to a new node that incorporates the new information.

Correction: A link to a correction.

Continuation: Connects two nodes which follow one another in the current train of thought.

4.5. Commentary

Every work in the Textnet system is enclosed in a "commentary cloud." The nodes making up this cloud consist of comments by both the work's author and others. Over time, both the work and its comments evolve; the work is augmented, critiquers add to the comments, the author responds by changing the work, readers in turn delete their comments, and so on.

As we will see, the set of commentary link types is quite rich. This is in part because criticism is often directed at one of many tacit assumptions inherent in the critiqued node. For example, a standard citation appealing to an outside source for evidence contains several hidden claims: that the evidence is adequate, valid, relevant, properly interpreted, and that other existing evidence to the contrary has been properly addressed. Accordingly readers can criticize the citation by disputing any of those assumptions.

We should note here that the organization of the original work imposes no real restrictions on the critiquing possible by readers. If several arguments and conclusions are contained in one chunk rather than being broken into several nodes, then the critiquer merely has to be careful to specify the particular point under discussion.

We begin with some general purpose commentary links.

Comment: This is the generic comment link. It is usually by a reader (or referee), but can also be used by a work's author to offer comments on his own work. This subsumes all other links appearing in this section.

Critical: The general purpose critical link.

Supportive: The general purpose supportive or affirming link.

Referring to the framework given in Section 4.3 describing a work, recall that a work can specify context, pose problems, state a thesis, give arguments, and provide data. Each of these can serve as the object of criticism. We consider each in turn.

4.5.1. Environment

As we have seen, a work lives in an environment of related work. It should start out well placed in the environment, refer to this literary context when appropriate, and end with conclusions which significantly affect the field. Readers can criticize a work's relation to its environment in any of the following ways.

E-comment: General comment on a node's relationship (or lack of one) to its environment.

E-misrepresent: A cited node is misrepresented. For example, the critiquer claims that the author misconstrued the cited work as being supportive when in fact it is irrelevant to the topic or even refutes the author's arguments. This link type can also be used to critique an author's priority claims.

E-vacuum: This work needs to be placed properly in its environment. It requires more citing of previous work.

E-ignored: A particular work W should have been cited, but wasn't. This is a special case of E-vacuum. In order to use this link, a pointer to a node N of W must be furnished. This can be accomplished in one of two ways. (1) The E-ignored link directly connects the critiqued work to N. Further comments (in the form of a chunk), if necessary, are linked to the E-ignored link itself as annotation. (2) The E-ignored link connects the critiqued work to a chunk containing a further explanation of the reader's complaints. This chunk in turn is linked to N. The next four link types are special cases of E-ignored.

E-Isupersede: The ignored work W supersedes this work.

E-Irefute: The ignored work W refutes this work.

E-Isupport: The ignored work W supports this work.

E-Irepeat: This work repeats what was done in the ignored work W.
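The "special case of" relationships among the Environment link types can be sketched as a parent table. The text states that E-ignored is a special case of E-vacuum, that the four E-I types are special cases of E-ignored, and that Comment subsumes every commentary link. Placing E-vacuum and E-misrepresent directly under E-comment is an assumption made for this sketch, and the names PARENT, ancestors and is_special_case are hypothetical.

```python
# Parent of each link type in the "special case of" relation.
# Only the Environment group is transcribed. Putting E-misrepresent and
# E-vacuum under E-comment is an assumption of this sketch.
PARENT = {
    "E-comment": "Comment",
    "E-misrepresent": "E-comment",
    "E-vacuum": "E-comment",
    "E-ignored": "E-vacuum",
    "E-Isupersede": "E-ignored",
    "E-Irefute": "E-ignored",
    "E-Isupport": "E-ignored",
    "E-Irepeat": "E-ignored",
}


def ancestors(link_type):
    """Return the chain of more general types, nearest first."""
    chain = []
    while link_type in PARENT:
        link_type = PARENT[link_type]
        chain.append(link_type)
    return chain


def is_special_case(link_type, general):
    """True if `link_type` equals `general` or is subsumed by it."""
    return link_type == general or general in ancestors(link_type)


print(ancestors("E-Irefute"))
```

One possible use of such a table is retrieval: a reader asking for all E-vacuum criticisms of a work could also be shown the E-ignored links and their four special cases.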

4.5.2. Problem Posing

Works also pose problems (and often propose solutions). Such a problem can be criticized as being trivial, impossible, unimportant, ill-posed, already solved, or too difficult.

P-comment: General comment on the problem.

P-trivial: Problem is trivial as stated.

P-unimportant: Problem is unimportant (or uninteresting) for researchers in this field.

P-impossible: Problem is logically impossible as stated.

P-ill-posed: Problem is badly stated.

P-solved: Problem has previously been solved.

P-ambitious: The author is overly ambitious. The problem is too difficult at present.

4.5.3. Statement of Thesis

The next function of a work is the statement of a thesis. Often, this is done throughout the work by stating and arguing for certain points. The next set of link types criticize a particular point or set of points. Often, the same link types can be used to attack the entire thesis or premise of the work.

Pt-comment: General comment on the point.

Pt-trivial: The point is trivial.

Pt-unimportant: Who cares? The point is unimportant, uninteresting.

Pt-irrelevant: The point is irrelevant to the task at hand.

Pt-redherring: The point is not only irrelevant, but distracting from the main issue. This is stronger than Pt-irrelevant.

Pt-contradict: The point contradicts another point made by the same author ("double-talk").

Pt-dubious: The point is of doubtful validity.

Pt-counter: The point is disputable by way of a counterexample.

Pt-inelegant: The point (or theory) is overly involved or intricate.

Pt-simplistic: The point (or theory) is inadequate; doesn't explain all the evidence.

Pt-arbitrary: The point (or theory) is unsupported, speculative or ad hoc.

Pt-unmotivated: The point is a non sequitur. Or the entire theory is unmotivated. Why do it this way?

4.5.4. Argument

The next function of a work is to argue in favor of its thesis. Recall that the type of argument can be any of deductive, inductive, by analogy, intuition, etc. These arguments can be criticized.

A-comment: General comment on an argument.

A-invalid: General purpose criticism of an argument.

A-insuff: The argument is insufficient to reach the intended conclusion.

A-immaterial: The argument is not to the point, i.e. valid but reaches a different conclusion than the desired one.

A-mislead: The argument is misleading, i.e. has surface validity, but the deep implications are wrong. Alternatively, the argument may lead to other conclusions (not stated by the author) which are untenable.

A-alternate: Here is an alternative argument resulting in a contradictory conclusion. This is often used when the original argument was by analogy. In that case, a new analogy is given leading to an antithetical conclusion.

A-strawman: An imaginary, unrealistic adversary has been set up for easy refuting.

4.5.5. Data

Finally a work often relies on data in some form. For example, in psychology this is often data from an experiment. An historian might appeal to the preserved personal letters of a past political figure while a sociologist might use census data. Again, there are several ways in which this data can be criticized.

D-comment: General comment on the data.

D-inadequate: Data is inadequate or incomplete.

D-dubious: Data is of dubious validity.

D-ignored: Certain other data (perhaps used in another work) has been ignored. This is similar to E-ignored.

D-irrelevant: Data is irrelevant to the task at hand.

D-inapplicable: Data is inapplicable to the task at hand.

D-misinterpreted: Data was misinterpreted, misconstrued or taken out of context. This is more often applicable when the data has been borrowed from another work.

4.5.6. Style and Attitude

Occasionally, a reader will criticize the author's style or attitude. Style comments are often furnished by proof readers of early drafts. A sampling of link types in this category follows.

S-comment: The general purpose style comment.

S-boring, S-unimaginative, S-incoherent, S-arrogant, S-rambling, S-awkward: These all refer generally to the overall style or attitude of the critiqued piece of text. Readers correcting spelling and grammar will usually rewrite a portion of the node linking to the new version with a link (see Section 4.4).

4.5.7. Supportive Links

At first glance it may seem strange that there are so many more critical link types than supportive ones. The reason for this stems from the relative information yield. A reader seeing a supporting comment gains very little substantive knowledge (beyond learning that the original author has at least one backer). On the other hand, a critique must convey a reason for the disagreement and so imparts to readers another side to the story. Thus it stands to reason that critical links require the richest structure. In fact, the best support one can give a work in general is to link it in some manner into one's own work.

Nonetheless, readers can offer supportive comments using either the general-purpose Supportive link type or one of the more specialized E-comment, P-comment, Pt-comment, A-comment, D-comment, and S-comment link types.

