Date: Tue, 25 Jan 2000 19:51:17 -0800
From: Eric Armstrong <eric.armstrong@eng.sun.com>
Reply-To: unrev-II@onelist.com
To: unrev-II@onelist.com
Subject: DKR: Document Management Requirements, v0.1
Overview
This is a lengthy document aimed at adducing the requirements for a subset of an eventual Dynamic Knowledge Repository (DKR). The subset described is for a Collaborative Document System (CDS). The goal of this document is to show how such a system fits into a DKR framework, and detail its requirements.
The version number of this document (v0.1) represents the early stage of the process.
This document has the following sections:
A fully functional DKR will need to manage many different kinds of things:
Since the general outline of a DKR seems to depend on the problem domain it is targeted for, it seems reasonable to focus attention on the elements they have in common.
This set of requirements will focus on what is perhaps the major common feature: Documents -- in particular, Collaborative Documents.
Other important areas that will need attention include the integration of multimedia objects (including animations, simulations, audio, video, and the like) as well as the critical functions of abstract knowledge representation, inference engines, model-building functions, and the integration of other executable programs. But here, we'll focus on Collaborative Documents.
A wide variety of email and forum-based discussions occur on a host of topics every day. In each of these discussions, important information frequently surfaces, but that information is hard to capture where you need it.
Document production systems, on the other hand, simplify the task of creating complex documents but make it hard to gather and integrate feedback.
For example, the DKR discussions have identified several possible starting points for such a system. That kind of feedback occurs naturally in an email system, as opposed to a document production system, but each of the pointers was buried in a separate email. It required a lengthy search to gather them together (below), and the list may not even be complete!
To act as a foundation for a DKR, a Collaborative Document System (CDS?) needs to combine the best features of:
In the DKR discussion, we've seen pointers to several possible starting points for such a system:
Note: We don't need the app, but Augment's requirements documents would be *highly* desirable.
http://www.infoloom.com/ http://www.topicmaps.com/
http://www.dmtf.org/spec/cim_spec_v22/#_Toc453584954
www.mindmanager.com
Given the lengthy list above, the difficulty of creating it, and the rapidity with which it will go out of date, several requirements for the DKR suggest themselves immediately. In particular, it needs to be composed of information nodes that are hierarchical, mailable, linkable and evaluable (more on those subjects in a moment).
Each of those requirements leads in turn to other requirements.
The major requirements are listed here and explained below:
This message should exist in outline form. It should be easy to add entries to the list of starting points, and to remove them, as more information is gained. However, the hierarchy should function using XML-style "entity references" that copy the target contents into the displayed document, "inline". The result is effectively a lattice of information nodes.
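As a rough illustration of the inline-reference idea, here is a minimal Python sketch. The names `Node`, `Ref`, `render`, and the node ids are hypothetical, invented only for this example; nothing here is a proposed design. It shows one shared node being copied "inline" into two different outlines, which is what turns the tree into a lattice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Ref:
    """Hypothetical stand-in for an XML-style entity reference to another node."""
    target_id: str


@dataclass
class Node:
    node_id: str
    text: str
    # A child is either a node owned by this parent or a reference to a shared node.
    children: List[Union["Node", Ref]] = field(default_factory=list)


def render(node: Node, store: Dict[str, Node], depth: int = 0,
           _path: Tuple[str, ...] = ()) -> List[str]:
    """Return the outline as indented lines, copying referenced nodes inline."""
    if node.node_id in _path:
        raise ValueError(f"reference cycle at {node.node_id}")
    lines = ["  " * depth + node.text]
    for child in node.children:
        target = store[child.target_id] if isinstance(child, Ref) else child
        lines.extend(render(target, store, depth + 1, _path + (node.node_id,)))
    return lines


# One shared list, referenced from two documents (illustrative data only).
shared = Node("starting-points", "Possible starting points")
store = {shared.node_id: shared}
doc_a = Node("requirements", "CDS requirements", [Ref("starting-points")])
doc_b = Node("survey", "Survey of tools", [Ref("starting-points")])

# Adding an entry once makes it appear in both rendered documents.
shared.children.append(Node("topic-maps", "Topic maps"))
print("\n".join(render(doc_a, store)))
```

Because the reference is resolved at display time, neither document holds its own copy of the list, so the list cannot drift out of date in one place and not the other.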
Although "hard" links to objects will be needed at times, in most cases the link to the "Requirements Document" should be a "soft" link -- that is, an indirect link that points to the latest version.
That means never having to worry about looking at an old version of the spec.
Each node in the hierarchy needs to be versioned, so that previous information is available. In addition, the task of displaying differences becomes essentially trivial.
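To make the soft-link, hard-link, and versioning points concrete, here is a small sketch in Python. The class `VersionedNode` and its methods are assumptions made up for illustration. A lookup with no version number plays the role of a "soft" link, a lookup with an explicit number plays the role of a "hard" link, and the diff uses the standard `difflib` module.

```python
import difflib
from typing import List, Optional


class VersionedNode:
    """Keeps every version of a node's text; versions are numbered from 1."""

    def __init__(self, text: str) -> None:
        self._versions: List[str] = [text]

    @property
    def latest(self) -> int:
        return len(self._versions)

    def update(self, text: str) -> int:
        """Store a new version and return its number."""
        self._versions.append(text)
        return self.latest

    def text(self, version: Optional[int] = None) -> str:
        # version=None acts like a "soft" link: it always yields the latest text.
        # An explicit number acts like a "hard" link to that exact version.
        v = self.latest if version is None else version
        if not 1 <= v <= self.latest:
            raise KeyError(f"no such version: {v}")
        return self._versions[v - 1]

    def diff(self, old: int, new: Optional[int] = None) -> List[str]:
        """Return only the added ('+ ') and removed ('- ') lines."""
        a = self.text(old).splitlines()
        b = self.text(new).splitlines()
        return [ln for ln in difflib.ndiff(a, b) if ln[:2] in ("- ", "+ ")]


spec = VersionedNode("Nodes are mailable.")
spec.update("Nodes are mailable.\nNodes are linkable.")
print(spec.text())    # soft link: latest version
print(spec.text(1))   # hard link: version 1
print(spec.diff(1))   # changes from version 1 to the latest
```

Since each node carries its own history, a difference display only ever has to compare two short texts, which is why the task becomes nearly trivial.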
Mailable
It must be possible to "publish" the whole document or sections of it by "posting" it. It must also be possible to create replies for individual sections, and then "post" them all at one time.
Distributed
Rather than using a central "repository", the system should employ the major strengths of email systems, namely: fast access on local systems and the robust nature of the system as a result of having redundant copies on many different systems. The system will be more space intensive than email systems, but storage costs are dropping precipitously, and the outlook for future technologies is even brighter.
To mitigate the short-term need for storage space, it should be possible to set individual storage policies. For example, a user will most likely not want to keep previous versions of any documents they are not personally involved in authoring. It must also be possible to add names to the authoring list. Name removal should probably be limited to the original author. For those cases when the original author is no longer part of the system, it should be possible to make a copy of the document and name a new primary author.
When a new version of a document arrives, differences are highlighted. Old-version information becomes accessible through links (if saved). Differences are always against the last version that was visited. If a section of the document was never visited, the most recent version of that section is displayed on the first visit. If several iterations have taken place since the last visit, the cumulative differences are shown. (Again, node-versioning makes this user-friendly feature fairly trivial.)
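The "differences against the last version visited" rule can be sketched as follows. `ReaderView` is a hypothetical name, and representing a node's history as a plain oldest-to-newest list of strings is an assumption made for brevity. The sketch records, per reader, which version of each node was last seen and diffs against that, so several unseen iterations collapse into one cumulative diff.

```python
import difflib
from typing import Dict, List


class ReaderView:
    """Tracks, per node, the last version index this reader actually visited."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def visit(self, node_id: str, versions: List[str]) -> List[str]:
        """Return display lines; `versions` holds the node text, oldest first."""
        latest = len(versions) - 1
        seen = self._last_seen.get(node_id)
        self._last_seen[node_id] = latest
        current = versions[latest].splitlines()
        if seen is None:
            # Never visited: show the most recent version with no change marks.
            return ["  " + line for line in current]
        previous = versions[seen].splitlines()
        # ndiff marks lines with '  ' (same), '+ ' (added), '- ' (removed);
        # its '? ' hint lines are dropped here.
        return [ln for ln in difflib.ndiff(previous, current)
                if not ln.startswith("? ")]


reader = ReaderView()
print(reader.visit("intro", ["a"]))                        # first visit
print(reader.visit("intro", ["a", "a\nb", "a\nb\nc"]))     # two iterations later
```

The second call shows both added lines at once, even though they arrived in separate versions, because the comparison is against what this reader last saw rather than against the immediately preceding version.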
Clearly, support for web links is desirable, as shown by the links to the various possible starting points above. [Note: Each of those should be evaluated against this requirements list, and used to modify these requirements.]
Evaluable
The many possible starting points above highlight the need for evaluability. It should be possible, not only to reply with a comment on any item in those lists, but also to add an evaluation, much as Amazon.com keeps evaluations for books. That feature is arguably their greatest contribution to ecommerce, and the DKR should make use of it. It should also be possible to order list items using relative evaluations. That lets the most promising starting point float to the top of the list. Not all lists should be ordered by evaluation, however!
For example, the sequence of requirements has been chosen to provide the most natural "bridge" from one to the next! So evaluability must be an option.
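A minimal sketch of optional ordering by evaluation, in Python. The names `Item` and `ItemList`, the numeric rating scale, and the use of a simple average are all assumptions for illustration; the requirement itself says nothing about how evaluations are scored.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Item:
    title: str
    ratings: List[int] = field(default_factory=list)  # assumed numeric scores

    def score(self) -> float:
        # Unrated items score 0.0 so they sink below rated ones.
        return mean(self.ratings) if self.ratings else 0.0


@dataclass
class ItemList:
    items: List[Item]
    order_by_evaluation: bool = False  # evaluability is an option, off by default

    def displayed(self) -> List[str]:
        if not self.order_by_evaluation:
            # Keep the author's chosen sequence.
            return [item.title for item in self.items]
        # sorted() is stable, so equally rated items keep the author's order.
        ranked = sorted(self.items, key=lambda item: item.score(), reverse=True)
        return [item.title for item in ranked]


points = [Item("A", [3]), Item("B", [5, 4]), Item("C")]
print(ItemList(points).displayed())
print(ItemList(points, order_by_evaluation=True).displayed())
```

Keeping the flag on the list rather than on the items lets the same items appear in author order in one place and in ranked order in another.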
Collaborative
The system must increase the ability of multiple people, working collaboratively, to generate up-to-date and accurate revisions.
For any given document, there are several classes of interaction:
The 3rd group consists of people who suggest an alternative wording or organization. Those "suggestions" take the form of a modified copy of the original. One of the document authors may then agree to use that formulation in place of the original, or may simply keep it as commentary.
The 4th group consists of the fully-collaborative authoring group. The original author must be able to add other individuals to the document, or to subsections of it. (An author registered for a given node has authoring privileges throughout the hierarchy anchored at that node.)
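The parenthetical rule about authoring privileges can be sketched as a walk up the hierarchy. The class name `Node`, the method `can_edit`, and the user names other than the original author are hypothetical. A user may edit a node if they are registered as an author on that node or on any ancestor of it.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    authors: Set[str] = field(default_factory=set)

    def can_edit(self, user: str) -> bool:
        """True if `user` is an author here or at any node above this one."""
        node: Optional[Node] = self
        while node is not None:
            if user in node.authors:
                return True
            node = node.parent
        return False


doc = Node("document", authors={"eric"})
section = Node("section", parent=doc, authors={"pat"})
subsection = Node("subsection", parent=section)

print(subsection.can_edit("pat"))   # registered at the enclosing section
print(doc.can_edit("pat"))          # not registered at the document level
```

Storing the registration only at the anchoring node means that new subsections added later are covered automatically, with no per-node bookkeeping.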
Every information node that is created should be automatically attributed to its author. When a new version of a node is created, all of the people who sent comments should be contained in a "reviewer" list. When a suggestion is accepted, the author of the suggested node should go into a "contributor" list in the parent node and be added to the "author" list for the current node. It should be possible to identify all of the reviewers, contributors, and authors for the whole document and for each section of it.
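One way to picture the attribution rules is the sketch below. The class `AttributedNode`, its method names, and the sample user names are assumptions for illustration only. Commenters become reviewers when a new version is created, an accepted suggestion credits its author on the node and as a contributor on the parent, and `credits()` gathers the lists for a node and everything beneath it.

```python
from typing import Dict, List, Optional, Set


class AttributedNode:
    def __init__(self, title: str, author: str,
                 parent: Optional["AttributedNode"] = None) -> None:
        self.title = title
        self.parent = parent
        self.children: List["AttributedNode"] = []
        self.authors: Set[str] = {author}      # automatic attribution
        self.reviewers: Set[str] = set()
        self.contributors: Set[str] = set()
        self._commenters: Set[str] = set()     # comments since the last version
        if parent is not None:
            parent.children.append(self)

    def comment(self, who: str) -> None:
        self._commenters.add(who)

    def new_version(self) -> None:
        # Everyone who commented on the previous version becomes a reviewer.
        self.reviewers |= self._commenters
        self._commenters = set()

    def accept_suggestion(self, who: str) -> None:
        # Suggester joins this node's authors and the parent's contributors.
        self.authors.add(who)
        if self.parent is not None:
            self.parent.contributors.add(who)

    def credits(self) -> Dict[str, Set[str]]:
        """Collect all three lists for this node and its whole subtree."""
        totals = {"authors": set(self.authors),
                  "reviewers": set(self.reviewers),
                  "contributors": set(self.contributors)}
        for child in self.children:
            for role, names in child.credits().items():
                totals[role] |= names
        return totals
```

Calling `credits()` on the root answers the whole-document question, and calling it on any section answers the same question for that section alone.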
The system must be "open" in the sense that a user is not constrained to using a particular editor, email system, or central server. The specifications for interaction with the system should be freely available, along with a reference implementation to use as a basis. As much as possible, conformance with existing standards (XML, XHTML, HTTP, email) is desirable. (The tricky decisions, of course, will be between required features and standard protocols that don't support them.)
The server and client systems that implement the DKR must also be fully *extensible*. In other words, the same characteristics of hierarchy, versioning, and revisability (use of most recent version) that apply to the documents must apply to the system itself.
That extensibility can be accomplished with a "dispatch table" that names the class to use for each kind of object that needs to be created. In conjunction with open sourcing, that architecture allows a user to extend (subclass) an existing class and then use the extended version in place of the original. In addition, upgrades can occur dynamically while the system is in operation, and extensions that don't work out can be downgraded module by module.
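A dispatch table of this kind might look roughly like the following Python sketch. `DispatchTable`, `TextNode`, and `ShoutingTextNode` are hypothetical names, and keeping a per-kind history of registered classes is just one assumed way to support downgrades.

```python
from typing import Any, Dict, List, Type


class TextNode:
    def render(self, text: str) -> str:
        return text


class ShoutingTextNode(TextNode):
    """A user extension that subclasses the original."""

    def render(self, text: str) -> str:
        return super().render(text).upper()


class DispatchTable:
    """Names the class to use for each kind of object; remembers earlier choices."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Type[Any]]] = {}

    def register(self, kind: str, cls: Type[Any]) -> None:
        # An upgrade is just a new registration; it takes effect for the
        # next object created, with no restart.
        self._history.setdefault(kind, []).append(cls)

    def downgrade(self, kind: str) -> None:
        stack = self._history[kind]
        if len(stack) < 2:
            raise ValueError(f"no earlier class registered for {kind!r}")
        stack.pop()

    def create(self, kind: str, *args: Any, **kwargs: Any) -> Any:
        return self._history[kind][-1](*args, **kwargs)


table = DispatchTable()
table.register("text", TextNode)
print(table.create("text").render("hello"))
table.register("text", ShoutingTextNode)   # dynamic upgrade
print(table.create("text").render("hello"))
table.downgrade("text")                    # the extension didn't work out
print(table.create("text").render("hello"))
```

Callers ask the table for a kind, never for a class by name, so swapping in an extension or backing it out again touches one table entry and no calling code.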
Security in such a system becomes an issue, unfortunately. The system should employ whatever mechanisms exist or can be constructed to help prevent trojan horse attacks, back door attacks, and other security breaches in an open source system.
What follows is an outline of functional operations for the system:
--Add, change, delete, move nodes
--Link (indirect, "soft" links, and direct "hard" links)
--Automatic versioning
--Automatic attribution
..Deliver to group via server
--Since it is possible to receive comments on nodes that have been deleted from the current (not yet published) draft, the system must maintain "phantom" nodes that can be used to collect such comments. Phantom nodes are invisible until a comment is received, and disappear once the current version is posted. The comments themselves are always stored under the original node.
--Each node needs a trash bin that collects nodes which are deleted from under it. Trash bins are never emptied, except by explicit action requiring multiple confirmations.
--The comment/version-publishing system means that locks are not required for single-author documents. But for multiple authors to collaborate, it must be possible to prevent editing conflicts.
--One possibility is to implement distributed locks. The major issue there is handling communication outages.
--An equally viable possibility may be to allow simultaneous edits and detect their occurrence when a new version is received. The competing versions can then be displayed side-by-side with user-selectable merge options.
--Detection of competing versions may require something other than simple version numbers.
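As one possible answer to the last point, here is a sketch in which each version records the identifier of the version it was edited from, rather than a simple counter. The names `version_id` and `History`, and the choice of a content hash as the identifier, are assumptions for illustration. With plain version numbers, two people editing version 1 would both produce "version 2"; with parent pointers, the second arrival is recognizably a sibling rather than a successor.

```python
import hashlib
from typing import Dict, Optional, Tuple


def version_id(parent: Optional[str], author: str, text: str) -> str:
    """Identify a version by hashing its parent id, author, and content."""
    payload = "\x00".join([parent or "", author, text]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class History:
    """One site's view of a node's versions and its current head."""

    def __init__(self) -> None:
        self.parent_of: Dict[str, Optional[str]] = {}
        self.head: Optional[str] = None

    def _is_ancestor(self, ancestor: str, descendant: str) -> bool:
        node: Optional[str] = descendant
        while node is not None:
            if node == ancestor:
                return True
            node = self.parent_of.get(node)
        return False

    def receive(self, parent: Optional[str], author: str,
                text: str) -> Tuple[str, str]:
        """Record an arriving version; return (id, status)."""
        vid = version_id(parent, author, text)
        self.parent_of[vid] = parent
        if self.head is None or self._is_ancestor(self.head, vid):
            self.head = vid            # builds on what we have: accept it
            return vid, "accepted"
        if self._is_ancestor(vid, self.head):
            return vid, "stale"        # we already hold something newer
        return vid, "conflict"         # simultaneous edit: offer a merge


history = History()
v1, _ = history.receive(None, "eric", "draft")
v2, status_a = history.receive(v1, "eric", "draft, revised")
v3, status_b = history.receive(v1, "pat", "draft, reworded")
print(status_a, status_b)
```

A "conflict" result is the point at which the competing versions would be displayed side by side with merge options; this sketch only detects the situation and leaves the head unchanged.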
Each node in the system should be able to track the following information:
A hierarchical system is created from only two relationships:
One wonders what such a system will look like after it begins to be extended with thousands of additional relationships.
Sincerely,
Eric Armstrong
eric.armstrong@eng.sun.com