Dynamic Alternatives
P.O. Box 59237
Norwalk, CA 90652
dynalt@dynalt.com


S U M M A R Y


DIARY: June 1, 2005 06:58 AM Wednesday; Garold L. Johnson

RS - Preserving Knowledge Over Time

1...Summary/Objective
2...Physical Preservation
3...Data Formats and Readability
4...Fonts and Related Issues
5...Text Formats
6...Changing Operating Systems
7...Changing Programming Languages
8...Scanning and OCR
9...Changing Natural Language
10...Possible Mitigating Approaches


..............

CONTACTS 

SUBJECTS
Rational Synthesis
Format Text

0604 -
0604 -    ..
0605 - Summary/Objective
0606 -
060601 - Follow up ref SDS 4 0000. ref SDS 3 0000.
060602 -
060603 - A prerequisite to the immortal organization is a system of
060604 - knowledge organization and preservation, ref SDS 6 EL52.
060606 -  ..
060607 - For ancient societies, the amount of knowledge was manageable by
060608 - well-trained human memory and numerous books in the form of libraries.
060609 - Even for them, it got out of hand.
060611 -  ..
060612 - Modern society is running into this problem in a really big way due to
060613 - the sheer volume of information and changing technology.  Computerized
060614 - information is stored on media that can't be read, written by
060615 - applications that no longer run, built for computers that no longer
060616 - exist.  Add to that the deterioration of the media, and you have a
060617 - monumental problem.
060619 -  ..
060620 - For our discussion, we need to consider only computer data. The problem
060621 - of converting print, audio, and video information to computer form is
060622 - huge, but it, at least, is being addressed.
060624 -  ..
060625 - All of this is before we even consider the question of how useful
060626 - information, as opposed to knowledge or wisdom, might be.
060627 -
060628 -    Since it isn't at all unusual for information that an individual
060629 -    records to become useless due to lack of orienting context even on
060630 -    a single project, the problem of preserving meaningful information
060631 -    over long periods of time is a huge one.
060632 -
060633 -
060635 -  ..
060636 - Physical Preservation
060637 -
060638 - One of the issues is simply the shelf life of the media on which
060639 - information is stored.
060640 -
060641 -    •  Magnetic media have a shelf life measured in years.
060643 -        ..
060644 -    •  CDs and DVDs claim a shelf life of about 100 years.
060646 -        ..
060647 -    •  Engraved media and molecular storage under development claim
060648 -       a shelf life in the range of millennia.
060650 -        ..
060651 -    •  Crystal storage is still mostly science fiction.
060653 -  ..
060654 - The solution, then, is to transfer to new media on a regular rotation.
060655 - With digital data, copying to new media creates a new record that is a
060656 - true duplicate of the original. This must be an ongoing process in
060657 - the organization responsible for the information, or something will
060658 - get left behind.
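A minimal sketch of the copy-and-verify step, assuming the data lives in ordinary files that can be reached on both the old and the new media. Computing a SHA-256 digest before and after the copy is one common way to confirm that the new record really is a true duplicate. The helper names `sha256_of` and `migrate_file` are hypothetical, not part of any existing tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm the copy is bit-for-bit identical.

    Returns the digest so it can be stored with the archive and
    re-checked at the next media rotation.
    """
    before = sha256_of(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copies contents, plus timestamps where supported
    after = sha256_of(dst)
    if before != after:
        raise OSError(f"checksum mismatch copying {src} -> {dst}")
    return before


if __name__ == "__main__":
    # Demo: temporary directories stand in for the old and new media.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_media" / "diary.txt"
        old.parent.mkdir()
        old.write_text("June 1, 2005 entry\n")
        print(migrate_file(old, Path(tmp) / "new_media" / "diary.txt"))
```

Keeping the returned digest with the archive lets the next rotation check that nothing has drifted in the meantime.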
060659 -
060660 -
060662 -  ..
060663 - Data Formats and Readability
060664 -
060665 - Computer data is written to specific formatting and media standards.
060666 - As those standards change, the information must be converted to new
060667 - media or new formats.
060668 -
060669 -    At the moment, we are migrating from 3-1/2" diskettes to CD-R/RW
060670 -    and DVD.
060672 -     ..
060673 -    Already, it is difficult to read a 5-1/4" diskette on a
060674 -    modern computer.
060676 -     ..
060677 -    Reading 8" diskettes requires not only obsolete drives but
060678 -    obsolete computers, since most of them were written under CP/M.
060680 -     ..
060681 -    When you have to deal with media any older than that, you are
060682 -    involved in a major project:
060683 -
060684 -       •  Get hardware to read the media.
060685 -
060686 -       •  Get the computer and the operating system to support the
060687 -          hardware.
060689 -           ..
060690 -       •  Get the application that can read the file format.
060692 -           ..
060693 -       •  Export the data to a format that can be interpreted.
060695 -           ..
060696 -       •  Put it on a medium that can be read by modern machines.
060697 -
060698 -          Getting some path from the old system to a new system can be
060699 -          a challenge.
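The "export" step can be sketched as follows, assuming the old application wrote fixed-width text records and that their layout can be recovered from its documentation or source. The field names and column positions in `LAYOUT` are hypothetical; the point is that once the layout is written down, the records can be re-expressed in a self-describing format such as JSON that modern tools can read.

```python
import json

# Hypothetical fixed-width layout: (field name, start column, end column).
# In a real migration this would come from the old program's docs or source.
LAYOUT = [
    ("name", 0, 12),
    ("date", 12, 20),    # YYYYMMDD
    ("amount", 20, 28),  # integer cents, zero-padded
]


def parse_record(line: str) -> dict:
    """Slice one fixed-width line into named fields."""
    fields = {name: line[start:end].strip() for name, start, end in LAYOUT}
    fields["amount"] = int(fields["amount"])
    return fields


def export_json(lines: list[str]) -> str:
    """Convert legacy records to JSON, skipping blank lines."""
    return json.dumps([parse_record(line) for line in lines if line.strip()], indent=2)


if __name__ == "__main__":
    legacy = ["JOHNSON".ljust(12) + "20050601" + "00001250"]
    print(export_json(legacy))
```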
060701 -  ..
060702 - This becomes an issue for SDS as we believe that we are at the start
060703 - of a revolution that will move from a culture of information to a
060704 - culture of knowledge. Indeed, there are already numerous competing
060705 - approaches and formats for encoding relationships, metadata, etc.
060707 -  ..
060708 - We need to document the knowledge that is embedded in the formats of
060709 - the data that we record, and keep in mind ways to convert data to
060710 - newer forms as we update those formats.
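One way to keep the conversion path from getting lost is to stamp every record with a format version and keep a small upgrade function for each change to the format, applied in sequence. This is a minimal sketch under that assumption; the field names and the two format changes are hypothetical.

```python
from typing import Callable

CURRENT_VERSION = 3


def _v1_to_v2(rec: dict) -> dict:
    # Hypothetical change in v2: the "text" field was renamed "body".
    rec = dict(rec)
    rec["body"] = rec.pop("text")
    rec["format_version"] = 2
    return rec


def _v2_to_v3(rec: dict) -> dict:
    # Hypothetical change in v3: an optional "tags" list was added.
    rec = dict(rec)
    rec.setdefault("tags", [])
    rec["format_version"] = 3
    return rec


# One upgrader per format change, keyed by the version it upgrades from.
UPGRADES: dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(rec: dict) -> dict:
    """Apply upgraders in order until the record reaches CURRENT_VERSION."""
    version = rec.get("format_version", 1)  # unversioned records are treated as v1
    if version > CURRENT_VERSION:
        raise ValueError(f"record is from a newer format: v{version}")
    while version < CURRENT_VERSION:
        rec = UPGRADES[version](rec)
        version = rec["format_version"]
    return rec
```

Because each step only has to understand the format immediately before it, a record written years earlier can still be brought forward, provided none of the intermediate upgraders has been discarded.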
060711 -
060712 -
060714 -  ..
060715 - Fonts and Related Issues
060716 -
060717 - Adobe and others have been addressing the problem of documents that
060718 - use fonts that aren't available on the machine reading them. There
060719 - has been some progress in that arena, but I don't know where it
060720 - stands currently.
060721 -
060722 -
060724 -  ..
060725 - Text Formats
060726 -
060727 - ASCII text is still universal. Any format based on text is
060728 - potentially more readable than a binary format.
060729 -
060730 - This isn't entirely true, as you still need to understand the document
060731 - structure. Rich Text, for example, is all text, but the markup
060732 - requires a large document to describe.
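To illustrate why text tends to outlive binary, the sketch below stores the same hypothetical dated reading both ways using Python's standard `struct` module. The binary form is meaningful only to someone who still has the layout string; read with a different assumed layout, the same bytes decode to wrong values without raising any error.

```python
import struct

# A hypothetical dated reading: (year, month, day, value).
record = (2005, 6, 1, 72.5)

# Binary: compact, but the bytes are meaningless without this layout string.
LAYOUT = "<HBBd"  # little-endian uint16, uint8, uint8, float64 (12 bytes)
binary = struct.pack(LAYOUT, *record)

# Text: longer, but the structure is visible to any future reader.
text = "year=2005 month=6 day=1 value=72.5"


def decode_text(s: str) -> dict:
    """Recover the fields from the key=value text form."""
    return dict(pair.split("=", 1) for pair in s.split())


if __name__ == "__main__":
    print(binary)                          # opaque bytes
    print(struct.unpack(LAYOUT, binary))   # correct only with the right layout
    print(struct.unpack(">HBBd", binary))  # wrong byte order: wrong values, no error
    print(decode_text(text))
```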
060734 -  ..
060735 - XML and its relatives are all text, but since the grammar can be
060736 - changed, the Document Type Definition (DTD), if there is one, is needed
060737 - to interpret the data correctly. As the standard changes, it can become
060738 - difficult or even impossible to read documents written to earlier
060739 - versions of the standard unless those standards and the programs that
060740 - use them are maintained.
060741 -
060742 -    If the general pattern of the language is maintained, and the
060743 -    standard is defined in a parseable form, programs to convert documents
060744 -    from older standards to newer ones are practical.
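For a simple case, such a conversion program might look like this, using Python's standard `xml.etree.ElementTree`. Both vocabularies here are hypothetical: an older form that puts the date in an attribute, and a newer form that uses child elements. `ElementTree` does not validate against a DTD, so the sketch assumes the structure of both versions is already known.

```python
import xml.etree.ElementTree as ET

# Hypothetical v1 vocabulary: <notes><note date="...">text</note></notes>
# Hypothetical v2 vocabulary: <entries version="2"><entry><date/><body/></entry></entries>


def convert_v1_to_v2(old_xml: str) -> str:
    """Rewrite a v1 document into the v2 form, one <entry> per <note>."""
    old_root = ET.fromstring(old_xml)
    new_root = ET.Element("entries", {"version": "2"})
    for note in old_root.iter("note"):
        entry = ET.SubElement(new_root, "entry")
        ET.SubElement(entry, "date").text = note.get("date", "")
        ET.SubElement(entry, "body").text = (note.text or "").strip()
    return ET.tostring(new_root, encoding="unicode")


if __name__ == "__main__":
    old = '<notes><note date="2005-06-01">Keep the DTD with the data.</note></notes>'
    print(convert_v1_to_v2(old))
```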
060746 -  ..
060747 - Even when there is a way to update information to new media and new
060748 - standards, the task of keeping the information in sufficiently
060749 - current form to be usable becomes huge.
060751 -  ..
060752 - Then we run into the massive knowledge management (KM) issue of how to
060753 - know what to save. It is extremely difficult to judge what may become
060754 - important in the future. Once the data is saved, finding it becomes a major task.
060755 - It isn't any easier to find old information in the internet archives
060756 - than it is to find current information on the internet.
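The mechanical part of finding saved material can be helped by an index. This minimal sketch builds an inverted index (each word mapped to the set of record IDs containing it) over archived text and answers a query by intersecting those sets; the record IDs and texts are hypothetical. It does nothing for the harder problems of deciding what to save or supplying the context that makes a record meaningful.

```python
import re
from collections import defaultdict


def _words(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of record IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in _words(text):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of records containing every word in the query."""
    words = _words(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is untouched
    for w in words[1:]:
        result &= index.get(w, set())
    return result


if __name__ == "__main__":
    archive = {"0601": "Media shelf life", "0602": "XML DTD and shelf life"}
    idx = build_index(archive)
    print(search(idx, "shelf xml"))
```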
060757 -
060758 -
060760 -  ..
060761 - Changing Operating Systems
060762 -
060763 - This is an ongoing issue as well. Changes in operating systems
060764 - obsolete existing programs in spite of efforts at compatibility.
060765 -
060766 -    DOS support gets more and more problematic.  The 16-bit assembly
060767 -    language for the SDS editor is supported only by keeping an old
060768 -    version of the assembler around.  The code isn't compatible with
060769 -    current versions.
060771 -     ..
060772 -    Windows 3.1 support is likely worse.
060774 -     ..
060775 -    Changes to Linux routinely make binaries obsolete.
060776 -
060777 -
060779 -  ..
060780 - Changing Programming Languages
060781 -
060782 - Programming languages change more slowly than operating systems.
060783 -
060784 - Proprietary languages are often obsoleted when their computing
060785 - platforms are. Data General's DGL is an example.
060787 -  ..
060788 - While program source code is all text, most often the source for the
060789 - supporting libraries isn't available even when the source for the
060790 - application is.
060792 -  ..
060793 - The problem of porting an application written in an obsolete language
060794 - or dialect to run on a new system can be major, even when a modern
060795 - version of the language or a similar one is available on the new
060796 - machine. If there is no version of the language on the target system,
060797 - the problem gets a lot worse, requiring translating the source to
060798 - another programming language, or creating a version of the language
060799 - on the new system. There has to be some real value in the programs
060800 - and their data to justify this sort of effort and expense.
060801 -
060802 -    We face a version of this with the macro language in which SDS is
060803 -    written.
060805 -     ..
060806 -    This is precisely the problem that Doug Engelbart faces with
060807 -    Augment, which is written in a language that runs only on an
060808 -    obsolete machine. The data may have been saved, but the program
060809 -    will have to be rewritten or translated based on the source code,
060810 -    which is written in a language that no longer exists.
060812 -  ..
060813 - So, porting applications in order to convert formats is rarely
060814 - workable. However, data alone is seldom sufficient without the
060815 - knowledge encapsulated in the programs that operate on it.
060816 -
060817 -
060819 -  ..
060820 - Scanning and OCR
060821 -
060822 - For paper documents, Optical Character Recognition (OCR) can often
060823 - recover the text. Failing that, it can be entered by humans if it is
060824 - sufficiently valuable.
060825 -
060826 - Graphics are somewhat easier in that they can be treated as pictures.
060827 - So long as there is no need to manipulate the structure of the
060828 - graphics, this is sufficient.
060829 -
060830 -
060832 -  ..
060833 - Changing Natural Language
060834 -
060835 - Then there is the problem of changes in human language. Some
060836 - languages have been lost over time. Others have changed so much that
060837 - ancient dialects can no longer be understood.
060838 -
060839 -    Even when the original language can be understood and translated,
060840 -    the original form may need to be preserved in the hope that an
060841 -    even better translation can be made in the future.
060842 -
060843 -
060845 -  ..
060846 - Possible Mitigating Approaches
060847 -
060848 - The question becomes: "Where do you draw the line?"
060849 -
060850 - Starting from the lowest level, we have:
060851 -
060852 -    1.  Computer chips and hardware
060854 -         ..
060855 -    2.  Computer systems
060857 -         ..
060858 -    3.  Operating systems
060860 -         ..
060861 -    4.  Compilers and languages
060863 -         ..
060864 -    5.  Support libraries
060866 -         ..
060867 -    6.  Application programs
060869 -  ..
060870 - Starting at the software level, it is possible to go totally open
060871 - source and maintain the ability to rebuild the entire operating
060872 - system, and, in principle, even port it to new hardware.
060874 -  ..
060875 - While this seems impractical for a single individual, it is certainly
060876 - within the reach of even a modest-sized organization.
060878 -  ..
060879 - The more sophisticated the systems become, however, the more code
060880 - needs to be handled to make this work. The undertaking can become
060881 - huge very quickly.
060883 -  ..
060884 - The lowest level that looks like it could make sense would be to use
060885 - a portable language with portability libraries, where both are
060886 - maintained by the community.
060887 -
060888 -    It would be possible to use the GNU compiler with a package like
060889 -    wxWindows as a base. By carefully following standards for portable
060890 -    C++ code, it should be possible to maintain code compatibility
060891 -    for long periods of time.
060893 -     ..
060894 -    For really long times, we would need to be able to upgrade the
060895 -    entire codebase to be compliant with emerging standards, and that
060896 -    could be a real challenge.
060898 -  ..
060899 - For most people in most places, keeping control over the source code
060900 - that they write is about the best that can be done.
060901 -
060902 -    When changes to the operating system or the language break the
060903 -    program, fix it.
060904 -
060905 -    If the changes are too massive, port it to something close.
060907 -     ..
060908 -    Beyond that, rewrite it in a new language or operating system
060909 -    using the source code as a guide.
060911 -     ..
060912 -    It isn't ideal, since it periodically requires a massive effort.
060914 -  ..
060915 - It is questionable whether any software has a lifetime that long, but
060916 - some has surprised its developers.
060917 -
060918 -
060919 -
0610 -