Copied on Tue Jun 20 09:05:50 EDT 2000 from a file written by Robert L. Walton // Introduction to Infostore // Summary of Infostore Object Types // Memory Layout Rules // Shared Infostores // Processes and Infostores // Infemes // Input/Output // Introduction to Infostore // ------------ -- --------- // // Infostore is a system for storing information in // objects of pre-defined types. There are objects for // storing numbers, words, lists, tables, vectors, // arrays, texts, and labeled graphs. // // Infostore objects can be associated with properties // and flags that make it possible to display object // values appropriately. For example, arrays can be // labeled so they may be displayed as spreadsheets, // a flag can be set on a word to cause it to be // capitalized when output, and flags and properties // can be associated with list elements to provide // formatting information. Word and list objects can // therefore store information that permits them to be // formatted for word processing. // // The general principle is that data should be both // displayable and easy to compute with at the same // time. Infostore is in part an attempt to find data // structures that will do for word processing and // mathematical expression processing what labeled // arrays and spreadsheets did for classical data // processing. // // An `infostore' proper is a collection of infostore // objects. Groups of objects in an infostore can be // designated as read-only infemes, which play the role // of files in a file system. The infostore then plays // the role of a directory. The infostore system is in // part an attempt to find a more displayable and finer // grained alternative to files. // // An infeme can be assigned a 128 unique ID, called a // NIQ. By using 128 bit truly random numbers as NIQs, // even computers that are totally disconnected from // each other can create different new NIQs. Using // NIQs, infemes can point at each other. Because // infemes are read-only, they can be cached in a // distributed system without introducing consistency // problems. The infostore system is in part an attempt // to provide a storage system for distributed systems // in which connectivity is sporadic. // // Several programming systems have been based on // storage of general information in objects of pre- // defined types. Notable examples are LISP and MatLab. // Infostore is intended as a successor to these systems // that provides labeled matrices adapted to spreadsheet // style applications, words and lists adapted to word // processing, labeled graphs adapted to calculation // with mathematical expressions, and infemes adapted // to store information in sporadically connected // distributed systems. // Summary of Infostore Object Types // ------- -- --------- ------ ----- // // The following is the type hierarchy of infostore // object values: // // object --+---- float64 Floating point number. // | // +---- niq Unique identifier. // | // +---- word Read only character // | string. // | // +---- line Read-write text line. // | // +---- vector Vector of object values // | and/or bytes. // | // +---- array Linear map from subscri- // | pts to vector elements. // | // +---- table Hash table of key/value // | pairs. // | // +---- graph Labeled graph with graph // | edit history. A list is // | represented as a graph // | node whose arguments are // | the elements of the // | list. // | // +-----text List of lines. // // // An object value is a 64-bit floating point number or // a NaN (a non-numeric value stored in a 64-bit // floating point format). NaNs with a particular 16 // bit header are pointers to object stubs (see below), // and represent objects. NaNs with a different 16 bit // header are used for specialized values in various // contexts. Other object values are actual 64-bit // floating point numbers, or standard non-numeric // values like + or - infinity. Integers are represent- // ed by the corresponding floating point numbers. // // An `object value' has type `isd::object::value' which // is typedef'ed to be the same as a C++ 64 bit integer, // or `isd::int64'. This value can be tested to see if // it holds a pointer to an object stub, and if yes, the // pointer can be extracted from the value. This value // can also be tested to see if it holds a floating // point number, and if yes, the value can be converted // to an `isd::float64' value without performing any // hardware operations on the value (though on some // machines the value may be copied from an integer // register to a floating point register). // // All objects except floating point numbers have stubs. // Objects with stubs have flag bits and properties that // can be set and read by users. Properties are key/ // value pairs, where keys are non-numeric object values // or integers and values are arbitrary object values. // List elements and labeled graph nodes also have flags // and properties. By use of flags and properties, word // processing format information can be included in // words and lists, spreadsheet format information can // be included in arrays, and expression format informa- // tion can be included in labeled graphs. // // Lists are store in labeled graph nodes. Thus there // is no separate list type. Texts are represented as // lists of lines, so there is no separate text type. // Memory Layout Rules // ------ ------ ----- // // An infostore object has a 32 byte stub and zero or // more bodies. A body is a contiguous block of memory // that can be moved, resized, or deallocated at `any // time'. // // Because bodies can be moved or deallocated at any // time, and because all stubs are the same size and can // be allocated from a vector of stubs, an infostore // memory admits high quality real-time garbage // collection algorithms. // // Actually `any time' in the current implementation // refers to the times that an interrupt check function // is called. This inline function, which normally just // checks that an interrupt request variable is zero, // and does nothing else, must be called frequently to // ensure that interrupts can happen often. Typically // the interrupt check function is called at the // beginning of any non-inline function and whenever a // loop iterates. // // A pointer to an object is actually pointing at the // stub of the object. Pointers into bodies of objects // can only exist in the stub of the object, or // temporarily between calls to the interrupt check // function, or in pointer structures that are // themselves pointed at by the object. // // A pointer to an object that is stored in a second // object is relative, so that any infostore containing // both objects can be easily relocated in virtual // memory. // // A pointer to a target object that is stored in a // source object is stored as a 32 bit `object stub // relative address', or `osradr32', value. The // osradr32 value is a signed 32-bit integer that stores // the byte address of the stub of the target object // relative to the byte address of the stub of the // source object. // // In systems where osradr32 values store relative byte // addresses exactly, all osradr32 values are divisible // by 32 bytes exactly, and the stubs of any objects // that can reference each other must have relative // addresses that do not exceed 2 gigabytes in absolute // value. // // As on option, a system may be configured so osradr32 // values store relative byte addresses divided by 32. // This permits stubs that reference each other to have // relative addresses as large as 64 gigabytes, at the // cost of some execution speed. // // Note that two objects can only reference each other // if they occur in the same infostore, or in a system // of related infostores. See Shared Infostores below. // Thus the 2 or 64 gigabyte limits apply to the size of // the stub-containing areas of a single infostore or // single group of cross-referencing infostores. Object // bodies are stored separately from stubs and can be // outside regions of memory whose size is limited. // // The infostore system has an `object value' type whose // values are 64 bit IEEE floating point numbers or // NaNs. NaNs with a particular 16 bit header are used // to store 48 bit pointers to object stubs. Thus // the virtual memory of a process is limited to 256 // terabytes. // // When an object value is stored in an object, any NaN // with a 48 bit pointer to an object stub is converted // to a NaN that contains a 32 bit header and a 32-bit // osradr32 pointer to the object stub. // // Pointers stored in an object stub that point into the // object's body, and pointers from one part of an // object's body to another part of the same body, are // stored as 64 bit relative byte addresses, called // `radr64' values. The address denoted by an radr64 // value is the byte address of where the radr64 value // itself is stored, plus the radr64 value itself. Thus // the radr64 value is a 64 bit integer relative byte // address. The `r' in `radr64' denotes `relative'. // // Pointers at aligned 64-bit values in an object body // that are relative to some base point in the same // body may also be stored as rindex32 values. These // hold relative indices of aligned 64-bit values. // // For osradr32, rindex32, and radr64 values, 0 is // usually used to denote a missing address. However, // this cannot always be done, as an object can // sometimes point at itself, and more rarely, a // location containing a relative address can sometimes // point at itself. The default is for 0 to denote the // missing address: exceptions are documented. // // All object stubs begin with a 32 bit unsigned integer // used to store flags, and an osradr32 value pointing // at an optional property list. The flag integer // contains a 5 bit type code, 11 other system flags, // and 16 user flags. The latter can be set by the // infostore user. The system flags include, for // example, a read-only flag to mark objects that // cannot be written. // // As an example, a vector object stub contains a 32 bit // unsigned number of object value elements, a 32 bit // unsigned number of bytes, and the 64 bit relative // address of a base point within the body of the // vector. A vector stores any number of object value // elements and separately any other number of bytes. // The bytes may not contain object values or other // forms of pointers to objects, but may contain numbers // of any type, including indexes of vector elements and // offsets of bytes within the vector body. The i'th // object value element is stored at offset - 8 * i - 8 // from the base point, while the j'th byte is stored at // offset j from the base point. Provision is also made // for osradr32 vector elements that can store pointers // to object stubs in half the space of a full object // valued vector element. // // There is also another kind of vector object stub // called a self-contained stub that has no body. // Instead of a body, 20 bytes of the stub itself // are used to hold information that would be held in // the vector body. This allows small vectors to be // allocated without the overhead of allocating small // bodies. // // Other types of object are similar. For example, a // word object stores a NUL-terminated sequence of // characters, and has as self-contained version that // can store up to 15 ASCII characters plus NUL. // // A body is a sequence of consecutive 128-bit aligned // memory units. Bodies can be moved at `any time' to a // different region of virtual memory. Bodies are // separated in memory by 128-bit body headers that // contain the size of the next and previous bodies and // a pointer to the stub of the next body. // Shared Infostores // ------ ---------- // // An infostore is a place to store objects. Each // infostore consists of regions that store object // stubs, and regions that store object bodies. // // One infostore X may reference objects in another // infostore Y. In this case, the locations of the stub // regions of Y must be fixed relative to the locations // of the stub regions of X. When X is created, space // is typically reserved for the stub regions of Y. // This means that the stub regions of Y become fixed // in size and location relative to each other whenever // Y becomes referenced by another infostore X. It does // not mean that the stub regions of Y need to be filled // with existing stubs: just that room needs to be // reserved to allocate the maximum number of stubs that // Y will ever have. The only way to add new stub // regions to Y after Y has become referenced by other // infostores is to track down and consult all these // referencing infostores, which may not be practical. // // Since the stub regions of Y can only be used to // allocate Y's stubs, and any unused space in these // regions will appear in memory whenever another // region X references Y, the stub regions of Y should // not be unreasonably large. // // There are two situations where new stub regions can // be allocated to an infostore Y or stub regions of Y // can be relocated relative to each other. First, if // Y belongs to a cluster of infostores that reference // each other but that are not referenced by infostores // outside the cluster, then new stub regions can be // allocated to members of the cluster by taking into // account the relative locations of all stub regions // of all cluster members, and the stub regions of // cluster members can be relocated relative to each // other. Second, some processes can load several // infostores and relocate their stub regions, without // allocating new stub regions. The process can then // write the infostores back out after undoing any stub // region relocation. Note that whenever stub regions // are relocated relative to each other, all pointers // to or from objects with stubs in the regions have to // be adjusted. // // Shared objects are marked as being shared. In // general, pointers into the body of an object may // exist in pointer structures that can be allocated to // a stack frame and that are themselves pointed at by // the object. But for shared objects, this is not // practical, for the object would have to point at // a different list of pointer structures for each // process sharing the object. So each pointer // structure pointing to a shared object is placed // on a per process list of such pointer structures. // This makes it much less efficient to find and // adjust these structures should the object body // move or be deallocated. Also, since all processes // that share the object have to be interrupted to move // or deallocate the object's body, such movement or // deallocation is much less efficient in any case. // Therefore, shared object bodies are not moved or // deallocated during normal system operation. // Processes and Infostores // --------- --- ---------- // // An infostore can be opened by one or more processes // so the processes can read or write the infostore. If // only one process has an infostore open, and that // process has the infostore open for writing, then that // process may perform any operation on the objects in // the infostore. // // A process may open an infostore read-only. In this // case the infostore is usually placed in read-only // virtual address pages for that process. If no other // process has the infostore open for writing, then the // process with the infostore open read-only may // perform any read operation on the objects of the // infostore. // // If an infostore is open for writing by one process // and also open for either reading or writing by a // second process, both processes must obey special // rules. In this case the objects in the infostore // are classified as shared or non-shared. A non-shared // object may be accessed by only one process, and that // process can perform any operation on the object. A // shared object must only be accessed according to // special rules. // // Every infostore has a root value that can point at a // root object for the infostore. An object is shared // iff it is or has been reachable from the infostore // root value. A process with an infostore open for // writing can allocated non-shared objects and perform // any operation on non-shared objects. Such a process // can make non-shared objects shared by writing a // pointer to these objects into some already shared // object, using a special operation as defined below // to perform this write. // // A shared object may have read-only components and // read-write components. The read-write components // can be accessed by special `shared' operations only. // The read-only components can be accessed by any read // operation. The read-write components must NOT be // accessed by non-shared operations, and the read-only // components must NOT be accessed by write operations. // // The shared operations are atomic read and write // operations of certain object components called // sharable components. The following is a list of the // sharable components: // // the root value of an infostore // any property of any object // any entry in a table // any element of a vector // // Note that shared read and write operations can take // much longer to execute (up to 10 times longer, // typically), than non-shared read and write // operations. // // The bodies of shared objects are not easily moved or // deallocated. This is because all processes that can // access a shared object must be interrupted temporar- // ily before a body can be moved or deallocated. // Therefore the default rule is that the bodies of // shared objects are immobile, meaning they cannot be // moved or deallocated. // Infemes // ------- // // An infeme is a rooted subgraph of infostore objects. // Specifically, an infeme has a root that is an // infostore object, and contains that root and some // objects reachable from the root through pointers. // // All the objects in an infeme are read-only, and // cannot be changed, with an exception concerning // properties detailed below. // // Infemes are given unique identifiers, or NIQs. A // NIQ is a 128 bit random number that acts as a unique // name for the infeme. Using NIQs, infemes can point // at each other, even when they are not stored in the // same virtual address space, or for that matter, in // the same computer. NIQs can be created simultaneously // by computers that are disconnected from each other, // and still be guaranteed to be distinct, because the // probability that two independent 128 bit random // numbers will be equal is too small to considered. // NIQs are attached to infemes by making them the // `INFEME-NIQ' property of the root of the infeme. // // Infeme root objects are marked with the INFEME_ROOT // system flag. An infeme with a given root object // consists of all objects reachable by tracing // infostore pointers between objects by a path that // starts at the root and does not touch any other // infeme root. When an infeme is written, the objects // in the infeme are written, and for any other infeme // roots pointed at by one of these objects, the NIQ // of that other infeme is written. When the infeme // is read, the NIQs of these other infemes are read // and used to search for the other infemes in memory. // If such an infeme is found, a pointer to that infeme // is stored in the infeme being read. If the other // infeme is not found, a stub is allocated for it, // and its NIQ is attached to the stub. // // Whenever an infeme is read, a search is made for a // stub with the same NIQ as the infeme, and if found, // this stub is used as the stub of the root of the // infeme. // // Here use is made of the ability of infostore to // allocate an object stub and give the stub object // properties before the type and contents of the object // are known. Infostore objects that do not yet have // a known type are simply called `stubs', and appear // to have the STUB type. // // Although infeme objects are read-only, they can have // properties that accumulate information. A property // of an infostore object is a key/value pair. The key // acts as the name of the property, and the value is // the current value associated with this name. A // property can be read-write, but not in an infeme. A // property can be read-only in an infeme. A property // can also have a variety of property types that permit // the property to accumulate information. For example, // a write-once property can be written only once, and // if written again with a different value, becomes // `tarnished'. As another example, a max-merge // property remembers the maximum numeric value written // into it; and if a non-numeric value is written, the // property becomes `tarnished'. There are more complex // types of property that merge information in more // complex ways, including a filter-merge property that // accumulates all the values written into the property // in a set, but permits the user to occasionally run // a filter function that the user provides to compact // this set of values. // // Infemes are created by first creating their objects // and then marking the infeme root with the INFEME_ // ROOT system flag. When the root is marked, the // objects of the infeme are located and marked READ_ // ONLY. Objects other than roots may be shared among // infemes. To prevent an object already in an infeme // from being made the root of another infeme, a root // may not be READ_ONLY when it is designated to be // marked as a root. Other objects in the infeme may // be READ_ONLY and already be parts of other infemes. // Input/Output // ------------ // // Infostore's are designed so that an infostore can be // stored in files that are then mapped into virtual // memory at different addresses in different processes. // The addresses of parts of an infostore are fixed // relative to each other, but the entire infostore may // be located at any absolute address. This is because // all pointers stored in an infostore are relative. // // Specifically, an object in an infostore can contain // two important kinds of pointers: osadr32 pointers // that contain the address of one stub relative to // another stub, and radr64 pointers that contain the // address of a location in an object body relative to // a location in the same object's stub. All other // pointer-like values hold the address of some location // relative to another location in the same stub or // body, and do not try to point from one object to // another or from a stub to a body. // // An infostore consists of two parts: an object stub // area and an object body area. The end of the object // stub area is co-located with the beginning of the // object body area. The object stub area grows down // in relative address space, while the object body area // grows up. // // If an infostore is not extremely large and is not // changing in size, it can be conveniently stored in // a single file. But if an infostore is likely to // grow, it may be convenient to add new object stub // area files whose addresses when mapped into memory // are lower than those of existing object stub files. // This is because it is hard to grow files in the // negative direction on most operating systems, so when // the object stub area runs out of space and needs to // be grown to lower addresses, either a new file must be // allocated to hold more object stubs, or everything in // the old file must be shifted up within the file. // This last is inefficient. // // The object body area may need to be divided among // more than one file because of file size limits // imposed by the operating system. For example, if // an operating system limited file size to 2 giga- // bytes, 5 files would be required for a 10 gigabyte // infostore body area. // // As far as mapping infostores into virtual memory is // concerned, a virtual address space can be configured // in one of two ways: standard or split. In the stand- // ard mapping configuration, all infostore stub and // body areas are located in virtual memory so that the // relative addresses stored in the infostore files are // correct for operation of the infostore in virtual // memory. This means that the end of each infostore's // object stub area is co-located with the beginning of // that infostore's object body area. // // The deficiency of the standard mapping configuration // is that all infostores together must fit into 64 // gigabytes of virtual address space. This is the // limit imposed by the osadr32 relative address format // which permits one object to point a stub of another // object, but requires the stubs of the two objects to // be within 64 gigabytes of each other. // // In the split mapping configuration, the object stub // area of an infostore and the object body area of // the same infostore are relocated independently. This // means that relative addresses of bodies that are // stored in stubs must be adjusted by adding an offset // which depends upon the infostore and the virtual // address space into which it is mapped. So every // time a relative address is accessed that is in an // object stub and that points into a body of the same // object, the infostore containing the stub must be // looked up, and an offset specific to this infostore // and virtual address space must be added to the // relative address. // // Generally the offset is looked up by taking the // address of the stub, finding from it the number // of a suitably sized page containing the stub and // dedicated to stubs from the same infostore, and // looking the offset up in a vector that converts // page numbers to such offsets. // // There are no relative addresses that are stored in // object bodies and point at stubs, except for the // pointer to a body's stub that is stored in the // invisible header just before the object body, and // that is used for garbage collection purposes. // Usually this is only accessed when the infostore // containing it is already known. Slow lookup methods // can be employed to handle infrequent other accesses, // such as those that might occur when printing infor- // mation about damaged memory regions during debugging. // // Infostores also have different formats depending on // the endianhood and floating point format of the // numbers stored in the infostore. An infostore that // is in the wrong format for the machine on which it // is stored must be converted to the machine's format // before it is used by a computing process. This // conversion consists of in place transformations of // numbers stored within the infostore. For example, // the bytes of numbers are reversed to change the // endianhood of the format, and floating point numbers // are converted to change the floating point format. // // In order to do format conversion of some vector // objects, the vector's data description language (DDL) // program must be available. If it is not, the vector // may get stuck in a form where part of the vector has // a format that is not the same as the format of the // rest of the infostore, and this part of the vector // is not usable by processes computing with the // vector. This sort of problem cannot happen with // other kinds of infostore objects. // // When single infemes are copied to external media, // issues similar to those just described are encounter- // ed. The formats of object parts may need to be // changed to match the format of the media. The // relative addresses of stubs and bodies generally // change, and the important pointers mentioned above // need to be adjusted. // // Since the computational costs of making these adjust- // ments is high, an infeme can be output in a form in // which the adjustments are deferred. In this form an // infeme consists of a table indicating the adjustments // to be made, followed by direct copies of the object // stubs and bodies in the infeme. When the adjust- // ments are made, the table at the beginning of the // infeme can be removed.