Data Modelling with Unibase

Data modelling is an important topic for everyone involved in understanding how an entity is structured and how its collection of information should be managed.

Unlike other database systems, Unibase was designed around building a model of the application's data and then driving the application directly from that model.

How does this work? Unibase uses an active data dictionary that describes the data completely. This is a metadata file containing the tables, all their attributes and calculations, the indexes, and, most importantly, the relationships between the tables.

Unlike SQL, which is essentially an interpreted (or compiled) language, the Unibase data dictionary is a static description of the application's data. Programmers do not embed SQL clauses such as 'where' in their PHP or other code; instead, Unibase refers to the data dictionary to locate a valid path between tables.
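
The contrast can be sketched roughly in Python (the table names, the DICTIONARY structure and the sqlite3 usage are all illustrative assumptions, not Unibase syntax): in the conventional approach the join path lives in the program, whereas in the dictionary-driven approach the relationship is recorded once as data and the program only names the attribute it wants.

    import sqlite3

    # Conventional approach: the join path is embedded in the program as
    # explicit JOIN/WHERE clauses.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE invoice  (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customer VALUES (1, 'Acme');
        INSERT INTO invoice  VALUES (10, 1);
    """)
    row = conn.execute(
        "SELECT customer.name FROM invoice "
        "JOIN customer ON customer.id = invoice.customer_id "
        "WHERE invoice.id = ?", (10,)
    ).fetchone()
    print(row[0])  # Acme

    # Dictionary-driven approach, conceptually: the relationship is recorded
    # once as data, and the program only asks for, say, the customer's name
    # from an invoice; the join path is derived from this description.
    DICTIONARY = {
        "invoice":  {"associations": {"customer": ["customer_id"]}},
        "customer": {"indexes": [["id"]]},
    }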

The basic structure

The data dictionary is a single text file using an easy-to-read syntax; in that respect it is not dissimilar to many configuration files. It contains table definitions, which in turn have four sections.

Refer to the section on Data Dictionary for more detail.

The sections of the data dictionary are: attributes (fields), indexes (keys), associations (join definitions), and restrictions (selections).

Attribute definitions consist of a name, the number of characters, an optional array size (arrays are first-class objects in Unibase), the type of attribute, and extra information (which depends on the attribute type).

Attributes can be either stored or calculated, but when using them there is no difference to the programmer. Calculated attributes are evaluated as needed. The expression for a calculated attribute can involve attributes in the current table or in any other table; Unibase works out how to retrieve the appropriate table and calculate the required attribute, which could of course be a calculated attribute itself.
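
A minimal Python sketch of the idea, with made-up attribute names and ignoring Unibase's own expression syntax: calculated attributes are defined as expressions over other attributes, evaluated only when asked for, and one calculated attribute may depend on another.

    # Hypothetical record: stored attributes are plain values; calculated
    # attributes are expressions evaluated only when they are asked for.
    order_line = {"quantity": 4, "unit_price": 12.5}

    calculated = {
        "line_value": lambda row: get(row, "quantity") * get(row, "unit_price"),
        # A calculated attribute may itself depend on another calculated one.
        "line_value_inc_tax": lambda row: get(row, "line_value") * 1.10,
    }

    def get(row, name):
        """Return a stored attribute, or evaluate a calculated one on demand."""
        if name in row:
            return row[name]
        return calculated[name](row)

    print(round(get(order_line, "line_value_inc_tax"), 2))  # 55.0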

There is a lot of semantics as well as syntax at work here and I will elaborate later once the components have been explained.

The next section is the indexes. These are defined explicitly, partly because a good selection of indexes makes an application run much faster, but also because they are used as part of the definition of a table association, i.e. how information in another table is related to the information in this table. Note that an association is a statement of fact, not an action.

Entity-Relationship (or Entity-Attribute-Relationship) designs emphasise the name of the relationship. Unibase doesn't care about the name of the relationship, only about which attributes define it. Importantly, those attributes do not have to be stored; they can be calculated. However, they must exist as an index in the target table.

The next section is the associations.

This section defines how the tables relate. A destination table is related by a combination of attributes, which must form an index in the associated table. This is important because when one refers to an attribute in a remote table, Unibase searches for a path to the destination table using the associations.
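
To make that concrete, here is a hypothetical sketch in Python (invented table and attribute names, not Unibase syntax): each association names a destination table and the attributes that must form an index there, and a path between two tables can be found by walking the associations.

    from collections import deque

    # Hypothetical model: each table lists its indexes and its associations.
    MODEL = {
        "invoice_line": {"indexes": [["invoice_id", "line_no"]],
                         "associations": {"invoice": ["invoice_id"]}},
        "invoice":      {"indexes": [["invoice_id"]],
                         "associations": {"customer": ["customer_id"]}},
        "customer":     {"indexes": [["customer_id"]],
                         "associations": {}},
    }

    def check_associations(model):
        """Each association's attributes must be an index of the destination."""
        for table, spec in model.items():
            for target, attrs in spec["associations"].items():
                assert attrs in model[target]["indexes"], (table, target)

    def find_path(model, source, destination):
        """Walk the associations to find a chain of tables linking the two."""
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            if path[-1] == destination:
                return path
            for nxt in model[path[-1]]["associations"]:
                if nxt not in path:
                    queue.append(path + [nxt])
        return None

    check_associations(MODEL)
    print(find_path(MODEL, "invoice_line", "customer"))
    # ['invoice_line', 'invoice', 'customer']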

Finally, restrictions. These are basically a relational selection, allowing access only to records that meet the conditions in the restriction.
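
In Python terms, a restriction amounts to a stored predicate applied whenever the table is read (an illustrative sketch with invented data, not Unibase syntax):

    customers = [
        {"name": "Acme",     "active": True},
        {"name": "Gone Ltd", "active": False},
    ]

    # A restriction: only records meeting the condition are visible through it.
    active_customers = [row for row in customers if row["active"]]
    print([row["name"] for row in active_customers])  # ['Acme']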

Alias tables are a way of referring to the same table by a different name. This is necessary in Unibase because the model must be static at any point in time, so that it can be analysed and a work strategy computed. Alias tables allow a table to be used in different ways within the one program, and also to be defined as a restricted view of the underlying table.
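
As a rough analogy (invented names, not Unibase syntax), an alias is simply a second, independent handle on the same underlying table, which is what lets one program use the table in two roles while the model itself stays static:

    person = [
        {"id": 1, "name": "Alice", "manager_id": None},
        {"id": 2, "name": "Bob",   "manager_id": 1},
    ]

    # Two aliases of the same table, playing different roles in one program.
    employee = person   # the person being reported on
    manager = person    # the person they report to

    managers_by_id = {row["id"]: row for row in manager}
    for row in employee:
        boss = managers_by_id.get(row["manager_id"])
        print(row["name"], "reports to", boss["name"] if boss else "nobody")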

Semantics

Unibase understands the different ways something can be used. For example, an expression may be calculated differently depending on where it is used. As a simple expression it is just calculated, but if it is being used in a sub-total or total, should it be the sum of the values in the records, or should it be recalculated? This is the familiar observation that 'the sum of the averages is not the average of the sums'.
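
A small worked example of that distinction: whether a derived figure is summed record by record, or recalculated from the totals, gives genuinely different answers.

    # Two order lines, each with a unit price and a quantity.
    lines = [
        {"price": 10.0, "qty": 1},
        {"price": 2.0,  "qty": 10},
    ]

    # Summed record by record: each line's own value, then the total of those.
    summed = sum(l["price"] * l["qty"] for l in lines)            # 30.0

    # Recalculated at the total level: average price times total quantity.
    average_price = sum(l["price"] for l in lines) / len(lines)   # 6.0
    total_qty = sum(l["qty"] for l in lines)                      # 11
    recalculated = average_price * total_qty                      # 66.0

    print(summed, recalculated)  # 30.0 66.0 -- not the same figure at all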

Unibase also has its own solution to the problem of identifying the correct path between remote tables. At what point does a path become invalid: one to one, one to many, many to one? What effect (if any) does traversing a foreign key have? This is critical to allowing Unibase to solve the table traversal problem. It is also fundamental to Unibase's ability to resolve much more complex data relationships than a programmer could manage by hand.

The problem differs from the travelling salesman problem in that paths have characteristics: for example, a path might be valid or invalid depending on the paths already traversed. The solution is similar to Dijkstra's algorithm, but with extensions.
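
Unibase's actual rules are not reproduced here; the sketch below only shows the general shape of such a search in Python, with an invented graph and an invented validity rule, where an edge may be rejected because of the path already taken.

    import heapq

    # Invented association graph: (destination table, cardinality, cost).
    GRAPH = {
        "invoice_line": [("invoice", "many_to_one", 1)],
        "invoice":      [("customer", "many_to_one", 1),
                         ("invoice_line", "one_to_many", 1)],
        "customer":     [("invoice", "one_to_many", 1)],
    }

    def allowed(kinds_so_far, kind):
        # Illustrative rule only: refuse to fan out (one-to-many) more than once.
        return kind != "one_to_many" or "one_to_many" not in kinds_so_far

    def cheapest_path(source, destination):
        """Dijkstra-style search where edge validity depends on the path so far."""
        heap = [(0, source, [source], [])]
        while heap:
            cost, table, path, kinds = heapq.heappop(heap)
            if table == destination:
                return path
            for nxt, kind, weight in GRAPH.get(table, []):
                if nxt not in path and allowed(kinds, kind):
                    heapq.heappush(heap, (cost + weight, nxt, path + [nxt], kinds + [kind]))
        return None

    print(cheapest_path("invoice_line", "customer"))  # ['invoice_line', 'invoice', 'customer']
    print(cheapest_path("customer", "invoice_line"))  # None under this illustrative rule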

A reference to an attribute in a remote table will normally return the sum of that attribute across all the associated records. This makes it very easy to write one-line expressions describing what can be quite complex applications.
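
For example (a hypothetical Python sketch with invented tables), a reference from a customer to an invoice attribute naturally yields the total of that attribute over the customer's invoices:

    invoices = [
        {"customer_id": 1, "total": 250.0},
        {"customer_id": 1, "total": 100.0},
        {"customer_id": 2, "total": 75.0},
    ]

    def remote_attribute(customer_id, name):
        """Referring to an invoice attribute from a customer record sums it
        over all of that customer's associated invoices."""
        return sum(row[name] for row in invoices if row["customer_id"] == customer_id)

    print(remote_attribute(1, "total"))  # 350.0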

Numbers, however, are not always numbers. This is a major limitation of traditional databases and an important semantic differentiator. For example, a product price is a number, sort of. It doesn't make sense to add up product prices (unless you are looking for an average), but it does make sense to multiply a price by a quantity to get a value. Unibase has syntax to cover this situation.
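
Unibase's syntax for this is not shown here; the sketch below simply illustrates the distinction between a figure that is meaningful to total across records and one that is only meaningful in combination with something else:

    order_lines = [
        {"price": 4.50, "qty": 2},
        {"price": 3.00, "qty": 5},
    ]

    # Meaningful: price multiplied by quantity, then totalled across records.
    order_value = sum(l["price"] * l["qty"] for l in order_lines)   # 24.0

    # Rarely meaningful: adding up the prices themselves...
    price_total = sum(l["price"] for l in order_lines)              # 7.5
    # ...unless what you actually want is an average price.
    average_price = price_total / len(order_lines)                  # 3.75

    print(order_value, average_price)  # 24.0 3.75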

Associations can also be 'fuzzy'. When doing something like curve fitting, an exact match for the current attribute (whether calculated or stored) may not exist in the target table. In this case Unibase looks for the records whose matching attribute is less than or equal to the source attribute.
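
A Python sketch of that matching rule, using an invented stepped rate table: the lookup finds the last key that is not greater than the source value.

    import bisect

    # Invented target table: stepped rates keyed by a threshold value.
    thresholds = [0, 100, 500, 1000]      # must be kept sorted
    rates      = [0.00, 0.05, 0.08, 0.10]

    def fuzzy_lookup(amount):
        """Match the record whose threshold is <= amount (the 'fuzzy' join)."""
        i = bisect.bisect_right(thresholds, amount) - 1
        return rates[i]

    print(fuzzy_lookup(750))   # 0.08  (falls back to the 500 threshold)
    print(fuzzy_lookup(100))   # 0.05  (an exact match also works)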

Conclusion

This has been a very quick and simple introduction to data modelling in Unibase. Unibase has been used to model large, complex systems and can do so with ease, in many cases solving problems that are very difficult for programmers. This extends the range of applications that can be written.