Semantic Overlay Architecture

Draft Community Group Report

More details about this document
This version:
https://ownyourdata.github.io/soya/
Issue Tracking:
GitHub
Editors:
Christoph Fabianek (OwnYourData)
(OwnYourData)
(OwnYourData)

Abstract

Draft recommendations for a data model authoring and publishing platform.

Status of this document

This specification is not a W3C Standard nor is it on the W3C Standards Track. Learn more about W3C Community and Business Groups. GitHub Issues are preferred for discussion of this specification.

History

1. Introduction

SOyA is a data model authoring and publishing platform that also provides validation and transformation functionality. It includes a library for integration into other projects, a command line tool for interactive data model management, and an online repository for hosting data models.


Figure 1: Building blocks in SOyA

1.1. Terminology

This document uses the following terms as defined in external specifications and defines terms specific to SOyA.

Attribute

in a Base: a single field with a name and an associated type
in RDF: a single data type or object property

Base

has a name and a list of Attributes with associated type
in RDF: an RDF Class with one or more properties

DRI

A Decentralized Resource Identifier represents a content-based address for a Structure. Within SOyA, Multihash [MULTIHASH] (default: sha2-256) is used for hashing a JSON object and Multibase [MULTIBASE] (default: base58-btc) for encoding the hash value.

Instance

is a data record (e.g., data describing an employee) with a set of properties as defined in a Base or Structure
in RDF: instance of an RDF Class

Overlay

additional information about a Base
in RDF: annotation properties attached to an RDF Class or Property

Repository

online storage for Structures with versioning capabilities

Structure

a combination of Bases and Overlays
in RDF: an ontology
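
The DRI definition above can be illustrated with a short sketch. It assumes a canonical form for the JSON object (sorted keys, compact separators, UTF-8 encoding), which this document does not specify, and the helper names base58btc and dri are hypothetical. The Multihash prefix bytes 0x12 (sha2-256) and 0x20 (32-byte digest) and the Multibase prefix "z" (base58-btc) are taken from the Multihash and Multibase specifications.

```python
import hashlib
import json

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def dri(obj) -> str:
    """Content-based identifier for a JSON object (hypothetical helper)."""
    # ASSUMPTION: canonicalization by sorted keys and compact separators.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # Multihash = <hash function code 0x12><digest length 0x20><digest>
    multihash = bytes([0x12, len(digest)]) + digest
    # Multibase prefix "z" marks base58-btc.
    return "z" + base58btc(multihash)


print(dri({"name": "Person", "version": 1}))
```

Because the identifier is derived from the content alone, two objects that canonicalize to the same bytes receive the same DRI, and any change to the content yields a different one.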

1.2. Design Goals and Rationale

SOyA satisfies the following design goals:

2. Composition

The Semantic Overlay Architecture (SOyA) is built on the following core components to describe and manage data models. Those components are:


Figure 2: SOyA Components

2.1. Structures

All artefacts (bases and overlays) in SOyA are declared in a structure, which holds the following information:

An example structure.
meta:
  name: Foaf
  namespace:
    foaf: "http://xmlns.com/foaf/0.1/"

content:
  bases:
    - name: Agent
      subClassOf: foaf:agent
    - name: Person
      subClassOf: 
        - Agent
      attributes:
        firstName: String
        lastName: String
        did: string
  overlays: 
    - type: OverlayAnnotation
      base: Agent
      name: AgentAnnotationOverlay
      attributes:
        gender: 
          comment: 
            en: The gender of this Agent (typically but not necessarily 'male' or 'female').
        birthday:
          comment:
            en: The birthday of this Agent.
        made:
          comment:
            en: Something that was made by this agent.
        age:
          comment:
            en: The age in years of some agent.
    - type: OverlayAnnotation
      base: Person
      name: PersonAnnotationOverlay
      attributes:
        firstName: 
          comment: 
            en: The first name of a person.
        lastName:
          comment:
            en: The last name of a person.
        did:
          comment:
            en: Identifier with keys and service endpoints
    - type: OverlayAlignment
      base: Person
      name: PersonAlignmentOverlay
      attributes:
        firstName: 
          - foaf:givenName
        lastName:
          - foaf:familyName
          - foaf:surname
    - type: OverlayValidation
      base: Person
      name: PersonValidationOverlay
      attributes:
        firstName: 
          cardinality: "0..1"
          length: "(0..30]"
        lastName:
          cardinality: "1..1"
          length: "(0..40]"

The same structure in JSON-LD:

{
  "@context": {
    "@version": 1.1,
    "@import": "https://ns.ownyourdata.eu/ns/soya-context.json",
    "@base": "https://soya.data-container.net/Foaf/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {
      "@id": "Agent",
      "@type": "owl:Class",
      "subClassOf": "foaf:agent"
    },
    {
      "@id": "Person",
      "@type": "owl:Class",
      "subClassOf": [
        "Agent"
      ]
    },
    {
      "@id": "firstName",
      "@type": "owl:DatatypeProperty",
      "domain": "Person",
      "range": "xsd:string"
    },
    {
      "@id": "lastName",
      "@type": "owl:DatatypeProperty",
      "domain": "Person",
      "range": "xsd:string"
    },
    {
      "@id": "did",
      "@type": "owl:DatatypeProperty",
      "domain": "Person",
      "range": "xsd:string"
    },
    {
      "@id": "Agent"
    },
    {
      "@id": "gender",
      "comment": {
        "en": [
          "The gender of this Agent (typically but not necessarily male or female)."
        ]
      }
    },
    {
      "@id": "birthday",
      "comment": {
        "en": [
          "The birthday of this Agent."
        ]
      }
    },
    {
      "@id": "made",
      "comment": {
        "en": [
          "Something that was made by this agent."
        ]
      }
    },
    {
      "@id": "age",
      "comment": {
        "en": [
          "The age in years of some agent."
        ]
      }
    },
    {
      "@id": "OverlayAnnotation",
      "@type": "OverlayAnnotation",
      "onBase": "Agent",
      "name": "AgentAnnotationOverlay"
    },
    {
      "@id": "Person"
    },
    {
      "@id": "firstName",
      "comment": {
        "en": [
          "The first name of a person."
        ]
      }
    },
    {
      "@id": "lastName",
      "comment": {
        "en": [
          "The last name of a person."
        ]
      }
    },
    {
      "@id": "did",
      "comment": {
        "en": [
          "Identifier with keys and service endpoints"
        ]
      }
    },
    {
      "@id": "OverlayAnnotation",
      "@type": "OverlayAnnotation",
      "onBase": "Person",
      "name": "PersonAnnotationOverlay"
    },
    {
      "@id": "firstName",
      "rdfs:subPropertyOf": [
        "foaf:givenName"
      ]
    },
    {
      "@id": "lastName",
      "rdfs:subPropertyOf": [
        "foaf:familyName",
        "foaf:surname"
      ]
    },
    {
      "@id": "OverlayAlignment",
      "@type": "OverlayAlignment",
      "onBase": "Person",
      "name": "PersonAlignmentOverlay"
    },
    {
      "@id": "PersonShape",
      "@type": "sh:NodeShape",
      "sh:targetClass": "Person",
      "sh:property": [
        {
          "sh:path": "firstName",
          "sh:maxCount": 1,
          "sh:maxLength": 30
        },
        {
          "sh:path": "lastName",
          "sh:minCount": 1,
          "sh:maxCount": 1,
          "sh:maxLength": 40
        }
      ]
    },
    {
      "@id": "OverlayValidation",
      "@type": "OverlayValidation",
      "onBase": "Person",
      "name": "PersonValidationOverlay"
    }
  ]
}

2.1.1. Meta

The meta section in a structure defines the name of the structure, specifies an optional context (default: https://ns.ownyourdata.eu/soya/soya-context.json), and allows referencing namespaces (existing ontologies).

2.1.2. Base

A base declares a dataset and holds the following information:

When a base is represented in RDF, it is a class with one or more properties. Each property is either a single data type property or an object property referencing another base. The optional subClassOf entry lets a base inherit properties/attributes from other existing classes. Multiple bases can be combined in a structure for related concepts.

An example base.

use soya template base to show this example on the command line

meta:
  name: Person

content:
  bases: 
    - name: Person
      attributes:
        name: String
        dateOfBirth: Date
        age: Integer
        sex: String

use soya template base | soya init to show this example on the command line

{
  "@context": {
    "@version": 1.1,
    "@import": "https://ns.ownyourdata.eu/ns/soya-context.json",
    "@base": "https://soya.data-container.net/Person/"
  },
  "@graph": [
    {
      "@id": "Person",
      "@type": "owl:Class",
      "subClassOf": "Base"
    },
    {
      "@id": "name",
      "@type": "owl:DatatypeProperty",
      "domain": "Person",
      "range": "xsd:string"
    },
    {
      "@id": "dateOfBirth",
      "@type": "owl:DatatypeProperty",
      "domain": "Person",
      "range": "xsd:date"
    },
    {
      "@id": "age",
      "@type": "owl:DatatypeProperty",
      "domain": "Person",
      "range": "xsd:integer"
    },
    {
      "@id": "sex",
      "@type": "owl:DatatypeProperty",
      "domain": "Person",
      "range": "xsd:string"
    }
  ]
}
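
The mapping from the YAML notation of a base to the @graph shown above can be sketched as follows. This is an illustration only, not the SOyA CLI implementation: the function name base_to_graph is hypothetical, the type table is inferred from the examples in this section, and attributes that reference another base (object properties) are not handled.

```python
import json

# ASSUMPTION: type mapping inferred from the examples in this section.
XSD_TYPES = {
    "String": "xsd:string",
    "Date": "xsd:date",
    "Integer": "xsd:integer",
}


def base_to_graph(name: str, attributes: dict[str, str]) -> list[dict]:
    """Return JSON-LD nodes for one base: a class plus one property per attribute."""
    graph = [{"@id": name, "@type": "owl:Class", "subClassOf": "Base"}]
    for attribute, soya_type in attributes.items():
        graph.append(
            {
                "@id": attribute,
                "@type": "owl:DatatypeProperty",
                "domain": name,
                "range": XSD_TYPES[soya_type],
            }
        )
    return graph


person = base_to_graph(
    "Person",
    {"name": "String", "dateOfBirth": "Date", "age": "Integer", "sex": "String"},
)
print(json.dumps(person, indent=2))
```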

2.1.3. Overlay

Overlays provide additional information for a defined base. This information can either be included directly in a structure together with a base or be provided independently and linked to the relevant base. The following types of overlays are pre-defined in the default context (https://ns.ownyourdata.eu/soya/soya-context.json):

It is possible to create additional overlay types by using another context.

An overlay example.

use soya template annotation to show this example on the command line

Run soya template --help to list all available overlay examples.

meta:
  name: PersonAnnotation

content:
  overlays: 
    - type: OverlayAnnotation
      base: https://soya.data-container.net/Person
      name: PersonAnnotationOverlay
      class: 
        label: 
          en: Person
          de: 
            - die Person
            - der Mensch
      attributes:
        name: 
          label: 
            en: Name
            de: Name
        dateOfBirth: 
          label: 
            en: Date of Birth 
            de: Geburtsdatum
          comment: 
            en: Birthdate of Person
        sex: 
          label: 
            en: Gender
            de: Geschlecht
          comment: 
            en: Gender (male or female)
            de: Geschlecht (männlich oder weiblich)

use soya template annotation | soya init to show this example on the command line

{
  "@context": {
    "@version": 1.1,
    "@import": "https://ns.ownyourdata.eu/ns/soya-context.json",
    "@base": "https://soya.data-container.net/PersonAnnotation/"
  },
  "@graph": [
    {
      "@id": "https://soya.data-container.net/Person",
      "label": {
        "en": [
          "Person"
        ],
        "de": [
          "die Person",
          "der Mensch"
        ]
      }
    },
    {
      "@id": "name",
      "label": {
        "en": [
          "Name"
        ],
        "de": [
          "Name"
        ]
      }
    },
    {
      "@id": "dateOfBirth",
      "label": {
        "en": [
          "Date of Birth"
        ],
        "de": [
          "Geburtsdatum"
        ]
      },
      "comment": {
        "en": [
          "Birthdate of Person"
        ]
      }
    },
    {
      "@id": "sex",
      "label": {
        "en": [
          "Gender"
        ],
        "de": [
          "Geschlecht"
        ]
      },
      "comment": {
        "en": [
          "Gender (male or female)"
        ],
        "de": [
          "Geschlecht (männlich oder weiblich)"
        ]
      }
    },
    {
      "@id": "OverlayAnnotation",
      "@type": "OverlayAnnotation",
      "onBase": "https://soya.data-container.net/Person",
      "name": "PersonAnnotationOverlay"
    }
  ]
}
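
A consumer of an annotation overlay, such as a form renderer, might resolve a display label from the language-tagged entries shown above. The following is a minimal sketch under that assumption; the function label_for and its fallback behaviour are hypothetical and not part of soya-js.

```python
def label_for(graph: list[dict], node_id: str, lang: str, fallback: str = "en") -> str:
    """Pick the first label for node_id in lang, else in the fallback language."""
    for node in graph:
        labels = node.get("label")
        if node.get("@id") == node_id and labels:
            values = labels.get(lang) or labels.get(fallback) or []
            if values:
                return values[0]
    # No label found: fall back to the raw attribute name.
    return node_id


# Two nodes copied from the JSON-LD annotation overlay above.
graph = [
    {"@id": "dateOfBirth", "label": {"en": ["Date of Birth"], "de": ["Geburtsdatum"]}},
    {"@id": "sex", "label": {"en": ["Gender"], "de": ["Geschlecht"]}},
]

print(label_for(graph, "dateOfBirth", "de"))
print(label_for(graph, "sex", "fr"))
```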

2.2. Semantic Web Standards Adoption

2.2.1. Data Model: RDFS/OWL

A SOyA structure is designed to comply with semantic web standards (i.e., RDFS/OWL) for data model representation. The goal is to ensure compatibility and reusability of SOyA data and data models with the established semantic web technology stack (e.g., SOLID and schema.org) and to open up the use of relevant tools and methods (e.g., SHACL, RML, triplestores). This decision also allows for higher interoperability with currently available linked (open) data.

2.2.2. Data Serialization: JSON-LD

We chose [JSON-LD] as the default serialization in SOyA for the following reasons:

i) JSON-LD's status as a W3C Recommendation ensures a stable standard for the foreseeable future
ii) Rich tool support
iii) Ease of use for both developers and knowledge engineers
iv) Compatibility with RDF

Furthermore, tools supporting JSON data manipulation and visualizations are widely available.

That said, SOyA can also support other serialization formats such as Turtle, N-Quads, or even Labeled Property Graphs.

2.3. SW Stack

SOyA integrates with a number of established tools to provide its functionalities.

2.3.1. SHACL

SHACL (Shapes Constraint Language) is a language for validating RDF graphs against a set of conditions - find more information here. It is used in Validation overlays.

Note: use the online SHACL Playground to test your validations

A validation overlay example with SHACL notation.

use soya template validation to show this example on the command line

meta:
  name: PersonValidation

content:
  overlays: 
    - type: OverlayValidation
      base: https://soya.data-container.net/Person
      name: PersonValidationOverlay
      attributes:
        name: 
          cardinality: "1..1"
          length: "[0..20)"
          pattern: "^[a-z ,.'-]+$"
        dateOfBirth:
          cardinality: "1..1"
          valueRange: "[1900-01-01..*]"                    
        age: 
          cardinality: "0..1"
          valueRange: "[0..*]"
        sex:
          cardinality: "0..1"
          valueOption:
            - male
            - female
            - other

use soya template validation | soya init to show this example on the command line

{
  "@context": {
    "@version": 1.1,
    "@import": "https://ns.ownyourdata.eu/ns/soya-context.json",
    "@base": "https://soya.data-container.net/PersonValidation/"
  },
  "@graph": [
    {
      "@id": "PersonShape",
      "@type": "sh:NodeShape",
      "sh:targetClass": "https://soya.data-container.net/Person",
      "sh:property": [
        {
          "sh:path": "name",
          "sh:minCount": 1,
          "sh:maxCount": 1,
          "sh:maxLength": 19,
          "sh:pattern": "^[a-z ,.'-]+$"
        },
        {
          "sh:path": "dateOfBirth",
          "sh:minCount": 1,
          "sh:maxCount": 1,
          "sh:minRange": {
            "@type": "xsd:date",
            "@value": "1900-01-01"
          }
        },
        {
          "sh:path": "age",
          "sh:maxCount": 1
        },
        {
          "sh:path": "sex",
          "sh:maxCount": 1,
          "sh:in": {
            "@list": [
              "male",
              "female",
              "other"
            ]
          }
        }
      ]
    },
    {
      "@id": "OverlayValidation",
      "@type": "OverlayValidation",
      "onBase": "https://soya.data-container.net/Person",
      "name": "PersonValidationOverlay"
    }
  ]
}
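
The relation between the interval notation in the YAML overlay and the SHACL properties in the JSON-LD output can be sketched as follows. The function names are hypothetical and the rules are inferred from the examples in this document: a minimum of 0 or a maximum of "*" adds no constraint, and only the upper bound of a length interval is translated, since the examples show no lower-bound output.

```python
import re


def cardinality_to_shacl(spec: str) -> dict:
    """Map "min..max" to sh:minCount / sh:maxCount."""
    low, high = spec.split("..")
    constraints = {}
    if int(low) > 0:
        constraints["sh:minCount"] = int(low)
    if high != "*":
        constraints["sh:maxCount"] = int(high)
    return constraints


# Opening bracket, lower bound, "..", upper bound or "*", closing bracket.
_INTERVAL = re.compile(r"^([\[(])(\d+)\.\.(\d+|\*)([\])])$")


def length_to_shacl(spec: str) -> dict:
    """Map a length interval to sh:maxLength (upper bound only)."""
    match = _INTERVAL.match(spec)
    if match is None:
        raise ValueError(f"not an interval: {spec!r}")
    _opening, _low, high, closing = match.groups()
    if high == "*":
        return {}
    limit = int(high)
    # "]" includes the upper bound, ")" excludes it.
    return {"sh:maxLength": limit if closing == "]" else limit - 1}


print(cardinality_to_shacl("1..1"), length_to_shacl("[0..20)"))
```

With these rules, "[0..20)" yields a maximum length of 19 and "(0..30]" a maximum length of 30, matching the JSON-LD shown in the examples.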

2.3.2. jq

jq is a lightweight and flexible command-line JSON processor - find more information here. It can be used in Transformation overlays.

Note: use the online jq playground to test your jq transformation

A transformation overlay example using jq.

use soya template transformation.jq to show this example on the command line

meta:
  name: PersonA_jq_transformation

content:
  overlays: 
    - type: OverlayTransformation
      name: TransformationOverlay
      base: https://soya.data-container.net/PersonA
      engine: jq
      value: |
        .["@graph"] | 
        {
          "@context": {
            "@version":1.1,
            "@vocab":"https://soya.data-container.net/PersonB/"},
          "@graph": map( 
            {"@id":.["@id"], 
            "@type":"PersonB", 
            "first_name":.["basePerson:firstname"], 
            "surname":.["basePerson:lastname"], 
            "birthdate":.["basePerson:dateOfBirth"], 
            "gender":.["basePerson:sex"]}
          )
        }

use soya template transformation.jq | soya init to show this example on the command line

{
  "@context": {
    "@version": 1.1,
    "@import": "https://ns.ownyourdata.eu/ns/soya-context.json",
    "@base": "https://soya.data-container.net/PersonA_jq_transformation/"
  },
  "@graph": [
    {
      "@id": "https://soya.data-container.net/PersonATransformation",
      "engine": "jq",
      "value": ".[\"@graph\"] | \n{\n \"@context\": {\n \"@version\":1.1,\n \"@vocab\":\"https://soya.data-container.net/PersonB/\"},\n  \"@graph\": map( \n {\"@id\":.[\"@id\"], \n \"@type\":\"PersonB\", \n \"first_name\":.[\"basePerson:firstname\"], \n \"surname\":.[\"basePerson:lastname\"], \n \"birthdate\":.[\"basePerson:dateOfBirth\"], \n \"gender\":.[\"basePerson:sex\"]}\n )\n}\n"
    },
    {
      "@id": "OverlayTransformation",
      "@type": "OverlayTransformation",
      "onBase": "https://soya.data-container.net/PersonA",
      "name": "TransformationOverlay"
    }
  ]
}
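
For readers unfamiliar with jq, the filter in the overlay above can be read as the following Python function. It is a sketch of what the filter computes, not of how SOyA runs it; the function name and the sample record are hypothetical.

```python
def person_a_to_b(document: dict) -> dict:
    """Re-key every PersonA record in @graph into the PersonB vocabulary."""
    return {
        "@context": {
            "@version": 1.1,
            "@vocab": "https://soya.data-container.net/PersonB/",
        },
        "@graph": [
            {
                "@id": record.get("@id"),
                "@type": "PersonB",
                # Missing keys become None, as jq yields null for them.
                "first_name": record.get("basePerson:firstname"),
                "surname": record.get("basePerson:lastname"),
                "birthdate": record.get("basePerson:dateOfBirth"),
                "gender": record.get("basePerson:sex"),
            }
            for record in document["@graph"]
        ],
    }


# Hypothetical PersonA instance for illustration.
sample = {
    "@graph": [
        {
            "@id": "p1",
            "basePerson:firstname": "Ada",
            "basePerson:lastname": "Lovelace",
            "basePerson:dateOfBirth": "1815-12-10",
        }
    ]
}
result = person_a_to_b(sample)
print(result["@graph"][0])
```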

2.3.3. Jolt

Jolt is a library for JSON to JSON transformation where the "specification" for the transform is itself a JSON document - find more information here. It can be used in Transformation overlays.

Note: use the online Jolt Transformation Demo to test your jolt transformation

A transformation overlay example using Jolt.

use soya template transformation.jolt to show this example on the command line

meta:
  name: PersonA_jolt_Transformation

content:
  overlays: 
    - type: OverlayTransformation
      name: TransformationOverlay
      base: https://soya.data-container.net/PersonA
      engine: jolt
      value:
        - operation: shift
          spec: 
            "\\@context":
              "\\@version": "\\@context.\\@version"
              "#https://soya.data-container.net/PersonB/": "\\@context.\\@vocab"
            "\\@graph": 
              "*": 
                "#PersonB": "\\@graph[#2].\\@type"
                "\\@id": "\\@graph[#2].\\@id"
                "basePerson:firstname": "\\@graph[#2].first_name"
                "basePerson:lastname": "\\@graph[#2].surname"
                "basePerson:dateOfBirth": "\\@graph[#2].birthdate"
                "basePerson:sex": "\\@graph[#2].gender"

use soya template transformation.jolt | soya init to show this example on the command line

{
  "@context": {
    "@version": 1.1,
    "@import": "https://ns.ownyourdata.eu/ns/soya-context.json",
    "@base": "https://soya.data-container.net/PersonA_jolt_Transformation/"
  },
  "@graph": [
    {
      "@id": "https://soya.data-container.net/PersonATransformation",
      "engine": "jolt",
      "value": [
        {
          "operation": "shift",
          "spec": {
            "\\@context": {
              "\\@version": "\\@context.\\@version",
              "#https://soya.data-container.net/PersonB/": "\\@context.\\@vocab"
            },
            "\\@graph": {
              "*": {
                "#PersonB": "\\@graph[#2].\\@type",
                "\\@id": "\\@graph[#2].\\@id",
                "basePerson:firstname": "\\@graph[#2].first_name",
                "basePerson:lastname": "\\@graph[#2].surname",
                "basePerson:dateOfBirth": "\\@graph[#2].birthdate",
                "basePerson:sex": "\\@graph[#2].gender"
              }
            }
          }
        }
      ]
    },
    {
      "@id": "OverlayTransformation",
      "@type": "OverlayTransformation",
      "onBase": "https://soya.data-container.net/PersonA",
      "name": "TransformationOverlay"
    }
  ]
}

2.3.4. Semantic Container

Semantic Containers are transient data stores that provide interoperability and traceability features. For SOyA, Semantic Containers provide the framework to store instances (data records associated with a structure through a schema DRI) and host the SOyA form feature for editing instances - find more information here.

3. Features

3.1. Authoring

As an authoring platform for data models, SOyA allows a dataset to be described using a simple notation in YAML. A base (data model) is defined by listing attributes and data types in an easy, human-readable form and by providing meta attributes. Additionally, a number of optional associated overlays can define specific behaviour. The SOyA CLI then transforms this YAML input into JSON-LD for a standards-based representation.

Specific Authoring Functions:

3.2. Publishing

An important aspect of SOyA is collaboration on developing data models. A repository to host SOyA structures is an integral part of the workflow, and the SOyA CLI provides a number of functions to interact with a repository. A default, public repository is hosted at soya.ownyourdata.eu with the OpenAPI Specification available. Private repositories can be hosted using the sources on GitHub or pre-built Docker images.

Specific Publishing Functions:

3.3. Acquisition

The acquisition feature transforms flat JSON data into linked data (in JSON-LD) based on matching attribute names.

An acquisition example.

The following record is an example of flat JSON to be transformed into linked data - also available via curl https://playground.data-container.net/cfa.
Run the following command to test acquire:
curl https://playground.data-container.net/cfa | soya acquire Employee

{
    "name": "Christoph Fabianek",
    "dateOfBirth": "1977-07-21",
    "gender": "male",
    "employer": {
        "Company": "OwnYourdata",
        "country": "Austria"
    }
}

The following base from the SOyA structure Employee is used in the example.

meta:
  name: Employee

content:
  bases:
    - name: Employee
      attributes:
        name: String
        dateOfBirth: Date
        gender: String
        employer: Company
    - name: Company
      attributes:
        company: string
        country: string

The output of the acquire in the example.

{
  "@context": {
    "@version": 1.1,
    "@vocab": "https://soya.data-container.net/Employee/"
  },
  "@graph": [
    {
      "name": "Christoph Fabianek",
      "dateOfBirth": "1977-07-21",
      "gender": "male"
    }
  ]
}
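
The name-matching step can be sketched as follows. This is an illustration that reproduces the example output above, not the actual logic of soya acquire: the function name is hypothetical, and skipping attributes whose type is another base (here employer) is an assumption made only because the example output omits that attribute.

```python
# ASSUMPTION: only attributes with these primitive types are copied.
PRIMITIVE_TYPES = {"String", "Date", "Integer"}


def acquire(record: dict, structure: str, attributes: dict[str, str]) -> dict:
    """Wrap the fields of a flat record that match base attributes as JSON-LD."""
    node = {
        key: value
        for key, value in record.items()
        if attributes.get(key) in PRIMITIVE_TYPES
    }
    return {
        "@context": {
            "@version": 1.1,
            "@vocab": f"https://soya.data-container.net/{structure}/",
        },
        "@graph": [node],
    }


flat = {
    "name": "Christoph Fabianek",
    "dateOfBirth": "1977-07-21",
    "gender": "male",
    "employer": {"Company": "OwnYourdata", "country": "Austria"},
}
employee_attributes = {
    "name": "String",
    "dateOfBirth": "Date",
    "gender": "String",
    "employer": "Company",
}
linked = acquire(flat, "Employee", employee_attributes)
print(linked)
```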

3.4. Validation

Through validation, a given JSON-LD record (or an array of records) can be checked against a SOyA structure that includes a validation overlay. (Currently, SHACL (Shapes Constraint Language) is used in validation overlays.)

An example command to perform a validation.
curl -s https://playground.data-container.net/cfa | soya acquire Employee | soya validate Employee
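
To give an idea of what such a check involves, the sketch below evaluates a record against a small subset of the property constraints used in the validation overlay examples (sh:minCount, sh:maxCount, sh:maxLength, sh:pattern, sh:in). It is not a SHACL engine and not the implementation behind soya validate; the function name and the sample records are hypothetical.

```python
import re


def check(record: dict, shape: dict) -> list[str]:
    """Return human-readable violations of a subset of SHACL property constraints."""
    errors = []
    for prop in shape["sh:property"]:
        path = prop["sh:path"]
        raw = record.get(path, [])
        values = raw if isinstance(raw, list) else [raw]
        if len(values) < prop.get("sh:minCount", 0):
            errors.append(f"{path}: fewer than {prop['sh:minCount']} value(s)")
        if "sh:maxCount" in prop and len(values) > prop["sh:maxCount"]:
            errors.append(f"{path}: more than {prop['sh:maxCount']} value(s)")
        for value in values:
            text = str(value)
            if "sh:maxLength" in prop and len(text) > prop["sh:maxLength"]:
                errors.append(f"{path}: longer than {prop['sh:maxLength']} characters")
            if "sh:pattern" in prop and not re.search(prop["sh:pattern"], text):
                errors.append(f"{path}: does not match pattern")
            if "sh:in" in prop and value not in prop["sh:in"]["@list"]:
                errors.append(f"{path}: value not in allowed list")
    return errors


# Two property shapes taken from the PersonValidation example.
shape = {
    "sh:property": [
        {"sh:path": "name", "sh:minCount": 1, "sh:maxCount": 1,
         "sh:maxLength": 19, "sh:pattern": "^[a-z ,.'-]+$"},
        {"sh:path": "sex", "sh:maxCount": 1,
         "sh:in": {"@list": ["male", "female", "other"]}},
    ]
}

print(check({"name": "ada lovelace", "sex": "female"}, shape))
print(check({"name": "Ada", "sex": "unknown"}, shape))
```

Note that the pattern from the example only admits lowercase letters, so a capitalized name is reported as a violation.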

3.5. Transformation

Transformations convert a JSON-LD record (or an array of records) with a well-defined data model (based on a SOyA structure) into another data model using information from a transformation overlay. (Currently, jq and Jolt are available engines for transformation overlays.)

An example command to perform a transformation.
curl -s https://playground.data-container.net/PersonAinstance | soya transform PersonB

3.6. Forms

Based on SOyA structures, forms can be generated automatically, allowing for visualization and editing of data records. While basic editing functionality relies on SOyA bases only, more complex forms can be achieved by providing additional overlays, such as validation overlays for enhanced form validation. Furthermore, annotation overlays allow for internationalization of SOyA forms, providing multi-language display. Finally, a form overlay provides extensive configuration to fine-tune the arrangement and display of controls.

SOyA forms are based on the JSON-Forms framework and therefore come with adapters for different UI libraries and frameworks, allowing for easy integration in already existing projects.

An example command to generate a form for a SOyA structure.
soya pull Employee | soya form

4. Tools

4.1. soya-js Library

soya-js provides interfaces in JavaScript for handling SOyA structures and interacting with SOyA repositories.

4.2. Command Line Tool

The Command Line Tool provides a set of utilities for handling SOyA structures and interacting with SOyA repositories.

soya-cli is built on top of soya-js and exposes most of its features as commands on the command line. In addition there are features like:

4.3. Repository

The SOyA Repository is a storage for SOyA structures and provides the following functionalities:

Further information:

5. Reference Implementation

Work in progress as part of a research project funded by the “IKT der Zukunft” program from the Federal Ministry for Transport, Innovation and Technology in Austria – FFG Projekt 887052.

The following implementation artefacts are available (published under the open source MIT License):

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Index

Terms defined by this specification

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[JSON-LD]
Gregg Kellogg; Pierre-Antoine Champin; Dave Longley. JSON-LD 1.1. W3C Recommendation. URL: https://www.w3.org/TR/json-ld/
[MULTIBASE]
IETF Multibase Data Format specification. URL: https://tools.ietf.org/html/draft-multiformats-multibase
[MULTIHASH]
Multihash - protocol for differentiating outputs from various well-established cryptographic hash functions. URL: https://github.com/multiformats/multihash