GNOSS semantic artificial intelligence platform. The semantic technology core
Semantic Technology and Artificial Intelligence
Semantic Technology is at the heart of our systems’ abilities to understand human reasoning. In turn, our systems are the foundation for developing Cognitive Solutions that today form the core of recently revived Artificial Intelligence programmes.
In a strong comeback after a long period in hibernation, Semantic Technology has become a key component of Artificial Intelligence, not to mention other Big Data-based applications. Within this new paradigm, Semantic Technology is at the vanguard of today’s technological renaissance. It allows systems to understand the way that people reason and how we connect information. This technology simultaneously bolsters users’ ability to understand texts and audio files, as well as to discover a whole new world of relationships between the data and entities contained within them.
Semantic Technology meaningfully consolidates all of this widely-dispersed information by integrating it into a Knowledge Graph that can be queried. The Graph meaningfully relates entities to entities as well as to their attributes within the framework of a given sphere of knowledge. A Knowledge Graph is the condition that makes a system intelligent.
Technology and semantic standards. Machines that understand
Ontologies and controlled vocabularies
Ontologies
The primary meaning of ontology (from the Greek οντος, “pertaining to being or what exists” and λóγος “science, study, theory”) is a branch of metaphysics occupied with the study of what is. In his writing, Aristotle named metaphysics as what was, literally speaking, after physics. This realm of knowledge corresponded to the study of what he called first philosophy, or ontology, meaning the study of what is, is real, or exists.
Ontology is not only concerned with entities or the things that exist, but also with the way in which the entities that exist relate to one another. This idea of a science of entities and their relationships started to be used in the formal language of computational and information sciences to designate a set of definitions for classes, types, attributes, properties and relationships among entities that act within a specified domain of reality and knowledge. Indeed, this is a practical application of the concepts of philosophical ontology.
What the philosophical and the IT perspectives have in common is the representation of entities, ideas and events, together with their properties and relationships, according to a categorisation system. IT professionals, however, concentrate their efforts on “closing” ontologies by representing them with controlled vocabularies that can be utilised by computer systems.
In this sense, an ontology is a set of:
- individuals (instances or objects);
- classes (sets, collections, concepts, types of objects or kinds of things);
- attributes (aspects, properties, features, characteristics or parameters that objects and classes can have);
- relations (ways in which classes and individuals may be related to each other);
- functions (complex structures formed from certain relations that can be used in place of an individual term in a statement);
- restrictions (formal descriptions of what must be true in order for some assertion to be accepted as input);
- rules (statements in the form of an if-then (antecedent-consequent) sentence that describe the logical inferences that can be drawn from an assertion in a particular form);
- axioms (assertions, including rules, in a logical form that together comprise the overall theory that the ontology describes in its domain of application);
- events (the changing of attributes or relations).
Ontologies are usually encoded using standard ontology languages such as the Web Ontology Language (OWL), which allow classes and their sets of attributes to be described: for example, the class “person” with its attributes of having a name and surname, a place of birth, a date of birth, etcetera. The Resource Description Framework (RDF) is then used to describe a specific individual of a class: “Diego de Silva y Velázquez”, born in Seville on such and such a date, who was a painter, and so on.
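The class/individual distinction can be pictured with plain Python tuples standing in for RDF triples. This is a minimal sketch, not real OWL or RDF syntax, and the URIs and property names below are illustrative placeholders, not the identifiers of any actual dataset.

```python
# Illustrative namespace; not a real, dereferenceable vocabulary.
EX = "http://example.org/"

# Ontology level (OWL-like): the class "Person" and the attributes it allows.
ontology = [
    (EX + "Person", "rdf:type", "owl:Class"),
    (EX + "name", "rdfs:domain", EX + "Person"),
    (EX + "placeOfBirth", "rdfs:domain", EX + "Person"),
    (EX + "dateOfBirth", "rdfs:domain", EX + "Person"),
]

# Data level (RDF-like): a specific individual belonging to that class.
data = [
    (EX + "velazquez", "rdf:type", EX + "Person"),
    (EX + "velazquez", EX + "name", "Diego de Silva y Velázquez"),
    (EX + "velazquez", EX + "placeOfBirth", "Seville"),
    (EX + "velazquez", EX + "profession", "painter"),
]

def attributes_of(subject, triples):
    """Return all predicate/object pairs asserted about a subject."""
    return {p: o for s, p, o in triples if s == subject}

print(attributes_of(EX + "velazquez", data))
```

The ontology constrains what can be said; the data says it about one individual. Real systems would express both levels in OWL and RDF serialisations rather than Python lists.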
Domain ontologies
A domain ontology (or domain-specific ontology) represents concepts belonging to a specific part of reality and may hence be considered to involve highly specialised knowledge. Just as the ontological aims of information science and technology aspire to close and control vocabularies as far as possible, domain-specific ontologies are the natural consequence of efforts to represent and compute the content of digital resources. In this sense, the meaning of an individual term is given by the domain ontology to which it belongs.
Because domain ontologies represent their concepts in a very specific way, they tend not to overlap; indeed, they may be mutually exclusive.
Ontological hybridization
On the other hand, reality as a whole has a notable tendency toward continuity, and the domains into which the world is organised tend to be more mixed than our controlled vocabularies. This is why real-world systems, like any organisation or human institution, require expanded domain ontologies or hybrid ontologies: the result of blending and integrating different domain ontologies into a more general representation. Hybrid ontologies require an upper ontology to be designed on top of the foundation of controlled vocabularies, which are built from ideas originating in different places and sometimes in different languages, each with its own cultural context.
Blending ontologies is as much a craft as a science: it attempts to digitalise a domain or sector of reality that exceeds the representational capacity of its controlled vocabularies.
Hyperdata and Semantic Search Engines
Relations among texts on a website have evolved toward networks of data linked in a Knowledge Graph. Machines are able to understand our human world of entities.
Hyperdata
Hyperdata refers to the means by which a dataset is linked to other datasets housed in other locations or information silos, much in the same way that hypertext indicates the relationship between texts scattered throughout the internet. Hyperdata strategies make it possible to condense data into a “network of data”, also known as a Knowledge Graph, the name for the ensemble of data linked using a hyperdata strategy.
A hyperdata link always refers to an entity and, in fact, names it. It may, for example, refer to a physical thing, such as an artwork (“Las Meninas”, for example); to a person (Velázquez, the artist of the aforementioned work); or even to an exhibit in which this work has been displayed, the restorations or changes it has seen over time, or a description of its elements (the people, fauna, flora or places represented...).
A hypertext link indicates that there is a connection between two documents; a hyperdata link goes further and expressly marks the semantic relationship of a specific connection class. In other words, thanks to hyperdata, systems are able to know and process the relations between the entities that link two documents, thereby making it even easier for people to recognise them. Unlike hypertext-based strategies, strategies based on hyperdata do not leave it to human beings to resolve the problem of recognising the significant relationships within a set of connected resources. Because hyperdata allow systems to process this class of relations between entities, people are able to query and interpret vast quantities of information that are meaningfully linked in a graph by the systems.
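The difference between the two kinds of link can be sketched in a few lines of Python. The data is invented for illustration: a hypertext link only records that two pages are connected, while a hyperdata link names the entities and the class of relationship between them.

```python
# Hypertext: untyped connections between documents. The system knows
# the pages are related, but not how.
hypertext_links = [
    ("page_las_meninas.html", "page_velazquez.html"),
]

# Hyperdata: typed connections between named entities (illustrative data).
hyperdata_links = [
    ("Las Meninas", "createdBy", "Velázquez"),
    ("Las Meninas", "exhibitedAt", "Museo del Prado"),
    ("Las Meninas", "depicts", "Infanta Margarita"),
]

def relations_between(a, b, links):
    """With hyperdata, a system can answer HOW two entities are
    related, not merely THAT they are related."""
    return [p for s, p, o in links if (s, o) == (a, b)]

print(relations_between("Las Meninas", "Velázquez", hyperdata_links))
```

With only the hypertext list, the equivalent question cannot be answered by the machine at all; a human has to read both pages and infer the relationship.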
Semantic Search Engines and Knowledge Graphs
A semantic search engine could technically be defined as a search engine that traces hyperdata links. In practice, a set of hyperdata links within an ensemble of resources constitutes a knowledge graph. It follows that a semantic search engine is a search engine that makes it possible to navigate a knowledge graph.
As the front of a sheet of paper is to its back, so the architecture of the Semantic Web is to the format of a web document (usually HTML). These documents are what traditional web crawlers, such as Google’s, use; this is where they search. A semantic search engine based on hyperdata needs RDF files instead, because these are the means by which the Semantic Web represents entities and, consequently, what make their navigation possible.
A Knowledge Graph based on hyperdata makes it possible to perform conversational searches using natural language. For example, a subset of the graph can be selected by querying only the hyperdata that meet a given condition, such as, in the case of the Prado Museum graph, having been painted in a certain year or belonging to a particular school of art. Furthermore, a search engine that uses hyperdata is able not only to restrict the scope of its search, but also to compute the exact number of relationships for a specified set of resources, and their classes. This type of semantic search is called a faceted search with summarisation. A search of this type additionally allows queries or restrictions to be aggregated or iterated, thus emulating the manner in which people naturally reason. In the previous example, a second layer could be added to the results generated by our query for artwork in the Prado belonging to a certain period and school: we could further query for the depiction of a certain theme, for example hunting; a specific object, say a shotgun; or a given animal, let’s say a dog. What would finally be sought in this case are pieces of art at the Prado that deal with hunting and additionally show shotguns and dogs, the artwork being from a certain period and, in this example, from Spain. Insofar as machines are able to understand the world of entities used by people, they restrict the number of results generated; the answers to our questions thereby become precise and semantically relevant.
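The iterated-restriction pattern just described can be sketched over an invented miniature catalogue. The titles, schools and themes below are illustrative placeholders, not the Prado’s actual data; a production system would run such queries in SPARQL over RDF, not over Python dictionaries.

```python
from collections import Counter

# Invented miniature catalogue for illustration only.
artworks = [
    {"title": "A", "school": "Spanish", "century": 17, "themes": {"hunting", "dog"}},
    {"title": "B", "school": "Spanish", "century": 17, "themes": {"hunting", "shotgun", "dog"}},
    {"title": "C", "school": "Flemish", "century": 17, "themes": {"hunting"}},
    {"title": "D", "school": "Spanish", "century": 18, "themes": {"portrait"}},
]

def restrict(items, predicate):
    """Apply one conversational restriction, narrowing the result set."""
    return [a for a in items if predicate(a)]

# Iterated restrictions, emulating step-by-step human reasoning:
step1 = restrict(artworks, lambda a: a["school"] == "Spanish" and a["century"] == 17)
step2 = restrict(step1, lambda a: {"hunting", "shotgun", "dog"} <= a["themes"])

# Summarisation: exact counts per facet value for the current result set.
school_counts = Counter(a["school"] for a in step1)

print([a["title"] for a in step2])
print(school_counts)
```

Each restriction narrows the previous result set rather than starting over, and the facet counts tell the user exactly how many resources each further restriction would leave, before it is applied.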
Open Data and Linked Open Data
Open Data
According to the definition in the Open Data Handbook, open data is data that can be used, reused and redistributed freely by anyone, subject at most to the requirements of attribution and share-alike. The complete opening of data can be summarised by the following characteristics:
- Availability and access: the information must be available as a whole, in a convenient and modifiable form, and at no more than a reasonable reproduction cost, preferably by downloading it from the internet.
- Reuse and redistribution: the terms of use of the data must allow it to be reused and redistributed, and even integrated with other datasets.
- Universal participation: everyone must be able to use, reuse and redistribute the information, without discrimination against fields of endeavour or against persons or groups; no “non-commercial” or other usage restrictions are allowed.
Knowledge Graphs and Linked Data
By providing us with reliable recommendation systems, machines collaborate with people as we seek and retrieve information, discover knowledge, and learn.
Linked Data
Tim Berners-Lee is the man behind the concept of the Semantic Web. He coined the term in 2001 in his seminal article “The Semantic Web”, published in Scientific American. He is also the author of the concept of Linked Data, which he developed in a design note on the construction of the Semantic Web project.
Linked Data is the name of a structured method of publishing data that, in practice, is what makes navigating with hyperdata and Knowledge Graph construction possible. This publication method enables people to query the data of a graph semantically.
In order to publish according to the principles of the Linked Data Web, standards such as HTTP, RDF and URIs must be used, not so much for the purpose of displaying the pages that people read as for editing the pages so that they can be interpreted automatically by systems, and therefore share information. This is what makes it possible to connect data from various sources in a unified, searchable graph.
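The publication side can be sketched as follows: each entity is named with an HTTP URI, and its description is rendered in N-Triples, the simplest standard RDF serialisation. The sketch below is a simplified illustration (real N-Triples also handles language tags, datatypes and escaping), and every URI in it is an invented placeholder, not an actual published identifier.

```python
# Illustrative triples: URIs name entities; the last element may be a
# plain string literal instead of a URI.
triples = [
    ("http://example.org/work/las-meninas",
     "http://example.org/vocab/creator",
     "http://example.org/person/velazquez"),
    ("http://example.org/person/velazquez",
     "http://example.org/vocab/name",
     "Diego Velázquez"),
]

def to_ntriples(triples):
    """Render triples in (simplified) N-Triples: one statement per line,
    URIs in angle brackets, strings as quoted literals, '.' terminator."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else '"' + o + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(triples))
```

Because every subject is an HTTP URI, a consumer can dereference it to fetch more RDF about that entity, which is precisely what lets independently published datasets knit together into one searchable graph.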
At GNOSS, from the very beginning, the Museo del Prado Online project aimed to integrate all the Museum’s resources into a Knowledge Graph. The goal of the project was to build a new presence for the Museum online, to improve users’ experience when interacting with the Museum’s resources, and to integrate and link the Museum’s complete production into a unified graph; to put it another way, to convert all the data from all its systems into hyperdata. This focus has had a significant impact on the way the Museum operates, because it directly connects processes of creation and knowledge generation with processes of publication and knowledge discovery. The emphasis is placed on the use of the Museum’s data to improve its own processes, and not solely on their reuse by third parties.
Likewise, when one transcends the perspective of publishing data for presumed reuse and adopts the vision of developing utilities for diverse audiences, including groups of interest to the institution itself, the result can significantly transform the production and consumption models for the materials that comprise institutional heritage and knowledge.
What is a graph?
“Graph” comes from the Greek for “drawing” or “writing”. From a technical perspective, in mathematics and computer science a graph is a set of objects called vertices or nodes, connected by edges or arcs, which represent the relationships between the elements of a set. The word “graph” reflects the way these mathematical objects are frequently depicted: as a set of points (vertices) joined by lines (edges). Graph theory is the study of this mathematical structure.
When theory is put into practice, graphs allow the relationships between units to be studied. A network of computers is one example, as is the set of implicit relationships between the books in a library, the works of art in a museum, a set of scientific articles, or a given set of clinical trials.
Graph theory allows practical applications and exploitations to be represented, formalised and developed for an extremely wide set of problems.
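The definition above translates directly into code: a graph is nothing more than a set of vertices and a set of edges (pairs of vertices). The example data is an invented miniature of the museum case discussed earlier.

```python
# Vertices: the entities (illustrative).
vertices = {"Las Meninas", "Velázquez", "Museo del Prado", "Goya"}

# Edges: pairs of vertices, each representing a relationship.
edges = {
    ("Las Meninas", "Velázquez"),
    ("Las Meninas", "Museo del Prado"),
    ("Goya", "Museo del Prado"),
}

def neighbours(v):
    """Vertices joined to v by an edge, in either direction."""
    return {b for a, b in edges if a == v} | {a for a, b in edges if b == v}

print(sorted(neighbours("Museo del Prado")))
```

A Knowledge Graph is this same structure with one addition: every edge carries a label naming the class of relationship it represents.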
Knowledge Graphs
A Knowledge Graph is a system that represents a set of digital resources and content and that, based on an ontological model, understands facts about the knowledge objects or entities within a specific knowledge area, and in particular understands how this set of entities is connected. When we say that this system “understands”, we mean that it is written in a technical language that enables machines or systems to “understand” and correctly handle the entities concerned, in order to provide reliable recommendation systems and to collaborate with people as they query, retrieve information, discover knowledge, and learn.
Knowledge Graphs are a fundamental aspect of artificial intelligence projects. They supply a searchable, cognitive means by which to navigate. Based on the requests of users, they enable inference and suggest new relations or narratives linked with the knowledge areas in question.
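The kind of inference mentioned above can be sketched with a single rule applied until no new facts appear: if an individual has a type, and that type is a subclass of another, the individual also has the broader type. The classes and the instance below are illustrative; real systems apply such entailment rules (for example, RDFS subclass reasoning) over the full graph.

```python
# Illustrative starting facts.
facts = {
    ("Velázquez", "type", "Painter"),
    ("Painter", "subClassOf", "Artist"),
    ("Artist", "subClassOf", "Person"),
}

def infer(facts):
    """Derive new typing facts until a fixed point is reached:
    x type C  and  C subClassOf D  =>  x type D."""
    facts = set(facts)
    while True:
        new = {(x, "type", d)
               for (x, r1, c) in facts if r1 == "type"
               for (c2, r2, d) in facts if r2 == "subClassOf" and c2 == c}
        if new <= facts:          # nothing new derived: stop
            return facts
        facts |= new

inferred = infer(facts)
print(("Velázquez", "type", "Person") in inferred)
```

The fact that Velázquez is a Person was never asserted; the system derived it, which is how a Knowledge Graph can suggest relations and answers that go beyond what was explicitly entered.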
Big Semantic Data
Big Data usually refers to any dataset of great volume and complexity. Such datasets store millions of hidden values that are not available for efficient automatic processing except through reasoning and inference techniques capable of emulating natural human reasoning, which is always contextual. These techniques are available in our Semantic Big Data projects, which generate knowledge graphs whose numbers of nodes and relationships are measured in billions of triples.