Building domain understanding into Search

A search engine is a program that collects and organizes content in response to users' queries and input keywords. It searches the web for the requested information and presents it as a mix of links to articles, research papers, videos, images, and other types of web pages and files. The links are ranked by the relevance of the pages behind them, and ranking methods vary significantly from one search engine to another. The page of results that a search engine returns is called a Search Engine Results Page (SERP).

How Do Search Engines Work?

In its basic form, a search engine consists of an interface and a set of criteria for answering queries with relevant extracted information. When a user runs a search, they are searching a database that the engine's own tools have built and indexed. From that perspective, most search engines are built from three major functions: crawling, indexing, and ranking.

Crawling uses programs called web crawlers to collect data from web resources. Crawlers follow links found on previously visited pages, methodically inspecting new pages and discovering their content and code. They then return the new information to a database, where indexing takes over. Indexing organizes and stores the content gathered during crawling. Once a page has been validated and indexed, it can appear as a result for a query typed into the search bar. To keep the database up to date, crawling and indexing run continuously and iteratively.

The third function, ranking, orders the results by their relevance to the query using multiple ranking signals. The best results are selected, and a results page listing the most relevant target pages is generated. If all three functions are built properly, the engine can analyze the meaning of a query, assess the relevance and quality of indexed content, and examine how pages function.
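To make the three functions concrete, here is a minimal Python sketch of a crawl, index, and rank loop. It is a toy under stated assumptions, not how any production engine works: the "web" is a hypothetical in-memory dictionary instead of live HTTP fetches, the index is a plain inverted index, and ranking uses a single TF-IDF score where real engines combine many signals.

```python
import math
import re
from collections import Counter, defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse their HTML.
WEB = {
    "a.example/home": ("Drilling reports and well logs", ["a.example/wells", "a.example/news"]),
    "a.example/wells": ("Well logs describe porosity and permeability", ["a.example/home"]),
    "a.example/news": ("Company news and quarterly reports", []),
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(seed, web):
    """Breadth-first crawl: follow links found on already-visited pages."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to fetch
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index: term -> {url: term frequency in that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][url] = tf
    return index

def rank(query, index, n_docs):
    """Score pages with a simple TF-IDF sum (one possible ranking signal)."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings)) + 1.0  # rarer terms weigh more
        for url, tf in postings.items():
            scores[url] += tf * idf
    return [url for url, _ in scores.most_common()]

pages = crawl("a.example/home", WEB)
index = build_index(pages)
print(rank("porosity well logs", index, len(pages)))
# ['a.example/wells', 'a.example/home']
```

Both matching pages share "well" and "logs" with the query, but only one also mentions the rarer term "porosity", so it ranks first.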

Vertical or Domain-based Search

A vertical, topical, or domain-based search engine is a customizable, user-friendly search engine focused on a single industry or area of expertise. It is designed to meet users' specific needs and help them achieve their individual search goals. Such engines can be built on a formal representation of domain knowledge, and today they are used with expert knowledge databases in many fields. A vertical search engine uses specially tailored focused web crawlers that index only pages relevant to its topic. Crawling in a domain-specific engine is considerably more efficient, because only a predefined subset of the web has to be processed. This setup gives higher search precision, faster access to the target information, and more efficient data handling. Finally, vertical search engines can model the specific relationships between the concepts involved and use them to deliver highly relevant results. A popular approach to working with domain-specific knowledge is to encode it as a set of concepts and the relationships between them, that is, as an ontology.
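A focused crawler can be sketched in the same style. In this minimal example, which assumes a hypothetical topic vocabulary and an in-memory link graph, a page is indexed only if a large enough share of its words belongs to the topic, and links are followed only from pages that pass that test.

```python
import re
from collections import deque

# Hypothetical topic vocabulary for an oil-and-gas vertical engine.
TOPIC_TERMS = {"well", "drilling", "reservoir", "porosity", "seismic"}

# Hypothetical in-memory web: URL -> (page text, outgoing links).
WEB = {
    "seed": ("Drilling and reservoir engineering basics", ["geo", "recipes"]),
    "geo": ("Seismic surveys image the reservoir", ["celebs"]),
    "recipes": ("Ten quick pasta recipes", ["more-recipes"]),
    "more-recipes": ("Soups and stews", []),
    "celebs": ("Celebrity gossip", []),
}

def relevance(text, topic_terms):
    """Fraction of the page's tokens that belong to the topic vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)

def focused_crawl(seed, web, topic_terms, threshold=0.2):
    """Index only on-topic pages and expand links only from those pages."""
    seen, queue, indexed = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        text, links = web.get(url, ("", []))
        if relevance(text, topic_terms) < threshold:
            continue  # off-topic: neither indexed nor expanded
        indexed.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed

print(focused_crawl("seed", WEB, TOPIC_TERMS))  # ['seed', 'geo']
```

Pruning at off-topic pages is what keeps the crawl small ("more-recipes" is never fetched). The trade-off is that on-topic pages reachable only through off-topic ones are missed, which is why focused crawlers often also score link text or surrounding context.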

What are Ontologies or Concept Graphs?

Ontologies are efficient tools for organizing knowledge and automating its use. They are a formal way of organizing a single area of domain-specific knowledge, defining concepts and the relationships between them logically and consistently. Because an ontology makes the knowledge representation explicit, it can be reused and provides a solid foundation for adding new assumptions and expertise within the domain. Overall, ontologies can improve data quality and support problem solving in the field.

The way ontologies work can be explained most practically by analogy with the human brain. Just as the brain perceives concepts both individually and as linked to one another, an ontology reasons over concepts through the relationships established between them. To be widely useful, the languages used by different information resources need a shared vocabulary agreed upon by the relevant experts. Ontologies are commonly formalized in the Web Ontology Language (OWL), a language based on description logics. OWL has formally defined semantics that specify how OWL statements constrain and describe the domain being modeled.
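Working with OWL itself requires dedicated tooling, but the core idea of reasoning over established relationships can be shown with a toy model. The sketch below assumes a hypothetical rock taxonomy stored as subject-predicate-object triples and implements only two inferences that OWL also supports: subclass relations are transitive, and an individual belongs to every superclass of its asserted class. It is an illustration, not an OWL reasoner.

```python
from collections import defaultdict

# A tiny hypothetical ontology stored as (subject, predicate, object) triples,
# loosely mirroring statements such as rdfs:subClassOf and rdf:type.
TRIPLES = [
    ("Sandstone", "subClassOf", "ClasticRock"),
    ("ClasticRock", "subClassOf", "SedimentaryRock"),
    ("SedimentaryRock", "subClassOf", "Rock"),
    ("BereaSandstone", "type", "Sandstone"),
]

def superclasses(cls, triples):
    """All direct and indirect superclasses (subClassOf is transitive)."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "subClassOf":
            parents[s].add(o)
    result, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def inferred_types(individual, triples):
    """Asserted types plus every superclass of each asserted type."""
    asserted = {o for s, p, o in triples if s == individual and p == "type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred

print(sorted(inferred_types("BereaSandstone", TRIPLES)))
# ['ClasticRock', 'Rock', 'Sandstone', 'SedimentaryRock']
```

Only one type was asserted, yet a query for rocks or sedimentary rocks would now find the individual.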

Three types of ontologies can be distinguished: domain, upper, and hybrid ontologies. Domain ontologies model the definitions of domain-specific terms. Because they are created by different authors who describe concepts in their own ways, two domain ontologies can easily be incompatible within a single project owing to differences in how the domain is perceived, in usage, in language, and so on. Upper (foundation) ontologies model general, commonly used concepts and relations that apply across many independent domain ontologies. They are built on a core set of general terms and descriptions intended to apply to a wide range of use cases. Finally, a hybrid ontology combines a domain ontology with an upper ontology and therefore mixes the features and properties of both.
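A hybrid ontology can be pictured as hierarchies merged through a few bridging statements. The sketch below assumes single-parent hierarchies stored as child-to-parent dictionaries; the upper categories, the two domain fragments, and the bridges are all hypothetical. Merging fails loudly when two sources disagree about a concept's parent, which is one simple symptom of the incompatibility between domain ontologies.

```python
# Hypothetical upper ontology: general categories reusable across domains.
UPPER = {"PhysicalObject": "Entity", "Process": "Entity"}  # child -> parent

# Hypothetical domain ontologies from two different authors.
GEOLOGY = {"Sandstone": "Rock"}
DRILLING = {"HorizontalDrilling": "Drilling"}

# Hypothetical bridges anchoring each domain's root concept in the upper ontology.
BRIDGES = {"Rock": "PhysicalObject", "Drilling": "Process"}

def build_hybrid(upper, bridges, *domains):
    """Merge child->parent maps into one hierarchy; reject conflicting parents."""
    hybrid = {}
    for part in (upper, bridges, *domains):
        for child, parent in part.items():
            if hybrid.get(child, parent) != parent:
                raise ValueError(f"conflicting parents for {child!r}")
            hybrid[child] = parent
    return hybrid

def ancestors(concept, hierarchy):
    """Walk child->parent links up to the root, guarding against cycles."""
    chain = []
    while concept in hierarchy:
        concept = hierarchy[concept]
        if concept in chain:
            raise ValueError(f"cycle detected at {concept!r}")
        chain.append(concept)
    return chain

hybrid = build_hybrid(UPPER, BRIDGES, GEOLOGY, DRILLING)
print(ancestors("Sandstone", hybrid))  # ['Rock', 'PhysicalObject', 'Entity']
```

Once both domain fragments hang under shared upper categories, a question such as "which concepts are processes?" can be asked across both of them.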

Why do we need a Concept Graph?

Actively reusing and updating domain knowledge is the primary motivation for developing ontologies. Once an ontology has been developed through an iterative process, users can reuse it as often as needed and adapt it to the requirements of other domains. If knowledge about the domain changes or grows, the ontology can be modified and its existing assumptions updated. Once the key relationships between domain concepts are defined, automated reasoning becomes possible, and users can navigate from one concept to related ones. The established logic also allows the ontology to grow, so the model expands as the data grows. Detailed specifications of the knowledge also help novice users learn the meaning of domain terms and the relevant assumptions. Ontologies can be used to describe structured, semi-structured, and unstructured data, which makes them applicable in many communities.
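Navigation and reasoning pay off directly in search. One common use, shown here as a minimal sketch with a hypothetical concept graph and naive substring matching, is query expansion: a query for a broad concept also retrieves documents that mention only narrower concepts.

```python
from collections import deque

# Hypothetical concept graph: concept -> narrower (more specific) concepts.
NARROWER = {
    "sedimentary rock": ["sandstone", "shale", "limestone"],
    "sandstone": ["tight sandstone"],
}

def expand_query(concept, narrower):
    """Return the concept plus all of its descendants, breadth-first."""
    seen, queue, result = {concept}, deque([concept]), []
    while queue:
        current = queue.popleft()
        result.append(current)
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return result

def search(concept, docs, narrower):
    """Return documents that mention the concept or any narrower concept."""
    terms = expand_query(concept, narrower)
    return [d for d in docs if any(t in d.lower() for t in terms)]

DOCS = [
    "Porosity of tight sandstone reservoirs",
    "Shale gas production trends",
    "Quarterly financial results",
]
print(search("sedimentary rock", DOCS, NARROWER))
# ['Porosity of tight sandstone reservoirs', 'Shale gas production trends']
```

Neither matching document contains the phrase "sedimentary rock", so a plain keyword search would have returned nothing.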

Why is a Concept Graph so hard to build?

The real challenge in building concept graphs or ontologies is to describe the scientific and expert terms and assumptions of a specialized field in a way that enables computers to search the web efficiently, find relevant data, and present it to users in the most useful way. Despite all the benefits of ontologies, it is worth stating their known limitations and why developing them can be difficult and time-consuming.

First, developing and maintaining an ontology is not the job of one or a few developers but of the whole community. Experts can propose an initial ontology, but its content must keep growing as other experts in the field and regular users define new assumptions and find deficiencies in existing substructures. However, user contributions can also threaten the ontology. If a user poorly defines terms and relationships and adds these flawed assumptions to the ontology, they can compromise the entire ontology and undermine the experts' initial work.

Further, if an ontology represents a new research field that has not yet been fully explored, it may contain systematic gaps in its structure that make it unreliable or even unusable.

When an ontology is large, navigating within it and searching across multiple related ontologies can become difficult.
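At its simplest, navigating between concepts is a path search in the concept graph. This sketch assumes a handful of hypothetical relations drawn from two related ontologies, treats every relation as traversable in both directions, and uses breadth-first search to find the shortest chain linking two concepts. In a large ontology the same search must touch many more nodes at each hop, which is part of why navigation becomes hard at scale.

```python
from collections import defaultdict, deque

# Hypothetical relations drawn from two related ontologies (geology and drilling).
EDGES = [
    ("Sandstone", "subClassOf", "Rock"),
    ("Reservoir", "composedOf", "Rock"),
    ("Well", "penetrates", "Reservoir"),
    ("Well", "hasLog", "GammaRayLog"),
]

def shortest_path(start, goal, edges):
    """BFS over relations treated as undirected; returns a list of concepts or None."""
    graph = defaultdict(set)
    for s, _, o in edges:
        graph[s].add(o)
        graph[o].add(s)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

print(shortest_path("Sandstone", "GammaRayLog", EDGES))
# ['Sandstone', 'Rock', 'Reservoir', 'Well', 'GammaRayLog']
```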

Another difficulty is the lack of universal methods for populating ontologies. Debugging, an important part of every ontology's life, is also one of the most challenging processes, largely because there is often little guidance on how to fix the errors that occur. This is especially true for large ontologies, where it can be hard even to identify why a single assumption does not behave as intended. A further difficulty is the lack of formal mechanisms for evaluating ontologies.
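Some errors can at least be located automatically. The sketch below, which assumes a hypothetical taxonomy containing one flawed contribution, reports every class that inherits from two classes declared disjoint. In description logic such a class is unsatisfiable (it can have no members), and reporting the clashing pair helps explain why. Real OWL reasoners handle far richer constructs than this single check.

```python
from collections import defaultdict

# Hypothetical taxonomy (child, parent) with one flawed user contribution.
SUBCLASS = [
    ("Sandstone", "SedimentaryRock"),
    ("Basalt", "IgneousRock"),
    ("MisclassifiedSample", "Sandstone"),  # flawed: placed under two
    ("MisclassifiedSample", "Basalt"),     # disjoint branches at once
]
DISJOINT = [("SedimentaryRock", "IgneousRock")]

def all_superclasses(cls, parents):
    """Transitive closure of the parent relation for one class."""
    seen, stack = set(), [cls]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def unsatisfiable_classes(subclass, disjoint):
    """Map each class that inherits from two disjoint classes to the clashing pair."""
    parents = defaultdict(set)
    for child, parent in subclass:
        parents[child].add(parent)
    report = {}
    # Iterate over a snapshot: the defaultdict gains keys during traversal.
    for cls in list(parents):
        ancestors = all_superclasses(cls, parents) | {cls}
        for a, b in disjoint:
            if a in ancestors and b in ancestors:
                report[cls] = (a, b)
    return report

print(unsatisfiable_classes(SUBCLASS, DISJOINT))
# {'MisclassifiedSample': ('SedimentaryRock', 'IgneousRock')}
```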

Finally, it is not easy to confirm whether an ontology represents the domain knowledge as expected and whether it can answer search questions correctly and reliably.
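One common evaluation practice, shown here as a minimal sketch, is to write competency questions: questions the ontology is expected to answer, each paired with answers agreed upon by domain experts. The triples, questions, and expected answers below are hypothetical.

```python
# Hypothetical triples; the expected answers below stand in for expert input.
TRIPLES = {
    ("Sandstone", "subClassOf", "SedimentaryRock"),
    ("Shale", "subClassOf", "SedimentaryRock"),
    ("Well-7", "penetrates", "Sandstone"),
}

def subjects(predicate, obj, triples):
    """All subjects linked to `obj` through `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}

COMPETENCY_QUESTIONS = [
    ("Which rocks are sedimentary?",
     lambda t: subjects("subClassOf", "SedimentaryRock", t),
     {"Sandstone", "Shale", "Limestone"}),
    ("Which wells penetrate sandstone?",
     lambda t: subjects("penetrates", "Sandstone", t),
     {"Well-7"}),
]

def evaluate(triples, questions):
    """Run each question and report missing and unexpected answers."""
    report = {}
    for question, query, expected in questions:
        answer = query(triples)
        report[question] = {"missing": expected - answer,
                            "unexpected": answer - expected}
    return report

for q, result in evaluate(TRIPLES, COMPETENCY_QUESTIONS).items():
    ok = not (result["missing"] or result["unexpected"])
    print("PASS" if ok else "FAIL", q, result)
```

The first question fails because Limestone is missing, the kind of gap described earlier for young research fields. Such checks only cover the questions someone thought to write, so they support evaluation rather than settle it.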

This post covered the basics of domain-based search engines and the challenges of developing the domain understanding they rely on. Upcoming posts will extend this introduction and explain how Nesh builds and leverages domain understanding using unstructured data and AI.

In the meantime, if you are curious about how Nesh works and how it can help you find technical and business-focused answers within your documents, you can connect with our experts and request a demo here:

https://hellonesh.io/request-demo/

