Automation in Construction 56 (2015) 14–25 Contents lists available at ScienceDirect Automation in Construction journal homepage: www.elsevier.com/locate/autcon A query expansion method for retrieving online BIM resources based on Industry Foundation Classes Ge Gao a,d, Yu-Shen Liu a,b,c,⁎, Meng Wang a, Ming Gu a, Jun-Hai Yong a a BIM Research Group, School of Software, Tsinghua University, Beijing, China Key Laboratory for Information System Security, Ministry of Education of China, China Tsinghua National Laboratory for Information Science and Technology, China d Department of Computer Science and Technology, Tsinghua University, China b c a r t i c l e i n f o Article history: Received 2 July 2014 Received in revised form 1 April 2015 Accepted 14 April 2015 Available online xxxx Keywords: Building Information Modeling (BIM) Information retrieval Industry Foundation Classes (IFC) Ontology Query expansion Local context analysis (LCA) a b s t r a c t With the rapid popularity of Building Information Modeling (BIM) technology, BIM resources such as building product libraries are growing rapidly on the World Wide Web. As a result, this also increases the difficulty for quickly finding useful BIM resources that are sufficiently close to user's specific needs. Keyword-based search methods have been widely used due to their ease of use, but their search accuracy is often not satisfactory because of the semantic ambiguity of terminologies in BIM-specific documents and queries. To address this issue, we develop a prototype semantic search engine, named BIMSeek, for retrieving online BIM resources. The central work consists of two parts as follows. Firstly, based on Industry Foundation Classes (IFC) which is a major standard for BIM, a domain ontology is constructed for encoding BIM-specific knowledge into the search engine. Using the ontology, term`inologies in BIM documents can be disambiguated and indexed. Secondly, by combining the ontology and local context analysis technique, an automatic query expansion method is presented for improving retrieval performance. Compared with traditional keyword-based methods and WordNet-based query expansion methods, the experimental results demonstrate that our method outperforms them. The search engine is available at http://cgcad.thss.tsinghua.edu.cn/liuyushen/ifcqe/. © 2015 Elsevier B.V. All rights reserved. 1. Introduction Building Information Modeling (BIM) technology has been receiving an increasing attention in the AEC (Architecture, Engineering and Construction) industry [1]. Compared with the traditional Computer Aided Design (CAD) technology, BIM is capable of restoring both geometric and rich semantic information of building models, as well as their relationships, to support lifecycle data sharing. With the rapid popularity of BIM technology in the AEC field, BIM resources such as building product libraries are growing rapidly on the World Wide Web (WWW). For instance, the well-known Autodesk Seek [2] is an online system, which provides a large repository of building products on its website and allows users to find a wide variety of BIM products from manufacturers. It currently carries more than 65,000 commercial and residential building products from nearly 1000 manufacturers, and is still growing daily. BIMobject [3] is another widely visited website, which contains over 450,000 BIM models with the product data and properties. Other online libraries such as National BIM Library ⁎ Corresponding author at: School of Software, Tsinghua University, Beijing 100084, China. Tel.: +86 10 6279 5455; mobile: +86 159 1083 1178. E-mail addresses: [email protected] (G. Gao), [email protected] (Y.-S. Liu), [email protected] (M. Wang), [email protected] (M. Gu), [email protected] (J.-H. Yong). URL: http://cgcad.thss.tsinghua.edu.cn/liuyushen/ (Y.-S. Liu). http://dx.doi.org/10.1016/j.autcon.2015.04.006 0926-5805/© 2015 Elsevier B.V. All rights reserved. [4], Google 3D Warehouse [5], SmartBIM [6] and many active online communities (e.g., RevitCity [7]) also contain a large number of information contents of building products related to BIM models. However, the large amount of online BIM resources also increases the difficulty for quickly finding useful information that are sufficiently close to user's specific needs. In the engineering field, some studies have reported that engineers spent a large amount of time in searching for information [8–10]. In the AEC field, the percentage of time spending on search might be higher. To search online BIM documents quickly and accurately, existing information retrieval approaches should be appropriately adopted for possible improvement. The current practice in information retrieval (IR) mostly relies on keyword-based search methods, which can provide an easy way to quickly retrieve documents. However, search accuracy of traditional keyword-based retrieval models, such as Boolean model, vector space model, or probabilistic model, has been often problematic because of the semantic ambiguity of terminologies in BIM documents and queries. The semantic ambiguity in BIM documents can be alleviated by using a domain ontology. Meanwhile, the user's query can be expanded with domain-specific terms to improve the accuracy of the search result. This paper aims to study the particular problem for retrieving online BIM documents. To achieve this purpose, we develop a prototype semantic search engine, named BIMSeek, by combining the usability of keyword-based interface with automatic query G. Gao et al. / Automation in Construction 56 (2015) 14–25 expansion techniques. The central work consists of two parts as follows. Firstly, based on Industry Foundation Classes (IFC) [11] which is a major standard for BIM, a domain ontology named IFC IR Ontology is constructed for encoding BIM-specific knowledge into the search engine. Using the ontology, terminologies in BIM documents can be disambiguated and indexed. Secondly, by combining the domain ontology and local context analysis (LCA) technique [12,13], an automatic query expansion method is presented for improving retrieval performance. Here, LCA is used to select query expansion terms based on co-occurrence with the query terms within the top-ranked documents. Compared with traditional keyword-based methods and WordNet-based query expansion methods, the experimental results demonstrate that our method outperforms them. The remainder of the paper is organized as follows. Section 2 reviews the related work. Section 3 gives the objective of our study. Section 4 introduces the construction process of IFC-based ontology. Section 5 describes our query expansion algorithm based on the ontology. Section 6 illustrates our search engine and demonstrates the experimental results. Finally, Section 7 concludes the paper and discusses the limitations and future work. 2. Related work Most traditional keyword-based search approaches are based on the vector space model (VSM) [14]. In this model, documents and queries are represented as a vector of term weights and the retrieval is done according to the similarity between these vectors. In essence, such approaches try to derive the meaning of the text from the observable syntactic and statistical behaviors without representing the meaning directly. However, these approaches have the ambiguity problem for retrieving online BIM documents. The main reason for the problem is that BIM documents are different from general documents because of their syntax variations and semantic complexities. For instance, online BIM documents are often organized by different information providers, and it is not easy to formulate the well-designed queries for the endusers. As a result, some documents, which do not contain the query terms, may not be returned to the user even though they are semantically relevant to the given query. To handle such problem, one possible way is to use semantic-based IR approaches, in which the search does not completely rely on exact term matching. Instead, such approaches usually seek to improve retrieval accuracy by attempting to encode the user's searching intent and the contextual meaning of terms in documents. Semantic-based IR approaches could be roughly categorized as either explicit or implicit, according to whether an explicit concept space is defined and utilized. An implicit technique aims to analyze and explore the hidden relationships between a set of documents and the terms, without requiring the concepts defined explicitly. Existing implicit approaches include local context analysis (LCA) [12,13], latent semantic analysis (LSA) [15], probabilistic LSA (pLSA) [16], latent Dirichlet allocation (LDA) [17], etc. In contrary, an explicit technique tries to extract the explicit semantic definitions between words and terms as well as the relationships between them, such as synonym, hyponym and hypernym. These definitions and relationships are usually represented in some form of linked data such as taxonomy, thesaurus or ontology. Existing explicit approaches such as ontology-based query expansion [18], semantic indexing [19], explicit semantic analysis (ESA) [20] and granularity analysis [21] have been developed for improving the semanticbased retrieval. In the last few years, some semantic-based IR techniques have been devoted to the study of information retrieval systems for the engineering field [9,10,22–27]. Many of them dealt with semantic representation or domain ontology construction for applying to various engineering fields. Li et al. [10] developed an engineering ontology in mechanical design and manufacturing, and applied the ontology to index and 15 retrieve unstructured engineering documents and CAD drawings. Lin et al. [23] introduced some text-based information retrieval experiments in the AEC industry, and described an AEC online product search engine by incorporating domain knowledge and IR techniques [24]. Rezgui [27] used domain knowledge to formulate an ontology that assists in indexing and retrieving construction documents in e-COGNOS system. Weissman et al. [26] proposed a computational framework for authoring and searching product design specification documents using semantic mapping. Demian and Balatsoukas [22] investigated the effects of search result interfaces (particularly the aspects of granularity and context) in systems for searching archives of construction documents. Lin et al. [25] presented a passage partitioning approach according to domain ontology, and applied the approach to a conceptbased IR system for engineering domain-specific technical documents. More recently, Hahm et al. [9] introduced a personalized query expansion approach for engineering document retrieval based on a selfconstructed engineering ontology. Among the IR techniques in the engineering field, it is noteworthy that query expansion approaches [18] which aim to overcome the ambiguity of natural language and also the difficulty in using a single term to represent an abstract concept. It is generally conducted by supplementing original queried terms by morphological variations or semantically related terms, so the performance degradation caused by syntax variation and semantic complexity of BIM documents can be overcome with query expansion. Query expansion can be classified as interactive, manual, or automatic approaches according to their strategies [18], where automatic query expansion is an alternative strategy for domain-specific retrieval by employing taxonomy (e.g., WordNet) or domain-specific ontology. Considering the characteristics of BIM documents, automatic query expansion based on BIM-specific ontology is an appropriate strategy because ambiguous and complicated queried keywords can be disambiguated and interpreted by the ontology. Also, terminologies in BIM documents can be disambiguated and indexed with the ontology. As a result, the user's short queries can be expanded and matched to syntax varied and semantically complicated documents. By combining query expansion with domain ontology, there have been several search engines on information retrieval in the engineering domain [9,10,24,25,27]. However, they are still limited to their ontology sources and the needs of particular applications as follows. Firstly, most of existing studies focus on retrieving the controlled collection of engineering documents, rather than Web-centric BIM resources. Secondly, existing methods are mainly based on explicit semantic-based IR techniques, which are limited to the capability of hand-crafted ontology or thesaurus and cannot fully fulfill the retrieval task for dynamically changed Web-based BIM resources. Thirdly, ontologies of existing IR approaches lack the utilization of BIM-specific domain knowledge for online document retrieval. As the commonly used data exchange standard for BIM, Industry Foundation Classes (IFC) [11] led by the buildingSMART, formerly known as International Alliance for Interoperability (IAI), plays a crucial role to facilitate interoperability between various software platforms in the AEC industry. To date, the IFC standard has been widely supported by the market-leading BIM software vendors. As the most widely used taxonomy and specification in BIM applications, the underlying IFC specification is therefore our preferred candidate semantic resource, which provides a sharable skeleton on which the BIM-oriented IR ontology can be built. Recently, several IFC-based ontologies have been studied for particular application needs [28–32]. For instance, Pauwels et al. [31] applied an IFC ontology to semantic rule checking. Beetz et al. [29] presented an approach for converting the IFC schema into the OWL format, which is a remarkable effort to lift the IFC specification onto the ontology level. Zhang and Issa [28] used an IFC ontology to extract partial model from a complete IFC model. In addition, several applications also used IFC ontologies for querying spatial information within a BIM model [32,33]. 16 G. Gao et al. / Automation in Construction 56 (2015) 14–25 The common strategies mentioned in previous studies [28–31] are to utilize their respective ontologies for extracting specific information from an IFC file itself, whereas the currently available IFC files are relatively few on the WWW. In practice, a large number of available online BIM resources consist of BIM components/models (in their native file formats but without providing IFC files), and these models are associated with their online product documents (e.g., descriptions or specifications). Therefore, merely parsing and retrieving IFC files cannot take full advantage of the abundant online BIM resources. In contrast, this paper develops a BIM-specific IR ontology based on the IFC schema, and applies the ontology to retrieve online BIM documents rather than IFC files. To alleviate the insufficiency of hand-crafted ontology, our method combines the explicit semantic technique (i.e., ontologybased query expansion) with the implicit semantic technique (i.e., local context analysis [13]). Consequently, our search engine can cover more widely online BIM resources for information retrieval. 3. Objective and methodology Building product webpage on BIM resource libraries (e.g., [2,4,6]) typically contains BIM models associated with product documents (e.g., specifications and descriptions of the objective products), as shown in Fig. 1. The BIM models are normally in their native file format dependent on various BIM software vendors (e.g., Autodesk Revit, Bentley Architecture and Graphisoft ArchiCAD) or in industry-neutral file format (e.g., IFC/ifcXML). The relevant product documents on the Web are the textual contents for describing BIM models including their functions, dimensions, materials, performances, manufacturers, etc. These product documents are independent of BIM model file format. In particular, much information is embedded in textual BIM documents generated during design and construction phases [22]. Most online BIM documents are unstructured, in contrast to structured contents (e.g., BIM models or database tables) following a strict schema. In order to retrieve online BIM resources, we have developed a prototype search engine, i.e., BIMSeek, which contains two individual modules: document search module and model search module. The former that will be described in this paper focuses on the problem of retrieving online BIM documents by incorporating the domain ontology and query expansion techniques. The latter that is an ongoing work deals with the problem of searching online BIM models by incorporating the same ontology and structured query techniques. Although the two modules in BIMSeek process different contents (documents or models) of online BIM resources, they are all based on the common ontology, named IFC IR Ontology developed in this paper (see Section 4), for the semantic search engine. In this paper, only the hierarchical and enumeration relationships of the ontology are utilized for retrieving BIM documents. However, in the model search module, the properties and restrictions of the same ontology are also utilized for searching BIM models, where BIM models are converted into OWL (Ontology Web Language) instances and the user queries are transformed into SPARQL. Also, it is interesting to make use of document and model information together for improving search accuracy and performance in BIMSeek. In this Fig. 1. Illustration of building product webpages that contain BIM models associated with product documents (including product specification and description). G. Gao et al. / Automation in Construction 56 (2015) 14–25 way, more relevant BIM resources might be retrieved for the user, and we leave this study to our future work. In addition, the IFC ontology built in this paper is also used for another module, named BIMTag [34], as part of BIMSeek. Using the ontology, BIMTag achieved semantic annotation for online BIM documents. The purpose of this study is to use the ontology to process the contextual meaning of terms for retrieving online BIM documents. Based on the ontology, we typically choose automatic query expansion techniques to enhance retrieval performance by reflecting the user's needs. Since automatic query expansion techniques require no effort on the part of the user, they have a significant advantage over manual techniques. Automatic query expansion can be categorized as either global or local [13]. Global techniques rely on analysis of the whole corpus for the source of expansion terms, while local techniques process a small number of top-ranked documents retrieved for a query to expand that query. In information retrieval, query expansion is the first step of semantic search [18,19], which reformulates a given query to increase the likelihood of the term overlap between the query and documents that are likely to be relevant to the user's needs. If used together with an effective word sense disambiguation (WSD) algorithm, query expansion can improve retrieval performance. However, simply using a thesaurus or ontology for automatic query expansion still has some limitations, which could be overcome through using global statistics, such as the document frequency of the query terms, for selecting expansion terms [13]. On the one hand, query expansion without a WSD algorithm or with a poor WSD may cause degradation in retrieval performance, which is mainly caused by the irrelevant terms added to the query. This limitation can be alleviated by combining query expansion with local context analysis (LCA) technique [12,13], where query expansion terms are selected based on co-occurrence with the query terms within the topranked documents. With the combination of LCA, query expansion terms generated from thesaurus or ontology undergo a statistical inspection, and consequently some irrelevant expansion terms won't be used for retrieval. On the other hand, ontology-based query expansion may miss some possible expansion terms, which are not defined in thesaurus or ontology but have statistical dependencies with the user's query in corpus. This limitation can be overcome by computing statistical relevance between terms in the most relevant documents, and then these statistical relevant terms are added to the expansion terms for retrieval. By combining the domain ontology and LCA technique [12,13], this paper introduces an automatic query expansion method to improve retrieval performance for online BIM documents. 4. Development of IFC IR Ontology To achieve the semantic search engine for online BIM documents, the principal work relies on two aspects: (1) how to construct a BIM-oriented ontology for the needs of IR and (2) how to utilize IR techniques to retrieve BIM documents based on the ontology. The former is introduced in this section and the latter will be presented in Section 5. This section first summarizes some basic concepts of the domain ontology, then argues the IFC specification [11] as a semantic foundation for our purpose, and finally introduces a method for developing the IFC IR Ontology. In computer science and information science, an ontology is defined as “formal, explicit specification of shared conceptualization” [35]. Ontologies can be roughly divided into two categories: general ontology and domain ontology. The interest of general ontology is the whole world, while domain ontology focuses on specification of particular domain conceptualization. Although some general ontologies (e.g., WordNet [36]) contain a large number of general concepts, they are not designed for domain-specific retrieval, which may lead to inaccurate description of concepts in the AEC domain. In contrast, domain ontology is a representation of semantics in particular domain, which 17 often consists of a hierarchical description of important concepts precisely defined in the domain, along with description of properties of each concept [37]. Domain ontology is considered as a key element to enhance domain-specific IR [35]. It has a variety of representation forms, among which OWL is selected as the modeling language in this study. With the syntax extension from RDF (Resource Description Framework) — a lightweight representation for data and knowledge, OWL has been proposed by W3C as the ontology language of Semantic Web [38]. There are various methods for constructing domain ontologies, such as TOVE, IDEF5, Skeleton, KACTUS, SEN-SUS, METHONTOLOG and Seven-Step methods. Among them the Seven-Step method [39] is regarded as the mature one. It was developed by the School of Medicine in Stanford University, as well as the most widely used ontology editor Protégé [40]. The Seven-Step method is also the only one which strictly conforms to Gruber's five basic principles of ontology construction [35]. Following the general principles of ontology construction and the Seven-Step method, construction of IFC IR Ontology is essentially a process of conceptualizing and formalizing BIM knowledge from the IFC schema. By following the ontology development process of Seven-Step method, we give the main procedure of constructing IFC IR Ontology as follows. 4.1. Step 1: determining the domain and scope of ontology The first step of Seven-Step method is to define the domain and scope of the ontology. It is an important step for minimizing the amount of data and concepts to be analyzed, especially for the extent and complexity of BIM-related semantics. During the ontology-design process, it may be adjusted if necessary. The ontology discussed in this work mainly meets the needs of information retrieval, which enables architects, engineers and other design professionals to find useful BIM resources quickly. Therefore, it includes specific concepts related to AEC product information belonging to the product lifecycle phases (design, manufacturing, assembly, etc.). In addition, since online BIM documents are generally organized by different information providers, which often use natural languages for describing the textual contents of documents, our ontology should also include natural language ontologies for capturing terminologies in documents and queries. 4.2. Step 2: selecting and reusing existing resources The second step of Seven-Step method is to consider reusing existing semantic resources for our particular domain and task. Several wellknown semantic resources have been developed for various AEC applications, and they have the potential to be enhanced as domain ontology for retrieving online BIM documents [27,41]. The most notable efforts [27,41] include: ISO 12006-2, Uniclass, OmniClass, Industrial Foundation Classes (IFC) [11], etc. Among these existing semantic resources, structured taxonomies deserve particular attention. However, since improper taxonomies have the opposite effects leading to confusion and difficulty to retrieve [27], a properly structured taxonomy should be carefully selected to meet our needs. As a relative new field, ontology research for BIM resources is still rare, which needs more exploration. In this study, the IFC4 specification [11] is typically selected as the backbone of IFC IR Ontology. This paper aims to reuse rich semantic contents of newly developed IFC specification without having to understand the complex technical issues of IFC for information seekers. In addition, many general ontologies such as WordNet, EuroWordNet and Cyc are already available in electronic form, so we typically select WordNet [36] as a common lexical database for the English language. In this paper, WordNet will be combined with domain ontology to process terminologies in BIM documents and queries. 18 G. Gao et al. / Automation in Construction 56 (2015) 14–25 4.3. Step 3: enumerating important terms in the ontology The third step of Seven-Step method is to enumerate important terms in the ontology. In this work, our goal is to recognize the important terms in IFC specification to establish the ontology structure. Initially, it is important to get a comprehensive list of terms without worrying about overlap between concepts. Therefore, the key point lies on how to obtain a comprehensive list of concepts from the IFC specification. The IFC4 specification is the latest version of IFC standard, and it enhances the capability of previous IFC specifications in several areas of building elements, building service elements and structural elements and accompanying basic definitions. Currently, it contains 766 entities, 206 groups of enumeration types, 408 groups of property sets, 1691 individual properties, and lots of defined types, select types, quantity sets, functions and rules. We confine important terms to IfcObject in IFC specification, excluding the others. Currently, we have manually extracted 248 entities, 140 type enumerations, and 583 enumeration items from the IFC4 schema as the important terms of IFC IR Ontology. However, the IFC schema was designed for computer instead of human readability, which follows a special formalized naming convention. For example, the names of types, entities, rules and functions in IFC start with the prefix “Ifc” and continue with the English words in Camel Case naming convention (no underscore, first letter in word in upper case). Therefore, they should be refined by the process of segmenting the names of entity, removing the prefix and eliminating redundancy, for IR purpose. This process is semi-automatically done in our work. 4.4. Step 4: defining the classes and class hierarchy The fourth step is to define the classes and the class hierarchy. There are three possible approaches in developing the class hierarchy [42], including (1) the top-down approach, (2) the bottom-up approach, and (3) a combination development process. None of these three methods is inherently better than any of the others. Considering the consistency of our model, we adopt the first one (i.e., the top-down approach) in our work to avoid the duplication of works and the inconsistency of knowledge. Since the IFC specification has been defined as a hierarchical structure, its structure can be directly adopted in the class hierarchy of IFC IR Ontology. On the top of the ontology, we use the EXPRESS entity relationship directly, that is to say, we can map directly entities as well as their subtypes and supertypes (profiting from OWL classes inheritance). Afterwards, we treat the element's enumeration type as its subclasses. For example, the concept “covering” has the enumeration type (including “membrane”, “cladding”, “insulation”, “roofing”, “modeling”, “ceiling”, “flooring”, “wrapping”, etc.) according to the IFC schema. And, these types are connected to their superclass (i.e., “covering”) to define the class hierarchy in the ontology. Table 1 gives the outline of concepts in the resulting IFC IR Ontology and Fig. 2 shows a portion of class hierarchy in the ontology. In Fig. 2, the concept Covering inherited from the class Object has a property PredefinedType mapped to Covering Type Enumeration. An Object is the generalization of any semantically treated thing or process. In IFC4 specification, Object is the supertype of Actor, Control, Group, Process, Product, and Resource, which consists of the first level of IFC IR Ontology Table 1 The outline of concepts in the resulting IFC IR Ontology. Taxonomies Number of concepts Example of concepts Acquisition resources Actor Control Group Process Product Resource 2 11 12 4 211 8 Occupant Project order Building system Task Covering Crew resource IfcActor IfcControl IfcGroup IfcProcess IfcProduct IfcResource (see Table 1). In addition, IFC IR Ontology comprises a variety of relationships, for example “PredefinedType”, “OperationType”. There are totally 140 enumeration types related by these relationships. 4.5. Step 5: define the properties of classes The hierarchical structure is just a frame for the concept system which cannot reflect the varied relations between the concepts. Therefore, we need to describe and define the properties of classes in this step. These properties become slots attached to the classes. Slots are the descriptors used for describing the properties of classes and instances. We define the domain/range of the properties according to the applicable classes and DataType in the XML specification of the property set (Pset) of IFC 4. The OWL datatype property is used to represent the EXPRESS simple attributes, and the OWL object property represents named attributes. As an example of properties in IFC IR Ontology, the properties of class “door” are defined with “Acoustic Rating”, “Thermal Transmittance”, “Status”, “Self Closing”, “Smoke Stop”, “Durability Rating”, “Has Drive”, “Is External”, “Glazing Area Fraction”, “Hydrothermal Rating”, “Infiltration”, “Fire Exit”, “Security Rating”, “Handicap Accessible”, “Reference” and “Fire Rating”. 4.6. Step 6: defining the facets of the slots The slots of classes may have different facets to describe the value type, allowed values, the number of the values (cardinality), and other features of the values the slot can take. The conversion of EXPRESS simple types (String, Integer, Real, Binary, Boolean, Logical) is direct, as they have equivalents in OWL (XSD types). For instance, String type is directly mapped into xsd:string type. And, the value of “Fire Rating” in IFC (in a common property set of “door”) is IfcLabel, we directly map it to xsd:string type. As a result, “Fire Rating” becomes a slot of class “door”, whose type is string in OWL. Besides mapping the value type of the slots from EXPRESS to OWL, we also represent the allowed value types with OWL rules, and the allowed number of the values with OWL cardinality restriction according to the EXPRESS attribute's optional flag. 4.7. Step 7: creating the instances The last step of Seven-Step method is to create individual instances of classes in the hierarchy. However, since our goal in this study is information retrieval of unstructured BIM documents with the help of IFC IR Ontology, we don't have to use the instances of IFC files. Consequently, converting an IFC file to its ontology instance is not necessary for our current application. Of course, it is also interesting to explore information retrieval of IFC instance files, which is the future work in BIMSeek development. 5. Automatic query expansion process With reference to IFC IR Ontology, this section presents an automatic query expansion method for retrieving online BIM documents. Starting with query words as the input in the search engine, our algorithm mainly consists of four steps: generating the candidate concepts, expanding the candidate concepts, query pruning and keyword matching. The main procedure of our algorithm is shown in Fig. 3. The following will introduce each step in detail. 5.1. Step 1: generate the candidate concepts The users may not always formulate their search queries using the standardized concepts in IFC specification, so the original query terms are not suitable for direct searching. To execute the semantic search, we should identify the “meaning” of every term in the user's query, so that a matching concept in the IFC IR Ontology can be found. G. Gao et al. / Automation in Construction 56 (2015) 14–25 19 Fig. 2. Part of the classes and class hierarchy in IFC IR Ontology. This is a necessary step to semantic retrieval, through which semantics of the terms appearing in the user's query will be automatically replaced by the standardized concepts in the ontology. The first step of our algorithm preprocesses the simple noun phrases in the user's query, and generates the candidate concepts corresponding to IFC IR Ontology. In our implementation, we first use WordNet to find the synonyms for each query term, and then generate the candidate concepts of the query term by matching the domain ontology. WordNet [36] is a large lexical database of English, in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets). With reference to a generic lexicon in WordNet, a list of candidate concepts can be produced for further retrieval. For instance, the terms “cover”, “screening”, “masking”, “coating” and “application” are the synonyms of “covering” in WordNet, while only “covering” is the standardized concept in IFC IR ontology. When some synonyms words of “covering” appear to the user's query, the “covering” will be located as a candidate concept. However, WordNet, as the generic lexicon rather than domain ontology, often contains a large number of concepts that are not relevant to the target domain. As a result, these irrelevant concepts may lead to an inaccurate estimation of concept specificity and information retrieval. For example, “application” should not be regarded as the candidate concept of query “covering” in BIM-specific domain. To address this issue, a context-based method for concept identification can reduce the incorrect mapping, which will be illustrated in Section 5.3. extent of the neighborhood is governed by the relatedness function, as will be shown in Eq. (1). Especially, each expanded concept in the neighborhood is associated with a relatedness value according to its distance to the candidate concept. There are various relationships (including subclass, superclass, type enumeration etc.) between concepts in IFC specification. Therefore, the concepts, which have the above IFC relationships with each candidate concept, will be added to the new query terms, i.e., expanded concepts. Here, a semantic relatedness function between the candidate concept and its expanded concept is addressed for measuring the expansion range. The expansion process will be terminated when the value of semantic relatedness is less than a predefined threshold. There are several measures of semantic relatedness, such as Leacock– Chodorow measure [43], Jiang–Conrath measure [44], Lin measure [45], Resnik measure [46] and Wu–Palmer measure [47]. The measures of Lin, Resnik and Jiang–Conrath are based on the notion of information content. In contrast, the Leacock–Chodorow measure is based on pathlength, which was found more effective than information content-based relatedness measures [46]. In this section, we choose the Leacock– Chodorow measure [43] for building the semantic relatedness function, where the semantic relatedness between two concepts is estimated based on the conceptual links (i.e., the distance) between these concepts with reference to IFC IR Ontology. The smaller the distance between two concepts, the higher the semantic relatedness between the two concepts will be. Therefore, the semantic relatedness is inversely proportional to the semantic distance in the ontology. Our semantic relatedness function is defined by 5.2. Step 2: expand the candidate concepts Based on WordNet, Step 1 only generates some candidate concepts for the user's query in a level of word-concept matching. In information retrieval, the user-entered query is usually simple and short, and it is hard for a search engine to completely express the user's information needs. In order to represent the information seeker's needs in BIMspecific domain, the candidate concepts can be further expanded by utilizing the relationships between concepts in IFC IR Ontology. The expanded concepts will be added to the query to improve the accuracy and coverage in retrieval. In the second step of our algorithm, a concept expansion algorithm is proposed based on the relationships between concepts in IFC IR Ontology. The key task lies on how to compute the relatedness between two concepts in the ontology. In our implementation, each candidate concept is expanded through its neighborhood over the ontology to yield more concepts close to the candidate concept semantically. The RelIFC ðc; ci Þ ¼ − log lenðc; ci Þ ; 2Depth ð1Þ where c is a query concept in IFC IR Ontology, ci is the i-th concept related to c in its neighborhood, Depth is the maximal tree depth of the ontology. Especially, len(c, ci) is the shortest path between c and ci in IFC IR Ontology, which is calculated as the edge number on the shortest path (in practice, plus 1 to avoid that the value is zero). In Eq. (1), RelIFC(c, ci) is the semantic relatedness function which takes into account the shortest path len(c, ci) between two concepts c and ci defined in IFC IR Ontology. The factor of 2 in the denominator is used to represent the possible maximum length between two concepts. The distance between two concepts reaches the maximum length when they are both leaf nodes and the only common ancestor of them is the root node. This definition guarantees that in any circumstances, the relevance value between two concepts won't be less than 0. 20 G. Gao et al. / Automation in Construction 56 (2015) 14–25 Fig. 3. The main procedure of our algorithm. As an example in Fig. 2, the concept “covering” can be expanded to some subclasses (“membrane”, “cladding”, “insulation”, “roofing”, “modeling”, “ceiling”, “flooring”, “wrapping”, etc.) in the 1-level hyponym expansion mode. Considering the concepts “covering” and its subclass “membrane”, the Depth is equal to 6 and len(⋅) is equal to 2. Accordingly, the IFC relatedness between “covering” and “membrane” is: 5.3. Step 3: prune the irrelevant concepts ′′ ′′ RelIFC ‘‘covering ; ‘‘membrane ¼ −log 1 The “incorrect mapping” problem. As discussed in Section 5.1, WordNet, as a generic lexicon rather than domain ontology, often contains a large number of synonyms that are irrelevant to BIMspecific domain. As a result, these irrelevant concepts may lead to an inaccurate estimation of concept specificity and information retrieval. 2 The “overexpansion” problem. The expansion of concepts is based on the relationships defined in the IFC specification, which are usually 2 ¼ 0:778: 26 At the end of Step 2, we obtain the expanded concepts and their semantic relatedness values, where all the expanded concepts are saved as a set denoted by ExpansionSet. As discussed in Step 1 and Step 2, the original query terms are mapped into the IFC concepts, and then these concepts are expanded to a set of related concepts to formulate the appropriate domainspecific query. However, simple concept mapping and expansion may have some shortages as follows. G. Gao et al. / Automation in Construction 56 (2015) 14–25 made by some professionals. These relationships are independent of the document corpus in the domain. In practice, some expanded concepts that are irrelevant to the original query will be harmful to the retrieval, resulting in topic dilution of documents. To solve the above problems, we use a statistical method to examine the relatedness between concepts. In this step, we utilize local context analysis (LCA) [12,13] for improving IFC-based concept expansion. LCA acquires statistical relatedness between concepts and query terms according to co-occurrence analysis of the concepts with the query terms in top n passages retrieved by the query. In our implementation, we first use a standard IR system to retrieve the top n ranked passages in the corpus. Then these top-ranked passages are used as the local “context” of the query terms, and the “context” is used to analyze the co-occurrence of the related concepts and the query terms. The concepts, expanded in Section 5.2, are accepted according to a function of the frequencies of occurrence of the query terms and co-occurrence concepts in the retrieved passages, and the inverse passage frequencies for the entire collection of those terms and concepts. Only the concepts that have strong relationship are kept for final retrieval. Since the textual contents of online BIM documents are usually short and each document usually has one single topic rather than multiple topics, the passage in our study is defined as the whole document in a webpage. Given a user's query Q, we adopt LCA [12,13] to compute the statistic relatedness measure between an IFC concept c and Q, as defined by RelLCA ðc; Q Þ ¼ ∏ ðδ þ coðc; t i ÞÞ; t i∈Q ð2Þ where co(c, t i ) denotes the co-occurrence degree between the concept c and each term ti in the query Q, which is computed using the term frequency (tf) in the top-ranked documents and inverse document frequency (idf) in the corpus [12,13]. δ is a small positive constant (e.g., δ = 0.1) to avoid that the value of RelLCA is zero. Multiplication in Eq. (2) is used to emphasize co-occurrence with all query terms. - Against the “incorrect mapping” problem. For a concept c in the expanded concepts ExpansionSet, generated in Section 5.2, if the average of RelLCA values of concepts expanded from c is very small, it suggests that these expanded concepts are statistically irrelevant to the user's query through analyzing the document corpus. For example, as stated in Section 5.1, the terms “cover”, “screening”, “masking”, “coating” and “application” are initially mapped to the IFC concept “covering”. However, the retrieval documents using query term “application” barely include any other related concepts of “covering”. Hence the mapping from term “application” to concept “covering” is regarded as an incorrect mapping. Then all the concepts expanded from “covering” in ExpansionSet, including “membrane”, “wrapping”, etc., should be removed. - Against the “overexpansion” problem. The concepts in ExpansionSet, which have the small values of RelLCA (e.g., less than a predefined threshold 0.3), are removed from the ExpansionSet. This means that the expanded concepts with small RelLCA values are not quite relevant to the original query. After pruning the irrelevant concepts, the remaining part in ExpansionSet is added to the final query terms. For example, the mapping from the query term “masking” to IFC concept “covering” can be accepted according to the above principle. However, some of the expanded concepts, like “skirting board” has the relatively small RelLCA values, hence are abandoned from ExpansionSet in our algorithm. After pruning all irrelevant concepts from the ExpansionSet, the final relatedness weight of each concept in ExpansionSet can be calculated by combining Eq. (1) and Eq. (2). Given a query concept c and its expanded 21 ci in ExpansionSet, the relatedness weight between ci and the user's query Q is defined by α⋅RelIFC ðci ; cÞ⋅RelLCA ci;Q ; ω ci;Q ¼ α⋅RelIFC ðci ; cÞ þ RelLCA ci;Q ð3Þ where α is the importance factor (we typically set α = 1.0 in our implementation). 5.4. Step 4: rank the documents of BIM resources After the user's query is expanded by IFC IR Ontology and LCA, document ranking can be implemented based on each document's score against the query. The ranking process uses the vector space model (VSM) [14] to determine how relevant a given document is to the user's query. In the process, a Boolean model is first used to narrow down the documents that need to be scored based on the use of Boolean logic in query specification. Then, these resultant documents are ranked based on their matching scores between the query vector and their document vectors using the VSM. In the VSM implemented in this research, the expanded concepts are associated with the weights computed in Eq. (3). The score between the user's query Q and a document D in corpus is denoted by 2 3 1 1 7 2 ∑6 scoreðQ; DÞ ¼ 4tf ðt; DÞ⋅idf ðt Þ ⋅ωðt; Q Þ⋅ →5 ! t∈Q VQ VD ð4Þ ! where t is a term in the query Q, V Q is the query vector of user's query ! Q, V D is the document's vector of D. tf(⋅) is the term frequency in document D, which measures how often a term appears in the document. idf(⋅) is the inverse document frequency, which measures how often the term appears in document corpus. ω(t, Q) is the weight, calculated by Eq. (3), measuring the importance of each query term t in the query Q. 6. Experimental results 6.1. System overview By combining the usability of keyword-based interface with automatic query expansion techniques, we have developed a semantic search engine, named BIMSeek, for retrieving online BIM documents. The presented search engine is a query-driven information retrieval system, which automatically expands the user's query based on the ontology and then ranks BIM document repositories by analyzing the semantic details carried by such documents. For intuitively showing the ontology's concepts associated with the user's query, an ontology navigation toolbar is also developed. The toolbar provides a dynamic navigation way, which allows the user to manually refine the query through exploring the related concepts. Fig. 4 shows a snapshot view of the main interface of our search engine, which consists of four parts. First, the user inputs a specific query on the top-left (see “Query Area”). Then, the system automatically ranks online BIM documents and displays the search results produced by our algorithm on the bottom-left (see “Result Area”). The ontology navigation toolbar on the top-right allows the user to manually refine the query through clicking on the related concepts (see “Navigation Bar”). Alternatively, one can adjust the algorithm parameters for refining search through the control panel on the bottom-right (see “Parameter Panel”). In the process of building IFC IR Ontology in Section 4, we use OntoSTEP, a Protégé EXPRESS tool [40], to semi-automatically construct a raw IFC IR Ontology, and this raw ontology is used for further ontology 22 G. Gao et al. / Automation in Construction 56 (2015) 14–25 Fig. 4. The screenshot of our search engine. It allows a user to specify a search query using keywords. Then it returns the ranked results related to online BIM resources. The user may refine the search through clicking on “Navigation Bar” for accessing the query-related concepts. The algorithm parameters can be alternatively adjusted through the “Parameter Panel”. (a) Rank 1 using our method (b) Rank 2 using our method (d) Rank 1 using Lucene (e) Rank 2 using Lucene (c) Rank 3 using our method (f) Rank 3 using Lucene Fig. 5. Comparison of search results between our method and traditional keyword-based search (Lucene [48]). For the same test query “railing”, the webpages corresponding to top 3 query results (from left to right) are typically selected for displaying, where the top row shows our ranking results and the bottom row shows the ones using Lucene system. G. Gao et al. / Automation in Construction 56 (2015) 14–25 6.2. Evaluation In the experiments, the comparison between our method and keyword-based search has been conducted. To measure the performance of the keyword-based search, Lucene [48] with the default function was used. Lucene is frequently used as a baseline keywordbased system in IR. The Lucene [48] scoring uses a combination of the VSM and the Boolean model to determine how relevant a given document is to a user's query. Fig. 5 shows an example of search results with the test query “railing” applied to the benchmark collection of online BIM documents. We typically display the webpages corresponding to top three query results (from left to right), where the top row is our ranking results and the bottom row is the ones using the keyword-based search. For the keyword-based search, only the webpages that contain the keywords “railing” or “rail” are returned. However, some top search results, such as the webpage “Heavy-Duty Screen Doors” in Fig. 4 that just has a rail material description, don't sufficiently reflect user's search intent (“railing”). In contrast, using our method, some ontology's concepts (e.g., “balustrade”, “handrail” and “guardrail”) associated with the input query “railing” are expanded as new query terms. Consequently, the most relevant webpages, which are not only similar to the input query word (i.e., “railing”) but also semantically satisfying the expanded concepts (e.g., “guardrail” or “balustrade”), tend to be ranked at the top in our system. For example, rank 1 in Fig. 4 is a webpage about “aluminum Wall Mounted Handrails”, rank 2 in Fig. 4 is the one about “stainless steel handrail”, and rank 3 in Fig. 4 is the one about “an indoor/outdoor LED-based handrail”. In this way, BIM documents, which do not only hit the query keywords but also semantically satisfy an information seeker's needs, can be found. The example suggests that our retrieval method has an obvious advantage over the keyword-based search in BIM-specific domain. In order to evaluate the effectiveness of the proposed method, we run an experiment of information retrieval on a document collection. Currently, the document collection contains a total number of 15,176 BIM documents acquired from Autodesk Seek [2]. Autodesk Seek provides three industry-standard classifications (including MasterFormat, OmniClass and UniFormat) for browsing BIM model catalogue. In this test, we typically select OmniClass as the baseline classification of BIM documents for our retrieval evaluation. The OmniClass number of each BIM document is obtained through crawling the categories from the website, and the OmniClass number is used for “ground truth” for each test query. For instance, the OmniClass number “23.35.00.00” has its name “Covering, Cladding, and Finishes”. Then, we conceive of a test query “Covering, Cladding, and Finishes”, and judge whether each webpage in the search results belongs to the corresponding OmniClass classification (i.e., “23.35.00.00”) or its subclass (e.g., “23.35.20.21”). There are 30 test queries used in our test. To measure the performance of our method, we adopt the standard evaluation procedure from information retrieval, namely precision–recall curves, which is used for evaluating the retrieval results [49,50]. The precision–recall curves describe the relationship between precision and recall for an information retrieval method. In the precision–recall curve, the number of relevant documents for each query is denoted as Relevant, the number of documents retrieved for the query is denoted as Retrieved, and the number of relevant documents correctly retrieved is denoted as Relevant ∩ Retrieved. Then the recall is defined as Relevant∩Retrieved , and the precision is defined as Relevant∩Retrieved for each Relevant Retrieved experiment. It is desirable to achieve both high precision and recall, but unfortunately this is rather difficult to achieve, especially for the text-based retrieval problem. We compare our method with several retrieval methods, which include the keyword-based search (Lucene), query expansion based on WordNet (synonyms, hyponyms and hypernyms, respectively). Fig. 6 shows the average precision–recall curves. The results show that our method performs better than both the keyword-based search and WordNet-based query expansion methods. In addition to the precision–recall curves, we also evaluate other quantitative statistics for evaluation of retrieved results. Specifically, we compute E-measure and F-measure [51]. F-measure (also F-score) is a measure of a test's accuracy, and it is the harmonic mean of precision and recall. The F-measure is defined as Fβ¼ð1þβ2 Þ⋅ Precision⋅Recall : β2 ⋅Precision þ Recall It measures the effectiveness of retrieval with respect to a user who attaches times as much importance to recall as precision. β is a non-negative real value denoting the times as much importance to recall as precision. Since the recall and precision are both important in BIM resource retrieval, we set β = 1 to let the recall and precision rate evenly weighted. Another score combining precision and recall is E-measure, which means the effectiveness measure: E ¼ 1− α 1 : 1 1 þ ð1−α Þ Precision Recall In our test, α is set as 0.5 when β is 1. Table 2 gives F-measure and E-measure of our method and other methods. By examining the results, we can see that our method is higher in performance compared with other methods. 7. Conclusions and future work Based on IFC IR Ontology and local context analysis (LCA), this paper has introduced an automatic query expansion method for retrieving online BIM resources. IFC IR Ontology can be used for the disambiguation of terms on online BIM documents. Ontology-based query 0.7 Keyword−based search Our method WordNet synonym WordNet hyponym WordNet hypernym 0.6 0.5 Precision refinement. Then the built ontology is automatically converted into OWL structured form. In the process of semantic retrieval in Section 5, the toolkit OWLAPI, a semantic system framework, is used for handling ontology and OWL language. The indexing and ranking are offered by Apache Lucene, and Apache Nutch Web Crawler is used to collect online BIM documents. The search system developed in this work has been designed to be a flexible tool to test different retrieval methods. The system also provides a friendly interface for the user, in which the user can modify several simple parameters for adjusting the search results. In this section, all the experiments are run on a 2.80 GHz processor with 4 GB memory under Windows 7. 23 0.4 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Recall Fig. 6. The precision–recall curves of retrieval results using our method and other methods. 24 G. Gao et al. / Automation in Construction 56 (2015) 14–25 References Table 2 F-measure and E-measure of our method and others. Method F-measure E-measure Keyword-based search WordNet synonym WordNet hyponym WordNet hypernym Our method 0.180093695 0.162367191 0.189367722 0.124649668 0.24816183 0.819906305 0.875350332 0.837632809 0.810632278 0.75183817 expansion enables the addition of standardized domain terms for search query, while LCA, through BIM document analysis, offers the broader scope of terms related to user information needs. The experimental results show that our method can achieve better efficiency for retrieving BIM documents, than traditional keyword-based search and WordNetbased query expansion methods. In the current implementation, only the inheritance relationship and the type enumeration in the IFC ontology are used for expanding and matching BIM documents. In our ongoing work, we are trying to use the properties and restrictions in the ontology for searching BIM models. One alternative way is to extract and convert each BIM model into in RDF or OWL format, and query with SPARQL. One of the limitations of our method is that the single IFC ontology cannot fully cover the IR needs of BIM-specific domain because of its multidisciplinary and multistakeholder nature. Therefore, there is a need to combine or merge the IFC ontology with some existing AEC ontologies such as Uniclass or OmniClass [52], so that more extensive BIM resources can be covered. In the future, we would like to integrate more AEC ontologies (e.g., OmniClass) in the search engine. On the other hand, the number of expansion terms using LCA is arguable. Too few expansion terms may have no impact, and too many will cause a query drift. A more effective query expansion method is another future direction of research. The experiment described in Section 6.2 is essentially a systemoriented evaluation [21], where the benchmark collection of BIM documents is acquired from Autodesk Seek [2] and OmniClass is used as the baseline classification of BIM documents for retrieval evaluation. Although such a system-oriented experiment can help evaluating the effectiveness of particular document ranking, it cannot examine the full operational characteristics of the proposed search engine. Therefore, a user-centered evaluation [53] might be conducted to supplement the system-oriented experiment by multiple humans independently. In particular, the inter-rater agreement (or inter-rater reliability) of document ranking can be assessed by some statistical measures such as Fleiss' kappa [54]. The user-centered evaluation studies will be left as part of our future work. Acknowledgments The authors appreciate the comments and suggestions of all reviewers, whose comments significantly improved this paper. The authors would like to thank Dr. Pieter Pauwels at Ghent University for discussing the problem of IFC-to-RDF conversion. The research is supported by the National Science Foundation of China (61472202, 61272229, 61003095) and the National Technological Support Program for the 12th-Five-Year Plan of China (2012BAJ03B07). The fourth author is supported by the Chinese 973 Program (2010CB328003) and the last author is supported by the Chinese 863 Program (2012AA040902). Supplementary material The proposed system and its demonstration can be accessed at: http://cgcad.thss.tsinghua.edu.cn/liuyushen/ifcqe/. [1] C. Eastman, P. Teicholz, R. Sacks, K. Liston, BIM Handbook: A Guide to Building Information Modeling for Owners, Managers, Designers, Engineers and Contractors, 2nd Edition John Wiley and Sons, NJ, 2011. [2] Autodesk, AutodeskSeek, http://seek.autodesk.com 2014. [3] BIMobject, http://bimobject.com/ 2014. [4] NBS of UK, National BIM Library, http://www.nationalbimlibrary.com/ 2014. [5] Google, Google 3D Warehouse, http://sketchup.google.com/3dwarehouse/ 2014. [6] SmartBIM LLC, SmartBIM, http://www.smartbim.com/ 2014. [7] Pierced Media LC, RevitCity, http://www.revitcity.com 2014. [8] A. Court, D. Ullman, S. Culley, A comparison between the provision of information to engineering designers in the UK and the USA, Int. J. Inf. Manag. 18 (6) (1998) 409–425. [9] G.J. Hahm, M.Y. Yi, J.H. Lee, H.W. Suh, A personalized query expansion approach for engineering document retrieval, Adv. Eng. Inform. 28 (4) (2014) 344–359. [10] Z. Li, V. Raskin, K. Ramani, Developing engineering ontology for information retrieval, J. Comput. Inf. Sci. Eng. 8 (1) (2008) 737–745. [11] Industry Foundation Classes (IFC), IFC4 Release Candidate 4, http://www. buildingsmart-tech.org/ifc/IFC2x4/rc4/html/index.htm 2014. [12] J. Xu, W.B. Croft, Query expansion using local and global document analysis, Proceedings of ACM SIGIR'96 1996, pp. 4–11. [13] J. Xu, W.B. Croft, Improving the effectiveness of information retrieval with local context analysis, ACM Trans. Inf. Syst. 18 (1) (2000) 79–112. [14] G. Salton, Developments in automatic text retrieval, Science 253 (5023) (1991) 974–980. [15] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis, J. Soc. Inf. Sci. 41 (6) (1990) 391–407. [16] T. Hofmann, Probabilistic latent semantic indexing, Proceedings of ACM SIGIR'99 1999, pp. 50–57. [17] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (4–5) (2003) 993–1022. [18] J. Bhogal, A. Macfarlane, P. Smith, A review of ontology based query expansion, Inf. Process. Manag. 43 (4) (2007) 866–886. [19] S. Kara, O. Alan, O. Sabuncu, S. Akpinar, N. Cicekli, F. Alpaslan, An ontology-based retrieval system using semantic indexing, Inf. Syst. 37 (4) (2012) 294–305. [20] O. Egozi, S. Markovitch, E. Gabrilovich, Concept-based information retrieval using explicit semantic analysis, ACM Trans. Inf. Syst. 29 (2) (2011) (Article 8). [21] X. Yan, R. Lau, D. Song, X. Li, J. Ma, Toward a semantic granularity model for domain-specific information retrieval, ACM Trans. Inf. Syst. 29 (3) (2011) (Article 15). [22] P. Demian, P. Balatsoukas, Information retrieval from civil engineering repositories: importance of context and granularity, J. Comput. Civ. Eng. 26 (6) (2012) 727–740. [23] K. Lin, S. Hsieh, H. Tserng, K. Chou, H. Lin, C. Huang, K. Tzeng, Enabling the creation of domain-specific reference collections to support text-based information retrieval experiments in the architecture, engineering and construction industries, Adv. Eng. Inform. 22 (3) (2008) 350–361. [24] K. Lin, L. Soibelman, Incorporating domain knowledge and information retrieval techniques to develop an architectural/engineering/construction online product search engine, J. Comput. Civ. Eng. 23 (4) (2009) 201–210. [25] H.-T. Lin, N.-W. Chi, S.-H. Hsieh, A concept-based information retrieval approach for engineering domain-specific technical documents, Adv. Eng. Inform. 26 (2) (2012) 349–360. [26] A. Weissman, M. Petrov, S. Gupta, A computational framework for authoring and searching product design specifications, Adv. Eng. Inform. 25 (3) (2011) 516–534. [27] Y. Rezgui, Ontology-centered knowledge management using information retrieval techniques, J. Comput. Civ. Eng. 20 (4) (2006) 261–270. [28] L. Zhang, R. Issa, Ontology based partial building information model extraction, J. Comput. Civ. Eng. 27 (6) (2013) 576–584. [29] J. Beetz, J. Leeuwenand, B. Vries, IfcOWL: a case of transforming EXPRESS schemas into ontologies, Artip. Intell. Eng. Des. Anal. Manuf. (AI EDAM) 23 (1) (2009) 89–101. [30] L. Zhang, R. Issa, Development of IFC-based construction industry ontology for information retrieval from IFC models, Proceedings of the 2011 EG-ICE Workshop, 2011. [31] P. Pauwels, D.V. Deursen, R. Verstraeten, J.D. Roo, R.D. Meyer, R.V. de Walle, J.V. Campenhout, A semantic rule checking environment for building performance checking, Autom. Constr. 20 (5) (2011) 506–518. [32] M.P. Nepal, S. Staub-French, R. Pottinger, A. Webster, Querying a building information model for construction-specific spatial information, Adv. Eng. Inform. 26 (4) (2012) 904–923. [33] A. Borrmann, E. Rank, Specification and implementation of directional operators in a 3D spatial query language for building information models, Adv. Eng. Inform. 23 (1) (2009) 32–44. [34] G. Gao, Y.-S. Liu, M. Wang, X.-G. Han, BIMTag: semantic annotation of web BIM product resources based on IFC ontology, 21st International Workshop: Intelligent Computing in Engineering 2014 (EG-ICE), 2014. [35] T. Gruber, A translation approach to portable ontology specifications, Knowl. Acquis. 5 (2) (1993) 199–220. [36] G. Miller, WordNet: a lexical database for English, Commun. ACM 38 (11) (1995) 39–41. [37] T. Lukasiewicz, U. Straccia, Managing uncertainty and vagueness in description logics for the semantic web, J. Web Semant. 6 (4) (2008) 291–308. [38] W3C, OWL 2 Web Ontology Language Overview, http://www.w3.org/TR/owl-features/ 2009. G. Gao et al. / Automation in Construction 56 (2015) 14–25 [39] N.F. Noy, D.L. McGuinness, Ontology development 101: a guide to creating your first ontology, http://protege.stanford.edu/publications/ontology_development/ontology1 01.html 2014. [40] Protégé, An ontology and knowledge base editor, http://protege.stanford.edu/ 2014. [41] K. Lin, L. Soibelman, Promoting transactions for A/E/C product information, Autom. Constr. 15 (6) (2006) 746–757. [42] M. Uschold, M. Gruninger, Ontologies: principles, methods and applications, Knowl. Eng. Rev. 11 (2) (1996) 93–136. [43] C. Leacock, M. Chodorow, Combining Local Context and WordNet Similarity for Word Sense Identification, 1st EditionMIT Press, 1998 (Ch. 11). [44] J. Jiang, D. Conrath, Semantic similarity based on corpus statistics and lexical taxonomy, Proceedings of International Conference Research on Computational Linguistics 1997, pp. 112–123. [45] D. Lin, An information-theoretic definition of similarity, Proceeding of 15th International Conference on Machine Learning 1998, pp. 296–304. [46] P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, International Joint Conference on Artificial Intelligence 1995 1995, pp. 448–453. 25 [47] Z. Wu, M. Palmer, Verb semantics and lexical selection, Proceedings of 32nd Annual Meeting of the Association for Computational Linguistics 1994, pp. 448–453. [48] Lucene, http://lucene.apache.org/ 2014. [49] V. Raghavan, P. Bollmann, G.S. Jung, A critical investigation of recall and precision as measures of retrieval system performance, ACM Trans. Inf. Syst. 7 (3) (1989) 205–229. [50] C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. [51] C.J.V. Rijsbergen, Information Retrieval, 2nd Edition Butterworths, London, 1979. [52] N. El-Gohary, T. El-Diraby, Merging architectural, engineering, and construction ontologies, J. Comput. Civ. Eng. 25 (2) (2011) 109–128. [53] T. Saracevic, Evaluation of evaluation in information retrieval, Proceedings of ACM SIGIR'95 1995, pp. 138–146. [54] J.L. Fleiss, Measuring nominal scale agreement among many raters, Psychol. Bull. 76 (5) (1971) 378–382.
© Copyright 2024