



Comparative study on the customization of natural language interfaces to databases

Open access


Rodolfo A. Pazos Rangel, Marco A. Aguirre L., J. Javier González Barbosa, José A. Martínez F., Joaquín Pérez O., and Andrés A. Verástegui O.

Computer Science

SpringerPlus 2016, 5:553


Abstract. In recent decades the popularity of natural language interfaces to databases (NLIDBs) has increased, because in many cases the information obtained from them is used to make important business decisions. Unfortunately, the complexity of their customization by database administrators makes them difficult to use. For an NLIDB to obtain a high percentage of correctly translated queries, it must be correctly customized for the database to be queried. In most cases the performance reported in the NLIDB literature is the highest possible, i.e., the performance obtained when the interfaces were customized by their implementers. However, what matters more to end users is the performance the interface can yield when it is customized by someone other than the implementers. Unfortunately, very few articles report NLIDB performance when the NLIDBs are not customized by their implementers. This article presents a semantically enriched data dictionary (which makes it possible to solve many of the problems that occur when translating from natural language to SQL) and an experiment in which two groups of undergraduate students customized our NLIDB and English Language Frontend (ELF), considered one of the best commercially available NLIDBs. The experimental results for the ATIS database show that, when customized by the first group, our NLIDB correctly answered 44.69% of the queries and ELF 11.83%, and when customized by the second group, our NLIDB attained 77.05% and ELF 13.48%. When customized by ourselves, our NLIDB attained 90%.
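The figures above are percentages of correctly answered queries. The abstract does not say how correctness was judged; one common approach, assumed here and not necessarily the article's protocol, is to execute the SQL produced by the interface and a reference SQL query and compare their result sets. The `flight` table and its rows below are hypothetical and are not the actual ATIS schema.

```python
import sqlite3
from collections import Counter


def same_answer(conn, predicted_sql, gold_sql):
    """True if both queries return the same multiset of rows (row order is ignored)."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # an invalid translation counts as an incorrect answer
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def percent_correct(conn, pairs):
    """pairs: list of (predicted_sql, gold_sql); returns the percentage answered correctly."""
    if not pairs:
        return 0.0
    correct = sum(same_answer(conn, p, g) for p, g in pairs)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flight (id INTEGER, origin TEXT, dest TEXT, fare REAL)")
    conn.executemany(
        "INSERT INTO flight VALUES (?, ?, ?, ?)",
        [(1, "BOS", "DFW", 310.0), (2, "BOS", "DEN", 199.0), (3, "DEN", "DFW", 150.0)],
    )
    pairs = [
        ("SELECT id FROM flight WHERE origin = 'BOS'", "SELECT id FROM flight WHERE origin = 'BOS'"),
        ("SELECT id FROM flight WHERE fare < 200", "SELECT id FROM flight WHERE fare < 200 AND origin = 'BOS'"),
    ]
    print(percent_correct(conn, pairs))  # prints 50.0
```

Comparing multisets rather than ordered lists means a translation that returns the right rows in a different order is still counted as correct, which suits queries without an explicit ordering.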




Interface for Composing Queries for Complex Databases for Inexperienced Users


Rodolfo A. Pazos R., Alan G. Aguirre L., Marco A. Aguirre L., and José A. Martínez F.

Hybrid Artificial Intelligent Systems.

Lecture Notes in Computer Science Volume 9121, 2015, pp. 61-72.


Abstract. In most business activities, decision-making plays a very important role, since it may benefit or harm the business. Nowadays decision-making is based on information obtained from databases, which only computer experts can access directly; however, the end user who requires information from a database is not always a computer expert, so there is a need to allow inexperienced users to obtain information directly from a database. To this end, several kinds of tools are commercially available, such as visual query builders and natural language interfaces to databases (NLIDBs). However, the first kind of tool requires at least a basic knowledge of some formal query language, while NLIDBs, although users need no training to use them, have not achieved the desired performance due to problems inherent to natural language processing. This paper presents an intuitive interface that allows inexperienced users to easily compose SQL queries without needing training in its operation or knowledge of SQL.
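The abstract does not describe how the interface builds SQL. As a rough sketch of the general idea of composing a query from point-and-click selections, assume the user picks a table, some columns, and filter conditions from lists; the `QuerySpec` structure, `compose_sql` function, and schema below are hypothetical illustrations, not the paper's design.

```python
from dataclasses import dataclass, field

# Comparison operators offered by the hypothetical interface's drop-down lists.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


@dataclass
class QuerySpec:
    """Choices made by the user in the interface (hypothetical structure)."""
    table: str
    columns: list
    filters: list = field(default_factory=list)  # (column, operator, value) triples


def compose_sql(spec, schema):
    """Turn the user's selections into a parameterized SELECT; returns (sql, params)."""
    if spec.table not in schema:
        raise ValueError(f"unknown table: {spec.table}")
    if not spec.columns:
        raise ValueError("select at least one column")
    known = schema[spec.table]
    # Identifiers cannot be bound as parameters, so validate them against the schema.
    for col in list(spec.columns) + [f[0] for f in spec.filters]:
        if col not in known:
            raise ValueError(f"unknown column: {col}")
    conditions, params = [], []
    for col, op, value in spec.filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        conditions.append(f"{col} {op} ?")
        params.append(value)  # values are bound at execution time, never spliced into the SQL text
    sql = f"SELECT {', '.join(spec.columns)} FROM {spec.table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


if __name__ == "__main__":
    schema = {"flight": {"id", "origin", "dest", "fare"}}
    spec = QuerySpec("flight", ["id", "fare"], [("origin", "=", "BOS"), ("fare", "<", 200)])
    print(compose_sql(spec, schema))
    # prints ('SELECT id, fare FROM flight WHERE origin = ? AND fare < ?', ['BOS', 200])
```

Because the user only chooses from lists that mirror the schema, every generated query is syntactically valid SQL, which is the property that lets such a tool avoid the translation errors of free-form natural language input.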




Features and Pitfalls that Users Should Seek in Natural Language Interfaces to Databases


Rodolfo A. Pazos Rangel, Marco A. Aguirre, Juan J. González and Juan Martín Carpio.

Recent Advances on Hybrid Approaches for Designing Intelligent Systems.

Studies in Computational Intelligence Volume 547, 2014, pp. 617-630.


Abstract. Natural Language Interfaces to Databases (NLIDBs) are tools that can be useful for decision-making, allowing different types of users to obtain the information they need through natural language communication. Despite their important features and the fact that NLIDBs have been developed for more than 50 years, their acceptance by end users is very low due to extremely complex problems inherent to natural language, their customization, and their internal operation, which have resulted in poor performance in terms of correctly translated queries. This chapter presents a study of the main desirable features that NLIDBs should have, as well as their pitfalls, and describes case studies from several interfaces to illustrate the flaws of their approaches.




Natural Language Interfaces to Databases: An Analysis of the State of the Art


Rodolfo A. Pazos R., Juan J. González B., Marco A. Aguirre L., José A. Martínez F. and Héctor J. Fraire H.

Recent Advances on Hybrid Intelligent Systems.

Studies in Computational Intelligence Volume 451, 2013, pp. 463-480.

Abstract. People constantly make decisions based on information, most of which is stored in databases. Accessing this information requires database query languages such as SQL. To spare users who are not computing experts the difficulty of using these languages, Natural Language Interfaces to Databases (NLIDBs) have been developed, which make it possible to query databases with queries formulated in natural language. Although many NLIDBs have been developed since the 1960s, their performance has not been satisfactory: some very difficult problems remain unsolved by NLIDB technology, and there is still no standardized evaluation method for comparing the performance of different NLIDBs. This chapter presents an analysis of NLIDBs that covers their classification, techniques, advantages, and disadvantages, together with a proposal for evaluating them properly.




Natural Language Interfaces for Querying Databases in Spanish (Interfaces de Lenguaje Natural para Consultar Bases de Datos en Español)

Open access


Marco A. Aguirre L. and Rodolfo A. Pazos R.

Komputer Sapiens, Year IV, Volume II, pp. 20-24.

Abstract. Since ancient times, human beings have had in mind the idea of artificially imitating the processes of thought. With the creation of the electronic computer, humanity's dream has been to communicate with it through natural language and to obtain information through that interaction. In recent decades information has played an important role in our daily lives; most people seek information before making an important decision. Today the largest sources of information are stored in databases (DBs). DBs contain a collection of interrelated data, structured to model information found in the real world. For a user to obtain information from a DB, a query must be formulated in such a way that the computer can interpret it and generate the correct answer (usually a database query language is used, such as SQL, Structured Query Language). Unfortunately, not every user is able to write such queries, especially those who lack computing knowledge. Only computing professionals can formulate this type of query, which is costly and subject to time constraints.




Semantic Model for Improving the Performance of Natural Language Interfaces to Databases


Rodolfo A. Pazos R., Juan J. González B., and Marco A. Aguirre L.

Advances in Artificial Intelligence.

Lecture Notes in Computer Science Volume 7094, 2011, pp. 277-290.

Abstract. Although many Natural Language Interfaces to Databases (NLIDBs) have been developed since the late 1960s, many problems remain that prevent the translation from natural language to SQL from being fully successful. Some of the main problems encountered relate to 1) achieving domain independence, 2) the use of words or phrases of different syntactic categories to refer to tables and columns, and 3) semantic ellipsis. This paper introduces a new method for modeling databases that includes relevant information for improving the performance of NLIDBs. The method can help solve many problems found in translating from natural language to SQL by using a database model that contains linguistic information, which provides more semantic information than conventional database models (such as the extended entity-relationship model) and the models used in previous NLIDBs.
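Problem 2 above is easiest to see with a concrete dictionary. The sketch below assumes a hypothetical structure in which each phrase maps to one or more schema senses tagged with a syntactic category, so that a noun ("fare"), a verb ("cost"), and an adjective ("cheapest") can all point to the same column; the `Sense` class, the entries, and the greedy `lookup` are illustrations, not the paper's semantic model.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sense:
    """One meaning of a word or phrase in terms of the database schema."""
    table: str
    column: Optional[str]  # None: the phrase refers to the whole table
    category: str          # syntactic category of the phrase


# Hypothetical dictionary entries: phrases of different syntactic categories
# (nouns, verbs, adjectives) that refer to the same table or column.
DICTIONARY = {
    "flight": [Sense("flight", None, "noun")],
    "flights": [Sense("flight", None, "noun")],
    "fly": [Sense("flight", None, "verb")],
    "fare": [Sense("flight", "fare", "noun")],
    "cost": [Sense("flight", "fare", "noun"), Sense("flight", "fare", "verb")],
    "cheapest": [Sense("flight", "fare", "adjective")],
    "leave": [Sense("flight", "departure_time", "verb")],
    "departure time": [Sense("flight", "departure_time", "noun")],
}


def lookup(query, max_len=3):
    """Greedy longest match of query phrases (up to max_len words) against DICTIONARY."""
    words = re.findall(r"[a-z]+", query.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in DICTIONARY:
                found.append((phrase, DICTIONARY[phrase]))
                i += n
                break
        else:
            i += 1  # word not in the dictionary (e.g. "the", "of"): skip it
    return found


if __name__ == "__main__":
    for phrase, senses in lookup("What is the departure time of the cheapest flight?"):
        print(phrase, "->", [(s.table, s.column, s.category) for s in senses])
```

Longest-match lookup lets multi-word phrases such as "departure time" win over their individual words; a real model would also need to rank competing senses when a phrase is ambiguous.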




Dialogue Manager for a NLIDB for Solving the Semantic Ellipsis Problem in Query Formulation


Rodolfo A. Pazos R., Juan C. Rojas P., René Santaolaya S., José A. Martínez F., and Juan J. González B.

Knowledge-Based and Intelligent Information and Engineering Systems.

Lecture Notes in Computer Science Volume 6277, 2010, pp. 203-213.


Abstract. A query written in natural language (NL) may involve several linguistic problems that prevent it from being interpreted or translated correctly into SQL. One of these problems is implicit information, or semantic ellipsis, which can be understood as the omission of important words from the wording of an NL query. An exhaustive survey of NLIDB works has revealed that most of them have not systematically dealt with semantic ellipsis. In experiments conducted on commercial NLIDBs, very poor results (7% to 16.9%) have been obtained on query corpora that involve semantic ellipsis. In this paper we propose a dialogue manager (DM) for an NLIDB for solving semantic ellipsis problems. The operation of this DM is based on a typification of the elliptical problems found in queries, which makes it possible to deal with this problem systematically. Additionally, the typification has two important characteristics: domain independence, which allows it to be applied to queries on different databases, and generality, which means that it holds for different languages such as English, French, Italian, Spanish, etc. Because the dialogue processes implemented in the DM are based on this typification, they inherit these characteristics. In experiments conducted with this DM and an NLIDB on a corpus of elliptical queries, an increase of 30-35% in correctly answered queries was attained.
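The abstract does not list the types in the typification. As a hedged illustration of just one plausible case, consider a number whose column was omitted ("flights to Denver under 200", where "fare" is implied). The sketch below assumes the query's table and mentioned columns are already known; the schema, `resolve_numbers`, and the question wording are all hypothetical, and the function resolves the number when only one numeric column fits and otherwise produces a clarification question for the user.

```python
import re

# Hypothetical schema of a table already identified in the query: column -> type.
FLIGHT_COLUMNS = {
    "flight_number": "int",
    "fare": "real",
    "stops": "int",
    "airline": "text",
    "dest": "text",
}


def resolve_numbers(query, mentioned, columns):
    """For each number in the query, attach it to a column or produce a clarification question.

    mentioned: columns already identified from words in the query.
    Returns a list of ("resolved", number, column) or ("ask", number, question).
    """
    numeric = [c for c, t in columns.items() if t in ("int", "real")]
    mentioned_numeric = [c for c in mentioned if c in numeric]
    # Prefer the numeric columns the user actually mentioned; otherwise consider all of them.
    candidates = mentioned_numeric or numeric
    actions = []
    for number in re.findall(r"\d+(?:\.\d+)?", query):
        if len(candidates) == 1:
            actions.append(("resolved", number, candidates[0]))
        elif not candidates:
            actions.append(("ask", number, f"What does {number} refer to?"))
        else:
            options = ", ".join(candidates[:-1]) + f" or {candidates[-1]}"
            actions.append(("ask", number, f"Does {number} refer to {options}?"))
    return actions


if __name__ == "__main__":
    print(resolve_numbers("flights to Denver under 200", ["dest"], FLIGHT_COLUMNS))
    print(resolve_numbers("flights to Denver with a fare under 200", ["dest", "fare"], FLIGHT_COLUMNS))
```

Asking only when several columns are compatible keeps the dialogue short; the first call above produces a question, while the second resolves 200 to `fare` without bothering the user.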




Shedding Light on a Troublesome Issue in NLIDBs


Rodolfo Pazos, René Santaolaya S., Juan C. Rojas P., and Joaquín Pérez O.

Text, Speech and Dialogue.

Lecture Notes in Computer Science Volume 5246, 2008, pp. 641-648.


Abstract. A natural language interface to databases (NLIDB) without help mechanisms that allow queries to be clarified is prone to incorrect query translation. In this paper we draw attention to a problem in NLIDBs that has been overlooked and has not been dealt with systematically: word economy, i.e., the omission of words when expressing a query in natural language (NL). To gauge the magnitude of this problem, we conducted experiments with EnglishQuery on a corpus of economized-wording queries. The results show that the percentage of correctly answered queries is 18%, which is substantially lower than those obtained with corpora of regular queries (53%–83%). We describe a typification of the problems found in economized-wording queries, which has been used to implement domain-independent dialog processes for an NLIDB in Spanish. Incorporating dialog processes into an NLIDB lets users clarify queries in NL, thus improving the percentage of correctly answered queries. This paper presents tests of a dialog manager that deals with four types of query problems, which improves the percentage of correctly answered queries from 60% to 91%. Due to the generality of our approach, we claim that it can be applied to other domain-dependent or domain-independent NLIDBs, as well as to other NLs such as English, French, Italian, etc.




Issues in Translating from Natural Language to SQL in a Domain-Independent Natural Language Interface to Databases

Juan J. González B., Rodolfo A. Pazos Rangel, I. Cristina Cruz C., Héctor J. Fraire H., L. de Santos Aguilar, and Joaquín Pérez O.

MICAI 2006: Advances in Artificial Intelligence.

Lecture Notes in Computer Science Volume 4293, 2006, pp. 922-931.


Abstract. This paper deals with a domain-independent natural language interface to databases (NLIDB) for the Spanish language. This NLIDB had previously been tested on the Northwind and Pubs domains and had attained good performance (86% success rate). However, domain independence makes high translation success harder to achieve, so a new evaluation was conducted on the ATIS (Air Travel Information System) database, which has been used by several natural language interfaces. The purpose of this evaluation was to assess the efficiency of the interface after reconfiguring it for another domain and to detect the problems that affect translation success. For the tests a corpus of queries was gathered, and the results showed that the interface can easily be reconfigured and that it attained a 50% success rate. Analysis of the query translation problems revealed wording deficiencies in some user queries and several errors in the synonym dictionary. After these problems were corrected, a second test was conducted, in which the interface attained a 61.4% success rate. These experiments showed that user training is necessary, as is a dialogue system that allows a query to be clarified when it is poorly formulated.




A Domain Independent Natural Language Interface to Databases Capable of Processing Complex Queries


Rodolfo A. Pazos Rangel, Joaquín Pérez O., Juan Javier González B., Alexander Gelbukh, Grigori Sidorov, and Myriam J. Rodríguez M.

MICAI 2005: Advances in Artificial Intelligence.

Lecture Notes in Computer Science Volume 3789, 2005, pp. 833-842.


Abstract. We present a method for creating natural language interfaces to databases (NLIDBs) that translate natural language queries into SQL. The method is domain independent, i.e., it avoids the tedious process of configuring the NLIDB for a given domain. We automatically generate the domain dictionary for query translation using semantic metadata of the database. Our semantic representation of a query is a graph that includes information from the database metadata. The query is translated taking into account the parts of speech of its words (obtained with some linguistic processing). Specifically, unlike most existing NLIDBs, we take auxiliary words (prepositions and conjunctions) seriously, treating them as set-theory operators, which allows more complex queries to be processed. Experimental results (on two Spanish databases from different domains) show that the treatment of auxiliary words improves translation correctness by 12.1%. With the developed NLIDB, 82% of queries were correctly translated (and thus answered). Reconfiguring the NLIDB from one domain to the other took only ten minutes.
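To make "auxiliary words as set-theory operators" concrete, here is a minimal sketch that covers only conjunctions: "and" (Spanish "y") maps to intersection and "or" (Spanish "o") to union of the row sets matched by each condition, alongside the equivalent SQL keyword. The mapping, the `combine` function, and the strict left-to-right grouping are simplifying assumptions for illustration; the abstract does not say how the paper handles prepositions or operator precedence.

```python
import operator

# Hypothetical mapping of conjunctions (English and Spanish) to SQL keywords and set operators.
CONJUNCTIONS = {
    "and": ("AND", operator.and_),  # intersection
    "y": ("AND", operator.and_),
    "or": ("OR", operator.or_),     # union
    "o": ("OR", operator.or_),
}


def combine(conditions, conjunctions):
    """Fold conditions left to right.

    conditions: list of (sql_condition, set_of_matching_row_ids)
    conjunctions: the words joining consecutive conditions in the query
    Returns (where_clause, row_ids).
    """
    if not conditions or len(conjunctions) != len(conditions) - 1:
        raise ValueError("need one conjunction between each pair of conditions")
    sql, rows = conditions[0]
    for word, (cond_sql, cond_rows) in zip(conjunctions, conditions[1:]):
        keyword, set_op = CONJUNCTIONS[word.lower()]
        sql = f"({sql} {keyword} {cond_sql})"
        rows = set_op(rows, cond_rows)
    return sql, rows


if __name__ == "__main__":
    where, rows = combine(
        [("origin = 'BOS'", {1, 2}), ("dest = 'DFW'", {1, 3}), ("fare < 200", {2, 3})],
        ["and", "or"],
    )
    print(where)  # ((origin = 'BOS' AND dest = 'DFW') OR fare < 200)
    print(rows)   # {1, 2, 3}
```

The explicit parentheses make the grouping visible in the generated SQL, so the row set computed with `&` and `|` and the rows returned by the WHERE clause stay in agreement.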




Spanish Natural Language Interface for a Relational Database Querying System


Rodolfo A. Pazos Rangel, Alexander Gelbukh, J. Javier González Barbosa, Erika Alarcón Ruiz, Alejandro Mendoza Mejía, and A. Patricia Domínguez Sánchez.

Text, Speech and Dialogue.

Lecture Notes in Computer Science Volume 2448, 2002, pp. 123-130.


Abstract. The fast growth of the Internet is creating a society in which the demand for information storage, organization, access, and analysis services is continuously growing. This constantly increases the number of inexperienced users who need simple access to databases. Together with the emergence of voice interfaces, this situation foretells a promising future for database querying systems with natural language interfaces. We describe the architecture of a relational database querying system with a natural language (Spanish) interface, and briefly explain the implementation of each of its constituent modules: lexical parser, syntax checker, and semantic analyzer.
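The three-module architecture can be pictured as a pipeline in which each stage consumes the previous stage's output. The toy sketch below assumes a tiny hypothetical English lexicon and a single accepted sentence pattern (command, columns, table); the lexicon, stop words, and function names are illustrative only and do not reproduce the paper's Spanish modules.

```python
import re

# Hypothetical lexicon: word -> (category, schema element).
LEXICON = {
    "show": ("command", None),
    "list": ("command", None),
    "name": ("column", "name"),
    "names": ("column", "name"),
    "salary": ("column", "salary"),
    "employee": ("table", "employee"),
    "employees": ("table", "employee"),
}
STOPWORDS = {"the", "of", "all", "and"}


def lexical_parse(query):
    """Lexical parser: split the query into words and tag each with its lexicon category."""
    tokens = []
    for word in re.findall(r"[a-z]+", query.lower()):
        if word in STOPWORDS:
            continue
        if word not in LEXICON:
            raise ValueError(f"unknown word: {word}")
        tokens.append((word,) + LEXICON[word])
    return tokens


def check_syntax(tokens):
    """Syntax checker: accept only 'command column+ table'."""
    cats = [cat for _, cat, _ in tokens]
    if not cats or cats[0] != "command" or cats[-1] != "table" or cats.count("table") != 1:
        raise ValueError("expected: command, columns, table")
    if len(cats) < 3 or not all(c == "column" for c in cats[1:-1]):
        raise ValueError("expected at least one column between command and table")
    return tokens


def semantic_analysis(tokens):
    """Semantic analyzer: map the checked tokens onto a SELECT statement."""
    columns = [elem for _, cat, elem in tokens if cat == "column"]
    table = tokens[-1][2]
    return f"SELECT {', '.join(columns)} FROM {table}"


def translate(query):
    return semantic_analysis(check_syntax(lexical_parse(query)))


if __name__ == "__main__":
    print(translate("Show the names and salary of all employees"))
    # prints SELECT name, salary FROM employee
```

Keeping the stages separate means an unknown word is rejected by the lexical parser and a malformed sentence by the syntax checker, so the semantic analyzer only ever sees well-formed input.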






Powered by Instituto Tecnológico de Ciudad Madero