
Three versions of the natural language interface for databases (NLIDB) have been presented on this web page; all of them allow formulating queries in Spanish. The first version, described in (Pazos et al., 2005; see bibliography below), uses a pre-processor that automatically builds a data dictionary, which allows the interface to be domain independent. Its translation technique involves the interpretation of nouns, prepositions and conjunctions. The second version, described in (Pazos et al., 2010), includes domain-independent dialogue processes, which are based on a typification of the problems occurring in queries that covers most of the cases encountered.


This web page presents a demo of the new version of the NLIDB. Although it has been developed for querying databases (DBs) in Spanish, this demo and the corpus of queries have been prepared so that English-speaking people can easily test the NLIDB.


The NLIDB has been pre-customized for the ATIS database. We decided to use the ATIS database for testing because it is an example of the complex, medium-size databases found in real-life applications. The ATIS benchmark consists of 28 tables, 125 columns, and a corpus in which 85% of the queries are elliptical.

1. Description of the ATIS database.

2. Corpus of test queries.

3. NLIDB demo.

4. Bibliography.

The customization was performed in two steps: first an automatic customization was carried out, and afterwards the initial customization was manually fine-tuned. The automatic customization was carried out using the descriptions of the DB tables and columns, which can be found in the document Description of the ATIS database above. The manual fine-tuning was carried out using the Domain editor utility of the NLIDB, which has been partially disabled in this demo. This utility allows looking at the descriptors of DB tables and columns, but it does not permit modifying the customization.
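The automatic step can be pictured with a minimal sketch, assuming a much simpler scheme than the one the NLIDB actually uses: each word of a column description becomes a key that points to the columns it describes. The table names, column names, descriptions, stop-word list and the function build_data_dictionary are all hypothetical and are not taken from the ATIS description document.

```python
import re

# Hypothetical column descriptions; the real ones are in the
# "Description of the ATIS database" document.
DESCRIPTIONS = {
    ("flight", "departure_time"): "departure time of the flight",
    ("fare", "round_trip_cost"): "round trip cost of the fare",
}

# Assumed stop-word list; words in it are not used as descriptors.
STOP_WORDS = {"of", "the", "a", "an"}


def build_data_dictionary(descriptions):
    """Map each descriptor word to the (table, column) pairs it describes."""
    dictionary = {}
    for (table, column), text in descriptions.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                dictionary.setdefault(word, set()).add((table, column))
    return dictionary


data_dictionary = build_data_dictionary(DESCRIPTIONS)
print(sorted(data_dictionary))
```

In this picture, manual fine-tuning would amount to adding, removing or correcting entries of such a dictionary by hand.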


The document Corpus of test queries contains a set of 70 queries for the ATIS database. Each query is written in both Spanish and English so that English-speaking people can understand its meaning. Since the NLIDB has been developed for Spanish, users are advised to copy the Spanish version of a query from the corpus and paste it into the text field of the NLIDB below the label Enter the natural language query. When the Send button is clicked, the NLIDB translates the query into SQL (which is displayed under the label SQL translation) and shows the results obtained from the DB server.


Unlike other NLIDBs, this NLIDB does not look up search values in the database in order to determine the columns needed for generating the SQL statement. This enables it to correctly translate the queries in the corpus into SQL even when the search value is not stored in the database. For example, query number 24 from the corpus may be modified as follows: Tarifa de viaje redondo para el vuelo desde XXX (Round-trip airfare for the flight from XXX). In this case, the NLIDB will correctly translate the query, but it will show the message There are no records for this query, since XXX is not the code of any airport.
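The behavior in this example can be reproduced with a minimal sketch, assuming a toy in-memory SQLite table in place of the ATIS database. The table fare, its columns, the sample rows, the SQL statement and the helper run_translated_query are hypothetical; they are not the NLIDB's actual schema or output. The point is only that the statement is well formed whether or not the search value exists, and that an empty result produces the message.

```python
import sqlite3

NO_RECORDS_MESSAGE = "There are no records for this query"


def run_translated_query(conn, sql, params):
    """Run an already-translated SQL statement; return its rows,
    or the demo's message when the result set is empty."""
    rows = conn.execute(sql, params).fetchall()
    return rows if rows else NO_RECORDS_MESSAGE


# Toy stand-in for the database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fare (from_airport TEXT, round_trip_cost REAL)")
conn.executemany(
    "INSERT INTO fare VALUES (?, ?)",
    [("BOS", 420.0), ("DFW", 310.0)],
)

# Hypothetical translation: the column is chosen from the wording of the
# query, not by searching the table for the value.
sql = "SELECT round_trip_cost FROM fare WHERE from_airport = ?"

result_known = run_translated_query(conn, sql, ("BOS",))
result_unknown = run_translated_query(conn, sql, ("XXX",))
print(result_known)
print(result_unknown)
```

The same SQL text serves both calls; only the bound value differs, which is why an unknown value such as XXX affects the result and not the translation.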

Powered by Instituto Tecnológico de Ciudad Madero