Welcome to DiACL!

DiACL is an open access database with lexical and typological/morphosyntactic data for historical, comparative and phylogenetic linguistics. It contains data from 500 languages of 18 families, divided into three macro-areas: Eurasia, Pacific, and the Amazon. The database has the following content:

  • Lexical data sets with basic vocabularies (Swadesh lists)
  • Lexical data sets with culture vocabularies, focusing on subsistence system vocabulary
  • Typological/morphosyntactic data sets including the main types Word Order, Alignment, and Nominal/Verbal Morphology

DiACL contains data from contemporary and historical languages and, where possible, reconstructed languages. Data is derived from dictionaries and grammars or collected through new fieldwork (in particular data from the Caucasus and the Amazon). All data is sourced from scientifically reliable literature.

Language metadata includes geographic position, alternative names, reliability, and family tree topology.

Culture vocabulary data is organized into semantic taxonomies of lexical meanings, which are adapted to the macro-areas. Culture vocabulary meanings are selected according to four criteria: geography and environment (by identifying culture-relevant flora and fauna of the macro-areas), relevance to the subsistence systems of the language families, cultural function or affordance, and occurrence in reconstructed vocabularies of the targeted language families.

Lexemes are organized under etymologies (cognates), which are graphically reproduced as trees and maps on the database frontend. Lexical etymologies account for borrowing, morphological derivation, and semantic change.
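As an illustration only, the organization of lexemes under an etymology can be pictured as a small tree in which each link records how a lexeme relates to its parent (inheritance, borrowing, or derivation), while a change in meaning shows up as a difference between parent and child. The Python sketch below is an assumption about how such a structure might be modeled. The class and field names (Lexeme, relation, children) are hypothetical and do not reflect the actual DiACL schema, and all forms are invented placeholders rather than database entries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lexeme:
    """One node in a hypothetical etymology tree (not the DiACL schema)."""
    form: str
    language: str
    meaning: str
    relation: Optional[str] = None  # link to parent: "inherited", "borrowed", "derived"
    children: List["Lexeme"] = field(default_factory=list)

    def add(self, child: "Lexeme", relation: str) -> "Lexeme":
        """Attach a descendant and record how it relates to this lexeme."""
        child.relation = relation
        self.children.append(child)
        return child


def render(node: Lexeme, depth: int = 0) -> List[str]:
    """Return an indented text outline of the tree, one line per lexeme."""
    tag = f" [{node.relation}]" if node.relation else ""
    lines = ["  " * depth + f"{node.language}: {node.form} '{node.meaning}'{tag}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Placeholder forms, invented for illustration only.
root = Lexeme("*proto-form", "Proto-X", "grain")
a = root.add(Lexeme("form-a", "Language A", "grain"), "inherited")
a.add(Lexeme("form-b", "Language B", "bread"), "borrowed")  # meaning has shifted
root.add(Lexeme("form-c", "Language C", "field"), "derived")

print("\n".join(render(root)))
```

In this sketch the borrowed item in Language B carries a different meaning from its source, which is one simple way of recording semantic change alongside the type of relation.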

Typological/morphosyntactic data is organized into a four-level hierarchy, which enables coding of polymorphic behaviour (e.g., several word orders) in individual languages.
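To make the idea of polymorphic coding concrete, the following Python sketch shows one way a four-level coding could be represented. The level names used here (grid, feature, variant, value) and the binary values are assumptions made for illustration, as are the language names and codings; the text above only states that the hierarchy has four levels.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical level names: (grid, feature, variant) identify a coding slot,
# and the value (1 = present, 0 = absent) is treated as the fourth level.
Coding = Tuple[str, str, str, int]

# Invented example codings, not taken from the database.
codings: Dict[str, List[Coding]] = {
    "Language A": [
        ("Word Order", "Main clause", "SOV", 1),
        ("Word Order", "Main clause", "SVO", 1),  # a second attested order
        ("Word Order", "Main clause", "VSO", 0),
    ],
    "Language B": [
        ("Word Order", "Main clause", "SOV", 0),
        ("Word Order", "Main clause", "SVO", 1),
        ("Word Order", "Main clause", "VSO", 0),
    ],
}


def attested_variants(language: str) -> Dict[Tuple[str, str], List[str]]:
    """Group the variants coded as present (value 1) by (grid, feature)."""
    result: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for grid, feature, variant, value in codings[language]:
        if value == 1:
            result[(grid, feature)].append(variant)
    return dict(result)


def is_polymorphic(language: str) -> bool:
    """True if any feature has more than one variant coded as present."""
    return any(len(v) > 1 for v in attested_variants(language).values())


print(attested_variants("Language A"))
print(is_polymorphic("Language A"), is_polymorphic("Language B"))
```

Because each variant is coded separately rather than forcing one value per feature, a language with several word orders can be recorded without losing information.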

Typological/morphosyntactic features are selected to match known prototypical features of the linguistic areas within the included macro-areas (Eurasia, Pacific, Amazon), targeting properties which ensure typological variation and which are known to correlate typologically with each other. In addition, features are selected according to whether they can be identified in historical languages.

From the frontend, data sets with basic vocabulary, culture vocabulary (by area), and typology/morphosyntax (by area) are downloadable in XML/JSON format for computational analysis (requires registration/permission).
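As a rough sketch of what computational use of a downloaded JSON data set might look like, the Python example below parses a small inline sample and summarizes it. The record layout (keys language, meaning, form) is purely an assumption, as are the sample values; the actual export format should be checked against a real downloaded file and the keys adjusted accordingly.

```python
import json
from collections import Counter

# Invented sample standing in for a downloaded file; the real export
# layout may differ, so adjust the keys after inspecting an actual file.
sample = """
[
  {"language": "Language A", "meaning": "water", "form": "form-1"},
  {"language": "Language A", "meaning": "fire", "form": "form-2"},
  {"language": "Language B", "meaning": "water", "form": "form-3"}
]
"""

records = json.loads(sample)

# Count how many lexical entries each language contributes.
entries_per_language = Counter(r["language"] for r in records)

# Collect the forms recorded for one meaning across languages.
water_forms = {r["language"]: r["form"] for r in records if r["meaning"] == "water"}

print(entries_per_language)
print(water_forms)
```

For a real file, the inline string would be replaced by reading the downloaded file from disk and passing its contents to the same parsing step.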

Data and results will be published in monograph form in collaboration with De Gruyter under the title Mouton Atlas of Languages and Cultures (2019).

Fieldwork data (ELAN, audio/video files) from several of the targeted language areas is available via the Lund Corpus Server at the Lund Humanities Lab.

DiACL is a SWE-CLARIN resource, hosted by Lund University.

Note from the editor: The DiACL database is a work in progress. Although our aim is to produce a scientifically reliable resource in which all datapoints are sourced, mistakes and inconsistencies may be found in the codings. Further, datasets may be in the process of expansion or change. We are grateful for all comments on the content of the database, either on our Facebook page or by mail to the editor (Gerd Carling). We will also announce changes to datasets in the database via the webpage.


Carling, Gerd (ed.) 2017. Diachronic Atlas of Comparative Linguistics Online. Lund: Lund University. (DOI/URL: Accessed on: x.).

Data in the DiACL database consist of individual data for specific languages, sometimes retrieved through fieldwork. Fieldwork data from individual languages that is not available elsewhere should be cited by source (see the Source menu: Literary Sources and Informants).

SOURCE. In: Carling, Gerd (ed.) 2017. Diachronic Atlas of Comparative Linguistics Online. Lund: Lund University. (DOI/URL: Accessed on: x.).

Where SOURCE is for example: da Silva Sinha, V. & Cronhamn, S. (2013). Fieldwork data from Amazonian languages. Native speaker: Wary.

Overview of languages for which data sets are present in the database.


DiACL was previously also known as LUNDIC.