About Omniglot

Omniglot is an encyclopedia of writing systems and languages.

It contains:

Details of more than 180 writing systems, including Abjads, Alphabets, Abugidas, Syllabaries and Semanto-phonetic scripts
Information about over 500 languages
More than 300 con-scripts – writing systems invented by visitors to this site
Tips on learning languages
Language-related articles
Useful foreign phrases in more than 150 languages with quite a few audio recordings
Texts, language names, country names, colours and songs in many languages
A language book store
Links to language-related resources


40 Fascinating Lectures for Linguistics Geeks | Online Universities

Linguistics is kind of like The Force — it surrounds us, penetrates us and binds the galaxy together. Or at least the planet, anyway. Both this universality and frequent intersections with a diverse array of subjects — including, but not limited to, cognitive science, literature, politics, psychology, communication, anthropology and more — make linguistics a compelling, dynamic, nuanced study. The following lectures, by no means the only ones available online, represent a lovely little slice of how language permeates all things, for better and for worse.


GT | Newsroom – Email Language Tips Off Work Hierarchy

Members of the modern workforce might be surprised to learn that if they use the word “weekend” in a workplace email, chances are they’re sending the message up the org chart. The same goes for the words “voicemail,” “driving,” “okay” and even a choice four-letter word that rhymes with “hit.” A new study by Georgia Tech’s Eric Gilbert shows that certain words and phrases are indeed reliable indicators of whether workplace emails are sent to someone higher or lower in the corporate hierarchy.


Speech Accent Archive

The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph, and their readings are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers.


Distinguishing blue from green in language

The English language makes a distinction between blue and green, but some languages do not. Of these, quite a number, mostly in Africa, do not distinguish blue from black either, while there are a handful of languages that do not distinguish blue from black but have a separate term for green. Also, some languages treat light (often greenish) blue and dark blue as separate colors, rather than different variations of blue, while English does not.


Controlled Natural Language

Controlled Natural Languages are subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity. Traditionally, controlled natural languages fall into two major categories: those that improve readability for human readers, in particular for non-native speakers, and those that improve the computational processing of a text.


Topic Modeling

Topic models provide a simple way to analyze large volumes of unlabeled text. A “topic” consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings.
The MALLET topic model package includes an extremely fast and highly scalable implementation of Gibbs sampling, efficient methods for document-topic hyperparameter optimization, and tools for inferring topics for new documents given trained models.
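To make the idea of Gibbs sampling for topic models concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is a toy written for illustration, not MALLET's implementation or API: the function name `gibbs_lda`, the sample documents, and the `alpha` and `beta` values are all assumptions, and it performs none of the hyperparameter optimization mentioned above.

```python
import random
from collections import Counter


def gibbs_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Weight each topic by how much the document and the word favor it.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Made-up sample documents, already tokenized.
docs = [
    "alphabet script glyph alphabet syllabary".split(),
    "script glyph syllabary alphabet glyph".split(),
    "corpus parliament translation corpus sentence".split(),
    "translation sentence corpus parliament translation".split(),
]
doc_topic, topic_word = gibbs_lda(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

Each pass removes one token from the counts, scores every topic for that token, and samples a new assignment. Words that keep appearing in the same documents tend to drift into the same topic, which is the "cluster of words that frequently occur together" described above.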


Home : Inform

Inform is a design system for interactive fiction based on natural language. It is a radical reinvention of the way interactive fiction is designed, guided by contemporary work in semantics and by the practical experience of some of the world’s best-known writers of IF.


Native American Language Net: Preserving and promoting indigenous American Indian languages

Native American languages do not belong to a single Amerindian family, but to 25-30 small ones; they are usually discussed together because of the small numbers of natives speaking most of these languages and how little is known about many of them. There are around 25 million native speakers of the more than 800 surviving Amerind languages. The vast majority of these speakers live in Central and South America, where language use is vigorous. In Canada and the United States, only about half a million native speakers of an Amerind tongue remain.

Click on a language family to see a linguistic tree of that family and links about the group. Click on a language name to see a description and links about that language, as well as information about the American Indian people who speak it.


What is Ellogon?

Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (like creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, or transforming linguistic information into vectors for use with various machine learning algorithms.


Europarl Parallel Corpus

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish.

The goal of the extraction and processing was to generate sentence-aligned text for statistical machine translation systems. For this purpose we extracted matching items and labeled them with corresponding document IDs. Using a preprocessor we identified sentence boundaries. We sentence-aligned the data using a tool based on the Church and Gale algorithm.
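The length-based idea behind this kind of sentence alignment can be sketched in a few lines of Python. This is a simplified illustration, not the Europarl tool or the published algorithm: it keeps the dynamic-programming search over groupings of sentences but swaps the statistical cost for a plain relative length difference, and the function name `align_sentences`, the penalty values, and the sample sentences are all assumptions.

```python
def align_sentences(src, tgt):
    """Align two sentence lists by character length using dynamic programming."""
    # Allowed groupings: (source sentences used, target sentences used, penalty).
    # The penalties are illustrative guesses, not published values.
    beads = [(1, 1, 0.0), (1, 0, 3.0), (0, 1, 3.0), (2, 1, 1.0), (1, 2, 1.0)]

    def length_cost(s_chunk, t_chunk):
        # Relative difference in total character length, between 0 and 1.
        ls = sum(len(s) for s in s_chunk)
        lt = sum(len(t) for t in t_chunk)
        return abs(ls - lt) / max(ls + lt, 1)

    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # cost[i][j] is the cheapest way to align the first i source sentences
    # with the first j target sentences.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + length_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the aligned groups.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return pairs[::-1]


# Made-up sample sentences, already split at sentence boundaries.
english = ["Good morning.", "The session is open.", "Thank you."]
dutch = ["Goedemorgen.", "De vergadering is geopend.", "Dank u."]
for source_group, target_group in align_sentences(english, dutch):
    print(source_group, "<->", target_group)
```

The sketch relies on the observation that a sentence and its translation tend to have similar lengths, so the cheapest path through the table pairs sentences of comparable size and only merges or skips sentences when the lengths make a one-to-one pairing implausible.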
