Cunning Linguist

conceptnet-numberbatch

2018-07-08 satyr.nl

GitHub – commonsense/conceptnet-numberbatch

ConceptNet Numberbatch consists of state-of-the-art semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning.
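Using such vectors "directly as a representation of word meanings" usually means comparing them with cosine similarity. The sketch below assumes the vectors come as a word2vec-style text file: an optional `<count> <dimensions>` header, then one term per line followed by its values. Check the release notes for the exact layout; in the multilingual release, terms may look like `/c/en/cat`. The function names and the three-dimensional toy vectors are made up for illustration and are not real Numberbatch data.

```python
import math

# Hypothetical toy data in word2vec-style text format (NOT real Numberbatch values).
TOY_VECTORS = """
4 3
cat 0.9 0.1 0.0
dog 0.8 0.2 0.1
car 0.0 0.1 0.9
truck 0.1 0.0 0.8
"""


def load_text_vectors(lines):
    """Parse 'term v1 v2 ...' lines into a dict, skipping a '<count> <dim>' header."""
    vectors = {}
    for n, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue
        if n == 0 and len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            continue  # header line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(vectors, term, n=3):
    """Rank all other terms by cosine similarity to `term`."""
    query = vectors[term]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


vecs = load_text_vectors(TOY_VECTORS.strip().splitlines())
print(most_similar(vecs, "cat", 1))
```

A real file has hundreds of dimensions and many terms per language, so for a full vocabulary a vectorised library (or an approximate nearest-neighbour index) would replace the linear scan in `most_similar`.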

Omniglot

2012-06-12 satyr.nl

About Omniglot

Omniglot is an encyclopedia of writing systems and languages.

It contains:

Details of more than 180 writing systems, including Abjads, Alphabets, Abugidas, Syllabaries and Semanto-phonetic scripts
Information about over 500 languages
More than 300 con-scripts – writing systems invented by visitors to this site
Tips on learning languages
Language-related articles
Useful foreign phrases in more than 150 languages with quite a few audio recordings
Texts, language names, country names, colours and songs in many languages
A language book store
Links to language-related resources

40 Fascinating Lectures for Linguistics Geeks

2012-02-25 satyr.nl

40 Fascinating Lectures for Linguistics Geeks | Online Universities

Linguistics is kind of like The Force — it surrounds us, penetrates us and binds the galaxy together. Or at least the planet, anyway. Both this universality and frequent intersections with a diverse array of subjects — including, but not limited to, cognitive science, literature, politics, psychology, communication, anthropology and more — make linguistics a compelling, dynamic, nuanced study. The following lectures, by no means the only ones available online, represent a lovely little slice of how language permeates all things, for better and for worse.

Email Language Tips Off Work Hierarchy

2012-02-20 satyr.nl

GT | Newsroom – Email Language Tips Off Work Hierarchy

Members of the modern workforce might be surprised to learn that if they use the word “weekend” in a workplace email, chances are they’re sending the message up the org chart. The same goes for the words “voicemail,” “driving” and “okay,” and even for a choice four-letter word that rhymes with “hit.” A new study by Georgia Tech’s Eric Gilbert shows that certain words and phrases are indeed reliable indicators of whether workplace emails are sent to someone higher or lower in the corporate hierarchy.

Lyric Writing For Crap Lyric Writers

2011-12-29 satyr.nl

Lyric Writing For Crap Lyric Writers | Guitar Columns @ Ultimate-Guitar.Com

Even if you can already play through every Satriani record from memory, and everyone in your family including weird Aunt Ida tells you that you’re a genius, you’ve still got a long way to go.

Shapecatcher.com: Unicode Character Recognition

2011-12-22 satyr.nl

Shapecatcher.com: Unicode Character Recognition

The best way to quickly find the Unicode character for ‘Face savouring delicious food’
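Shapecatcher recognises a character from a drawing of it. When you already know, or can guess, part of the character's official name, Python's standard `unicodedata` module offers a different route: look it up by name. `find_by_name` is a hypothetical helper that scans every code point, so it takes a moment to run.

```python
import sys
import unicodedata

# Exact lookup by official Unicode name (note the British spelling "SAVOURING").
ch = unicodedata.lookup("FACE SAVOURING DELICIOUS FOOD")
print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))


def find_by_name(fragment):
    """Return every character whose Unicode name contains `fragment` (case-insensitive)."""
    fragment = fragment.upper()
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if fragment in unicodedata.name(chr(cp), "")  # unnamed code points give ""
    ]


print(find_by_name("delicious food"))
```

The names the module knows depend on the Unicode version bundled with your Python build, so very recent characters may be missing from older interpreters.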

An Etymologist's View of the World

2011-12-06 satyr.nl

An Etymologist’s View of the World

It looks like a normal map, but once you start reading, it becomes clear that the “Atlas of True Names” is not at all conventional. It is an etymological trip around the world.

Speech Accent Archive

2011-11-11 satyr.nl

Speech Accent Archive

The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph, and their readings are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers.

Distinguishing blue from green in language

2011-11-09 satyr.nl

Distinguishing blue from green in language

The English language makes a distinction between blue and green, but some languages do not. Of these, quite a number, mostly in Africa, do not distinguish blue from black either, while there are a handful of languages that do not distinguish blue from black but have a separate term for green. Also, some languages treat light (often greenish) blue and dark blue as separate colors, rather than different variations of blue, while English does not.

Alan Kennedy's Color/Language Project – The Idiom List

2011-10-24 satyr.nl

Alan Kennedy’s Color/Language Project – The Idiom List

A large list of color-related idioms in several languages.

Controlled Natural Language

2011-03-23 satyr.nl

Controlled Natural Language

Controlled natural languages are subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity. Traditionally, controlled natural languages fall into two major categories: those that improve readability for human readers, in particular for non-native speakers, and those that improve the computational processing of text.
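One practical consequence of a restricted dictionary and grammar is that texts can be checked mechanically. The toy checker below flags words outside an approved list and sentences over a length limit. The word list, the 20-word limit and the function name are hypothetical, chosen only for illustration. Real controlled languages define their own dictionaries and grammar rules, and those are far richer than this.

```python
import re

# Hypothetical approved dictionary and sentence-length limit (illustration only).
APPROVED = {"close", "the", "valve", "before", "you", "remove", "pump", "open", "slowly"}
MAX_WORDS = 20


def check_sentence(sentence, approved=APPROVED, max_words=MAX_WORDS):
    """Return a list of rule violations; an empty list means the sentence passes."""
    words = re.findall(r"[a-z]+", sentence.lower())  # crude tokenizer: letters only
    problems = [f"unapproved word: {w}" for w in words if w not in approved]
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words (max {max_words})")
    return problems


print(check_sentence("Close the valve before you remove the pump."))
print(check_sentence("Shut the valve."))
```
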

Free Foreign Language Lessons

2011-03-20 satyr.nl

Free Foreign Language Lessons | Open Culture

Learn languages for free. Features 37 foreign languages, including Spanish, French, English, Mandarin, Italian, Russian and more. Download lessons to your computer and MP3 player and you’re good to go.

Topic Modeling

2011-03-10 satyr.nl

Topic Modeling

Topic models provide a simple way to analyze large volumes of unlabeled text. A “topic” consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings.
The MALLET topic model package includes an extremely fast and highly scalable implementation of Gibbs sampling, efficient methods for document-topic hyperparameter optimization, and tools for inferring topics for new documents given trained models.
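To show what Gibbs sampling for a topic model involves, here is a minimal collapsed Gibbs sampler for latent Dirichlet allocation (LDA) in plain Python. This is not MALLET's implementation. It has no hyperparameter optimization and is nowhere near as fast. It uses fixed symmetric priors `alpha` and `beta`, and the function names are hypothetical. Each token's topic is resampled in proportion to how common that topic is in the token's document and how common the word is in that topic.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists.
    Returns (doc_topic counts, topic_word counts)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in topic_word]


docs = [["apple", "banana", "cherry"] * 3] * 3 + [["goal", "match", "referee"] * 3] * 3
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=100)
print(top_words(topic_word))
```

Results are stochastic and topic labels are arbitrary, so the fixed `seed` only makes a given run reproducible. It does not guarantee a particular grouping.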

php-stemmer

2011-01-21 satyr.nl

php-stemmer

This stem extension for PHP provides stemming capability for a variety of languages using Dr. M.F. Porter’s Snowball API.

It has a much simpler API than the stem extension found in PECL.

Inform

2011-01-10 satyr.nl

Home : Inform

Inform is a design system for interactive fiction based on natural language. It is a radical reinvention of the way interactive fiction is designed, guided by contemporary work in semantics and by the practical experience of some of the world’s best-known writers of IF.

Native American Language

2010-11-28 satyr.nl

Native American Language Net: Preserving and promoting indigenous American Indian languages

Native American languages do not belong to a single Amerindian family but to 25-30 small ones; they are usually discussed together because most of them have few native speakers and little is known about many of them. There are around 25 million native speakers of the more than 800 surviving Amerind languages. The vast majority of these speakers live in Central and South America, where language use is vigorous. In Canada and the United States, only about half a million native speakers of an Amerind tongue remain.

Click on a language family to see a linguistic tree of that family and links about the group. Click on a language name to see a description and links about that language, as well as information about the American Indian people who speak it.

Magical Letter Page

2010-07-20 satyr.nl

Magical Letter Page – Linguistic Iconism, Sound Symbolism, Phonosemantics

Sound Symbolism, Phonosemantics, Phonetic Symbolism, Mimologics, Iconism, Cratylus
Ideophones, Synaesthesia
The Alphabet, The Word

Ellogon

2010-07-04 satyr.nl

What is Ellogon?

Ellogon is a multilingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.

Europarl Parallel Corpus

2010-07-03 satyr.nl

Europarl Parallel Corpus

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish.

The goal of the extraction and processing was to generate sentence-aligned text for statistical machine translation systems. For this purpose we extracted matching items and labeled them with corresponding document IDs. Using a preprocessor, we identified sentence boundaries. We sentence-aligned the data using a tool based on the Church and Gale algorithm.
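The core idea of length-based sentence alignment can be sketched as a dynamic program. This is not the tool used for Europarl. It is a simplified aligner in the spirit of Gale and Church, and the function names are hypothetical. It measures sentence length in characters and considers only six "bead" types (1-1, 1-0, 0-1, 2-1, 1-2, 2-2). It scores each bead by its prior probability plus how unlikely the length mismatch is under a normal model, using the prior values and variance (6.8) commonly quoted from the original paper. Averaging the two lengths in the variance term is one common variant.

```python
import math

# Prior probability of each bead type (source sentences, target sentences).
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of the length difference per character


def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _bead_cost(len_src, len_tgt, bead):
    """-log P(bead) - log P(a length mismatch at least this large)."""
    mean = (len_src + len_tgt / C) / 2.0
    delta = 0.0 if mean == 0 else (len_tgt - len_src * C) / math.sqrt(mean * S2)
    p_delta = max(2.0 * (1.0 - _norm_cdf(abs(delta))), 1e-300)  # guard against log(0)
    return -math.log(BEAD_PRIORS[bead]) - math.log(p_delta)


def gale_church_align(src, tgt):
    """Align two lists of sentences; return a list of (src_indices, tgt_indices) beads."""
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in BEAD_PRIORS:
                if (i, j) == (0, 0) or di > i or dj > j:
                    continue
                prev = cost[i - di][j - dj]
                if prev == math.inf:
                    continue
                total = prev + _bead_cost(
                    sum(src_len[i - di:i]), sum(tgt_len[j - dj:j]), (di, dj)
                )
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Trace the cheapest path back from the end of both texts.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


print(gale_church_align(["x" * 20, "y" * 50], ["x" * 20, "y" * 25, "y" * 25]))
```

In the usage line, the second source sentence ends up aligned with two target sentences, because their combined length matches. A production aligner would usually align within paragraphs first to keep the table small.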

Porter Stemming Algorithm

2010-07-01 satyr.nl

Porter Stemming Algorithm

The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.
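The full Porter stemmer has five steps and is too long to reproduce here. The sketch below implements only steps 1a and 1b, which handle plurals and the -ed/-ing endings. It includes the helper conditions those steps rely on: the measure *m* of a stem, "contains a vowel", "ends in a double consonant" and "ends consonant-vowel-consonant". Function names are hypothetical. For real use, the complete algorithm from the Snowball project, or an existing implementation of it, is the better choice.

```python
from itertools import groupby


def _is_consonant(word, i):
    """A consonant is a letter other than a, e, i, o, u, and other than
    y when y is preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not _is_consonant(word, i - 1)
    return True


def _measure(stem):
    """m in [C](VC)^m[V]: count VC pairs after collapsing runs of C and V."""
    pattern = "".join("c" if _is_consonant(stem, i) else "v" for i in range(len(stem)))
    collapsed = "".join(key for key, _ in groupby(pattern))
    return collapsed.count("vc")


def _has_vowel(stem):
    return any(not _is_consonant(stem, i) for i in range(len(stem)))


def _ends_double_consonant(stem):
    return len(stem) >= 2 and stem[-1] == stem[-2] and _is_consonant(stem, len(stem) - 1)


def _ends_cvc(stem):
    """Stem ends consonant-vowel-consonant, and the last letter is not w, x or y."""
    n = len(stem)
    return (
        n >= 3
        and _is_consonant(stem, n - 3)
        and not _is_consonant(stem, n - 2)
        and _is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


def step1a(word):
    """SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def _step1b_fixup(stem):
    """Tidy-up applied after -ed or -ing has been removed."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    if _ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]
    if _measure(stem) == 1 and _ends_cvc(stem):
        return stem + "e"
    return stem


def step1b(word):
    """(m>0) EED -> EE; (*v*) ED -> ; (*v*) ING -> ; then the tidy-up rules."""
    if word.endswith("eed"):
        return word[:-1] if _measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return _step1b_fixup(stem) if _has_vowel(stem) else word
    return word


def porter_step1ab(word):
    return step1b(step1a(word.lower()))


print([porter_step1ab(w) for w in ["caresses", "ponies", "agreed", "hopping", "filing"]])
```
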

Category: Cunning Linguist