40 Fascinating Lectures for Linguistics Geeks | Online Universities

Linguistics is kind of like The Force — it surrounds us, penetrates us and binds the galaxy together. Or at least the planet, anyway. Both this universality and frequent intersections with a diverse array of subjects — including, but not limited to, cognitive science, literature, politics, psychology, communication, anthropology and more — make linguistics a compelling, dynamic, nuanced study. The following lectures, by no means the only ones available online, represent a lovely little slice of how language permeates all things, for better and for worse.


The Julia Language

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, mostly written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing.


Speech Accent Archive

The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph, and their recordings are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers.


Distinguishing blue from green in language

The English language makes a distinction between blue and green, but some languages do not. Of these, quite a number, mostly in Africa, do not distinguish blue from black either, while there are a handful of languages that do not distinguish blue from black but have a separate term for green. Also, some languages treat light (often greenish) blue and dark blue as separate colors, rather than different variations of blue, while English does not.


Dart : Structured web programming

Dart is a new class-based programming language for creating structured web applications. Developed with the goals of simplicity, efficiency, and scalability, the Dart language combines powerful new language features with familiar language constructs into a clear, readable syntax.

Great, another attempt at a proprietary programming language from Google. Just what the world needs. Sure hope it will go where Google Go did go.


William Cook’s Fusings: Enso Introduction

Structures in Ensō are a specialized kind of graph, whose nodes are either primitive data or collections of observable properties, whose values are either nodes or collections of nodes. From a programming language viewpoint this may seem an odd choice for data representation. However, it is essentially the Entity-Relationship (ER) model, also known as Information Models, which is widely used in the design of relational databases and is also the basis for Class Diagrams in the Unified Modeling Language (UML), which describe the structure of networks of objects. The key point is that structures in Ensō are viewed holistically as graphs, not as individual values or traditional sums-and-products data structures.


What is Ellogon?

Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (like creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, or transforming linguistic information into vectors for use with various machine learning algorithms.


Europarl Parallel Corpus

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish.

The goal of the extraction and processing was to generate sentence-aligned text for statistical machine translation systems. For this purpose we extracted matching items and labeled them with corresponding document IDs. Using a preprocessor we identified sentence boundaries. We sentence-aligned the data using a tool based on the Church and Gale algorithm.
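
To make the alignment step concrete, here is a minimal Python sketch of length-based sentence alignment. It is not the tool used for Europarl: the Church and Gale algorithm scores candidate pairings with a probabilistic model of sentence lengths, while this sketch swaps in a much cruder cost (absolute difference in character counts plus a fixed penalty for anything other than a one-to-one pairing). The function name `align` and the penalty values are assumptions made for illustration.

```python
def align(src, tgt, skip_penalty=10, merge_penalty=3):
    """Align two lists of sentences by character length (simplified sketch).

    Returns a list of (source_sentences, target_sentences) pairs.
    The cost function is a stand-in, not the probabilistic one
    used by the published algorithm.
    """
    # Allowed pairings: (source sentences used, target sentences used, penalty)
    beads = [
        (1, 1, 0),              # one-to-one
        (1, 0, skip_penalty),   # source sentence with no counterpart
        (0, 1, skip_penalty),   # target sentence with no counterpart
        (2, 1, merge_penalty),  # two source sentences to one target
        (1, 2, merge_penalty),  # one source sentence to two targets
    ]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    # Dynamic programming over prefixes of both sentence lists.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(len_s - len_t) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the pairing.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Called on two lists of sentences from matching documents, `align` returns the cheapest pairing under this toy cost. A long source sentence that was split in two in the translation, for example, comes back paired with both target sentences.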


semanticvectors

Semantic vector indexes are created by applying a Random Projection algorithm to term-document matrices created using Apache Lucene. The package was created as part of a project by the University of Pittsburgh Office of Technology Management, to explore the potential for automatically matching related concepts in the technology management domain, e.g., mapping new technologies to potentially interested licensors.
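
The idea of projecting a term-document matrix into a smaller random space can be sketched in a few lines of Python. This is not the semanticvectors API and it does not use Apache Lucene: it counts terms with a plain whitespace split, gives each document a sparse random vector, and sums those vectors into each term's vector, weighted by term frequency. All names here (`random_index_vector`, `term_vectors`, `cosine`) and the default sizes are hypothetical, chosen only for illustration.

```python
import math
import random
from collections import Counter


def random_index_vector(dim, nonzeros, rng):
    """Sparse ternary vector: a few +1/-1 entries, everything else 0."""
    vec = [0.0] * dim
    for k, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1.0 if k % 2 == 0 else -1.0
    return vec


def term_vectors(docs, dim=200, nonzeros=10, seed=0):
    """Build one reduced vector per term from a list of document strings.

    Equivalent to multiplying the term-document count matrix by a
    random (documents x dim) matrix, without building either matrix.
    """
    rng = random.Random(seed)
    vectors = {}
    for doc in docs:
        doc_vec = random_index_vector(dim, nonzeros, rng)
        for term, count in Counter(doc.lower().split()).items():
            tv = vectors.setdefault(term, [0.0] * dim)
            for d in range(dim):
                tv[d] += count * doc_vec[d]
    return vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "laser optics patent",
    "optics laser lens",
    "yeast fermentation process",
]
vecs = term_vectors(docs)
```

Terms that occur in the same documents, such as "laser" and "optics" in this toy corpus, end up with near-identical vectors, while terms from unrelated documents score close to zero under `cosine`. That is the property a concept-matching application would rely on.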