Video/audio editing
-
Praat
- praat.org
- the number 1 speech-editing software in academia
- supports speech measurements, annotation in TextGrids, manipulation, synthesis
- scripting interface supports batch processing
- recommended scripting interface: PraatVSCode
-
ffmpeg
- ffmpeg.org
- a command line tool for batch video processing
- ’lives’ in the terminal (e.g., command prompt)
- Adobe Premiere Pro is a great video editor for individual files, but it is not the best solution for batch processing. ffmpeg quickly and efficiently extracts the audio channels from a large set of video files, converts mpg to mp4, manipulates audio/video temporal alignment (asynchrony), etc.
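As a rough illustration of the batch use case, here is a minimal Python sketch that builds one ffmpeg command per video file and optionally runs it. It assumes ffmpeg is installed and on your PATH; the helper names (`audio_extract_cmd`, `convert_to_mp4_cmd`, `batch_run`) and the folder names are made up for this example. The flags used are standard ffmpeg options: `-vn` drops the video stream, `-acodec pcm_s16le` writes 16-bit PCM audio, and `-n` refuses to overwrite existing output files.

```python
import subprocess
from pathlib import Path


def audio_extract_cmd(video, out_dir):
    """Build the ffmpeg command that extracts the audio of one video to a wav file."""
    wav = Path(out_dir) / (Path(video).stem + ".wav")
    return ["ffmpeg", "-n", "-i", str(video), "-vn", "-acodec", "pcm_s16le", str(wav)]


def convert_to_mp4_cmd(video, out_dir):
    """Build the ffmpeg command that re-encodes one video (e.g., mpg) to mp4."""
    mp4 = Path(out_dir) / (Path(video).stem + ".mp4")
    return ["ffmpeg", "-n", "-i", str(video), str(mp4)]


def batch_run(video_dir, out_dir, build_cmd, pattern="*.mp4", dry_run=True):
    """Apply build_cmd to every matching file; only print the commands if dry_run."""
    commands = [build_cmd(v, out_dir) for v in sorted(Path(video_dir).glob(pattern))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            Path(out_dir).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if ffmpeg reports an error
    return commands


if __name__ == "__main__":
    # Hypothetical folders: print the commands first, then rerun with dry_run=False.
    batch_run("videos", "audio", audio_extract_cmd, pattern="*.mp4", dry_run=True)
    batch_run("videos", "converted", convert_to_mp4_cmd, pattern="*.mpg", dry_run=True)
```

Starting with a dry run makes it easy to check the generated commands before committing to a long batch job.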
-
MediaPipe
- mediapipe.dev
- 2D video motion-tracking tool in Python, developed by Google
- input: video file of a single person (OpenPose is preferred for multi-person tracking).
- output: x, y, (estimated) z coordinates of body landmarks + video with superimposed tracking skeleton.
- here’s a great tutorial by Wim Pouw and James Trujillo at Envision Bootcamp.
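For orientation, here is a minimal sketch of what pose tracking on a single video can look like. It assumes the `mediapipe` and `opencv-python` packages are installed and that your MediaPipe version still ships the legacy `mp.solutions.pose` interface (newer releases may steer you to a different API). The function names `to_pixels` and `track_video` are hypothetical. MediaPipe returns x and y normalized to the image width and height, so the helper converts them to pixel coordinates.

```python
def to_pixels(x_norm, y_norm, width, height):
    """Convert normalized landmark coordinates (0..1) to pixel coordinates."""
    return (x_norm * width, y_norm * height)


def track_video(video_path):
    """Return one row per landmark per frame: frame, landmark index, x, y (px), z."""
    # Imported here so the helper above can be used without these packages installed.
    import cv2
    import mediapipe as mp

    rows = []
    cap = cv2.VideoCapture(str(video_path))
    with mp.solutions.pose.Pose(static_image_mode=False) as pose:
        frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video (or unreadable file)
                break
            height, width = frame.shape[:2]
            # OpenCV reads frames as BGR; MediaPipe expects RGB.
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks is not None:
                for idx, lm in enumerate(results.pose_landmarks.landmark):
                    x_px, y_px = to_pixels(lm.x, lm.y, width, height)
                    rows.append((frame_idx, idx, x_px, y_px, lm.z))
            frame_idx += 1
    cap.release()
    return rows
```

Frames in which no person is detected simply contribute no rows, so gaps in the frame index are worth checking before any further analysis.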
-
WebMAUS
- https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface/WebMAUSBasic
- forced-alignment tool taking wav files and txt files with orthographic transcripts as input, providing TextGrids as output
- ’lives’ online: you upload wav and txt files and download TextGrids
- large set of languages available
- default: annotations of words and phonemes only
- get syllables too by using our script: forced-alignment.py
-
EasyAlign
- http://latlcui.unige.ch/phonetique/easyalign.php
- forced-alignment tool taking wav files and txt files with orthographic transcripts as input, providing TextGrids as output
- ’lives’ in Praat (plugin)
- French, English, Spanish, Brazilian Portuguese, Taiwan Min
- annotations at word-level, syllable-level and phone-level
Corpora
-
MultiPic
- https://www.bcbl.eu/databases/multipic/
- standardized set of 750 drawings with multilingual name agreement and visual complexity norms
- in color; 300 x 300 pixels; DPI=96 pixels/inch
- Spanish, English (British), German, Italian, French, Dutch (Belgium), Dutch (Netherlands)
- also useful when looking for ‘imageable’ words
-
Severens et al.
- https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/pnn/overview.htm
- timed naming norms for 590 pictures in Belgian Dutch
- black-and-white line drawings
- name agreement, freq, aoa, h-statistic, naming latencies
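For readers unfamiliar with the h-statistic: in picture-naming norms it is commonly defined (following Snodgrass & Vanderwart, 1980) as H = Σ p_i · log2(1/p_i), where p_i is the proportion of participants giving name i. The sketch below assumes that this standard definition is the one used in the database; the example responses are made up.

```python
import math
from collections import Counter


def h_statistic(responses):
    """Name-agreement H: 0 when everyone gives the same name, higher with more variety."""
    counts = Counter(responses)
    total = sum(counts.values())
    # p * log2(1/p) with p = n / total, written as log2(total / n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical naming responses for two pictures
print(h_statistic(["hond", "hond", "hond", "hond"]))    # perfect agreement
print(h_statistic(["bank", "bank", "zetel", "zetel"]))  # two equally frequent names
```

With perfect agreement H is 0, and with two equally frequent names H is 1, so lower values indicate pictures that elicit a more consistent name.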
-
SUBTLEX
- http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-nl
- database of Dutch word frequencies based on 44 million words from film and television subtitles
- also available for other languages, including US English (SUBTLEX-US), UK English (SUBTLEX-UK), Mandarin Chinese (SUBTLEX-CH), Spanish (SUBTLEX-ESP), German (SUBTLEX-DE), Greek (SUBTLEX-GR), Polish (SUBTLEX-PL), Italian (SUBTLEX-IT), Brazilian Portuguese (SUBTLEX-PT-BR)
- reliable predictor of lexical decision reaction times, outperforming Google Books 1-gram (single-word) frequencies (Brysbaert et al., 2011)
-
ANW
- Algemeen Nederlands Woordenboek
- Dutch word list, allowing searching with regular expressions (“spraa*”) and with particular word characteristics (number of syllables, stress on syllable n, etc.)
- NOTE. I’ve found that the search lists are not exhaustive. Failure to find certain words in ANW does not necessarily mean they do not exist.
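A side note on the “spraa*” example: it reads like a wildcard pattern (anything starting with "spraa") rather than a strict regular expression, where `a*` means "zero or more a's". How ANW itself interprets the pattern is not verified here; the sketch below only shows the difference between the two readings on a small, made-up word list.

```python
import fnmatch
import re

# Hypothetical word list for illustration
words = ["spraak", "spraakles", "sprak", "spreken", "spraakzaam"]

# Wildcard reading: "*" stands for any sequence of characters
wildcard_hits = fnmatch.filter(words, "spraa*")

# Strict regex reading (matched at the start of the word): "a*" is zero or more a's
regex_hits = [w for w in words if re.match("spraa*", w)]

print(wildcard_hits)  # ['spraak', 'spraakles', 'spraakzaam']
print(regex_hits)     # ['spraak', 'spraakles', 'sprak', 'spraakzaam']
```

Under the strict regex reading "sprak" also matches, so it is worth checking a few known words to see which interpretation a search tool applies.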
-
Lombard speech corpora
- Acted clear speech corpus: English; 1 male talker; ’normal’ sentences; 25 items; babble-modulated noise; Mayo et al. (2012), doi:10.7488/ds/138.
- Hurricane natural speech corpus: English; 1 male talker; Harvard sentences (720 items) and MRT sentences (300 items); speech-modulated noise; Cooke et al. (2013), doi:10.7488/ds/140.
- DELNN: L1 Dutch and L2 English from 30 native speakers of Dutch (+9 native speakers of US English as control); speech-shaped noise; Marcoux (2022, PhD thesis).
- RaLoCo: Dutch; 78 talkers; 48 sentences; speech-shaped noise; Shen (2022, PhD thesis). Additional info, including human listening effort ratings and HEGP scores (spectral glimpsing metric of intelligibility).
- Also see our very own NiCLS corpus of Lombard speech.
Writing tools
-
Thesaurus
- thesaurus.com for looking up synonyms
- indispensable gizmo when scribbling palimpsests, particularly useful for L2 writers of English like myself
- also gives antonyms, example sentences, and related words
- links to definitions at dictionary.com
-
Zotero
- zotero.org reference manager
- use the Zotero Connector in your browser to store a paper, including fulltext and all bibliographic specs
- use the Zotero Word Plugin to cite papers in a Word document, automatically generating a bibliography at the end of the document
- easily change bibliography styles (from author-year in APA to numbered-lists in IEEE)
- supports open fulltext search, notes and tags, organize in folders, etc.
- preferred over other reference managers because Zotero is institute-independent, free, open-source, and very flexible.
-
Overleaf
- Overleaf Online LaTeX editor
- it’s like Google Docs, but for LaTeX
- collaborative writing, commenting, track changes
Online experimenting
-
Prolific
- Prolific online participant pool
- representative human sample, supporting very precise in/exclusion criteria (gender, language, speech disorders, etc.)
- very fast: data collection typically a matter of days, first participant a matter of hours
-
Gorilla
- Gorilla Experiment Builder
- pros: graphical interface so easy to use, no scripting or code, supports video stimuli, reliable audiovisual synchrony, headphone screening tests, great support, validated reaction times, build for free
- cons: graphical interface is less efficient for tweaking and copy-pasting (compared to code, so prepare for lots of clicking…); you pay for data collection per participant (paid tokens); some institutes, including RU and MPI, have institution licenses (‘free’ tokens) but many do not, so whether you can use Gorilla depends to a large extent on what institute you’re at.
-
PsyToolkit
- psytoolkit.org
- pros: free to build, free to run, code-based so efficient, works well with audio stimuli, extensive documentation and how-to’s, headphone screening tests, institute-independent
- cons: code-based so steep learning curve, not great with audiovisual stimuli, little support
- I have also used FindingFive (used to be free, now paid) and Testable (paid but free tokens for new users, has its own online participant pool) but I prefer the tools above.
Mailing lists
Did you know there was life before Twitter? In those days, people shared conference announcements, job opportunities, and new research tools with each other by means of mailing lists. And what’s more: did you know they still exist? Sign up and receive daily/weekly emails with announcements from your peers.
-
amlap-list
- http://www.amlap.org/amlap-list.html
- mostly EU-focused
- associated with the annual AMLAP conference (Architectures and Mechanisms for Language Processing)
-
sentproc
- https://lists.qc.cuny.edu/mailman/listinfo/sentproc
- mostly US-focused
- associated with the annual HSP conference (Human Sentence Processing; what used to be CUNY in the olden days)
-
LinguistList
- https://linguistlist.org/
- international
- forum for linguists in general
-
D-multisensory
- https://mailman.mcmaster.ca/mailman/listinfo/d-multisensory/
- international
- associated with the IMRF conference (International Multisensory Research Forum)
-
sprosig
- Speech Prosody Special Interest Group
- international
- associated with the biennial Speech Prosody conference (held every two years)
-
LOT
- Landelijke Onderzoekschool Taalwetenschap
- not really a mailing list (phony!) but a weekly newsletter by and for the Dutch linguistics community
- associated with the Netherlands National Graduate School of Linguistics that publishes almost every PhD thesis in the field of Linguistics in the Netherlands (except for Donders and MPI theses)