The Video Game Corpus of Speech and Text (VGCoST) collects writing and dialogue from video games for use in linguistic research on games.

What is a corpus?

A linguistic corpus is a collection of texts on a particular topic, compiled for use as a research database. Some corpora are tagged and annotated according to the needs of the researchers compiling them; others simply consist of raw text data. Corpora can be used to examine linguistic phenomena in the text, for example to compare sentence structures or word meanings. Depending on their content, some corpora can also be used to research social or psychological questions.

What kind of corpus is VGCoST?

The Video Game Corpus of Speech and Text is a collection of dialogue and in-game text extracted from video games. It consists of raw text data in .txt files, organized by the game they come from. It is entirely Open Access/Open Science and can therefore be used freely by anyone.
So far, the corpus only includes English text. While English will stay the main focus of VGCoST, some German files might find their way into the collection if time allows.

No idea how to work with a corpus?

I wrote a short introduction to working with VGCoST, in German, here. You can find extensive English explanations on many university websites, such as this one from the University of Heidelberg.
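
As a first hands-on step, the following is a minimal Python sketch of two common corpus tasks: counting word frequencies and listing a keyword in context (a simple concordance). It assumes the downloaded .txt files sit in a local folder; the folder name "VGCoST", the function names, and the very simple tokenizer are all illustrative assumptions, not part of the corpus itself.

```python
import re
from collections import Counter
from pathlib import Path

# Simple word pattern: runs of letters, optionally with one apostrophe part
# (e.g. "don't"). This is an assumption; real projects often use a dedicated
# tokenizer instead.
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def load_corpus(folder):
    """Read every .txt file below `folder` into a {file name: text} dict."""
    texts = {}
    for path in sorted(Path(folder).rglob("*.txt")):
        texts[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return texts


def word_frequencies(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts.values():
        counts.update(tokenize(text))
    return counts


def concordance(text, keyword, width=30):
    """Return each occurrence of `keyword` with `width` characters of context."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        lines.append(text[start:end].replace("\n", " "))
    return lines


if __name__ == "__main__":
    folder = Path("VGCoST")  # hypothetical location of the downloaded files
    if folder.is_dir():
        corpus = load_corpus(folder)
        # Print the ten most frequent tokens in the whole collection.
        for word, count in word_frequencies(corpus).most_common(10):
            print(word, count)
```

Calling concordance on the text of a single game with a word of interest returns short snippets that show how that word is used, which is often the quickest way to get a feel for the material before moving on to dedicated corpus tools.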

Need help or want to contribute?

If you need specific information, help, data, or anything else connected to the Video Game Corpus of Speech and Text, use the contact form below. I will help as best I can. If you are a game developer and want the text of your game included in the corpus for science, you are the best! You can email me at the address on the 'Site Notice' page. All games are welcome, whether independently published or AAA. The only requirement is that the text is in English. You can also send German text if you have it, preferably together with the English version.

Download

Via Dropbox (Version 10/2019)

