Digital Tolkien Project

— a blog by James Tauber

A scholarly project focused on Tolkien from both a linguistic and digital humanities perspective.

Tolkien Reading Day 2022

26 March 2022 | James Tauber

The last two years I’ve participated in the Tolkien Reading Day sessions organized by Tolkien Collector’s Guide. This year, Jeremy interviewed people on various topics and he invited me and Elise Trudel Cedeño to talk about Digital Humanities and Education.

Given the Tolkien Society’s theme this year of Love and Friendship, it was a particular delight to appear with Elise, one of my dearest friends and a longtime collaborator on applying the work of the Digital Tolkien Project to teaching Tolkien to both kids and adults (one of the topics we discuss).…

more ☞

Search Tolkien launched

25 March 2022 | James Tauber

While there is considerable text preparation and internal tooling work being done by the Digital Tolkien Project that can’t be openly shared, I’ve been thinking for a while about public tools that do not violate the copyright holder’s rights. Today I’m happy to launch the first such tool.

I’ve taken the text of The Hobbit, Lord of the Rings, and the Silmarillion all structured according to the project’s citation systems (which meant finishing the system for the Silmarillion — more on that soon) and indexed sequences of up to seven words, folding case and stripping all punctuation and diacritics. I’ve also added the Letters structured just to the individual letter (and no further at the moment).…
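
As a rough illustration of the indexing described above, here is a minimal Python sketch. It is not the project's actual code: the function names, the whitespace tokenization, and the "demo.N" citation keys are all assumptions made for the example, and the sample sentences are invented placeholders rather than text from the books.

```python
import unicodedata
from collections import defaultdict


def normalize(word):
    """Fold case and strip punctuation and diacritics from one word."""
    # NFD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch)
        and not unicodedata.category(ch).startswith("P")
    )


def build_index(paragraphs, max_n=7):
    """Map each normalized sequence of 1..max_n words to the citations containing it."""
    index = defaultdict(set)
    for citation, text in paragraphs.items():
        # Whitespace tokenization is an assumption; drop tokens that normalize to nothing
        words = [w for w in (normalize(t) for t in text.split()) if w]
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                index[" ".join(words[i:i + n])].add(citation)
    return index


# Hypothetical citation keys and invented placeholder sentences
sample = {
    "demo.1": "Éowyn rode to war.",
    "demo.2": "To war, then!",
}
index = build_index(sample)
print(sorted(index["to war"]))  # ['demo.1', 'demo.2']
```

Because the index stores only short word sequences and the citations they occur in, a search can report where a phrase occurs without reproducing the surrounding text.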

more ☞

Counting breakfasts

22 September 2021 | James Tauber

This weekend I’ll be giving another virtual presentation, this time at New England Moot, talking about some food-themed text analysis on Lord of the Rings.

My last New England Moot was in 2019. I attended in person and gave the talk that kicked off my collaboration with Elise Trudel Cedeño and which became the three guest blog posts on her blog (starting here).…

more ☞

Digital Tolkien on Instagram

10 August 2021 | James Tauber

On a fairly regular basis I post charts and visualizations from this project to Twitter and I thought it might be fun to start an Instagram account dedicated to just this.

You can follow at: https://www.instagram.com/digitaltolkien/

At some point I’ll start to make the code and data behind these sorts of things available on this site in a gallery of sorts with more detail and potential reproducibility but, for now, Instagram seems like a fun way to share some of the output.

more ☞

Speaking at Mythmoot VIII

15 June 2021 | James Tauber

I’m very excited to be giving a virtual talk at next week’s Mythmoot on “Modeling the multiple dimensions of time in Tolkien’s legendarium.” I’ll be exploring some ideas that have been on my mind from the very beginning of the Digital Tolkien Project.…

more ☞

Education and computers and the Cottage of Lost Play

25 May 2021 | James Tauber

One of the most exciting collaborations I’ve embarked on related to the Digital Tolkien Project is the ongoing educational work with Elise Trudel Cedeño and we had the opportunity to give a talk about it at the recent Prancing Pony Podcast Digital Moot.

Elise is an educator and fellow Signum student (we had much fun in our Beowulf in Old English class together) who produces literacy-focused educational material and teaches online courses for kids, including fantasy book clubs covering the likes of Tolkien and C. S. Lewis.…

more ☞

Tokenizing The Hobbit

14 March 2021 | James Tauber

How many words are there in The Hobbit?

No, not two. But jokes aside (thanks, Iian Neill) we need to immediately be clear if we’re talking about unique words or not.

For example, does:

In a hole in the ground there lived a hobbit.

consist of 10 words or only 8 (a, ground, hobbit, hole, in, lived, the, there)?

This distinction is often referred to as type vs token. There are 10 tokens but 8 types (assuming we treat “In” and “in” as the same word, of course).…
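
The counts for that sentence can be checked with a few lines of Python. This is only a sketch: treating a token as a run of ASCII letters is a naive assumption that works for this one sentence, not a general tokenizer.

```python
import re

sentence = "In a hole in the ground there lived a hobbit."

# Naive tokenization (an assumption): a token is a run of letters
tokens = re.findall(r"[A-Za-z]+", sentence)

# Types with case folded, so "In" and "in" count as the same word
types_folded = sorted({t.lower() for t in tokens})

# Types without case folding, so "In" and "in" stay distinct
types_case_sensitive = set(tokens)

print(len(tokens))                # 10
print(len(types_folded))          # 8
print(types_folded)               # ['a', 'ground', 'hobbit', 'hole', 'in', 'lived', 'the', 'there']
print(len(types_case_sensitive))  # 9
```

Without case folding the type count rises to 9, which is why the normalization choice has to be stated alongside any word count.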

more ☞

Minimal prefixes to identify Hobbit paragraphs

6 March 2021 | James Tauber

The previous blog post introduced a citation system for The Hobbit and linked to an index that showed the first five tokens in each paragraph. How often is five a sufficient number to uniquely identify the paragraph? How often can we get away with less?

These are relevant questions because they go to whether a lookup can be provided that does not give enough text to violate any rights of the copyright holder.…
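
One way to frame the question in code is sketched below. It assumes whitespace tokenization and uses invented placeholder sentences, not text from The Hobbit; the function name and the `max_tokens` parameter are hypothetical.

```python
from collections import Counter


def minimal_prefix_lengths(paragraphs, max_tokens=5):
    """For each paragraph, find the fewest leading tokens that no other paragraph shares.

    Returns None for a paragraph that is still ambiguous at max_tokens.
    """
    tokenized = [p.split() for p in paragraphs]
    result = []
    for tokens in tokenized:
        found = None
        for n in range(1, max_tokens + 1):
            prefix = tuple(tokens[:n])
            # Count how many paragraphs start with exactly this prefix
            matches = sum(1 for other in tokenized if tuple(other[:n]) == prefix)
            if matches == 1:
                found = n
                break
        result.append(found)
    return result


# Invented placeholder paragraphs, for illustration only
sample = [
    "The road went on and on",
    "The road was long",
    "The river ran south",
    "Then it rained",
]
lengths = minimal_prefix_lengths(sample)
print(lengths)           # [3, 3, 2, 1]
print(Counter(lengths))  # how many paragraphs need each prefix length
```

The tally at the end shows how many paragraphs need each prefix length, and any `None` entries mark paragraphs that five tokens cannot tell apart.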

more ☞

The Hobbit citation system

4 February 2021 | James Tauber

The Digital Tolkien Project now has a paragraph-based citation system for The Hobbit derived directly from the marked-up version of the text and checked against previous work by others.

Back in 2018, I generated an initial paragraph-based citation system for The Hobbit based on my first markup of the text, in discussion with Paul O’Rear. At the time, I did a preliminary comparison with L. F. S. Alden’s Paragraph Index to The Hobbit, but in the last month I’ve been working with Ugo Truffelli to align my citation system with his own work, and we’ve now reached a consensus. In this post, I’ll go through the initial discrepancies between Alden, Truffelli and myself and discuss how Ugo and I resolved them (sometimes by me changing the XML markup).…

more ☞

source URL 🌐Digital Tolkien Project | A scholarly project focused on Tolkien from both a corpus linguistic and digital humanities perspective.
date recorded 📅2021-06-27
scribe 🖋worblehat