This post is based on a short talk I gave at the monthly event Wordhack on January 18, 2024, held at Wonderville in New York City. Some changes were made to suit the format.
One of the major limitations of Lyre's Dictionary, and one I've wanted to work through for a long time, is that it can only create words from Latin and Greek roots. This, of course, does not reflect the makeup of English overall. In fact, the most common English words today can be traced back to Old English, the earliest form of the language, or further back to its Germanic roots. Recently I've been working on a new system for making words from Old English roots, and I've made some good progress.
First, some background:
Old English refers to the earliest period of the English language, as it was spoken between about the 6th and 12th centuries. This language was distinct from Modern English in many ways, and it both looks and sounds so different that it cannot be readily understood by Modern English speakers.
As an example, consider the first few lines of the Lord's Prayer as written in Old English, alongside its modern form:
Old English | Modern English |
---|---|
Fæder ūre þū þe eart on heofonum, | Our Father, who art in heaven, |
Sīe þīn nama ġehālgod. | Hallowed be thy name. |
Tōbecume þīn rīċe, | Thy kingdom come, |
Ġeweorðe þīn willa, on eorðan swā swā on heofonum. | Thy will be done on earth as in heaven. |
Although it may look totally alien at first, on closer examination it is possible to see connections between the Old and Modern English words: 'fæder' becomes 'father', 'heofonum' becomes 'heaven', and 'nama' becomes 'name'.
It's not just that these words happen to look similar: there are more or less well-defined changes that transformed the earlier forms of the words into the later ones. And to the extent that these changes can be defined, they can be simulated as well. This is what I have attempted to do in adding Old English to Lyre's Dictionary.
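As a rough illustration of what "simulating" a sound change can mean, here is a minimal sketch. It assumes words are stored as phonemic strings and that each change is a regular-expression rewrite applied once, in chronological order. The three rules and their names are hypothetical simplifications, not the rules Lyre's Dictionary actually uses. They show how Old English /nama/ could reach Middle English /naːm/, and why the order of the changes matters.

```python
import re

# Toy sound-change rules (hypothetical simplifications, not Lyre's Dictionary's rule set).
# Each rule is (name, regex pattern, replacement). Vowels are a, e, i, o, u and schwa ə;
# "ː" marks a long vowel.
REDUCTION = ("final vowel reduces to /ə/", r"(?<=[^aeiouəː])[aeiou]$", "ə")
LENGTHENING = ("open-syllable lengthening", r"^([^aeiouə]*)([aeiou])(?=[^aeiouəː]ə$)", r"\1\2ː")
SCHWA_LOSS = ("final /ə/ is lost", r"ə$", "")


def apply_rules(form, rules, trace=False):
    """Apply each sound-change rule once, in the given order, to a phonemic string."""
    for name, pattern, replacement in rules:
        new_form = re.sub(pattern, replacement, form)
        if trace and new_form != form:
            print(f"{form} -> {new_form}  ({name})")
        form = new_form
    return form


# Lengthening before the loss of final /ə/: the vowel sits in an open syllable and lengthens.
print(apply_rules("nama", [REDUCTION, LENGTHENING, SCHWA_LOSS]))  # naːm
# Swap the last two rules: the syllable is already closed, so no lengthening happens.
print(apply_rules("nama", [REDUCTION, SCHWA_LOSS, LENGTHENING]))  # nam
```

Passing `trace=True` prints each intermediate form, which is a convenient way to check why a word came out the way it did.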
This table shows some output from the new code that I've been writing. It takes in the Old English words on the left, and produces their modern forms on the right:
OE written | OE sounds | ME sounds | Modern form |
---|---|---|---|
bāt | /baːt/ | /bɔːt/ | boat |
heofon | /xeofon/ | /xɛːvən/ | heaven |
ċild | /tʃild/ | /tʃiːld/ | child |
dæġ | /dæj/ | /dai/ | day |
frēond | /freːond/ | /freːnd/ | friend |
nama | /nama/ | /naːm/ | name |
eorðe | /eorθe/ | /erð/ | earth |
bryċġ | /brydʒ/ | /bridʒ/ | bridge |
gōd | /ɣoːd/ | /goːd/ | good |
hlæhhan | /xlæxxan/ | /laux/ | laugh |
stelan | /stelan/ | /stɛːl/ | steal |
This is a three-step process, moving from the first column to the fourth.
Start with Old English written forms, generally in their standard form from about the year 900, including the diacritics modern scholars add to indicate pronunciation (the dots and lines above some letters).
Determine the pronunciation of this word in Old English. This tends to be straightforward, as Old English was usually written the way it was pronounced (and the added sound annotations clarify the ambiguous cases). The table here uses my own rough phonemic representation of the sounds of Old English.
Put this pronunciation through about six hundred years of simulated historical sound change, approximating how the word would have sounded in Middle English (a later form of the language) around the late 15th century. We stop there because English spelling, notoriously confusing as it is, still largely follows conventions developed during this period, which weren't updated as pronunciations changed over the following 500 years.
Determine a modern form for the word, based on observed modern-day spelling conventions.
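The second step, reading a pronunciation off the spelling, can be sketched as a longest-match lookup table. The mapping below is an assumption reconstructed from the "OE sounds" columns of the tables in this post (ċ as /tʃ/, ġ as /j/, g as /ɣ/, h as /x/, a macron as vowel length); it is not the program's actual table, and it ignores spellings a fuller version would need, such as sċ.

```python
import unicodedata

# Hypothetical, partial mapping from standardized Old English spelling to the rough
# phonemic notation used in the tables. Multi-letter spellings such as "ċġ" are tried
# before single letters.
_RAW_MAP = {
    "ċġ": "dʒ", "ċ": "tʃ", "ġ": "j", "g": "ɣ", "c": "k", "h": "x",
    "þ": "θ", "ð": "θ",
    "ā": "aː", "ǣ": "æː", "ē": "eː", "ī": "iː", "ō": "oː", "ū": "uː", "ȳ": "yː",
}
# Normalize keys so precomposed and decomposed diacritics compare equal.
GRAPHEMES = {unicodedata.normalize("NFC", k): v for k, v in _RAW_MAP.items()}
LONGEST = max(len(k) for k in GRAPHEMES)


def transcribe(word):
    """Convert an Old English spelling to phonemes by longest-match lookup."""
    word = unicodedata.normalize("NFC", word.lower())
    out, i = [], 0
    while i < len(word):
        for size in range(LONGEST, 0, -1):
            chunk = word[i:i + size]
            if chunk in GRAPHEMES:
                out.append(GRAPHEMES[chunk])
                i += len(chunk)
                break
        else:
            # Letters not in the table (b, d, e, m, æ, y, ...) are assumed to stand for themselves.
            out.append(word[i])
            i += 1
    return "/" + "".join(out) + "/"
```

For example, `transcribe("bryċġ")` returns `/brydʒ/` and `transcribe("hlæhhan")` returns `/xlæxxan/`, matching the first table.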
As the table shows, this procedure works pretty well: it produces the correct spellings of all of these words based only on their Old English forms.
However, this table hides some of the complexity. These next tables show some of the ambiguities and uncertainties in this process:
OE written | OE sounds | ME sounds | Modern form |
---|---|---|---|
bāt | /baːt/ | /bɔːt/ | bote (boat) |
māra | /maːra/ | /mɔːr/ | moar (more) |
ċēowan | /tʃeːowan/ | /tʃiu/ | chue (chew) |
trēwe | /treːwe/ | /triu/ | trew (true) |
As before, the input to the program is on the left and the output is on the right, with the actual modern spelling given in parentheses. As you can see, the program sometimes produces unexpected results. But although these spellings aren't the ones we see in English today, they seem plausible. We might just as well have ended up with 'bote' instead of 'boat'; as far as I can tell, it's a matter of chance. In these cases, I've coded the system to pick one spelling or the other at random.
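Choosing at random among equally plausible spellings might look something like the sketch below. It assumes a hypothetical respelling table that lists candidate spellings for each Middle English sound; "o_e" stands for an "o" plus a silent final "e", which is how both 'boat' and 'bote' can come out of /bɔːt/. Only the handful of sounds needed for the examples above are covered.

```python
import random

# Hypothetical respelling table: each Middle English sound maps to one or more plausible
# modern spellings. "o_e" means: write "o" here and add a silent final "e" to the word.
SPELLINGS = {
    "ɔː": ["oa", "o_e"],
    "iu": ["ew", "ue"],
    "tʃ": ["ch"],
}
# Try longer symbols first so a two-character sound is matched as a unit.
SYMBOLS = sorted(SPELLINGS, key=len, reverse=True)


def respell(phonemes, rng=random):
    """Spell a phonemic string, picking at random where several spellings are plausible."""
    body, final_e, i = [], "", 0
    while i < len(phonemes):
        for sym in SYMBOLS:
            if phonemes.startswith(sym, i):
                head, _, tail = rng.choice(SPELLINGS[sym]).partition("_")
                body.append(head)
                final_e = final_e or tail  # "o_e" contributes a silent final "e"
                i += len(sym)
                break
        else:
            body.append(phonemes[i])  # assume other sounds are spelled as written
            i += 1
    return "".join(body) + final_e
```

Passing a seeded generator, as in `respell("bɔːt", random.Random(1))`, makes the choice reproducible from run to run; with the default module-level generator, repeated calls on /bɔːt/ yield 'boat' or 'bote'.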
Pronunciations can also diverge during historical change:
OE written | OE sounds | ME sounds | Modern form |
---|---|---|---|
clyċċan | /klytʃtʃan/ | /klitʃ/ | clitch (clutch) |
hȳdan | /xyːdan/ | /xuːd/ | houd (hide) |
wicu | /wiku/ | /wik/ | wick (week) |
wudu | /wudu/ | /wud/ | wud (wood) |
Cases like these occur because the historical changes and spelling conventions aren't completely predictable: a single sound might change differently in different cases, and standard modern forms might derive from any one of multiple Old English dialects.
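When a sound had several outcomes depending on region, another option is to generate every combination and keep them all as candidates instead of committing to one. A minimal sketch, assuming a hypothetical reflex table that covers only Old English /y/: in Middle English it became /i/ in much of the country, /e/ in Kentish, and a rounded vowel written u in the west (simplified here to /u/), which is where the vowel of 'clutch' comes from. The input is an intermediate form in which /y/ has not yet changed.

```python
from itertools import product

# Hypothetical reflex table: sounds with more than one regional Middle English outcome.
# Any length mark "ː" after the vowel is carried along unchanged, so /yː/ is covered too.
REFLEXES = {
    "y": ["i", "e", "u"],
}


def candidates(form):
    """Return every form obtainable by choosing one reflex for each variable sound."""
    options = [REFLEXES.get(ch, [ch]) for ch in form]
    return ["".join(choice) for choice in product(*options)]


print(candidates("klytʃ"))  # ['klitʃ', 'kletʃ', 'klutʃ']
print(candidates("xyːd"))   # ['xiːd', 'xeːd', 'xuːd']
```

Here the standard modern words happen to come from different branches: 'clutch' from the /u/ outcome, 'hide' from the /iː/ one.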
And there are some cases I can't explain at all. Either my system doesn't account for some historical processes, or these words simply changed unpredictably:
OE written | OE sounds | ME sounds | Modern form |
---|---|---|---|
camb | /kamb/ | /kamb/ | camb (comb) |
ċīcen | /tʃiːken/ | /tʃiːkən/ | chiken (chicken) |
dūst | /duːst/ | /duːst/ | doust (dust) |
nēdl | /neːdl/ | /nedəl/ | neddle (needle) |
While I may discover more historical rules that explain some of these differences, I'll never be able to write a system that could account for all of the historical happenstances that created all our modern words. Some words will always be mysterious.
So, there is necessarily some ambiguity in this process, and one Old English form may have multiple plausible modern forms. But overall, the program does seem to generally produce plausible output. And once we have this system, we can do something very interesting with it — we can take words that fell out of the language along the way, and see what they might look like if they still existed today:
OE written | OE sounds | ME sounds | Modern form |
---|---|---|---|
drēfan | /dreːfan/ | /dreːv/ | dreeve |
friþ | /friθ/ | /friθ/ | frith |
heolfriġ | /xeolfrij/ | /xelvrəj/ | helvry |
hremman | /xremman/ | /rem/ | rem |
līg | /liːɣ/ | /liːx/ | ligh |
nyten | /nyten/ | /nitən/ | nitten |
racente | /rakente/ | /rakənt/ | rackent |
snytru | /snytru/ | /snitər/ | snitter |
swincan | /swinkan/ | /swink/ | swink |
þild | /θild/ | /θiːld/ | thild |
For example, Old English had a word 'drēfan', meaning 'to hinder'; if it still existed today, it might look like 'dreeve'. Similarly, 'snytru', meaning 'wisdom', might be 'snitter'. With this process, we can create lots of new words that the program had no way of producing before, and get a much more complete picture of the possibilities latent in English.
There's still some work to do before this is ready. While the basic sound change and spelling components are looking good, supporting things like affixes and compounds will take additional work. I also want to do more testing against real words to make sure I'm catching as many uncommon cases as possible: my research is still turning up new rules for sound change, and some spelling ambiguities might become clearer with more examples to work from. Perhaps the biggest remaining piece of work is creating a whole new corpus of Old English roots. This will require a lot of research, and I'll need to decide how to handle alternate forms and dialects.
But this is exciting not just because of the promise of creating words from Old English roots, but because it shows that this kind of sound change simulation is possible. In the future, I hope to use a similar process for other etymological pathways, such as deriving French forms from Latin, Middle English forms from Old French, and more.
– Robin, February 2024