The Celtic Roots of… Korean

Not meant literally, of course. This post is on the topic of substrate language influence and the search for it. The title and initial stimulus have come from a collection of edited papers, The Celtic Roots of English (2002), recommended to me by Janne Saarikivi, who himself specializes in Finno-Ugrian substrate studies. In the case of that work, too, the title is somewhat misleading in my view, because plant or tree ‘roots’ are an inaccurate metaphor for the non-genetic nature of substrate influence, which might be better understood as the earth in which a tree – or language – is rooted. “Celtic (word) roots in English” would be more accurate but is less evocative as a title.


How to explain the differences between related languages

The notion of genetic affiliation (language families) explains why two or more languages are similar in many regards but not why those same languages are at once so different to one another that they should be classified as distinct languages. The same issue applies to dialects within a language.

Language change

The conventional explanation is that languages naturally change over time as they are passed from generation to generation, such that, given long enough, the sounds, and occasionally the structure of words (the morphology), will significantly diverge from their point of origin, the proto-language. The purest conceptualization is that languages split and evolve in relative isolation, resulting in dialects and eventually new languages (but recognized as being within the same family).

Some apparently spontaneous/organic “language change” certainly occurs for any number of reasons, but rarely – if ever – have historical languages evolved in perfect isolation. The other phenomena, then, which cause languages to change stem from “areal contact” with other languages.

In the discussion of language families, the concept of areal contact most often comes up as the explanation for words which appear similar – usually too similar – but are not genetic cognates; they are interpreted instead as loanwords “borrowed” from one language into another; often they are associated with the introduction of new cultural or technological items and so tend to constitute secondary rather than basic vocabulary. Borrowing is generally restricted to words, not grammar, and as they are usually words for “things” the majority tend to be nouns rather than verbs or any other parts of speech; for this reason, even when there is a very large number of borrowings, they do not have much impact on the structure of the language. Korean, like Japanese, is heavily saturated with Chinese “borrowings” (even when they are calques they are derived from Sino-Korean vocabulary), but remains recognizably Korean and otherwise distinct from the Chinese language, both in structure and phonology (sound system).


The other aspect of areal contact relates to the movement and spread of whole languages. Since the last ice age, and particularly in a region such as east Asia, when a language does spread or expand it will have been over territory previously inhabited by speakers of other languages. In this case, the incoming language will invariably acquire “substrate” influences of the earlier languages.

How profound the influences are depends very much on the nature of the groups involved, including both relative population ratios and stages of cultural development, which result in differing “prestige” values of the languages, again, relative to one another. In cases of invasion and conquest, most often the conquerors will be outnumbered by the indigenous population; if the new overlords have a social impact but fail to fully impose their language, the indigenous language will likely receive cultural loanwords but not too much further. Even in a maximal example of this, such as the Norman conquest of Anglo-Saxon England, an enormous number of Norman-French loans resulted in the pronounced lexical doubling (double vocabulary) of English, but structurally and genetically the language remained broadly English and Germanic.

The alternative situation is if the incoming group, even when a relative minority, is able to gradually cause the indigenous population to adopt their language resulting in anywhere between bilingualism (at least short term) to a complete “language shift” by the indigenous population to the incoming language. Under this process, as the incoming language expands, it may be more fundamentally altered – on the level of syntax and morphology – by the “substrate” (the previous indigenous) language influence of its new speakers.

A key property of substrate influence in the theory of language contact is that it has the greater potential to cause structural changes without necessarily influencing vocabulary; in a very simple model this might be considered the result of the indigenous population “mis-learning” the prestige language of the newcomers who are likely to constitute a socially higher class which will not make any effort to learn the natives’ – possibly a defeated enemy’s – language.

Separately, on the lexical (vocabulary) level, substrate languages survive in trace as the residue of local words for which the incoming language had no substitute word of its own or immediate interest to name; these typically include local flora and fauna, geographical features and toponyms (place names).

Thus, although not the only cause of language change, substrates provide one of the more concrete and interesting sources of explanation for some of the differences between genetically related languages and what would seem to give any particular language, or dialect, some of its distinct vocabulary.

An archetypal example of substrate influence accompanying language shift is the spread of Germanic Anglo-Saxon languages over the Celtic-speaking Romano-British; in short, some of the structural features of Old English which distinguish it from continental Germanic languages (e.g. being structurally more analytic) are found in Celtic, especially Welsh. The traditional 19th-century view is that the Celtic languages were obliterated in the regions which became English speaking as the Celtic population was subjugated and marginalized to the northern and western peripheries, but both archaeological continuities and toponymic evidence fail to support the total replacement of the Romano-British; rather, it seems, the Anglo-Saxon migrations occurred over a longer period of time (the introduction of the language potentially beginning with Germanic Roman soldiers) and there was more of a synthesis of language and cultures (visible in swirly Anglo-Saxon art and style of poetry) than previously appreciated.

Turning to Korean..

What substrate influence(s) might there be in the Korean language and what evidence of pre-Koreanic languages might remain across the peninsula where only Korean is spoken today?

If we accept the premise that two or more of the kingdoms of the Three Kingdoms period were Koreanic speaking, how might their languages have differed to one another?

It must be certain that Korean, like all languages spread over a wide previously inhabited area, has significant substrate influences. Because these substrate languages are prehistoric and unattested, however, we don’t know what they are, so it is difficult to frame a trendy academic question like “How Celtic is English?”; instead we have to ask “How non-Korean(ic) is Korean?”

The example of Celtic and English is neat because we have two language families, or at least distinct branches (both being Indo-European), with surviving and historically attested examples (Celtic and Germanic) outside of the contact area in question (England). As a basic method, one can identify where Old English differs from continental Germanic languages and compare those features to the Celtic languages, both Insular and Continental (and, on the other traditional division, both P-Celtic and Q-Celtic); if there are similarities – particularly with the Celtic languages geographically closest (i.e. Welsh) – these might be considered candidates for substrate influence.
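As a rough illustration of that comparison, the method can be sketched with plain sets: take the features of the target language, discard anything also found in its relatives outside the contact area, and see which of the remaining features recur in a candidate substrate language. This is only a minimal sketch under my own assumptions; the function name and the feature labels ("f1", "lang_x" and so on) are hypothetical placeholders, not real linguistic data.

```python
def substrate_feature_candidates(target, relatives, contact_languages):
    """Return, per contact language, the features of `target` that are
    absent from all `relatives` but present in that contact language.

    target            -- set of feature labels for the language under study
    relatives         -- list of feature sets for related languages outside
                         the contact area
    contact_languages -- dict mapping a language name to its feature set
    """
    inherited = set().union(*relatives)   # anything a relative also has
    innovations = target - inherited      # features needing explanation
    candidates = {}
    for name, features in contact_languages.items():
        shared = innovations & features
        if shared:
            candidates[name] = sorted(shared)
    return candidates


# Placeholder labels only (hypothetical, not real features).
target = {"f1", "f2", "f3", "f4"}
relatives = [{"f1"}, {"f1", "f2"}]
contact_languages = {"lang_x": {"f3", "f9"}, "lang_y": {"f8"}}

print(substrate_feature_candidates(target, relatives, contact_languages))
```

A match found this way is only a candidate: a shared feature may equally be a coincidence or an independent development, so the result is a list of things to examine rather than a conclusion.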

The difficulty with Korean is that there are no surviving attestations of other Koreanic languages or candidate substrate languages with which to compare. Japonic is the only other language we can be reasonably confident was spoken on the Korean peninsula but we do not know the timing of arrival (assuming also that it had continental origins) and exact nature of interaction between the speakers: are there areas where Japonic was a substrate to Koreanic, for example, or Koreanic a substrate to Japonic? Or were they both influenced by an older indigenous substrate language now lost?

In many regards, the latter of those speculations seems the more intriguing prospect. We know that typologically (structurally speaking) Koreanic and Japonic are a part of the broader Altaic Sprachbund, but K and J also share certain features between themselves, in a sense forming their own smaller Sprachbund. Some of these features might be caused by a shared substrate influence.

Turning to dialects..

Although there is no surviving Koreanic language outside of the peninsula, there is internal variation – dialects – within. Differences between the dialects may also provide evidence of substrate influences.

Whilst some, or the greater proportion, of dialectal variation may be due to the more spontaneous processes of “language change” associated with geographic isolation, it seems unlikely to be pure coincidence that the known dialectal zones of modern Korean broadly map onto the positions of the ancient kingdoms: certainly the Jeolla (southwest ‘Honam’) – Gyeongsang (southeast ‘Yeongnam’) divide persists; and elsewhere, for example, east coast Gangwon-do correlates to either “Ye” or “Ye-Maek” territory, perhaps extending north to southern Okjeo; northeastern Hamgyeong-do may correlate to Okjeo; Jeju-do remains particularly distinct as an island, and of course there is Pyeongyang, the former capital of Goguryeo. These dialects may carry echoes both of greater Koreanic variation and, beyond that, non-Koreanic substrates.

Toponymy and local lexicon..

The other area to investigate, as mentioned, is place names and local dialect words. These will not tell us much about structural substrate influence, but they do have the potential to identify the prehistoric “lost” languages.

The basic method is to treat a word as a non-Koreanic (or only distantly related) candidate on two grounds: firstly, if its phonology (the basic sound system from which the word is built) does not match proto-Koreanic phonology (or the oldest stage that can be reconstructed), and secondly, if it cannot be semantically analysed as Korean (i.e. if it does not carry meaning found in other Korean words and/or does not possess any Koreanic etymology that could be internally reconstructed).
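The two tests can be sketched as a simple screening routine. This is a toy illustration under loud assumptions: the "inventory" and "morphemes" below are invented placeholders (not a real proto-Koreanic reconstruction), each character stands for one sound segment, and "can be analysed" is reduced to "can be fully split into already known morphemes".

```python
def violates_phonology(word, inventory):
    """True if the word contains a segment outside the given inventory."""
    return any(segment not in inventory for segment in word)


def is_analysable(word, morphemes):
    """True if the whole word can be split into known morphemes."""
    reachable = [True] + [False] * len(word)  # reachable[i]: word[:i] splits
    for end in range(1, len(word) + 1):
        reachable[end] = any(
            reachable[start] and word[start:end] in morphemes
            for start in range(end)
        )
    return reachable[len(word)]


def screen_words(words, inventory, morphemes):
    """Map each flagged word to the list of tests it fails."""
    flagged = {}
    for word in words:
        failed = []
        if violates_phonology(word, inventory):
            failed.append("phonology")
        if not is_analysable(word, morphemes):
            failed.append("etymology")
        if failed:
            flagged[word] = failed
    return flagged


# Invented placeholder data, for illustration only.
inventory = set("aeioukmnpstl")
morphemes = {"san", "kol", "mata"}
words = ["sankol", "zarbo", "matasan", "kipu"]

print(screen_words(words, inventory, morphemes))
```

Keeping the two tests separate in the output seems preferable to a single yes/no verdict, since a word that fails only one of them is a weaker candidate than a word that fails both.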

For this to be potentially informative, we ideally need a lot of toponyms and local words which can be mapped; only then is there a chance that through naming patterns the meaning of recurring word parts might be deduced and that they might even be associated with local archaeological sites. The distribution of words may indicate the spread of a substrate language and would be even more telling if they corresponded to the distribution pattern of surviving dialects or historical polities (though they equally may not).
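To make the idea of recurring word parts concrete, here is a minimal sketch that collects every substring shared by at least two toponyms and records the regions in which it occurs. The place names and region labels are invented for illustration, and treating raw substrings as "word parts" is my own simplification; real work would need proper segmentation and a far larger data set.

```python
from collections import defaultdict


def recurring_parts(toponyms, min_len=2, min_places=2):
    """Map each substring found in at least `min_places` distinct toponyms
    to the sorted list of regions where those toponyms occur.

    toponyms -- iterable of (name, region) pairs
    """
    hits = defaultdict(set)  # substring -> {(name, region), ...}
    for name, region in toponyms:
        parts = {
            name[i:j]
            for i in range(len(name))
            for j in range(i + min_len, len(name) + 1)
        }
        for part in parts:
            hits[part].add((name, region))
    return {
        part: sorted({region for _, region in places})
        for part, places in hits.items()
        if len({name for name, _ in places}) >= min_places
    }


# Invented toponyms and regions, for illustration only.
toponyms = [("tarbol", "east"), ("mibol", "east"), ("tarsu", "west")]
parts = recurring_parts(toponyms)

for part in sorted(parts):
    print(part, parts[part])
```

A part confined to one region (here "bol") would be the more interesting kind of result, as it hints at a local naming pattern; one spread across regions (here "tar") is less distinctive.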

What sources might be used?

To my (possibly inaccurate) knowledge, there is very little literature on Korean dialects in English, whilst most discussion of toponyms relates only to those found in the Samguk-sagi.

In Korean, there are various collections of oral literature, such as those produced by the Academy of Korean Studies, compiled during the 1970s and 80s, at a time before the standard Seoul dialect had become so utterly pervasive through improved infrastructure and television; if studied, these likely contain toponyms as well as examples of local syntax and word forms.

The first studies of Korean dialects were produced during the Japanese era; they include the 1936 Bang’eon-jip (方言集 Dialect Collection, published by 京城師範學校 [醇和]朝鮮語硏究部 Keijō-shihan-gakkō Chōsen-go Kenkyū-bu) and Ogura Sinpei’s (小倉進平 1882-1944) lifetime work Chōsen-go Hōgen no Kenkyū (朝鮮語方言の研究 Research on Korean Dialects, 岩波書店 1944).

Again, from the 1970s onwards at least, the South Korean dialects have been the subject of investigation by Korean linguists; between 1987 and 1995 the Academy of Korean Studies published the nine-volume Han’guk-bang’eon-jaryo-jip (한국방언자료집 Collected Sources [on] Korean Dialects).

The premodern local gazettes and maps, as well as the private writings of provincial literati, may also be a source of toponyms, although there is the obvious challenge that they are mostly authored in Chinese.

If it hasn’t been done thoroughly enough already, it is still not too late to collect local toponyms from the oldest speakers of rural communities.

As a conclusion, I would suggest that whilst the genetic discourse of language families is regularly exploited for purposes of ethnic nationalism, substratum discourse reminds us of shared accumulative heritage across ethnic distinctions and encourages us to look deeper.

References (what I’ve been reading related to this topic)

Filppula, Markku (ed.). 2002. The Celtic Roots of English. Joensuu: Joensuun Yliopistopaino.

Kroonen, Guus. “Non-Indo-European root nouns in Germanic: evidence in support of the Agricultural Substrate Hypothesis” in Grünthal (ed.). 2012. A Linguistic Map of Prehistoric Northern Europe. Mémoires de la Société Finno-Ougrienne 266.

Saarikivi, Janne. 2006. Substrata Uralica: Studies on Finno-Ugrian Substrate in Northern Russian Dialects (PhD dissertation). Tartu University Press.

Tristram (ed.). 2000. The Celtic Englishes II. Heidelberg: Universitätsverlag C. Winter.

강정희 2005. 제주방언 형태 변화 연구 (Research on morphological changes in Jeju dialect). 서울: 도서출판역락(亦樂).

박성종 2008. 강원도 영동지역의 방언 (The dialect of the Yeongdong region of Gangwon-do province). 서울: 제이앤.