"One big Donald Trump AIDS"

Jun. 25th, 2017 02:34 pm
Posted by Mark Liberman

As I've observed several times over the years, automatic speech recognition is getting better and better, to the point where some experts can plausibly advance claims of "achieving human parity". It's not hard to create material where humans still win, but in a lot of ordinary-life recordings, the machines do an excellent job.

Just like human listeners, computer ASR algorithms combine "bottom-up" information about the audio with "top-down" information about the context — both the local word-sequence context and various layers of broader context. In general, the machines are more dependent than humans are on the top-down information, in the sense that their performance on (even carefully-pronounced) jabberwocky or word salad is generally rather poor.
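To make the "bottom-up plus top-down" idea concrete, here is a minimal sketch of how a recognizer might combine an acoustic score with a language-model score when choosing among candidate transcripts. The function name `rescore`, the weight `lm_weight`, and all the log-probability values are assumptions invented for illustration; they are not taken from any particular ASR system.

```python
def rescore(hypotheses, lm_weight=1.0):
    """Return the hypothesis with the best combined score.

    Each hypothesis is a tuple (text, acoustic_logprob, lm_logprob).
    The combined score is acoustic_logprob + lm_weight * lm_logprob.
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])


# Hypothetical scores, made up for this example only.
hypotheses = [
    ("recognize speech", -12.0, -9.0),     # slightly worse acoustic fit, likely word string
    ("wreck a nice beach", -11.8, -14.0),  # slightly better acoustic fit, unlikely word string
]

best_with_context = rescore(hypotheses)
best_audio_only = rescore(hypotheses, lm_weight=0.0)
print(best_with_context[0])
print(best_audio_only[0])
```

With `lm_weight=0.0` the same function falls back on the acoustic score alone and picks the less plausible string, which is one way to picture a system that fails to use word-sequence context.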

But recently I've been noting some cases where an ASR system unexpectedly fails to take account of what seem like some obvious local word-sequence likelihoods. To check my impression that such events are fairly common, I picked a random video from YouTube's welcome page (Bill Maher's 6/23/2017 monologue) and fetched the "auto-generated" closed captions.

Here's an example that combines impressive overall performance with one weird mistake:

5:07 Mitch McConnell says he wants a vote
5:10 before the 4th of July when Trump voters
5:13 traditionally blow their hands off
5:19 oh the fourth of July hey summers here
5:24 boy it was real Beach weather in Phoenix
5:26 the other day did you see that it was
5:28 122 122 plains could not take off hey
5:34 climate deniers
5:36 if melting IceCaps and rising oceans and
5:40 pandemics aren't enough to scare you not
5:42 being able to leave Phoenix that should
5:50 work

I'll give the machine a pass on "summers" instead of "summer's", and we can ignore the issue of "oh" vs. "ah", and forgive the hallucinated "work" at the end — but "plains could not take off"? In Psalm 114:4 the mountains skipped like rams, but not even then did the plains take off.

A bit later:

6:32 but speaking of solar Donald Trump broke
6:36 some news at the rally that the wall you
6:39 know the wall between us and Mexico it's
6:41 going to have solar panels on he said it
6:43 was his idea solar battles okay so the
6:47 wall which is never going to be built
6:49 which Mexico is never going to be paying
6:52 for which now has imaginary so propels
6:56 on because if it's one big Donald Trump
6:59 AIDS it's fake news

So the system got "solar panels" right the first time, but then heard "solar battles" and "so propels". In fairness, Maher kind of garbles the last one into something like "solar pels":

But still, I don't think anyone in the audience heard "so propels".

And then at the end, "if it's one thing Donald Trump hates it's fake news" gets turned into "if it's one big Donald Trump AIDS it's fake news":

In that case, I don't hear any acoustic phonetic excuses. And surely "one thing Donald Trump hates" is a priori a more probable word string than "one big Donald Trump AIDS"…
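As a rough illustration of what "a priori more probable" means for a word string, here is a toy bigram language model with add-one smoothing. The five-sentence training corpus is invented for this example, and the helper names `train_bigrams` and `log_prob` are hypothetical; a real system would be trained on vastly more text and would almost certainly use a different model.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count bigrams and their left contexts; return the counts and a vocabulary size."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    # One extra slot stands in for any word never seen in training.
    return context_counts, bigram_counts, len(vocab) + 1


def log_prob(sentence, context_counts, bigram_counts, vocab_size):
    """Add-one-smoothed bigram log-probability of a word string."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


# Invented toy corpus, for illustration only.
corpus = [
    "if there is one thing he hates it is waiting",
    "one thing donald trump hates is fake news",
    "the one thing she hates is noise",
    "donald trump hates fake news",
    "one big family",
]
model = train_bigrams(corpus)
said = log_prob("one thing donald trump hates", *model)
captioned = log_prob("one big donald trump aids", *model)
print(said > captioned)
```

Even this crude model prefers the string Maher actually said, because bigrams like "one thing" and "trump hates" occur in the training text while "trump aids" does not.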

I don't know which generation of ASR Google is using to generate YouTube captions. But it's possible that this sort of thing is an example of the sometimes-peculiar behavior of RNN language models.

Renewal of the race / nation

Jun. 25th, 2017 02:53 am
Posted by Victor Mair

Jamil Anderlini in the Financial Times (6/21/17), "The dark side of China’s national renewal", writes:

To an English-speaking ear, rejuvenation has positive connotations and all nations have the right to rejuvenate themselves through peaceful efforts.

But the official translation of this crucial slogan is deeply misleading. In Chinese it is “Zhonghua minzu weida fuxing” and the important part of the phrase is “Zhonghua minzu” — the “Chinese nation” according to party propaganda. A more accurate, although not perfect, translation would be the “Chinese race”.

That is certainly how it is interpreted in China. The concept technically includes all 56 official ethnicities, including Tibetans, Muslim Uighurs and ethnic Koreans, but is almost universally understood to mean the majority Han ethnic group, who make up more than 90 per cent of the population.

The most interesting thing about Zhonghua minzu is that it very deliberately and specifically incorporates anyone with Chinese blood anywhere in the world, no matter how long ago their ancestors left the Chinese mainland.

“The Chinese race is a big family and feelings of love for the motherland, passion for the homeland, are infused in the blood of every single person with Chinese ancestry,” asserted Chinese premier Li Keqiang in a recent speech.

This is a highly perceptive, and troubling, article that merits reading in its entirety.

In this post, I will focus on some key terms.

First of all, front and center, what is this mínzú 民族?  It can mean lots of things:  nation, nationality, people, ethnic group, race, volk.  This is not the first time that mínzú 民族 has erupted on the international stage.  One of the most notable instances was four years ago, emanating right here from the University of Pennsylvania.  The incident is well recounted by R.L.G. in "Johnson" at The Economist (5/21/13), "Of nations, peoples, countries and mínzú:  Differing terms for ethnicity, citizenship and group belonging ruffle feathers":

DID Joe Biden insult China?  The American vice-president has a habit of sticking his foot into his mouth, and in this case, the recent graduation speech he gave at the University of Pennsylvania inspired a viral rant by a "disappointed" Chinese student at Penn, Zhang Tianpu. What was Mr Biden's sin? Was it Mr Biden's suggestion that creative thought is stifled in China?

You cannot think different in a nation where you cannot breathe free. You cannot think different in a nation where you aren't able to challenge orthodoxy, because change only comes from challenging orthodoxy.

No, that wasn't it.

The source of the insult is a surprising one: Mr Biden called China a "great nation", and a "nation" repeatedly after that. Victor Mair, the resident sinologist at the Language Log blog, translates Mr Zhang's complaint.

In this sentence, "You CANNOT think different in a nation where you aren't able to challenge orthodoxy", he used the word "nation". This is what really infuriated me, because in English "nation" indicates "race, ethnicity", which is different from "country, state". "Country, state" perhaps places more emphasis on the notion of the entirety of the country, even to the point of referring to the idea of government.

Mr Mair explains:

The weakness in Zhang's reasoning lies mainly in his confusion over the multiple meanings of the word mínzú 民族…. [M]ínzú 民族 can mean "ethnic group; race; nationality; people; nation".  Coming from the English side, we must keep in mind that "nation" can be translated into Chinese as guó 国 ("country"), guójiā 国家 ("country"), guódù 国度 ("country; state"), bāng 邦 ("state"), and, yes, mínzú 民族 ("ethnic group; race; nationality; people; nation").

It is clear that, when Biden said "China is a great nation", he was respectfully referring to the country as a whole.  Yet the sensitivity to questions of ethnicity in China, especially with regard to the shǎoshù mínzú 少数民族 ("ethnic / national minorities"), e.g., Uyghurs, Tibetans, and scores of others, caused Zhang to take umbrage over something that the Vice President never intended.

In a later post about smartphone zombies, Cant. dai1tau4 zuk6 / MSM dītóu zú 低頭族 (“head-down tribe”), "Tribes" (3/10/15), I wrote:

The first word I think of when I see 族 as a suffix is Mandarin mínzú, Japanese minzoku 民族 (“nation; nationality; people”), which is formed from 民 (“people; subjects; civilians”) + 族 (“family clan; ethnic group; tribe”).  The term is a neologism coined in the late 19th century by Japanese thinkers to match the Western (especially German) concept of “nation”.

… I have assembled a large amount of material concerning the absence of mínzú / minzoku 民族 as a lexical item corresponding to “nation” in China before it was introduced from Meiji [1868-1912] Japan.

When we prefix mínzú 民族 with shǎoshù 少数 ("few; small number; minority"), we have shǎoshù mínzú 少数民族 ("minority; national minority; ethnic minority").  Here it gets really tricky, because, as Anderlini points out in his article, there are officially 56 ethnic groups (mínzú 民族) in China, of which 55 are shǎoshù mínzú 少数民族 ("minorities; national minorities; ethnic minorities; ethnic groups"), with the 56th being the dominant, majority (over 90%) Hàn mínzú 汉民族 ("Han nationality; Han ethnic group").  Consequently, when Chinese politicians talk about the blood of the Chinese race, it's important to know whether they are referring to Hàn mínzú 汉民族 ("Han nationality; Han ethnic group"), Zhōnghuá mínzú 中华民族 ("Chinese nation / people", where Zhōnghuá 中华 is understood as "Central cultural florescence"), or something else.  In each case, we need to judge carefully whether they meant to include all the ethnicities within the sovereign territory of the PRC or in the whole world, or whether they were referring specifically to individuals of Han ethnicity within the sovereign territory of the PRC or in the whole world.  Often, for politicians, as for poets, ambiguity is desirable, or at least convenient.

There are no less than half a dozen other words for "(the) people" that are in common use in Mandarin.  I won't go into all of them here, but will mention only one:  rénmín 人民, as in rénmínbì 人民币 ("RMB; people's currency") and Rénmín rìbào 人民日报 ("People's Daily").  This term, rénmín 人民, does not get involved with race, ethnicity, nation, and so on, but emphasizes the population as a whole.

As for "Zhongguo / China", that too is a huge can of worms, for which see this incisive paper by Arif Dirlik:

"Born in Translation: 'China' in the Making of 'Zhongguo'"

[h.t. John Rohsenow, Bill Bishop]

Bruria Kaufman

Jun. 24th, 2017 04:53 pm
Posted by Mark Liberman

The Annual Reviews have a tradition of featuring retrospective articles by or about senior figures, and the Annual Review of Linguistics has followed this pattern with pieces featuring Morris Halle in the 2016 volume and Bill Labov in 2017. For 2018, we'll be featuring Lila Gleitman.

As background, Barbara Partee, Cynthia McLemore and I spent the last couple of days interviewing Lila about her life and work. We've got more than 7.5 hours of recordings, which is more like a book than an article — and it may very well turn into a book as well, with edited interview material interspersed with reprints of Lila's papers. But what I want to post about today is one of the many things that I learned in the course of the discussions. This was just a footnote in Lila's life story, but it has its own intrinsic interest, and I'm hoping that some readers will be able to provide more information.

I learned that the founder of the Penn Linguistics Department, Zellig Harris, was married to a mathematical physicist named Bruria Kaufman. She worked with John von Neumann, wrote some widely-cited papers on crystal statistics in the late 1940s, published with Albert Einstein (Albert Einstein and Bruria Kaufman. "A new form of the general relativistic field equations", Annals of Mathematics, 1955), and later wrote papers like "Unitary symmetry of oscillators and the Talmi transformation", Journal of Mathematical Physics 1965, and "Special functions of mathematical physics from the viewpoint of Lie algebra", Journal of Mathematical Physics 1966.

The thing that interested me most was that Bruria Kaufman also worked for a while in the 1950s with Harris at Penn, at the same time as others including Lila Gleitman, Aravind Joshi, R.B. Lees, Naomi Sager, Zeno Vendler, and Noam Chomsky. And according to this 1961 NSF report, her contributions included Transformations and Discourse Analysis Papers (TDAP) numbers 19 and 20:

19. Higher-order Substrings and Well-formedness, Bruria Kaufman.
20. Iterative Computation of String Nesting (Fortran Code), Bruria Kaufman.

I've found a couple of citations to these works, but so far not the works themselves.

The 1961 NSF report says that

Paper 15 gives an information [sic — should be informal?] presentation of a general theory and method for syntactic recognition. Papers 16-19 give the actual flow charts of each section of the syntactic analysis program.

where 15-19 are

15. Computable Syntactic Analysis, Zellig S. Harris. (Revised version published as PoFL I, above)
16. Word and Word-Complex Dictionaries, Lila Gleitman.
17. Elimination of Alternative Classifications, Naomi Sager.
18. Recognition of Local Substrings, Aravind K. Joshi.
19. Higher-order Substrings and Well-formedness, Bruria Kaufman.

and "PoFL I" is Harris's String Analysis and Sentence Structure, 1962.

Aravind Joshi and Phil Hopely, "A parser from antiquity", Natural Language Engineering 1996, explains that

A parsing program was designed and implemented at the University of Pennsylvania during the period from June 1958 to July 1959. This program was part of the Transformations and Discourse Analysis Project (TDAP) directed by Zellig S. Harris. The techniques used in this program, besides being influenced by the particular linguistic theory, arose out of the need to deal with the extremely limited computational resources available at that time. The program was essentially a cascade of finite state transducers (FSTs).
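To give a feel for what a "cascade" of passes looks like, here is a small sketch in which each stage reads the previous stage's output from left to right. It is not a reconstruction of the TDAP program: the lexicon, the category symbols (T for article, A for adjective, N for noun, V for verb), and the function names are all assumptions made up for illustration. The second stage also buffers a candidate phrase, which a strict finite-state transducer would have to handle with nondeterminism or delayed output.

```python
# Hypothetical lexicon mapping words to category symbols.
LEXICON = {"the": "T", "a": "T", "old": "A", "man": "N", "dog": "N", "saw": "V"}


def lookup(words, lexicon=LEXICON):
    """Stage 1: replace each word by its category symbol ('?' if unknown)."""
    return [lexicon.get(word.lower(), "?") for word in words]


def chunk_noun_phrases(tags):
    """Stage 2: rewrite each run of T/A symbols that ends in N as a single NP."""
    out, pending = [], []
    for tag in tags:
        if tag == "T":
            out.extend(pending)  # an article restarts the candidate phrase
            pending = ["T"]
        elif tag == "A":
            pending.append("A")
        elif tag == "N":
            out.append("NP")     # candidate phrase completed
            pending = []
        else:
            out.extend(pending)  # candidate failed; pass it through unchanged
            pending = []
            out.append(tag)
    out.extend(pending)
    return out


def cascade(words):
    """Feed the output of stage 1 into stage 2."""
    return chunk_noun_phrases(lookup(words))


print(cascade("the old man saw a dog".split()))
```

The appeal of this design on a machine with very little memory is that each stage only needs the current symbol and a small amount of state, and never the whole sentence structure at once.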

More on the history from that source:

The original program was implemented in the assembly language on Univac 1, a single user machine. The machine had acoustic (mercury) delay line memory of 1000 words. Each word was 12 characters/digits, each character/digit was 6 bits. Lila Gleitman, Aravind Joshi, Bruria Kauffman, and Naomi Sager and a little later, Carol Chomsky were involved in the development and implementation of this program. A brief description of the program appears in Joshi 1961 and a somewhat generalized description of the grammar appears in Harris 1962.  This program is the precursor of the string grammar program of Naomi Sager at NYU, leading up to the current parsers of Ralph Grishman (NYU) and Lynette Hirschman (formerly at UNISYS, now at Mitre Corporation). Carol Chomsky took the program to MIT and it was used in the question-answer program of Green, BASEBALL (1961). At Penn, it led to a program for transformational analysis (kernels and transformations) (1963) and, in many ways, influenced the formal work on string adjunction (1972) and later tree-adjunction (1975).
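The memory figures in that passage are easy to put in modern terms. The snippet below just multiplies out the numbers quoted above; the conversion to 8-bit bytes is my own addition for comparison, since the machine itself did not use bytes.

```python
WORDS = 1000          # words of delay-line memory, as quoted
CHARS_PER_WORD = 12   # characters/digits per word, as quoted
BITS_PER_CHAR = 6     # bits per character/digit, as quoted

total_chars = WORDS * CHARS_PER_WORD
total_bits = total_chars * BITS_PER_CHAR
equivalent_bytes = total_bits // 8  # modern 8-bit bytes, for comparison only

print(total_chars, total_bits, equivalent_bytes)
```

So the whole parser had to work within 12,000 characters of memory, or 72,000 bits, which is about 9,000 modern bytes.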

The paper's bibliography cites

Transformations and Discourse Analysis Project (TDAP) Reports, University of Pennsylvania, Reports #15 through #19, 1959-60. Available in the Library of the National Institute of Science and Technology (NIST) (formerly known as the National Bureau of Standards (NBS)), Bethesda, MD.

So I'll ask my friends at NIST if these works are still there.


Chinglish with tones

Jun. 23rd, 2017 07:57 pm
Posted by Victor Mair

4th tone – 3rd tone, it would appear:

Well, maybe not; the diacritics are probably meant to indicate vowel quality, but I don't know what system (if any) they are using.

Ben Zimmer writes:

The diacritics may be intended to evoke pinyin tone marks, but they're also reminiscent of dictionary-style phonetic respelling and stress marking. The grave accent on "ì" could be intended as an indicator of primary stress, though that's more typically marked with an acute accent. And the breve on the "ĭ" is a common enough way to represent /ɪ/ (the macron is used for long vowels and the breve for short vowels; see, e.g., Phonics on the Web). But this use of diacritics as typographical ornamentation is never very consistent; recall the styling of the play Chinglish as "Ch’ing·lish".

The illustration appears at the top of this article:

It turns out that the image used by the People's Daily originally appeared as a promotion for the play Chinglish that Ben mentioned, specifically for its performance by the Singaporean theater company Pangdemonium in 2015. See the Pangdemonium website, as well as local coverage by PopSpoken and Today. So the People's Daily may have searched for a "Chinglish" image online and borrowed this one, without giving proper credit. (Credit should go to Olivier Henry of MILK Photographie.)

The six individuals in the picture seem to be aspiring to some idealized form of Chinglish in the sky above, overlying the cloud-shrouded five-star design of the Chinese flag, leading them on.  The thrust of the People's Daily article, however, is anything but adulatory of Chinglish:

Chinese authorities on June 20 issued a national standard for the use of English in the public domain, eradicating poor translations that damage the country’s image.

The standard, jointly issued by China’s Standardization Administration and General Administration of Quality Supervision, Inspection and Quarantine, aims to improve the quality of English translations in 13 public arenas, including transportation, entertainment, medicine and financial services. It will take effect on Dec. 1, 2017.

According to the standard, English translations should prioritize correct grammar and a proper register, while rare expressions and vocabulary words should be avoided. The standard requires that English not be overused in public sectors, and that translations not contain content that damages the images of China or other countries. Discriminatory and hurtful words have also been banned. The standard provided sample translations for reference, and warned against direct translation.

There are perpetual plans for eliminating Chinglish in China, but they are unlikely ever to materialize unless professional translators are sought after for their expertise and paid accordingly.

Earlier calls for the elimination of English more generally are no longer heard from responsible persons:

Now the goal is more reasonably just to get rid of Chinglish, but that will not happen on December 1, 2017 when the new standards go into effect.  Although it will take many years for their full implementation and realization, the standards are admirable goals to aim for.

[h.t. Jim Fanell, Toni Tan]

Ask Language Log: "assuage"

Jun. 23rd, 2017 11:41 am
Posted by Mark Liberman

Query from a reader:

Is it correct to use the word assuage to indicate a lessening of something? That is, it is often used in the realm of feelings, i.e. assuage hunger, assuage grief, etc. But would it be acceptable to use to indicate the lessening of something more tangible, such as assuage criminality, assuage the flow of water, assuage drug use.

I probably wouldn't use assuage to describe the lowering of flood waters or the amelioration of traffic jams. But I don't have any special standing to rule on such matters, so as usual, let's look at how others use the word.

The OED's entry for assuage, which is flagged as "not yet … fully updated (first published 1885)", has several senses marked as "arch. or Obs." that don't involve "angry or excited feelings", or beings in such a state.

There's the transitive form glossed "To abate, lessen, diminish (esp. anything swollen)", with examples like

1774   J. Bryant New Syst. II. 284   The Dove..brought the first tidings that the waters of the deep were asswaged.

There's the intransitive inchoative version of the same, glossed "To grow less, diminish, decrease, fall off, die away; to abate, subside", with examples like

1611   Bible (King James) Gen. viii. 1   And the waters asswaged .

COCA has 509 instances of "assuage", 134 of "assuaged", 46 of "assuaging", and 17 of "assuages". Looking at a random sample of 100, we find that all 100 are transitive, and that in 98 of them, what's assuaged is a negatively-evaluated emotion or feeling or concern ("the community's grief", "his guilt", "such mortal concerns", "the twitchy sensation in my cells", "white opposition to slave conversion", "my hunger", "Democratic anxieties", "India's complaints", "feelings of humiliation", "the monarch's fears", "his own damaged pride", "the egos of movie stars", "my curiosity", …), or a person or group of people subject to such emotions or feelings or concerns ("his uneasy party", "the academic intellectual community", "the larger man", "international critics of the war", "his jittery passenger", "the chiefs", "the dealers", …).

The two exceptions in the sample are these:

In The Efficiency Trap, Steve Hallett claims that we will exhaust many of our resources by the 2030s, and violence and chaos will erupt as a result. Hallett proposes recycling and growing food locally as possible means of assuaging the damage.

The measure, which awaits Senate approval of a minor amendment next week, can not assuage the impending disaster that will kill virtually all the fish in the Dolores River this summer.
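For readers who want to see how much weight a 100-item sample can bear, here is a back-of-the-envelope sketch that tallies the COCA counts given above and puts a rough interval around the 98-out-of-100 proportion. The Wilson score interval is my choice of method, not something the corpus interface reports, and the calculation assumes that the 100 hits were a simple random sample of all the hits.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts as reported in the text above.
counts = {"assuage": 509, "assuaged": 134, "assuaging": 46, "assuages": 17}
total_hits = sum(counts.values())

low, high = wilson_interval(98, 100)
print(total_hits, round(low, 3), round(high, 3))
```

On these assumptions the share of "feelings" uses among all 706 hits is very unlikely to be much below 90 percent, which is consistent with the verdict below.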

With respect to the specific examples in the query, Google finds

"assuage criminality": one example [link] Please reconsider your gig – don't play for a segregated audience in Israel and make of yourself a balm to assuage criminality.

"assuage the flow of water": no examples (though see biblical examples cited by the OED)

assuage drug use: one example [link] Becker's neoliberal drug policy presumes to assuage drug use and addiction by the instantiation of a highly regulated market as a system of control.

So the verdict of norma loquendi seems to be that applying assuage to things other than people and their feelings is out of fashion and currently marginal.


Posted by Victor Mair

My own investigations on the Bronze Age and Early Iron Age peoples of Eastern Central Asia (ECA) began essentially as a genetics cum linguistics project back in the early 90s.  That was not long after the extraction of mtDNA (mitochondrial DNA) from ancient human tissues and its amplification by means of PCR (polymerase chain reaction) became possible.

By the mid-90s I had grown somewhat disenchanted with ancient DNA (aDNA) studies because the data were insufficient to determine the origins and affiliations of various early groups with satisfactory precision, either spatially or temporally.  Around the same time, I began to realize that other types of materials, such as textiles and metals, provided powerful diagnostic evidence.

By the late 90s, combining findings from all of these fields and others, I was willing to advance the hypothesis that some of the mummies of ECA, especially the earliest ones dating to around 1800 BC, may have spoken a pre-proto-form of Tocharian when they were alive (some people think it's funny or scary to imagine that mummies once could speak).  This hypothesis was presented at an international conference held at the University of Pennsylvania in April, 1996, which was attended by more than a hundred archeologists, linguists, geneticists, physical anthropologists, textile specialists, metallurgists, geographers, climatologists, historians, mythologists, and ethnologists, including more than half a dozen of the world's most distinguished Tocharianists.  It was most decidedly a multidisciplinary conference before it became fashionable to call academic endeavors by such terms (see "Xdisciplinary" [6/14/17]).  The papers from the conference were collected in this publication:

Victor H. Mair, The Bronze Age and Early Iron Age Peoples of Eastern Central Asia (Washington, D.C.: Institute for the Study of Man Inc. in collaboration with the University of Pennsylvania Museum Publications, 1998).  2 vols.

See also:

J. P. Mallory and Victor H. Mair, The Tarim Mummies: Ancient China and the Mystery of the Earliest Peoples from the West. (2000). Thames & Hudson. London.

"Early Indo-Europeans in Xinjiang" (11/19/08)

It is only very recently, within the last ten years or so, that Y-chromosome analysis has been brought into play for the study of ancient DNA.  See Toomas Kivisild, "The study of human Y chromosome variation through ancient DNA", Human Genetics, 2017; 136(5): 529–546; published online 2017 Mar 4. doi: 10.1007/s00439-017-1773-z.*  Since only males carry the Y-chromosome, this has made it possible to trace the patriline of individuals.  This, coupled with the massive accumulation and detailed analysis of modern DNA with increasing sophistication and the rise of the interdisciplinary (!) field referred to as genomics, has made studies on the genetics of premodern people, including their origins, migrations, and affinities, far more exacting than they were during the 90s when I did the bulk of my investigations on the early inhabitants of the Tarim Basin.

Now it is possible to draw on the results of genetics research to frame and more reliably solve questions about the development of languages from their homeland to the far-flung places where they subsequently came to be spoken.  One such inquiry is described in this article:

Tony Joseph, "How genetics is settling the Aryan migration debate", The Hindu (6/16/17).

It is significant that this substantial article appeared in The Hindu, since there is a strong bias against such conclusions among Indian nationalists (see "Indigenous Aryans").  It begins thus:

New DNA evidence is solving the most fought-over question in Indian history. And you will be surprised at how sure-footed the answer is, writes Tony Joseph

The thorniest, most fought-over question in Indian history is slowly but surely getting answered: did Indo-European language speakers, who called themselves Aryans, stream into India sometime around 2,000 BC – 1,500 BC when the Indus Valley civilisation came to an end, bringing with them Sanskrit and a distinctive set of cultural practices? Genetic research based on an avalanche of new DNA evidence is making scientists around the world converge on an unambiguous answer: yes, they did.

Joseph's paper is informed, sensitive, balanced, and nuanced.  This is responsible science journalism.

The scientific paper itself, “A Genetic Chronology for the Indian Subcontinent Points to Heavily Sex-biased Dispersals” by Marina Silva, Marisa Oliveira, Daniel Vieira, Andreia Brandão, Teresa Rito, Joana B. Pereira, Ross M. Fraser, Bob Hudson, Francesca Gandini, Ceiridwen Edwards, Maria Pala, John Koch, James F. Wilson, Luísa Pereira, Martin B. Richards, and Pedro Soares, was published in BMC Evolutionary Biology (3/23/17) (DOI: 10.1186/s12862-017-0936-9).

I'm skeptical of many of the claims put forward by geneticists concerning origins and dispersals, not just about humans, but also about horses, dogs, cats, plants, and so forth.  This study, however, is both cautious and solid.  Moreover, it fits well with the archeological evidence (more below).

Here are two key paragraphs from the scientific paper (numbers in square brackets are to accessible references):

Although some have argued for co-dispersal of the Indo-Aryan languages with the earliest Neolithic from the Fertile Crescent [88, 89], others have argued that, if any language family dispersed with the Neolithic into South Asia, it was more likely to have been the Dravidian family now spoken across much of central and southern India [12]. Moreover, despite a largely imported suite of Near Eastern domesticates, there was also an indigenous component at Mehrgarh, including zebu cattle [85, 86, 90]. The more widely accepted “Steppe hypothesis” [91, 92] for the origins of Indo-European has recently received powerful support from aDNA evidence. Genome-wide, Y-chromosome and mtDNA analyses all suggest Late Neolithic dispersals into Europe, potentially originating amongst Indo-European-speaking Yamnaya pastoralists that arose in the Pontic-Caspian Steppe by ~5 ka, with expansions east and later south into Central Asia in the Bronze Age [53, 76, 93, 94, 95]. Given the difficulties with deriving the European Corded Ware directly from the Yamnaya [96], a plausible alternative (yet to be directly tested with genetic evidence) is an earlier Steppe origin amongst Copper Age Khavlyn, Srednij Stog and Skelya pastoralists, ~7-5.5 ka, with an infiltration of southeast European Chalcolithic Tripolye communities ~6.4 ka, giving rise to both the Corded Ware and Yamnaya when it broke up ~5.4 ka [12].

An influx of such migrants into South Asia would likely have contributed to the CHG component in the GW [VHM:  genome-wide] analysis found across the Subcontinent, as this is seen at a high rate amongst samples from the putative Yamnaya source pool and descendant Central Asian Bronze Age groups. Archaeological evidence suggests that Middle Bronze Age Andronovo descendants of the Early Bronze Age horse-based, pastoralist and chariot-using Sintashta culture, located in the grasslands and river valleys to the east of the Southern Ural Mountains and likely speaking a proto-Indo-Iranian language, probably expanded east and south into Central Asia by ~3.8 ka. Andronovo groups, and potentially Sintashta groups before them, are thought to have infiltrated and dominated the soma-using Bactrian Margiana Archaeological Complex (BMAC) in Turkmenistan/northern Afghanistan by 3.5 ka and possibly as early as 4 ka. The BMAC came into contact with the Indus Valley civilisation in Baluchistan from ~4 ka onwards, around the beginning of the Indus Valley decline, with pastoralist dominated groups dispersing further into South Asia by ~3.5 ka, as well as westwards across northern Iran into Syria (which came under the sway of the Indo-Iranian-speaking Mitanni) and Anatolia [12, 95, 97, 98].

The spread of R1a into South Asia had earlier been securely documented in Peter A. Underhill, et al., "The phylogenetic and geographic structure of Y-chromosome haplogroup R1a", European Journal of Human Genetics (2015) 23, 124–131; doi:10.1038/ejhg.2014.50; published online 26 March 2014.

The precise coalescence of R1a within South Asia was identified in Monika Karmin, et al., "A recent bottleneck of Y chromosome diversity coincides with a global change in culture", Genome Research (2015);

This kind of male migration theory is proposed with arguments based on archeological evidence in the last pages of H.-P. Francfort, “La civilisation de l'Oxus et les Indo-Iraniens et Indo-Aryens”, in: Aryas, Aryens et Iraniens en Asie Centrale (Collège de France. Publications de l'Institut de Civilisation Indienne, vol. 72), G. Fussman, J. Kellens, H.-P. Francfort, et X. Tremblay (eds.) (Paris:  Diffusion de Boccard, 2005) pp. 253-328.  The complete paper is available on the Academia website.

Michael Witzel has favored this, the (Indo-)Aryan Migration view, on linguistic and textual grounds since at least 1995 and was constantly criticized for saying so. See his papers of 1995, 2001:

"Autochthonous Aryans? The Evidence from Old Indian and Iranian Texts."  EJVS (May 2001) pdf.

"Early Indian History: Linguistic and Textual Parameters."  In: Language, Material Culture and Ethnicity: The Indo-Aryans of Ancient South Asia. Ed. G. Erdosy (Berlin/New York: de Gruyter 1995), 85-125; —  Rgvedic history: poets, chieftains and politics, loc. cit. 307-352 combined pdf (uncorrected).

and the substrate paper of 1999:

"Early Sources for South Asian Substrate Languages." Mother Tongue (1999, extra number) pdf

Some relevant Language Log posts:

"Dating Indo-European" (12/10/03)

"The Linguistic Diversity of Aboriginal Europe" (1/6/09)

"Horse and wheel in the early history of Indo-European" (1/10/09)

"More on IE wheels and horses" (1/10/09)

"Inheritance versus lexical borrowing: a case with decisive sound-change evidence" (1/13/09)

"The place and time of Proto-Indo-European: Another round" (8/24/12)

"Irish DNA and Indo-European origins" (12/31/15)

*For those who are interested in the development of aDNA Y-chromosome studies beginning in the 2000s, I have some additional documentation and several relevant papers that I can send to you.

[Thanks to Richard Villems, Toomas Kivisild, and Peter Underhill]

My summer

Jun. 22nd, 2017 11:37 am

Posted by Mark Liberman

… or at least six weeks of it, will be spent at the 2017 Jelinek Summer Workshop on Speech and Language Technology (JSALT) at CMU in Pittsburgh. As the link explains, this

… is a continuation of the Johns Hopkins University CLSP summer workshop series from 1995-2016. It consists of a two-week summer school, followed by a six-week workshop. Notable researchers and students come together to collaborate on selected research topics. The Workshop is named after the late Fred Jelinek, its former director and head of the Center for Speech and Language Processing.

I took part in the first of these annual summer workshops, back in 1995, as a member of the team focused on "Language Modeling for Conversational Speech Recognition".

This summer, I'll be part of a group whose theme is described as "Enhancement and Analysis of Conversational Speech".

One of the group's goals is to do a better job of "diarization", i.e. keeping track of who spoke when in conversations. Existing systems do an especially bad job with overlapping speech, which can be extremely common.
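To make "who spoke when" concrete, here is a minimal sketch (not the workshop's method) of how a diarization output could be represented and how the amount of overlapping speech could be measured. The `Segment` layout, the function name, and the sample times are all hypothetical, and the sketch assumes that a single speaker's own segments never overlap one another.

```python
from typing import List, Tuple

# Hypothetical representation: (speaker label, start in seconds, end in seconds)
Segment = Tuple[str, float, float]


def overlap_seconds(segments: List[Segment]) -> float:
    """Total time during which two or more segments are active at once."""
    events = []
    for _speaker, start, end in segments:
        events.append((start, 1))   # someone starts talking
        events.append((end, -1))    # someone stops talking
    # At equal times, -1 sorts before +1, so segments that merely touch
    # are not counted as overlapping.
    events.sort()

    active = 0        # number of segments currently active
    overlap = 0.0     # accumulated overlapped time
    prev_t = 0.0
    for t, delta in events:
        if active >= 2:
            overlap += t - prev_t
        active += delta
        prev_t = t
    return overlap


# Made-up example: Blue starts half a second before Red finishes.
example = [("Red", 0.0, 2.0), ("Blue", 1.5, 3.0), ("Red", 3.0, 4.0)]
print(overlap_seconds(example))
```

A real system has to infer these segments from audio, which is where overlapping speech causes trouble; the sketch only shows what the target output looks like once the segments are known.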

Here's a graphical representation of (accurate) diarization in a (real) conversation between Red and Blue:

And the same thing continued for a while (though not to the end of the conversation):

As discussed here, turn-taking overlaps are often cooperative rather than competitive — and it would be good to be able to supplement robust diarization with a functional analysis of conversational flow.

As the workshop progresses, I'll post some updates.


Paul Zukofsky

Jun. 21st, 2017 11:45 pm
Posted by Mark Liberman

This strikes me as an unusual obituary: Margalit Fox, "Paul Zukofsky, Prodigy Who Became, Uneasily, a Virtuoso Violinist, Dies at 73", NYT 6/20/2017. It massively violates the precept de mortuis nil nisi bonum, describing its subject at great length as an "automaton" who was "deeply ill at ease with world"; an "arch-bridge troll", full of "unbridled hubris", "disdain for those less gifted than he", and "an ample sense of self-worth"; "swift to run to judgment", "meanspirited, sarcastic, rather bitter"; someone who would "look at [his audience] with utter contempt", and on and on.

Margalit Fox certainly found plenty of sources for these judgments. But this litany of bitter score-settling is completely at odds with my own experience of Paul Zukofsky.

I first met Paul around 1976, when I was employed at Bell Labs in Murray Hill NJ, and he was the music director of the Colonial Symphony in Madison, a few miles west. He was planning to present Bach's Fourth Brandenburg Concerto, and he needed a continuo player. I owned a harpsichord, had once taken a conservatory course in figured bass realization, and occasionally performed with professional and semi-professional chamber groups in the area, so Joan Miller recommended me to him.

Paul was then teaching at Stony Brook, so I trekked out there to audition. That was an amazing experience: while I played the continuo part, Paul, with an occasional glance at the score, played the parts of all three soloists and the rest of the orchestra all at once on the violin. I had never seen anything like it. I managed to keep my jaw off the floor, and made my way through the audition well enough to get the part.

This situation was inherently intimidating, and my own musical gifts were far below Paul's. But he was charming and friendly, interested in talking about Bach's music, and about music theory and the psychology of music, and he left me with a positive feeling about the whole experience.

For a while around that time, Paul became a regular visitor at Bell Labs, where he contributed to some interesting work, including these publications:

Ronald Knoll, Saul Sternberg, and Paul Zukofsky, "Subdivision of the beat: Estimation and production of time ratio by skilled musicians", JASA 1976.
Mark Liberman, Joseph Olive, and Paul Zukofsky, "Studies of metric patterns", JASA 1977.
Saul Sternberg, Ronald Knoll, and Paul Zukofsky, "Timing by Skilled Musicians", in Diana Deutsch, Ed., Psychology of Music, 1982.

Throughout those interactions, I never met the cold, mean, unpleasant man depicted in the NYT obituary. On the contrary, Paul was always smart, engaged, friendly, and even convivial.

Maybe I have a thicker skin than the people who supplied Margalit Fox with so much bile. Or maybe Paul was different in later life than he was when I knew him.

But looking over the obituary, I see two other factors that might be relevant. One is Paul's role as executor of his father's estate — that's a side of him that I never saw, and one that would not have been relevant before Louis Zukofsky died in 1978, which was after most of my interactions with Paul.

And the other factor might be his apparent reluctance to take up the standard role of a violin virtuoso, or at least to limit himself to playing that part. Perhaps he saw me and others at Bell Labs as part of his self-liberation from that role, rather than as part of the world that he needed to escape, and perhaps he therefore interacted differently with us.

Still, I have a feeling that most people could be unlucky enough to be treated to an obituary like the one under discussion. The recipe is clear:  find people with a grudge, people on the other side of arguments, people who were offended on purpose or by accident, people who were disappointed, people with relevant prejudices, and select your quotes to play up the negatives and minimize the positives. The Paul Zukofsky I knew deserves better.

Update — a letter sent by Saul Sternberg to the New York Times:

I believe that this obituary gives a false impression of Zukofsky's personality.  The only indication that he could be a sweet, loving, caring person is the one quote (Kalish) "to those who understood him deeply…"  If you look at the comments on slippedisc.com/2017/06/death-of-an-important-american-violinist-73/ you'll find many who loved him, and some for whom he was a kind and caring mentor. Surely they didn't all "understand him deeply".

It is as if, rather than providing a balanced description, the writer emphasized those aspects of his personality that would fit with her beliefs about his early life and her claims about his "emotional development" having been "sacrificed to professional prowess".

I've known Paul Zukofsky for the past forty years, and although the names of many people have come up in our conversations and correspondence, I've seen no evidence of "his disdain for people less gifted than he".

Also, the obit fails to mention the existence of the Zukofsky Quartet, named in his honor.

Update #2 — from Joshua Gordon:

It was good to read your commentary on the NYTimes obituary for Paul Zukofsky, and I am sympathetic to your experience with him (he was an important mentor to me at Juilliard and beyond). I posted a new Facebook page for anybody who wants to share thoughts or materials on him called "In Memory of Paul Zukofsky", I hope you'll want to contribute to it.

Faimly Lfie

Jun. 21st, 2017 05:07 pm
Posted by Mark Seidenberg

When the parents are psycholinguists, the children get exposed to some weird stuff.

For example, the Stroop effect (words interfere with naming colors, e.g. GREEN RED BLUE) makes a great 4th grade science project; 9 year olds think it’s hilarious. There are lots of fun versions of the task (e.g., SKY FROG APPLE) but prudence dictates avoiding this variant in which taboo words like FUCK COCK PUSSY produced greater interference than neutral words like FLEW COST PASTA (p < .01).

Or, the kid knows that “I see that the clothes on the floor in your room have risen a couple of feet above sea level” means “clean up the mess, please” but also that this is an indirect speech act because the form of the utterance (an assertion) differs from its communicative intent (a request).  Thus enabling exchanges such as “Can you take out the garbage???”  “Is that an indirect speech act?”

I confess that we have actually had dinner conversations about the Transposed Letter Effect, the finding that with brief exposure, subjects frequently misperceive a stimulus such as ODRER as ORDER.  It happens in real life, as in the sign on the left and the company logo (French Connection UK) on the right.


The explanation for the effect is interesting–well, I thought it was–having to do with statistical properties of English orthography and the fact that ODRER is closer to ORDER than to any other word. There's a simple demo of the phenomenon here, if you're interested.
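The "closer to ORDER than to any other word" part can be illustrated with a small sketch. It uses the optimal string alignment distance, a standard edit distance in which swapping two adjacent letters counts as a single edit; that choice, the function names, and the toy lexicon are assumptions made for illustration, not the model used in the research. The sketch also ignores word frequency and the other statistical properties of English orthography mentioned above.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions,
    and swaps of two adjacent letters as one edit each."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "DR" <-> "RD"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Hypothetical toy lexicon, chosen only for illustration.
TOY_LEXICON = ["ORDER", "OLDER", "RIDER", "UNDER", "OTHER"]


def nearest_word(stimulus: str, lexicon=TOY_LEXICON) -> str:
    """Return the lexicon entry with the smallest distance to the stimulus."""
    return min(lexicon, key=lambda word: osa_distance(stimulus, word))


print(osa_distance("ODRER", "ORDER"))  # 1: a single adjacent swap
print(nearest_word("ODRER"))           # ORDER
```

With a plain Levenshtein distance, which has no transposition step, ODRER and ORDER would be two substitutions apart, so treating a swap as one edit is what makes transposed-letter strings such close neighbors of the real word.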

For birthdays and other events, greeting cards with terrible wordplay are just low-hanging fruit. A recent Father’s Day card connected to my interest in accent and dialect:

I’m told that this is an old joke, popular in summer camp several years ago. Still, it does manage to alert a person to how much American regional accents can differ.  On this occasion I psycholinguisplained that it also illustrates the folly of trying to reform English spelling to make the correspondences between spellings and pronunciations consistent, as in Finnish and most other alphabetic writing systems: there would need to be different spellings for each regional pronunciation of even simple words like NICE.

On the inside the card said I was “Rat nas”.  Which is what you call kids who participate in dinner conversations about speech acts and spelling reform.


Posted by Mark Liberman

Adrienne LaFrance, "What an AI's Non-Human Language Actually Looks Like", The Atlantic 6/20/2017:

Something unexpected happened recently at the Facebook Artificial Intelligence Research lab. Researchers who had been training bots to negotiate with one another realized that the bots, left to their own devices, started communicating in a non-human language.  […]

What does this language actually look like? Here’s an example of one of the bot negotiations that Facebook observed:

Bob: i can i i everything else . . . . . . . . . . . . . .
Alice: balls have zero to me to me to me to me to me to me to me to me to
Bob: you i everything else . . . . . . . . . . . . . .
Alice: balls have a ball to me to me to me to me to me to me to me
Bob: i i can i i i everything else . . . . . . . . . . . . . .
Alice: balls have a ball to me to me to me to me to me to me to me
Bob: i . . . . . . . . . . . . . . . . . . .
Alice: balls have zero to me to me to me to me to me to me to me to me to
Bob: you i i i i i everything else . . . . . . . . . . . . . .
Alice: balls have 0 to me to me to me to me to me to me to me to me to
Bob: you i i i everything else . . . . . . . . . . . . . .
Alice: balls have zero to me to me to me to me to me to me to me to me to

Not only does this appear to be nonsense, but the bots don’t really seem to be getting anywhere in the negotiation. Alice isn’t budging from her original position, anyway. The weird thing is, Facebook’s data shows that conversations like this sometimes still led to successful negotiations between the bots in the end, a spokesperson from the AI lab told me. (In other cases, researchers adjusted their model and the bots would develop bad strategies for negotiating—even if their conversation remained interpretable by human standards.) 

This is strikingly reminiscent of Google Translate's responses to certain sorts of nonsensical inputs — and for good reasons. See

"What a tangled web they weave", 4/15/2017
"A long short-term memory of Gertrude Stein", 4/16/2017
"Electric sheep", 4/18/2017
"The sphere of the sphere is the sphere of the sphere", 4/22/2017
"I have gone into my own way", 4/27/2017
"Your gigantic crocodile!", 4/28/2017
"More deep translation arcana", 4/30/2017

The article's author wrote to me this morning to ask some sensible questions, to which I tried to give sensible answers, most of which are quoted — you should read the whole thing. But if you're in a hurry, here are her questions and my answers:

1. Does that truly count as language?

We have to start by admitting that it's not up to linguists to decide how the word "language" can be used, though linguists certainly have opinions and arguments about the nature of human languages, and the boundaries of that natural class.

Are vernacular languages really capital-L languages, rather than just imperfect approximations to elite languages? All linguists would agree that they are. Are sign languages really Languages rather than just ways to use mime to communicate? Again, everyone agrees that they are. Is "body language" really Language in the same sense? Most linguists would say "no", even if they continue to use the term "body language", on the grounds that the gestural dimensions of human spoken communication are different in crucial ways from the core systems of human Language. What about computer languages? Again, it's clear that Python and JavaScript are not Languages in the sense that English and Japanese are, but we go on calling them "computer languages".

So let's divide your question in two:

1a. Is it reasonable to use the ordinary-language word "language" to describe the system that the Facebook chatbots apparently evolved?

Answer: Apparently so. After all, we use that word to describe the ones and zeros of "machine language", which is usually generated by compilers and assemblers for controlling digital hardware, without any humans involved in the process. Though my prediction would be that the Facebook chatbot's communication process is pretty ephemeral, in the sense that it's a sort of PR stunt built on an experimental accident, and in a few years it won't exist even in the sense of having descendants connected by a direct evolutionary chain.

1b. Is the Facebook chatbot's evolved version of English ("Facebotlish"?) like a new kind of human language, say a future version of English?

Answer: Probably not, though there's not enough information available to tell. In the first place, it's entirely text-based, while human languages are all basically spoken (or gestured), with text being an artificial overlay. And beyond that, it's unclear that this process yields a system with the kind of word, phrase, and sentence structures characteristic of human languages.

2. Will machines eventually change the definition of "language" as we know it?

Well, there's already the well-established concept of "computer language", which merits a new word sense, e.g. the OED's sense 1.d. "Computing. Any of numerous systems of precisely defined symbols and rules devised for writing programs or representing instructions and data that can be processed and executed by a computer." And note that some of these systems are specifically devised to be written by computer programs and not by people.

3. Have they already?

In the above sense, yes.

But you should keep in mind that the Facebook chatbots, whatever their performance in specified tasks, are almost certainly not "intelligent" in the general sense, or even the leading edge of a durable approach to digital problem-solving. See e.g. here or here

The "expert systems" style of AI programs of the 1970s are at best a historical curiosity now, like the clockwork automata of the 17th century. We can be pretty sure that in a few decades, today's machine-learning AI will seem equally quaint.

It's easy to set up artificial worlds full of algorithmic entities with communications procedures that evolve through a combination of random drift, social convergence, and optimizing selection — just as it's easy to build a clockwork figurine that plays the clavier.

Are those Facebook chatbots an example of this? Apparently.

Could it be true that at some point digital self-organizing systems will become capable enough to develop their own inter-system communications procedures that evolve over a period of decades or centuries, rather than being scrapped and replaced by human developers starting over again from scratch every few years?

Sure. But are the Facebook chatbots the leading edge of this process?

I seriously doubt it. Are the bots themselves, and their evolved communication procedures, likely to be around in any directly derived form ten years from now?

I'm willing to bet a substantial sum that the answer is "no".

I probably should have written "…clockwork automata of the 18th century" — I misdated the androids in Adelheid Voskuhl's work (e.g. Androids in the Enlightenment) by assimilating it to Descartes' discussion of whether animals are automata.


Posted by Victor Mair

I say "in Taiwan", because this word, 阿沙力, is both in Taiwan Mandarin, where it is pronounced āshālì, and in Taiwanese, where it is pronounced at3sa55lih3.

This is a very common expression in Taiwan, where it is used as the name of restaurants, for instant noodles, beverages, and other products, but most of all to describe someone's personality.

Nick Kaldis says that it is a behavior to which he aspires and wonders whether it is originally Japanese (more about that later).  He and his wife (from Hsinchu) are used to calling someone āshālì 阿沙力 in a very positive way, when their personality is assertive and also very much like this aSaLi stocking brand defines it:

"Āshālì" suǒ tīng qǐlái de gǎnjué zé shì xíngróng wéi rén chǔshì hǎo xiāngchǔ, háoshuǎng bùjū xiǎojié de táiwān rén xìnggé.


"The feeling that you get when you hear the word 'āshālì' describes a person who does things in such a way that it is easy to get along with them, is forthright and doesn't get hung up on trifles: the typical Taiwan personality."

Because it is so protean and ineffable, āshālì 阿沙力 is hard to define and all the harder to translate.

The first two comments below are characterizations of āshālì 阿沙力 by native Taiwanese.

Sophie Ling-chia Wei:

It is very commonly used in today's daily conversation to indicate that someone is very sharp and decisive. Originally it should be a transliteration/ transcription of the word "assertive" into Japanese*. Then it kept being used very frequently in Taiwanese with the pronunciation "assari." In modern society, in Taiwan Mandarin, it is transcribed with Chinese characters as āshālì 阿沙力 or āshālì 阿莎力 to indicate someone is gāncuì 乾脆 ("straightforward"), zhíjiē 直接 ("direct"), shuǎngkuài 爽快 ("frank"), guǒduàn 果斷 ("decisive"), and bù tuōnídàishuǐ 不拖泥帶水 ("doesn't beat around the bush"). Interesting, now this term āshālì 阿莎力 is even borrowed by Mainlanders as a brand name.

*VHM:  This is a fairly common claim in Taiwan, but we shall call it into question later on in this post.

Melvin Lee:

As far as I know, āshālì 阿沙力 is definitely a loanword from the Japanese word assari あっさり and is often used to describe an "honest, direct, easy to get along with" personality. Here is a link to the definition of this term.

I am not sure if the Japanese word あっさり is in turn a loan word from the English word "assertive". However, āshālì 阿沙力 does include the meaning of "being assertive." For example, when a person is having a difficult time making a decision, his/her friends may urge him/her to be more āshālì 阿沙力. Interestingly though, this word is much more associated with males than females. I think this fact reflects the gender culture in Japan and maybe other Asian countries as well where men are encouraged to be more straightforward and decisive.

Linda Chance:

This looks like an interesting case in which Taiwanese writers felt compelled to assign graphs and ended up with a conversion, if not perversion, of the meaning. I have never seen this word written with ateji [VHM:  kanji used to represent the sounds of native or borrowed words in Japanese without much regard to their meaning], and those would not be the choice for them, certainly. Assari あっさり、assarishita あっさりした are used to describe the kind of food one would want to eat today, when the temperature is going up over ninety, something light and refreshing, or people who are not bogged down in heavy emotions during some encounter. "Assertive" is on the other end of the continuum, if anywhere, with respect to assari あっさり.

As for coming from English, it could not be via Japanese, as it does not fulfill the most basic sound conversion rules. "Sa" would have to be "se" and "ri" would have to be "ru", and I suspect that if the form were clipped, it would not be at the "ru" either, but after the "chi" that would be called for, even if it were a case of a borrowed onomatopoeic (for which I can't think of an example).

Nathan Hopson:

Assari あっさり is a mimetic word (gitaigo 擬態語) in Japanese.
As a flavor, it is “simple and light, w/o acid”.
* Add a bit of vinegar to get さっぱり (sappari)
Perhaps it is best described as the opposite of shitsukoi しつこい, which means “insistent/persistent, cloying, repetitive, etc.” (all negative), because this applies to both flavor and attitude/personality.

My JC dictionary (Poketto puroguresshibu Nit-Chū jiten ポケットプログレッシブ日中辞典 (Pocket Progressive JC Dictionary)) gives:

dànbó 淡泊 (“indifferent”);[かんたん 簡単に]jiǎndān 简单 (“simple”)

~shita sūpu したスープ
qīngdàn de tāng 清淡的汤 (“simple / light / plain soup”)
~to kotowarareru と断(ことわ)られる
断然拒绝 (“decisively / categorically refuse”)

According to Nihon kokugo daijiten 日本国語大辞典 (Unabridged dictionary of the Japanese National Language), there are multiple etymological hypotheses for assari あっさり, among which is that it is a mimetic variation on asai 浅い (“shallow”). The っり [VHM: doubling of the consonant + -ri] is a common mimetic ending.

As to whether āshālì 阿莎力 is a borrowing from English or Japanese, I don’t think there’s any doubt that it’s Japanese. Historically and linguistically, it makes more sense. It’s the standard explanation given by Wiktionary, for example.

Here is an example from an article [pdf] in Japanese by a Taiwanese researcher on Sino-Taiwanese linguistic difference.  I can translate the relevant section if necessary, but the basic point here is that assari is clearly marked as Japanese.

Although Mandarin āshālì / Taiwanese at3sa55lih3 阿沙力 is derived from Japanese assari, its meaning has shifted significantly, from "easily; readily; quickly; frankly; openheartedly; lightly (flavored food, applied makeup)" in the latter to "straightforward; direct; frank; decisive; doesn't beat around the bush" in the former.  So far as I am aware, āshālì / at3sa55lih3 阿沙力 is not used to describe a flavor or taste of food directly the way assari is (when āshālì / at3sa55lih3 阿沙力 is applied to foods, restaurants, etc., it is not about the taste or flavor; rather, it is borrowing the positive personality trait to valorize the food, restaurant, etc.).  Perhaps it would not be too far off the mark to say that every borrowing is an adaptation.

[Thanks to Grace Wu and Edward McDonald]

Disparaging trademarks

Jun. 19th, 2017 03:19 pm
Posted by Mark Liberman

"Supreme Court rules government can't refuse disparaging trademarks", ESPN:

The Supreme Court on Monday struck down part of a law that bans offensive trademarks in a ruling that is expected to help the Redskins in their legal fight over the team name.

The justices ruled that the 71-year-old trademark law barring disparaging terms infringes free speech rights.

The ruling is a victory for the Asian-American rock band called the Slants, but the case was closely watched for the impact it would have on the separate dispute involving the Washington football team.

The opinion's "syllabus" is here:

Simon Tam, lead singer of the rock group “The Slants,” chose this moniker in order to “reclaim” the term and drain its denigrating force as a derogatory term for Asian persons. Tam sought federal registration of the mark “THE SLANTS.” The Patent and Trademark Office (PTO) denied the application under a Lanham Act provision prohibiting the registration of trademarks that may “disparage . . . or bring . . . into contemp[t] or disrepute” any “persons, living or dead.” 15 U. S. C. §1052(a). Tam contested the denial of registration through the administrative appeals process, to no avail. He then took the case to federal court, where the en banc Federal Circuit ultimately found the disparagement clause facially unconstitutional under the First Amendment’s Free Speech Clause.

Held: The judgment is affirmed.

Some relevant LLOG posts:

"Fenimore Cooper, call your office", 10/7/2003
"The conventions for expressive content words", 10/11/2003
"Of limes and racial epithets", 1/18/2004
"Mascot names and etymology", 5/25/2004
"Disparaging trademarks and the lexicography of tools", 7/16/2005
"Adverbial license", 7/17/2005
"The origin of redskin", 3/26/2006
"When should linguists disclose a conflict?", 12/15/2009
"The Slants vs. the USPTO", 8/21/2013
"'Redskins' ruled disparaging", 6/18/2014


Posted by Mark Liberman

In an interview yesterday with Chris Wallace, did Donald Trump’s lawyer Jay Sekulow state that the president is being investigated by Robert Mueller (“Jay Sekulow on reports Bob Mueller has widened investigation”, Fox News 6/18/2017)? It certainly sounds like he did:

But Chris Wallace is frustrated to find that a few seconds later, Sekulow nevertheless asserts that he didn’t say any such thing.

Sekulow has a plausible argument that his apparent factual assertion “…now he’s being investigated…” was in the middle of an implicitly hypothetical discussion of constitutional issues that would arise if he were being investigated.

It’s much less clear that Donald Trump’s tweeted statement “I am being investigated” was hypothetical:

So Mr. Sekulow may not have directly contradicted himself within the span of a few seconds, but he certainly failed to prevent Chris Wallace from coming to that conclusion, and also failed to avoid supporting some negative stereotypes of lawyers.

Since Sekulow is not a stupid or verbally inept person, this puzzles me. Or was his purpose to focus attention on the question of whether or not there’s an investigation, and the meta-question of whether or not Sekulow admitted there’s an investigation, rather than on the issues that an investigation may be investigating?

Here’s the whole context, or at least the whole context as broadcast by Fox News:

Watch the latest video at video.foxnews.com

Putting the kibosh on bosh

Jun. 19th, 2017 02:35 am

Posted by Victor Mair

In the “Cultural disappropriation” section of the current The Economist, there’s an entertaining and informative article on the latest attempt to purify Turkish:

“Turkey’s president wants to purge Western words from its language: A new step in Recep Tayyip Erdogan’s campaign against foreign influences”

The whole business is both humorous and hopeless:

Mr Erdogan started by ordering the word “arena”, which reminded him of ancient Roman depravity, removed from sports venues across the country. Turkey’s biggest teams complied overnight. Vodafone Arena, home of the Besiktas football club, woke up as “Vodafone Stadyumu”. Critics wondered what the Turkish language had gained by replacing one foreign-derived word with another.

This is not the first time in modern Turkish history that a strong leader has set out to purify the Turkish language of alien elements.  The founder of the Turkish Republic, Kemal Ataturk, did it in the thirties, when he attempted not only to rid the mother tongue of words having Arabic or Persian roots, but also to discard the Perso-Arabic alphabet in which they were written.  For details see Geoffrey Lewis, The Turkish Language Reform: A Catastrophic Success (Oxford Linguistics) (1999; new ed., 2002).

Here are the last two paragraphs of The Economist article:

Because so much abstract vocabulary had come from Arabic and Persian, this in effect created a new language. From one generation to the next, the country’s cultural history was cut off. Mr Erdogan seems to want to turn the clock back, complete with imperial nostalgia and resentment towards the West. In 2014 he proposed introducing mandatory high-school classes in Ottoman Turkish, which survives today only among linguists, historians and clerics. The plan was shelved after a popular backlash.

The offensive against Western loanwords will probably meet a similar fate. In an interview, the TDK’s* head, Mustafa Kacalin, clarified that it would apply only to “bizarre” foreign words incomprehensible to most Turks. The limits became clear in Mr Erdogan’s own speech on May 23rd, in which he denounced loanwords by using a loanword. They were not, he said, “sik” (“chic”). Many Turks no doubt consider the whole thing a load of bosh—from the Turkish bos, “nonsense”.

*TDK = Turkish Language Institute

I’m afraid that, no matter how hard Erdogan or any other purist huffs and puffs, they will not be able to blow away the foreign building blocks which have been used in the construction of the house that is Turkish.  I am the proud owner of the big Redhouse Turkish-English dictionary (I also have on the shelves of my library the Redhouse English-Turkish dictionary which is nearly as large — both of them are around twelve hundred pages in length).  Looking through the pages of Redhouse, I see an enormous number of words from Persian, Arabic, Greek, French, Spanish, English, German, Albanian, Armenian, Hebrew, Russian, Polish, Hungarian, Bulgarian, Serbo-Croatian, Romany, Chinese, Japanese, and Malay (sorry if I missed something).

The same is true of other modern Turkic languages besides Anatolian Turkish.  Henry G. Schwarz’s An Uyghur-English Dictionary, about a thousand pages long, is full of words borrowed from Arabic and Persian.  As much as 75% of the vocabulary of Uyghur is Perso-Arabic.  During the 20th century Russian words came flooding in, and now Chinese is having a heavy impact.

If we go back to the earliest traceable stage of the Turkic lexicon, as collected in Gerard Clauson’s An Etymological Dictionary of Pre-thirteenth-century Turkish (Clarendon, 1972) and other works of scholarship on early Turkic,  we find words derived from many languages, including Indic (Sanskrit), Iranic (Sogdian, Khotanese), Mongolic / Khitan, Samoyedic, and Sinitic (here again I may have missed some).  The language that served as the source of a number of Old Turkic words that intrigued me the most when I was perusing Clauson was Tocharian, since it may have been derived from the speech of the Bronze Age mummies of Eastern Central Asia and plays such an important role in discussions of the early development of Indo-European (“Early Indo-Europeans in Xinjiang” [11/19/08]).

Is there any language on earth today that is “pure” in the sense of having no lexical borrowings or other types of influences of any sort from other languages?

[Thanks to Juha Janhunen and Marcel Erdal]

“As many people as not”

Jun. 18th, 2017 01:49 pm
Posted by Mark Liberman

A reader from India, apparently not satisfied with the responses from WordReference and StackExchange, writes to express his problem with the phrase “They kill as many people as not”, found in an article by Anne Lamott (“Anne Lamott shares all that she knows: ‘Everyone is screwed up, broken, clingy, and scared’”, Salon 4/10/2015).

“As many people as __” is routine, so presumably the problem is “as not”.

And in fact “as not” is also routine, though it’s an example of a somewhat restricted construction — sometimes not can be the elliptical remnant of a negative verb phrase or a whole negative clause:

This kills as many people as it does not kill. = This kills as many people as not.
As many people agreed as did not agree. = As many people agreed as not.

This kind of ellipsis doesn’t work in all contexts:

*We registered the people who agreed and also the people who not.

Particular instances have become common collocations, or even idioms; thus as often as not can be used as a pre-predicate adverb, as in this passage from M.R. James, “The Austin Canons in England in the Twelfth Century”, Journal of Theological Studies 1904:

Nor must we be led away by the term Minster, and imagine that there were numerous small isolated monasteries in the kingdom. In the time of Beda we know that there were settlements of a vague kind of monasticism, but the head of these houses was as often as not married and the churches had been handed down from father to son, and they had by this time fallen into the hands of those who were called secular clergy and were as often as not married men. The term Minster, as we have it in Ilminster, Charminster, Axminster, Banwell Minster, Cheddar Minster, seems to denote a church to which a resident priest was attached. The several Whitchurches in the south-west of England are all called Album Monasterium and as often as not Whytminster.

In “A Vocabulary of Thinking” Gertrude Stein turned this into a sort of verbal finger spinner: “As often as not as often as not they as often as not were to be going away.”

But anyhow, “as many people as not” is Out There:

[link] Hype cuts both ways and may turn off as many people as not.

[link] Three times as many people as not favour mixed-use, smart growth communities, but most still want to live in a stand-alone house.

[link] These methods work for just as many people as not.

[link] It seems that as many people as not who go there do the same – I actually got the idea from reading the Bahamas forums here on Tripadvisor.

And “as many X as not” is of course more common:

[link] Either way, the verdict already seems clear that as many Americans as not are defaulting to the mediocre in principle, not merely in practice.

[link] So almost as many voters as not were seasoned enough in Kentucky politics to know that the mention of an efficient and effective Frankfort is no more than an oxymoron.

[link] Nudist beaches and camps (Freie Koerper Kultur or ‘free body culture’) are common all over the county, and in most public swimming places, as many women as not will be topless.

So I hope that the curious person from India will be satisfied with this explanation. But If Not, Then Oh Well.

Death by french fries

Jun. 18th, 2017 11:00 am
Posted by Geoffrey K. Pullum

The Daily Telegraph did not do much for its reputation, at least in my eyes, when it confused the defense with the prosecution after a celebrity sexual assault mistrial. Nor when it recently consulted me about whether there were grammar mistakes on a banknote, learned that there clearly were not, but went ahead and published the claim that there were anyway. Now for a sample of the Telegraph’s science reporting, written by Adam Boult, who I suspect didn’t complete his statistics course:

That’s right: although your probability of dying is one hundred percent, just like mine, the Telegraph has found a study saying you can double it by eating french fries.

On the left you can see the Telegraph’s picture of Britain’s brave prime minister courting death by eating fries regardless (why antagonize the fast food industry when you lack a majority in parliament?). I will not bother to go through the business of correcting the math and explaining what the study really said, because that has been nicely done in an article in The Spectator by Christopher Snowdon. Suffice it to say that just when I think I’ve seen the dumbest science story headline ever, along comes a dumber one.

The most important word in Finnish

Jun. 17th, 2017 05:53 pm
Posted by Mark Liberman

Of course there are many words in any language that are similarly protean. In English, try “Okay”. Or just “mm”…

Defense counsel for the victim?

Jun. 17th, 2017 04:27 pm
Posted by Geoffrey K. Pullum

A truly Freudian slip in a story in the UK conservative newspaper the Daily Telegraph, speaking volumes about what goes wrong with so many rape and sexual assault prosecutions:

Camille Cosby, wife of the entertainer, issued a statement, read out by an associate on the court steps in a dramatically-delivered speech.

She attacked the judge as biased, and said the defence were “totally unethical.”

The defense? Andrea Constand and the other brave women who have accused Bill Cosby (they say he drugged them so he could enjoy sexual gratification without their consent) were not in the dock, and the lawyers arguing their case were not the defense team, but the prosecutors. The Telegraph journalist, Harriet Alexander, has apparently reversed the roles of the accused’s defense and the district attorney.

Women who have been assaulted have often reported feeling like they are the ones on trial. Here is the Telegraph actually putting it that way in cold newsprint (attributed wrongly to Camille Cosby).

The error is apparently down to Harriet Alexander (and of course the Telegraph editors who failed to spot what she had written). ABC News, for example, reports Mrs Cosby as saying: “How do I describe the counsels for the accusers? Totally unethical.” In the UK, the Guardian also has it right. Likewise all other accounts I can find.

I don’t know whether the Telegraph is yet aware of the slip, but at 5 p.m. UK time (noon Eastern time) it is still online. I conclude with the obligatory screenshot:

