Linguistic tools for the supervillain

Oct. 18th, 2017 03:30 pm
[syndicated profile] languagelog_feed

Posted by Mark Liberman

In celebration of Geoff Pullum's 700th LLOG post, "World domination and threats to the public", we'll be meeting for a quiet (virtual) drink this evening. But meanwhile I'll quietly suggest that Geoff has been too hasty in joining Randall Munroe at xkcd in assigning to the field of Linguistics a "low likelihood of being a crucial tool for a supervillain, and low probability of anything breaking out of the research environment and threatening the general population".

In fact LLOG posts have described at least two fictional counter-examples over the years, and I expect that commenters will be able to suggest some others.

There's "La septième fonction du langage" (8/24/2017), describing Laurent Binet's novel of the same name, which imagines that Roman Jakobson extended his six functions of language with a secret seventh function, designated as the “magic or incantatory function,” whose mechanism is described as “the conversion of a third person, absent or inanimate, to whom a conative message is addressed”. Instructions for using this seventh function were powerful enough to ensure the election of François Mitterrand, and motivated an international police operation to prevent them from falling into more dangerous hands.

And there's also "Digitoneurolinguistic hacking" (2/4/2011) in which I quoted the Wikipedia entry for Neal Stephenson's 1992 novel Snow Crash:

The book explores the controversial concept of neuro-linguistic programming and presents the Sumerian language as the firmware programming language for the brainstem, which is supposedly functioning as the BIOS for the human brain. According to characters in the book, the goddess Asherah is the personification of a linguistic virus, similar to a computer virus. The god Enki created a counter-program which he called a nam-shub that caused all of humanity to speak different languages as a protection against Asherah, supposedly giving rise to the biblical story of the Tower of Babel. […]

As Stephenson describes it, one goddess/semi-historical figure, Asherah, took it upon herself to create a dangerous biolinguistic virus and infect all peoples with it; this virus was stopped by Enki, who used his skills as a "neurolinguistic hacker" to create an inoculating "nam-shub" that would protect humanity by destroying its ability to use and respond to the Sumerian tongue. This forced the creation of "acquired languages" and gave rise to the Biblical story of the Tower of Babel. Unfortunately, Asherah's meta-virus did not disappear entirely, as the "Cult of Asherah" continued to spread it by means of cult prostitutes and infected women breast feeding orphaned infants …

Since these examples belong more to the realm of fantasy than hard science fiction, I have to admit that Geoff is probably right about our field being "a safe thing to work on" — at least if you have a positive opinion of the  various modern commercial and governmental applications of computational linguistics.

 

Posted by Geoffrey K. Pullum

Linguistics is in the most desirable quadrant according to today's xkcd: low likelihood of being a crucial tool for a supervillain, and low probability of anything breaking out of the research environment and threatening the general population.

But I'm not at all sure that everything is positioned correctly. Molasses storage should be further to the right (never forget the Great Boston Molasses Flood of 1919); dentistry should be moved up (remember Marathon Man); robotics in its current state is too highly ranked on both axes; and entomology, right now (October 18, 2017), in addition to being slightly too low, is spelled wrong. Lots to quibble about, I'd say. But not the standing of linguistics as a safe thing to work on.

Randall Munroe did not pick molasses as a random threat, of course; his mouseover alt text reads: "The 1919 Great Boston Molasses Flood remained the deadliest confectionery containment accident until the Canadian Space Agency's 2031 orbital maple syrup delivery disaster."

And I think the misspelling of entomology must be another case of him toying with us; he knows people confuse etymology with the study of insects: see https://xkcd.com/1012/. I think he's just messing with our heads. As usual.

Thanks to Joan Maling and Meredith Warshaw.

"Artist=President Barack Obama"

Oct. 17th, 2017 04:00 pm

Posted by Mark Liberman

Alex Jones, contact LLOG immediately! Never mind Pizzagate, never mind Sandy Hook, never mind the FEMA concentration camps, never mind the fake moon landings. This morning I stumbled on evidence, lying around in plain sight, for a systematic program of deception so huge — and yet so improbable — that even InfoWars listeners will find it hard to believe: Donald Trump is actually Barack Obama in disguise.

For years, I've been collecting and analyzing the weekly addresses of various American presidents — see e.g. "Political sound and silence", 2/8/2016; "Some speech style dimensions", 6/27/2016; "Trends in presidential pitch", 5/19/2017; "Trends in presidential pitch II", 6/21/2017.

Today I was catching up with Donald Trump's weekly addresses, downloading the .mp3 files from whitehouse.gov. The most recent weekly address is available at

https://www.whitehouse.gov/featured-videos/video/2017/10/13/101317-weekly-address

with the mp3 download link

https://www.whitehouse.gov/videos/2017/October/20171013_Weekly_Address.mp3

After downloading the mp3 file, in order to check its characteristics, I ran soxi. I've done this before, but in the past I just looked at the things I cared about, namely the sampling frequency and number of channels. But this time, I happened to look at the ID3 metadata fields as well:

Input File : '20171013_Weekly_Address.mp3'
Channels : 2
Sample Rate : 16000
Precision : 16-bit
Duration : 00:03:26.17 = 3298752 samples ~ 15462.9 CDDA sectors
File Size : 3.45M
Bit Rate : 134k
Sample Encoding: MPEG audio (layer I, II or III)
Comments :
Title=Weekly Address
Artist=President Barack Obama
Album=The White House
Tracknumber=1
Year=2016
Genre=12

I wondered whether this was a one-time glitch, so I checked the history. The first of President "Trump"'s weekly addresses is available at

https://www.whitehouse.gov/featured-videos/video/2017/02/03/weekly-address

with the mp3 download link

https://www.whitehouse.gov/videos/2017/February/20170203_Weekly_Address.mp3

And the metadata is the same:

Input File : '20170203_Weekly_Address.mp3'
Channels : 2
Sample Rate : 16000
Precision : 16-bit
Duration : 00:04:20.24 = 4163904 samples ~ 19518.3 CDDA sectors
File Size : 4.27M
Bit Rate : 131k
Sample Encoding: MPEG audio (layer I, II or III)
Comments :
Title=Weekly Address
Artist=President Barack Obama
Album=The White House
Tracknumber=1
Year=2016
Genre=12

In fact this is consistent in all of the Weekly Addresses from Donald Trump's White House.
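A check like this is easy to script. What follows is only a minimal sketch in Python, assuming the soxi report has already been captured as text in the layout shown above; the function names parse_soxi_comments and mismatched_artist are hypothetical, and "The White House" is used as the expected artist purely for illustration.

```python
def parse_soxi_comments(report: str) -> dict:
    """Return the Key=Value pairs that follow the 'Comments' line of a soxi report."""
    fields = {}
    in_comments = False
    for raw_line in report.splitlines():
        line = raw_line.strip()
        if not in_comments:
            # The metadata block begins after a line such as "Comments :"
            if line.split(":")[0].strip() == "Comments":
                in_comments = True
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def mismatched_artist(report: str, expected: str) -> bool:
    """True when the report has an Artist field that differs from the expected value."""
    artist = parse_soxi_comments(report).get("Artist")
    return artist is not None and artist != expected


# Report text copied from the first example above.
SAMPLE = """Input File : '20171013_Weekly_Address.mp3'
Channels : 2
Sample Rate : 16000
Precision : 16-bit
Duration : 00:03:26.17 = 3298752 samples ~ 15462.9 CDDA sectors
File Size : 3.45M
Bit Rate : 134k
Sample Encoding: MPEG audio (layer I, II or III)
Comments :
Title=Weekly Address
Artist=President Barack Obama
Album=The White House
Tracknumber=1
Year=2016
Genre=12
"""

if __name__ == "__main__":
    print(parse_soxi_comments(SAMPLE))
    print(mismatched_artist(SAMPLE, "The White House"))
```

Running the same function over the report for every downloaded file would show at a glance which categories of recording carry the inherited Artist field.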

It's not an issue in all mp3 encodings from the White House — thus Melania Trump's 10/17/2017 "Hurricane Relief PSA" is attributed to "Artist=The White House", even if the year is still given as 2016:

Input File : '20171011_FLOTUS_DTC.mp3'
Channels : 2
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:31.50 = 504000 samples ~ 2362.5 CDDA sectors
File Size : 696k
Bit Rate : 177k
Sample Encoding: MPEG audio (layer I, II or III)
Comments :
Title=FLOTUS DTC – T6
Artist=The White House
Album=The White House
Tracknumber=1
Year=2016
Genre=12

And the same is true for the president's joint news conference with PM Theresa May back in January:

Input File : '20170127_POTUS_and_PM_May_JPA.mp3'
Channels : 2
Sample Rate : 16000
Precision : 16-bit
Duration : 00:18:19.20 = 17587168 samples ~ 82439.9 CDDA sectors
File Size : 17.8M
Bit Rate : 130k
Sample Encoding: MPEG audio (layer I, II or III)
Comments :
Title=POTUS and PM May JPA
Artist=The White House
Album=The White House
Tracknumber=1
Year=2016
Genre=12

It's just the weekly addresses that are attributed to "President Barack Obama".

By the way, you may be as disappointed as I was to learn that the "Genre=12" just means "Other" — I was hoping for maybe "[23] => Pranks" or "[58] => Cult" or "[136] => Christian Gangsta".

Jokes aside, what this means is presumably that the Trump White House inherited a recording and web-distribution set-up from the Obama White House, and neglected to change the ID3 metadata information for various categories of material.

 

Invitational spam from a junk journal

Oct. 17th, 2017 03:04 pm

Posted by Geoffrey K. Pullum

I continue to be astonished by the sheer volume of the junk email I get from spam journals and organizers of spamferences, and by the utter linguistic ineptitude of the unprincipled hucksters responsible for the spam. Every month I get dozens of new-journal announcements, calls for papers, requests for conference attendance, subscription offers, and so on. Today I got a prestige invitation based flatteringly on my published work. It began thus:

After careful evaluation and reading your article published in Journal of Logic, Language and Information entitled "On the Mathematical Foundations of", we decided to send you this invitation.

Clearly the careful evaluation and reading did not enable them to get to the end of my title (it does not end in of). And what was the invitation?

In light of your remarkable achievements in Critical Care, we would like to invite you to join the Editorial Board of Journal of Nursing.

Nursing. I'm an expert in critical care nursing, apparently. If the email were not so clearly machine-generated, I could almost have seen it as a cruel allusion to my year of looking after my wife Tricia before she died last year. But no, it's not that. They claim to have ascertained my distinction in critical care from their careful reading of a paper of which the full title is "On the mathematical foundations of Syntactic Structures." It's a technical examination of the formalism of Noam Chomsky's first book on syntactic theory (Journal of Logic, Language and Information 20: 277-296, 2011).

Almost all of the hundreds and hundreds of new rip-off journals who send me this sort of spam are based in China. This one "is supported and partially financed by the hosting organization, Beijing Spring City Educational Publications Research Center."

The support of this research center has allowed the publishers "to reduce the OA article publishing charges from $800 to $150 (additional $50 applied if print version is required)." So if you want to see your article about nursing in print, you send them $200. And I suspect that when choosing whether to publish your paper they will exercise all the care they showed in reading my syntax paper and confirming my credentials in critical care.

There are many things to worry about in connection with the birth of flocks of spam journals, scores at a time: confusion for students, pollution of the scientific literature, degrading of the concept of a refereed journal, publication of ill-reviewed junk science, and (if even a few libraries occasionally take out misguided subscriptions to these crap journals) waste of library budgets.

Gross syntactic errors in promotional material provide an almost infallible indicator of spamhood in a journal. Not many journals send unsolicited email to advertise themselves, but the few promotional emails I occasionally get from proper journals are always at least literate. Whereas this one says:

Our journal, Journal of Nursing, is a new journal which urgently needs professional like you to join our editorial board and help and support the journal to a healthy grow.

I hope none of you professional will support it to a healthy grow. You don't need to be much of a sleu to know they are not telling the tru; their journal is not wor one twelf of the paper that it costs an extra $50 to be printed on.

Posted by Victor Mair

Bilibili (bīlībīlī 哔哩哔哩; B zhàn B站, "B site / station") "is a video sharing website themed around anime, manga, and game fandom based in China, where users can submit, view, and add commentary subtitles on videos" (Wikipedia).  When you register for this site, you're supposed to declare whether you're M(ale) or F(emale), in which case your posts will be referred to respectively as "tā de 他的" ("his") and "tā de 她的" ("hers").  If you do not specify your gender, your posts will be referred to as "ta的" or "TA的", i.e., neither M(ale) (tā de 他的) nor F(emale) (tā de 她的).

Here's a screenshot of a friend's bilibili page showing this usage:

Cf. also:

What seems to have happened over the long haul during the last century has been first a gendering of the third person pronoun, then a degendering, then a regendering accompanied by another degendering….  It's enough to make your head spin.  But all of that is in the written language: 他她它 ("he, she, it"), etc.  In the spoken language, they remain constant: tā.

[Thanks to Alex Wang]

Paramilitary

Oct. 16th, 2017 12:51 pm

Posted by Mark Liberman

Does Spanish paramilitar have a different meaning than English paramilitary, or at least stronger negative connotations? This question has recently become the focus of reaction to a New Yorker article by Jon Lee Anderson, "The increasingly tense standoff over Catalonia's independence referendum", 10/4/2017.

The first paragraph of Anderson's article (emphasis added):

Voting rights have been under siege in the U.S. in recent years, with charges of attempted electoral interference, legislation that seeks to make access to the polls more difficult, and gerrymandering, in a case that reached the Supreme Court this week. But no citizens here or in any democracy expect that they may be attacked by the police if they try to vote. Yet that is what happened on Sunday in the Spanish region of Catalonia, where thousands of members of the Guardia Civil paramilitary force, and riot police, were deployed by the central government in Madrid to prevent the Catalans from holding an “illegal” referendum on independence from Spain.

In El País, Antonio Muñoz Molina accused Anderson of lying ("En Francoland: En Europa o América, les gusta tanto el pintoresquismo de nuestro atraso que se ofenden si les explicamos todo lo que hemos cambiado"):

Pocas cosas pueden dar más felicidad a un corresponsal extranjero en España que la oportunidad de confirmar con casi cualquier pretexto nuestro exotismo y nuestra barbarie. Hasta el reputado Jon Lee Anderson, que vive o ha vivido entre nosotros, miente a conciencia, sin ningún escrúpulo, sabiendo que miente, con perfecta deliberación, sabiendo cuál será el efecto de su mentira, cuando escribe en The New Yorker que la Guardia Civil es un cuerpo “paramilitar”.

("In Francoland: Both Europe and the US love what they see as Spain’s quaint backwardness so much that they feel insulted when we explain to them how much we have changed"):

Few things make a foreign correspondent in Spain happier than the opportunity to corroborate our exoticism and our brutality. Even the renowned Jon Lee Anderson, who lives or has lived among us, is deliberately lying, with no qualms he is aware that he is lying and aware of the effect his lies will have, when he writes in The New Yorker that the Civil Guard is a “paramilitary” force. [translation from the El País web site]

This has resulted in an energetic discussion on Twitter (Twitzkrieg?), in which Anderson's position is that many English-language sources call the Guardia Civil "a paramilitary police force" or something similar, e.g.

and that Antonio Muñoz Molina is using a meaning difference between English and Spanish in a disingenuous way, e.g.

Before looking into it, my understanding of the English word paramilitary aligned with Anderson's, namely that it means "organized along military lines", whether in reference to governmental organizations that are not part of the military, or to civilian militia-like entities. It's easy to find examples in English where paramilitary is applied to non-military governmental organizations, e.g. these examples from Google Books:

Correctional officers (C.O.s) were organized in accordance with a rigid paramilitary chain of command.

There is an obvious need to change the bureaucratic paramilitary structure of police organizations, so prevalent in the majority of police organizations around the world.

But on looking into it, I found that things are more complex. I was surprised to find that the OED's only relevant gloss would specifically NOT apply to a police organization like Spain's Guardia Civil:

Designating, of, or relating to a force or unit whose function and organization are analogous or ancillary to those of a professional military force, but which is not regarded as having professional or legitimate status.

The OED's earliest citation is from 1935, but seems to originate in the 1934 "Reply of the United Kingdom Government" at a League of Nations "Conference for the Reduction and Limitation of Armaments". The OED citation is the first sentence of the following:

A difficult problem has been raised in regard to the so-called "paramilitary training" — i.e., the military training outside the army of men of military age. His Majesty's Government suggested that such training outside the army should be prohibited, this prohibition being checked by a system of permanent and automatic supervision, in which the supervising organisation should be guided less by a strict definition of the term "military training" than by the military knowledge and experience of its experts. They are particularly glad to be informed that the German Government have freely promised to provide proof, through the medium of control, that the S.A. and the S.S. are not of a military character, and have added that similar proof will be furnished in respect of the Labour Corps. It is essential to a settlement that any doubts and suspicions in regard to these matters should be set and kept at rest.

The earliest use of the term in the New York Times is in a report about the same discussions —

"Simon to the Commons", 4/9/1935: (Following is the text of the account given to the House of Commons today by Foreign Secretary Sir John Simon of conversations recently held by him and Anthony Eden, Lord Privy Seal, with leading officials in Berlin, Moscow, Prague and Warsaw)

Regarding land armaments, Herr Hitler stated that Germany required thirty-six divisions, representing a maximum of 550,000 soldiers of all arms, including a division of Schutzstaffel and militarized police troops. He asserted that there were no paramilitary formations in Germany.

The next example has the same negative connotations and the same association with fascist groups — "France suspects Klan counterpart", NYT 11/17/1937:

The question of whether a French counterpart to the Ku Klux Klan really exists was again raised today through the arrest of a wealthy Lille contractor, Rene Anceaux, M. Vosselm, one of his employes, and Gerard de la Motte-Saint Pierre on charges which remain unspecified, but are in the case of M. Anceaux plotting against the security of the State and for the others possessing weapons of war and "association with wrongdoers." […]

M. Anceaux served as an officer during the World War and was wounded. He was the president of the Lille branch of the dissolved Rightist "Paramilitary League."

The 1939 New Jersey statutes contain a law using the term in a similar way:

Any 2 or more persons who assemble as a paramilitary organization for the purpose of practicing with weapons are disorderly persons.

where

As used in this act, “paramilitary organization” means an organization which is not an agency of the United States Government or of the State of New Jersey, or which is not a private school […]

So in English as well as in Spanish (and French and presumably other languages), the term paramilitary and its cognates seem to have originated in the 1930s in reference to fascist groups "whose function and organization are analogous or ancillary to those of a professional military force, but which [are] not regarded as having professional or legitimate status", as the OED put it.

At some point, the "not regarded as having professional or legitimate status" clause seems to have faded away — though perhaps without being totally lost, since the term continues to be used to refer to non-governmental as well as governmental but non-military organizations. Thus "Charlottesville Joins Suit Against Paramilitary Groups Connected to August 12", NBC News 10/12/2017:

Charlottesville is joining a suit to prevent what it calls unauthorized paramilitary groups from returning to the city.

Georgetown Law Institute for Constitutional Advocacy and Protection filed a complaint Thursday, October 12, asking Charlottesville Circuit Court to, "prohibit key Unite the Right organizers and an array of participating private paramilitary groups and their commanders from coming back to Virginia to conduct illegal paramilitary activity."

And my impression is that when someone uses the word "paramilitary" in connection with police forces, their attitude is often a critical one. Thus "Paramilitary police: Cops or soldiers?", The Economist 3/20/2014, begins with the subhed "America's police have become too militarised", and notes that

Special Weapons and Tactics (SWAT) teams (ie, paramilitary police units) were first formed to deal with violent civil unrest and life-threatening situations: shoot-outs, rescuing hostages, serving high-risk warrants and entering barricaded buildings, for instance. Their mission has crept. […]

Kara Dansky of the American Civil Liberties Union, who is overseeing a study into police militarisation, notices a more martial tone in recent years in the materials used to recruit and train new police officers. A recruiting video in Newport Beach, California, for instance, shows officers loading assault rifles, firing weapons, chasing suspects, putting people in headlocks and releasing snarling dogs.

This is no doubt sexier than showing them poring over paperwork or attending a neighbourhood-watch meeting. But does it attract the right sort of recruit, or foster the right attitude among serving officers? Mr Balko cites the T-shirts that some off-duty cops wear as evidence of a culture that celebrates violence (“We get up early to beat the crowds”; “You huff and you puff and we’ll blow your door down”).

Anyhow, there can be little question that Spain's Guardia Civil is a "paramilitary police force" in the current English-language sense of the word.

And it's not clear to me that the current Spanish usage is actually different. Thus the Real Academia's Diccionario de la lengua española defines paramilitar as

1. adj. Dicho de una organización civil: Dotada de estructura o disciplina de tipo militar.

without any stipulation of illegitimacy. And since the same dictionary defines civil in the relevant sense as "Que no es militar ni eclesiástico o religioso", and since the Guardia Civil is self-defined as "civil", it seems that paramilitar ought to apply to that organization without any untruthful intent or effect.

[h/t David Lobina]

 

 

Five things

Oct. 14th, 2017 10:04 pm

Posted by Mark Liberman

I've noticed recently that there's a tendency for things in the media to come in fives. Thus recently at The Hill (warning – autoplay videos): "Five things to know about Trump and NAFTA", "Five things to know about Trump’s controversial ObamaCare decision", "5 things to watch for at campaign cash deadline", "Five things to know about Trump’s immigration principles", "Five things to watch as Trump visits Puerto Rico", etc.

At the Washington Post: "Five things to watch in Alabama’s special election", "Five story lines to watch as NBA training camps get underway", "If Trump really wants to fix troubled schools, here are five things he could do", "Why are there protests in Poland? Here are the five things you need to know", "Five things I learned about Russia last week", etc.

At the New York Times: "Esteem, Money and Mystery: 5 Things to Know About the Nobels", "Five Things I Hate About New Cars", "Five Things to Remember Before You Renovate", "Five Things to Do This Weekend", "Five Things T Editors Are Really Into Right Now", etc.

At Politico: "5 things we learned from the Senate's Russia probe update", "Five things to watch in the Alabama runoff election", "Virginia governor's primary: 5 things to watch", "SESSIONS TESTIFIES TODAY – Five things to watch during today’s hearing", "5 things to know about Trump's FBI pick Christopher Wray", etc.

At The Independent: "Five things we learned from Crystal Palace's stunning upset victory over Premier League champions Chelsea", "Five things to look out for when the IMF and the World Bank meetings happen in Washington this week", "Five things we learned from Watford's superb comeback win against a misfiring Arsenal", "Five things to look out for in the economy this week", "Five things to bear in mind as Hurricane Irma hits the US", etc.

Things come in other cardinalities, of course, but in general five sticks out:

Source      two things  three things  four things  five things  six things  seven things
Bing News   16.5M       8.39M         2.35M        17.4M        3.84M       2.71M
The Hill    738         263           66           967          9           34
WaPo        5952        1923          438          1174         159         145
Politico    1162        358           87           453          59          57
Atlantic    1830        464           98           170          19          14
Economist   4180        1580          128          252          15          16

I wonder when the press turned pentatonic?

Anyhow, these days the ratio of "five things" to "four things" seems to be a kind of click-baitiness index.
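For anyone who wants the index spelled out, here is a small Python sketch that divides the "five things" count by the "four things" count for each source, using the figures tabulated above. The name clickbait_index is just a label invented for this illustration, and the Bing News values are the rounded millions shown in the table.

```python
# (four things, five things) counts per source, copied from the table above.
COUNTS = {
    "Bing News": (2.35e6, 17.4e6),
    "The Hill": (66, 967),
    "WaPo": (438, 1174),
    "Politico": (87, 453),
    "Atlantic": (98, 170),
    "Economist": (128, 252),
}


def clickbait_index(four_things: float, five_things: float) -> float:
    """Ratio of 'five things' hits to 'four things' hits."""
    return five_things / four_things


def ranked_sources(counts: dict) -> list:
    """Sources sorted from highest to lowest index."""
    return sorted(counts, key=lambda name: clickbait_index(*counts[name]), reverse=True)


if __name__ == "__main__":
    for name in ranked_sources(COUNTS):
        print(f"{name:10s} {clickbait_index(*COUNTS[name]):6.2f}")
```

On these numbers The Hill comes out highest, at roughly 14.7, and The Atlantic lowest, at roughly 1.7.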

 

Easy versus exact

Oct. 14th, 2017 06:49 pm

Posted by Victor Mair

Ever since people started inputting Chinese characters in computers, I've had an intense interest in how they do it, which systems are more efficient, and why they choose the particular ones they adopt.  For the first few decades, because all inputting systems presented significant obstacles and challenges, I remained pretty much of an onlooker because I didn't want to waste my time struggling with cumbersome methods.  It's only after I discovered how simple and fast it is to use Google Translate as my chief inputting method that I became very active in entering Chinese character texts.

Because of the above considerations, during the last three to four decades, I have developed the habit of closely and carefully scrutinizing friends, colleagues, students, and others as they enter Chinese characters in their computers, cell phones, tablets, and other digital tools.  I have written about my observations in many Language Log posts, including the following:

"Chinese character inputting" (10/17/15)

"Stroke order inputting" (10/30/11)

"Cantonese input methods" (1/20/15)

"Google Translate Chinese inputting" (1/27/13)

"Creeping Romanization in Chinese" (8/30/12)

"Chinese Typewriter" (6/30/09)

"Chinese typewriter, part 2" (4/17/11)

"Zhou Youguang, Father of Pinyin" (1/14/14)

"Zhou Youguang, 109 and going strong" (1/13/15)

"Swype and Voice Recognition for mobile device inputting" (1/22/14) — esp. ¶¶ 3-5

"Language notes from Macao and Hong Kong" (6/22/14) — search for "Starbucks"

Usually I just watched what people did as they entered characters and drew my own conclusions from what I saw, not wanting to interrupt their typing.  Lately, however, as in the last post in the above list, I've had more opportunities to ask people how they choose from among the many inputting methods that are available to them.  The answers I've been receiving are quite revealing.

I shan't go through all possible methods, but will focus only on the two most popular means for inputting characters.  By far the most common method for inputting Chinese characters — especially for people who are around forty or younger — is Hanyu Pinyin.  The next most common method — particularly for those who are over forty or so — is to write the characters with the tip of one's finger on a glass touch screen or pad.  In several of the above posts, I have described the frantic flailing one witnesses when people input Chinese characters this way.

From my earlier observations, I noticed that people who entered Chinese characters via the tip of their index finger (less often with a stylus) frequently seemed frustrated and aborted the effort to produce a desired character because what they wanted was not showing up in the list of characters displayed.  Some would try again and again till they got what they wanted, or they would shift to Pinyin to call up the character they were after.

Recently, I have asked some of the people who were switching back and forth between writing the characters with their fingertip and typing them via Pinyin why they didn't just use Pinyin all of the time if they often had to resort to it anyway.  The usual answer was that they would start out writing with their fingertip on the glass screen or pad of their electronic device because, especially for very simple and common characters like nǐ 你 ("you") and hǎo 好 ("good"), they felt it was the path of least resistance, but would switch to Pinyin when they were frustrated at calling up more complex and difficult characters such as lài 癞 / 癩 ("scabies") and pēntì 喷嚏 / 噴嚏 ("sneeze").

As I watched some of these individuals inputting a variety of characters and being stymied when their software proved incapable of quickly retrieving recalcitrant characters, I asked them precisely why they would change over to Pinyin.  The answer was that the fingertip writing offered too many possibilities for them to have to choose from (and many times none of the proffered characters was the one they were after), whereas when they switched over to Pinyin and typed by words in context, the choices presented by the software were much fewer, and, in many cases, were narrowed down to exactly the combinations they were after.
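The narrowing effect of typing whole words can be illustrated with a toy example. The Python sketch below is not how any real input method works internally; it assumes a tiny hand-made lexicon (SYLLABLE_CANDIDATES and WORD_CANDIDATES are hypothetical and far from complete) and simply compares how many candidates arise when the syllables of pēntì are looked up one at a time versus as a single word.

```python
from itertools import product

# Toy, hand-made lexicon for illustration only (tones ignored, lists incomplete).
SYLLABLE_CANDIDATES = {
    "pen": ["喷", "盆"],
    "ti": ["嚏", "替", "提", "题", "体", "踢", "剃"],
}
WORD_CANDIDATES = {
    "penti": ["喷嚏"],
}


def candidates_by_syllable(syllables: list) -> list:
    """Every combination obtained by choosing a character for each syllable separately."""
    choices = [SYLLABLE_CANDIDATES[s] for s in syllables]
    return ["".join(combo) for combo in product(*choices)]


def candidates_by_word(syllables: list) -> list:
    """Candidates obtained by looking up the joined syllables as one word."""
    return WORD_CANDIDATES.get("".join(syllables), [])


if __name__ == "__main__":
    query = ["pen", "ti"]
    print(len(candidates_by_syllable(query)), "combinations syllable by syllable")
    print(candidates_by_word(query), "when looked up as a word")
```

Even with this small lexicon, the syllable-by-syllable route yields fourteen combinations to sift through, while the word lookup returns only 喷嚏.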

I wish to emphasize that the majority of people who are inputting Chinese text do use Pinyin exclusively or nearly so for inputting characters, and they do so because it is faster, more convenient, more accurate, and more efficient than other methods, and above all it does not require them to learn any special codes, mnemonics, or non-intuitive techniques for decomposing the characters.

Terror of singular 'they'

Oct. 13th, 2017 08:32 pm
[syndicated profile] languagelog_feed

Posted by Geoffrey K. Pullum

Joining a crowd of other recent fraudsters, Paul Roberts and Deborah Briton returned from their Spanish vacation and subsequently turned in a completely fake claim against the Thomas Cook package-vacation company, alleging that their time in Spain had been ruined by stomach complaints for which the hotel and the company should be held liable. They sought more than $25,000 in damages for the fictional malady. The judge sentenced them to jail. And in this report of the case my colleague Bob Ladd noticed that Sam Brown, the prosecuting attorney, showed himself to be so terrified of blundering into a singular they that he would not even risk using they with plural reference, preferring to utter a totally ungrammatical sentence:

*Sam Brown, prosecuting, said: "Both defendants knew that in issuing this claim he or she would be lying in order to support it."

Beware of struggling to obey prescriptive injunctions that don't come naturally to you; they can warp your ability to use your native language sensibly.

And also beware of trying to cheat Spanish hoteliers with spurious claims of stomach trouble. They're onto the scam. One hotel in Mallorca (see this story) became suspicious about the way about 200 claims from among 9,000 guests were distributed among nationalities:

United Kingdom | Germany | Netherlands | Other
200 | 0 | 0 | 0

Notice also this statistic concerning when the illness was first reported:

While staying at hotel | After returning to UK
0 | 200

And these data about exactly who did the reporting and made the claim:

Reported by guest | Professional claims company
0 | 200

Somewhat improbable statistically, the hotelier thought.
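To put a rough number on the hotelier's suspicion, here is a minimal sketch. The story does not say what share of the hotel's guests were British, so `uk_share` below is a purely hypothetical figure, and the calculation assumes that genuine stomach trouble would strike guests independently of nationality.

```python
import math

def log10_prob_all_from_one_group(group_share: float, n_claims: int) -> float:
    """Log10 of the probability that every one of n_claims independent claims
    comes from a group making up group_share of the guests."""
    if not 0.0 < group_share <= 1.0:
        raise ValueError("group_share must be in (0, 1]")
    return n_claims * math.log10(group_share)

# Hypothetical assumption: half of the hotel's guests are British.
uk_share = 0.5
n_claims = 200  # figure from the story

exponent = log10_prob_all_from_one_group(uk_share, n_claims)
print(f"Probability is about 10^{exponent:.1f}")
```

Even under that generous assumption the chance of all 200 claims coming from a single nationality is vanishingly small, which is the hotelier's point.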

The less… umm… fewer the better

Oct. 13th, 2017 02:26 pm
[syndicated profile] languagelog_feed

Posted by Geoffrey K. Pullum

Someone with a knowledge of usage controversies, German language, and modern political history put this on the web somewhere; I haven't been able to find out who or where:

[Hat tip: Rowan Mackay]

[syndicated profile] languagelog_feed

Posted by Victor Mair

Pro-Cantonese sign in Hong Kong:


A man holds a sign professing his love for Cantonese as he attends a Hong Kong rally in 2010 against mainland China’s bid to champion Mandarin over Cantonese. Picture: AFP

The sign says (in Cantonese):

ngo5 oi3 gwong2dung1waa2 ("I love Cantonese")

m4 sik1 bou1dung1gwaa1 ("I don't know Putonghua [Modern Standard Mandarin / MSM]").

Note that Pǔtōnghuà / Pou2tung1waa6*2 普通話 ("MSM") is here written punningly as bou1dung1gwaa1 煲冬瓜 ("stewed winter melon").

It could also be written with another pun:  paau4*2dung1gwaa1 刨冬瓜 ("shaved winter melon")

The above photograph and caption are from this sensible article by Lisa Lim in the South China Morning Post, "Language Matters" (9/29/17):

Why it’s hard to argue there is one Chinese language

To a linguist ‘the Chinese language’ is a family of languages – not dialects – that for the most part are mutually unintelligible and written different ways; an appreciation of this variety would help discussions about language policy.

Biographical note in the SCMP:

Lisa Lim has worked in Singapore, the UK, Amsterdam, and Sri Lanka, and is now Associate Professor and Head of the School of English at the University of Hong Kong. She is co-editor of the journal Language Ecology, founder of the website linguisticminorities.hk and co-author of Languages in Contact (Cambridge University Press, 2016).

Although some things the author says may be open to discussion (e.g., "Chinese" is comparable to the Romance or Germanic "families", is a branch of the Sino-Tibetan family, etc.), much of what she says is spot on (e.g., most of the "Chinese" language groups are mutually unintelligible, her calling into question referring to these groups as "dialects", and so forth).

Modern written Chinese is technically not bound to any specific variety, though it mostly represents the grammar and vocabulary of Mandarin. But Cantonese has its own written forms, for both formal (“High”) and colloquial (“Low”) varieties. The latter flourishes in Hong Kong, where, for instance, one finds 瞓 (fan) for “sleep” in addition to the more formal 睡 (sèoih).

[VHM:  Nobody would understand you if you used the term fan3 瞓 in Mandarin, even if you pronounced it fèn à la mandarin.]

In classrooms, Chinese texts are often taught using H Cantonese, with Putonghua pronunciation having little currency – for example, the word for “no, not”, realised as 唔 (m̀h) in colloquial Cantonese, is written as 不 in Standard Chinese, pronounced bù in Putonghua, but the formal H Cantonese pronunciation bāt is likely to be used. There is even Hong Kong Written Chinese, influenced by Cantonese and English.

Official references to these various systems are often blurred and confused under the label “Chinese language”. Parents’ and policymakers’ worries about students’ “Chinese language” proficiency, as well as the medium-of-instruction debate, will continue, with issues of mother-tongue-based education and national-vs-local identity at their core. A more nuanced appreciation of all that “Chinese language” encompasses will go a long way towards more fruitful discussions.

[VHM:  These are the last three paragraphs of the article.]

What a breath of fresh air Lisa Lim's article is!

[Thanks to Bob Bauer and Abraham Chan]

Awesome / sugoi すごい!

Oct. 12th, 2017 07:48 pm
[syndicated profile] languagelog_feed

Posted by Victor Mair

From Diane Moderski:



Is this the beginning of the end of the need for interpreters?

[syndicated profile] languagelog_feed

Posted by Victor Mair

Three videos

Metro Manners PSA: Super Kind – Seat Hogging ホギング


Metro Manners PSA: Super Kind – Eating イーティング

Metro Manners PSA: Super Kind – Aisle Blocking ブロキング

From Nikita Kuzmin, who first learned about them on this Russian media website.

[syndicated profile] languagelog_feed

Posted by Victor Mair

For the last two decades or so, my brother Denis and I have been working on a translation of the Yìjīng 易經 (Classic of Changes).  We shall probably finish the first draft within a year.

Of all the Chinese classics, the I ching is the one that most Sinologists do not want to touch because of its maddening opacity.  In this regard, it is worth quoting at some length the words of James Legge (1815-1897), the Victorian translator of all the Confucian classics, a monumental achievement that still stands today as an invaluable resource for anyone who wishes to acquaint him/herself with these essential texts of early Chinese civilization.

On the I ching / Yi jing, Legge opines:

The peculiarity of its style makes it the most difficult of all the Confucian classics to present in an intelligible version. I suppose that there are sinologists who will continue, for a time at least, to maintain that it was intended by its author or authors, whoever they were, merely as a book of divination ; and of course the oracles of divination were designedly wrapped up in mysterious phraseology. But notwithstanding the account of the origin of the book and its composition by king Wăn and his son, which I have seen reason to adopt, they, its authors, had to write after the manner of diviners. There is hardly another work in the ancient literature of China that presents the same difficulties to the translator.

When I made my first translation of it in 1854, I endeavoured to be as concise in my English as the original Chinese was. Much of what I wrote was made up, in consequence, of so many English words, with little or no mark of syntactical connexion. I followed in this the example of P. Regis and his coadjutors (Introduction, page 9) in their Latin version. But their version is all but unintelligible, and mine was not less so. How to surmount this difficulty occurred to me after I had found the clue to the interpretation ;⎯in a fact which I had unconsciously acted on in all my translations of other classics, namely, that the written characters of the Chinese are not representations of words, but symbols of ideas, and that the combination of them in composition is not a representation of what the writer would say, but of what he thinks. It is vain therefore for a translator to attempt a literal version. When the symbolic characters have brought his mind en rapport with that of his author, he is free to render the ideas in his own or any other speech in the best manner that he can attain to. This is the rule which Mencius followed in interpreting the old poems of his country :⎯ ‘We must try with our thoughts to meet the scope of a sentence, and then we shall apprehend it.’ In the study of a Chinese classical book there is not so much an interpretation of the characters employed by the writer as a participation of his thoughts ;⎯there is the seeing of mind to mind. The canon hence derived for a translator is not one of license. It will be his object to express the meaning of the original as exactly and concisely as possible. But it will be necessary for him to introduce a word or two now and then to indicate what the mind of the writer supplied for itself. 
What I have done in this way will generally be seen enclosed in parentheses, though I queried whether I might not dispense with them, as there is nothing in the English version which was not, I believe, present in the writer’s thought. I hope, however, that I have been able in this way to make the translation intelligible to readers. If, after all, they shall conclude that in what is said on the hexagrams there is often ‘much ado about nothing,’ it is not the translator who should be deemed accountable for that, but his original.

——

From: Legge, James (1882). The Yî King. In Sacred Books of the East, vol. XVI. 2nd edition (1899), Oxford: Clarendon Press; reprinted numerous times.  Preface, pp. 14-16.

What Legge says about the difficulty of understanding the I ching and rendering it into another language, to varying degrees, is true of all other texts written in Literary Sinitic / Classical Chinese.  Many of them are brutally difficult to fully understand in their entirety.  That is why I always spend the first few days of my Introduction to Literary Sinitic / Classical Chinese trying to dissuade all those students who I think are not prepared for the excruciating challenges they will face during the coming year in my class, and, indeed, for as many years as they persist in reading texts written in this dead language.  Achilles Fang (1910-1995), one of my mentors, also did the same thing in his classes.  He would ask us, "Why do you want to read these 'dirty books'?" and he referred to our profession as "Assinology".  Once you convinced Achilles that you were determined to stick with the daunting task of learning how to read Literary Sinitic / Classical Chinese, he would go all out for you, and he was devoted to teaching you all the intricacies of utilizing all the tools and techniques at your disposal, if only you had the stamina to do so.

Literary Sinitic / Classical Chinese is not for the faint of heart, and I am grateful to Achilles for imparting that wisdom to me.

Incidentally, my recollection is that Achilles did not have much use for the I ching, nor, for that matter, did any of my other teachers and colleagues.  Many of them scorned it openly — and yet, it is the most lastingly influential of all the Chinese classics.  It is this Gordian knot that Denis and I are trying to untangle in the way we translate, interpret, and explain the Yì 易 (Changes).

See also:

"Philology and Sinology" (4/20/14)

"Which is harder: Western classical languages or Chinese?" (3/6/16)

"Chinese, Greek, and Latin" (8/8/17)

"Chinese, Greek, and Latin, part 2" (8/15/17)

[Thanks to Jane Reznik]

"Moron" considered dangerous

Oct. 10th, 2017 08:17 pm
[syndicated profile] languagelog_feed

Posted by Mark Liberman

In all of the foofaraw about Rex Tillerson calling Donald Trump a "fucking moron", no one seems to have picked up on the fact that Mr. Tillerson may have endangered his immortal soul. (And not on account of the expletive.)

In "The S-word and the F-word", 6/12/2004, I noted that the gospel quotes Jesus delivering a strongly-worded threat to people who call other people stupid. Thus Matthew 5:22:

Original: Ἐγὼ δὲ λέγω ὑμῖν ὅτι πᾶς ὁ ὀργιζόμενος τῷ ἀδελφῷ αὐτοῦ ἔνοχος ἔσται τῇ κρίσει: ὃς δ᾽ ἂν εἴπῃ τῷ ἀδελφῷ αὐτοῦ Ῥακά, ἔνοχος ἔσται τῷ συνεδρίῳ: ὃς δ᾽ ἂν εἴπῃ Μωρέ, ἔνοχος ἔσται εἰς τὴν γέενναν τοῦ πυρός.

Transliteration: Egô de legô humin hoti pas ho orgizomenos tôi adelphôi autou enochos estai têi krisei: hos d' an eipêi tôi adelphôi autou Rhaka, enochos estai tôi sunedriôi: hos d' an eipêi Môre, enochos estai eis tên geennan tou puros.

KJV: but I say unto you, That whosoever is angry with his brother without a cause shall be in danger of the judgment: and whosoever shall say to his brother, Raca, shall be in danger of the council: but whosoever shall say, Thou fool, shall be in danger of hell fire.

NASB: But I say to you that everyone who is angry with his brother shall be guilty before the court; and whoever says to his brother, 'You good-for-nothing,' shall be guilty before the supreme court; and whoever says, 'You fool,' shall be guilty enough to go into the fiery hell.

The Greek word translated "fool" in that verse is precisely μωρός, which is the etymon of "moron", as the OED explains:

Etymology: ancient Greek μωρόν, neuter of μωρός, (Attic) μῶρος foolish, stupid (further etymology uncertain: a connection with Sanskrit mūra foolish, stupid, is now generally rejected).

I would have filed this post under "theology of language", but our wildly excessive number of categories doesn't include that possibility.

 

What is Trump demanding now?

Oct. 10th, 2017 08:05 pm
[syndicated profile] languagelog_feed

Posted by Ben Zimmer

Here's a nice crash blossom (that is, a difficult-to-parse ambiguous headline) noted on Twitter by The Economist's Lane Greene, with credit to his colleague James Waddell. In The Financial Times, a promotion of an article inside (a "reefer" in newspaper-speak) is headlined: "Trump demands dog 'Dreamers' deal."

The headline for the article as it appears online is: "Trump's demands temper hopes of immigration deal." And the lede explains: "The prospects for a bipartisan deal to protect 800,000 immigrants brought to the US illegally as children are facing new doubts as Donald Trump pushes a hardline list of immigration and border security demands to Congress as a condition for his backing."

As Lane observes, the ambiguity is set up by the use of demands as a plural noun and dog as a verb, when it's quite easy to go down the garden path thinking demands is a verb and dog is a noun. So a casual reader might think Trump is demanding a "dog 'Dreamers' deal" (and since this is a British newspaper, such a noun pile isn't out of the question). Alternatively, he could be demanding that "dog 'Dreamers'" have to deal with something.

The ambiguity is helped along by a couple of journalistic expediencies. First, in the intended reading, the subject of the sentence is the noun phrase "Trump demands." There would be no ambiguous reading if it simply read "Trump's demands," as in the headline to the online article. But given the space requirements for the "reefer" headline, there might not have been room for that extra "'s" in print.

We can also blame the terseness of headlinese for the verb dog, which lends itself well to crash-blossom readings — see, for instance, Mark Liberman's 2015 post about the Reuters headline, "China Nov inflation edges up, but deflation risks dog economy." The structure of "deflation risks dog economy" closely mirrors "Trump demands dog 'Dreamers' deal," with an intended reading of N-N V N ambiguously flipping to N V N-N, thanks to the use of dog as a verb. As Jonathon Owen commented on the 2015 post, "I think we can add the verb 'dog' to the list of words that journalists use that nobody else uses."
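The flip between the two readings can be sketched mechanically. The following is only a toy illustration: the lexicon is a hand-made assumption rather than a real tagger, the function name `svo_readings` is hypothetical, the quotation marks around "Dreamers" are dropped, and the only sentence shape considered is a noun pile, a verb, and another noun pile (so the embedded-clause reading with deal as a verb is out of scope).

```python
# Toy part-of-speech lexicon (an illustrative assumption, not a real tagger):
# each word maps to the categories it can take in headlinese.
LEXICON = {
    "Trump": {"N"},
    "demands": {"N", "V"},
    "dog": {"N", "V"},
    "Dreamers": {"N"},
    "deal": {"N", "V"},
    "deflation": {"N"},
    "risks": {"N", "V"},
    "economy": {"N"},
}

def svo_readings(words):
    """Return every tagging of the form N+ V N+ (noun pile, verb, noun pile)
    that the toy lexicon allows."""
    readings = []
    for i in range(1, len(words) - 1):  # the verb needs a noun on each side
        verb_ok = "V" in LEXICON[words[i]]
        nouns_ok = all("N" in LEXICON[w] for j, w in enumerate(words) if j != i)
        if verb_ok and nouns_ok:
            readings.append(["V" if j == i else "N" for j in range(len(words))])
    return readings

for headline in ("Trump demands dog Dreamers deal", "deflation risks dog economy"):
    print(headline)
    for tags in svo_readings(headline.split()):
        print("   ", " ".join(tags))
```

Each headline comes back with two taggings, one with demands (or risks) as the verb and one with dog as the verb, which is exactly the garden path described above.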

[syndicated profile] languagelog_feed

Posted by Victor Mair

Public notification posted in villages of Makit County (Màigàití xiàn 麦盖提县; Mәkit nah̡iyisi / Мәкит наһийиси مەكىت ناھىيىسى) near Kashgar, Xinjiang Uyghur Autonomous Region (XUAR):


Source

A few key terms:  kěn 垦 means "reclaim; reclamation".  Sìshíwǔ tuán 四十五团 refers to a particular military-agricultural bīngtuán 兵团 ("corps; brigade") in Xinjiang.  Some of my archeological investigations in Xinjiang were carried out in areas belonging to such bingtuan.  Life in these bingtuan is usually harsh and the land is generally stark.  One of the main tasks of the bingtuan is to reclaim desert land and make it suitable for agricultural use.

Jǐnjí tōngzhī

Guǎngdà jūmín tóngzhì:

Xiàn jiē dào kěnqū gōng'ān jú yāoqiú, yào duì běn xiáqū jūmín jiāzhōng suǒyǒu lìqì, jí càidāo, fǔzi, tiěqiāo, chútóu, tiěchā, gāngguǎn, pí jiákè xiǎodāo děng lìqì shàng jìnxíng dǎyìn shēnfèn zhèng hàomǎ, xiànqí 10 yuè 5 rì zhì 8 rì sān tiān, dìdiǎn zài gè xiǎoqū dàmén chù, fán bù jìnxíng dǎmǎ de dāojù, yījīng jiǎnchá yīlǜ mòshōu, wèi pèihé hǎo dǎmǎ gōngzuò, jūmín xūyào xiédài běnrén shēnfèn zhèng, bìng měi jiàn dǎmǎ yòngjù shōufèi 4 yuán, tècǐ tōngzhī.

Sìshíwǔ tuán yī shèqū

2017 nián 10 yuè 5 rì

紧急通知

广大居民同志:

现接到垦区公安局要求,要对本辖区居民家中所有利器,即菜刀,斧子,铁锹,锄头,铁叉,钢管,皮夹克小刀等利器上进行打印身份证号码,限期10月5日至8日三天,地点在各小区大门处,凡不进行打码的刀具,一经检查一律没收,为配合好打码工作,居民需要携带本人身份证,并每件打码用具收费4元,特此通知。

四十五团一社区

2017年10月5日

Urgent Notice

All residents-comrades:

Now, in responding to the reclamation area Public Security Bureau’s requirement, it is necessary to stamp national ID numbers on all sharp implements, such as kitchen knives, axes / hatchets, shovels / spades, hoes, pitchforks, steel pipes, leather jacket knives (VHM:  presumably knives that are small enough to be put in the pocket of a leather jacket), etc. in the houses of all community residents in this jurisdictional area within the time limit of three days between October 5th and October 8th. The location is at the main gate of all communities. All tools which are not stamped with ID numbers will be confiscated upon examination. In order to coordinate with the task of stamping ID numbers, community residents must bring their own national ID card and will be charged 4 Yuan for each implement that is stamped. It is hereby announced.

The 45th Corps Community No.1

2017.10.5

[h.t. Geoff Wade; thanks to Jinyi Cai]

[syndicated profile] languagelog_feed

Posted by Victor Mair

Photo taken in Hangzhou by Nikita Kuzmin's Chinese teacher:

This can be read either as:

wúwèi zhīzú 吾味知足 ("I / my flavor know / feel sufficient / content")

or as:

wǔwèi zhīzú 五味知足 ("five flavors know / feel sufficient / content")

It seems as though the Chinese are having a lot of fun with this quadrisyllabic, disemous character, as is evident from this blog post and this online user forum.

The native speakers of Chinese whom I approached for their opinions about this character are pretty much evenly divided on which of the two readings they think is better, though there seems to be a slight preference for the latter:  wǔwèi zhīzú 五味知足 ("five flavors know / feel sufficient / content").

Three respondents remarked thus:

1.
My first reaction is that wǔwèi zhīzú 五味知足 ("five flavors know / feel sufficient / content") makes more sense to me since it's related to food. Wǔwèi 五味 ("five flavors") might refer to the five flavors: suān tián kǔ là xián 酸甜苦辣咸 ("sour sweet bitter hot / spicy salty"). When you can taste all five, it means you are eating something special (in a good way). Wúwèi zhīzú 吾味知足 ("I / my flavor know / feel sufficient / content") sounds a little bit strange to me. But it also makes sense. I interpret it as "I am satisfied with my taste / the things that I am tasting". But once I think about it more, it also fits the restaurant theme because when I am satisfied with the things I am tasting, I must be happy with the restaurant.

2.
I just realized why I naturally think wǔwèi zhīzú 五味知足 ("five flavors know / feel sufficient / content") sounds more common than wúwèi zhīzú 吾味知足 ("I / my flavor know / feel sufficient / content").

Somehow I am more familiar with the sequence of wǔwèi zhīzú 五味知足 ("five flavors know / feel sufficient / content"), meaning wǔ zhǒng wèidào (wǒ) dōu zhīdàole 五种味道(我)都知道了 ("[as for] the five kinds of flavors, [I] know all of them").

However, wúwèi zhīzú 吾味知足 ("I / my flavor know / feel sufficient / content") literally means wǒ chī de wèidào, wǒ hěn zhīzú / wǒ quánbù zhīdàole 我吃的味道,我很知足/我全部知道了 ("of the flavors I've eaten, I know them sufficiently / I know them completely"). (This is the sequence of Chinese sentences. But now my Chinese grammar is broken and I think the opposite way.) It's very interesting for me to reflect on this.

3.
I feel like wúwèi zhīzú 吾味知足 ("I / my flavor know / feel sufficient / content") makes more sense to me in terms of the visual pattern: every character around the outside borrows the "口" in the middle.

If anyone wants to observe an elaborate polysyllabic Chinese character in the wild and happens to be near the University of Pennsylvania, go over to the Han Dynasty restaurant at 3711 Market Street and you will see a version of this at the front desk:

That is read zhāocáijìnbǎo 招財進寶 ("bring / usher in wealth and riches").

Some earlier posts on polysyllabic characters:

[Thanks to Yixue Yang, Jinyi Cai, and Fangyi Cheng]
