I’ve received a lot of feedback on a post I wrote about content farming on Amazon. I suggested that there was little Amazon could do about the problem without spending money—lots of money—to regulate the content uploaded to its site. The result for indie authors would be increased costs (a fee for uploading most likely) because Amazon wouldn’t pass on the cost of regulation to consumers.
I also pointed to the obvious defects in proposals to regulate content farmers. Seth Godin’s idea that we be able to rate publishers, for example, is useless when it’s so easy to republish the same content under a different name. Pan and ban “Hephaestus Books” today, it becomes “Vulcan Books” tomorrow.
In this post I want to explain in more detail why it’s so hard to regulate content and why the only solution is likely consumer awareness. Let’s break it down into the three banes of online retailers and their customers: (1) recycled public domain books, (2) content farming and (3) copyright infringement.
1. Recycled Public Domain Books
Everyone knows that unscrupulous entrepreneurs prey on unwary buyers by uploading for sale e-books of works that have long been in the public domain (i.e., free because the copyright has expired). Amazon seems to have tried to combat this problem by offering public domain books for free. Nonetheless, the continued presence of PD material for sale suggests that enough unwary buyers can still be found.
On the face of it, the solution is to prevent anyone from selling public domain books. But this is a lot harder than it seems. Take Benjamin Jowett’s translations of Plato’s dialogues, which have been public domain for years. This dubious “publisher,” for example, has uploaded Jowett’s translations as an e-book and slapped a $5.99 price on them (you’ll notice my scathing Amazon review).
It seems like there’s a simple technological fix for recycled PD books: have Amazon scan uploads for public domain material and block them before they hit the shelves. As a second layer of surveillance, wary customers could report these books. The system wouldn’t have to be perfect to work; it only has to make uploading public domain books unprofitable.
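To make the idea concrete, here is a minimal sketch of what such a scan might look like. It is not a description of anything Amazon actually runs. The function names, the five-word window and the 0.8 threshold are all my own assumptions, chosen only for illustration. The check breaks each text into overlapping runs of words ("shingles") and asks what fraction of the upload's runs also appear in a known public domain text.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def pd_containment(upload, pd_text, k=5):
    """Fraction of the upload's shingles that also appear in the PD text."""
    up = shingles(upload, k)
    if not up:
        return 0.0
    return len(up & shingles(pd_text, k)) / len(up)

def should_flag(upload, pd_corpus, threshold=0.8):
    """Flag an upload if most of it matches any one known PD work.

    The threshold is a hypothetical value, not a tested one.
    """
    return any(pd_containment(upload, work) >= threshold for work in pd_corpus)
```

A verbatim copy of a public domain text scores 1.0 on a check like this and would be flagged, while an unrelated original essay scores 0.0 and would pass.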
But here’s where things get muddy. Like other legitimate publishers, Anchor Books also offers a print version of Republic in Jowett’s translation. Other publishers offer other titles. Then there are outfits like “Dodo Press,” which offer print and e-book versions of Jowett’s translations. This is where you get into the problem: how can you ban one publisher without also banning the other? And how do you practically (i.e., using software) distinguish the content farmer from the legitimate publisher?
We haven’t even broached the possible permutations that make gate-keeping so tricky. Consider, for example, that it’s common for scholars to combine Jowett’s translations with an original introduction, commentary and notes, or to group thematic combinations of his translations with the same accompaniments (e.g., the Euthyphro, Apology, Crito and Phaedo tetralogy). Hundreds of these books have been published over the years. But how does one tell the difference between these books and the cheap knock-off, that is, a Jowett translation plus a wiki article?
Now, I’m not suggesting for a minute that “C & C Web Press” or any other content farmer has any scholarly merit whatsoever. What I’m saying is that it takes an expert to tell the difference between a scholarly translation-plus-intro-and-commentary and tacked-on junk copy intended to pass any form of “original content” requirement set up by Amazon. In other words, I know the difference between a Wikipedia article attached to a public domain translation of Plato and a genuine scholarly edition, but software and non-experts cannot reliably tell the two apart.
The bottom line is that there is no cheap solution to the problem. Hiring subject or content experts is expensive, an expense that will be passed on to sellers, not consumers. I know of no software solution that could work in principle without constant oversight by experts; likewise, empowering amateurs to act as gatekeepers (whether for pay or under a “wiki” model) is simply unrealistic. Only consumer awareness can slay this beast. That means it’s up to all of us to act as gatekeepers through the consumer review function.
2. Content Farming
A “content farmer” is someone who publishes worthless copy or cuts and pastes online material (generally someone else’s) into e-book formats and sells them on Amazon and the other retailers. I emphasize that last bit because some people are under the erroneous impression that Barnes & Noble and Sony regulate this sort of thing. Some of it maybe; but definitely not all of it. Here’s a whole series of crap that’s sold on all three retailers (B&N, Sony, Amazon), and that’s only the tip of the iceberg.
Now, I point to this particular series, not only because it gives the lie to the “only Amazon sells junk” myth, but because it illustrates how hard it is to combat content farming. If you look at the samples you’ll notice that none of the books seems to be copied directly from an online source (e.g., Wikipedia). They’re all just extremely short, low-quality essays. No software that I know of could detect this sort of crap. Sure, I can tell you it’s crap from the first page, because a lot of the material falls into my area of expertise. But it’s original crap, so it won’t be flagged by software designed to look for previous publication online or for copyright infringement.
Expert gatekeepers are probably the only solution. But the same experts will also prevent a lot of genuine “amateur” material from being published. Joe Blow’s personal “philosophy of life” won’t be published either. No big loss maybe; but I have to wonder if it’s really practical (to say nothing of fair) to enter into battles with individuals who want to upload their personal stories and reflections. Unlike the real content farmers, they’re not motivated by profit. They won’t just go away when it becomes unprofitable to upload junk content. They’ll just keep plugging away until someone relents, thereby representing an ongoing expense.
So, again, the only options are buyer beware or expensive experts. But it turns out that the expert solution has an additional downside: the experts will clean out a lot of indie material in the process of cleaning out the content farmers.
3. Copyright Infringement
I’ve been told that people have uploaded others’ books under different names, titles and covers. So far, the evidence is anecdotal. I haven’t seen or read much to suggest that it is widespread on the retailers. (I do think torrents and so forth will be one of the major problems, if not the major problem, for indies in the future, however.)
The only solution here is the law and vigilance. If it happens to you, file a DMCA takedown notice and, if practical, sue for lost revenue. But managing copyright violations is as difficult in practice as managing content farming. Just as students paraphrase others’ papers to avoid keyword searches, so a smart thief will paraphrase a book to avoid the sort of filter that might pick up infringement. Like most people, I could still tell. But I know of no software that can detect crafty plagiarism.
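Here is a small, hedged illustration of why paraphrase defeats a simple filter. The overlap measure below (Jaccard similarity over three-word runs) is my own stand-in for "the sort of filter that might pick up infringement", not any retailer's actual method, and the two sample sentences are invented for the example.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word runs in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(a, b, k=3):
    """Jaccard similarity of the two texts' k-word shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Invented sample sentences, used only to illustrate the point.
original = ("The detective walked slowly into the dark room "
            "and found the letter on the desk.")
paraphrase = ("Moving at a crawl, the investigator entered the unlit study, "
              "where a note lay on the table.")

print(overlap(original, original))    # a verbatim copy matches completely
print(overlap(original, paraphrase))  # no three-word run survives the rewrite
```

A human reader sees at once that the second sentence is the first one in disguise. The filter sees two texts with no three-word run in common.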
Finally, let me reiterate the basic points I’ve made in this post and the last one: (a) I’m not saying it can’t be done; I’m saying it’s expensive to do it; (b) the cost will be borne by sellers (mostly indie writers), not consumers; and (c) the best solution is probably consumer awareness.