Legal Bloggers: Getting Your Content Scraped? Fight Back!

“Once your Internet footprint reaches a certain size, chances are people will start scraping your content.” Link.

As anyone who went to law school knows, it’s important to cite your sources. Note the quotation marks and the fact that the above is hyperlinked, as an example of proper online citation form.

In an academic setting, failure to properly cite can give rise to charges of plagiarism.  In a courtroom, it robs one’s argument of authority.  On the Internet, however, it would seem to be a shortcut to page hits and Google ad dollars.

In fact, ctrl-c-ctrl-v-ing someone else’s blogged content has even been given a new name to go along with its new arena: Content Scraping.

Content scraping is an obvious copyright violation.  But if one writes well and often enough, it becomes a fact of life online, as the introductory quote from blogger Eileen Smith’s post about dealing with the fallout of being ‘scraped’ indicates.

So, what’s a legal content producer to do?

Here are some options, ‘curated’ from the interwebs:

Cease-and-desist ‘em

The most direct way to stop content scraping is to contact the scraper via a cease and desist email.  The online anti-graphic redistribution group R.I.G.H.T.S has a model cease and desist email on their website (and it’s free to use).

The advantage of a cease and desist approach is that you only have to act after your rights have been violated, and it’s a familiar process.  If you’re a first-time scraping victim, this is probably a good first step toward getting back control of your content.  You know who has stolen your work, so you ask them to take it down or else.  As a legal writer, you’re probably familiar with the process of backing up a cease and desist email with follow-up communications and/or service of process should the offending content not be removed in a reasonably prompt fashion.

The disadvantage of the cease and desist approach comes when you’re no longer dealing with a single content scraper or a small group of them.  When you’re attacked by piranhas, you can’t kill them one at a time.  That’s when you need to move on to more technical solutions.

‘Tag’ your content and set up Google Alerts to monitor its life-cycle

A second-level approach to combating content theft is to tag your content with unlikely phrasing and then set a Google Alert for that phrase.  This is akin to the way Monsanto tags its seeds.  For example, in the course of writing an article on, say, the tax implications of declaring a dividend, find a quote from Learned Hand that anchors a point that you’re making.  Try to place that quote ‘tag’ deep in the body of your article or post.  That way it won’t be excised along with the headline or lede if the content scraper is savvy enough to attempt cosmetic alterations to dodge the Google Alert approach.  Then create the Google Alert, and you’ve essentially built a sonar system that will ping you whenever someone has used your content.  Then, cease-and-desist them.
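Once an alert pings you, it helps to confirm that the suspect page really carries your tag before you fire off an email.  Here is a minimal sketch in Python of that check.  The function names and the sample tag phrase are made up for illustration, and the sketch assumes you have already copied the suspect page's text into a string; it does not talk to Google Alerts or fetch anything from the web.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text, straighten curly quotes, and collapse whitespace."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_tag(page_text: str, tag_phrase: str) -> bool:
    """Return True if the tag phrase appears in the page text.

    Both strings are normalized first, so changes in case, line breaks,
    or quote style will not hide the tag.
    """
    return normalize(tag_phrase) in normalize(page_text)


# Hypothetical tag phrase and scraped text, for illustration only.
tag = "the dividend's peculiar purple penumbra"
scraped = "Some intro.  The Dividend\u2019s peculiar\npurple   penumbra applies here."
print(contains_tag(scraped, tag))
```

A scraper who rewrites the tagged sentence itself will still slip past a check like this, which is one more reason to bury the tag where a quick cosmetic edit is unlikely to reach it.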

Creative Commons

Another approach to content rights management is publishing your content under a Creative Commons license.  You can read about Creative Commons as a philosophical approach to the Internet’s copyright ecology here.  But, to put it simply, Creative Commons is a way to allow people to repurpose your content within certain bounds, and is respected by content producers and distributors as an adaptation of copyright to a virtual environment that facilitates and is, itself, dependent upon socialized dissemination of information.  This choice allows your work to get greater exposure, and directs re-purposers to give credit to you, the original author.

Join a clearinghouse, once they’re up and running.

A step up from Creative Commons, and an approach that directly monetizes your shared content, is the clearinghouse.  A clearinghouse is essentially a name-brand Creative Commons for professional writers.  In a clearinghouse regime, your content is placed in what has come to be called a digital ‘wrapper’.  That wrapper contains information about where the content came from.  It also communicates with the clearinghouse itself, registering impressions or click-throughs or however else that content is licensed to the content distributor who picked it out of the clearinghouse inventory.  The clearinghouse acts as a middleman of sorts, billing the distributors and paying out to the producers based upon that licensing agreement.
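For the technically curious, the wrapper-and-clearinghouse idea can be pictured with a toy model.  The sketch below is only an illustration of the description above, not how any actual clearinghouse works: the class names, the fields, and the flat per-impression rate are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Wrapper:
    """Hypothetical digital wrapper: the content plus where it came from."""
    content: str
    author: str
    source_url: str


@dataclass
class Clearinghouse:
    """Toy middleman that counts impressions and totals what is owed."""
    rate_cents_per_impression: int  # assumed flat licensing rate
    impressions: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def register_impression(self, wrapper: Wrapper, distributor: str) -> None:
        # The wrapper "phones home" each time a distributor displays it.
        key = (wrapper.author, distributor)
        self.impressions[key] = self.impressions.get(key, 0) + 1

    def amount_owed_cents(self, author: str, distributor: str) -> int:
        # What the distributor is billed, and the producer paid, so far.
        count = self.impressions.get((author, distributor), 0)
        return count * self.rate_cents_per_impression


# Illustrative names and numbers only.
post = Wrapper("Post text...", "Jane Author", "https://example.com/post")
house = Clearinghouse(rate_cents_per_impression=2)
for _ in range(3):
    house.register_impression(post, "Example Distributor")
print(house.amount_owed_cents("Jane Author", "Example Distributor"))
```

A real licensing agreement would presumably be richer than a single flat rate, but the division of labor is the same: the wrapper carries the provenance, and the clearinghouse keeps the ledger.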

The advantages of this approach to content producers are obvious, as the clearinghouse does the monitoring, enforcement and billing for the producers.  Also, being registered with a clearinghouse is, itself, a form of marketing, as it distinguishes a content producer from the crowd.

The disadvantage is that the clearinghouse approach isn’t exactly in existence yet, but that will change very soon.  The Associated Press has a system that they report will be up soon, and Martin C. Langeveld of Circlabs reports that Mizzou is developing one as well:

“I’m working with Randy Smith and other faculty from the Journalism School, Law School, Computer Science and Agricultural Economics to study the emerging clearinghouse systems designed to govern an orderly exchange of news content under commonly accepted rights and payments protocols. These systems will be enabled by tagging systems already developed and deployed, and in addition to the rather small revenue opportunity created by deterring unauthorized re-publication, they will create transformative new opportunities for content to find a wider audience and for publishers to deliver a richer mix of content to their audiences.”

Anti-Crawl and its discontents

In the meantime, I’d like to steer you away from Anti-Crawl and other content-tagging software.  Based upon my research, it looks like these solutions are awkward, easily evaded and often over-priced (though Anti-Crawl offers a free script to start out with).  The guerrilla approach of self-tagging with quotations or phraseology accomplishes the same goal without the cost, and, as a legal professional, you’re in a position to handle most content scraping situations with an email on electronic letterhead.  Nobody wants to get into a paperwork war with a lawyer.