Dealing with scrapers: when people steal your content


One of the great things about the social web is the culture of sharing that it fosters. One person writes a blog post; another quotes it, disagrees with some parts, corrects a passage or two, and adds some more information; a third synthesizes it all into a cool infographic. It's a little like the coolest potluck dinner in history.

But every great potluck dinner seems to attract the folks who have no intention of cooking a damn thing. They're there to gorge on as much Jell-o salad and chicken fingers as they can before someone notices they didn't bring any dishes themselves.

And in the Web 2.0 world, we call one particular breed of these people scrapers.

Scrapers use automated software to prowl the web for content, scoop it up and drop it in their own blogs - often without attributing it or linking back to the original source. Vendors of that software advertise it as a way of creating a fully-populated web site out of thin air - and gloss over the fact that it's essentially automated plagiarism.

One such site is the Bookmark Devil blog. Bookmark Devil itself is, well, a baffling site - something like a very low-traffic Digg - built, it appears, mainly to promote the author's search-engine optimization software.

But it also includes a blog, and that's where the scraping comes in.

The front page of their blog is an article that bears what we'll call a striking similarity to the Wikipedia entry on social bookmarking:

Screen capture of Bookmark Devil blog front page

Scroll down a little, and you'll see the list of "last posts". And if you've been following our blog recently, many of those titles will look mighty familiar. They sure look familiar to Alex and me: we wrote them.

Bookmark Devil list of other people's blog posts

Each of those titles links to a page that gives no hint that the author was anyone other than Bookmark Devil. And each is a blog post lifted from this very site.

So who's getting hurt?

Like most authors, we're usually happy to see our work reaching a wider audience. That's why we blog, and why we publish a news feed.

But authors deserve credit for their work, and thinkers deserve credit for their ideas. (For that matter, we deserve to be held responsible if our ideas end up a little wonky.)

People who use search engines ought to be able to find what they're looking for without having to sift through a morass of link farms and spam blogs.

And when you do land on a page, you deserve to know you're dealing with the real thing... especially if you want to engage the writers in conversation, which is ultimately what makes the social web social.

How to tell if your blog is being scraped

Fortunately, scrapers usually want their work to be visible to search engines. And that makes those same search engines your allies in hunting down sites that are taking liberties with your content.

Services like Google Blog Search and Technorati allow you to create RSS feeds from searches on terms such as your name, your blog's name and your blog's URL - any of which a scraper can inadvertently include in a post they lift from your blog's news feed. Monitor those feeds (which you should be doing anyway, to see who's talking about you and your blog), check out any hits, and you'll know as soon as a scraper strikes.

Been scraped before, or feeling especially vigilant? Conduct periodic manual searches on distinctive phrases in your most recent blog posts, and see if they're turning up somewhere they shouldn't.
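For the manually inclined, here's a rough Python sketch of both halves of that job: picking phrases worth searching for, and measuring how much of your post turns up in a page you've found. Everything here is an assumption for illustration. The function names are ours, "longest sentences" is just one plausible stand-in for "distinctive", and the five-word overlap window is an arbitrary choice.

```python
import re


def distinctive_phrases(text, count=3, min_words=6):
    """Pick the longest sentences as quoted search phrases.

    Long sentences are unlikely to appear word-for-word elsewhere by chance.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=lambda s: len(s.split()), reverse=True)
    # Wrap each phrase in quotes so a search engine treats it as an exact match.
    return ['"' + s.rstrip(".!?") + '"' for s in candidates[:count]]


def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_ratio(original, suspect, size=5):
    """Fraction of the original's word sequences that also appear in the suspect text."""
    own = shingles(original, size)
    if not own:
        return 0.0
    return len(own & shingles(suspect, size)) / len(own)


if __name__ == "__main__":
    post = ("Scrapers are a pain. They use automated software to prowl the web "
            "for content and drop it in their own blogs. Ugh.")
    print(distinctive_phrases(post, count=1))
    print(overlap_ratio(post, "Welcome to my blog! " + post))
```

A ratio near 1.0 suggests the page carries your post more or less wholesale; a low ratio is more likely a quotation. Where you draw the line is a judgment call, not something this sketch can settle.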

When you do find hits, ask yourself whether you're really being scraped, or whether this qualifies as fair use (fair dealing in Canada). A lawyer can help you clear up any gray areas. One big question to ask: is the site that uses your content claiming it as its own, or acknowledging you as the writer and linking back to you? If the latter, they may still be violating your rights, but at least they aren't plagiarists.

How to fight back

Before we go any further: if you think you may want to seek compensation or pursue legal remedies, then stop reading and call a lawyer. They'll be able to advise you.

Still with us? Great.

Preventive medicine is the best kind. Add a copyright notice to your site, spelling out just what you do and don't permit. Depending on your needs, Creative Commons could have the solution for you.

Design your blog template to add a byline and a link to your site to every post. That way, even if a post gets scraped, people will be able to see where the original article came from.

If you're feeling extra-geeky, add that attribution to just your news feed (so it doesn't bother visitors to your site). If you're feeling less-than-extra-geeky, FeedBurner lets you do that easily with its FeedFlare feature - look for the Attribution FeedFlare, and add it to your feed.
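For the extra-geeky route, here's a minimal Python sketch of what "add that attribution to just your news feed" could look like. It is not how FeedFlare works under the hood; it's a hand-rolled illustration that assumes a plain RSS 2.0 feed with HTML in each item's description, and the function name, author, and site details are all placeholders.

```python
import xml.etree.ElementTree as ET
from html import escape


def add_attribution(feed_xml, author, site_name, site_url):
    """Append an attribution footer to every item description in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        # Prefer the post's own permalink; fall back to the site URL.
        link = item.findtext("link", default=site_url)
        footer = '<p>Originally published by {0} at <a href="{1}">{2}</a>.</p>'.format(
            escape(author), escape(link, quote=True), escape(site_name)
        )
        body = description.text or ""
        if footer not in body:  # don't stack footers if the feed is processed twice
            description.text = body + footer
    # ElementTree re-escapes the HTML in each description when serializing.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = (
        '<rss version="2.0"><channel><title>Example Blog</title>'
        "<item><title>Post</title><link>http://example.com/post</link>"
        "<description>&lt;p&gt;Hello.&lt;/p&gt;</description></item>"
        "</channel></rss>"
    )
    print(add_attribution(feed, "Rob", "Example Blog", "http://example.com/"))
```

Because the footer lives only in the feed, visitors to your site never see it, but anyone republishing the feed verbatim carries your byline and permalink along with the post.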

Okay - let's say you're being scraped. First, take a breath and decide what outcome you want. Do you want the content removed? Do you just want an attribution to you and a link to your site? Do you want something in between - say, attribution and a link, but also the removal of all but an excerpt?

Now look at the site. If it isn't jammed to the gills with banner ads, AdSense blocks and text links, there's actually a chance the site's owner doesn't understand that what they're doing is wrong.

Look for contact information, and drop them a polite but firm note to let them know you're unhappy, and to tell them what you want them to do about it. Give them a deadline - 48 hours is reasonable.

If they respond, great. They may ask for more time; use your judgement in deciding whether to give it to them. They may throw a fit - "don't you know everything on the Internet is free, dude?!" - in which case, point to the notice on your site and restate what you expect them to do.

If that doesn't get you anywhere - if you receive no response or an unsatisfactory one - then you have a few options:

  • You can call a lawyer, if you think it's really worth fighting over. (Chances are it isn't - especially if the scraper is located in a faraway country, as so many are.)
  • You can decide it's not worth the fight, and chalk it up to human nature.
  • You can leave comments on the posts on the scraper's site that indicate they are being republished without permission - but those are liable to be deleted by the scraper.
  • You can report them to the search services (Technorati, for example, lets you report spam blogs through its troubleshooting form), and - if the service decides your complaint is well-founded - have them removed from those services' databases. If they're using your content to drive traffic and search results, you'll be hitting them where it hurts.
  • Or you can publish a post that explains why scraping is wrong and tells people what they can do about it... and wait to see if it turns up on the scraper's site. (Hola!) Just be sure you don't link to them and boost their Google ranking - although mentioning their web address in text (e.g. "bookmarkdevil.com/socialbookmarking") should be fine.

Whatever you decide, you may want to use your social media search services to stay on the lookout for people linking to and discussing the scraper's posts. A comment on those third-party posts may be all it takes to correct the record and assert your rightful ownership over your content.

And remember to keep it in perspective. You don't want to get so obsessive over this stuff that you forget why you share your thoughts and creativity in the first place (otherwise, you're in danger of becoming a record company).

Comments

Raul says

June 11, 2008 - 1:43am
Thanks for this Rob, I have to say that I haven't found my content being scraped (and I sure as hell hope I'm not inviting anyone to scrape my content!) - I guess my blog is so boring it doesn't invite scrapers :) Yours and Alex's on the other hand, are much better and thus sensitive to these kind of scrapers! Rebecca Bollwitt has told me she's had some content scraped, and I hear this story more and more. Which makes me think - am I really all that boring? :) Hope you're having a great week! Thanks for these posts!

Rob Cottingham says

June 11, 2008 - 4:24pm

Well, if I was getting into scraping, I'd be putting your blog high on the top of my list. Informative, provocative, insightful, personal and diligently updated.

(Gentle reader, if you're a scraper looking for great content - or just someone with an interest in environmental issues, especially in the Lower Mainland - head on over to Raul/Hummingbird604's blog.)

And Raul, I can put in a good word for you with the people at Bookmark Devil. Just say the word - I know a guy.

Monica Hamburg says

July 25, 2008 - 11:50am
Dear Spammy Scraper, You insult not just Raul but the community at large when you neglect to scrape from his site. Do you think he has no feelings? For shame!

johninnit says

August 19, 2008 - 6:22am
I think Bookmarkdevil deserve some kind of chutzpah-bot prize for this one: http://bookmarkdevil.com/socialbookmarking/tagging/dealing-with-scrapers-when-people-steal-your-content-2/ Found it whilst nearly being suckered into commenting on another of your stolen news stories on their site, rather than leaving it here :(

johninnit says

August 19, 2008 - 1:54pm
Gah! Add another to your list of those hurt by this - my pride. I got here through a trackback post to my blog from Bookmarkdevil, which suggested it was written today. I did a google search on their title to see who really wrote it, but I didn't think to double check the wrong date from Bookmarkdevil, so end up giving you a comment you knew 2 months back.

Michelle says

August 7, 2009 - 9:35am

Hi-

Thanks for this info but it doesn't eliminate my fury. I am constantly getting articles stolen right off my website or from ezinearticles! Some of the scrapers don't even have contact information listed. This pisses me off. I contacted another and got no response. Is there a way to report to Google?

The funny thing is my articles are written with personal experiences and these idiots copy word for word! No link back to my site or credit. I do all this hard work for some fool to steal it? I hate thieves.

I'm just about fed up with this and while I can't afford to hire a lawyer, is there anything else I can do to at least not have my work discounted by the search engines and credited to another???

Thanks

Debi Ward Kennedy says

September 29, 2009 - 4:32pm

sigh.

Been dealin' with this $#@&%$^^(& for a MONTH now. Jerk is duplicating my posts from two business blogs, filling his blog (one of twelve spam blogs) with MY content and getting Google ads revenue from it. Yes, he links back to my blog (general link, not to the post itself) AND lists me as a 'contributor' to his blog - only problem is, he never asked nor notified me about this. So it's theft, despite the byline & linkback. He's also stealing third-party content, because he's publishing photos & info that my clients have given ME permission to publish - but not him. DUMB. And I'm not the only one - he has a list of 'contributors' who knew nothing about this.

I've learned SO much, and articles like this one are a Godsend. One thing I learned about that you don't mention here is filing a DMCA notice with the HOST of the offending blog or web site. It's what you do AFTER your nice polite emails are ignored by the blog/site author. The web host is required by law to investigate and if it is unauthorized use, they'll pull the site down.

I am going to add a list of links to info about this issue to my blog, because I think that if more of us stand up against this act, we help those who either don't know how yet or feel inadequate to fight the scrapers. I'm still working on it, but you can read more in my last month's worth of posts: http://decodivadebi.blogspot.com

t says

October 15, 2009 - 7:11am

I'm thinking of starting a feed centering on thoughts of today but I don't want to post my creative thoughts and end up having them scraped. I wonder if it's worth having to go through the whole process of having to find a lawyer and all that. I think my material is very interesting and would be a welcome mat for scrapers.

thelamestdotcom says

December 17, 2009 - 9:12pm

Sensational article! I've just launched a new blog and although it is in its infancy I'm spending every spare waking hour working on what I think are quality posts.

I've been a tad paranoid about people wanting to steal content and concepts, whilst balancing the need to get the blog and articles out there to a wide audience.

Reading this article summed up perfectly what I should be doing and what angle to take to any perceived scrape. Thanks for the great post.

Anonymous says

March 8, 2010 - 12:58pm

Hi
It's really great to see your web site
It would really be a great help if you would write an article about how to deal with it when you are working really well, possibly alone and concentrating privately, and someone comes and not only sits beside you but also sits in your personal space right on top of you. Has anyone ever experienced that? So you feel you are losing your concentration and don't know what to do about it?
Any suggestions welcome... thanks

David says

September 12, 2011 - 5:56am

What really annoys me is when content is scraped and then lightly spun to appear more original. I just came across one of my articles from my blog scraped and spun on to this pathetic site - artemklimenko.com (hope you don't mind a little name and shame, edit out if you do!). This is such a pathetic automated site that the guy has even left a 'sample page' tab on his menu bar, complete with lorem ipsum style filler! What a ****.

I do think that scraping isn't too big a problem if you ensure that your content is indexed first; even a modified post date on the scraper's site can't hide the fact that the big G found yours first. Ping your blog and feed right after posting - I think indexing date/time takes priority over posting date.

Only get really peeved and take action if the scraper outranks you for your keywords, or (worse) for a phrase of your original article. In this case, DMCA is the way forward.
