
The Plague of Plagiarism


If you’ve got performance troubles with an application that stores data in SQL Server, and especially if it’s a home-grown application (not a store-bought app), you can get dramatic performance improvements simply by focusing on some basic indexing techniques.  These tips and tricks pay off more than pouring money into hardware that might look good sitting in the datacenter, but doesn’t really make the application significantly faster.

When I go into a shop to speed up an application I’ve never seen before, two of my favorite quick-hits are from the index performance tuning queries from SQLServerPedia:

  • Find unused indexes – these are indexes the SQL Server engine says it’s not using.  Unused indexes incur a speed penalty because SQL Server still has to add/update the indexes as records change, so they make writes slower.
  • Find missing indexes – these are indexes SQL Server wishes it had available.

The above paragraph and information sound good, don’t they? Well guess what? They’re not my words, they’re Brent Ozar’s words from his article on SQL Server Index Tuning Tip: Identify Overlaps. Now had I not mentioned that fact, would you think I wrote it? Well that is the meat of a hot topic that spawned today on Twitter in the SQL community. Apparently a young man who recently graduated college decided to open a blog focused on SQL Server, UNIX and Oracle. The problem lay in the fact that none of the articles on his site were his own. What this guy did was point an RSS aggregator at prominent sites (such as SQLServerpedia, SQL Server Central and SQLblog.com amongst others) so that their content was then published on his blog. The big deal was that 1) he didn’t ask permission to republish their content and 2) he didn’t make it clear that the article you were reading was written by someone else. The ONLY credit given is a very tiny blurb on his About Me page that says “materials in this site has been collected from various sites and blogs and for that I thank them”. Riiiiight, that’s not exactly proper citation. Given that this guy claims to also have gotten a Master’s degree, I would think that at some point in his educational studies proper citation and what plagiarism is would have been mentioned.

Upon learning of this site and its apparent violations, members of the SQL community who had their intellectual property infringed upon took action by leaving fairly straightforward messages on his About page informing him that he was in violation and needed to remove the content immediately, otherwise harsh actions could be taken. Within a relatively short period of time the author got the message (sort of) by removing the menu options on his site, yet the content still remains if you look for it. Due to this fact I expect DMCA notices to start flying shortly, and if the blogger still fails to comply then his hosting company should drop the axe. What’s interesting about this situation is the conversation that was spawned afterward on Twitter.

Todd McDermid (@Todd_McDermid on Twitter) had the opinion that the SQL community reacted much more harshly than we should have and blogged about it in his article Another Instance of Plagiarism. In his post Todd does bring up some good points: perhaps the hardcore lynch mob approach was a bit rash, and perhaps a gentle “hey buddy, do you realize what you’re doing is stealing?” might be a more diplomatic approach. The problem is that many of the guys in our community who are prominent bloggers have been burned plenty in the past. Brent blogged on How to Take Action When Your Content is Plagiarized and in it described a situation (not the first, definitely won’t be the last) where someone decided stealing content was acceptable. Are the messages left on the violator’s site, saying that he has to take down the content or face a DMCA report, the nicest? Maybe not. Is it a necessary evil because of the countless times these bloggers and authors have had to deal with this? Absolutely. Todd outlines that in this particular case the blogger in question may be starting out and not know better, but after a bachelor’s degree, a master’s degree and reading the numerous blogs he’s aggregating (which incidentally have blogged before on just this topic) he should know better. One could argue about what exactly constitutes “common sense” in this regard, but I’d argue someone who has gone through a master’s program should be very well versed in the art of writing and proper citation.

Some would argue that “nobody lost out” by what this guy was doing but I would disagree. This is a violation of someone’s intellectual property. When you decide to blog on a technical level you are taking your time to help educate the masses. You’re putting a lot of hard work into formulating something that is uniquely yours and sharing it. “But Jorge, this guy was just sharing FOR you (well not me because I’m not worth stealing from which is comforting on some level for the moment)”, yeah but he wasn’t making it clearly known that it was not his work. If someone has a SQL problem and they quickly Boogle out a question, what if they come across the aggregator’s content before they get to yours (the source)? If the person needing a quick answer simply finds the answer on his blog and goes on his way, guess who gets credit for that? The thief. Sure this guy put a very tiny note in his about page but who is going to look in there when reading content? Now let’s take this up a notch. You’re on the job market and the prospective company (like many do) does a search for you on the internet to see what pops up. Imagine how good your word and reputation would be to them if they saw an entire community backlashing on you because you were knowingly stealing content? It’s not worth it!

So what does one do? Well you could ask the author for permission to repost content. For the record I asked Brent if I could borrow that first paragraph for this purpose (thanks Brent!). Or, and I know this is crazy, COME UP WITH YOUR OWN CONTENT! It’s not easy but it definitely pays off and in the end you get mad street cred *fist bump*. So don’t steal content; people work hard to produce this stuff and in the end you’re only going to make yourself look worse by pretending to be something you’re not.


26 replies on “The Plague of Plagiarism”

Jorge,

A very well written and thought-out response. I agree that a swift and stern response is required, particularly when the offender is obviously well-educated (or at least claims to be) and would have known without a doubt that what he was doing was both unethical and illegal.

I will say that, having followed the dialog between Todd, Aaron, yourself, and others, I am impressed at how this community can disagree so strongly with each other yet remain civil and engaged in the debate. Even though Todd disagreed strongly with the community response to this guy, he behaved professionally and was treated with respect by those who had been burned by plagiarizers before. A big +1 for the SQL Server community!

Tim

Great post, Jorge.

I was reticent about “throwing my hat in the ring” with that blog post and the first few comments in Twitter today. Thanks for the spirited and professional debate! I probably was a little too easy on the guy – I didn’t catch that he had a master’s degree at the time. That, and “it hasn’t happened to me yet.” 🙂

All in all, as you say, a very bad move for that guy. It could cause him significant grief in the future, as nothing posted on the intertubes really seems to go away. Here’s hoping he grabs a clue.

Todd

Thanks! As others have already said I have nothing but respect for you and everyone involved in today’s debate. We all managed to stay civilized and at the end we all came out with respect intact. One thing I have to say is that this SQL community is unlike any other that I’ve been a part of. Everyone is supportive and even in instances like this we can forge community ties, extend knowledge and grow together. Hopefully this guy learns his lesson and can contribute his own stuff after this.

Great post of what went down today Jorge.

I thought about the situation quite a bit tonight and I think the way it played out was appropriate given the situation and the way the site was built. After seeing the listed credentials (possibly fiction) of the owner of the site, I agree that there is no way he could not have known what he was doing. There was no apology or reply to any of the emails we sent either. Those emails were written professionally, were not harsh in any way, and only described the problems with the content and methods. That leads us to assume he probably is not taking it as seriously as it really is. We can only hope the person has learned something from this. Somehow, I highly doubt it though.

Can you believe this SQL Community? I hope you Oracle people realize how lonely you really are! 😉

Ted

I totally agree that what he did was wrong but I still think the community reacted far too quickly in burning him.

If he worked for any of you, what would you have done about it? Sack him? Destroy his career? or tell him to take it down and assign him a mentor from the SQL Server Community?

To me, his site felt very much like a personal collection of useful articles and his only mistake was putting it on the web in a misguided attempt to be helpful.

As to why he hasn’t responded, I expect it’s because there are a dozen axes waiting for him to stick his head up (or so I imagine he will feel). If I had screwed up this badly early in my career I think I would have been scared to stand up at this point too.

As for me, I was a little disappointed not to have written anything that he considered worth plagiarising! ;-D

The thing is the community acted in such a quick way because many have been burned in the past and so swift reaction is needed. Many of us don’t have time to scour through the entire site, collect the person’s life history and analyze what the “intentions” are for the site. If you violate a copyright, you violate it, end of story. I would hate to see someone take your new book, copy the entire contents, slap a generic cover on it without your names on it and start handing it out. You, Brent and many others worked hard to produce that material and it’s not right that someone would be distributing that work and passing it off as their own (intentional or not).

Yes, from the public history we see from this guy he’s early in his career but he’s out of school now and welcome to the adult world. Taking responsibility for actions is a lesson we all learn.

Everyone else said it better, but here’s my thought: Maybe we were a little harsh, but the guy had no excuse at all. I’ve been the victim of plagiarism IRL, and it SUCKS, even if it doesn’t cost you anything, and everyone knows he did it.

It’s been a really interesting debate and I’ve enjoyed mulling over the different angles and opinions.

I think it’s actually added a certain depth to the community in having to deal with something so contentious and emotive and I’m glad I was a part of it! 🙂

I think the response from the community was spot on. Many of the folks impacted rely on their blog content as a tool for marketing. Pass off their hard work as your own and you have potentially lost them some income.

Reposting information from an RSS reader is fine as long as you give credit to the source as it may very well drive MORE traffic to the source site (big title link saying the post originally came from http://www.somewhere.blah). Or better yet, have your entire post just be a link back to the source material.

Great post, Jorge!

One point I’d like to mention that I didn’t see anyone else make is that we now have an opportunity to decide how to react to similar occurrences in the future as a group. And believe me, they will definitely occur.

Maybe you should lead the charge – via a new blog post – about your opinion of “what to expect from THIS community” when this happens again.

Best regards,

-Kev

Thanks Kevin! You’re right, we really haven’t had a standardized “OK guys, here’s the deal” type post. I’ll get that up as soon as the craziness of SQLSaturday and publishing deadlines have passed!

Well it is “weak” to copy other people’s work to make personal gains, but if someone cares that much about their work they should do one of the following…

1) not post it publicly. If it’s valuable information that is best not shared or distributed, keep it to yourself.

2) post it in a signed or watermarked format, so it can still be accessed, still be shared, but will retain the markings of the original author. Plus in this way your work won’t be modified or distorted when others share it.

3) have proper copyrighting or trademarking done. Not all content posted is protected from being reused, and a lot of people probably got all protective even though they didn’t even focus on their “I.P.” prior.

Thanks for your feedback but I have to disagree with you. The solution isn’t to NOT post information publicly. If that were the case people should stop writing books, close down the internet and only talk to themselves! Most blogs are covered under a Creative Commons license (http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=creative+commons). Also, if you’ve been in any sort of education system you know what plagiarism is and why it’s wrong and you shouldn’t do it. A blog, while not necessarily a revenue generator, is still someone’s work. How would you like it if I came to your house and took your car even though you worked to pay it off? It’s theft!

True that plagiarism is wrong, but I think you guys may be jumping on the bandwagon and having a crusade over the idea of plagiarism instead of looking at what this guy was doing.

Taking others’ work, and making it seem it is your own personal work, that is plagiarism.

Indexing the publicly available content from other websites is not exactly plagiarism. If it was, would Google also be guilty of this?

Perhaps he could have engineered his site better to include the URL of the posts he was indexing; in fact I would have most definitely done that if I was him.

But just seems a bit heated. As I read on one of the responses, the copying of your work verbatim often increases traffic to your site, so in many ways the fact that someone finds your BLOG contributions worth reading and even worth copying is a positive indicator of how well your blog is doing. And it most likely will do even better based on people doing things like this.

I’m sorry but you’re wrong. Google isn’t reposting your work and passing it off as its own. And if you want a search engine to not index your pages it’s simply a matter of tweaking a setting on a page. When someone re-syndicates your content (via RSS for example) the first reaction typically is to tell the person to knock it off (or at least ask permission from the author first). Also as I mentioned before much of the content published on the web is covered by a license of some type so syndicating without express permission violates those licenses.

As for it being heated, the SQL community takes plagiarism very seriously as folks who write take a lot of time and effort to put together that content so it’s a little disconcerting when someone rips it off of you and tries to pass it off as their own (intentionally or not). Honestly, I’m not sure why you’re trying to justify plagiarism as a valid medium because it’s not, it’s theft.

Plagiarism sucks. Showing someone else’s work without linking back and crediting it to them sucks. It’s hard putting out good content and it hurts if someone is jacking your hard work and then not giving you credit. I am appreciative of any community that helps curb plagiarism. It hurts everyone by making content providers not want to provide content.

You’re wrong on your third point. As soon as you put something into any sort of tangible form, it’s protected by copyright. This includes the Internet. Registering the copyright strengthens the protection, but it still exists.

PS, do you have a link to this site that was indexing the content? I think it may be a quite useful site and would like to check it out. 🙂

True points, I agree there is probably some level of protection always. But really, the idea of this data protectionism is somewhat dated. These days everything is interconnected to everything else. If it’s not, then it’s privatized to some degree, such as having member login etc.

But yes it is wrong not to link back, that is the area the “heat” should be focused on – saying “hey dude, add this single parameter to your spidering: our linkback URL.”

The thread should be called the-plague-of-not-linking-back, as that is what really bothered everyone to begin with. I’m sure you don’t mind your work being all over the internet, but as long as the credit was given in a bit more of an expressive and direct way.

Look up any troubleshooting topic online. See how many sites are really just compilations of useful content spidered from many other sites. They don’t even put any linkbacks, they just archive and display the content. But it helps people find the content. It is a positive thing, right? If you think your content is so valuable and needs to be protected, print a book and sell it. Otherwise, be happy that your BLOGS are being read and used. If some jackme thinks it’s cool to jack your posts, well he’s the one losing out in the end…

IF that is what he’s doing. Which he’s not.

just sayin’
