website comments have lots of spam

Issue #112 resolved
René Dudfield created an issue

We have a problem with spam in the doc comments on the website.

There is also no easy way to let other people moderate the comments and mark them as spam.

Here are some notes about ways to progress.

== Plan 1 - Clean up comments, and import them into Disqus ==

We have a json file with all of the comments [0].

Disqus (a commenting service) has been used with great success on the project comments. It has anti-spam tools and a moderation interface. Disqus also supports importing comments (in WXR format [1]).

So perhaps we can clean the bad comments out of the json file, and then convert the remaining comments into the WXR format for Disqus to import.
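
A minimal sketch of the conversion step, assuming each cleaned comment is a dict with the hypothetical keys `link`, `id`, `author`, `date` and `content` (the real layout of the json file [0] may differ). The namespace URIs and element names are written from memory of the Disqus import format and should be checked against [1] before use.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as recalled from the Disqus import format; verify against [1].
NS = {
    "dsq": "http://www.disqus.com/",
    "wp": "http://wordpress.org/export/1.0/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def comments_to_wxr(comments):
    """Group comments by page link and return a WXR-style XML string.

    The keys used here (link, id, author, date, content) are assumptions.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    items = {}  # one <item> (thread) per documented function/method page
    for c in comments:
        item = items.get(c["link"])
        if item is None:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = c["link"]
            ET.SubElement(item, "link").text = c["link"]
            ET.SubElement(item, q("dsq", "thread_identifier")).text = c["link"]
            ET.SubElement(item, q("wp", "comment_status")).text = "open"
            items[c["link"]] = item
        wc = ET.SubElement(item, q("wp", "comment"))
        ET.SubElement(wc, q("wp", "comment_id")).text = str(c["id"])
        ET.SubElement(wc, q("wp", "comment_author")).text = c["author"]
        ET.SubElement(wc, q("wp", "comment_date_gmt")).text = c["date"]
        ET.SubElement(wc, q("wp", "comment_content")).text = c["content"]
        ET.SubElement(wc, q("wp", "comment_approved")).text = "1"
    return ET.tostring(rss, encoding="unicode")
```

Using the page link as the thread identifier keeps one thread per function/method, which matches the per-section loading described in this plan.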

Another task is getting the comment interface on the webpage working well. It should load the Disqus comment tool for a function/method only when the user asks to see those comments. Otherwise some pages would load 50 comment boxes at once, overloading the page and filling it with junk. Only one Disqus comment thread can be loaded on a page at a time, but it can be reloaded [2], so we can use javascript to reload the comments for each section.

== Plan 2 - Start fresh with Disqus ==

Maybe we don't import the old comments into Disqus at the moment, and just start afresh with Disqus comments.

The old comments can be imported later.

== Plan 3 - something else? ==

Maybe there is another option?




Comments (4)

  1. Lenard Lindstrom

    Some off-the-top-of-my-head ideas:

    1. Treat anonymous comments differently from registered comments. Add a spam option to a comment box. If a certain number of registered users mark the anonymous comment as spam, it goes.
    2. Track all changes for a given pygame site account. Then, whenever someone on the mailing list mentions that there is a lot of spam on the site and offers to clean some of it up, give that person's account Spam Police privileges for a day. Then the volunteer can disable entire spam accounts, removing all comments and wiki changes made under them.

    These suggestions will not prevent spam. But they could simplify removing it.
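
    The first idea could be sketched roughly as follows. The class name, the default threshold of 3, and the rule that only distinct registered usernames count are all assumptions made for illustration, not part of the existing site code.

```python
class FlaggedComment:
    """Hypothetical comment record that tracks spam flags from registered users."""

    def __init__(self, text, anonymous, threshold=3):
        self.text = text
        self.anonymous = anonymous
        self.threshold = threshold  # assumed value; would need tuning
        self.flagged_by = set()     # distinct registered usernames

    def mark_spam(self, username):
        """Record a flag; return True once the comment should be removed."""
        if not self.anonymous:
            # Registered comments are handled differently (not covered here).
            return False
        self.flagged_by.add(username)
        return len(self.flagged_by) >= self.threshold
```

    Storing the flaggers in a set means one user clicking the spam option repeatedly still counts once.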

    As for using Disqus, I do not know enough about such services to make a suggestion. Filtering out spam comments before they are posted would be nice. But what does Disqus get in return?

  2. TankorSmash

    Alright, I just wrote a quick script that culls comments from the JSON file if they contain any of a few words I've decided were spammy enough to delete. I've uploaded the files to my site here: . You'll find a json with the spam removed, a json with the comments I declared were spam, and a text file with the words I used to determine whether or not a comment is spam.

    I've found a few false positives, such as a long code listing (a bouncing ball script, written by Matthew, I believe), but I left it in the spam json due to time constraints.

    Original comments: 1099. Cleaned comments left: 918, with 181 spam comments removed.
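
    For anyone wanting to reproduce this kind of cull, here is a minimal sketch of a word-list filter. It is not the script described above; the `content` key and the function name are assumptions, and it matches whole words only, which is one way to cut down on false positives.

```python
import re


def split_spam(comments, spam_words):
    """Split comments into (clean, spam) using a list of spammy words.

    Assumes each comment is a dict with a "content" key (hypothetical).
    """
    if not spam_words:
        return list(comments), []
    # Whole-word, case-insensitive match so "cialis" does not hit "specialist".
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(w) for w in spam_words),
        re.IGNORECASE,
    )
    clean, spam = [], []
    for comment in comments:
        if pattern.search(comment["content"]):
            spam.append(comment)
        else:
            clean.append(comment)
    return clean, spam
```

    Keeping the rejected comments in their own list, as above, makes it possible to review them later and rescue false positives.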

    Hopefully this puts us in a good place to work from. Let me know what you think.

    edit: I can put the JSON files up on pastebin if you are uneasy about going to a proprietary link like that

  3. TankorSmash

    Speaking of spam, the wiki page for tutorials is covered in it, and I'm unable to properly access the editing page for it to help. If you check out you can see 'penis excercises' in the Brazilian section for example. This version is linked by Google from the onsite Google search bar on the main page.

    When you go to edit it, the content is blank. If you click edit on the top of the page, you are led to the editing page for instead, which is free from spam, and so you can't help the spam proofing at all.

  4. René Dudfield reporter

    Thanks. I finally got around to writing a script to use the spam1.json file to remove spam comments.
