BCG - Bulk Content Generator

Description

BCG is a Question2Answer plugin that allows admins to generate large amounts of content on their sites simply by uploading files (supporting .xlsx, .csv and .ods file formats).

Features

  • Create questions, answers and comments taken from a file
  • Any post can be assigned an author or be anonymous
  • Questions can be assigned a category, tags, a number of views and the Q2A extra value, if configured on the site
  • Answers can be selected
  • All posts can be voted up
  • Ability to disable notifications on the created posts (to avoid any future email)
  • Ability to disable notifications while creating the posts (to avoid emails only while creating the posts)
  • Any post can be given a specific creation date in the past, or a randomly generated one
  • Work is split into batches and progress is updated after processing each batch
  • Administrators can select which user levels can perform imports
  • Very simple installation
  • Upgrade system support: The plugin will be able to properly upgrade from previous versions
  • Simple setup

Note this plugin is not intended to enqueue and schedule posting of questions, answers or comments.

Requirements

These are the technical requirements for the plugin to work properly:

  • PHP version 7.2+
  • PHP extension ext-zip enabled
  • PHP extension ext-xmlreader enabled
  • Q2A core version 1.7.*, 1.8.*
  • External users feature needs to be disabled

Installation

  1. Copy the plugin directory pupi-bcg into the qa-plugin directory.
  2. Navigate to admin/plugins and enable the plugin.
  3. On that same page, execute the database setup.
  4. Navigate to admin/pages and add or edit the Import page.
  5. Make sure that:
    1. The Visible for field matches the user level that is allowed to perform imports.
    2. The URL of link field matches pupi-bcg-import.

Upgrading from v1.x

  1. Make sure your PHP version is 7.2 or later.
  2. Make sure extensions ext-zip and ext-xmlreader are enabled.
  3. Make sure you are not using external users.
  4. Supported file types are now XLSX, CSV and ODS. Any other format, including XLS, is no longer supported.
  5. Columns Id and ParentIdInFile have to be removed from the files.
  6. Row order becomes relevant (this is explained in sections below).
  7. Usernames now only accept a URL or the plain username. In v1, first%20second would have been a valid username and it would have become first second. In v2 first%20second is processed as first%20second. In order for the URL characters to be decoded, the full profile URL will have to be used, such as https://www.example.com/user/first%20second. In this case, the username will become first second.
  8. Depending on each server, in v2.x max_execution_time and memory_limit might be decreased, if needed. This is because memory usage has been significantly decreased and the import process has been modularized, which allows work to be processed in many smaller batches.

Sample usage and screenshots

As the interaction with this plugin is mostly focused on file uploads, it is important to understand what the files look like and what impact they have in the site. The files can be in .xlsx, .csv and .ods format. No other file formats are supported. These files are all spreadsheets where each row represents a post and columns represent a piece of data or setting related to that post. The following sections explain some rules that define how posts are created.

Parent and child

Since version 2.0.0, parent and child relationships depend on the order in which questions, answers and comments appear in the file. The rows should be thought of as a group of trees (questions) that can have many branches (answers and comments), where some of those branches (answers) can have other branches as well (comments).

So the logic is simple:

  1. Posts are created from top to bottom (starting at the beginning of the file)
  2. If the current row is of type question, then it has no parent
  3. If the current row is not of type question, then its parent will be the nearest possible parent that has already appeared in the file (this means moving up).

The term "possible parent" covers cases such as 2 consecutive answers. It is not possible for an answer to be a child of an answer, so the previous row would be skipped in the analysis.

This behavior is affected by the ParentIdInSite column. This is because that field forces the parent to be out of the file itself. This generates a different branch.

The process is very visual and harder to explain in words. So here are some examples to understand how the creation process works. For each of them, an extract of the Type and ParentIdInSite columns will be displayed, exactly as it would show in a real file.

Example 1

The input file (excluding unnecessary columns):

┌──────┐
│ Type │
├──────┤
│ Q    │
│ A    │
│ A    │
│ C    │
│ Q    │
│ A    │
└──────┘

Resulting hierarchy:

Question 1
├─ Answer 1
└─ Answer 2
   └─ Comment 1
Question 2
└─ Answer 3

Example 2

The input file (excluding unnecessary columns):

┌──────┐
│ Type │
├──────┤
│ Q    │
│ C    │
│ C    │
│ A    │
│ C    │
│ C    │
│ A    │
│ C    │
│ C    │
└──────┘

Resulting hierarchy:

Question
├─ Comment 1
├─ Comment 2
├─ Answer 1
│  ├─ Comment 3
│  └─ Comment 4
└─ Answer 2
   ├─ Comment 5
   └─ Comment 6

Example 3

The input file (excluding unnecessary columns):

┌──────┬────────────────┐
│ Type │ ParentIdInSite │
├──────┼────────────────┤
│ Q    │                │
│ C    │                │
│ A    │            123 │
│ C    │                │
│ A    │                │
│ C    │            456 │
│ C    │                │
│ Q    │                │
└──────┴────────────────┘

Resulting hierarchy:

Question 1
├─ Comment 1
└─ Answer 2
   └─ Comment 4
Question 123
└─ Answer 1
   └─ Comment 2
Question 456
└─ Comment 3
Question 2

In the example above, the first two rows should be clear. When the first answer appears, it references a parent question already in the site, so the answer is added to Question 123 rather than Question 1. The logic then goes on as normal, and the second comment becomes a comment on Answer 1, as that is the first possible parent moving up. Then the second answer appears. The only possible parent for an answer is a question, and the nearest one moving up is Question 1, so this marks the end of the Answer 1 branch and the beginning of the Answer 2 branch. The first comment under Answer 2 happens to have a parent in the site. Let's assume that parent is a question (but it could be an answer as well). Then a new comment appears, and the nearest possible parent is Answer 2. Finally, the last question is created.
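
The parent lookup described in this section can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not the plugin's actual PHP code: the names resolve_parents and ALLOWED_PARENTS are hypothetical, and rows are reduced to (Type, ParentIdInSite) pairs.

```python
from typing import List, Optional, Tuple

# Which row types may act as a parent for each child type (as described above):
# an answer can only hang from a question, a comment from a question or an answer.
ALLOWED_PARENTS = {"A": {"Q"}, "C": {"Q", "A"}}

Row = Tuple[str, Optional[int]]      # (Type, ParentIdInSite)
Parent = Optional[Tuple[str, int]]   # ("file", row index) or ("site", post Id)


def resolve_parents(rows: List[Row]) -> List[Parent]:
    """Return the parent of every row, following the top-to-bottom rules."""
    parents: List[Parent] = []
    for index, (post_type, parent_id_in_site) in enumerate(rows):
        if post_type == "Q":
            parents.append(None)  # questions never have a parent
        elif parent_id_in_site is not None:
            parents.append(("site", parent_id_in_site))  # forced out of the file
        else:
            # Move up until a row that is allowed to be the parent is found.
            for candidate in range(index - 1, -1, -1):
                if rows[candidate][0] in ALLOWED_PARENTS[post_type]:
                    parents.append(("file", candidate))
                    break
            else:
                raise ValueError(f"row {index} has no possible parent")
    return parents


# The Type and ParentIdInSite columns of Example 3.
example_3 = [("Q", None), ("C", None), ("A", 123), ("C", None),
             ("A", None), ("C", 456), ("C", None), ("Q", None)]
```

Applied to example_3 (rows indexed from 0), the sketch attaches row 1 to row 0, row 2 to site post 123, row 3 to row 2, row 4 to row 0, row 5 to site post 456 and row 6 to row 4, which matches the hierarchy shown for Example 3.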

File structure

The first row of a file will contain all the headings of the columns so the plugin just ignores it. These are the columns/fields that need to be present in the file exactly in the order they appear below:

  • Type: Required. The type of post. It must be Q for questions, A for answers or C for comments. No other value is possible.
  • ParentIdInSite: Optional. The post Id of an already existing post in the site that will be the parent of this post. This field requires the post Id, which is a number that can be seen in the URL of a question or answer link, for example, https://yoursite.com/413/how-do-birds-fly. It must be a number greater than 0.
  • Title: Optional. The title of the question. This should only be filled if the post being created is a question.
  • Content: Optional. The content of the post. Applies for questions, answers and comments.
  • Format: Optional. The format used to create the post. This value is usually set by editors behind the scenes. For example, the basic Q2A editor and viewer does not support HTML and just uses plain text to display posts. Other editors, such as the WYSIWYG editor that comes with Q2A (CKEditor), support HTML while other editors, such as the markdown editor, support markdown format. This means the value set in this field depends on the editor used to display the posts. For the most popular editors, it can be markdown for the markdown editor, html for the CKEditor and an empty field (unfilled) for plain text (which is the default for comments).
  • CategoryId: Optional. The category Id for the question. This should only be filled if the post being created is a question. The category Id is a number that can be seen in the URL of a category only when editing categories in the admin section, for example, https://yoursite.com/admin/categories?edit=2. If the input category exists, then the question will be created under that category. If this field is filled, then CategoryUrl must be empty.
  • CategoryUrl: Optional. The category slug for the question. This should only be filled if the post being created is a question. The category slug is the piece of text generated from the category name that is URL-friendly. It can be found when editing a category in the admin section or in the URL when displaying the posts under a given category https://yoursite.com/questions/category-slug. If this field is filled, then CategoryId must be empty.
  • Tags: Optional. The tags for the question. This should only be filled if the post being created is a question. The value is a comma-separated list of tags.
  • UserName: Optional. The username (or handle) of an already existing user in the site. It applies to all kinds of posts. If the username contains special characters, it can be taken from the URL itself while viewing their profile. For example, for user john doe, one option would be to input john doe. Another option would be to input https://yoursite.com/users/john%20doe. Note that https://yoursite.com must match exactly the Preferred site URL set in the admin/general page. If this field is filled then AnonymousName must be empty.
  • AnonymousName: Optional. The display name of a user that will anonymously create the post. It is not an existing user in the site. This field can be empty. If this field is filled then UserName must be empty.
  • Notify: Optional. Defines whether the author of the post will be notified by email of any updates to it such as a child post being added. It can be true, false or empty. If empty, it is assumed to be true for registered users and false for anonymous users. This field cannot be set to true when the creating user is anonymous.
  • ExtraValue: Optional. The custom extra field that can be added to the ask form. It can be configured in the admin/posting page. This should only be filled if the post being created is a question.
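  • DateTimeFrom (Required) and DateTimeTo (Optional): These two fields work together to generate a single post creation date. It is very important to note that, when these fields hold a date and time value, the cells in the spreadsheet must be formatted as a date or a date time for XLSX and ODS files, or respect the YYYY-MM-DD HH:mm:SS format for CSV files. Different combinations of these two fields generate different results. The table below lists the supported combinations (an empty cell means the field is left blank):

    ┌─────────────────────┬─────────────────────┬────────────────────────────────────────────────────┐
    │ DateTimeFrom        │ DateTimeTo          │ Resulting date                                     │
    ├─────────────────────┼─────────────────────┼────────────────────────────────────────────────────┤
    │ 2012-11-28 14:20:15 │                     │ 2012-11-28 14:20:15                                │
    ├─────────────────────┼─────────────────────┼────────────────────────────────────────────────────┤
    │ 2012-11-28 14:20:15 │ 2012-11-29 14:20:15 │ A random date and time between 2012-11-28 14:20:15 │
    │                     │                     │ and 2012-11-29 14:20:15                            │
    ├─────────────────────┼─────────────────────┼────────────────────────────────────────────────────┤
    │ 2012-11-28 14:20:15 │ now                 │ A random date and time between 2012-11-28 14:20:15 │
    │                     │                     │ and the current server time                        │
    ├─────────────────────┼─────────────────────┼────────────────────────────────────────────────────┤
    │ now                 │                     │ The current server time                            │
    ├─────────────────────┼─────────────────────┼────────────────────────────────────────────────────┤
    │ any                 │                     │ A random date and time between the input provided  │
    │                     │                     │ while submitting the form with the file and the    │
    │                     │                     │ current server time                                │
    └─────────────────────┴─────────────────────┴────────────────────────────────────────────────────┘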
  • Selected: Optional. Whether the answer is selected or not. Applies only for answers. The user selecting the answer is assumed to be the user who asked the question. The time in which the user selects the answer is defined by a delay set while submitting the form with the file. Note if the answer is being created for an existing question in the site (using the ParentIdInSite column) that already has a selected answer, then the newly created answer will become the selected answer for the question.

  • Views: Optional. Number of views that the question will be assigned. This field only applies to questions. Leaving the field blank will result in 0 views.
  • Votes: Optional. Number of votes that the post will be assigned. It is important to know that, in order for votes not to be deleted when they are recalculated, they need to be cast by real users. So this process involves creating dummy users that will cast votes on the created posts automatically. Leaving the field blank will result in 0 votes cast. There is more information on votes and dummy users in the Dummy users section.
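
The DateTimeFrom and DateTimeTo combinations listed above can be expressed as a small Python function. This is a hedged sketch rather than the plugin's implementation: the function names are hypothetical, created_after stands for the date entered in the import form, and a uniform distribution at one-second resolution is an assumption, since the text only says the date is random.

```python
import random
from datetime import datetime, timedelta

CSV_FORMAT = "%Y-%m-%d %H:%M:%S"  # the YYYY-MM-DD HH:mm:SS format required for CSV files


def random_between(start: datetime, end: datetime, rng: random.Random) -> datetime:
    """Pick a random moment in [start, end] (assumed uniform, whole seconds)."""
    if end < start:
        raise ValueError("the end of the range is before its start")
    seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, seconds))


def resolve_creation_date(date_from: str, date_to: str, now: datetime,
                          created_after: datetime, rng: random.Random) -> datetime:
    """Turn one DateTimeFrom/DateTimeTo pair into a single creation date."""
    if date_from == "now":
        return now                                   # the current server time
    if date_from == "any":
        return random_between(created_after, now, rng)
    start = datetime.strptime(date_from, CSV_FORMAT)
    if not date_to:
        return start                                 # a fixed date and time
    end = now if date_to == "now" else datetime.strptime(date_to, CSV_FORMAT)
    return random_between(start, end, rng)
```

Passing the random generator and the current time as arguments keeps the sketch deterministic when needed, for example resolve_creation_date("any", "", now, created_after, random.Random(1)).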

To ease the setup of a new file to import it is possible to download a sample file from here. It can be used as a reference. Do not add or remove columns. Do not remove the header row. Just replace the example rows and add new ones.
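
For files that are generated by a script rather than edited by hand, the following Python sketch writes a CSV with the columns in the order listed above and checks a few of the stated rules first. It is an illustration only: build_csv and check_row are hypothetical helpers, the comma delimiter is an assumption, a post is treated as anonymous whenever UserName is empty, and the checks cover only some of the rules in this section.

```python
import csv
import io
from typing import Dict, List

# Column headings in the order given in the File structure section.
COLUMNS = ["Type", "ParentIdInSite", "Title", "Content", "Format", "CategoryId",
           "CategoryUrl", "Tags", "UserName", "AnonymousName", "Notify",
           "ExtraValue", "DateTimeFrom", "DateTimeTo", "Selected", "Views", "Votes"]


def check_row(row: Dict[str, str]) -> List[str]:
    """Return the rule violations found in one row (an empty list means none)."""
    problems = []
    if row.get("Type") not in ("Q", "A", "C"):
        problems.append("Type must be Q, A or C")
    if row.get("CategoryId") and row.get("CategoryUrl"):
        problems.append("CategoryId and CategoryUrl cannot both be filled")
    if row.get("UserName") and row.get("AnonymousName"):
        problems.append("UserName and AnonymousName cannot both be filled")
    if row.get("Notify") == "true" and not row.get("UserName"):
        problems.append("Notify cannot be true for an anonymous author")
    if not row.get("DateTimeFrom"):
        problems.append("DateTimeFrom is required")
    return problems


def build_csv(rows: List[Dict[str, str]]) -> str:
    """Write the header row plus one line per post; missing fields stay empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for number, row in enumerate(rows, start=1):
        problems = check_row(row)
        if problems:
            raise ValueError(f"row {number}: {'; '.join(problems)}")
        writer.writerow(row)
    return buffer.getvalue()


sample_rows = [
    {"Type": "Q", "Title": "How do birds fly?", "Content": "I would like to know.",
     "Tags": "birds,flight", "UserName": "john doe",
     "DateTimeFrom": "2012-11-28 14:20:15", "Views": "10"},
    {"Type": "A", "Content": "They flap their wings.", "AnonymousName": "A visitor",
     "DateTimeFrom": "any", "Votes": "2"},
]
```

Calling build_csv(sample_rows) returns the file content as text, which can then be saved with a .csv extension. The sample file mentioned above remains the authoritative reference for the expected layout.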

Dummy users

In order for a vote to be created, it needs to be linked to a post and a user. So the plugin will automatically create the dummy users that are needed to vote. These are normal site users that are managed by the plugin. The plugin keeps track of them and reuses them in each import event. The number of users needed equals the maximum number of votes that have been imported.
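
As a rough illustration of that count, the sketch below assumes that the "maximum" refers to the highest Votes value found on any single post, and that dummy users created by earlier imports are reused before new ones are created. The function name and its arguments are hypothetical.

```python
from typing import Iterable


def dummy_users_to_create(votes_per_post: Iterable[int], existing_dummy_users: int) -> int:
    """Number of new dummy users an import would need under the assumptions above."""
    # One user can vote once per post, so the post with the most votes sets the total.
    needed = max(votes_per_post, default=0)
    return max(0, needed - existing_dummy_users)
```

For example, a file whose Votes column holds 0, 3, 7 and 2 would need 7 dummy users on a site that has none yet, and only 2 more on a site that already has 5.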

The usernames for them are generated by randomly concatenating first and last names stored in the data/first-names.txt and in the data/last-names.txt files. Maximum size for each first name and each last name is 10 characters.

All dummy users are also assigned the same email domain, which can be configured using the configuration variable DUMMY_USER_EMAIL_DOMAIN located in the PUPI_BCG_Constants.php file.

Deleting a dummy user will have the same effect as deleting any user. This means that all the votes the dummy user has cast will be removed.

If you do not want the plugin to create any user, then do not set any vote for any post when importing a file.

Import form

The import form can be accessed from the admin/import section or by clicking on the Import submenu option in the Admin options.

Here is a screenshot of the file submission form:

[Screenshot: Import form]

This is a short description of the components in the form:

  • The first control of the form is the input for the file to import.
  • Minimum amount of minutes to delay and Maximum amount of minutes to delay both relate to dates set to the value any. In that case, posts are given a random date, which might result in a question being posted at a given time and an answer being posted just a few seconds, or even years, later. To avoid these timing issues, it is possible to define a range by which child posts (answers or comments) are delayed. These fields control how many minutes a child post will be delayed. For example, with a minimum of 30 and a maximum of 300 minutes, if an answer is created on 2012-11-28 14:20:15 then a comment on that answer will be created between 30 and 300 minutes after the answer's creation date (between 2012-11-28 14:50:15 and 2012-11-28 19:20:15).

    Additionally, this ranged delay is also applied when selecting answers. Following the same values as before, an answer will be selected between 30 and 300 minutes after its creation.

    If a question has one or more answers, and a new answer is being created, the new answer will base its creation date on the latest answer for the question, rather than the question date. In a similar way, if a question or answer has one or more comments, and a new comment is being created, the new comment will base its creation date on the latest comment of the parent post.

  • Posts should be created after defines the earliest date on which a post (in practice, a question) whose DateTimeFrom field is set to any can be created. It mostly applies to questions, as a question's children inherit the question's date plus the delays.

  • Send notifications defines whether email notifications are sent while the import process is running. This field is not the same as the Notify field. The Notify field is a flag that defines whether the owner of a post will receive notifications once child posts are added. The Send notifications setting defines whether users, regardless of the Notify value set in each post, will receive notifications during the import process execution. In general, email notifications while importing are not needed, so disabling them will decrease execution time.
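
The delay rules for child posts described in the list above can be sketched as follows. This is an assumption-laden illustration, not the plugin's code: the function names are hypothetical, and picking a whole number of minutes is an assumption (the text only gives the range).

```python
import random
from datetime import datetime, timedelta
from typing import List


def child_base_date(parent_date: datetime, sibling_dates: List[datetime]) -> datetime:
    """Date a new child is delayed from: the latest sibling of the same kind
    (answer or comment) if there is one, otherwise the parent's date."""
    return max(sibling_dates) if sibling_dates else parent_date


def delayed_date(base: datetime, min_minutes: int, max_minutes: int,
                 rng: random.Random) -> datetime:
    """Add a random delay between the minimum and maximum number of minutes."""
    return base + timedelta(minutes=rng.randint(min_minutes, max_minutes))


answer_date = datetime(2012, 11, 28, 14, 20, 15)
comment_date = delayed_date(child_base_date(answer_date, []), 30, 300, random.Random())
```

With a minimum of 30 and a maximum of 300 minutes, comment_date always falls between 2012-11-28 14:50:15 and 2012-11-28 19:20:15.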

The buttons at the bottom of the form are the following:

  • Validate: Validates a file without importing it. This process takes a short amount of time (usually seconds). The file has to be completely sent to the server. As a reference, the validation process could take between 5 and 20 seconds to validate 30,000 rows, depending on server specs and load.

    If this process returns a server issue (or just doesn't return, which would be a server issue as well), then it is likely that the file is too large for the server to receive it. The file should be split into smaller ones or the server will have to be tweaked (see the Performance and troubleshooting section).

  • Import: Validates a file and then starts the import process. The import process is split into the following steps:

    • Validation: This step is the same as the one explained above
    • Saving: Rows of the file are saved in the server. These are used for the creation process later. As a reference, the saving stage could take between 2 and 5 minutes to save 30,000 rows, depending on server specs and load.
    • Creation: Posts are read from the server and created one by one. As a reference, with the Send notifications checkbox unchecked, the creation stage could take between 30 and 60 minutes to create 30,000 rows, depending on server specs and load.
  • Undo import: Deletes all posts that have been created by the last run import process. As a reference, the undo import stage could take between 5 and 10 hours to remove 30,000 rows, depending on server specs and load.

    There are a few things to take into account about this process:

    • If the import process is run twice, only the last run can be undone.
    • All posts that are children of a post that had been created by the process will be deleted, no matter if the children were created by the process or added manually later.
    • If the import process created and selected an answer for an existing question in the site (using the ParentIdInSite column) that already had a selected answer, then the originally selected answer for the question will not be restored.

With the exception of the validation stage, stages are processed in small batches. After each batch is processed, a progress update will be displayed in the form, as shown in the image below:

[Screenshot: Import form progress]

Performance and troubleshooting

It is important to note that some web servers have very small file size, execution time or memory limits set. This can result in a file not being able to be uploaded or processed. Some of these errors can be detected and some can be avoided by the plugin, but it is better to take action and tweak those values beforehand. This can easily be done by knowing your web server's limits and creating the files accordingly (fewer rows mean smaller files and lower execution time and memory needs). Alternatively, it is possible to use zip-compressed file formats such as XLSX, which will result in smaller file sizes.

If a process finishes with a blank page or a server error, it is likely that you have exceeded the amount of information you can send to your own web server, which is unrelated to this plugin. There are 2 values that work together: one is the maximum size of a file that can be attached to the request and the other is the maximum size of the request itself. You should not exceed either of those values. Those values are usually around 2MB. You can either increase those values (or ask your web server administrator to do so) or decrease the size of the file you upload, in other words, reduce the number of rows in the file.

Another situation that could lead to a blank page or server error is the amount of time the web server allows a script to run. As a reference, free web hosting plans give around 30 seconds, which is also the default value in most PHP versions. Increasing that value in the web server should solve the issue. However, bear in mind the plugin is ready to handle this scenario: after processing each row it checks how close it is to the time limit and, if it is close enough, it finishes the batch early to avoid a server error.
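
The time limit check described in the last paragraph can be pictured with the Python sketch below. It only illustrates the idea and is not the plugin's PHP code: process_batch, handle_row and the 5 second safety margin are hypothetical, and the clock is passed in as an argument so the behavior can be tested.

```python
import time
from typing import Callable, Iterable


def process_batch(rows: Iterable, handle_row: Callable, time_limit_seconds: float,
                  safety_margin_seconds: float = 5.0,
                  clock: Callable[[], float] = time.monotonic) -> int:
    """Process rows until the batch ends or the time limit gets close.

    Returns how many rows were handled, so a later request can resume
    from that position instead of hitting a server error.
    """
    started = clock()
    processed = 0
    for row in rows:
        handle_row(row)
        processed += 1
        # After each row, check how close the script is to the limit.
        if clock() - started >= time_limit_seconds - safety_margin_seconds:
            break
    return processed
```

With a 30 second limit and the assumed 5 second margin, the loop stops accepting new rows once 25 seconds have elapsed, even if the batch still has rows left.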

Here are some (not necessarily all) of the relevant PHP settings:

  • upload_max_filesize
  • post_max_size
  • max_execution_time
  • memory_limit

When it comes to MySQL, there are some additional tweaks that might be needed such as updating system variables:

SET GLOBAL max_allowed_packet = 100000000;
SET GLOBAL wait_timeout = 28800;

The exact values depend on the files you are uploading.

Remember that no matter what you set in your server, your hosting company may be enforcing limits that override those values. Make sure to check with them if they are overriding any of those values or if they are limiting any other aspect that would not allow a bulk upload to finish successfully (for example, bandwidth).

The last tweak that should be taken into account is the batch size. In file PUPI_BCG_Constants.php, the following settings can be found and freely edited:

const BATCH_SAVE_SIZE = 1000;
const BATCH_USER_CREATE_SIZE = 100;
const BATCH_POST_CREATE_SIZE = 100;
const BATCH_UNDO_SIZE = 50;

They control how many items are processed in each batch of the corresponding stage. The larger the batch, the longer it takes to see progress updates in the form and the more memory the process consumes, although the increase in memory usage is not significant. On the other hand, the smaller the batch, the more batches are needed, which results in more web requests to the server and, therefore, more network delays.

Finding the right batch sizes is an empirical process and depends on the content of the files, the size of the files, the server specs, the server configuration and the server load.
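
To get a feel for the trade-off, the number of batches per stage can be estimated with simple arithmetic. The sketch below assumes one web request per batch, which is how the text above describes the process, and uses the default batch sizes shown earlier. The function name is hypothetical.

```python
import math

# Default batch sizes from PUPI_BCG_Constants.php, as listed above.
BATCH_SAVE_SIZE = 1000
BATCH_POST_CREATE_SIZE = 100
BATCH_UNDO_SIZE = 50


def batches_needed(total_rows: int, batch_size: int) -> int:
    """Number of batches (assumed: one web request each) to cover all rows."""
    return math.ceil(total_rows / batch_size)


rows = 30_000
save_batches = batches_needed(rows, BATCH_SAVE_SIZE)           # 30
create_batches = batches_needed(rows, BATCH_POST_CREATE_SIZE)  # 300
undo_batches = batches_needed(rows, BATCH_UNDO_SIZE)           # 600
```

Doubling a batch size halves the number of requests for that stage, at the cost of longer gaps between progress updates.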

Important: If an import process goes wrong, do not run it again. First, undo the halfway done import process, tweak whatever variables you have to tweak, and then run it again.

Uninstall

If you need to uninstall this plugin all you need to do is:

  1. Remove the pupi-bcg directory.
  2. Drop all tables starting with the ^pupi_bcg_ prefix, where ^ stands for the table prefix set in your Q2A configuration file (qa_ by default).
  3. Remove all rows from the ^options table that start with pupi_bcg_.

Support

All technical support requests must go through the issues section on this site and be written in English.

  • Bugs: if you find a bug or error in the plugin, create a bug issue detailing the steps needed to be performed in order to reproduce the issue. Provide the steps from scratch, i.e., starting just after a clean Q2A setup. Attaching screenshots might be useful too. Also include your PHP, MySQL, Q2A and plugin versions. Make sure you have no other plugin in the qa-plugin directory while reproducing it and that you are using the default Q2A theme.
  • New features: feel free to request features using the issues section. Select the enhancement kind in the combobox and explain in detail the feature and why you consider it would be useful to others.
  • Questions and other technical support: create a task issue explaining the situation in detail.

Administrative (all non-technical) issues should go through private message to my profile in the Q2A site. Make sure to provide your email address.

FAQ

  • Do you give refunds in case I end up not liking the plugin?

    No. Not only is this wiki quite descriptive, but the plugin can also be seen and interacted with live on the demo site. You can fully test the plugin before buying.

  • Does this plugin create tags, users or categories in addition to posts?

    Tags are created in the same way as when a question is asked. Users and categories should already exist in the system.

  • Can multiple users upload files at the same time?

    Nothing will stop the users from doing so. However, the results will not be the expected ones. Do not run more than one process at the same time.

  • The plugin is a bit slow when I try to upload several thousand posts. Is there any way to increase the speed?

    The plugin is slow because it needs to make sure the information being uploaded and referenced is present in the site (e.g., making sure referenced users or categories already exist in the site). That is crucial in order to avoid data errors caused by badly formatted data in the file. This means there is no way to increase the speed other than following the tips mentioned in this wiki and tweaking your own server.

  • Are bug fixes charged separately?

    No. Bug fixes are already included in the price. New features, however, might not be free.

  • Would you mind doing some customizations to this plugin?

    Sure. Feel free to get in touch as explained in the Support section.

Get the plugin

Click here to navigate to the main plugin page. From there, you will be able to buy the plugin.
