pilerpurge.py run very slow on CentOS7 and mariadb5.5.6
Hi
I found the problem pilerpurge.py python script run very slow. (5600 emails, take time about 1 hour)
I try to search the old piler Issues but no found solution
So I try to print the status of each step on pilerpurge.py and found the slow responses.
we found slow on step query v_attachment table on purge_attachments_by_attachment_id function and purge_attachments_by_piler_id function as below.
+-----------------------------------------------------------------+
| def purge_attachments_by_attachment_id(opts={}):
| format = ", ".join(['%s'] * len(opts['referenced_attachments']))
|
| cursor = opts['db'].cursor()
|
| cursor.execute("SELECT i, piler_id, attachment_id, refcount FROM " +
| "v_attachment WHERE i IN (%s)" %
| (format), opts['referenced_attachments'])
+-----------------------------------------------------------------+
+----------------------------------------------------------------+
|def purge_attachments_by_piler_id(ids=[], opts={}):
| format = ", ".join(['%s'] * len(ids))
|
| cursor = opts['db'].cursor()
|
| cursor.execute("SELECT i, piler_id, attachment_id, refcount FROM " +
| "v_attachment WHERE piler_id IN (%s)" % (format), ids)
+------------------------------------------------------------------+
So I try to query v_attachment table on DB software (MariaDB Version 5.5.60)
=======================================================
MariaDB [piler]> select * from v_attachment where i = '21';
+----+--------------------------------------+---------------+------+----------+
| i | piler_id | attachment_id | ptr | refcount |
+----+--------------------------------------+---------------+------+----------+
| 21 | 400000005adea55a225aedf400d8777bf7fc | 2 | 0 | 361 |
+----+--------------------------------------+---------------+------+----------+
1 row in set (5 min 39.96 sec)
MariaDB [piler]> select * from v_attachment where piler_id = '400000005adea55a225aedf400d8777bf7fc';
+----+--------------------------------------+---------------+------+----------+
| i | piler_id | attachment_id | ptr | refcount |
+----+--------------------------------------+---------------+------+----------+
| 20 | 400000005adea55a225aedf400d8777bf7fc | 1 | 0 | 31 |
| 21 | 400000005adea55a225aedf400d8777bf7fc | 2 | 0 | 361 |
| 22 | 400000005adea55a225aedf400d8777bf7fc | 3 | 0 | 360 |
| 23 | 400000005adea55a225aedf400d8777bf7fc | 4 | 0 | 31 |
| 24 | 400000005adea55a225aedf400d8777bf7fc | 5 | 0 | 31 |
| 25 | 400000005adea55a225aedf400d8777bf7fc | 6 | 0 | 31 |
| 26 | 400000005adea55a225aedf400d8777bf7fc | 7 | 0 | 49 |
+----+--------------------------------------+---------------+------+----------+
7 rows in set (4 min 57.83 sec)
=======================================================
the query time is about 4-6 min per time. (when I run pilerpurge.py, It query v_attachment 2 times and runs about 10 min to deleted email 1000 record)
As I check. the v_attachment is the VIEW table, I guess this view table has fully generate new data every time to call it. when the data is a big volume, it will take time to create data.
So I try to direct query by SQL code by bringing the code from v_attachment table. Please refer query code below.
===============================================
MariaDB [piler]> select id as i, piler_id, attachment_id, (select count(0) from attachment where (ptr = i )) AS refcount from attachment WHERE id IN ('21');
+----+--------------------------------------+---------------+----------+
| i | piler_id | attachment_id | refcount |
+----+--------------------------------------+---------------+----------+
| 21 | 400000005adea55a225aedf400d8777bf7fc | 2 | 361 |
+----+--------------------------------------+---------------+----------+
1 row in set (0.01 sec)
MariaDB [piler]> SELECT i, piler_id, attachment_id, refcount FROM (select id as i, piler_id, attachment_id, ptr, (select count(0) from attachment where (ptr = i )) AS refcount from attachment) AS v_attachment WHERE piler_id IN ('400000005adea55a225aedf400d8777bf7fc');
+----+--------------------------------------+---------------+----------+
| i | piler_id | attachment_id | refcount |
+----+--------------------------------------+---------------+----------+
| 20 | 400000005adea55a225aedf400d8777bf7fc | 1 | 31 |
| 21 | 400000005adea55a225aedf400d8777bf7fc | 2 | 361 |
| 22 | 400000005adea55a225aedf400d8777bf7fc | 3 | 360 |
| 23 | 400000005adea55a225aedf400d8777bf7fc | 4 | 31 |
| 24 | 400000005adea55a225aedf400d8777bf7fc | 5 | 31 |
| 25 | 400000005adea55a225aedf400d8777bf7fc | 6 | 31 |
| 26 | 400000005adea55a225aedf400d8777bf7fc | 7 | 49 |
+----+--------------------------------------+---------------+----------+
7 rows in set (0.01 sec)
===============================================
The query time is very short.
So I try to modify command on piler as below and run pilerpurge.py by dry-run option. It is very fast more than a query from v_attachment and I recheck some file name from the result on DB, it in target data that I set the retained data to remove email.
But I did not sure that can I direct query instead view table (v_attachment)? and my modified code is may have some hidden problem?
Can you support checking the code below? or can you suggest another solution for us?
+-----------------------------------------------------------------+
|def purge_attachments_by_attachment_id(opts={}):
| format = ", ".join(['%s'] * len(opts['referenced_attachments']))
|
| cursor = opts['db'].cursor()
| print "query database attached file"
| cursor.execute("select id as i, piler_id, attachment_id, (select count(0) from attachment where (ptr = i )) AS refcount from attachment " +
| " WHERE id IN (%s)" % (format), opts['referenced_attachments'])
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
|def purge_attachments_by_piler_id(ids=[], opts={}):
| format = ", ".join(['%s'] * len(ids))
|
| cursor = opts['db'].cursor()
| cursor.execute("SELECT i, piler_id, attachment_id, refcount FROM " +
| "(select id as i, piler_id, attachment_id, ptr, (select count(0) from attachment where (ptr = i )) AS refcount from attachment) AS v_attachment " +
| " WHERE piler_id IN (%s)" % (format), ids)
+-----------------------------------------------------------------+
Comments (6)
-
repo owner -
reporter As the number row on attachment table
MariaDB [piler]> select count(1) from attachment;
+----------+
| count(1) |
+----------+
| 26966961 |
+----------+
1 row in set (7.15 sec)==============================
and number row on v_attachment table
MariaDB [piler]> select count(1) from v_attachment;
+----------+
| count(1) |
+----------+
| 26966967 |
+----------+
1 row in set (3 min 42.39 sec)
===============================and result of
explain select * from v_attachment where i = '21';
MariaDB [piler]> explain select * from v_attachment where i = '21';
+------+--------------------+------------+------+-----------------+-----------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------+------+-----------------+-----------------+---------+------+----------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 26952690 | Using where |
| 2 | DERIVED | attachment | ALL | NULL | NULL | NULL | NULL | 26952690 | |
| 3 | DEPENDENT SUBQUERY | attachment | ref | attachment_idx3 | attachment_idx3 | 9 | func | 4 |
Using index |
+------+--------------------+------------+------+-----------------+-----------------+---------+------+----------+-------------+
3 rows in set (0.01 sec)
================================================
and result ofexplain select * from v_attachment where piler_id = '400000005adea55a225aedf400d8777bf7fc';
================================================
MariaDB [piler]> explain select * from v_attachment where piler_id = '400000005adea55a225aedf400d8777bf7fc';
+------+--------------------+------------+------+-----------------+-----------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------+------+-----------------+-----------------+---------+------+----------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 26952699 | Using where |
| 2 | DERIVED | attachment | ALL | NULL | NULL | NULL | NULL | 26952699 | |
| 3 | DEPENDENT SUBQUERY | attachment | ref | attachment_idx3 | attachment_idx3 | 9 | func | 4 | Using index |
+------+--------------------+------------+------+-----------------+-----------------+---------+------+----------+-------------+
3 rows in set (0.00 sec)=================================================
-
repo owner I have a different output for these:
MariaDB [piler]> explain select * from v_attachment where i = '21'; +------+-------------+------------+-------+-----------------+-----------------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+------------+-------+-----------------+-----------------+---------+-------+------+-------------+ | 1 | PRIMARY | attachment | const | PRIMARY | PRIMARY | 8 | const | 1 | | | 3 | SUBQUERY | attachment | ref | attachment_idx3 | attachment_idx3 | 9 | func | 1 | Using index | +------+-------------+------------+-------+-----------------+-----------------+---------+-------+------+-------------+ 2 rows in set (0.002 sec)
and
MariaDB [piler]> explain select * from v_attachment where piler_id='400000005f33bf5e2d909aac00cecdf9cfcb'; +------+--------------------+------------+------+-----------------+-----------------+---------+-------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+------------+------+-----------------+-----------------+---------+-------+------+-----------------------+ | 1 | PRIMARY | attachment | ref | attachment_idx | attachment_idx | 144 | const | 1 | Using index condition | | 3 | DEPENDENT SUBQUERY | attachment | ref | attachment_idx3 | attachment_idx3 | 9 | func | 1 | Using index | +------+--------------------+------------+------+-----------------+-----------------+---------+-------+------+-----------------------+
Also check out the following query and see if you miss any index defined below:
MariaDB [piler]> show index from attachment; +------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | attachment | 0 | PRIMARY | 1 | id | A | 602 | NULL | NULL | | BTREE | | | | attachment | 1 | attachment_idx | 1 | piler_id | A | 602 | NULL | NULL | | BTREE | | | | attachment | 1 | attachment_idx2 | 1 | sig | A | 602 | NULL | NULL | | BTREE | | | | attachment | 1 | attachment_idx2 | 2 | size | A | 602 | NULL | NULL | YES | BTREE | | | | attachment | 1 | attachment_idx2 | 3 | ptr | A | 602 | NULL | NULL | YES | BTREE | | | | attachment | 1 | attachment_idx3 | 1 | ptr | A | 602 | NULL | NULL | YES | BTREE | | | +------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
-
reporter Could you please share command you have used to create view “v_attachment”. We want to compare with ours.
-
repo owner -
repo owner - changed status to closed
No news is good news.
- Log in to comment
How many rows do you have in the attachment table? Also show me the following queries: