Manage unused attachments
Friendly warning: These scripts are provided "as is" and without any guarantees.
I developed them to solve a specific problem.
I'm sharing them because I hope they will be useful to others too. If you have any improvements to share, please let me know.
Author: Sarah Maddox
This repo contains five Python scripts that you can use to list all the attachments on a Confluence page, find those that are not referenced in the page content of the space, and then remove them.
The scripts check for attachments referenced in pages only, not in blog posts or comments.
The list of attachments on a page includes attachments that are referenced only in comments, so the scripts will report those attachments as unused. If the attachments in your comments are important, do not rely on these scripts to determine attachment usage.
When running a script that accesses Confluence, you will need to supply a username. The scripts can see only the content that that user has permission to see.
Use the scripts in the following order:
1. Optional. Run this script if you want to identify the pages in your Confluence space that have a large number of attachments.
This script gets the number of attachments on each page in a given Confluence space. It outputs a text file containing the page URL and the number of attachments for each page, in descending order of number of attachments.
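The reporting half of this step can be sketched as follows. This is a minimal illustration, not the script itself: the function names and the tab-separated output layout are assumptions, and the page URLs and counts are taken as already fetched from Confluence.

```python
def format_attachment_counts(counts):
    """Return report lines sorted by attachment count, highest first.

    `counts` is a hypothetical dict mapping a page URL to the number of
    attachments on that page.
    """
    # Sort by count descending; break ties by URL so the output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ["{}\t{}".format(url, count) for url, count in ordered]


def write_report(counts, path):
    """Write the sorted report to a text file, one page per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in format_attachment_counts(counts):
            handle.write(line + "\n")
```

Pages with the most attachments appear at the top of the file, so they are the natural candidates for the next step.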
2. This script gets all attachments on a given Confluence page. It puts the list of attachments into a text file, and prints a report of the number of attachments and total file size.
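As a rough sketch of the summary this step prints, assuming the attachments have already been retrieved and each one is represented as a dict with the hypothetical keys `name` and `size_bytes` (the real script may use different field names):

```python
def summarize_attachments(attachments):
    """Return (names, count, total_bytes) for a list of attachment records.

    Each record is assumed to be a dict with the hypothetical keys
    "name" and "size_bytes".
    """
    names = [attachment["name"] for attachment in attachments]
    # Sizes may arrive as strings from a remote API, so convert defensively.
    total_bytes = sum(int(attachment["size_bytes"]) for attachment in attachments)
    return names, len(names), total_bytes


def write_attachment_list(names, path):
    """Write one attachment file name per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + "\n")
```

The text file written here, with one file name per line, is the kind of input that the later steps read.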
3. wherePageContent.py > getConfluencePageContent.py
The "wherePageContent.py" script is a dummy, which simply tells you where to get "getConfluencePageContent.py". (We need content re-use on Bitbucket!)
The "getConfluencePageContent.py" script is available in another Bitbucket repo.
The script gets the content of all pages in a given Confluence space. It puts the content of each page into a separate text file, in a given directory. The content is in the form of the Confluence "storage format", which is a type of XML consisting of HTML with Confluence-specific elements.
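The "one text file per page" layout described above could look something like this. It is only a sketch: `safe_file_name` and `save_pages` are hypothetical helpers, not functions from "getConfluencePageContent.py", and the page titles and storage-format content are assumed to have been fetched already.

```python
import os
import re


def safe_file_name(title):
    """Turn a page title into a file name that is safe on most file systems."""
    # Replace each run of unsafe characters with a single underscore.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("_") + ".txt"


def save_pages(pages, directory):
    """Write each page's storage-format content to its own text file.

    `pages` is a hypothetical dict mapping a page title to its content.
    Returns the list of file paths written.
    """
    os.makedirs(directory, exist_ok=True)
    written = []
    for title, content in pages.items():
        path = os.path.join(directory, safe_file_name(title))
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)
        written.append(path)
    return written
```

Note that two different titles can map to the same file name under this simple scheme, so a real script would need to handle collisions.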
4. This script reads a text file containing attachment file names, matches them against the source of Confluence pages, and produces a report on used and unused attachments.
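One simple way to do the matching is a plain substring search for each file name in the storage-format source of every page. The sketch below assumes that approach; the actual script may match differently, and `classify_attachments` is a hypothetical name. Because the page source is XML, the search also tries the XML-escaped form of the name (for example, `&` becomes `&amp;`).

```python
from xml.sax.saxutils import escape


def classify_attachments(attachment_names, page_sources):
    """Split attachment names into (used, unused) lists.

    An attachment counts as used if its file name, raw or XML-escaped,
    appears anywhere in any of the page source strings.
    """
    used, unused = [], []
    for name in attachment_names:
        candidates = {name, escape(name)}
        if any(candidate in source
               for source in page_sources
               for candidate in candidates):
            used.append(name)
        else:
            unused.append(name)
    return used, unused
```

A substring search can produce false positives: "a.png" also matches a page that references "data.png". That errs on the side of keeping an attachment, which is the safer mistake when the next step deletes files.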
5. This script reads a text file containing attachment file names, accepts a Confluence page name, and removes the given attachments from the page.
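Because removal is destructive, a cautious version of this step might default to a dry run. The sketch below is an assumption about how that could be structured, not a description of the script: `remove_attachments` is a hypothetical function, and `delete_attachment` stands in for whatever call actually removes an attachment from Confluence.

```python
def remove_attachments(page_name, attachment_names, delete_attachment, dry_run=True):
    """Remove the named attachments from a page, or only report what would go.

    `delete_attachment(page_name, name)` is a caller-supplied function
    (hypothetical here) that performs the real removal.
    Returns (removed, failed), where failed holds (name, error message) pairs.
    """
    removed, failed = [], []
    for name in attachment_names:
        if dry_run:
            # Nothing is deleted in a dry run; just show the plan.
            print("Would remove {!r} from {!r}".format(name, page_name))
            continue
        try:
            delete_attachment(page_name, name)
            removed.append(name)
        except Exception as error:
            # Keep going so one failure does not block the rest.
            failed.append((name, str(error)))
    return removed, failed
```

Running once with `dry_run=True` and checking the printed list against the report of unused attachments gives a chance to catch mistakes before anything is deleted.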