Manage unused attachments

Friendly warning: These scripts are provided "as is" and without any guarantees.

I developed them to solve a specific problem.

I'm sharing them because I hope they will be useful to others too. If you have any improvements to share, please let me know.

Author: Sarah Maddox

Source: manage-unused-attachments

Example use case: How to manage attachment usage in Confluence wiki with some Python scripts

This repo contains five Python scripts that you can use to list all the attachments on a Confluence page, find those that are not referenced in the page content of the space, and then remove them.

Notes:

  • The scripts check for attachments referenced in pages only, not in blog posts or comments.

  • The list of all attachments on a page includes attachments that are referenced only in comments. Because the scripts do not check comments, such attachments may be reported as unused. If your attachments in comments are important, you should not rely on these scripts to find attachment usage.

  • When running a script that accesses Confluence, you will need to supply a username. The scripts can see only the content that this user has permission to see.

Use the scripts in the following order:

1. getAttachmentCount.py

Optional. Run this script if you want to identify the pages in your Confluence space that have a large number of attachments.

This script gets the number of attachments on each page in a given Confluence space. It outputs a text file containing the page URL and the number of attachments for each page, in descending order of number of attachments.
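The ordering of the output can be illustrated with a minimal sketch. This is not the code from getAttachmentCount.py. It assumes the counts have already been collected into a dictionary, and the function name, the tab-separated layout and the sample URLs are all hypothetical.

```python
def format_attachment_report(counts):
    """Return one 'URL<TAB>count' line per page, largest count first.

    counts: dict mapping page URL -> number of attachments (assumed input shape).
    """
    # Sort by count descending; break ties by URL so the order is stable.
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{url}\t{count}" for url, count in rows)


# Hypothetical sample data, not taken from a real Confluence site.
sample_counts = {
    "https://wiki.example.com/display/DOC/Home": 3,
    "https://wiki.example.com/display/DOC/Release+Notes": 42,
    "https://wiki.example.com/display/DOC/FAQ": 0,
}

report = format_attachment_report(sample_counts)
print(report)
```

Pages at the top of such a report are the natural candidates for the next script.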

2. getConfluencePageAttachments.py

This script gets all attachments on a given Confluence page. It puts the list of attachments into a text file, and prints a report of the number of attachments and total file size.
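As a rough illustration of the report step, the sketch below totals the count and size for a list of attachments. It assumes each attachment is available as a (file name, size in bytes) pair. That input shape and both function names are assumptions for the example, not the script's actual data model.

```python
def summarize_attachments(attachments):
    """Return (number of attachments, total size in bytes).

    attachments: list of (file_name, size_in_bytes) tuples (assumed shape).
    """
    total_bytes = sum(size for _name, size in attachments)
    return len(attachments), total_bytes


def format_size(num_bytes):
    """Format a byte count using binary units, e.g. 3072 -> '3.0 KB'."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        # Stop at the first unit that fits, or at GB as the largest unit.
        if value < 1024 or unit == "GB":
            return f"{value:.1f} {unit}"
        value /= 1024


# Hypothetical sample data.
sample = [("diagram.png", 2048), ("spec.pdf", 1024)]
count, total = summarize_attachments(sample)
print(f"{count} attachments, {format_size(total)} in total")
```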

3. wherePageContent.py > getConfluencePageContent.py

The "wherePageContent.py" script is a dummy, which simply tells you where to get "getConfluencePageContent.py". (We need content re-use on Bitbucket!)

The "getConfluencePageContent.py" script is available in another Bitbucket repo.

The script gets the content of all pages in a given Confluence space. It puts the content of each page into a separate text file, in a given directory. The content is in the form of the Confluence "storage format", which is a type of XML consisting of HTML with Confluence-specific elements.
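To give a feel for the storage format, here is a small sketch that pulls attachment file names out of a page's source. In storage format an attachment reference typically appears as an `ri:attachment` element with an `ri:filename` attribute. The function name and the sample markup are hypothetical, and the regular expression is a simplification rather than a full XML parse.

```python
import html
import re

# Matches the ri:filename attribute of an <ri:attachment ...> element.
ATTACHMENT_REF = re.compile(r'<ri:attachment\b[^>]*\bri:filename="([^"]*)"')


def referenced_attachments(storage_xml):
    """Return the set of attachment file names referenced in the page source."""
    # Attribute values are XML-escaped, so decode entities such as &amp;.
    return {html.unescape(name) for name in ATTACHMENT_REF.findall(storage_xml)}


# Hypothetical page source in storage format.
sample_page = (
    '<p>See <ac:image><ri:attachment ri:filename="diagram.png"/></ac:image> '
    'and <ac:link><ri:attachment ri:filename="Q&amp;A.pdf"/></ac:link>.</p>'
)

print(sorted(referenced_attachments(sample_page)))
```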

4. findAttachmentUsage.py

This script reads a text file containing attachment file names, matches them against the source of Confluence pages, and produces a report on used and unused attachments.
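The matching idea can be sketched as follows. This is an illustration under stated assumptions, not the logic of findAttachmentUsage.py. It assumes the attachment names and the page sources are already loaded into memory, and it uses a plain substring test, which can give false positives when one file name is contained in another. The function name and sample data are hypothetical.

```python
def classify_attachments(names, page_sources):
    """Split attachment names into used and unused.

    names: iterable of attachment file names.
    page_sources: dict mapping a page identifier -> page source text.
    Returns (used, unused), where used maps each name to the pages that
    mention it and unused is a list of names found on no page.
    """
    used, unused = {}, []
    for name in names:
        # Plain substring match (an assumption; see the caveat above).
        pages = sorted(p for p, src in page_sources.items() if name in src)
        if pages:
            used[name] = pages
        else:
            unused.append(name)
    return used, unused


# Hypothetical sample data.
sample_names = ["diagram.png", "old-logo.gif"]
sample_sources = {
    "Home.txt": '<ac:image><ri:attachment ri:filename="diagram.png"/></ac:image>',
    "FAQ.txt": "<p>No images here</p>",
}

used, unused = classify_attachments(sample_names, sample_sources)
print("Used:", used)
print("Unused:", unused)
```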

5. deleteAttachments.py

This script reads a text file containing attachment file names, accepts a Confluence page name, and removes the given attachments from the page.
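Because deletion is destructive, a dry run is a sensible first pass. The sketch below shows one way to structure that. It is not the code from deleteAttachments.py and makes no Confluence calls itself: the `remove` argument is a hypothetical callable standing in for whatever actually deletes an attachment, and the `dry_run` flag is an assumption added for the example.

```python
def delete_listed_attachments(names, page_title, remove, dry_run=True):
    """Remove each listed attachment from the page and return the names handled.

    names: iterable of lines, for example from the attachment list file.
    remove: hypothetical callable taking (page_title, file_name).
    dry_run: when True, only print what would be removed.
    """
    processed = []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in the list
        if dry_run:
            print(f"Would remove {name!r} from {page_title!r}")
        else:
            remove(page_title, name)
        processed.append(name)
    return processed


# Usage with a stand-in remover that only records the calls.
removed = []
delete_listed_attachments(
    ["old-logo.gif\n", "\n", "draft.pdf"],
    "Home",
    lambda page, name: removed.append((page, name)),
    dry_run=False,
)
print(removed)
```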