Check duplicate pages

Friendly warning: These scripts are provided "as is" and without any guarantees.

I developed them to solve a specific problem.

I'm sharing them because I hope they will be useful to others too. If you have any improvements to share, please let me know.

Author: Sarah Maddox

Source: check-duplicate-pages

Usage guide: How to find duplicate page names across Confluence spaces

This repo contains three Python scripts. Two of them are 'duplicate-checker' scripts, which check a text file for duplicate entries. The third script produces a list of Confluence pages, which you can optionally use as input to the duplicate-checker scripts.

The duplicate-checker scripts

  • pageduptest.py -- This is the latest version of the script. It has been adapted to work in Python 3.2.3. It uses a dictionary to store and compare the page names for each space.
  • pageduptest-old.py -- This is the older version of the script. It works in Python 2.7. It uses nested lists to store and compare the page names for each space.

Both scripts accept the same input and produce the same result, so choose the one that matches your Python version. The scripts read a text file called 'pages.txt', which contains Confluence space keys and page names in a specific format, and report the duplicate page names.
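As a rough illustration of the dictionary approach mentioned above, the sketch below maps each page name to the spaces that contain it and keeps the names found in more than one space. It is not the code from pageduptest.py: the function name find_duplicates, the shape of its input, and the sample data are assumptions made for this example.

```python
from collections import defaultdict


def find_duplicates(pages_by_space):
    """Return {page_name: [space_key, ...]} for names found in 2+ spaces.

    `pages_by_space` is a hypothetical structure for this sketch:
    a dict mapping each space key to a list of page names.
    """
    spaces_by_name = defaultdict(list)
    for space_key, page_names in pages_by_space.items():
        # A set guards against a name being listed twice in one space.
        for name in set(page_names):
            spaces_by_name[name].append(space_key)
    # Keep only the names that occur in more than one space.
    return {
        name: sorted(keys)
        for name, keys in spaces_by_name.items()
        if len(keys) > 1
    }


# Made-up sample data, not taken from the repo.
sample = {
    "DOC": ["Page A", "Page B"],
    "JIRA": ["Page B", "Page C"],
}
for name, keys in find_duplicates(sample).items():
    print(name, "appears in", ", ".join(keys))
```

Each page name is looked up once in the dictionary, so the lists for the different spaces never have to be compared with each other pair by pair.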

To produce the text file, you can do one of the following:

  • Option 1: Use a Children macro on a Confluence page to list all the pages in your space. Copy the page names and paste them into a text file, under a "Spacekey=" line for that space (see the input file format described below).
  • Option 2: Use the 'pagelister.py' script, also contained in this repo, to list all the pages in a given set of Confluence spaces.

The script that lists Confluence pages

pagelister.py -- This script lists the names of all pages in a given set of Confluence spaces. It puts the page names and space keys into a text file in the format required by the pageduptest scripts.
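The sketch below covers only the final step described above: writing space keys and page names to a file, one "Spacekey=" line per space followed by that space's page names. It does not show how pagelister.py retrieves the page names from Confluence. The function names format_pages and write_pages_file, and the list-of-pairs input, are assumptions made for this example.

```python
def format_pages(spaces):
    """Build the text of the input file.

    `spaces` is a hypothetical structure for this sketch: a list of
    (space_key, page_names) pairs, kept in the order they should appear.
    """
    lines = []
    for space_key, page_names in spaces:
        lines.append("Spacekey=" + space_key)  # header line for the space
        lines.extend(page_names)               # one page name per line
    return "\n".join(lines) + "\n"


def write_pages_file(spaces, path="pages.txt"):
    """Write the formatted text to `path` (default name from the README)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_pages(spaces))


# Made-up sample data, not taken from the repo.
print(format_pages([("DOC", ["Page A", "Page B"]), ("JIRA", ["Page C"])]))
```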

The input file 'pages.txt'

The file contains a list of space keys and page names.

The space key is on a separate line at the start of each set of page names. The line for the space key starts with "Spacekey=", followed by the space key itself.

The line for a page name contains just the page name.

Example illustrating the format of the input file

Spacekey=DOC
This is the name of a page
This is the name of page BB
How to eat a chocolate
Spacekey=JIRA
This is the name of page BB
This is the name of page D
My page F
Spacekey=FISHEYE
This is the name of page BBB
Talking about pages
My page F
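A file in this format can be read line by line: a line starting with "Spacekey=" opens a new space, and every other line is a page name in the current space. The sketch below is one way to do that, not the code from the pageduptest scripts. The function name parse_pages is hypothetical, and the choices to skip blank lines and strip surrounding whitespace are assumptions.

```python
PREFIX = "Spacekey="

# The example input shown above, as a string.
EXAMPLE = """\
Spacekey=DOC
This is the name of a page
This is the name of page BB
How to eat a chocolate
Spacekey=JIRA
This is the name of page BB
This is the name of page D
My page F
Spacekey=FISHEYE
This is the name of page BBB
Talking about pages
My page F
"""


def parse_pages(lines):
    """Return a dict mapping each space key to its list of page names."""
    pages_by_space = {}
    current = None
    for raw in lines:
        line = raw.strip()  # assumption: surrounding whitespace is not significant
        if not line:
            continue        # assumption: blank lines are ignored
        if line.startswith(PREFIX):
            current = line[len(PREFIX):]
            pages_by_space.setdefault(current, [])
        elif current is None:
            raise ValueError("page name found before any Spacekey= line")
        else:
            pages_by_space[current].append(line)
    return pages_by_space


pages = parse_pages(EXAMPLE.splitlines())
for space_key, names in pages.items():
    print(space_key, len(names))
```

To read the real file instead of the embedded string, pass an open file object, for example parse_pages(open("pages.txt", encoding="utf-8")). In the example data, "This is the name of page BB" occurs in both DOC and JIRA, and "My page F" occurs in both JIRA and FISHEYE.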
