Sorted columns and scatter charts not linked

Issue #884 resolved
Eric Jensen
created an issue

Sorting data in excel updates what data is shown in charts created with excel, but not in charts created with openpyxl.

A minimum working example:

import numpy as np
from openpyxl import *
from random import random
from openpyxl.utils.cell import get_column_letter
from openpyxl.chart import (
    LineChart,
    BarChart,
    ScatterChart,
    Reference,
    Series,
)
from openpyxl.drawing.text import CharacterProperties

wb = Workbook()

ws = wb.create_sheet()
ws.title = 'interactiveChart'

num = 9

ws.cell(column=1, row=2, value='X')
ws.cell(column=2, row=2, value='Y')
for i in range(num+1):
    ws.cell(column=1, row=3+i, value=random()*100)
    ws.cell(column=2, row=3+i, value='=A{0}*3+4+ABS(5/(11-A{0}))+ABS(10/(35-A{0}))+ABS(30/(67-A{0}))'.format(3+i))

textSize = 10
modeChart = ScatterChart()
modeChart.title = 'Resonance'
modeChart.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.style = 48
modeChart.x_axis.title = "X"
modeChart.x_axis.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.y_axis.title = "Y"
modeChart.y_axis.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.legend = None

xvalues = Reference(ws, min_col=1, min_row=2, max_row=num+3)
yvalues = Reference(ws, min_col=2, min_row=2, max_row=num+3)
series = Series(yvalues, xvalues, title_from_data=False, title='Resonace')
modeChart.series.append(series)
s1 = modeChart.series[0]
s1.marker.symbol = "diamond"
s1.marker.graphicalProperties.solidFill = "6495ED"
s1.marker.graphicalProperties.line.solidFill = "6495ED"
s1.graphicalProperties.line.noFill = True

modeChart.x_axis.tickLblPos = "low"
modeChart.y_axis.tickLblPos = "low"

modeChart.width = 12
modeChart.height = 7
ws.add_chart(modeChart, "F6")

ws.auto_filter.ref = 'A2:B{}'.format(num+3)

ws = wb.get_sheet_by_name("Sheet")
wb.remove_sheet(ws)
wb.save('aTest.xlsx')

I have tried to examine the .xlsx files to see what differences there are. First I noted that there is no calcChain file in the openpyxl .xlsx file, and it is not referenced in the -rels file. I do not think this is the issue since the calcChain file in the excel-generated .xlsx file only lists the cells containing the calculated y values (see above code). Comparing the two chart1.xml files I note that all the tags are prefaced by "c:" in the excel-generated file and that some of the schema references are different. The excel-generated chart1.xml file also contains the value of every datapoint, listed individually, while the openpyxl chart1.xml only contains a range reference.

I'd like to be able to solve this myself, but I do not know how to change the xml files in the .xlsx files. Perhaps if someone could give me a quick guide to doing this I could use it to identify what causes this link and possible generate a pull request down the line?

Comments (9)

  1. CharlieC

    Assuming you've created a file like this, what do you do in Excel to sort it?

    The XML differences sound marginal: the c: is just the namespace prefix that Excel attaches for charts, openpyxl uses the default namespace; in XML they're equivalent. Excel caches the values in charts but this is an optional optimisation (the strCache and NumCache elements are optional). The same is true for calcChain.xml.

    I suspect there is not a lot that can be done here. openpyxl itself never sorts because this involves physically rearranging the values of cells. Using the code provide I can create and open the file. If I subsequently sort the data then the chart is updated.

  2. Eric Jensen reporter

    Hi Charlie

    I suspected the c: was irrelevant and I am happy to learn that the strCache and NumCache are as well. I am, however, surprised that the code works for you. When I create the Excel file and open it in Excel I use the drop-down options in either the X or Y columns to fx only show values less than 40; the table is updated, but the chart remains unchanged. The chart still shows all values. If I create a chart in Excel of the unsorted data and then sort the data in the same manner, the Excel-generated chart automatically updates and no longer shows the removed values. This is not what you experience? My openpyxl comes from the latest python 3.6 anaconda installation.

  3. CharlieC

    What you're seeing is related entirely to the client application (Excel) and could be down to settings. I'll paste the file I created. If I open it and filter or sort the chart is automatically updated. There's nothing in the file format that affects this.

  4. Eric Jensen reporter

    Hi Charlie

    You are absolutely right. I downloaded your file and it works exactly as intended. I compared the two files and other than the the order of some of the tags they were nearly identical, except for what I think might be the culprit: <plotVisOnly val="1"/> I searched for this and there is an issue with this here: Setting property plotVisOnly for charts Apparently it is resolved in openpyxl 2.5 and I am running openpyxl 2.4.8. I am sorry I did not find this earlier. Thank you so much for you help! I honestly thought this was an issue that had not been addressed. Marking as resolved.

  5. CharlieC

    Yes, I tend to try stuff out on my development branch. 2.5 has some minor changes in charts because I added read support. You'll need to update your code slightly because paragraphs now have lists of text runs. So p[0].r[0]… but nothing that should affect this.

  6. Eric Jensen reporter

    I removed openpyxl with pip uninstall openpyxl and installed the latest version from a tarball with python setup.py install. I did not change my code and double-checked the openpyxl version with openpyxl.__version___. The code ran find without changes and the chart in the Excel file now updates automatically. Thanks for all the help!

  7. Log in to comment