Median values for charts
It would be useful if median values were available on charts.
Unfortunately django aggregation doesn't include median, as it isn't included in the underlying database. It should be possible to add a median class to Aggregate, but I haven't worked out how to do it yet.
Comments (77)
-
reporter -
reporter There's a 2009 discussion in the Google Groups Django users group about custom aggregate objects: https://groups.google.com/forum/#!topic/django-users/vVprMpsAnPo. It includes an example of implementing a MyMax user-defined aggregate function. Not sure if the example given there will work with django 1.6 that is currently being used by OpenREM. If it does then it may be possible to use the technique to implement a median function.
-
reporter The following site looks very useful. It has django code to add a new median aggregate, and SQL code to add a median function to the database: https://github.com/tomi77/python-utils/tree/master/django
-
Is it database specific?
-
reporter I have tried the above method, but the median for the DAP values comes out as zeros. It works for kVp and uAs - I think there's a problem with the calculation method when the values are all less than 1.
I've also tried removing the database functions and aggregates above, and using just a single one from here: https://wiki.postgresql.org/wiki/Aggregate_Median. This works for kVp and uAs, but not for the DAP. Again, I suspect the problem is with all the DAP values being less than 1.
-
reporter Hi @edmcdonagh. The database functions I'm looking at to begin with are probably postgresql-specific, but it should be possible to come up with a database-neutral version once I've worked out how to get it working properly.
It would also be good if I can find a way of automatically adding the required database function and aggregate without the user having to log in to their database and manually add it.
-
reporter The median values being returned for the DAP are being rounded to integers, and I can't see why. I have been able to obtain median values by making the database function multiply the values by 10^10. This ensures that all decimal places in the stored DAP values are moved to the left of the decimal point. The returned values will have to be divided by 10^10 to make them correct in views.py
CREATE FUNCTION _final_median(anyarray) RETURNS float8 AS $$ WITH q AS ( SELECT val FROM unnest($1) val WHERE VAL IS NOT NULL ORDER BY 1 ), cnt AS ( SELECT COUNT(*) AS c FROM q ) SELECT (AVG(val)::float8) * 10000000000 FROM ( SELECT val FROM q LIMIT 2 - MOD((SELECT c FROM cnt), 2) OFFSET GREATEST(CEIL((SELECT c FROM cnt) / 2.0) - 1, 0) ) q2; $$ LANGUAGE sql IMMUTABLE; CREATE AGGREGATE median(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=_final_median, INITCOND='{}' );
-
reporter Added new django aggregate function to calculate median. The database needs to have a median function added (included in medianAggregateSQL.txt) - this is postgresql-specific at the moment. References issue
#241.→ <<cset 1a436ab72cc2>>
-
reporter Added median data as a second series to the plot of acquisition DAP per protocol. Renamed the chart option to 'mean and median' rather than just 'mean'. References issue
#241.→ <<cset e6ac8cfa6b7c>>
-
reporter Updated median SQL calculation to be more generic. I'm unsure of the UNNEST at the moment. References issue
#241.→ <<cset df6657e7faab>>
-
reporter It looks like this method of adding a new aggregate function won't work for MySQL or sqlite. MySQL can have additional aggregates added, but from a brief search it appears they have to be written in C or similar - it certainly doesn't let you use SQL.
-
reporter Added postgresql-specific SQL back as it seems that other databases won't be able to use this. I've removed the float8 conversions, as it seems to be faster without. References issue
#241.→ <<cset 5fe3b0273d21>>
-
reporter @edmcdonagh, I'd really like to have median values on the charts. I'm pretty sure that there is some European patient dose audit guidance coming out soon that is going to say that medians should be used rather than means. However, I don't like the fact that my current solution is specific to PostgreSQL. Do you have any suggestions as to how medians could be calculated in a database-neutral way?
-
From following your investigations, it would seem the pragmatic path to follow would be to recommend PostgreSQL (as django do, at least to some degree), and if you have Postgres you will have medians, and if you don't, then you won't (it will need to fail gracefully). If you don't, then a simple export to Excel will sort you out.
-
reporter Thanks for the reply @edmcdonagh. We could have a system-wide flag that is set to 1 if it is a PostgreSQL server, and zero if it is anything else. The flag could then be checked to see whether to try the median calculations or not. Something like this:
if 'postgresql' in settings.DATABASES['default']['ENGINE']: median_available = 1 else: median_available = 0
I think that this may have to be stored in the user profile, because the html templates need to know whether they should include JavaScript that expects medians or not. Something like this could then be used in the html templates:
{% if request.user.userprofile.median_available %} HTML and JavaScript code that includes plotting of median data {% else %} HTML and JavaScript code that does not include plotting of median data {% endif %}
-
reporter Added DAP per acquisition JavaScript chart files for mean, mediand and combined plot. Updated dxfiltered.html to point to the combined one. References issue
#241.→ <<cset a13bfcc85b76>>
-
reporter Added new userProfile field called median_available, set to False by default and cannot be edited by the user. Added code to views.py to check status of median_available and also if PostgreSQL is being used in order to determine if median values can be calcualted. Pass the result to the dx plot calculations. References issue
#241.→ <<cset d8f8b7d04456>>
-
reporter Added user choice to plot either mean, median or both. Incorporated this option into views.py and dxfiltered.html so that the required DAP per acquisition protocol plot is produced. References issue
#241.→ <<cset 3a814629f53e>>
-
reporter Added median to the DX mAs plot. References issue
#241.→ <<cset 5d897e743e14>>
-
reporter Added median to the DX kVp plot. Tweaked one of the mAs plot JavaScript files. References issue
#241.→ <<cset 221821c7d340>>
-
reporter Added code to create a median acquisition DAP over time plot. The median is used if the user has selected 'median' or 'both' in the 'Averages' choice. A plot with median and mean would be too busy. References issue
#241.→ <<cset 41c5b118675a>>
-
reporter All of the DX charts now have a median option.
-
reporter I think that the inclusion of these median calculations should wait until we are using at least Django 1.7, as this version and above has a migration class that can run SQL. This will make it easy to include the required median and aggregate database functions. See https://docs.djangoproject.com/en/1.7/ref/migration-operations/#runsql.
-
reporter I have just installed OpenREM 0.7.0b1 using SQLite as the database and then populated it with several hundred fake radiographic studies. The plots etc all worked as they should. I then copied my median updates over, ran a schemamigration and restarted the server: all still worked fine. As expected, the "Average" choice drop-down does not appear, and medians are not (indeed cannot) be plotted. So it seems my code does fail gracefully when a database other than PostgreSQL is being used.
-
Lovely. Would you have time to see how we upgrade to django 1.7 (or 1.8?)?
-
reporter I'll take a look at https://docs.djangoproject.com/en/1.8/howto/upgrade-version/ and try it out.
-
Thanks. You'll need to remove the django=1.6 from setup.py
-
reporter Initial experience of upgrading to Django 1.8 is that it works fine. I looked at https://docs.djangoproject.com/en/1.8/topics/migrations/#upgrading-from-south and https://docs.djangoproject.com/en/1.8/howto/upgrade-version/ and then did:
-
pip install -U Django
-
Commented out
south
from theINSTALLED_APPS
section of thesettings.py
file -
Deleted all the numbered migration files in the
migrations
folder, including the .pyc files -
python manage.py makemigrations
-
python manage.py migrate --fake-initial
-
python manage.py runserver --install
OpenREM is now running with Django 1.8.
The above tests were carried out on a 0.7.0b1 system with an SQLite backend containing a few hundred radiographic studies. I haven't yet tested it with PostgreSQL.
The only complaint I've seen is from the command-line server with the following message, relating to a depreciated bit of code in the pagination package:
...\site-packages\django\core\handlers\wsgi.py:126: RemovedInDjango19Warning: MergeDict is deprecated, use dict.update() instead. self._request = datastructures.MergeDict(self.POST, self.GET)
-
-
That's really good news. So different from when I tried last time, before the painful model renaming!
The next thing to understand would be the upgrade from 0.6 to 0.7 with a corresponding upgrade from 1.6 to 1.8, plus the additional models, your data migration for the time of day, and the sql thing from this issue...
-
reporter I've just tested upgrading Django from 1.6 to 1.8 with a system that uses PostgreSQL. The database has 7000+ CT and 18000+ Radiographic examinations. The OpenREM code was already up-to-date with this issue241MedianValues branch, so this is only a test of whether Django 1.8 works OK with PostgreSQL and my custom median calculations. It worked, using the same method as above.
-
reporter Mean and median were being calculated when just median selected. Altered code so that only required averages are calculated, saving time. References issue
#241.→ <<cset 3faafb1a56c6>>
-
reporter Added Django 1.8 migration file to add median function to PostgreSQL database. Not sure what will happen if you're using something else. References issue
#241.→ <<cset 73c90733c02c>>
-
reporter Added reverse_sql to the median migration file so that the migration can be reversed. I haven't tested this yet. See https://docs.djangoproject.com/en/1.8/ref/migration-operations/#runsql. References issue
#241.→ <<cset 8bafe8358d35>>
-
reporter Added a check for PostgreSQL use in the median migration file so that the median functions are only executed if PostgreSQL is being used. Untested. References issue
#241.→ <<cset d8c15b56f2bb>>
-
reporter Changed Django requirements to >=1.8; removed south. References issue
#241and#118.→ <<cset 7c35e56573e2>>
-
reporter The migration file for this should also incorporate the data migration mentioned in issue
#223. -
reporter Combined datetime and median migrations into one file. Needs testing. References issue
#241and#118.→ <<cset 05776b94a13a>>
-
reporter Commit 0f927c4 has a working migration file for use with a version 0.6 to 0.7 upgrade.
0.7 will install Django 1.8. The user must:
-
Manually remove / rename all numbered migration files in their migration folder
-
Run
python manage.py makemigrations
-
Run
python manage.py migrate --fake-initial
-
Rename the
00xx_python manage.py
file to0002_python manage.py
-
Run
python manage.py migrate
to apply the migrations -
Finally run
python manage.py makemigrations
thenpython manage.py migrate
.
You can now restart the server.
-
-
reporter I've just tested this using an SQLite-based 0.6 install of OpenREM, populated with some test data. The upgrade to Django 1.8 and the above migration works for this too. There is no median function available with the SQLite install, as expected. No errors were shown.
-
Should step 4 be 4. Rename the
00xx_add_median_function.py
file to0002_add_median_function.py
?If so, is this necessary for step 3. to work? Or could the file be named 0002 in the first place as we are deleting all previous migrations so we know this is the first post initial one?
-
reporter Updated name of new migration file. Removed requirement for other migration file. References issue
#241.→ <<cset 0732df0454bf>>
-
reporter Hi @edmcdonagh, I've just checked. You can't have any numbered .py files when the first makemigrations is run - you receive a screen full of errors if the new 0002 file is present. However, the simplified list of commands below works fine:
Manually remove / rename all numbered migration files in their migration folder. Leave the new
0002_add_median_function.py.unactive
file there.-
Run
python manage.py makemigrations
-
Run
python manage.py migrate --fake-initial
-
Rename the
0002_add_median_function.py.unactive
file to0002_add_median_function.py
-
Run
python manage.py makemigrations
-
Run
python manage.py migrate
to apply the migrations
You can now restart the server.
-
-
Do you want me to put a current build on pypi @dplatten?
-
reporter Hi @edmcdonagh - it would be good to put a build on there once the Median branch has been merged back into develop - are you OK to do that?
-
Added new django aggregate function to calculate median. The database needs to have a median function added (included in medianAggregateSQL.txt) - this is postgresql-specific at the moment. References issue
#241.→ <<cset 1a436ab72cc2>>
-
Added median data as a second series to the plot of acquisition DAP per protocol. Renamed the chart option to 'mean and median' rather than just 'mean'. References issue
#241.→ <<cset e6ac8cfa6b7c>>
-
Updated median SQL calculation to be more generic. I'm unsure of the UNNEST at the moment. References issue
#241.→ <<cset df6657e7faab>>
-
Added postgresql-specific SQL back as it seems that other databases won't be able to use this. I've removed the float8 conversions, as it seems to be faster without. References issue
#241.→ <<cset 5fe3b0273d21>>
-
Added DAP per acquisition JavaScript chart files for mean, mediand and combined plot. Updated dxfiltered.html to point to the combined one. References issue
#241.→ <<cset a13bfcc85b76>>
-
Added new userProfile field called median_available, set to False by default and cannot be edited by the user. Added code to views.py to check status of median_available and also if PostgreSQL is being used in order to determine if median values can be calcualted. Pass the result to the dx plot calculations. References issue
#241.→ <<cset d8f8b7d04456>>
-
Added user choice to plot either mean, median or both. Incorporated this option into views.py and dxfiltered.html so that the required DAP per acquisition protocol plot is produced. References issue
#241.→ <<cset 3a814629f53e>>
-
Added median to the DX mAs plot. References issue
#241.→ <<cset 5d897e743e14>>
-
Added median to the DX kVp plot. Tweaked one of the mAs plot JavaScript files. References issue
#241.→ <<cset 221821c7d340>>
-
Added code to create a median acquisition DAP over time plot. The median is used if the user has selected 'median' or 'both' in the 'Averages' choice. A plot with median and mean would be too busy. References issue
#241.→ <<cset 41c5b118675a>>
-
Mean and median were being calculated when just median selected. Altered code so that only required averages are calculated, saving time. References issue
#241.→ <<cset 3faafb1a56c6>>
-
Added Django 1.8 migration file to add median function to PostgreSQL database. Not sure what will happen if you're using something else. References issue
#241.→ <<cset 73c90733c02c>>
-
Added reverse_sql to the median migration file so that the migration can be reversed. I haven't tested this yet. See https://docs.djangoproject.com/en/1.8/ref/migration-operations/#runsql. References issue
#241.→ <<cset 8bafe8358d35>>
-
Added a check for PostgreSQL use in the median migration file so that the median functions are only executed if PostgreSQL is being used. Untested. References issue
#241.→ <<cset d8c15b56f2bb>>
-
Changed Django requirements to >=1.8; removed south. References issue
#241and#118.→ <<cset 7c35e56573e2>>
-
Combined datetime and median migrations into one file. Needs testing. References issue
#241and#118.→ <<cset 05776b94a13a>>
-
Updated name of new migration file. Removed requirement for other migration file. References issue
#241.→ <<cset 0732df0454bf>>
-
Won't be able to do the build right now - might not be till this evening now
-
Changing spelling of migration file. Refs
#241→ <<cset a9d51c326742>>
-
New version uploaded David - I changed unactive to inactive - I hope you don't mind!
-
reporter Just tried upgrading from 0.6 to 0.7.0b2.
makemigrations failed at the first attempt because 'south' hasn't been commended out in settings.py. Once I commented that out it worked as expected.
There may need to be a migration file for fresh 0.7 installations that creates the PostgreSQL database functions.
-
reporter - changed status to resolved
Now working, provided that PostgreSQL is being used.
-
- changed milestone to 0.7.0
-
reporter Hi @edmcdonagh. I've never liked my implementation of the PostgreSQL median function. I specifically don't like the fact that the values that it returns are 10,000,000,000 larger than they should be.
This morning I've removed this "feature" from my test install, and everything, including the calculation of the median of values that are all less than 1.0, works fine. I'm going to commit my changes to the custom database function, and also the necessary changes to views.py, but will wait until the
#332branch is merged back into develop. -
Does this have any implications for non-PostgreSQL databases? And database migrations?
-
reporter There are no implications for non-PostgreSQL databases.
Users who are already using the median function (anyone using beta versions of 0.7) will have to reapply a bit of SQL that will replace their existing median function with an updated one. This can be done via a migration file.
Users who are still on 0.6 won't be affected, as they don't have the median function yet. The migration file for these users includes the creation of the database median function, which can be adjusted so that it doesn't include the
*10000000000
bit.What do you think?
-
Sounds fine. We just need to make sure we get the migration file and docs right.
-
reporter Added updated median function for users that are already using the median database function. Removed need for median values to be multiplied by 10,000,000,000. References issue
#241→ <<cset 343fcc2e3fed>>
-
reporter Updated migration files. They are untested in their current state. References issue
#241.→ <<cset 7891e1c07c31>>
-
reporter @edmcdonagh, the djp demo site is up-to-date with the current develop branch, including the revised median database function.
-
reporter The code in the link below may enable medians to be calculated regardless of the database the user has installed: https://medium.com/squad-engineering/hack-django-orm-to-calculate-median-and-percentiles-or-make-annotations-great-again-23d24c62a7d0
-
reporter Case-insensitive setting: removed unnecessary annotation of the QuerySet; doing this in the Pandas DataFrame instead. I've also just realised that we don't need the database median function anymore, as the averages are calculated by Altair: medians will now work with any database (although I've not verified that this works). References issue
#477and#241→ <<cset af235ae18b13>>
-
reporter Removed the checks for median availability from the CT chart section of view.py. References issue
#477and#241→ <<cset d1b04d1c35ce>>
- Log in to comment
This is something that other users of django have wanted to do. I can't find a definitive solution. http://stackoverflow.com/questions/942620/missing-median-aggregate-function-in-django has a function that will calculate the median given a queryset and the field to calculate the median on, but it doesn't work within an aggregate or annotate call: