Export creates enormous PDFs
USE CASE: WHAT DO YOU WANT TO DO?
Export a clustered matrix to PDF.
STEPS TO REPRODUCE AN ISSUE (OR TRIGGER A NEW FEATURE)
- Open file "test_dataset1.txt" from the uploaded test datasets.
- Cluster both axes with default settings.
- Export the whole matrix as a PDF. The export will probably take a very long time, and Adobe Reader may temporarily freeze when opening the resulting PDF.
- Inspect file properties and check file size.
CURRENT BEHAVIOR
test_dataset1.txt has 4175 rows and 3748 columns, i.e. more than 15.6 million data points. When exporting to PDF, the export code runs for a long time, and opening the resulting PDF temporarily freezes the PDF viewer. The file size ends up being 3.40GB (see screenshot).
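To put the reported numbers in perspective, here is a rough back-of-envelope calculation of how many bytes the PDF spends per matrix cell. It is only a sketch: the class and method names are hypothetical, and it assumes "GB" means 2^30 bytes and that the whole file size is attributable to the matrix cells.

```java
// Back-of-envelope estimate; the row/column counts and file size are taken
// from the report, everything else is an illustrative assumption.
class PdfSizeEstimate {

    // Number of matrix cells; long arithmetic avoids int overflow for big matrices.
    static long cellCount(int rows, int cols) {
        return (long) rows * cols;
    }

    // Average number of file bytes per matrix cell.
    static double bytesPerCell(double fileBytes, long cells) {
        return fileBytes / cells;
    }

    public static void main(String[] args) {
        long cells = cellCount(4175, 3748);
        // Assumption: 3.40GB is interpreted as 3.40 * 2^30 bytes.
        double fileBytes = 3.40 * 1024 * 1024 * 1024;
        System.out.printf("cells = %d%n", cells);
        System.out.printf("bytes per cell = %.0f%n", bytesPerCell(fileBytes, cells));
    }
}
```

Under these assumptions the PDF uses a bit over 200 bytes per cell, which would be consistent with each cell being written as its own vector object rather than as a pixel of an image.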
EXPECTED BEHAVIOR
File size is reasonable (we should probably discuss what "reasonable" means and take orientation from other software).
DEVELOPERS ONLY SECTION
SUGGESTED CHANGE (Pseudocode optional)
...
FILES AFFECTED (where the changes will be implemented) - developers only
ExportHandler.java
LEVEL OF EFFORT - developers only
medium
COMMENTS
Comments (17)
-
-
I exported large_6kx6k.tx to several formats. Here are the file sizes:
PDF - 7GB
SVG - 31GB
PS - takes a lot of time
PNG - 7MB
PPM - 920MB
-
Note, this issue is somewhat loosely related to (or may affect) issue #383.
-
- changed version to alpha04
Tentatively assigning version to alpha04. Feel free to change.
-
- changed component to Import/export
-
- removed milestone
Removing milestone: Import/export data (automated comment)
-
- changed milestone to 03
-
- changed milestone to Faizaan/Srikanth - 03
-
- changed milestone to F/S - 03
-
Just a heads up on this issue.
My idea is to create a high-resolution PNG first and then convert it into a PDF. I am using itextpdf for this. It has an AGPL licence and can be used if our code is open source.
I can see significant improvements with small files, and I would like to try the largest 6x6 file we have (I just don't have enough RAM on my local machine). Right now, PDF creation takes twice as long as PNG export (because of the converting and loading).
EDIT - I have a test jar file if anyone wants to test with large files. You might need to append -Xmx4G when starting treeview if you are trying to export large_6x6.
Test jar - https://bitbucket.org/smd_faizan/treeview3/downloads/treeview3-all-a83adc2.jar
-
@smd_faizan I just wanted to emphasize that, while the matrix itself can be in raster format, the tree and the labels (once implemented) should be in vector and should be editable from the PDF. Would that be the case with this PNG -> PDF conversion?
-
Oh wow, I didn't know we were editing the PDF using vector graphics. Let me look into the options; thanks for pointing that out. The PNG -> PDF conversion doesn't support this as of now.
-
- changed status to open
-
Hey @abarysh, if we use raster format for the matrix and vector graphics for the trees, it is much faster.
test_dataset1.cdt.txt -> takes ~30 seconds to export, and the PDF/PS/SVG file size is ~40MB (previously, the PDF was 3.4GB).
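For anyone following along, the raster part of this approach could look roughly like the sketch below. It is not the code from the PR: the class name, the `toRgb` color mapping, and the one-pixel-per-cell layout are all assumptions for illustration, and it uses only the JDK (`BufferedImage`, `ImageIO`). Embedding the resulting image in a PDF and drawing the trees as vector graphics are left out.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Illustrative sketch only: renders a matrix as a raster image, one pixel per cell.
class RasterMatrixSketch {

    // Hypothetical color mapping: values are clamped to [-1, 1];
    // positive values map to red, negative values to blue, zero to black.
    static int toRgb(double value) {
        double clamped = Math.max(-1.0, Math.min(1.0, value));
        int intensity = (int) Math.round(Math.abs(clamped) * 255);
        return clamped >= 0 ? (intensity << 16) : intensity;
    }

    // Draws each matrix cell as exactly one pixel (column = x, row = y).
    static BufferedImage rasterize(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        BufferedImage image = new BufferedImage(cols, rows, BufferedImage.TYPE_INT_RGB);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                image.setRGB(c, r, toRgb(matrix[r][c]));
            }
        }
        return image;
    }

    // Encodes the image as PNG in memory so its size can be inspected.
    static byte[] toPng(BufferedImage image) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Synthetic 300 x 400 matrix with a smooth pattern, just for demonstration.
        double[][] matrix = new double[300][400];
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                matrix[r][c] = Math.sin(r / 20.0) * Math.cos(c / 20.0);
            }
        }
        byte[] png = toPng(rasterize(matrix));
        System.out.println("PNG size in bytes: " + png.length);
    }
}
```

With one pixel per cell, the image size scales with the number of cells rather than with the number of drawn vector shapes, which is the likely reason the raster exports are so much smaller. A real export would probably scale the image up without interpolation so that cells stay sharp when zooming.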
-
- changed status to resolved
submitted PR 126
-
- changed version to beta2
I've been aware that large matrices produce very large PDFs. How big is the PNG version of the export? Perhaps a solution could be to embed a PNG inside a PDF?