Issue #181 new

Do not fail on 2GB+ files when having 2GB+ Memory

Asger Askov Blekinge
created an issue

The uk.gov.nationalarchives.droid.binFileReader.FileByteReader fails when reading files of 2 GB or more.

The code is funny in that the best way to run Jhove2 right now is with as little memory as possible. The more memory you give it, the larger the files it will attempt to read into memory, which causes a SEVERE slowdown. If you give it too much memory, it will fail, but only after allocating and reading 2 GB.

I have added comments in the code below to show how the execution proceeds and fails.

BufferedInputStream buffStream = null;
        try {

            int numBytes = binStream.available(); //If the file is 2GB+ we get something close to maxInt here

            if (numBytes > 0) {
                fileBytes = new byte[numBytes]; //We now allocate a 2GB array. If we do not have oodles of RAM, an OutOfMemoryError is thrown. Assume it is not thrown

                buffStream = new BufferedInputStream(binStream);
                int len = buffStream.read(fileBytes, 0, numBytes); //We then read the first 2GB of the file

                if (len != numBytes) {
                    //This means that all bytes were not successfully read
                    this.setErrorIdent();
                    this.setIdentificationWarning("Error reading file: " + len + " bytes read from file when " + numBytes + " were expected");
                } else if (buffStream.read() != -1) {//But if there was more than 2GB of file, we fail
                    //This means that the end of the file was not reached
                    this.setErrorIdent();
                    this.setIdentificationWarning("Error reading file: Unable to read to the end");
                } else {
                    this.numBytes = (long) numBytes;
                }
            } else {
                //If file is empty , status is error
                //this.setErrorIdent();
                this.numBytes = 0L;
                this.setIdentificationWarning("Zero-length file");
            }
            isRandomAccess = false;
        } catch (IOException e) {
            this.setErrorIdent();
            this.setIdentificationWarning("Error reading file: " + e.toString());
        } catch (OutOfMemoryError e) { //Here the OutOfMemoryError is caught. This is in itself bad practice
            try { //And then we load the file as a Random Access File
                randomAccessFile = new RandomAccessFile(file, "r");
                isRandomAccess = true;

                //record the file size
                numBytes = randomAccessFile.length();
                //try reading in a buffer
                randomAccessFile.seek(0L);
                boolean tryAgain = true;
                while (tryAgain) {
                    try {
                        fileBytes = new byte[randomFileBufferSize];
                        randomAccessFile.read(fileBytes);
                        tryAgain = false;
                    } catch (OutOfMemoryError e4) { //And once more we catch the OutOfMemoryError. Is this REALLY the best way to figure out how much of the file to load?
                        randomFileBufferSize = randomFileBufferSize / RAF_BUFFER_REDUCTION_FACTOR;
                        if (randomFileBufferSize < MIN_RAF_BUFFER_SIZE) {
                            throw e4;
                        }
                    }
                }
                rAFoffset = 0L;
            } catch (FileNotFoundException e2) {
                this.setErrorIdent();
                this.setIdentificationWarning("File disappeared or cannot be read");
            } catch (Exception e2) {
                this.setErrorIdent();
                this.setIdentificationWarning("Error reading file: " + e2.toString());
            }
        }
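
One possible direction, as a rough sketch only and not a patch against the actual FileByteReader: decide up front from the file length (a long) whether the file is small enough to hold in memory, instead of sizing the array from available() and waiting for an OutOfMemoryError. The class name, the method names, and both size constants below are made up for illustration and would need tuning.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ReadStrategySketch {
    // Hypothetical limits, not taken from the DROID/Jhove2 sources.
    static final long MAX_IN_MEMORY_BYTES = 64L * 1024 * 1024; // read whole file below this
    static final int WINDOW_BYTES = 1024 * 1024;               // buffer size for larger files

    // True if a file of this length should be read into a single byte array.
    // The second check keeps the length safely within the range of an int array size.
    static boolean fitsInMemory(long fileLength, long maxInMemoryBytes) {
        return fileLength <= maxInMemoryBytes && fileLength <= Integer.MAX_VALUE - 8;
    }

    // Reads either the whole file (small files) or only the first window (large files).
    // No OutOfMemoryError is used for control flow.
    static byte[] readInitialBuffer(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length(); // long, so files of 2 GB or more are reported correctly
            int size = fitsInMemory(length, MAX_IN_MEMORY_BYTES)
                    ? (int) length
                    : WINDOW_BYTES;
            byte[] buffer = new byte[size];
            raf.readFully(buffer); // size never exceeds the file length here
            return buffer;
        }
    }
}
```

With something like this, giving the JVM more memory would no longer change which code path is taken; only the configured threshold would.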
