Maintenance routine to remove cloud file photos from cloud storage if they are already downloaded [2021:HPR]

Issue #1092 resolved
Brian Lewis repo owner created an issue

Process:

Find all document records relating to a cloud file where docCloudRemoved is 0.

Check the FileDb to see if that document is already downloaded.

If so:

  • send a remove request to Google Drive to remove the cloud file
  • update the document to set docCloudRemoved = 1

Implement this as a REST endpoint. Return to the client a list of

cloudID - docID - result of remove operation.
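The process above can be sketched roughly as follows. This is only an illustration in Python, not the actual endpoint code: `Document`, `PurgeResult`, `purge_cloud_files`, `is_in_file_db` and `remove_from_drive` are hypothetical names, and the FileDb lookup and the Google Drive call are passed in as plain callables so that the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Document:
    doc_id: str
    cloud_id: str
    cloud_removed: int = 0  # stands in for docCloudRemoved


@dataclass
class PurgeResult:
    cloud_id: str
    doc_id: str
    result: str  # outcome of the remove operation


def purge_cloud_files(
    documents: Iterable[Document],
    is_in_file_db: Callable[[str], bool],      # hypothetical FileDb lookup
    remove_from_drive: Callable[[str], None],  # hypothetical Drive remove call
) -> List[PurgeResult]:
    results: List[PurgeResult] = []
    for doc in documents:
        # Only documents whose cloud file has not been removed yet.
        if doc.cloud_removed != 0:
            continue
        # Only documents that are already downloaded into the FileDb.
        if not is_in_file_db(doc.doc_id):
            continue
        try:
            remove_from_drive(doc.cloud_id)
        except Exception as exc:
            # Leave the flag at 0 so a later run can retry this document.
            results.append(PurgeResult(doc.cloud_id, doc.doc_id, f"failed: {exc}"))
            continue
        doc.cloud_removed = 1
        results.append(PurgeResult(doc.cloud_id, doc.doc_id, "removed"))
    return results
```

The returned list corresponds to the cloudID - docID - result triples that the endpoint would send back to the client.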

Comments (11)

  1. Ghislain Hachey

    NOTE: the issue below is addressed by simply increasing the limit= parameter. I think this is behaving OK; I am leaving it here for archival purposes. It is good to know that with a low limit some files in fileDB will never be reached, and will therefore never be cleaned up and removed from the cloud.

    The condition "document is a cloud file and is in fileDB" is not enough to capture all photos to clean up. For example, all of the following were loaded in full in order to cache them into fileDB.

    Then I executed /api/cloudfiles/purge?limit=10. Looking in the database, I observe that five were cleaned up correctly while two were not. But as you can see, the two that were not cleaned up are in fact in fileDB and in document as cloudfile type.
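    One way a low limit can leave files unreached is sketched below in Python. It assumes, and this is only an assumption about the endpoint, that limit= caps the number of candidate documents examined before the fileDB check is made. `purge_batch` and the document IDs are hypothetical.

```python
def purge_batch(candidates, in_file_db, limit):
    """Return the doc IDs that would be purged when only the first
    `limit` candidates (docCloudRemoved = 0) are examined."""
    examined = candidates[:limit]
    return [doc_id for doc_id in examined if doc_id in in_file_db]


# All five documents still have docCloudRemoved = 0,
# but only the last two have been downloaded into fileDB.
candidates = ["d1", "d2", "d3", "d4", "d5"]
in_file_db = {"d4", "d5"}

low = purge_batch(candidates, in_file_db, limit=3)
high = purge_batch(candidates, in_file_db, limit=10)
```

    Under that assumption, the first three candidates are never flagged as removed, so they head the list on every run with limit=3 and the two downloaded files are never examined. A larger limit reaches them.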

  2. Ghislain Hachey

    Note: this is solved in the branch

    The following gets thrown when clicking on a newly added photo in a survey that was loaded (accessed from within the loaded survey).

    The following change fixes this (making the field nullable).

    The problem with the fix above is that it saves the record with a NULL docCloudRemoved value (and not 0), so it never gets cleaned up.

    I believe setting the docCloudRemoved to 0 in the [pInspectionWrite].[RegisterInspectionPhotos] procedure should work.
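    The reason a NULL value is skipped can be illustrated with a small, self-contained Python sketch using an in-memory SQLite database. The `Documents` table and the document IDs are made up for the example, and the real purge query is not shown in this issue. The sketch only relies on standard SQL behaviour: comparing NULL with `= 0` is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the document table.
conn.execute(
    "CREATE TABLE Documents (docID TEXT PRIMARY KEY, docCloudRemoved INTEGER)"
)
conn.executemany(
    "INSERT INTO Documents (docID, docCloudRemoved) VALUES (?, ?)",
    [("doc-a", 0), ("doc-b", None), ("doc-c", 1)],
)

# A filter on "= 0" skips the row saved with NULL.
strict = [row[0] for row in conn.execute(
    "SELECT docID FROM Documents WHERE docCloudRemoved = 0 ORDER BY docID"
)]

# Treating NULL as 0 in the filter would pick it up as well.
tolerant = [row[0] for row in conn.execute(
    "SELECT docID FROM Documents "
    "WHERE COALESCE(docCloudRemoved, 0) = 0 ORDER BY docID"
)]
```

    Writing 0 at registration time, as suggested above, keeps the purge filter simple. The COALESCE variant is only an alternative that would also cover rows already saved with NULL.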

  3. Ghislain Hachey

    Another possibly subtle issue to investigate: if a user clicks on a photo to load it, but does so from within the cloud file, it will never make it to fileDB. It is cached and retrieved quickly, but not from fileDB, and hence can never be cleaned up. Even if I clear the browser cache completely it remains very fast, yet it comes neither from the browser cache nor from having been downloaded into fileDB. It seems there is another cache at play here.

    I have a feeling that the built-in caching of ImageProcessor.Web.Caching.DiskCache is what “creates” all our current issues. It is certainly entirely responsible for one issue: the one described in this very comment. As for the issue in comment https://bitbucket.org/softwords/pineapples/issues/1092/maintenance-routine-to-remove-cloud-file#comment-60544763, it could also be that disk-caching the small images when loading an inspection with its photos bypasses the path to Store in fileDB.

    I also notice that accessing a photo from within the cloud file, once the cloud file has been LOADED, triggers the Store to fileDB for that individual file.

  4. Ghislain Hachey

    Note: this requires no software fix at the moment.

    Another issue, which is more of a process thing. Deleting photos in the cloud works and frees up space. But if the survey (and its photos) remains locally on the tablet, I believe they will resync as new cloud files with new IDs. I see this when I clean up on the development service account and try a quick edit of the survey: it resyncs with added photos (though I have not added new photos).

    Worth noting that leaving the tablet survey untouched will not sync the photos again. A user would have to edit the survey (e.g. add a new comment) to trigger a sync, which will add the photos.

    This is not too problematic as it is easily addressed in process: clean on the tablet when cleaning from FedEMIS. Failure to do so simply results in new copies of the same images with different IDs, which can be cleaned up the same way. The only downside is the unlikely possibility of ending up with duplicate photos with different IDs in fileDB. Nothing to do with this issue, I would say.

  5. Ghislain Hachey

    @Brian Lewis With our current approach, if a cloud file is deleted and was never loaded (a discarded survey), it will never be in fileDB and will never meet the condition to get cleaned up. Photos from that cloud file will remain and clutter the cloud drive.

  6. Ghislain Hachey

    The biggest issue with this remains. In order for /api/cloudfiles/purge to pick up a file, it must be viewed in full size individually. In other words, simply loading the survey with all the thumbnails is not enough.

    I also found other instances where the photo is viewed in full from within the loaded school accreditation photos and yet does not get saved to fileDB. It is a bit subtle and I don’t yet understand why, but I see it.

    I wonder if it is not easier to just load all photos when approving?!

  7. Ghislain Hachey

    I have a solution based on downloading all files on approval of an inspection. This happens asynchronously in the background, and the approval process has no overhead as it does not wait for it. Any error is captured and an email is sent to an admin user (this needs a new key in the web.config for the email recipient).

    This may or may not be our final approach, but I’ve done a fair bit of testing and it seems to work well and will help us with our urgent need to free up space on the service cloud account. As a perk, this approach solves all the remaining issues stated in the comments above.
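    The general shape of this approach (approve, then download in the background and report errors) can be sketched as below. This is a Python illustration only, not the code in the branch: `approve_inspection`, `download` and `notify_admin` are hypothetical names, and the approval step itself is elided.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable

# Small shared pool so approvals never block on photo downloads.
_executor = ThreadPoolExecutor(max_workers=2)


def _download_all(
    photo_ids: Iterable[str],
    download: Callable[[str], None],
    notify_admin: Callable[[str], None],
) -> None:
    for photo_id in photo_ids:
        try:
            download(photo_id)  # would store the photo in fileDB
        except Exception as exc:
            # Capture the error and report it; keep going with the rest.
            notify_admin(f"Download of {photo_id} failed: {exc}")


def approve_inspection(
    inspection_id: str,
    photo_ids: Iterable[str],
    download: Callable[[str], None],
    notify_admin: Callable[[str], None],
) -> Future:
    # The approval itself would happen here (not shown).
    # The downloads are queued and the caller returns immediately.
    return _executor.submit(_download_all, list(photo_ids), download, notify_admin)
```

    The caller can ignore the returned future. It is returned here only so that a test, or a later maintenance step, can wait for the downloads to finish.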
