Files (without extension) don't seem to be uploading to S3

Issue #12 resolved
Ophir Horn created an issue

Hi,

I just received an upload of a few GB of data (~50 files of 123 to 143 MB each). The files all sit in my uploads folder, but none of them have been uploaded to S3.

The incrontab looks like:

sudo incrontab -l
/home/ECMWF/home/ECMWF/uploads IN_CREATE,IN_DELETE,IN_CLOSE_WRITE /usr/local/bin/movetos3.sh /home/ECMWF/home/ECMWF/uploads $# telluslabs-ftp/ECMWF/uploads/ $%
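For reference, in an incrontab entry `$#` expands to the name of the file that triggered the event and `$%` to the event flags as text. So for each event, the entry above runs movetos3.sh with four arguments. The stub below is a hypothetical stand-in (not the real movetos3.sh) that only reports those arguments. The function name and output format are made up for illustration.

```shell
# Hypothetical stand-in for /usr/local/bin/movetos3.sh that only reports its
# arguments (POSIX sh). With the incrontab entry above, incron passes:
#   $1  the watched directory (written literally in the entry)
#   $2  the file that triggered the event (incron's $#)
#   $3  the S3 destination prefix (written literally in the entry)
#   $4  the event flags as text (incron's $%), e.g. IN_CLOSE_WRITE
report_event() {
  dir=$1 file=$2 s3_prefix=$3 flags=$4
  printf '%s %s/%s (destination prefix: %s)\n' "$flags" "$dir" "$file" "$s3_prefix"
}
```

Called as `report_event "$@"` from a test script in place of movetos3.sh, it would print one line per event, such as the IN_CLOSE_WRITE for a `.tmp` name seen in the log below.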

The dir looks like:

ls -l
total 8579608
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:17 D1D10310000103100011
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:18 D1D10310000103103001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:18 D1D10310000103106001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:19 D1D10310000103109001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:19 D1D10310000103112001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:19 D1D10310000103115001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:20 D1D10310000103118001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:20 D1D10310000103121001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:21 D1D10310000110100001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:21 D1D10310000110103001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:21 D1D10310000110106001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:22 D1D10310000110109001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:22 D1D10310000110112001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:23 D1D10310000110115001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:23 D1D10310000110118001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:23 D1D10310000110121001
-rw-r--r-- 1 ECMWF ECMWF 142640760 Oct 31 11:27 D1D10310000110200001
-rw-r--r-- 1 ECMWF ECMWF 123189720 Oct 31 11:27 D1D10310000110203001
...

The log file shows the following:

The user-provided path /home/ECMWF/home/ECMWF/uploads/D1D10310000103100011.tmp does not exist.
2017-10-31 11:17:56 - Failed to move file /home/ECMWF/home/ECMWF/uploads/D1D10310000103100011.tmp to s3
2017-10-31 11:17:58 - Received event IN_CREATE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103103001.tmp
2017-10-31 11:18:17 - Received event IN_CLOSE_WRITE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103103001.tmp
2017-10-31 11:18:17 - Moving file /home/ECMWF/home/ECMWF/uploads/D1D10310000103103001.tmp to s3...

The user-provided path /home/ECMWF/home/ECMWF/uploads/D1D10310000103103001.tmp does not exist.
2017-10-31 11:18:17 - Failed to move file /home/ECMWF/home/ECMWF/uploads/D1D10310000103103001.tmp to s3
2017-10-31 11:18:20 - Received event IN_CREATE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103106001.tmp
2017-10-31 11:18:42 - Received event IN_CLOSE_WRITE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103106001.tmp
2017-10-31 11:18:42 - Moving file /home/ECMWF/home/ECMWF/uploads/D1D10310000103106001.tmp to s3...

The user-provided path /home/ECMWF/home/ECMWF/uploads/D1D10310000103106001.tmp does not exist.
2017-10-31 11:18:42 - Failed to move file /home/ECMWF/home/ECMWF/uploads/D1D10310000103106001.tmp to s3
2017-10-31 11:18:44 - Received event IN_CREATE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103109001.tmp
2017-10-31 11:19:04 - Received event IN_CLOSE_WRITE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103109001.tmp
2017-10-31 11:19:04 - Moving file /home/ECMWF/home/ECMWF/uploads/D1D10310000103109001.tmp to s3...

The user-provided path /home/ECMWF/home/ECMWF/uploads/D1D10310000103109001.tmp does not exist.
2017-10-31 11:19:04 - Failed to move file /home/ECMWF/home/ECMWF/uploads/D1D10310000103109001.tmp to s3
2017-10-31 11:19:07 - Received event IN_CREATE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103112001.tmp
2017-10-31 11:19:30 - Received event IN_CLOSE_WRITE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103112001.tmp
2017-10-31 11:19:30 - Moving file /home/ECMWF/home/ECMWF/uploads/D1D10310000103112001.tmp to s3...

The user-provided path /home/ECMWF/home/ECMWF/uploads/D1D10310000103112001.tmp does not exist.
2017-10-31 11:19:30 - Failed to move file /home/ECMWF/home/ECMWF/uploads/D1D10310000103112001.tmp to s3
2017-10-31 11:19:33 - Received event IN_CREATE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103115001.tmp
2017-10-31 11:19:53 - Received event IN_CLOSE_WRITE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103115001.tmp
2017-10-31 11:19:53 - Moving file /home/ECMWF/home/ECMWF/uploads/D1D10310000103115001.tmp to s3...

The user-provided path /home/ECMWF/home/ECMWF/uploads/D1D10310000103115001.tmp does not exist.
2017-10-31 11:19:53 - Failed to move file /home/ECMWF/home/ECMWF/uploads/D1D10310000103115001.tmp to s3
2017-10-31 11:19:55 - Received event IN_CREATE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103118001.tmp
2017-10-31 11:20:18 - Received event IN_CLOSE_WRITE on file system object /home/ECMWF/home/ECMWF/uploads/D1D10310000103118001.tmp
2017-10-31 11:20:18 - Moving file /home/ECMWF/home/ECMWF/uploads/D1D10310000103118001.tmp to s3...

...

Comments (10)

  1. Robert Chen

    Hi Ophir,

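    The SFTP client is probably using a resume/transfer feature that streams the data into a temporary file and renames it once the transfer completes. This interferes with how SFTP Gateway operates: the file is written under a temporary extension (usually .tmp or .filepart), so IN_CLOSE_WRITE fires for the temporary name, and by the time the upload starts, the file has already been renamed. The rename itself doesn't trigger another upload.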
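    The ordering in the log can be reproduced with a short, hypothetical simulation. The client writes and closes NAME.tmp, which fires IN_CLOSE_WRITE for the temporary name, and then renames it, which fires only IN_MOVED_FROM/IN_MOVED_TO. A handler that looks for NAME.tmp after the rename finds nothing. Both function names are made up, and the existence check only stands in for what movetos3.sh does.

    ```shell
    # Hypothetical simulation of a resume/transfer client (POSIX sh).
    simulate_resume_upload() {
      dir=$1 name=$2
      printf 'payload' > "$dir/$name.tmp"   # write and close: IN_CLOSE_WRITE for NAME.tmp
      mv "$dir/$name.tmp" "$dir/$name"      # rename: IN_MOVED_FROM/IN_MOVED_TO, no IN_CLOSE_WRITE
    }

    # Hypothetical handler check, standing in for the upload step.
    handle_close_write() {
      if [ -e "$1" ]; then
        echo "would upload $1"
      else
        echo "$1 does not exist"
      fi
    }
    ```

    Running `simulate_resume_upload "$d" D1D10310000103103001` and then `handle_close_write "$d/D1D10310000103103001.tmp"` reports that the `.tmp` path does not exist, while the renamed file is sitting in the directory, which matches the `ls -l` output above.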

    Do you happen to know what SFTP client is being used? If so, I can try to figure out if there's a way to disable the resume/transfer setting.

    Thanks!

    Robert

  2. Ophir Horn reporter

    Hi Robert,

    No, unfortunately I don't know (or have control over) which SFTP client is being used...

    Is there a way to kick off the upload of everything in the folder? I tried moving all the files to a tmp directory and then moving them back, but I got the same errors (which is why I wonder whether it's a property of the SFTP client...)

  3. Robert Chen

    Can you try doing the following?

    cd /home/ECMWF/home/ECMWF/uploads/
    find . -type f -exec touch {} \;
    

    This should touch all the files in the uploads/ directory. Sometimes, touch is all that's needed to trigger an upload. (These files are fairly large, so you can either wait or tail the /var/log/movetos3/movetos3.log file to monitor progress.)

    If that doesn't work, let me know and I can try to troubleshoot further.

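    (A note about moving the files to a tmp directory: the mv command doesn't trigger the IN_CLOSE_WRITE event that we're looking for. That event fires when a file that was opened for writing is closed. touch opens the file for writing before updating its timestamp, so in many cases it's enough, even though it writes no data.)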
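    Because these files are large and incron generally starts a separate handler process for each event, touching all of them at once can start dozens of uploads in parallel. A hypothetical throttled variant touches one file at a time and pauses in between. The function name and the 30-second default are assumptions; tune the pause to how long one upload takes on your instance.

    ```shell
    # Hypothetical throttled alternative to: find . -type f -exec touch {} \;
    touch_slowly() {
      dir=$1
      pause=${2:-30}   # seconds between files; 30 is an arbitrary guess
      for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case $f in
          *.tmp|*.filepart) continue ;;   # leave in-progress resume/transfer files alone
        esac
        touch "$f"       # opens for writing and closes: fires IN_CLOSE_WRITE
        sleep "$pause"
      done
    }
    ```

    For example, `touch_slowly /home/ECMWF/home/ECMWF/uploads 60` would take about 50 minutes for ~50 files. Run it as root or as the ECMWF user, since touch needs write access to the files (they are `-rw-r--r--` and owned by ECMWF).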

  4. Ophir Horn reporter

    OK, this seems to have triggered the move... Now I seem to be having memory issues (I'm on a t2.small instance). I'll increase the instance memory and retry.

  5. Ophir Horn reporter

    So, increasing the instance size definitely helped.

    What concerns me is how to prevent this going forward... I'll be receiving these payloads daily. They're pretty large, and a timely move to S3 is essential for us.

    Any suggestions?

  6. Robert Chen

    As for preventing this issue going forward...

    You could have a cron job that touches all the files.

    sudo crontab -e
    
    * * * * * cd /home/ECMWF/home/ECMWF/uploads && find . -type f ! -name '*.tmp' ! -name '*.filepart' -exec touch {} \;
    

    This runs every minute and touches all the files in the ECMWF user's uploads directory.

    The thing to be careful about: if the user is in the middle of uploading a large file with the resume/transfer feature, you want to avoid running touch on the .tmp or .filepart file, since that would upload it to S3 prematurely. That's why the find command excludes these file extensions.
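    The -name patterns in the cron line need to be quoted. cron runs the line through /bin/sh, and once the cd succeeds, an unquoted *.tmp is expanded by the shell whenever .tmp files are present, so find receives file names instead of a pattern. With one match, find silently excludes only that one file; with several, find rejects the extra arguments ("paths must precede expression"). The hypothetical demo below prints what find would receive in each case.

    ```shell
    # Hypothetical demo: run in a directory that contains .tmp files.
    # The subshell body ( ... ) keeps the cd from leaking into the caller.
    demo_glob_expansion() (
      cd "$1" || exit 1
      echo "unquoted:" ! -name *.tmp    # the shell expands *.tmp to the matching names
      echo "quoted:" ! -name '*.tmp'    # find would receive the literal pattern
    )
    ```

    In a directory holding a.tmp and b.tmp, the first line reads `unquoted: ! -name a.tmp b.tmp` and the second `quoted: ! -name *.tmp`. If no .tmp files exist, sh leaves the unquoted pattern as-is, which is why the problem only shows up while an upload is in progress.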

    On our roadmap, we're looking into a way for SFTP Gateway to work with SFTP clients that use the resume/transfer feature. I can notify you once this feature is available. In the meantime, try the cron job and let me know if you run into any issues.
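    As a rough illustration only, here is a hypothetical sketch of the decision logic a handler could use to cope with resume/transfer clients: ignore events on .tmp and .filepart names, and upload when a finished file is either closed after writing or renamed into place. It assumes the incrontab mask also includes IN_MOVED_TO (the entry in this issue watches only IN_CREATE, IN_DELETE and IN_CLOSE_WRITE). It is not how SFTP Gateway implements this, and `decide_action` is a made-up name.

    ```shell
    # Hypothetical: $1 = file name (incron's $#), $2 = event flags (incron's $%).
    decide_action() {
      file=$1 flags=$2
      case $file in
        *.tmp|*.filepart) echo skip; return ;;   # still being transferred
      esac
      case $flags in
        *IN_CLOSE_WRITE*|*IN_MOVED_TO*) echo upload ;;   # written in place, or renamed into place
        *) echo skip ;;
      esac
    }
    ```

    One trade-off of this design: a file whose real name ends in .tmp or .filepart would never be uploaded.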

    Thanks!

  7. Robert Chen

    Just as an FYI, you might want to add -mtime +0 (which skips anything modified within the past 24 hours), just in case someone's in the middle of uploading a file. Then just run it once at midnight.

    sudo crontab -e
    
    0 0 * * * cd /home/ECMWF/home/ECMWF/uploads && find . -mtime +0 -type f ! -name '*.tmp' ! -name '*.filepart' -exec touch {} \;
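    A note on -mtime: find truncates a file's age to whole 24-hour periods before comparing. So -mtime +0 selects files at least 24 hours old, while -mtime +1 selects files at least 48 hours old. The hypothetical demo below backdates three files with GNU touch -d and prints what each test selects. The file names are made up.

    ```shell
    # Hypothetical demo (GNU touch and find). Subshell body keeps the cd local.
    mtime_demo() (
      cd "$1" || exit 1
      touch -d '1 hour ago' fresh
      touch -d '30 hours ago' yesterday
      touch -d '50 hours ago' older
      echo "+0:" $(find . -type f -mtime +0 | sort)   # age >= 24h
      echo "+1:" $(find . -type f -mtime +1 | sort)   # age >= 48h
    )
    ```

    In an empty directory, this prints `+0: ./older ./yesterday` and `+1: ./older`. Combined with a single midnight run, a file that arrives just after midnight waits almost two days before it is touched. If a timely move matters, a more frequent schedule with a smaller threshold (for example, an hourly run with -mmin +60) may fit better. The right threshold depends on how long the largest uploads take.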
    