sst_import_boundry speed issues

Issue #1249 new
Brent O'Connor created an issue

I'm having trouble importing the boundary file for the state of Washington. The boundary file (http://streamlinedsalestax.org/ratesandboundry/Boundary/WAB012011NOV29.zip) is 519 MB. When I run the Django management command sst_import_boundry, it runs for hours and the Python process's memory use grows steadily past 3 GB, at which point I kill the import. I know it is importing records, because the boundaries section of the admin site shows more than 400K of them.

I would like to know: A) whether 400K+ records sounds like the correct number for Washington, and B) if the importer isn't broken, whether there is a way to make it run faster and use less RAM. If Washington really does have 400K+ boundary records, the documentation should warn that the import takes a very long time and uses a lot of RAM.
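On question B: without having profiled the importer, one general way to keep memory flat on a 519 MB file is to stream the CSV in fixed-size batches instead of accumulating state for every row. Below is a minimal stdlib sketch (Python 3). batched_rows, import_file and the save_batch callback are hypothetical names, not Satchmo's API, and how the real sst_import_boundry command reads and saves rows is an assumption. Separately, when DEBUG = True, Django records every SQL query in django.db.connection.queries, which is a well-known cause of steadily growing memory in long-running management commands. Running the import with DEBUG = False, or calling django.db.reset_queries() after each batch, may be worth trying first.

```python
import csv
import io
from itertools import islice


def batched_rows(lines, batch_size=1000):
    """Yield lists of at most batch_size parsed CSV rows.

    Only one batch is held in memory at a time, so memory use stays
    roughly constant regardless of file size.
    """
    reader = csv.reader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def import_file(lines, save_batch, batch_size=1000):
    """Stream rows into save_batch (hypothetical callback); return the row count."""
    total = 0
    for batch in batched_rows(lines, batch_size):
        save_batch(batch)  # e.g. one database transaction per batch
        total += len(batch)
    return total


if __name__ == "__main__":
    sample = io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n")
    sizes = []
    n = import_file(sample, lambda batch: sizes.append(len(batch)), batch_size=2)
    print(n, sizes)  # prints: 5 [2, 2, 1]
```

In a real command, lines would be an open file (open(path, newline="") on Python 3), and save_batch might write each batch inside a single transaction rather than committing row by row.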

I'm using Satchmo version 0.9.2 (2114), installed with pip (-e hg+http://bitbucket.org/chris1610/satchmo#egg=Satchmo). I'm also experiencing issue #1155.

Also, if someone has a better way of going about charging city and county sales tax for the State of Washington then I'm all ears.

Below are the steps I took to get the US_SST module working. It might be worth adding them to the documentation, since they could save someone else the time I spent figuring this out.

  1. Add the us_sst module (tax.modules.us_sst) to your INSTALLED_APPS in your Django settings file.

  2. Run the Django syncdb management command.

  3. Update your Satchmo settings so that Satchmo uses the new module:

    • Go to the /settings/ URL in your browser.
    • Under Tax Settings, set the active tax module to "USA: Streamlined Sales Tax."
  4. Download the rate and boundary files for your state from http://streamlinedsalestax.org/ratesandboundry/ and unzip them to get the CSV files.

  5. Import the rate and boundary CSV files:

    $ django-admin.py sst_import_rate /path/to/state-rate-file.csv

    $ django-admin.py sst_import_boundry /path/to/state-boundary-file.csv
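The files linked from the SST site are ZIP archives (the Washington boundary file above is WAB012011NOV29.zip), while the import commands take CSV paths. Here is a minimal stdlib sketch for pulling the CSV files out of a downloaded archive. It assumes each archive contains one or more members ending in .csv. extract_csv is a hypothetical helper, not part of Satchmo.

```python
import zipfile


def extract_csv(zip_path, dest_dir="."):
    """Extract every .csv member of a downloaded SST archive.

    Returns the filesystem paths of the extracted CSV files, which can
    then be passed to sst_import_rate or sst_import_boundry.
    """
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        # ZipFile.extract returns the path of the file it wrote.
        return [archive.extract(name, dest_dir) for name in csv_names]
```

For example, extract_csv("WAB012011NOV29.zip") would return the extracted CSV path(s) to hand to sst_import_boundry.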

Comments (1)

  1. Brent O'Connor (reporter)

    UPDATE

    I started the import last night before bed and let it run overnight. It had finished by the time I got up this morning. Here is the output, which could be added to the documentation:

    $ django-admin.py sst_import_boundry WAB012011.csv
    /Users/oconnor/.virtualenvs/store/lib/python2.6/site-packages/registration/models.py:4: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
      import sha
    Processing: 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, ... continues to output 5199 more numbers ...
    Done: New: 164404. End date changed: 29. Unchanged: 357088
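These numbers help answer question A. Assume every row is counted in exactly one of the three buckets on the Done line, and that "Processing:" prints once per 100 rows (both are assumptions about the importer). Under those assumptions, the Washington boundary file holds about 521K rows. That is more than the 400K+ seen mid-import, and it roughly matches the ~5,200 progress numbers printed.

```python
# Counts copied from the "Done:" line of the import output above.
done = {"New": 164404, "End date changed": 29, "Unchanged": 357088}

# Assumption: each boundary row is counted in exactly one bucket.
total_rows = sum(done.values())

# Assumption: "Processing:" prints once per 100 rows.
progress_ticks = total_rows // 100

print("{:,} rows, {:,} progress ticks".format(total_rows, progress_ticks))
# prints: 521,521 rows, 5,215 progress ticks
```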

    Since it does eventually finish, it would be nice if the importer could be improved to run faster and use less RAM. It would also be nice if the documentation were updated with some of my notes.

    I'm also still running into issue #1155.
