Slow Performance with a lot of Repos

Issue #296 new
Marcus Marcus created an issue

Hi,

we run version 0.3.3 (and 0.3.99) with LDAP authentication and MySQL.

Opening the login page takes up to 8 seconds.

To reproduce it, create a lot of repos, run with only one Kallithea process (we have 20), and wait for the login page. Also try to clone a repo. You will see the delay.

We analyzed the problem and saw that Kallithea always reads all the repos. When the login page is requested, it queries the database for the user "none" (or default). In our case the database returns all 4000 repos we have, because the default user has the permission "none" on every one of these repositories.

SELECT
         repo_to_perm.repo_to_perm_id AS repo_to_perm_repo_to_perm_id,
         repo_to_perm.user_id AS repo_to_perm_user_id,
         repo_to_perm.permission_id AS repo_to_perm_permission_id,
         repo_to_perm.repository_id AS repo_to_perm_repository_id,
         repositories.user_id AS repositories_user_id,
         repositories.statistics AS repositories_statistics,
         repositories.downloads AS repositories_downloads,
         repositories.landing_revision AS repositories_landing_revision,
         repositories.locked AS repositories_locked,
         repositories.changeset_cache AS repositories_changeset_cache,
         repositories.repo_id AS repositories_repo_id,
         repositories.repo_name AS repositories_repo_name,
         repositories.repo_state AS repositories_repo_state,
         repositories.clone_uri AS repositories_clone_uri,
         repositories.repo_type AS repositories_repo_type,
         repositories.private AS repositories_private,
         repositories.description AS repositories_description,
         repositories.created_on AS repositories_created_on,
         repositories.updated_on AS repositories_updated_on,
         repositories.enable_locking AS repositories_enable_locking,
         repositories.fork_id AS repositories_fork_id,
         repositories.group_id AS repositories_group_id,
         permissions.permission_id AS permissions_permission_id,
         permissions.permission_name AS permissions_permission_name
        FROM repo_to_perm
        INNER JOIN repositories ON repo_to_perm.repository_id = repositories.repo_id
        INNER JOIN permissions ON repo_to_perm.permission_id = permissions.permission_id
        WHERE repo_to_perm.user_id = 1;

This SQL select is also issued by every hg clone, push, pull etc., so every hg transaction is delayed before the request is answered.

To fix this, we activated the cache inside the auth.py file:

    @LazyProperty
    def permissions(self):
        return self.__get_perms(user=self, cache=False)

to

        return self.__get_perms(user=self, cache=True)

Is there a reason why this cache was deactivated? (I know that with the cache enabled, a permission change only takes effect after the cache refresh time.) With this cache, the login page comes back in 0.2 seconds.

We also tried to change the SQL statement to get a faster result:

Change inside kallithea/kallithea/model/db.py:

    @classmethod
    def get_default_perms(cls, default_user_id):
        q = Session().query(UserRepoToPerm, Repository, cls) \
            .join((Repository, UserRepoToPerm.repository_id == Repository.repo_id)) \
            .join((cls, UserRepoToPerm.permission_id == cls.permission_id)) \
            .filter(UserRepoToPerm.user_id == default_user_id)

to

    @classmethod
    def get_default_perms(cls, default_user_id):
        q = Session().query(UserRepoToPerm, Repository, cls) \
            .join((Repository, UserRepoToPerm.repository_id == Repository.repo_id)) \
            .join((cls, UserRepoToPerm.permission_id == cls.permission_id)) \
            .filter(UserRepoToPerm.user_id == default_user_id) \
            .filter(cls.permission_id != 1)

The web GUI then works fine, but an hg clone is refused with "Permission denied".

We are thinking about a separate cache region just for the repositories, plus a flush routine that runs when you update the permissions in Kallithea. What do you think?
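
A rough sketch of what such a cache with a flush routine could look like. This is not Kallithea code: `PermissionCache`, `slow_loader` and the TTL value are hypothetical names chosen for illustration, and the cache lives in one process only. With several Kallithea processes, a flush in one process would not reach the others, so a shared cache backend would be needed.

```python
import time


class PermissionCache:
    """Hypothetical per-user permission cache with a TTL and an explicit flush."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader    # callable: user_id -> permissions (the slow query)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # user_id -> (expires_at, permissions)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh, skip the database
        perms = self._loader(user_id)
        self._entries[user_id] = (now + self._ttl, perms)
        return perms

    def flush(self, user_id=None):
        """Drop one user's entry, or every entry when user_id is None."""
        if user_id is None:
            self._entries.clear()
        else:
            self._entries.pop(user_id, None)


calls = []


def slow_loader(user_id):
    # Stand-in for the expensive permission query.
    calls.append(user_id)
    return {"Group1/repo1": "repository.none"}


cache = PermissionCache(slow_loader, ttl_seconds=60)
cache.get(1)
cache.get(1)    # served from the cache, the loader is not called again
cache.flush(1)  # would be called whenever permissions are edited
cache.get(1)    # the loader runs again after the flush
```

The explicit flush is what removes the wait for the cache refresh time: a permission change invalidates the entry immediately instead of waiting for the TTL to expire.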

Regards, Marcus

Comments (4)

  1. Thomas De Schampheleire
    • edited description

    Thanks a lot for this analysis.

    About your last change, the one that causes the clone failure: is that a clone with a user contained in the URL (and thus authentication), or just a plain anonymous clone (I'm guessing the latter)?

    The 'cache=False' parameter in 'def permissions' has been there since Kallithea was forked from RhodeCode.

    However, conceptually it makes little sense to get permission data for all repos when e.g. logging in, cloning one particular repo, etc. It does make sense when presenting a list of all repos, as on the main page or admin pages.
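
To illustrate the point about a single repo, here is a small sketch using an in-memory SQLite database. The table and column names are taken from the SELECT in the issue description, reduced to the columns the lookup needs. The helper `default_perm_for_repo` and the sample data are assumptions for illustration, not Kallithea code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repositories (repo_id INTEGER PRIMARY KEY, repo_name TEXT UNIQUE);
    CREATE TABLE permissions (permission_id INTEGER PRIMARY KEY, permission_name TEXT);
    CREATE TABLE repo_to_perm (
        repo_to_perm_id INTEGER PRIMARY KEY,
        user_id INTEGER, permission_id INTEGER, repository_id INTEGER);
""")
conn.executemany("INSERT INTO permissions VALUES (?, ?)",
                 [(1, "repository.none"), (2, "repository.read")])
# 4000 repos; the default user (id 1) has 'repository.none' on each of them.
conn.executemany("INSERT INTO repositories VALUES (?, ?)",
                 [(i, "Group1/repo%d" % i) for i in range(1, 4001)])
conn.executemany(
    "INSERT INTO repo_to_perm (user_id, permission_id, repository_id) "
    "VALUES (1, 1, ?)",
    [(i,) for i in range(1, 4001)])


def default_perm_for_repo(conn, user_id, repo_name):
    """Return the permission name for one user on one repo, or None."""
    row = conn.execute("""
        SELECT permissions.permission_name
        FROM repo_to_perm
        INNER JOIN repositories ON repo_to_perm.repository_id = repositories.repo_id
        INNER JOIN permissions ON repo_to_perm.permission_id = permissions.permission_id
        WHERE repo_to_perm.user_id = ? AND repositories.repo_name = ?
    """, (user_id, repo_name)).fetchone()
    return row[0] if row else None


# The unrestricted query touches every row of the default user ...
all_rows = conn.execute(
    "SELECT COUNT(*) FROM repo_to_perm WHERE user_id = 1").fetchone()[0]
# ... while a clone of one repo only needs a single row.
one = default_perm_for_repo(conn, 1, "Group1/repo1")
```

Unlike filtering out the "none" rows, this keeps the row for the requested repo, so an explicit "none" permission can still be told apart from a missing entry.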

  2. Thomas De Schampheleire

    I dug a little deeper, adding a traceback on the __get_perms function to see how it gets called. When accessing the login page this is through the login.html template, which has a line:

    %if h.HasPermissionAny('hg.admin', 'hg.register.auto_activate', 'hg.register.manual_activate')():
    

    This needs to check the global permissions, which seems to take a lot of time with many repositories.

      File "_base_root_html", line 211, in render_body
      File "_login_html", line 68, in render_body
      File "/home/tdescham/repo/contrib/kallithea/kallithea-incoming/kallithea/lib/auth.py", line 940, in __call__
        global_permissions = request.user.permissions['global'] # usually very short
      File "/home/tdescham/repo/contrib/kallithea/kallithea-incoming/kallithea/lib/vcs/utils/lazy.py", line 43, in __get__
        value = self._func(obj)
      File "/home/tdescham/repo/contrib/kallithea/kallithea-incoming/kallithea/lib/auth.py", line 541, in permissions
        return self.__get_perms(user=self, cache=False)
      File "/home/tdescham/repo/contrib/kallithea/kallithea-incoming/kallithea/lib/auth.py", line 600, in __get_perms
        traceback.print_stack()
    

    It looks to me like the method '__get_perms' is too coarse-grained. As the log message in that method indicates, it gets the full permission tree for a user, which could be very large.

    I guess the real solution to this issue is to rework that code so that only the needed permission checks are done.
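
One possible shape for such a rework, as a sketch only: load each section of the permission tree on first access, so that a check against `permissions['global']` (as in the traceback above) never triggers the per-repository query. The class `LazyPermissions`, the loader functions and the section name 'repositories' are assumptions for illustration, not the existing lib/auth.py API.

```python
class LazyPermissions:
    """Hypothetical dict-like holder that loads each permission section on first access."""

    def __init__(self, loaders):
        self._loaders = loaders  # section name -> zero-argument callable
        self._loaded = {}        # section name -> already computed result

    def __getitem__(self, section):
        if section not in self._loaded:
            self._loaded[section] = self._loaders[section]()
        return self._loaded[section]


loaded = []  # records which sections were actually computed


def load_global():
    # Stand-in for the (usually very short) global permission query.
    loaded.append("global")
    return {"hg.register.manual_activate"}


def load_repositories():
    # Stand-in for the expensive query over all repositories.
    loaded.append("repositories")
    return {"Group1/repo%d" % i: "repository.none" for i in range(4000)}


perms = LazyPermissions({"global": load_global,
                         "repositories": load_repositories})

# The check done by the login template only needs the global section.
wanted = {"hg.admin", "hg.register.auto_activate", "hg.register.manual_activate"}
can_register = bool(perms["global"] & wanted)
```

After this check only the 'global' loader has run; the 4000 repository entries are computed only if some code actually asks for `perms["repositories"]`.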

    In the past @kwi has proven to be a 'lib/auth.py' master :-D but I'm not sure if he's still up for it.

    @Skywalker28 You are also more than welcome to further discuss this and suggest a solution in this direction.

  3. Marcus Marcus reporter

    I am trying to clone like this:

    hg clone http://myUser@mercurial.de/Group1/repo1
    

    I totally agree that Kallithea should only check the permissions of the requested repo (and do no check at all on the login page if no credentials are given).

    I will test a workaround with some other changes and report back after the tests are completed.

  4. Mads Kiilerich

    The idea is that permissions should be cached instead of computing almost-the-same-thing multiple times.

    And if no credentials are given, it would still be relevant to check if there should be anonymous access.

    But apparently, something doesn't work well in your setup.
