Support local mirrors for all downloads (Git, files, etc.) when building CEF/Chromium

Issue #1728 resolved
Adam Gross created an issue

automate-git.py is hardcoded to try to use python.bat and git.bat from depot_tools (ref: https://bitbucket.org/chromiumembedded/cef/src/5780ea8baa0f84ba447379a84f061bc2b9ced036/tools/automate/automate-git.py?at=master&fileviewer=file-view-default#automate-git.py-564 ) but AFAICT those scripts are not available anymore on the depot_tools master branch (ref: https://chromium.googlesource.com/chromium/tools/depot_tools.git/+/master ). We need to come up with a different mechanism for using git and python or just start requiring that developers have git and python in their %PATH%.
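
    One way to avoid depending on the wrappers is to use them when they exist and otherwise fall back to whatever is on %PATH%. The sketch below is a hypothetical helper (find_tool is not part of automate-git.py) that shows only this lookup order. It assumes that the wrappers, when present, live at the top level of the depot_tools directory.

```python
import os
import shutil


def find_tool(depot_tools_dir, name):
    """Return a command path for `name` (e.g. "git" or "python").

    Hypothetical helper: prefers the depot_tools wrapper script
    <depot_tools_dir>/<name>.bat (created by depot_tools on first run on
    Windows) and otherwise falls back to the first match on PATH.
    Returns None if neither is found.
    """
    wrapper = os.path.join(depot_tools_dir, name + ".bat")
    if os.path.isfile(wrapper):
        return wrapper
    return shutil.which(name)
```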


  1. Marshall Greenblatt

    The bat files are created by depot_tools on first run. See depot_tools/bootstrap/win/win_tools.bat. What command-line arguments are you passing to automate-git.py?

  2. Adam Gross reporter

    Unfortunately, my company has rules that prevent the official build servers from accessing external resources. Our builds must remain reproducible at any point in the future, and if we depend on external Git repositories, there is always a chance that they will go down or move. As a result, we have internal mirrors of cef.git, chromium_depot_tools.git, and chromium_src.git.

    Currently I am playing with two different approaches, each with their own pitfalls:

    1) Use "git submodule" semantics to clone the 3 git repositories into subdirectories of a specific folder, using the subdirectory naming conventions expected by automate-git.py. Then I run automate-git.py with command-line params: --download-dir=%SRCROOT% --no-update --depot-tools-dir=%SRCROOT%\depot_tools --url=<internal_cef_mirror> --branch=2454

    Even if I manually extract some .zip files to get git.bat and python.bat, I still seem to be missing supplemental Python scripts such as gyp.py. As a result, I end up with an error like the one in the attached error.txt. From your previous comment, it sounds like running win_tools.bat is a prerequisite for calling automate-git.py, and that would be fine: unlike many other files, it doesn't seem to contain any hardcoded external paths. I'm not sure I will get much farther, though.

    2) Use "git submodule" semantics to clone cef.git and chromium_depot_tools.git (although the latter isn't 100% necessary). Then I would set the DEPOT_TOOLS_UPDATE environment variable flag to 0 and run automate-git.py with command-line params: --download-dir=%SRCROOT% --no-cef-update --no-depot-tools-update --depot-tools-dir=%SRCROOT%\depot_tools --url=<internal_cef_mirror> --chromium-url=<internal_chromium_mirror> --branch=2454

    Note that this command relies on local additions to automate-git.py: flags that skip updating only depot_tools or only CEF, and a flag that overrides the Chromium Git repository URL.

    In this case, I don't sync CEF because it's a submodule. This matters because an official build whose only change is a CEF hash bump will still get a new overall (superproject) hash. I also need to prevent depot_tools updates because the update_depot_tools scripts hardcode an external URL.
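
    For illustration, approach 2 could be driven from Python roughly as follows. This sketch assumes that automate-git.py is in the current directory and that the locally added flags described above (--no-cef-update, --no-depot-tools-update, --chromium-url) are available. The helper names are hypothetical.

```python
import os
import subprocess
import sys


def approach2_command(src_root, cef_url, chromium_url, branch):
    """Return the automate-git.py argument list used in approach 2.

    --no-cef-update, --no-depot-tools-update and --chromium-url are the
    locally added flags, not stock options at the time of this thread.
    """
    return [
        sys.executable,
        "automate-git.py",  # assumed to be in the current directory
        "--download-dir=" + src_root,
        "--no-cef-update",
        "--no-depot-tools-update",
        "--depot-tools-dir=" + os.path.join(src_root, "depot_tools"),
        "--url=" + cef_url,
        "--chromium-url=" + chromium_url,
        "--branch=" + str(branch),
    ]


def approach2_env():
    """Copy the current environment with depot_tools self-updates disabled."""
    env = dict(os.environ)
    env["DEPOT_TOOLS_UPDATE"] = "0"
    return env


def run_approach2(src_root, cef_url, chromium_url, branch):
    """Run the build; raises CalledProcessError if automate-git.py fails."""
    subprocess.check_call(
        approach2_command(src_root, cef_url, chromium_url, branch),
        env=approach2_env(),
    )
```

    Approach 1 differs only in passing --no-update instead of the two split flags and omitting --chromium-url.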

  3. Marshall Greenblatt

    Generally speaking depot_tools doesn't need to be updated as frequently as it currently does by default. Usually you're pretty safe if you update it once per release branch (for example, update it immediately before building a new release branch for the first time). You could do the following:

    1. Download depot_tools using Git on your local Windows machine.
    2. Run update_depot_tools to download the necessary dependency sub-directories (python, git, etc) and create the bat files.
    3. Zip up the resulting depot_tools directory and put it somewhere that your build system can access.
    4. set DEPOT_TOOLS_UPDATE=0
    5. Run automate-git.py with --depot-tools-archive=<path_to_zip>
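
    Steps 3 to 5 can be scripted. This minimal sketch assumes that depot_tools has already been bootstrapped (steps 1 and 2) and that automate-git.py is in the current directory. archive_depot_tools and build_invocation are hypothetical helpers. The zip layout, a single top-level depot_tools directory, is also an assumption that is worth checking against the script.

```python
import os
import shutil
import sys


def archive_depot_tools(depot_tools_dir, out_base):
    """Step 3: zip a bootstrapped depot_tools checkout.

    The archive contains one top-level directory named after
    depot_tools_dir. Returns the path of the created .zip file.
    """
    src = os.path.abspath(depot_tools_dir)
    parent, name = os.path.dirname(src), os.path.basename(src)
    return shutil.make_archive(out_base, "zip", root_dir=parent, base_dir=name)


def build_invocation(archive_path, download_dir):
    """Steps 4 and 5: return (argv, env) for automate-git.py."""
    env = dict(os.environ)
    env["DEPOT_TOOLS_UPDATE"] = "0"  # step 4: stop depot_tools self-updating
    argv = [
        sys.executable,
        "automate-git.py",
        "--download-dir=" + download_dir,
        "--depot-tools-archive=" + archive_path,  # step 5
    ]
    return argv, env
```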

    Having an internal mirror of cef.git is easy: you can already specify its URL via the --url command-line flag.

    Having an internal mirror of the various Chromium Git repos is reasonably straightforward if you rewrite the .gclient and src/DEPS files.
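
    If the internal mirrors keep the upstream repository paths, the rewrite can be a plain prefix substitution. This minimal sketch works under that assumption; rewrite_urls and rewrite_file are hypothetical helpers, and any mirror host is a placeholder.

```python
import re


def rewrite_urls(text, mirrors):
    """Replace each upstream URL prefix in `text` with its mirror prefix.

    `mirrors` maps upstream prefixes to internal mirror prefixes. A single
    regex pass ensures a replaced URL is never rewritten again, and longer
    prefixes are tried first so the most specific mapping wins.
    """
    if not mirrors:
        return text
    prefixes = sorted(mirrors, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in prefixes))
    return pattern.sub(lambda m: mirrors[m.group(0)], text)


def rewrite_file(path, mirrors):
    """Rewrite a .gclient or DEPS file in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(rewrite_urls(text, mirrors))
```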

    However, there are still many downloads performed by Chromium's gclient runhooks step (see the hooks section of src/DEPS). Are you planning to rewrite all of those scripts as well?

  4. Marshall Greenblatt

    > Having an internal mirror of the various Chromium Git repos is reasonably straightforward if you rewrite the .gclient and src/DEPS files.

    Note that automate-git.py currently supports a mechanism for patching Chromium's DEPS file: just create a patch/patches/DEPS.patch file and check it into your CEF repo. Then you'd need to:

    1. Add support in automate-git.py for a --chromium-url flag to customize the .gclient file or just write the .gclient file yourself.
    2. Keep the DEPS.patch file up-to-date with changes to the source Chromium DEPS file.

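
    For item 1, writing .gclient yourself is simple because the file is Python syntax that defines a solutions list. This is a minimal sketch that assumes a single solution named src. The helper names are hypothetical, and the exact keys automate-git.py writes may differ.

```python
import pprint


def gclient_contents(chromium_url, name="src"):
    """Return .gclient text whose single solution points at chromium_url."""
    solutions = [{
        "name": name,
        "url": chromium_url,
        "managed": False,
        "custom_deps": {},
    }]
    # pformat emits valid Python literals (False, {}), which gclient can read.
    return "solutions = " + pprint.pformat(solutions) + "\n"


def write_gclient(path, chromium_url):
    with open(path, "w", encoding="utf-8") as f:
        f.write(gclient_contents(chromium_url))
```
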
  5. Adam Gross reporter

    Thanks for all of the direction; I really appreciate it. Yes, I will definitely override the part of automate-git.py that writes the Chromium URL into .gclient. Beyond that, I agree that adding patches to "patch/patches" should work well.

    The main thing that I'm trying to figure out is which of the various DEPS files I need to patch. If I search the Chromium source for "src.chromium.org", "googlesource.com", and "googlecode.com", I see scattered mentions throughout the code, particularly in what appears to be test-specific code. Various files I have found (not an exhaustive list):

    src\build\util\lib\common\perf_test_results_helper.py
    src\chrome\browser\policy\test\bootstrap_deps
    src\chrome\test\chromedriver\test\run_java_tests.py
    src\content\test\gpu\bootstrap_deps
    src\infra\scripts\legacy\site_config\config_default.py
    Several scripts in src\media\tools\layout_tests
    src\rlz\DEPS
    src\sync\protocol\sync.proto
    src\tools\bisect-builds.py
    src\tools\sync-webkit-git.py
    src\tools\cros\bootstrap_deps
    src\tools\deep_memory_profiler\download.sh
    src\tools\deps2git\buildspec_to_git.py
    src\tools\deps2git\svn_to_git_public.py
    src\tools\dromaeo_benchmark_runner\dromaeo_benchmark_runner.py
    src\tools\findit\chromium_deps_unittest.py
    src\tools\findit\config.ini
    src\tools\gyp\DEPS
    src\tools\gyp\docs\Buildbot.md
    src\tools\perf\bootstrap_deps
    src\tools\symsrc\source_index.py
    

    Are most of these unused during builds? Or is what I'm trying an exercise in futility?
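
    A search like the one described above can be reproduced with a short script. This is a minimal sketch, and find_external_refs is a hypothetical helper. A match only shows that a host string appears in a file, not that the file runs during a build.

```python
import os

HOSTS = ("src.chromium.org", "googlesource.com", "googlecode.com")


def find_external_refs(root, hosts=HOSTS):
    """Yield (relative_path, host) for each file under root mentioning a host.

    Files are read as bytes so binary files do not raise decode errors;
    .git directories are skipped.
    """
    needles = [(h, h.encode("ascii")) for h in hosts]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file (permissions, broken link, ...)
            for host, needle in needles:
                if needle in data:
                    yield os.path.relpath(path, root), host
```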

  6. Marshall Greenblatt

    > The main thing that I'm trying to figure out is which of the various DEPS files I need to patch.

    You should only need to worry about the scripts run by the hooks section of the top-level src/DEPS file. In current master this includes:

    • src/build/download_nacl_toolchains.py -- disable by setting GYP_DEFINES=disable_nacl=1
    • src/build/download_sdk_extras.py -- only used on Android
    • src/build/linux/sysroot_scripts/install-sysroot.py -- only used for official Google Chrome builds
    • src/build/vs_toolchain.py -- disable by setting DEPOT_TOOLS_WIN_TOOLCHAIN=0
    • src/third_party/binutils/download.py -- uses download_from_google_storage on Linux
    • src/tools/clang/scripts/update.py -- uses urllib2 directly
    • src/build/get_syzygy_binaries.py -- uses urllib2 via depot_tools/gsutil.py
    • src/third_party/instrumented_libraries/scripts/download_binaries.py -- disabled by default
    • depot_tools/download_from_google_storage.py -- uses urllib2 via depot_tools/gsutil.py

    So it looks like two scripts are responsible for performing all of the downloads, src/tools/clang/scripts/update.py and depot_tools/gsutil.py, and both use urllib2.

    You could either modify these scripts or perhaps set up a local urllib2 proxy (configured via environment variables) that intercepts the download requests.
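
    By default, urllib2 (like urllib.request in Python 3) reads proxy settings from the http_proxy/https_proxy environment variables, so the proxy route needs no script changes. The sketch below builds an environment that combines those variables with the switches listed above. offline_build_env and the proxy URL are hypothetical. One caveat: for https:// URLs, a plain proxy sees only an opaque CONNECT tunnel. Serving mirrored content for those URLs therefore requires the proxy to terminate TLS with a certificate that the build machines trust.

```python
import os


def offline_build_env(proxy_url, base=None):
    """Return an environment for a build that should avoid external downloads.

    - GYP_DEFINES gets disable_nacl=1 (skips download_nacl_toolchains.py).
    - DEPOT_TOOLS_WIN_TOOLCHAIN=0 (skips the vs_toolchain.py download).
    - DEPOT_TOOLS_UPDATE=0 (stops depot_tools self-updates).
    - http_proxy/https_proxy send urllib-based downloads to proxy_url,
      a hypothetical internal proxy serving the mirrored files.
    """
    env = dict(os.environ if base is None else base)
    defines = env.get("GYP_DEFINES", "").split()
    if "disable_nacl=1" not in defines:
        defines.append("disable_nacl=1")
    env["GYP_DEFINES"] = " ".join(defines)
    env["DEPOT_TOOLS_WIN_TOOLCHAIN"] = "0"
    env["DEPOT_TOOLS_UPDATE"] = "0"
    for var in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        env[var] = proxy_url
    return env
```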

  7. Adam Gross reporter

    I am getting pretty far in the process and am hoping to submit a few changes to the active branches (master and 2526?). I have attached a patch (script_changes.diff) that contains my recent changes but am not sure how best to send it out for review; the patch is relative to branch 2526. I also apologize that the patch contains a combination of different changes rolled into one diff. The changes are:

    1. Changes cmake requirement from 2.8.12.2 to 2.8.12.1. That happens to be what we have internally and it has worked well for me. Please let me know if there are any extra factors that specifically required 2.8.12.2.
    2. New automate-git.py command-line arguments:
      • --chromium-url: Allows overriding the synced Chromium URL
      • --no-cef-update: Allows bypassing the step to sync cef. Internally we use it as a git submodule for build reproducibility.
      • --no-depot-tools-update: Allows bypassing the step to update depot_tools. Again we use this as a git submodule for build reproducibility.
      • --distrib-subdir: Allows specifying the subdirectory name of chromium/src/cef/binary_distrib. Currently make_distrib.py creates a very specific directory name that is difficult for our scripts to parse and I wanted to bypass this.
    3. Support in make_distrib.py for --distrib-subdir.

    That last "--distrib-subdir" is one that I want to explain a bit more because I'm not sure I'm doing it the best way. The problem is that right now the scripts create a subdirectory named, for example on branch 2454, "cef_binary_3.2454.0.gUnknown_windows32". Our build then needs to extract that folder to a staging directory that is consumed by other components. I have found that my scripts are a lot easier to understand if I override that subdir name to something more static, like "cef-package".

    This works for us because we only build either debug or release builds and only x86, so there is no conflict between multiple folders. But I am wondering if it would be better if I just add a simpler command-line arg like "--use-basic-distribdir-name". In this case make_distrib.py can do something like "cef-package-[debug,release]-[x86,x64]". This would still work for our purposes but would make it more future-proof if building multiple configurations.

  8. Techwolf Lupindo

    I second this issue and voted for it. I'm just starting to look at doing a from-source build on a non-Debian system. The problem I saw right off the bat is that these automated build scripts download a ton of stuff into the local sandboxed build, and the distro build and packaging system deletes that sandbox on every rebuild. This means that all of that material is re-downloaded for each rebuild, which adds hours to every local fix and re-test. Being able to download all the dependencies separately and point the automated build scripts to a local "mirror" (consisting of local files, not a server) would help greatly here.
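
    A local-files mirror like this could be as simple as a download helper that checks a directory before touching the network. The sketch below is hypothetical. The <mirror_root>/<host>/<path> layout and the fetch helper are assumptions; the CEF and Chromium scripts do not provide them.

```python
import os
import shutil
import urllib.parse
import urllib.request


def mirror_path(url, mirror_root):
    """Map a download URL to <mirror_root>/<host>/<path> (hypothetical layout)."""
    parts = urllib.parse.urlsplit(url)
    rel = parts.path.lstrip("/")
    return os.path.join(mirror_root, parts.netloc, *rel.split("/"))


def fetch(url, dest, mirror_root, allow_network=False):
    """Copy url's content to dest, preferring the local mirror.

    Falls back to the network only when allow_network is True, so an
    offline packaging build fails loudly instead of silently downloading.
    Returns "mirror" or "network" to report where the file came from.
    """
    local = mirror_path(url, mirror_root)
    if os.path.isfile(local):
        shutil.copyfile(local, dest)
        return "mirror"
    if not allow_network:
        raise FileNotFoundError("not mirrored: %s (expected %s)" % (url, local))
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return "network"
```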

  9. Marshall Greenblatt

    @grossag : Please submit a pull request against the current master branch with your changes.

    > Changes cmake requirement from 2.8.12.2 to 2.8.12.1.

    What changed between these versions?

    > New automate-git.py command-line arguments

    These all sound OK. We should probably add separate --no-cef-update, --no-chromium-update and --no-depot-tools-update flags, and have the existing --no-update flag just turn on all of the --no-*-update flags.
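
    The proposed flag behaviour could look like this. The sketch uses argparse for brevity and is not the actual automate-git.py option parser; only the flag names come from this thread.

```python
import argparse


def parse_args(argv=None):
    """Sketch of the proposed update flags.

    --no-update is kept for compatibility and simply turns on every
    --no-*-update flag.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-update", action="store_true",
                        help="Skip all updates (implies every --no-*-update flag).")
    parser.add_argument("--no-cef-update", action="store_true")
    parser.add_argument("--no-chromium-update", action="store_true")
    parser.add_argument("--no-depot-tools-update", action="store_true")
    args = parser.parse_args(argv)
    if args.no_update:
        args.no_cef_update = True
        args.no_chromium_update = True
        args.no_depot_tools_update = True
    return args
```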

    > I am wondering if it would be better if I just add a simpler command-line arg like "--use-basic-distribdir-name". In this case make_distrib.py can do something like "cef-package-[debug,release]-[x86,x64]".

    No, I think the version that you initially propose (completely user-specified folder name) is better. The caller of automate-git.py knows what they're requesting so they can append the platform or other information themselves if they choose.
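
    Here is a minimal sketch of the user-specified folder name. It assumes a hypothetical distrib_output_dir helper in make_distrib.py. The default name is whatever the script already generates, such as the cef_binary_3.2454.0.gUnknown_windows32 example above.

```python
import os


def distrib_output_dir(distrib_root, default_name, distrib_subdir=None):
    """Choose the binary distribution directory.

    If the caller passes a subdirectory name (the proposed --distrib-subdir),
    use it verbatim; otherwise keep the generated default name.
    """
    name = distrib_subdir if distrib_subdir else default_name
    # Reject names that would escape or nest below distrib_root.
    if os.path.basename(name) != name or name in (".", ".."):
        raise ValueError("distrib subdir must be a single path component: %r" % name)
    return os.path.join(distrib_root, name)
```

    A caller that builds several configurations could pass something like cef-package-release-x86 to keep the outputs apart.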

  10. Adam Gross reporter

    I think we are close to having this resolved on the CEF side, maybe even finished. I'm going to leave this issue open a little longer, but I believe the rest of the work is on my side: implementing a urllib2 proxy as Marshall mentioned. If others want more information on how I set up my build project and which commands I pass to automate-git.py, let me know.
