Link Search Menu Expand Document

ASWF CI Working Group

Meeting: 27 May 2020

Attendees

  • Daniel Heckenberg (Animal Logic, TAC Chair)
  • Jean-Francois Panisset (VES Technology Committee)
  • Andrew Grimberg (The Linux Foundation, Release Engineering)
  • Patrick Hodoul (Autodesk / OCIO)
  • Brian Cipriano (Google / OpenCue)
  • Sean Looper (AWS)
  • Larry Gritz (OSL / Sony Imageworks)
  • Aloys Baillet (ASWF Docker / Animal Logic)

Agenda & Notes

ASWF CI Goals for Year 2

  • GPU Build & Test (Close to production on OCIO)
    • See discussion on AWS / LF infrastructure
  • Mac, Windows & Linux (New focus)
  • Packaging / Distribution
  • Testing with commercial components
  • CUDA version / GCC
  • VFX-2021 images

Working Group process

Follow ups

  • GPU Resources
    • OCIO request for AWS setup JIRA ticket (Andy)
    • AWS / LF resource request (Sean, Andy)
      • Waiting for infrastructure in place. Andy sat down with required people, proof of concept from JF runs into issues with internal Terraform usage at LF, also issues with instances not being torn down if the build fails. Take a look at “CodeBuild”: CI system inside AWS which can be launched from a GitHub Action, as long as AIM has correct right, it will generate instances you need and tear them down at the end of the run. GitHub Action would set up IAM in proper way, call CodeBuild to run another YAML file to run execution on top of GPU container or VM and tear down at the end. Protects against the dangling resource problem, and wouldn’t need a separate AWS sub-account. Andrew trying to get proof of concept running.
      • How about using S3 bucket to store Terraform state? Because of use of stateful Terraform, this is a problem. Also need a DynamoDB for S3 lock files, hence the suggestion for AWS CodeBuild. Will also allow consumption of ASWF Docker containers directly. CodeBuild has an option for requesting a GPU container, so should be possible to translate the PoC into CodeBuild. Will need to create a CodeBuild project.
      • Drawback is lock-in to AWS, whereas Terraform-based approach worked on multiple clouds.
      • Daniel: this only becomes a real concern when the amount of configuration in CodeBuild becomes non-trivial. Have used CodeBuild tangentially as a way to bring AWS into internal CI build pool, but will be migrating away due to limitations placed on machine image and mount points, so moving back to working on raw VMs (Animal Logic). Andrew: seems to be able to run on raw machine images.
  • GitHub Secrets at Org level (JF)
    • https://lists.aswf.io/g/tac/message/1444
    • Andrew: SonarCloud token converted to an organizational token, each ASWF project becomes a separate “project” under the SonarCloud ASWF organization, OpenEXR has two since it runs two separate scans. This happens automatically for Java projects.
    • LF releng will create SonarCloud project for each new project.
  • Shared Templates for GitHub Actions
    • Andrew: no movement from Microsoft so far
    • Aloys: main reason why aswf-docker is still on Azure Pipelines, since it uses templates a lot. Could possibly use Matrix instead.
    • Andrew: some frustration in releng team on lack of support for templates. One of the first things discussed in monthly meeting with Microsoft.
    • Daniel: could ASWF also try to mention our need for this functionality to Microsoft?
  • GitHub Actions docker support for non-Linux OS? (Daniel)

  • Windows
    • Proof of concept of building Windows container with GitHub Actions and WIndows version supported by GHA (JF)
      • Containers not currently supported by GHA on Windows or Mac build hosts
      • Would need additional steps (custom GHA builders?)
      • JF: no progress this month, but still committed to try this
    • LF releng has not pushed Windows containers to a repo yet, have not run anything non-proprietary in the infrastructure they manage. Conversation started on TAC mailing list. (Andrew)
      • Microsoft doing a lot do making Windows more container / cloud friendly. Reference an external base image, container gets instantiated on a machine that has the proper Windows license.
      • Daniel: Aloys, do you see issues with creating containers layered on top of a reference image? Not clear, haven’t done much investigation so far. Should organizations that need to use Windows contribute more on this front?
      • JF: is it worth trying to make the aswf-docker project multi-platform? Aloys: should look pretty much the same, structure should be pretty similar, would make sense. Mac should look pretty similar. Larry: could use bash scripts on WSL
      • Daniel: all projects are doing Windows and macOS somehow, but not as nice a Linux, want to be able to leverage common infrastructure. Larry: want to get rid of the repetition and possible errors as Linux-centric developers try to cut-n-paste Linux code in the Windows/Mac build infrastructure.
      • Daniel: consensus seems to be that what we get from aswf-docker is what we would want to emulate on other platforms. Commonality is a big goal.
      • JF: could the configuration encapsulated in the aswf-docker repo be broken out into a separate repo to be consumed by other projects that need to know the required component versions for the ASWF / VFX Reference Platform environment?
  • Mac
    • Stability of Mac OS VM versions on GHA
      • An issue for OpenEXR / OCIO: difficult to keep your CI system stable as the macOS version is quickly upgraded on CI system builders
      • An argument for container-based build system
    • Docker alternatives on Mac (Wave)
      • Daniel in contact with some people at Apple, how to address build stability vs staying up to date.
    • MacStadium / Orka (JF, Andy)
    • Code signing becoming more and more important on macOS platform if we want to distribute binaries?
    • Andrew: LF does not have an infrastructure for building signed macOS apps. At present no infrastructure for acquiring code signing certificates.
    • Daniel: does Artifactory offer this kind of infrastructure? Andrew: some support for signing artifacts at release time, but no support for code signing macOS binaries AFAIK.
    • Andrew: code signing is possible in a public / open source project, but may not be doing the code signing in a public CI system, typically small projects where an individual developer would do the code signing manually.
    • Andrew: LF runs an application called Sigul developed by the Fedora project to sign builds, that’s how LF releng signs Java artifacts and RPMs, ISO files. Current version based on Python 2.7 / CentOS 7, currently working at LF releng to start supporting Python 3.
    • Daniel: would be good to gather any info on open source projects that are doing code signing.
    • LF releng investigating whether Hashicorp Vault would be a good service to offer projects, could be used to store certificates. There is a plugin that brings some of the functionality in Sigul into Vault. Could be the only way to offer code signing in multi-cloud environments.
  • ASWF-docker updates
    • Newer compiler testing (e.g. GCC 9) / testing for VFXPlatform 2021 (Aloys)
      • Aloys: ci-common includes the base packages including DTS-6 in v1, v2 now includes DTS-9 including gcc 9 and clang 10 as a proposal. We don’t know yet what VFX Platform 2021 will include. Started building as many components as possible. CUDA 10 currently doesn’t support gcc 9, but CUDA 11 should support gcc 9 (coming soon).
      • Aloys: about to submit PR against aswf-docker for gcc 9 support, will need to patch Qt, OCIO fails with some new warnings, some CUDA-related build failures. But should be testable for OSL soon. Larry: would love to try it. Aloys: 2021 tentative image should have same content dependencies as previous years. OpenEXR seems to be fine.
      • Larry: have you contemplated being able to test against matrix of different llvms (important for OSL)?. Not a VFX Platform mandate, but important for OSL. Aloys: should be fine, LLVM is pretty big (1GB), so adding multiple LLVMs to image would become big. Larry: not using the LLVM in the container anyway, so shouldn’t be an issue. Aloys: compiling clang against DTS provided libraries. Aloys: resolved some space limitation issues on build machines by uninstalling OS components that are not required. Larry: uses pre-packaged Ubuntu packages.
      • Aloys: select clang 7 somewhat arbitrarily, experience with it at AL, and clang 10 as “latest”. Now clear how many versions should be best? Larry: there’s a range of clang versions that OSL tries to support (currently supports 7-8-9-10, used to support only current and previous, but LLVM has increased release cadence).
      • Daniel: does using DTS pre-built versions of clang toolchain make this more straightforward?
      • Larry: being able to find pre-built clang versions would help limit OSL CI build times.
      • Daniel: new mechanism in RedHat called “Software Collections” for software packaging to support parallel versions, use “scl” command to instantiate environment.
      • Larry: simple to have multiple clang / LLVM versions in parallel without interference with system compilers.
    • Aloys: Working on documentation and release process. Creating a tag should trigger the build and deployment (83 different images can be built from single repo, will be even more with 2021 builds). Currently managing all these releases is somewhat manual, so trying to automate / formalizing it.

Project Specific Goals / Problems

  • USD-WG goal (Daniel)
    • “Cross-platform build recipes / CI for USD”
    • USD Working Group Meeting Notes 13-May-2020
    • Daniel: propose that we make our CI system and docker images a way to help people build USD. Aloys has already sent some initial emails to usd-interest mailing list.
    • Sean: offering resources from release engineering at AWS to help Aloys. Asking Aloys to provide some info on where he could use some help.

Next Steps

  • Follow up meeting: 25 June 2020