Link Search Menu Expand Document

ASWF CI Working Group

Meeting: 29 April 2020

Attendees

  • Daniel Heckenberg (Animal Logic, TAC Chair)
  • Jean-Francois Panisset (VES Technology Committee)
  • Christina Tempelaar-Lietz (OpenEXR)
  • Aloys Baillet (Animal Logic)
  • Patrick Hodoul (Autodesk / OCIO)
  • Andrew Grimberg (LF RelEng)
  • Eric Reinecke (Netflix, OpenTimelineIO)
  • Sean Looper (AWS)
  • Michael Dolan (OCIO)
  • Larry Gritz (Imageworks / OSL)
  • Brian Cipriano (OpenCue)
  • Ryan Botriell (Imageworks DevOps)
  • Jeff Bradley (Dreamworks)
  • Doug Walker (Autodesk)

Agenda & Notes

ASWF CI Goals for Year 2

  • Ordering based on survey feedback

  • GPU Build & Test

    • Getting close!
  • Mac, Windows & Linux

    • JF: will update proof of concept of building Windows container with GitHub Actions and WIndows version supported by GHA

    • Andrew: LF releng has not pushed Windows containers to a repo yet, have not run anything non-proprietary in the infrastructure they manage. Should definitely be checked with John, conversation to be started on TAC mailing list.

    • Daniel: are we confident that Docker is the correct approach for those platforms as well? There are other package management systems on those platforms.

    • Michael: OCIO currently building on all 3 platforms, currently the ASWF Docker containers are really useful for Linux builds. On other platforms only doing minimal installs, using a lot of components from the provided builder VM. Would be very valuable to be able to use Container.

    • Andrew: GitHub actions may only support Linux containers

    • Michael: not a lot of warning when the CI services update compilers or VM OS versions. So don’t have a lot of control over the environment.

    • Andrew: no specific contacts at MacStadium, but may be possible to find someone to talk to (context is the Orka offering for Docker macOS containers).

    • Eric: which macOS version do we typically target? Michael: GitHub Actions only supports a small number, 10.15.

    • Eric: started making Python Wheels to distribute OpenTimelineIO, suggested platform was 10.13 to maximize compatibility, does that apply widely? Michael: on Azure Pipelines were doing 10.13 builds initially, but that got deprecated.

    • Daniel: some of these issues could be discussed with Wave.

    • Christina: EXR macOS stopped working as well after an update, would also really like to have Docker (container) images for macOS. Also need to move EXR to GHA.

    • Daniel: has there been any movement on VFX Platform support for Windows and macOS? Stated goal, but not necessarily specific movement.

  • Packaging / Distribution

    • Discussion a year ago: packaging ourselves vs making them “well behaved” by standard packaging systems
  • Testing with commercial components

    • Missing a strong candidate with exception of Houdini in OpenVDB

GitHub Actions Migration

  • Docker invocation differences to Azure TAC thread

    • Aloys: writing a script to manage the environment variables required by GHA. Issues caused by running Docker as root user.

    • Brian: seems to work for OpenCue, will be helpful to have common infrastructure.

    • Andrew: no news from GitHub side, “they do everything as root because of how the workspace gets mounted inside the container and you won’t necessarily have rights to that workspace, better to use su / runuser”. Azure Pipelines forced a “user set off root”, and caused problems with Alpine images since it doesn’t have sudo, so they would go back to default Docker behavior.

    • Aloys: recognized the issue when running the USD unit test, which had a test looking at whether /etc was writable. Looks like we will have to live with this setup long term. So will need to provide helper scripts to change directory owners.

    • Brian: might be a good idea to use a custom action to check out code and set permissions.

    • Daniel: is there a mechanism to run actions locally? Andrew: this is an often requested feature, hopefully for later this year. Also currently no offline lint-ing capabilities.

  • Secrets handling GH issue

    • No update so far.
  • OCIO has migrated to GHA

GPU Resources

  • OCIO request for AWS setup JIRA ticket

  • Michael: OCIO migrated to GHA, process went pretty smoothly. GHA doesn’t support templates like Azure Pipeines, but 2x the number of concurrent jobs supported, which helped.

  • Andrew: GHA team is aware of lack of templates, should be able to get organization-level workflow sharing soon.

  • Andrew: need to look at work done by JF (Terraform) due to secrets sharing issue. Also use of Terraform Cloud may be an issue. JF LF releng can use an S3 bucket as a Terraform backend instead, should simplify things. Andrew: Should be able to be up and running mid next week.

  • Daniel: let’s set a goal to having some kind of GPU build running by next month’s meeting.

  • Sean reached out with information about Open Source program, initial request from LF was too large, so being broken up into separate requests.

Contributors for aswf-docker

  • Based on discussion between Daniel and Aloys

  • Call for collaborators: robustness and succession plan. This is critical infrastructure for our projects. Many of us are in a position to contribute.

  • Aloys: hijacked PRs from Brian and Larry due to complexities on how to build packages, currently doing it on his own machine since the workflow is not automated yet. Also need better documentation to make it easier to pick up, write down all the steps required to add a new package.

  • Adding packages and new releases is still a messy process, we need to move to GHA and automate all the steps, but building some packages cannot be done with free azure/github jobs (clang: 4h build, Qt: 6+h builds, PySide: very long as well) so ideally we get some AWS resources made available when building packages.

  • Aloys: Started on OSL image / package, reminded him of a few missing steps in the documentation. For instance adding a new year version.

  • Aloys: once we get more streamlined access to AWS resources this will help with very long running jobs (building Qt can take over 6 hours and runs into job timeouts on Azure Pipelines).

  • Larry: working his way into it, so far seems to work. Just kicked off a job to try it out further. Appreciate the effort in getting the 2020 images done, that will really help.

  • Aloys: 2020 images are now as complete as the other images. Only issue has been PyAlembic for Python 3, cannot seem to get it compiled so far.

  • Larry: should we pull the Alembic crew into it? Getting it to work with Python 3 should be seen as important.

  • Larry: we should have 2021 images as well, even if we know they are not final, so that projects can start working them into their CI pipelines. The earlier the better.

  • Larry: it’s also important to have versions with modern compilers, not just the more conservative VFX platform versions.

  • Aloys: it would be nice to have more compilers available, but may have impact on upstream dependencies. Larry: usually hasn’t seen this a huge problem. It’s useful to try gcc9 / c++ 17 for instance, look for new warnings for instance rather than link compatibility.

  • Aloys: should be possible to add other devtoolset versions into the images, they are just set with environment variables. A bit more complicated with clang. Larry: will work on a prototype, you can install a number of clang versions in parallel in the same Docker image.

  • Daniel: there’s been a thread on the vfx-platform mailing list talking about future compiler versions. Interaction between C++ standard version we use to build modules and mandate for the platform, related to the OS C++ standard library and the compiler C++ standard library. Red Hat devtoolset lets you build with more recent compiler by adding the more recent symbols on top of the c++ standard library which shipped with RHEL / CentOS 7, injects a small static library into every build. That introduces some different types of interoperability issues. Are we in supported range of GCC? Supported range of devtoolset? RH explicitly don’t support mixing objects built with different versions of devtoolset.

  • Daniel: Larry, are you using DTS and the OS-supplied libstdc++, or using clean builds of newer compilers?

  • Larry: at work using DTS 6 for the more modern builds, and system compilers (gcc 4.8) for older versions of Maya plugins. For “personal” projects have always used clang for a long time, haven’t run across compatibility issues. There’s a clang flag to specify which version of gcc standard C++ library. Once you do that, you don’t run into too many compatibility projects between clang versions. Understand that RedHat doesn’t want to support too many combinations.

  • Daniel: with basic precautions, we can use some flexibility in practice. Docker containers are a good way to test out these possible scenarios.

  • Larry: in CI matrix for OSL and OIIO, test against several compilers (system, clang, llvm), haven’t really run into real problems.

  • Daniel: request on Aloys part to feel that people should feel free to get involved / contribute to the aswf-docker project. Better / easier automation and documentation will help.

  • Aloys: used components from aswf-sample-project and OpenEXR. A full TSC is likely overkill. JF is now in CODEOWNERS. Issue with every branch being protected in ASWF GitHub organization: could this be related? CODEOWNERS have special privileges but can’t do a force merge for instance. Andy: TAC voted to allow certain people to have admin rights over certain repositories. Aloys: would it make sense for him to have this over the repo?

Project Specific Goals / Problems

  • ?

Next Steps

  • Follow up meeting: 27 May 2020