ASWF CI Working Group

Meeting: 22 July 2020

Attendees

  • Daniel Heckenberg (Animal Logic, TAC Chair)
  • Jean-Francois Panisset (VES Technology Committee)
  • Aloys Baillet (Animal Logic)
  • Andrew Grimberg (Linux Foundation Release Engineering)
  • Brian Cipriano (Google / OpenCue)
  • Christina Tempelaar-Lietz (OpenEXR)
  • Larry Gritz (Sony Imageworks / OpenShadingLanguage)
  • Marshall Elfstrand (Apple Developer Relations)
  • Michael Dolan (OpenColorIO)
  • Sean Looper (AWS)
  • Bernard Lefevre (OCIO / Autodesk)

Agenda & Notes

ASWF CI Goals for Year 2

  • GPU Build & Test (Close to production on OCIO)
  • Mac, Windows & Linux (New focus)
  • Packaging / Distribution
  • Testing with commercial components

Follow ups

  • CI WG Charter (Daniel)
  • CI WG Birds of a Feather meeting at ASWF Open Source Days? (Daniel)
    • Is this needed given our regular cadence meeting?
    • Expose our work to a larger audience, opportunity to draw in other contributors
    • Would make sense to align one of our regular meetings with this program
    • Formalize the work we’ve done and what we want to do as a few slides
    • Daniel will talk to Emily to find a slot in the program
  • GPU Resources
    • OCIO request for AWS setup JIRA ticket (Andy)
    • Michael: stalled at the moment. We can spin up the build on AWS CodeBuild, but have no way to investigate the logs because we do not have access to the AWS account.
    • Andrew: just got the logs, dumped to an S3 bucket. One-line JavaScript error, looks like a string formatting issue. Will need to run another build to get more complete logs (as they will now be stored in an S3 bucket). The S3 bucket will be public, and will be a generic ASWF public bucket for artifacts (originally requested for OpenCue artifacts).
    • Andrew: OpenCue was asking for a location to store artifacts, credentials will be stored as organization-wide secrets. Not sufficient for Terraform, also need a DynamoDB database for locking.
    • Deletion policy is automatic after 1 week, does generate costs against ASWF
    • Brian: getting credentials will be next step for OpenCue, 1 week retention should be fine. Andrew: if we need longer, that can be done (Jenkins logs are retained for 6 months). Brian: storage mostly required for passing artifacts between GitHub Actions pipelines, so 1 week retention is fine.
    • Michael: GPU builds actually working now, remaining issue is that logs are not being pushed back to GitHub Actions.
  • Windows
    • Proof of concept of building a Windows container with GitHub Actions and a Windows version supported by GHA (JF)
    • Andrew: have not heard anything about containerized builds on Windows, AWS CodeBuild is currently only option for “elastic” builds.
    • AWS has released GitHub Action to support CodeBuild which explicitly supports GPU containers on Linux.
    • JF: we would be happy with non-GPU accelerated Windows containers at this point.
    • Andrew: GHA is more likely to offer elastic deployment of self-hosted builders before it supports builds inside Windows containers.
    • So we should probably try to move ahead with a non-containerized solution such as Vcpkg / Conan / Conda. JF to look at which package manager would make it easier to pull a set of components compatible with the VFX Reference Platform.
    • Larry: have used vcpkg on Windows, played a little bit with Conan and Conda but not extensively enough to answer the question of “how easy it would be to have a VFX Reference Platform environment”.
    • JF: vcpkg may be opinionated about preferring “latest version”
    • Larry: vcpkg is similar to Homebrew on Mac, but lacks a pre-built cached install, has to rebuild everything from scratch, for instance no way to tell it that it doesn’t need to build dependencies in debug mode when doing a release build. ASWF Containers have cut down CI build times tremendously.
  • Mac
    • MacStadium / Orka
      • Marshall: Docker on Mac runs Linux containers, not macOS containers
      • macOS virtualization is a full VM
      • MacStadium Orka is probably the best approach, has good history of hosted macOS
      • Orka looks like a reasonable approach
      • Currently using GitHub Actions on macOS runners, but:
        • No control on the macOS / Xcode version, was moved forward without much warning, breaking builds
        • Little option to access multiple versions for a build matrix
        • We typically have a window of 3 years where commercial DCCs need to support older macOS versions
        • Are there best practices / guidance as far as managing availability of older versions of macOS / SDK? How about using a package management system to manage access to dependencies?
        • Marshall: for the main SDK components that come with the platform, that all comes with the SDK shipped with Xcode. Just announced Xcode 12 which includes the SDKs for iOS 14 / macOS Big Sur, and support to build for Apple Silicon. Guidance is to use the latest version of Xcode and SDK to build software for the latest version of macOS. But can deploy back as far as you want, in Xcode GUI can deploy back to macOS 10.9, “availability annotations” in the SDK to show how far back you want to support. There isn’t a need to build on a 10.9 macOS with a 10.9 SDK to support macOS 10.9. So recommendation is always to build on latest macOS and toolchain, but targeting older platforms if needed. Especially recommending keeping up with major Xcode releases (and in this case support for Apple Silicon).
        • Daniel: this can give backwards binary compatibility, but doesn’t guarantee that someone on an older platform can build your project if you are targeting the newer toolchain. By comparison on Linux we tend to isolate the environment + toolchain.
        • Should ASWF projects state that they only support building on latest macOS / Xcode versions?
        • Michael: had issues on both macOS and Windows when CI builders got upgraded without much warning, which broke the build a couple of times. Christina: had similar macOS issues with OpenEXR. The container approach provides insulation against those types of issues.
        • Daniel: were the breakages “meaningful”, or small details? Michael: new warnings from new compiler versions could fail builds (we enable warnings as errors). We’re trying to use the images provided by GitHub Actions as much as possible, doing less customization when possible.
        • Christina: some breakage related to Python versions, and BoostPython looking for specific Python versions.
        • Daniel: is reliable availability of specific Python version an issue? JF: bundled Python is going away in macOS. Christina: difficult to get CMake to pick up a locally installed Python version.
        • Michael: there were some missing headers to build Python on macOS, required an optional SDK installer. But OCIO now using bundled Python 3.
        • Andrew: would CMake run in a virtualenv? How is it finding Python? Aloys: the way CMake (and CMake packages) look for Python is messy and hard to control. There is a specific “FindPython” CMake package. Having a consistent way to find Python in ASWF packages would be useful.
        • Daniel: it was an early goal to have consistent ways to find packages across projects.
        • Marshall: planning to join CI WG meetings going forward to help.
        • Larry: it would be great if the GitHub team could be prodded into providing GitHub Actions support for running on Apple Silicon as soon as possible so projects can start supporting the platform. Need more than cross compilation, want to be able to run test suite on actual Apple Silicon machines.
    • Apple Silicon / ARM builds
      • There may be DTKs available from MacStadium
    • Discussion on Xcode version on VFX Platform mailing list
    • Big Sur shows up both as 10.16 and 11.0
      • If you are running Big Sur on Apple Silicon DTK it will report as macOS 11
      • That’s the long term goal for final release
      • You can check for an API on 10.16 -> will report 11 APIs
  • ASWF-docker updates
    • Git v2
      • Everywhere in ASWF containers, no issues reported
    • VFXPlatform 2021
      • Newer compiler testing (GCC 9, Clang 10)
      • Update to CUDA 11? Not sure if NVIDIA has released base Docker images yet. Will release the 2021 trial images at that point, support for CUDA in gcc 9.
    • OpenEXR performance testing for imath, useful to keep track of timing for performance tests. Can use SonarCloud, but no support for graphing of results over time (on prem version has that support). Andrew: Sonar lost the ability to create custom dashboards in latest release.
    • Aloys: seems to be difficult to find solid information on the topic of tracking performance testing. Sentry seems to have some support. Do any other LF projects keep track of performance? Andrew: mostly using Sonar, and it doesn’t work anymore. One project is using an ELK stack to stick stats into it.
    • Daniel: what about the execution environment consistency? There are no guarantees that the execution environment doesn’t change, so it can be difficult to ensure reliable results. Aloys: Sonar used to allow annotating results based on hardware changes. Elasticsearch / ELK could be a good solution. Is it something that LF releng could provide? Andrew: LF releng hasn’t set up ELK stacks for any projects, but used a vendor, so a passthrough cost. Can investigate what it would look like / cost. JF: what about CDash, does that keep track of performance results?
    • Waiting for Docker support on Windows / macOS may take too long, what about using Rez and an S3 bucket? These would require longer lived, fairly large S3 buckets (Clang, Qt, Boost result in large artifacts).
    • Daniel: approach based on Rez would be directly usable in a studio environment.

Project Specific Goals / Problems

  • Artifact storage (OpenCue, Brian)
    • S3 bucket, currently in use for OpenVDB
  • Performance testing (OpenEXR, Aloys)
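
The performance tracking discussion above (timings kept over time, with results annotated by execution environment) could be prototyped without a dashboard service. The following is only a minimal sketch under stated assumptions: it is not something the group agreed on, the JSON Lines history file and every function name in it are hypothetical, and the 20% tolerance is an arbitrary placeholder. It records a median timing together with an environment fingerprint, and only compares a new timing against earlier runs from a matching environment.

```python
import json
import platform
import statistics
import time
from pathlib import Path
from typing import Callable, Dict, List


def environment_fingerprint() -> Dict[str, str]:
    """Describe the runner so timings from different machines are not compared."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


def time_benchmark(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def append_result(history: Path, name: str, seconds: float, env: Dict[str, str]) -> None:
    """Append one timing record to the (hypothetical) JSON Lines history file."""
    record = {"name": name, "seconds": seconds, "env": env, "timestamp": time.time()}
    with history.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def comparable_history(history: Path, name: str, env: Dict[str, str]) -> List[float]:
    """Return earlier timings for this benchmark recorded in an identical environment."""
    if not history.exists():
        return []
    timings = []
    with history.open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            if record["name"] == name and record["env"] == env:
                timings.append(record["seconds"])
    return timings


def is_regression(previous: List[float], current: float, tolerance: float = 0.2) -> bool:
    """Flag a run slower than the historical median by more than the tolerance."""
    if not previous:
        return False  # nothing comparable yet, so nothing to flag
    return current > statistics.median(previous) * (1.0 + tolerance)
```

A CI job could call `comparable_history` before `append_result` and fail or warn when `is_regression` returns true. Where the history file lives (for example the artifact bucket discussed above) and how long it is retained are left open here.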

Next Steps