
ASWF CI Working Group

Meeting: 13 November 2019

Attendees

  • Daniel Heckenberg (Animal Logic, TAC Chair)
  • Jean-Francois Panisset (VES Technology Committee)
  • Larry Gritz (Sony Pictures Imageworks)
  • Sean Looper (Amazon Web Services)
  • Trevor Thomson (Blue Sky)
  • Eric Reinecke (Netflix, OpenTimelineIO)
  • Andrew Grimberg (Linux Foundation)
  • Michael Min (Netflix)
  • Doug Walker (Autodesk)
  • Aloys Baillet (Animal Logic)

Agenda & Notes

ASWF CI Goals to end year 2019 [0:00-0:10]

  • One more meeting before end of the year / holiday hiatus

  • Linux, Windows, Mac platform support

    • Linux is well covered as the primary project platform; improvements on Windows and Mac have been more piecemeal. We haven’t been able to use Docker for the build environment on those platforms because of the complexity of supporting it and a lack of standardization. A couple of projects have been using ad hoc approaches to build on those platforms.

    • Docker on Win? JF’s post

    • Docker on Mac? JF’s post

      • Question: can we get experimental credits from MacStadium for Docker-based macOS containers?
  • GPU CI support

  • CMake best practice

    • How can we “canonize” what we have learned
  • Top of tree builds

    • Traditional CI
  • Dependency management

    • Probably not going to make much progress before end of year

    • Describe dependencies between our projects

  • Testing with commercial components?

    • On hold until we have a project use case: none of our projects are ready for that, tricky to dive into because of legal arrangements.

    • Maya / Houdini / …

  • VFXPlatform 2020 / Python 3

    • Python 3 support / transition for all our projects
  • Trevor: starting to look at Rez to build all components; this ties in to dependency management / top of tree builds. Daniel: similar work at Animal Logic. Rez and ASWF could be “natural partners”, possibly a good project for next year.

Follow ups: [0:10-0:30]

  • GPU-enabled tests (Michael, Andrew, Sean)

    • Making use of AWS GPU accelerated instances

    • Proof of concept of using that hardware from our Azure Pipelines CI instance

    • Not yet solving problem of dynamic provisioning

    • Sean: spoke with Aloys, who mentioned that he put in a ticket with LF RelEng to set him up with an AWS account but hasn’t heard back so far. Sean just needs to be given an account to provide credits. Discussed DevOps workflows for forks of projects and for other projects that may not map directly to ASWF workflows. We need to first solve the obvious case of an ASWF AWS account for official builds, then figure out other cases for contributors. Daniel: started a thread with the LF RelEng team, including people with elevated permissions; will add JF and Sean.

    • Project to create GPU enabled builder with Docker enabled: updated Terraform GCE / Azure code, working on AWS next. https://github.com/jfpanisset/cloud_gpu_build_agent

    • Aloys: got access to a larger system on Azure, but wasn’t able to tie it into his Azure Pipelines project. Should be able to fork the GitHub repo while keeping the ASWF Azure Pipelines organization, which would allow sharing resource pools.

    • Daniel: this was predominantly for Aloys’s use case of having access to a larger system for building ASWF Docker containers. We won’t have a good solution until we have more elastic provisioning of dedicated / special instances.

    • Daniel: also looking at working with the Azure team, via a contact provided by Dave Fellows, to make better use of caching on Azure instances. Aloys: using the full suite of caching functionality provided by Azure Pipelines (still in beta / preview): create a cache key plus a list of fallback keys, currently used for ccache. Will work with JF to add a simple example / support in the ASWF sample project; other projects will benefit from reusing results of previous builds via ccache. Cache invalidation is not automatic; a version number must be incremented to invalidate the cache (see the conceptual sketch below).

    • JF: looking at “chained” builds: use Azure Pipelines provided build instances to create GPU instances via Terraform.
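
As a rough illustration of the caching approach described above (this is not the actual Azure Pipelines cache task, whose keys are declared in the pipeline configuration), a versioned key can be derived from the files that define the build environment, and bumping the version constant is what forces a full invalidation. The file names and key format here are hypothetical:

```python
# Conceptual sketch only: how a versioned ccache cache key and its fallback
# keys might be derived. The real mechanism is Azure Pipelines' caching task;
# this just illustrates key derivation and manual invalidation via a version bump.
import hashlib
from pathlib import Path

CACHE_VERSION = "v1"  # increment to invalidate all previously stored caches


def exact_key(*key_files):
    """Most specific key: version plus a hash of the files defining the build env."""
    digest = hashlib.sha256()
    for name in sorted(key_files):
        digest.update(Path(name).read_bytes())
    return "ccache-{}-{}".format(CACHE_VERSION, digest.hexdigest()[:16])


def fallback_keys():
    """Progressively less specific prefixes, tried when the exact key misses."""
    return ["ccache-{}-".format(CACHE_VERSION), "ccache-"]


if __name__ == "__main__":
    # Hypothetical key files: a Dockerfile and a dependency manifest.
    print(exact_key("Dockerfile", "requirements.txt"))
    print(fallback_keys())
```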

  • Build matrix including Mac requirements for GPU testing (Wave, Michael)

  • Azure Pipelines follow ups (Windows, performance, caching) (Dave, Aloys, JF)

    • JF: some work on Windows to set up service connections etc. to allow fork workflows.
  • ASWF Docker images (Aloys)

  • Python3 / OpenEXR (Daniel)

  • ASWF / Project permissions and workflows (Daniel)

    • CI setup and testing

    • Project fork workflows

CI Updates for Projects [0:30-0:40]

  • OpenTimelineIO

    • Eric: Looking at incorporating ASWF infrastructure

    • Migrated to C++ with Python bindings, need to set up builds that publish to repositories such as PyPI

    • What’s the best way to get started with CI integration? The build environment is an easy use case: Python 3, C++11, Docker, not a lot of outside dependencies.

    • Daniel: could look at the ASWF Template project, which gives a pretty good picture of current approaches: bare-bones examples of the steps that need to be taken to conform to most of the ASWF standards, including CI.

    • On the Python side specifically, which project would be the best example, particularly for preparing artifacts for other package distribution systems? OpenEXR is transitioning to pybind11, which is what OpenTimelineIO is already using.

    • Larry: there are many projects using pybind11 in our space.

    • Eric: how do you correctly generate Python artifacts? (A packaging sketch follows after this list.)

    • Eric: need credentials for logging on to Azure Pipelines instance, Josh would already have those.

    • Larry: the Python Package Index authors pronounce it “Py-P-I”, not “Py-Pie”, to avoid confusion with the other project (PyPy).

    • Daniel: could be a good topic to bring up on TAC list with wider following.

    • Best practices would be welcome for ASWF template project.
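
Following up on the question above about generating Python artifacts, below is a minimal sketch of how a pybind11-based C++ extension can be packaged with setuptools so that wheel / sdist artifacts can be produced and uploaded to PyPI. The module, source, and package names are placeholders, not OpenTimelineIO’s actual layout:

```python
# Hypothetical minimal setup.py for a C++11 extension bound with pybind11.
# "python setup.py sdist bdist_wheel" produces the artifacts that can then
# be uploaded to PyPI (e.g. with twine).
import pybind11
from setuptools import Extension, setup

ext_modules = [
    Extension(
        "_example",                             # placeholder extension module name
        sources=["src/example_bindings.cpp"],   # placeholder pybind11 source file
        include_dirs=[pybind11.get_include()],  # headers from the installed pybind11
        language="c++",
        extra_compile_args=["-std=c++11"],      # C++11, per the build notes above
    ),
]

setup(
    name="example-package",                     # placeholder PyPI project name
    version="0.1.0",
    ext_modules=ext_modules,
)
```

For binary wheels that work across Linux distributions, the build would typically also be run inside a manylinux container and checked with auditwheel before upload.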

  • OpenCue

  • OpenColorIO

    • Michael has been the point person on GPU / CI, but he’s busy with other topics right now, so probably not a ton of progress yet. He has been interacting with Andrew Grimberg to make use of AWS GPU instances, but is blocked by the lack of dynamic provisioning.

    • Still, keep adding more GPU tests all the time, eager to be able to run them.

    • Larry: the current holdup is not being able to dynamically provision GPU instances.

    • Andrew: it’s true there is no dynamic provisioning in Azure Pipelines. We can stand up static instances and incur the ongoing cost of keeping them running, or wait until Microsoft adds support for dynamic provisioning, unless chained builds actually work out. It will come down to how the community / ASWF wants to spend the money.

    • Larry: do we have an idea of how much money we are talking about? Andrew: depends on the number and size of instances. Larry: having an order of magnitude would help. Andrew: needs to know the size of instances required. Daniel: we can provide this information now. Doug: are we looking for a specific machine on AWS we would like to use? Andrew: AWS or Azure, just need the size and number of instances. Doug: our goal is to start with one, with a minimal GPU requirement.

    • JF: will provide “cheapest” instances info.

    • Daniel: by default Azure Pipelines doesn’t offer dynamic provisioning, but it allows external build agents to be added to a pool; we could use this to “cook up” our own dynamic provisioning system. Andrew: there would also have to be a cleanup step afterwards, otherwise we would end up with a lot of orphan nodes; this is an issue they run into in OpenStack (see the cleanup sketch below). Daniel: is this a problem that others are facing? Andrew: none of the other LF projects are currently doing this on Azure Pipelines. It sounds similar to something they did on OpenStack for multi-node testing, where a lead node connected to Jenkins instantiates all the other nodes it needs, then cleans up after itself; another job then looks for orphan nodes and cleans those up to avoid running up costs. Can point to work done in JJB, but it’s not clear whether that would translate to Azure Pipelines. Daniel: we should make sure to leverage our Azure contacts at Microsoft and get an indication of the roadmap for dynamic instances. Andrew: had heard it was a goal before end of year, but the year is running out…

    • Aloys: if we get the price of a small GPU instance and keep one instance always running for 3 to 6 months, would this be feasible? Andrew: yes, we could go with something like that. Daniel / Sean: the AWS offer of resources is based on “responsible” use; we want to leave headroom for later, and architecturally an elastic worker pool that expands / contracts based on demand is the right way to go.

    • Daniel: if the cost of an instance is minimal this may be a practical step to get things going.
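
As a sketch of the orphan-node cleanup Andrew describes above, a periodic job could look for self-provisioned GPU agents that have outlived a reasonable job duration and terminate them. The tag name, age threshold, and overall approach here are assumptions for illustration, not the actual ASWF or LF RelEng setup:

```python
# Hypothetical cleanup sketch: terminate dynamically provisioned GPU build
# agents (tagged at creation time) that have been running longer than expected,
# so orphaned nodes don't keep running up costs.
from datetime import datetime, timedelta, timezone

import boto3

MAX_AGE = timedelta(hours=6)  # assumed upper bound for a single CI job
AGENT_TAG = {"Key": "purpose", "Value": "aswf-gpu-build-agent"}  # assumed tag


def find_orphan_instances(ec2):
    """Return IDs of tagged, running instances older than MAX_AGE."""
    now = datetime.now(timezone.utc)
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:" + AGENT_TAG["Key"], "Values": [AGENT_TAG["Value"]]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    orphans = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            if now - instance["LaunchTime"] > MAX_AGE:
                orphans.append(instance["InstanceId"])
    return orphans


if __name__ == "__main__":
    ec2 = boto3.client("ec2")
    orphans = find_orphan_instances(ec2)
    if orphans:
        ec2.terminate_instances(InstanceIds=orphans)
```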

    • Doug: nothing else to report right now; had some problems with the macOS agent where some updates broke the builds, but no such problems this week. Daniel: as the group most active across platforms, do you have a sense of how useful it would be to move to a more consistent (Docker?) approach across platforms? Doug: can’t comment right now; we’re testing a variety of configurations, and it would be nice to tie more directly into the work done by JF / Aloys, but can’t really speak to the challenges for OCIO.

    • Sean: is the ultimate goal to be running all our testing in Docker? Daniel: the goal is to capture as much of the dependency infrastructure as possible in a way that can be reused in other contexts. We started out trying a much more open CI approach (Jenkins / LF JJB), then switched to Azure Pipelines with a Docker image based build environment, keeping the reusability more internal.

    • Aloys: we have a CI-base Docker image that contains all upstream requirements (Boost, TBB, …) aligned with the VFX Reference Platform version, then additional images for each ASWF project. Just added a CI-USD image with all the upstream dependencies for USD, hoping to submit a PR to the USD project to use that build image. There is also a bonus CI-VFXall image that has everything included, including Qt, “everything and the kitchen sink”; not a lightweight Docker image or fast to pull, but very convenient. Receiving a lot of Houdini / Maya builds against Qt 5.12, and those Qt builds are not ABI compatible, so hoping to propose this build image as a reference that could potentially be used by vendors as well. Daniel: we wanted our CI infrastructure to be exactly that, something that can be used for VFX Reference Platform builds.

  • OpenEXR

    • Daniel: some discussion about Python 3 happened at the last TAC meeting; Kimball offered to follow up with a quick pointer as to how they are currently managing concurrent Python 2 and 3 builds in their CMake configuration, which could be useful for other projects. Has not happened yet.

    • Larry: nothing specific to add. CMake files are relatively readable, but will change with the transition to pybind11, which is top of the list for the upcoming 2.5 release.

  • OpenVDB

CI Platform [0:40-0:50]

Action Items

  • JF

Next Steps

  • Follow up meeting: 11 December 2019