
ASWF CI Working Group

Meeting: 6 January 2021

https://zoom.us/j/757849640?pwd=QzE1K2hrL2FHSFhKK3h5Z3BWTFJsZz09

Attendees

  • Daniel Heckenberg (Animal Logic)
  • Jean-Francois Panisset (VES Technology Committee)
  • Aloys Baillet (Animal Logic)
  • Andrew Grimberg (Linux Foundation Release Engineering)
  • Sean Looper (AWS)
  • Larry Gritz (SPI)
  • Michael Dolan (OCIO, Epic Games)
  • Ryan Bottriell (SPI)
  • Mark Boorer (ILM)
  • Mitch Prater (Laika)
  • Robin Rowe (Cinepaint)

Agenda & Notes

ASWF CI Goals for Year 3

  • GPU Build & Test (success!)

  • Mac, Windows & Linux (New focus)

  • Packaging / Distribution

    • Michael: seems like we haven’t touched this a lot, and it needs to be a bigger focus. There have been some discussions in OCIO: trying to get a v2 release out this month, and the follow-up will be to support package managers across the board. Definitely on the radar. OCIO is currently available in a number of package managers, but it is unclear how well those packages are maintained. There’s a broken vcpkg package, and it’s part of some Linux distros. Investigating Conan and PyPI (so you can pip install the Python bindings).

    • Andrew: LF projects cannot distribute VMs because of “license aggregation” issues. You can distribute build recipes.

    • Michael: OCIO is included in Homebrew, but the package is supported by users. Ideally would like to have a CD pipeline that can update downstream distributions.

    • Mark: often we don’t know who the maintainers of these packages are. “The Brew package is broken”, but who maintains that package?

    • CppCon 2019 talk on “making life easier for package managers”

  • Testing with commercial components

  • CI WG is concerned with configuration and setup of the ASWF CI infrastructure and supporting systems such as the Docker (aswf-docker) containers, as well as any other dependencies. Initial focus was to get Linux / VFX Reference platform up and running in an easy to use and stable way. Extended to GPU-based build and test (using AWS CodeBuild), driven by requirements from OCIO. Next steps are extending platform support to macOS and Windows, currently mostly working on macOS. Also looking at packaging and distribution to provide dependencies for our projects (Docker not as well supported on other OSes), and want to make our projects easily consumable in studios.

New Items

  • CentOS transition to Stream model

    • Mitch: emailed Red Hat, got a canned response, seemed encouraging for an individual using CentOS 8, but unclear how that would work for organizations. Should be possible for individuals to keep a specific version, but unclear for a company. Red Hat seems to be taking a wait-and-see attitude.

    • Blog post: CentOS Stream: Building an innovative future for enterprise Linux

    • FAQ: CentOS Stream Updates

    • Ryan: CentOS 7 EOL is 2024, so it seems there’s some time.

    • JF: We can’t really wait for a plan announced a week before CentOS 8 EOL?

    • Aloys: CentOS Stream may still be reasonably stable, so could still be a viable solution for studios? You can still gate updates if you want. But what should you be building against? You can easily end up releasing software built against “too new” dependencies. We already have similar problems: aswf-docker is building images based on the official centos:7 image, which would pull in 7.8, whereas most studios may still be running older CentOS versions. We should be aware of this, we might want to lock our Docker base images on a specific CentOS version.

    • Andrew: could become an issue, saw EPEL discussion about creating “EPEL Next” to follow “tip of stream”, whereas EPEL would follow the current release. In particular LLVM can cause incompatibilities during certain upgrades: every point release of RHEL ships a different point release of LLVM, which can cause binary incompatibilities. EPEL community is already seeing this.

    • Ryan: Stream is moving CentOS from a downstream release of RHEL to an upstream release of RHEL, so we are losing the testing / validation that RH was doing on RHEL before. Losing some guarantee of stability, especially if we are picking our own release.

    • Andrew: LF is still having internal discussions about what they are going to do; was going to target CentOS 8, current “pets” are still CentOS 7. May end up containerizing even more than was planned, to help better control those systems. Sysadmins at LF prefer something like CentOS because of the controls, but may end up considering Debian (not Ubuntu).

    • Mark: original push for CentOS was a “cheap” way to get RHEL, could still get an Enterprise license to access Red Hat support. VFX industry might look to another enterprise Linux provider (Ubuntu / SUSE / ???)

    • Andrew: Oracle Linux was discussed, it is essentially CentOS rebuilt by Oracle with their own kernel, but with the option to run the RHEL kernel. There’s a script to convert CentOS to Oracle Linux. But then Oracle could change the licensing terms.

    • Mitch / Andrew: Rocky Linux is happening, but how long will it take to get infrastructure up and running and enough of the distro rebuilt?

    • Andrew: Linux Foundation is specifically not in the business to be a distro provider.

    • Mitch: not a huge problem to “pay for something of value”, provided the value is worth the price. Andrew: yes, but RHEL itself is very expensive, and it is not a great model for a render farm.

    • Mark: hardware support from a commercial entity is also important. Also being able to point software vendors at a specific platform. SideFX may support Ubuntu as well as CentOS / RHEL.
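
Aloys's suggestion above, locking Docker base images to a specific CentOS version rather than a floating tag such as centos:7, could be checked automatically. The following is a minimal sketch only, not part of aswf-docker: the function names, the rule that a tag needs at least a major.minor number to count as pinned, and the example tag are all assumptions made for illustration.

```python
import re

# Assumed rule for this sketch: a tag is "pinned" only if it names a point
# release (major.minor or finer), or the image is referenced by digest.
POINT_RELEASE = re.compile(r"^\d+\.\d+(\.\d+)*$")


def is_pinned(image_ref):
    """Return True if image_ref is locked to a point release or a digest."""
    if "@sha256:" in image_ref:
        return True
    # Split off the tag. No colon means the implicit floating "latest" tag;
    # a "/" after the last colon means the colon belonged to a registry port.
    _, sep, tag = image_ref.rpartition(":")
    if not sep or "/" in tag:
        return False
    return POINT_RELEASE.match(tag) is not None


def floating_base_images(dockerfile_text):
    """List the image references in FROM lines that are not pinned."""
    floating = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip option flags such as --platform=... to reach the image name.
        image = next((p for p in parts[1:] if not p.startswith("--")), None)
        if image is not None and not is_pinned(image):
            floating.append(image)
    return floating
```

Run against a Dockerfile's text, this reports centos:7 as floating and accepts a point-release tag. It is deliberately simple: a FROM line that refers to an earlier build stage by alias, or uses a build argument, would also be reported and would need special handling.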

Follow ups

  • GPU Build & Test

    • Document OCIO setup in Template project (JF)

    • Any updates on GitHub secret handling? (Andrew)

  • ASWF-docker updates

    • Documentation

    • Docker Hub repository changes.

      • Application for Open-Source status? (Andrew)

    • Releasing new 2021 set of images with the right Qt version, got help from Autodesk which shared build recipe, a few additional packages required for Qt 5.15.

    • Some activity from people trying to use the download package script, some documentation issues that need to be cleaned up to explain how this works.

    • Have a branch that’s not quite ready for merging to split each package into a package YAML file, trying to model the dependencies of all the packages we maintain in aswf-docker. Looking at how we could use something like rez / spk or at least have a good model of the dependencies between packages, and the variants between packages.

    • We’ve discussed in the past the difference between having the version number in the package YAML file or not; Ryan has ideas about how to avoid that so that your package definition is valid for all versions. Rez / SPK increment the version in the package definition YAML. Sometimes also need to maintain versions across ranges of package versions. Somewhat of a detail, but still an interesting challenge for package management systems to manage a range of versions without having to branch the repo for each version / maintain a different package definition file for each version. AL has those issues internally for a few hundred Rez packages maintained in a Git repo, trying to make that maintainable, but often end up having to duplicate entire folders with package definition files, CMake files… Hard to maintain, hard to review. Definitely interesting issues to explore / push on in the context of packaging. It’s a different problem from having the package definition with the source code. aswf-docker needs to maintain the last 2 or 3 versions of everything.

    • Ryan: agree, one of the design choices of SPK is to have the package definition in the source code repo, where it makes sense to have the package version, but there is a good use case for having package definitions that work across versions.

    • Mark: if package definitions are inside the code, you require checkout of all the code just to discover those versions. ILM has lots of different version control systems, not always easy to mix and match, so not having to clone all of the code bases is useful.

    • Ryan: had interesting discussion with folks at Netflix around package management, maybe there’s a larger effort that can be put together to come up with a package definition format rather than a package manager, so that we don’t have to adopt wholesale a methodology to build and release software, but at least having an agreed upon format for package definition.

    • Aloys: followed the Rez-style YAML format, simple and easy, but yes, a standard way would be useful.

    • Larry: it seems hard to believe that there isn’t something out there that’s close enough to our needs, that we could convince the owners to extend for our needs. There’s always something specific to our industry that isn’t captured by available formats? Seems weird to have to start from scratch for just a package description.

    • Ryan: agree, but there’s a weird line between build system and package managers, and that line gets crossed all the time. One of the great usability features of Rez is that it is also a build system, build all the software you need with all resolved dependencies. Separating to “just” package definitions may be difficult, you would lose a lot of usability and convenience.

    • Mark: quite a lot of the work is done around building entire distributions. Gentoo Portage seems to be the closest to our needs, but huge overkill for what we need. But sometimes our systems do need that level of flexibility to build stuff.

    • Aloys: like the idea of having a standard way of defining packages and dependencies, what’s the ordered list of dependencies to get a package. Would need this to generate complete Docker images that are well defined. The build side could be an implementation detail. So there could be a use for a lightweight spec.

    • Ryan: are you deploying the software as a package, the source code as a package, or the compiled binaries as a package? Very different compatibility requirements.

    • Mark: when you have a static package definition, often you get into situations where you want to do conditional logic / express some sort of control flow structure, can be really clumsy when dealing with static (YAML) files. Easier to do in code / a dynamic language.

    • Ryan: in a lot of ways it leads you towards CMake as a standard, but there are things it doesn’t describe / some clumsiness. But now crossing into a build system. Is it reasonable to find something that expresses everything we need?

    • Mark: end up in environment management as well, would be nice to store those in some sort of immutable fashion

    • Ryan: you end up back at Rez / SPK

    • Robin: what’s the difference between Rez / SPK and a CMake-based system? Ryan: CMake is just for build and install, it doesn’t do environment management. Also CMake doesn’t do any dependency management, and maintaining a “FindX” module is tricky. Also CMake builds one package, whereas Rez integrates with CMake if that’s your chosen build system.

    • What about YAML + Jinja2? Ryan: Conda does that, they’ve managed to shove everything they need in there, but some of the files end up looking very complicated, and very hard to maintain. It “can work”, but it’s not ideal.

    • Andrew: LF has 3 classes of tooling in projects supported by the RelEng team. Largest is Java-based, they are all using Maven, which lets a package declare the dependencies it needs within a version range and skip particular versions, and will pull down the latest version that meets those specifications. Second most popular language is Python, pip has similar capabilities, “I depend on a package within a particular revision set”. Often we get bitten by packages that don’t specify the versions they depend on, and then an upstream package introduces an incompatible version that breaks builds. So sometimes have to pin to specific versions, but that’s not desirable (like to build “top of tree”). Third most popular language is Go: Go modules are Git based, typically point at a Git SHA or HEAD, they also strictly use semantic versioning. LF tries to push projects towards SemVer, but some projects push back (SemVer too strict for them). LF RelEng team champions SemVer, communicates lots of info to downstream projects.

    • LF RelEng team is 10 engineers, supporting 12-15 different major projects, the majority of which are networking based. OpenDaylight is one of the largest projects (although project size is going down). Also ONAP (Open Network Automation Platform). Between those two projects, supporting on the order of 500-1000 active developers, no idea how many downstreams since these projects are consumed by major products. Cisco builds their network controller on top of OpenDaylight, AT&T runs 90% of their wireless infrastructure on top of ONAP. Getting these projects to mainly use SemVer has helped keep them stable.

    • Andrew: most of the breakage happens on the Python side, less so in Java. Most of the breakage is in LF’s own tooling for Jenkins-based CI. Try to find commonalities between projects and write tools to avoid duplication of effort. But tooling has grown organically, so don’t always have test coverage for every part of a tool. Seeing this in CI tools since they tend to always pull from HEAD. Somewhat less likely to see these issues if you are methodical about your updates.

    • Mark: within ASWF there aren’t so many projects, and they are mostly “good players”. Inside VFX facilities, you may be in the “wild west”. Andrew: see this with the LF tooling, projects can use the tooling in unexpected ways, community members can submit fixes back to LF RelEng. Don’t have “major shifts” like going from Python 2 to 3.

    • Mark: we’re being a bit more rigid since most VFX facilities don’t have a lot of engineering support for builds.

    • Andrew: RelEng staffing is based on needs of LF projects, from partial headcount / consulting role, to full time engineers for specific projects (OpenDaylight pays for 2 full time engineers to support CI / build). Trying to leverage work done between different projects.

    • Andrew: work that Aloys has done with aswf-docker, if companies rally around this, then that can help the industry in general. Systems need to be “opinionated” to make a difference. That’s why RelEng team is a strong proponent of SemVer, a set of guidelines that we will try to enforce. Also a high bar for code review policies. Have a code repo that manages those CI systems, release engineers have maintainer rights, but in some projects, the community is self sufficient enough to maintain that.

  • Mac CI

  • Project feedback
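
Aloys's question in the aswf-docker discussion above ("what’s the ordered list of dependencies to get a package") amounts to a topological sort of the package graph. The sketch below shows one way to derive such an order with Python's standard library (graphlib, Python 3.9 or later). The package names and their dependencies are illustrative assumptions only; they are not the actual aswf-docker dependency graph or any agreed package definition format.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical package definitions for illustration: each package maps to the
# list of packages that must be built before it.
PACKAGES = {
    "zlib": [],
    "python": ["zlib"],
    "openexr": ["zlib"],
    "ocio": ["openexr", "python"],
}


def build_order(packages):
    """Return package names ordered so every dependency precedes its dependents.

    Raises graphlib.CycleError if the definitions contain a dependency cycle.
    """
    sorter = TopologicalSorter(packages)
    return list(sorter.static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(PACKAGES)))
```

The relative order of packages that do not depend on each other is not fixed, so a consumer should rely only on dependencies coming first. Version ranges and variants, which were the harder part of the discussion, are not modelled here.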

Action Items

Next Steps