Link Search Menu Expand Document

ASWF CI Working Group

Meeting: 3 February 2021

https://zoom.us/j/757849640?pwd=QzE1K2hrL2FHSFhKK3h5Z3BWTFJsZz09

Attendees

  • Daniel Heckenberg (Animal Logic)

  • Jean-Francois Panisset (VES Technology Committee)

  • Aloys Baillet (Animal Logic)

  • Andrew Grimberg (Linux Foundation Release Engineering)

  • Sean Looper (AWS)

  • Larry Gritz (SPI)

  • Mark Boorer (ILM)

  • Robin Rowe (Cinepaint)

  • David Aguilar (Disney Animation)

  • Patrick Hodoul (Autodesk / OCIO)

  • Michelle Halliwell (WDAS)

  • Jeff Bradley (DreamWorks)

  • Ryan Bottriell (SPI)

  • Marshall Elfstrand (Apple Developer Relations)

  • Christian Schmidbauer (GC)

  • Neil Barber

Agenda & Notes

ASWF CI Goals for Year 3

  • GPU Build & Test (success!)

  • Mac, Windows & Linux (New focus)

  • Packaging / Distribution

  • Testing with commercial components

  • CI WG is concerned with configuration and setup of the ASWF CI infrastructure and supporting systems such as the Docker (aswf-docker) containers, as well as any other dependencies. Initial focus was to get Linux / VFX Reference platform up and running in an easy to use and stable way. Extended to GPU-based build and test (using AWS CodeBuild), driven by requirements from OCIO. Next steps are extending platform support to macOS and Windows, currently mostly working on macOS. Also looking at packaging and distribution to provide dependencies for our projects (Docker not as well supported on other OSes), and want to make our projects easily consumable in studios.

  • Formalization of this WG

    • Appoint a chair

    • Updating plans and agenda for rest of the year, scoping document

    • Daniel has been happy to play this role, but there are others as well

    • Could transition the role of WG leadership sometime later in the year

New Items:

  • Discussion about JFrog / Artifactory / Bintray for distributing artifacts (JFrog wants to sunset Bintray for Artifactory). Bintray does not exist in Maven by default, whereas Maven Central does. Bintray didn’t take off as much due to lack of integration in Maven.

  • Should we look at inter-project dependencies?

  • Any new needs for projects just transitioning to ASWF repo?

Follow ups

  • GPU Build & Test

  • ASWF-docker updates

    • VFX Platform 2021 Containers

    • Aloys: everything has been released, only package that hasn’t been there yet is OpenEXR 3.0, but it doesn’t exist yet! Added latest version of OCIO in the last few hours.

    • Mark: found some build issues with OCIO 2.0, so there may be a 2.0.1 coming soon. Aloys: happy to help troubleshoot. Mark: not exporting any of the Cmake targets, so some of the targets aren’t being generated. Larry has put some workarounds in OIIO. Aloys: that’s something that has changed from 1.x? Mark: yes, bug has been around for a while in betas, hasn’t been caught yet.

    • Aloys: if there’s a new release with the CMake fix, happy to pull that in. We can discuss this issue in #wg-ci Slack channel.

    • OCIO 2.0 was officially released last week, the announcement may not have gone out to the mailing list, but it is on GitHub, there’s a release tag. ASWF won’t be doing an actual press release since there was already one done for the “feature complete” status.

    • Aloys: was using RC1 previously, but saw the 2.0 target a few days ago, so that’s what got incorporated into the VFX 2021.

    • Patrick: Reference Platform 2021 specifies OCIO 2.0.x

    • Daniel: container distribution, earlier discussion on that topic. Do we know how much traffic we are generating from Docker Hub, and whether we are hitting any of the limits in the new service tiers? Andrew: no info on the traffic, but Docker has been very silent about requests for “open source rights” against LF repositories, only 2 projects have been granted, the others are having “radio silence” in 2.5 months, so don’t really know what’s going on.

    • Daniel: do we know if we are being throttled? Andrew: no, we decided to pay for Team access, our CI is using a paid account, and so is Aloys, but potentially others pulling our containers could be throttled if they are using Docker Hub anonymously. Daniel: that could affect projects such as OSL if they are part of our organization? Andrew: yes, if they have moved to using the ASWF org GitHub runners. They would be able to use the unlimited pulls.

    • Daniel: do we need to take any steps for our projects? Larry / Patrick, is this in place and working? Larry: only 1 week into having moved the repository, so haven’t tried any of this stuff, have been stalled on GPU builds, haven’t used secrets before, so there’s lot to do.

    • Christian: is Docker Hub the goto for packages? Andrew: publishing on GitHub for containers has had problems, they have rewritten their container system 3 times so far, their current iteration has some issues with hosting containers. We could try it, but until GitHub gets it right, it’s probably not a good idea. Also having people pulling our images from something else than Docker Hub means they have to jump through hoops. The simple solution of just having it in the central registry that everyone uses (Docker Hub) is usually the solution that works. Hopefully we can actually fall under the open source “unlimited pulls” license.

    • Aloys: also concerned with the multi-gig size of our images, GitHub and Microsoft have limits on individual file sizes.

    • Daniel: Chrisitan, are you interested in avoiding GitHub issues by using a different repository, or interested in making it like a “binary release” that people can just use? Christian: we ran into the Docker Hub throttling, which was an issue. Andrew: were you using anonymous or registered? Christian: anonymous. Andrew: registered account gets 2x the quota. For other projects, went with the Team account (not that expensive), or a single paid for account (for instance your CI account). Owners of Docker are trying to recoup bandwidth costs.

  • Secrets Handling

    • Daniel: forking, GPU builds. Andrew: only way to access GPUs is through CodeBuild, requires LF RelEng to set up on a project by project basis.

    • https://docs.google.com/document/d/1Ypy95ozoNcLv11GrTqK0Z_4bpfKGpEyCXjUSNEKGylE/edit# Using Pull Request Target as a trigger workflow allows you to do triggering inside the receiving end repository in PRs, so those PRs would have access to all secrets. There’s a big red warning in that description.

    • Those workflows would always trigger using the job definition in the source repository. So that’s one way for PRs to use organization secrets.

    • Andrew: it is a GitHub Action you define in the central repo that says “any PR that comes in can trigger this job, that would have access to secrets”. No way to control where the PR comes from. It does allow for a project to create workflows that they are willing to have automatically run. So for instance we could authorize the CodeBuild jobs for all PRs. It does mean that anyone can raise a PR and consume resources we pay for. In all honesty that’s not very different on how LF’s other CI workflows (based on Jenkins) work.

    • Is it worth it, or do we need to keep a tighter rein on finances?

    • JF: is there a way to put a maximum run time on a CodeBuild job? Andrew: yes. In LF Jenkins env we have a 6hr job limit, but some projects cap to limit of 60 minutes, sometimes 30 minutes. Workflow Syntax for GitHub Actions - ObsJob IdTimeout

    • Daniel: this can be a “break glass” mechanism.

    • Andrew: there’s now at least the option to use org secrets in a PR.

  • Mac CI

    • JF: No new update, MacStadium ramping up accessibility of Apple Silicon Mac Minis

    • Daniel: are projects seeing demand for macOS / ARM? OCIO? OSL?

    • Larry: not a formal demand, but anticipation that it will be supported sooner rather than later

  • Project feedback

  • Package Management

    • We’ve had lots of discussion, any updates?

    • Mark: will need to clear the ability to share some of the stuff being worked on at ILM, waiting for a green flag to preset

    • Daniel: Ryan / Larry were putting out a “call for interest”, any bites? Ryan: haven’t received a ton of interest, so it’s still on the table, but the conversation didn’t really continue after this meeting. Larry:had some discussions with people from Netflix, they aren’t looking to partner up on development, but could be interested in defining package definitions. Also topic came up in context of OpenEXR. But Kimball / WETA hadn’t seen the presentation, pointed them to the recorded talk, but don’t know if it got viewed (WETA dealing with same issues that SPI’s solution is trying to address). But no one has outright said they would jump in and collaborate. So not clear if SPI is going to do, whether just go internal, share informally, “go big”. Started conversations with legal department in case SPI decides to share at least some of the code.

    • Daniel: at Animal Logic, have a CentOS upgrade cycle happening at the moment , looking at all the busy work involved. Overspecification of the Rez packages they have defined. Also looking at the transition to CentOS Stream, does an effective somewhat higher level package management system become part of the OS management, that could be a useful marshalling point for the efforts.

    • Jeff: have been investigating running inside containers, it’s been successful (making the actual OS irrelevant). Packaging up all the software inside containers, executing them and getting great functionality from that, including DCCs. It’s still early in the process, but showing a lot of promise. Hardest thing has been overcoming the “that’s not what Docker is for” argument. Using Dockers as VMs. Ryan: access to the GPU has been a struggle. Jeff: seems to be working now, the NVIDIA Docker containers.

    • Aloys: have also been experimenting with Singularity, which gives more access to external capabilities. There’s a VFX Docker image he’s converted to a Singularity image on the Singularity hub.

    • Jeff: looked at Singularity, liked it, user-facing container technology. Don’t need to give root, opening up the security a bit more for the users. Have been able to do a lot of what Singularity does in Docker by wrapping their own tools. Also like the layering of Docker, whereas Singularity is “singular”. Maybe starting with Docker, eventually pulling in Docker layers into Singularity.

    • Daniel: Jeff, could you do a 20 minute presentation in the next meeting? Jeff: sure. Ryan: very interested in seeing this. Still end up needing a package management system, could use SPK to bake out the filesystem and putting it into the image.

    • Jeff: also use Rez internally, this is a proven Rez environment, suck it into a Rez container. Also want to start using the ASWF containers, not leveraging those at this point, takes months to build their own VFX Platform compliant environments.

    • Aloys: Rez is the “last mile”, every day you have dozens / hundreds of new small packages, dozens of releases every day, so can’t afford to repackage a whole Docker image all the time. Also the ASWF Docker images have problems, projects can decide they want their own version of something (like pybind11) instead of depending on something that’s part of the image. Would like to be able to use a package management system on top of the ASWF containers.

    • Larry: frequently wish could have the SPK solution in CI. VFX Platform 2021 is one entry in the test matrix, but rest of matrix is a lot of oddball combinations, can’t all be Docker containers.

    • Aloys: have a hacky solution that uses Docker containers as “glorified tar file”: package is tared and wrapped in a Docker container. But there are limits as to what you can do to pass Docker images between GitHub Actions CI build stages. Aswf-docker tool is available on PyPI. But within the Docker container, you can download the packages you want, need to install Docker within the Docker image, not ideal. This is why JFrog looks interesting, individual file sizes (Qt is 500MB) can be quite large. A JFrog instance with enough capacity to store these packages could be very useful.

    • Daniel: isn’t this a custom package manager? Aloys: happy to use a different solution. Andrew: Podman / Buildah are ways to build Docker images that separate the container runtime from building the images, don’t require a daemon. You can build rootless containers. Buildah lets you build containers without needing the runtime. Ryan: also allows you to build the layers independently. Larry: there is a published caching action in GitHub Actions that didn’t exist when first switched from Travis, it can help not incur the cost of building the same dependency over and over. There may be a limit of 5GB, not sure if it was possible to force an invalidation of the cache. Looking at whether it could help cut down on CI build times. Aloys: submitted a PR to use caching / ccache against USD repo, first build is 40 min, but next one is 5 min. Larry: would love to see that. Daniel: caching of the build process rather than the result itself.

Action Items:

  • Follow up with Jeff for presentation at next meeting

Next Steps