OpenAI/Codex: Move critical infrastructure from perl to python

How to quickly transition from perl to python without breaking mission-critical services?

This was the dilemma faced by BAERO, NetApp's internal DevOps/Test infrastructure team, which provides tools and services for developers to build and test their code before submitting it. BAERO’s role is to help developers move faster, catch quality issues as early as possible, and protect NetApp® products from regressions. Over the years, we've built a large amount of software to support this mission. But as NetApp adds new products and features, this software must evolve to support these new environments.

For example, NetApp ONTAP® is now part of Amazon FSx, but to deliver this feature, ONTAP developers must be able to run, test, and debug new features in AWS before they are released. To support these requirements, BAERO extended its services and infrastructure to work with Amazon FSx. In general, the faster BAERO can add support, the sooner NetApp developers can automatically catch regressions in their code.

BAERO is faced with a tricky balance: We have to move fast, but at the same time, we can’t break the infrastructure that NetApp development teams are using 24/7. Today, BAERO infrastructure code is a mixture of relatively new python and battle-hardened perl code and libraries written years ago but extended as needed. Unfortunately, the perl code can make it harder to support NetApp's new features and products, because perl's library ecosystem isn't as rich as python's, and perl has fewer developers who are eager to work in it.

Python allows teams to move faster and to better support the next generation of NetApp products. But can we translate our perl-based codebase into python without breaking mission-critical services along the way?

Our approach

The release of OpenAI/Codex to the public in August of 2021 introduced the possibility of using artificial intelligence and machine learning to help translate code between languages. For BAERO, it could help translate our perl codebase over to python. But not all products live up to expectations; we needed to test it to believe it. Would Codex hold up in real-world situations? Could it translate our codebase faster and more easily than a group of developers? The only way to find out was through trial and (hopefully no) error.

For the test, we picked 'run_utest.pl', a utility that is responsible for executing unit tests and interpreting and returning results. It’s also a script that has evolved beyond how it was originally designed. The original code was extended as needed to add core-dump analysis, fuzzing support, code coverage support, and on-the-spot diagnosis when particular rare failures happened. As a result, it became a big, complicated perl script with years of real-world runtime behind it, and it was therefore somewhat terrifying to experiment with. Any translation to a new language would need to be done in a way that didn't risk the correctness of the script, because the script enforces quality by running unit tests and is used internally by every developer, hundreds of times a day.

Challenges

On our first use of the Codex translation, it became clear that Codex has its pros and cons: It was great at some translations but very bad at others. Using it required a developer in the loop to verify the correct translations and amend any missteps. That said, it’s easier to validate a new python script than to write one from scratch, which made for a promising outcome.

In essence, Codex is a very fast but imperfect translator. We needed to figure out how to use it safely to speed up our translation project. When we were finished, the translated code needed to work perfectly (or close to it) from the start. Because 'run_utest.pl' is used so heavily in a normal build, it was straightforward to validate the “sunny-day” situations. However, the many corner cases and error paths that the perl version handles (and that were validated when written) are much more difficult to validate. Those must work in the python translation when we deploy. There’s no room for error.

Overcoming the challenges

We focused on reducing or eliminating the risk in the new python translation. Ideally, we would have had a suite of tests that would exercise both the error paths and the sunny-day situations. Unfortunately, there were no backing tests on the original code, and stability was enforced by hand-testing new code and then watching for issues after new versions were deployed.

We attacked the translation risk in the following ways:

Translated the code directly without refactoring along the way. This makes it easier for reviewers to validate that the code does the same thing in both languages. It also enables side-by-side functional and performance testing (on a per-function basis).
Translated and tested a little bit at a time, reviewed, and submitted. The smaller review sizes make it easier for reviewers to catch problems and for developers to see the slow but steady progress.
Tested the old code and new code with the same inputs. Since the new code should behave the same way as the old code, any differences in output were investigated.
Unit-tested like crazy. We focused on exercising every single line of code. It can be hard to hit every error path naturally, but with pytest and python's mocking support, it’s often possible to force those paths and exercise every line.
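
The "same inputs, compare outputs" step can be sketched roughly as follows. This is a minimal illustration and not BAERO's actual harness: the helper names, the python file name 'run_utest.py', and the choice to compare exit code, stdout, and stderr are all assumptions made for the example.

```python
import subprocess
import sys
from dataclasses import dataclass

# Hypothetical invocations: the perl script name comes from the article,
# the python file name is an assumption for illustration only.
OLD_ARGV = ["perl", "run_utest.pl"]
NEW_ARGV = [sys.executable, "run_utest.py"]


@dataclass(frozen=True)
class RunResult:
    """Everything we choose to compare between the two implementations."""
    returncode: int
    stdout: str
    stderr: str


def run_command(argv, stdin_text=""):
    """Run one implementation and capture its exit code and output."""
    proc = subprocess.run(
        argv, input=stdin_text, capture_output=True, text=True, check=False
    )
    return RunResult(proc.returncode, proc.stdout, proc.stderr)


def compare_implementations(old_argv, new_argv, extra_args, stdin_text=""):
    """Return a list of differences; an empty list means identical behavior."""
    old = run_command(list(old_argv) + list(extra_args), stdin_text)
    new = run_command(list(new_argv) + list(extra_args), stdin_text)
    differences = []
    for field in ("returncode", "stdout", "stderr"):
        old_value = getattr(old, field)
        new_value = getattr(new, field)
        if old_value != new_value:
            differences.append(f"{field}: old={old_value!r} new={new_value!r}")
    return differences
```

Calling `compare_implementations(OLD_ARGV, NEW_ARGV, [...])` with a recorded argument list would return one entry per mismatching field, and each entry is a candidate translation problem to investigate.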

In the end, the new version has much better testing than the original, and the act of exercising all of the code found many error-path translation problems that did not manifest in the sunny-day situation.
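
As an illustration of how mocking can force an error path that rarely occurs naturally, here is a small sketch using `unittest.mock` from the python standard library with pytest-style test functions. The function `run_test_binary` and its return values are hypothetical stand-ins, not code from run_utest.

```python
import subprocess
from unittest import mock


def run_test_binary(path, timeout_seconds=60):
    """Hypothetical helper: run one unit-test binary and classify the outcome."""
    try:
        proc = subprocess.run(
            [path], capture_output=True, text=True,
            timeout=timeout_seconds, check=False,
        )
    except FileNotFoundError:
        return "missing"
    except subprocess.TimeoutExpired:
        return "timeout"
    return "pass" if proc.returncode == 0 else "fail"


def test_timeout_path_is_reported():
    # Simulate a hung test binary without actually waiting for one.
    hung = subprocess.TimeoutExpired(cmd=["fake_test"], timeout=60)
    with mock.patch("subprocess.run", side_effect=hung):
        assert run_test_binary("fake_test") == "timeout"


def test_missing_binary_is_reported():
    # Simulate a test binary that was never built.
    with mock.patch("subprocess.run", side_effect=FileNotFoundError()):
        assert run_test_binary("fake_test") == "missing"


def test_nonzero_exit_is_a_failure():
    # Simulate a test binary that ran but exited with an error code.
    finished = mock.Mock(returncode=1)
    with mock.patch("subprocess.run", return_value=finished):
        assert run_test_binary("fake_test") == "fail"
```

Each test replaces `subprocess.run` only for the duration of the `with` block, so the timeout, missing-binary, and failure branches are all exercised without a real binary on disk.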

The outcome

As of today, the port is feature complete and can run the entire unit-test workflow. Work continues on driving up unit-test coverage (>80% versus 0% prior) before it is fully deployed. One surprise was a bug in the original perl implementation: the perl code was swallowing specific types of exit codes. The problem is hiding in the current unit-test infrastructure; it is fairly rare, but it is real. At first it looked like a translation problem, but investigation showed that the python version was simply stricter than the perl version. Finding this problem alone is probably justification enough for the translation project.
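
The article does not say which exit codes were lost or how, so the following is only a generic sketch of what "strict" exit-code handling can look like in python. The function names are hypothetical; the one library fact relied on is that `subprocess` reports a child killed by a signal as a negative return code on POSIX systems.

```python
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into an explicit verdict."""
    if returncode == 0:
        return "pass"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return f"killed by signal {-returncode}"
    return f"failed with exit code {returncode}"


def run_strict(argv):
    """Run a command and report its outcome without treating any failure as success."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # A child process that deliberately exits with code 3.
    print(run_strict([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

The design point is that every non-zero outcome maps to a distinct, visible result, so a crash cannot be silently mistaken for a passing test.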

With high unit-test coverage, thorough integration testing (validated side-by-side with perl output), and many iterations of execution, we've deployed the python version of run_utest for ~5% of the unit-test targets.

We'll slowly grow this number and move to 100% after fixing unit-tests with the errors hidden by the original perl version of run_utest.
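
The article does not describe how the ~5% of targets are chosen. One possible approach, shown here purely as an assumption, is to hash each target name into a stable bucket so that the same targets stay on the new runner as the percentage grows. The function name and the use of SHA-256 are illustrative choices.

```python
import hashlib


def use_python_runner(target_name, rollout_percent):
    """Deterministically decide whether a target uses the new python runner.

    Hypothetical rollout gate: each target name hashes to a fixed bucket
    in the range 0..99, and targets whose bucket is below the rollout
    percentage are routed to the new implementation.
    """
    digest = hashlib.sha256(target_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent
```

Because the bucket depends only on the target name, raising `rollout_percent` from 5 toward 100 only ever adds targets to the new runner; none that were already migrated move back.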

Next steps

After a very positive experience with OpenAI/Codex, we're ready to go all in. OpenAI/Codex has literally changed what we believe is possible for the BAERO development team to do. In the past, we would have written new projects in python while maintaining the perl infrastructure until a particular piece became untenable, and only then rewritten that piece in python. With OpenAI/Codex in our development toolbox, we now have a long list of infrastructure software that we're going to translate proactively, with a blueprint for making the project successful.

In the end:

OpenAI/Codex will help BAERO move faster.
OpenAI/Codex will help NetApp's developers test new features sooner.
OpenAI/Codex will help NetApp ship higher-quality software to our customers at a faster pace.

Perl->Python is just the start. OpenAI/Codex has the potential to unlock and accelerate new NetApp features, scalability, and performance in the NetApp products themselves.

While OpenAI/Codex is just getting started, you should start experimenting with it soon. By managing risk with the right tests and process, OpenAI/Codex can already accelerate the translation of legacy code into new languages.

Phil Ezolt