Apache Airflow: From Stagnation to Millions of Downloads



Vikram Koka stumbled upon Apache Airflow in late 2019. He was working in the Internet of Things industry, searching for software to orchestrate sensor data. Airflow seemed a perfect fit, but Koka noticed the open-source project’s stagnant state. Thus began a journey to breathe new life into the dying software.

Airflow was the brainchild of Airbnb. The company created the system to automate and manage its data-related workflows, such as cleaning and organizing datasets in its data warehouse and calculating metrics around host and guest engagement. In 2015, Airbnb released the software as open source. Then, four years later, Airflow transitioned into a Top-Level Project at the Apache Software Foundation, a leading developer and steward of open-source software.

What was once a thriving project had stalled, however, with flat downloads and no new releases. Leadership was divided, and some maintainers had shifted their focus to other endeavors.

Yet Koka believed in the software’s potential. Unlike tools that rely on static configuration files, Airflow follows the principle of “configuration as code.” Workflows are represented as directed acyclic graphs (DAGs) of tasks: graphs with directed edges and no cycles. Developers write these tasks in the Python programming language, which lets them import libraries and other dependencies to better define each task. Akin to a musical conductor, Airflow orchestrates this symphony of tasks, managing the scheduling, execution, and monitoring of workflows.
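To make that concrete, here is a minimal sketch of such a pipeline written with Airflow’s TaskFlow API, assuming Airflow 2.4 or later; the DAG name, schedule, and sample sensor readings are illustrative assumptions, not anything prescribed by the project:

    from datetime import datetime
    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def sensor_pipeline():
        @task
        def extract():
            # Stand-in for reading raw IoT sensor values (hypothetical data).
            return [{"sensor_id": 1, "celsius": 21.5}]

        @task
        def transform(readings):
            # Tasks are ordinary Python functions, so any library a team
            # needs can be imported and used here.
            return [r["celsius"] * 9 / 5 + 32 for r in readings]

        @task
        def load(values):
            print(f"Loading {len(values)} converted readings")

        # Calling the tasks wires up the directed acyclic graph:
        # extract -> transform -> load, with no cycles.
        load(transform(extract()))

    sensor_pipeline()

Because the whole pipeline is ordinary Python, it can be versioned, reviewed, and generated programmatically: the “code-first” quality Koka describes.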

This flexibility is what caught Koka’s eye. “I fell in love with the concept of code-first pipelines—pipelines which could actually be deployed in code,” he says. “The whole notion of programmatic workflows really appealed to me.”

Koka set to work righting the Airflow ship. An open-source contributor with decades of experience in data and software engineering, he connected with people in the community to fix reliability bugs and craft other enhancements. It took a year, but Airflow 2.0 was released in December 2020.

Airflow’s Growth and Community Expansion

That release served as a crucial turning point for the project. Downloads from its GitHub repository increased, and more enterprises adopted the software. Encouraged by this growth, the team envisioned the next generation of Airflow: a modular architecture, a more modern user interface, and a “run anywhere, anytime” capability, letting it operate on premises, in the cloud, or on edge devices and handle event-driven and ad hoc scenarios in addition to scheduled tasks. The team delivered on this vision with the launch of Airflow 3.0 last April.
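Event-driven runs build on Airflow’s data-aware scheduling, in which a downstream workflow is triggered by an update to a piece of data rather than by a clock. Here is a minimal sketch, assuming the dataset API introduced in Airflow 2.4 (Airflow 3 renames datasets to assets); the bucket URI and DAG names are hypothetical:

    from datetime import datetime
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    # A hypothetical URI identifying the data that the producer updates.
    sensor_feed = Dataset("s3://example-bucket/sensor-feed")

    @dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
    def producer():
        @task(outlets=[sensor_feed])
        def publish():
            # Completing this task marks the dataset as updated.
            print("sensor feed refreshed")
        publish()

    @dag(schedule=[sensor_feed], start_date=datetime(2024, 1, 1), catchup=False)
    def consumer():
        @task
        def react():
            # Runs whenever the feed changes, not on a fixed schedule.
            print("reacting to new sensor data")
        react()

    producer()
    consumer()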

“It was amazing that we managed to ‘rebuild the plane while flying it’ when we worked on Airflow 3—even if we had some temporary issues and glitches,” says Jarek Potiuk, one of the foremost contributors to Airflow and now a member of its project management committee. “We had to refactor and move a lot of pieces of the software while keeping Airflow 2 running and providing some bug fixes for it.”

Compared with Airflow’s second version, which Koka says drew a few hundred to a thousand downloads per month on GitHub, “now we’re averaging somewhere between 35 to 40 million downloads a month.” The project’s community has soared as well, with more than 3,000 developers of all skill levels from around the world contributing to Airflow.

Jens Scheffler is an active part of that community. As a technical architect for digital testing automation at Bosch, he and his team were among Airflow’s early adopters, using the software to orchestrate tests for the company’s automated driving systems.

Scheffler was inspired by the openness and responsiveness of Airflow members to his requests for guidance and support, so he considered “giving back something to the community—a contribution of code.” He submitted a few patches at first, then implemented an idea for a feature that would benefit not only his team but other Airflow users as well. Scheffler also discovered other departments within Bosch using Airflow, so they formed a small in-house community “so we can exchange knowledge and keep in touch.”

Koka, who is also a member of Airflow’s project management committee and chief strategy officer at the data-operations platform Astronomer, notes that managing a huge group of contributors is challenging, but nurturing that network is as essential as improving the software. The Airflow team has established a system that lets developers contribute gradually, starting with documentation, then progressing to small issues and bug fixes before tackling larger features. Maintainers also make it a point to respond swiftly and provide constructive feedback.

“For many of us in the community, [Airflow] is an adopted child. None of us were the original creators, but we want more people feeling they’ve also adopted it,” says Koka. “We’re in different organizations, in different countries, speak different languages, but we’re still able to come together toward a certain mission. I love being able to do that.”

The Airflow team is already planning future features. These include tools to write tasks in programming languages other than Python, human-in-the-loop capabilities for reviewing and approving tasks at certain checkpoints, and support for artificial intelligence (AI) and machine learning workflows. According to Airflow’s 2024 survey, the software has a rising number of use cases in machine learning operations (MLOps) and generative AI.

“We are at a pivotal moment where AI and ML workloads are the most important things in the IT industry, and there is a great need to make all those workloads—from training to inference and agentic processing—robust, reliable, scalable, and generally have a rock-solid foundation they can run on,” Potiuk says. “I see Airflow as such a foundation.”
