At Meta, Bento is our internal Jupyter notebooks platform that is leveraged by many internal users. Notebooks are also being used widely for creating reports and workflows (for example, performing data ETL) that need to be repeated at certain intervals. Users with such notebooks would have to remember to manually run their notebooks at the required cadence – a process people might forget because it does not scale with the number of notebooks used.
In this post, we’ll explain how we married Bento with our batch ETL pipeline framework called Dataswarm (think Apache Airflow) in a privacy and lineage-aware manner...
View the full article