Author: Frederico Muñoz (SAS Institute)

This is the third interview of a SIG Architecture Spotlight series that will cover the different subprojects. This time, we cover SIG Architecture: Code Organization. In this SIG Architecture spotlight I talked with Madhav Jivrajani (VMware), a member of the Code Organization subproject.

Introducing the Code Organization subproject

Frederico (FSM): Hello Madhav, thank you for your availability. Could you start by telling us a bit about yourself, your role, and how you got involved in Kubernetes?

Madhav Jivrajani (MJ): Hello! My name is Madhav Jivrajani, I serve as a technical lead for SIG Contributor Experience and a GitHub Admin for the Kubernetes project. Apart from that I also contribute to SIG API Machinery and SIG Etcd, but more recently, I’ve been helping out with the work that is needed to help Kubernetes stay on supported versions of Go, and it is through this that I am involved with the Code Organization subproject of SIG Architecture.

FSM: A project the size of Kubernetes must have unique challenges in terms of code organization -- is this a fair assumption? If so, what would you pick as some of the main challenges that are specific to Kubernetes?

MJ: That’s a fair assumption! The first interesting challenge comes from the sheer size of the Kubernetes codebase. We have ≅2.2 million lines of Go code (which is steadily decreasing thanks to dims and other folks in this sub-project!), and a little over 240 dependencies that we rely on either directly or indirectly. This is why having a sub-project dedicated to helping out with dependency management is crucial: we need to know what dependencies we’re pulling in and what versions these dependencies are at, and we need tooling to help make sure we are managing these dependencies across different parts of the codebase in a consistent manner.

Another interesting challenge with Kubernetes is that we publish a lot of Go modules as part of the Kubernetes release cycles; one example of this is client-go. However, we as a project would also like the benefits of having everything in one repository: the advantages of using a monorepo, like atomic commits. Because of this, code organization works with other SIGs (like SIG Release) to automate the process of publishing code from the monorepo to downstream individual repositories which are much easier to consume, and this way you won’t have to import the entire Kubernetes codebase!

Code organization and Kubernetes

FSM: For someone just starting to contribute to Kubernetes code-wise, what are the main things they should consider in terms of code organization? How would you sum up the key concepts?

MJ: I think one of the key things to keep in mind, at least as you’re starting off, is the concept of staging directories. In the kubernetes/kubernetes repository, you will come across a directory called staging/. The sub-folders in this directory serve as a bunch of pseudo-repositories. For example, the kubernetes/client-go repository that publishes releases for client-go is actually a staging repo.

FSM: So the concept of staging directories fundamentally impacts contributions?

MJ: Precisely, because if you’d like to contribute to any of the staging repos, you will need to send in a PR to its corresponding staging directory in kubernetes/kubernetes. Once the code merges there, we have a bot called the publishing-bot that will sync the merged commits to the required staging repositories (like kubernetes/client-go).
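To make this concrete, here is a minimal, hedged sketch of what consuming the published k8s.io/client-go module looks like for a downstream Go project, without importing the kubernetes/kubernetes monorepo. The kubeconfig path and namespace below are illustrative assumptions, not something from the interview.

// A minimal sketch: a downstream consumer of the published k8s.io/client-go
// module listing pods. The kubeconfig path and namespace are assumptions.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client configuration from a local kubeconfig (assumed path).
	config, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config")
	if err != nil {
		panic(err)
	}

	// Create a typed clientset from the published client-go module.
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List pods in the default namespace as a quick sanity check.
	pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d pods\n", len(pods.Items))
}

A project like this only needs to require k8s.io/client-go and its published dependencies (such as k8s.io/apimachinery) in its go.mod, which is exactly the benefit of publishing the staging repositories as standalone modules.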
This way we get the benefits of a monorepo, but we can also modularly publish code for downstream consumption. PS: the publishing-bot needs more folks to help out! For more information on staging repositories, please see the contributor documentation.

FSM: Speaking of contributions, the very high number of contributors, both individuals and companies, must also be a challenge: how does the subproject operate in terms of making sure that standards are being followed?

MJ: When it comes to dependency management in the project, there is a dedicated team that helps review and approve dependency changes. These are folks who have helped lay the foundation of much of the tooling that Kubernetes uses today for dependency management. This tooling helps ensure there is a consistent way that contributors can make changes to dependencies. The project has also worked on additional tooling to signal statistics of dependencies that are being added or removed: depstat.

Apart from dependency management, another crucial task that the project does is management of the staging repositories. The tooling for achieving this (publishing-bot) is completely transparent to contributors and helps ensure that the staging repos get a consistent view of contributions that are submitted to kubernetes/kubernetes.

Code Organization also works towards making sure that Kubernetes stays on supported versions of Go. The linked KEP provides more context on why we need to do this. We collaborate with SIG Release to ensure that we are testing Kubernetes as rigorously and as early as we can on Go releases, and we work on changes that break our CI as a part of this. An example of how we track this process can be found here.

Release cycle and current priorities

FSM: Is there anything that changes during the release cycle?

MJ: During the release cycle, specifically before code freeze, there are often changes that go in that add, update, or delete dependencies, or that fix code that needs fixing as part of our effort to stay on supported versions of Go. Furthermore, some of these changes are also candidates for backporting to our supported release branches.

FSM: Is there any major project or theme the subproject is working on right now that you would like to highlight?

MJ: I think one very interesting and immensely useful change that has recently been added (and I take the opportunity to specifically highlight the work of Tim Hockin on this) is the introduction of Go workspaces to the Kubernetes repo. A lot of our current tooling for dependency management and code publishing, as well as the experience of editing code in the Kubernetes repo, can be significantly improved by this change.

Wrapping up

FSM: How would someone interested in the topic start helping the subproject?

MJ: The first step, as with any project in Kubernetes, is to join our Slack: slack.k8s.io, and after that join the #k8s-code-organization channel. There are also code organization office hours that you can choose to attend. Timezones are hard, so feel free to also look at the recordings or meeting notes and follow up on Slack!

FSM: Excellent, thank you! Any final comments you would like to share?

MJ: The Code Organization subproject always needs help, especially in areas like the publishing bot, so don’t hesitate to get involved in the #k8s-code-organization Slack channel.
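To picture the Go workspaces change Madhav highlighted, here is a hedged, abbreviated sketch of what a go.work file tying a monorepo root to its staging pseudo-repositories can look like. The Go version and the list of modules are illustrative only; the actual workspace file in kubernetes/kubernetes covers every staging module.

go 1.23

use (
	.
	./staging/src/k8s.io/api
	./staging/src/k8s.io/apimachinery
	./staging/src/k8s.io/client-go
	// ...one use directive per staging module
)

With a workspace like this, the Go toolchain and editors resolve the staging modules from their in-repo sources, which is what improves both the dependency tooling and the editing experience mentioned above.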
-
Author: Frederico Muñoz (SAS Institute)

This is the second interview of a SIG Architecture Spotlight series that will cover the different subprojects. In this blog, we cover the SIG Architecture: Production Readiness subproject. In this SIG Architecture spotlight, we talked with Wojciech Tyczynski (Google), lead of the Production Readiness subproject.

About SIG Architecture and the Production Readiness subproject

Frederico (FSM): Hello Wojciech, could you tell us a bit about yourself, your role, and how you got involved in Kubernetes?

Wojciech Tyczynski (WT): I started contributing to Kubernetes in January 2015. At that time, Google (where I was and still am working) decided to start a Kubernetes team in the Warsaw office (in addition to the already existing teams in California and Seattle). I was lucky enough to be one of the seeding engineers for that team. After two months of onboarding and helping with different tasks across the project towards the 1.0 launch, I took ownership of the scalability area and led the work to make Kubernetes support clusters with 5000 nodes. I’m still involved in SIG Scalability as its Technical Lead. That was the start of a journey, since scalability is such a cross-cutting topic, and I started contributing to many other areas, including, over time, SIG Architecture.

FSM: In SIG Architecture, why specifically the Production Readiness subproject? Was it something you had in mind from the start, or was it an unexpected consequence of your initial involvement in scalability?

WT: After reaching the milestone of Kubernetes supporting 5000-node clusters, one of the goals was to ensure that Kubernetes would not degrade its scalability properties over time. While a non-scalable implementation is always fixable, designing non-scalable APIs or contracts is problematic. I was looking for a way to ensure that people are thinking about scalability when they create new features and capabilities, without introducing too much overhead. This is when I joined forces with John Belamaric and David Eads and created the Production Readiness subproject within SIG Architecture. While setting the bar for scalability was only one of a few motivations for it, it ended up fitting quite well. At the same time, I was already involved in the overall reliability of the system internally, so the other goals of Production Readiness were also close to my heart.

FSM: For anyone new to how SIG Architecture works, how would you describe the main goals and areas of intervention of the Production Readiness subproject?

WT: The goal of the Production Readiness subproject is to ensure that any feature that is added to Kubernetes can be reliably used in production clusters. This primarily means that those features are observable, scalable, and supportable, that they can always be safely enabled, and that, in case of production issues, they can also be disabled.

Production readiness and the Kubernetes project

FSM: Architectural consistency being one of the goals of the SIG, is this made more challenging by the distributed and open nature of Kubernetes? Do you feel this impacts the approach that Production Readiness has to take?

WT: The distributed nature of Kubernetes certainly impacts Production Readiness, because it makes thinking about aspects like enablement/disablement or scalability more challenging. To be more precise, when enabling or disabling features that span multiple components you need to think about version skew between them and design for it.
For scalability, changes in one component may actually result in problems for a completely different one, so it requires a good understanding of the whole system, not just individual components. But it’s also what makes this project so interesting.

FSM: Those running Kubernetes in production will have their own perspective on things; how do you capture this feedback?

WT: Fortunately, we aren’t talking about "them" here, we’re talking about "us": all of us are working for companies that are managing large fleets of Kubernetes clusters, and we’re involved in that too, so we suffer from those problems ourselves. So while we’re trying to get feedback (our annual PRR survey is very important for us), it rarely reveals completely new problems - it rather shows their scale. And we try to react to it - changes like "Beta APIs off by default" happen in reaction to the data that we observe.

FSM: On the topic of reaction, that made me think of how the Kubernetes Enhancement Proposal (KEP) template has a Production Readiness Review (PRR) section, which is tied to the graduation process. Was this something born out of identified insufficiencies? How would you describe the results?

WT: As mentioned above, the overall goal of the Production Readiness subproject is to ensure that every newly added feature can be reliably used in production. It’s not possible to enforce that through a central team - we need to make it everyone’s problem. To achieve it, we wanted to ensure that everyone designing a new feature is thinking about safe enablement, scalability, observability, supportability, etc. from the very beginning - which means not when the implementation starts, but rather during the design. Given that KEPs are effectively Kubernetes design docs, making it part of the KEP template was the way to achieve the goal.

FSM: So, in a way, making sure that feature owners have thought about the implications of their proposal.

WT: Exactly. We have already observed that just by making feature owners think through the PRR aspects (by having them fill in the PRR questionnaire), many of the original issues go away. Sure - as PRR approvers we’re still catching gaps, but even the initial versions of KEPs are better now than they were a couple of years ago when it comes to thinking about productionisation aspects, which is exactly what we wanted to achieve: spreading the culture of thinking about reliability in its widest possible meaning.

FSM: We’ve been talking about the PRR process; could you describe it for our readers?

WT: The PRR process is fairly simple - we just want to ensure that you think through the productionisation aspects of your feature early enough. If you do your job, it’s just a matter of answering some questions in the KEP template and getting approval from a PRR approver (in addition to regular SIG approval). If you didn’t think about those aspects earlier, it may require spending more time and potentially revising some decisions, but that’s exactly what we need to make the Kubernetes project reliable.

Helping with Production Readiness

FSM: Production Readiness seems to be one area where a good deal of prior exposure is required in order to be an effective contributor. Are there also ways for someone newer to the project to contribute?

WT: PRR approvers have to have a deep understanding of the whole Kubernetes project to catch potential issues.
Kubernetes is such a large project now, with so many nuances, that people who are new to the project can simply miss the context, no matter how senior they are. That said, there are many ways that you may implicitly help. Increasing the reliability of particular areas of the project by improving their observability and debuggability, increasing test coverage, and building new kinds of tests (upgrade, downgrade, chaos, etc.) will help us a lot. Note that the PRR subproject is focused on keeping the bar at the design level, but we should also care equally about the implementation. For that, we’re relying on individual SIGs and code approvers, so having people there who are aware of productionisation aspects, and who deeply care about them, will help the project a lot.

FSM: Thank you! Any final comments you would like to share with our readers?

WT: I would like to highlight and thank all contributors for their cooperation. While the PRR adds some additional work for them, we see that people care about it, and what’s even more encouraging is that with every release the quality of the answers improves. Questions like "do I really need a metric reflecting whether my feature works" or "is downgrade really that important" don’t really come up anymore.
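One place where the requirement that features can always be safely enabled and, if needed, disabled becomes concrete in code is the feature-gate pattern used across Kubernetes components. The sketch below is a minimal, hedged illustration of that pattern; MyFeature is a hypothetical gate invented for this example, and real features are registered centrally in the project rather than in a main package.

// A minimal sketch of the Kubernetes feature-gate pattern.
// MyFeature is a hypothetical gate used only for illustration.
package main

import (
	"fmt"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/featuregate"
)

// MyFeature is a hypothetical alpha feature, disabled by default.
const MyFeature featuregate.Feature = "MyFeature"

func init() {
	// Register the gate so it can be toggled, e.g. via --feature-gates=MyFeature=true.
	utilruntime.Must(utilfeature.DefaultMutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		MyFeature: {Default: false, PreRelease: featuregate.Alpha},
	}))
}

func main() {
	// New behaviour stays behind the gate, so operators can switch it off
	// again in production without a code change if something goes wrong.
	if utilfeature.DefaultFeatureGate.Enabled(MyFeature) {
		fmt.Println("MyFeature enabled: taking the new code path")
	} else {
		fmt.Println("MyFeature disabled: taking the existing code path")
	}
}

Guarding new code paths like this, together with the rollout, rollback, and observability questions in the PRR questionnaire, is what allows a feature to be disabled again in a running cluster when production issues appear.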