Showing results for tags 'open source'.

  1. Apache Solr is an open-source, highly scalable search platform built on top of Apache Lucene, a high-performance, full-text search engine library. Solr is widely used for enterprise search and analytics because it provides robust full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich-document handling (for formats like Word and PDF). It is designed to handle large volumes of text-centric data, provides distributed search and index replication, and is known for its scalability and fault tolerance, making it a popular choice for large-scale search applications. Here are ten real use cases of Solr:

     • E-commerce product search: Solr is commonly used in e-commerce platforms to provide advanced search over a vast inventory of products. It helps deliver relevant results and supports facets and filters (like brand, price range, and product features) to enhance the user experience.
     • Content management systems (CMS): Integrating Solr with a CMS lets websites manage and search large repositories of content such as articles, blogs, and other media efficiently.
     • Enterprise document search: Companies use Solr to index and search extensive collections of documents, including emails, PDFs, Word documents, and more, making it easier for employees to find the information they need quickly.
     • Social media analytics: Solr can process and index large streams of social media data for sentiment analysis, trend tracking, and monitoring public opinion, enabling businesses to gain insights into customer perceptions.
     • Geospatial search: Solr supports location-based searches, used in applications like real estate listings and location-specific services to find entities within a given distance of a geographic point.
     • Data collection and discovery: Research institutions use Solr to manage, search, and analyze large datasets, facilitating data discovery and academic research.
     • Job and resume searching: Job portals use Solr to match candidates with jobs effectively, indexing job listings and resumes and providing powerful search and filtering capabilities.
     • News and media sites: Media outlets use Solr to manage and retrieve news content and articles by attributes like publication date, relevance, and keywords.
     • Healthcare information systems: Solr is used in healthcare to index medical records, research papers, treatment histories, and other data, improving access to information and supporting better healthcare outcomes.
     • Recommendation systems: Solr's ability to handle complex queries and analyze large amounts of data helps in building recommendation engines that suggest products, services, or content based on user preferences and behavior.

     The post What is Solr? appeared first on DevOpsSchool.com. View the full article
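     As a minimal sketch of the faceted search described above: the snippet below queries Solr's HTTP select API with Python's requests library. The local URL, the "products" core, and the field names are assumptions for the example, not part of the original post.

        import requests

        # Faceted product search against a local Solr core.
        # URL, core name ("products"), and field names are illustrative.
        params = {
            "q": "name:laptop",              # full-text query on the name field
            "fq": "price:[300 TO 1500]",     # filter query: restrict by price range
            "facet": "true",
            "facet.field": "brand",          # ask Solr for per-brand counts
            "rows": 5,
            "wt": "json",
        }
        resp = requests.get("http://localhost:8983/solr/products/select", params=params)
        resp.raise_for_status()
        data = resp.json()

        print("matches:", data["response"]["numFound"])
        for doc in data["response"]["docs"]:
            print(doc.get("name"), doc.get("price"))

        # Solr returns facet counts as a flat [value, count, value, count, ...] list.
        facets = data["facet_counts"]["facet_fields"]["brand"]
        for value, count in zip(facets[::2], facets[1::2]):
            print(f"{value}: {count}")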
  2. Anaconda and Teradata have unveiled a new integration to bring the most popular and widely used Python and R packages to Teradata VantageCloud through the Anaconda Repository. The integration with ClearScape Analytics, a powerful engine for deploying end-to-end artificial intelligence (AI) and machine learning (ML), is designed to provide enterprises with the ability to deploy... Read more » The post Anaconda and Teradata partner to enhance open-source support for AI innovation appeared first on Cloud Computing News. View the full article
  3. RudderStack is an open-source customer data platform tool. It collects, routes, and processes data from your websites, apps, cloud tools, and data warehouse. View the full article
  4. Lucene query cheatsheet

     Basic search
     • Single term: term. Finds documents containing term.
     • Phrase search: "exact phrase". Finds documents containing the exact phrase.

     Boolean operators
     • AND: term1 AND term2. Both terms must be present.
     • OR: term1 OR term2. At least one of the terms must be present.
     • NOT: NOT term. Documents must not contain term.
     • Combination: (term1 AND term2) OR term3. Complex boolean logic can be applied by combining operators.

     Wildcard searches
     • Single-character wildcard: te?t. Matches text with one character replaced.
     • Multiple-character wildcard: test*. Matches text with zero or more characters.
     • Wildcard at start: *test. Not supported directly, but can be used in certain contexts.

     Fuzzy searches
     • Fuzzy: term~. Matches terms that are similar to the specified term.

     Proximity searches
     • Proximity: "term1 term2"~N. Matches terms that are within N words of each other.

     Range searches
     • Range: [start TO end]. Finds documents with terms within the specified range.
     • Exclusive range: {start TO end}. Excludes the exact start and end values.

     Regular expressions
     • Regex: /regex/. Matches terms by regular expression.

     Boosting terms
     • Boost: term^N. Increases the relevance of a term by a factor of N.

     Field-specific searches
     • Specific field: fieldname:term. Searches for the term within a specific field.

     Grouping
     • Group queries: (query1) AND (query2). Groups parts of queries for complex searches.

     How to search Apache HTTPD using Lucene

     These examples assume that the logs have been indexed in a Lucene-based system like Elasticsearch, and they demonstrate how to utilize various Lucene query features to filter and search log data effectively. Note that the specific fields used in these examples (ip, timestamp, response, request, etc.) should correspond to the fields defined in your Lucene schema for Apache HTTPD logs.

        // 1. Find logs for a specific IP address
        ip:"192.168.1.1"

        // 2. Search logs within a specific date range
        timestamp:[20230101 TO 20230131]

        // 3. Identify logs with 4xx client error response codes
        response:[400 TO 499]

        // 4. Locate logs for requests to a specific URL
        request:"GET /index.html HTTP/1.1"

        // 5. Filter logs by a specific user-agent string
        agent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

        // 6. Search for logs with a specific referrer
        referrer:"http://example.com/"

        // 7. Find all logs of GET requests
        request_method:GET

        // 8. Filter logs resulting in 5xx server errors
        response:[500 TO 599]

        // 9. Identify requests to a specific directory
        request:"/images/*"

        // 10. Locate requests taking longer than 2 seconds
        duration:>2000

        // 11. Exclude logs from a specific IP address
        -ip:"192.168.1.1"

        // 12. Find requests for a specific file type (.jpg)
        request:"*.jpg"

        // 13. Identify logs from a specific day
        timestamp:20230115

        // 14. Search logs with responses in a byte range
        bytes:[1000 TO 5000]

        // 15. Filter logs by HTTP method and response code
        request_method:POST AND response:200

        // 16. Search for failed login attempts (custom log message)
        message:"Failed login attempt"

        // 17. Find logs from a range of IP addresses
        ip:[192.168.1.1 TO 192.168.1.100]

        // 18. Identify logs with a 200 OK response
        response:200

        // 19. Search for logs with specific query parameters
        request:"*?user=john&*"

        // 20. Locate logs with a 404 Not Found response
        response:404

     The post Apache Lucene Query Example appeared first on DevOpsSchool.com. View the full article
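     As a hedged sketch of running these queries programmatically: assuming the logs are indexed in Elasticsearch under an index named httpd-logs (an illustrative name), a Lucene query string can be submitted through the official Python client's query_string query. Syntax follows the elasticsearch-py 8.x client.

        from elasticsearch import Elasticsearch

        # Run a Lucene query string against an index of Apache HTTPD logs.
        # The URL, index name ("httpd-logs"), and field names are illustrative.
        es = Elasticsearch("http://localhost:9200")

        # Example from the cheatsheet: 4xx errors, excluding one client IP.
        lucene_query = 'response:[400 TO 499] AND NOT ip:"192.168.1.1"'

        resp = es.search(
            index="httpd-logs",
            query={"query_string": {"query": lucene_query}},
            size=5,
        )

        print("total hits:", resp["hits"]["total"]["value"])
        for hit in resp["hits"]["hits"]:
            src = hit["_source"]
            print(src.get("ip"), src.get("response"), src.get("request"))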
  5. Amid a push from Tiny Corp and the rest of the community, AMD is open-sourcing more of its GPU documentation and firmware in hopes of making its hardware truly competitive with Nvidia in the AI space, and potentially beyond. View the full article
  6. How this open source LLM chatbot runner hit the gas on x86, Arm CPUs. View the full article
  7. Malicious xz backdoor reveals fragility of open source. View the full article
  8. Arm China continues to quietly strengthen its product stack; it now has an AI developer board and open-source drivers for its NPU design. View the full article
  9. It’s not often we step outside of our careers and busy everyday lives and pause to say, “Hey, what is this all about? Am I fulfilled? Am I finding meaning in what I’m doing day to day?”

     As the Head of Commercial Legal at GitHub, I have quite a unique role. Most of my career was spent in a law firm until I transitioned into tech. I came from a male-dominated workplace and saw tech as a new environment (though still male-dominated) where ideas flourished and folks were allowed to be themselves in a safe space. But this month—March is Gender Equality Month and includes International Women’s Day—gave me a reason to pause and reflect on my own journey, and to ask what people and organizations are doing every day to empower women.

     So, I sat down with Felicitas Heyne, co-founder of Audiopedia Foundation, and Nadine Krish Spencer, Head of Product and Experience at Chayn. We discussed how tech is aiding organizations fighting for gender equality, what it means to be a woman in tech and the world today, and what advice and learnings they’d like to share with others. I hope you find as much inspiration in their mission, work, and stories as I did.

     Jiyon Yun, Head of Commercial Legal // GitHub

     Getting to know Audiopedia Foundation

     Jiyon (GitHub): The work Audiopedia Foundation is doing is truly remarkable, and I would love to hear in your own words why these efforts are so essential and how you came to help found this organization.

     Felicitas (Audiopedia): Audiopedia Foundation works to empower women in the global south through access to information in an audio format. We work with NGOs around the world to bring different forms of tech—from solar-powered audio players to WhatsApp to loudspeakers—to local communities based on their unique needs. I had never dreamt of leading an organization doing social impact work all over the globe. But I have a hard time seeing an injustice and not doing anything about it. I’ve always been passionate about empowering women, and when we started to dive into the topic, we realized that 500 million women in the world are illiterate—and these are just the official numbers. There are also more than 7,000 languages in the world, half of which don’t even have a written form. We tried to come up with an idea to bring information to these women—including topics like health, economics, and human rights—which sparked the idea of Audiopedia nine years ago.

     Getting to know Chayn

     Jiyon (GitHub): I think we could all take away some learnings from Chayn—from your values to the way you operate. Could you tell me about Chayn’s mission?

     Nadine (Chayn): Chayn is a tech-forward nonprofit that aims to support survivors of sexual abuse, assault, and domestic violence with healing. We use technology to further accelerate that mission.

     Jiyon: What really stood out for me about Chayn is how survivors are supporting other survivors. Can you speak a little bit more about that? How are women empowering other women to heal, find peace, and move forward?

     Nadine: It’s definitely a powerful part of our organization. We are all women in the organization at the moment and that’s different, especially if you come from the tech world. It was something that really drew me to Chayn and I thought, “Wow, I really want to see whether this survivor focus—we call it a trauma-informed way of working—is actually possible.” And to be totally transparent, we’re still figuring out the answer to that, because we do have a lot of survivors on the team. It’s not something that people have to disclose, but it’s a constant awareness for us. And even if people aren’t survivors, quite often people close to them have experienced abuse. That adds an extra layer of understanding to the people we’re trying to reach and helps further our mission.

     The role of tech in social impact work

     Jiyon: What is the role of tech and open source in helping social impact organizations tackle global issues?

     Felicitas: Tech is really a big opportunity to make a change. We’ve been doing development work for decades now and still, every third woman can’t read or write. These numbers haven’t changed despite all of the work from NGOs, and that’s because the scalability and impact aren’t sufficient. But we can leapfrog this problem now with tech. That’s why tech is such a big opportunity. We can solve problems that we haven’t been able to solve for decades. And we can solve them quickly, so we don’t have to take another 300 years to reach gender equality.

     Nadine: As somebody coming from the tech world, there was an assumption that moving into the charity or nonprofit sector might mean that it’s less progressive or less advanced in tech. But last year we were part of the DPG Open Source Community Manager Program and worked with a community manager who we’ve now gone on to employ. It has been instrumental having somebody who really got it from the tech side; we had tried to set up our own tech volunteer program before, but we saw it as quite a heavy lift to manage a tech community. And I think what she has really helped us to see is that there are people out there who just want to come in and help, without having met you or even getting credit. They do it because they’ve got an itch they want to scratch or they see this as a way to contribute to social good, and that is really unique. I don’t know if another industry operates like that, where strangers come in and essentially perform random acts of kindness.

     Sources of inspiration

     Jiyon: As a woman leader, what inspires you? Who inspires you?

     Felicitas: Any woman who’s willing and able to overcome obstacles. Becoming a victim is easy for a woman, but it’s very inspiring to see how women overcome and even grow from these challenges. Women are the largest untapped potential in the world, in my point of view. We’ve had 2,000 years of patriarchy behind us, and I’d really love to see what would happen with 2,000 years of matriarchy ahead of us. When we were in Morocco, we went to a women’s shelter and I listened to many women’s stories. As I listened to them, I had no idea how they could overcome and survive what they went through, but they were there, many with their children, moving forward. It was so impressive, and I realized that anything I could do to make it easier for them and women like them is an obligation. I didn’t earn my privilege; it was mere luck. So, I have a strong need to help those who aren’t as privileged. It’s a question of justice in my eyes; inequality drives me crazy.

     Advice for women in tech

     Jiyon: If you looked back 5 years ago, 10 years ago, or when you were starting your career, what advice would you give women who aren’t in tech right now but who want to follow that path?

     Nadine: Tech holds the power to try and do things differently. And we’re at a point where it would be easy for women to retreat. In the same way that sometimes we retreat from other male-dominated spaces, the wider world of tech could become one of those places. When I was in the commercial world, I tried my best as one of two women on a floor with maybe 100 men. I joined the company as a product manager when it was only 35 people and saw it scale to around 450 people by the time I left. Because I was able to climb so quickly as the company scaled, I struggled a lot with imposter syndrome. People would tell me to “break down the imposter syndrome,” especially because I was a woman. But the idea of just “breaking it down” is really tough, and it made me think even more that I wasn’t cut out to do this, which was really hard to shake. But as the company grew and I was surrounded by more women, I realized the better advice is: find your allies. Having allies—of any gender—helps you start to shake the imposter syndrome naturally, and you become a lot more confident when you’re not in a place of isolation.

     Where we go from here

     Jiyon: What can women and other leaders do to contribute to and inspire change?

     Felicitas: The key is empathy. If you start to look to the global south, you very quickly realize that most of the things we take for granted in our lives aren’t granted for billions of people, especially women. It’s important to question your position in the world, recognize your privileges, and use your empathy to drive action.

     Nadine: It’s really important to get some “balcony time” where you step out and look over what’s going on in your life and all around you. It’s really difficult to juggle everything in your day-to-day life and to just stop and reflect. And the second part is then to act on those realizations and start doing things for other people. It’s taking the time to acknowledge the people in your life and to say, “I see you there and I see how you’re showing up for other people.” Recognition and support are things we’ve got to do for each other.

     Speaking with both Felicitas and Nadine moved me in a way I wasn’t expecting. It was a good reminder to take that “balcony time,” step outside of my every day, reflect on what I can do to impact others, and take steps to do that. I hope you found some inspiration in their stories as well. If you want to learn more or support these causes, visit Audiopedia Foundation’s website and repository and Chayn’s website and repository. You can also contribute to Chayn’s frontend, backend, and soulmedicine work.
  10. By investing in open source frameworks and LGTM tools, SRE teams can effectively monitor their apps and gain insights into system behavior. View the full article
  11. Open-source software is used throughout the technology industry to help developers build software tools, apps, and services. While developers building with open-source software can (and often do) benefit greatly from the work of others, they should also conduct appropriate due diligence to protect against software supply chain attacks. With an increasing focus on managing open-source software supply chain risk, both Citi and Google strive to apply more rigor across risk mitigation, especially when choosing known and trusted suppliers from which open source components are sourced.

     Key open source attack vectors

     Common software supply chain security attacks can be divided into five main types:
     • Attacks at runtime that leverage vulnerabilities in the code
     • Attacks on repositories, tooling, and processes
     • Attacks on the integrity of artifacts as they progress through the pipeline
     • Attacks on the primary open source dependencies that customer applications leverage
     • Attacks throughout the inherited transitive dependency chain of the open source packages

     Application security experts have seen their work increase and get harder as these attacks have grown in recent years. Open-source components often include and depend on the functionality of other open-source components in order to function. These dependencies come in two types: direct and transitive. Generally, the interaction works like this: the application makes an initial call to a direct dependency; if the direct dependency requires any outside components in order to function, those outside components are the application's transitive dependencies.

     Transitive dependencies are notoriously difficult to remediate because they are not readily accessible to the developer. Their code base resides with their maintainers, rendering the application entirely dependent upon the maintainers' work. If the maintainer of a transitive dependency releases a fix, it can take a long time for that fix to make its way up the supply chain to your direct dependency. Vulnerability management therefore needs to extend to the full transitive dependency chain, as this is where 95% of vulnerabilities are found. (A minimal sketch of walking such a chain follows below.)

     Maintaining a regular upgrade and patching process for your software development lifecycle (SDLC) tooling is now a must, as is upgrading the security of both your repositories and processes, combined with active security testing of each. Tamper-evident provenance and signing can increase confidence in the ability to maintain artifact integrity throughout the pipeline. Mapping and understanding the full transitive dependency chain of all external components, and depending only on known and trusted providers for these components, becomes a required condition.

     Recent guidance from CISA and other government agencies supports the focus on appropriately selecting and testing open source software ahead of ingestion from a trusted source. While some organizations load built software artifacts directly from public package repositories, others with a more restrictive security risk appetite will require more stringent security controls, such as the use of curated open-source software providers. Some may opt to use only open-source software they themselves have built from source, although this would be prohibitively expensive for most.
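     As an informal illustration of the transitive dependency chain discussed above, the sketch below walks a package's declared dependencies using only the Python standard library; "requests" is an illustrative package name, not one named by the original article.

        import re
        from importlib.metadata import requires, PackageNotFoundError

        def dep_names(pkg):
            """Return the direct dependency names declared by an installed package."""
            try:
                reqs = requires(pkg) or []
            except PackageNotFoundError:
                return set()
            names = set()
            for r in reqs:
                if "extra ==" in r:          # skip optional extras
                    continue
                # Keep only the distribution name, dropping pins and markers.
                m = re.match(r"[A-Za-z0-9_.\-]+", r)
                if m:
                    names.add(m.group(0))
            return names

        def transitive_closure(pkg):
            """Walk the full transitive dependency chain of a package."""
            seen, stack = set(), [pkg]
            while stack:
                for dep in dep_names(stack.pop()):
                    if dep not in seen:
                        seen.add(dep)
                        stack.append(dep)
            return seen

        print("direct:", sorted(dep_names("requests")))
        print("transitive (incl. direct):", sorted(transitive_closure("requests")))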
     If an organization chooses to use a curated third party, what checks must it look for before delegating that critical authority? There are three main criteria for evaluating a curated OSS vendor:

     1. High level of security maturity

     A trusted supplier must demonstrate a high level of security maturity. Examine the security hygiene of the supplier in particular: look for details of the vulnerability management culture and the ability to quickly keep up to date with patching across the organization. The supplier should also have a well-trained team prepared to quickly address any incidents, and a regular penetration testing program continuously validating the security posture of the organization.

     The trusted supplier should be able to demonstrate the security of its own underlying foundational infrastructure. Check that they:
     • Have an up-to-date inventory of their own external dependencies.
     • Demonstrate knowledge and control of all ingest points.
     • Leverage a single production build service so that they can maintain a singular logical control point.
     • Meet best-practice standards for managing their infrastructure, including well-designed separation of duties and IAM control, built-in organizational policy and guardrails, zero-trust network design, and automated, regular patching with associated evidence.
     • Support these posture controls with complementary continuous threat detection, logging, and monitoring systems.

     Bonus points if they operate with “everything as code” and with hermetic, reproducible, and verifiable builds.

     2. High level of internal SDLC security

     The security of the SDLC used within the trusted supplier must be extremely high, particularly around the control plane of the SDLC and the components that interact with the source code to build the end product. Each system must be heavily secured and vetted to ensure that any change to the software is reviewed, audited, and requires multi-party approval before progressing to the next stage or deployment. Strong authentication and authorization policies must be in place to ensure that only highly trusted individuals can ever build or change the vendor infrastructure. SDLC security also needs to extend back to the ingestion of source code material into the facility and to any code or functionality used within the control plane of the system itself.

     3. Effective insider threat program

     As the trusted supplier is a high-value target, there is the potential for an insider threat as an attack vector. The curated vendor is therefore expected to have an active and effective insider threat program. This personnel vetting approach should also extend to ensuring that all staff are located within approved proximity and not outsourced.

     Trust but verify

     It is also important that the trusted supplier provide supporting evidence and insights, including:
     • Checkable attestations on infrastructure security and processes, via third-party certifications and/or your own independent audit.
     • Checkable attestations for the security posture and processes of their SDLC, against a standard framework like SLSA or SSDF.
     • Cryptographic signatures on the served packages and any associated accompanying metadata, so that you can verify source and distribution integrity.
     The actual relevance and security risk of an issue in a package is a combination of the inherent criticality of the issue in isolation, the context the package is used in, the environmental conditions in which it is deployed, any external compensating controls, and anything else in the environment that decreases or increases the risk. Vulnerabilities and threats in the application interrelate and interact with those from the underlying infrastructure.

     A fourth piece of evidence to look for is enhanced security and risk metadata accompanying each served package, which increases your understanding of both the inherent risk of the code or artifact and how that risk changes in the context of your specific application and environment. Key metadata can include:
     • A standard SBOM with SCA insights: vulnerabilities, licensing info, fully mapped transitive dependencies, and the associated vulnerability and licensing risk.
     • VEX statements for how the vulnerabilities inherited from transitive dependencies affect the primary package being served.
     • Any related threat intelligence specific to the package, use case, or your organization.

     The ability of the supplier to provide this type of enhanced data reinforces the evidence that it has achieved a high level of security, and that the components it serves represent assured, more trustworthy ingredients you can employ with greater confidence. (A minimal sketch of reading such metadata from an SBOM follows at the end of this item.)

     Better control and balancing the benefits of open source components

     Leveraging open source components is critical to developer velocity, quality, and accelerating innovation and execution. Applying these recommendations and requirements can enable you to better control and balance the benefits of using open source components against the potential risk of introducing targetable weak points in your SDLC, and ultimately reduce your risk and exposure.

     Google Cloud’s Assured Open Source Software (Assured OSS) service for the Java and Python ecosystems gives any organization that uses open source software the opportunity to leverage the security and experience Google applies to open source dependencies, by incorporating the same OSS packages that Google secures and uses into its own developer workflows. Learn more about Assured Open Source Software: enable Assured OSS through the self-serve onboarding form, and use the metadata API to list available Python and Java packages and determine which Assured OSS packages you want to use. View the full article
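     As a hedged illustration of consuming the kind of metadata described above: the sketch below reads a CycloneDX-format SBOM (JSON) and prints component licenses plus any embedded vulnerability entries. The file name and the exact schema fields (which follow the CycloneDX 1.4 JSON schema) are assumptions for the example.

        import json

        # Summarize a CycloneDX SBOM. "sbom.json" is a placeholder path; field
        # names follow the CycloneDX 1.4 JSON schema (an assumption here).
        with open("sbom.json") as f:
            sbom = json.load(f)

        # Component inventory: names, versions, and declared licenses.
        for comp in sbom.get("components", []):
            licenses = [
                entry.get("license", {}).get("id", "unknown")
                for entry in comp.get("licenses", [])
            ]
            print(comp.get("name"), comp.get("version"),
                  "licenses:", licenses or ["unknown"])

        # CycloneDX 1.4+ can embed vulnerability data alongside the inventory.
        for vuln in sbom.get("vulnerabilities", []):
            severities = sorted(
                {r.get("severity", "unknown") for r in vuln.get("ratings", [])}
            )
            print(vuln.get("id"), "severity:", ", ".join(severities) or "unknown")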
  12. The revolution in generative AI (gen AI) and large language models (LLMs) is leading to larger model sizes and increased demands on the compute infrastructure. Organizations looking to integrate these advancements into their applications increasingly require distributed computing solutions that offer minimal scheduling overhead. As the need for scalable gen AI solutions grows, Ray, an open-source Python framework designed for scaling and distributing AI workloads, has become increasingly popular. Traditional Ray deployments on virtual machines (VMs) have limitations when it comes to scalability, resource efficiency, and infrastructure manageability. One alternative is to leverage the power and flexibility of Kubernetes and deploy Ray on Google Kubernetes Engine (GKE) with KubeRay, an open-source Kubernetes operator that simplifies Ray deployment and management.

     “With the help of Ray on GKE, our AI practitioners are able to get easy orchestration of infrastructure resources, flexibility and scalability that their applications need without the headache of understanding and managing the intricacies of the underlying platform.” - Nacef Labidi, Head of Infrastructure, Instadeep

     In this blog, we discuss the numerous benefits that running Ray on GKE brings to the table — scalability, cost-efficiency, fault tolerance and isolation, and portability, to name a few — and resources on how to get started.

     Easy scalability and node auto-provisioning

     On VMs, Ray's scalability is inherently limited by the number of VMs in the cluster. Autoscaling and node provisioning, configured per cloud, require detailed knowledge of machine types and network configurations. In contrast, Kubernetes orchestrates infrastructure resources using containers, pods, and VMs as scheduling units, while Ray distributes data-parallel processes within applications, employing actors and tasks for scheduling (a minimal sketch of this task model appears at the end of this item). KubeRay introduces cloud-agnostic autoscaling to the mix, allowing you to define minimum and maximum replicas within the workerGroupSpec. Based on this configuration, the Ray autoscaler schedules more Kubernetes pods as required by its tasks. And if you choose the GKE Autopilot mode of operation, node provisioning happens automatically, eliminating the need for manual configuration.

     Greater efficiency and improved startup latency

     GKE offers discount-based savings such as committed use discounts, a new pricing model, and reservations for GPUs in Autopilot mode. In addition, GKE makes it easy to take advantage of cost-saving measures like Spot nodes via YAML configuration. Low startup latency is critical to optimal resource usage, ensuring quick recovery, faster iterations, and elasticity. GKE image streaming lets you initialize and run eligible container images from Artifact Registry without waiting for the full image to download. Testing demonstrated the ray-ml container image going from ContainerCreating to Running in 8.82s, compared to 5m17s without image streaming — that's 35x faster! Image streaming is automatically enabled on Autopilot clusters and available on Standard clusters.

     Automated infrastructure management for fault tolerance and isolation

     Managing a Ray cluster on VMs offers control over fault tolerance and isolation via detailed VM configuration. However, it lacks the automated, portable self-healing capabilities that Kubernetes provides. Kubernetes excels at repeatable automation expressed with clear, declarative, idempotent desired-state configuration. It provides automatic self-healing capabilities, which in KubeRay 2.0 or later extend to preventing the Ray cluster from crashing when the head node goes down. In fact, the Ray Serve docs specifically recommend Kubernetes for production workloads, using the RayService custom resource to automatically handle health checking, status reporting, failure recovery, and upgrades.

     On GKE, the declarative YAML-based approach not only simplifies deployment and management but can also be used to provision security and isolation. This is achieved by integrating Kubernetes RBAC with Google Cloud's Identity and Access Management (IAM), allowing administrators to finely tune the permissions granted to each Ray cluster. For instance, a Ray cluster that requires access to a Google Cloud Storage bucket for data ingestion or model storage can be assigned specific roles that limit its actions to reading and writing to that bucket only. This is configured by specifying the Kubernetes service account (KSA) as part of the pod template for the Ray cluster workerGroupSpec, and then linking a Google service account with appropriate permissions to the KSA using the workload identity annotation.

     Easy multi-team sharing with Kubernetes namespaces

     Out of the box, Ray does not have any security separation between Ray clusters. With Kubernetes you can leverage namespaces to create a Ray cluster per team, and use Kubernetes Role-Based Access Control (RBAC), resource quotas, and network policies. This creates a namespace-based trust boundary that allows multiple teams to each manage their Ray clusters within a larger shared Kubernetes cluster.

     Flexibility and portability

     You can use Kubernetes for more than just data and AI. As a general-purpose platform, Kubernetes is portable across clouds and on-premises, and has a rich ecosystem. With Kubernetes, you can mix Ray and non-Ray workloads on the same infrastructure, allowing the central platform team to manage a single common compute layer while leaving infrastructure and resource management to GKE. Think of it as your own personal SRE.

     Get started with KubeRay on GKE

     In conclusion, running Ray on GKE is a straightforward way to achieve scalability, cost-efficiency, fault tolerance, and isolation for your production workloads, all while ensuring cloud portability. You get the flexibility to adapt quickly to changing demands, making it an ideal choice for forward-thinking organizations in an ever-evolving generative AI landscape. To get started with KubeRay on GKE, follow these instructions. This repo has Terraform templates to run KubeRay on GPUs and TPUs, and examples for training and serving. You can also find more tutorials and code samples at the AI/ML on GKE page. View the full article
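     For readers new to the Ray task model mentioned above, here is a minimal, self-contained sketch of data-parallel execution with Ray tasks. It runs locally and is an illustration, not code from the original post.

        import ray

        # Start (or connect to) a Ray runtime; on a KubeRay cluster this would
        # pick up the cluster address from the environment.
        ray.init()

        # @ray.remote turns a plain function into a Ray task that the
        # scheduler can distribute across available workers.
        @ray.remote
        def square(x):
            return x * x

        # Launch eight tasks in parallel; .remote() returns futures immediately.
        futures = [square.remote(i) for i in range(8)]

        # ray.get blocks until all results are available.
        print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]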
  13. Disk cloning is the process of copying data from one hard disk to another. While you can perform this task using copy-and-paste methods, it’s important… The post 8 Best Open-Source Disk Cloning & Backup Tools for Linux (2024) first appeared on Tecmint: Linux Howtos, Tutorials & Guides. View the full article
  14. Years of SaaS innovation and disruption threaten the profitability of vendor-driven open source projects, forcing licensing changes. View the full article
  15. Fix: Developer Chasm To Engage More Devs With My Open Source Project. Wish I could push a git commit to move beyond initial developer engagement. A developer chasm means getting stuck in open-source community growth after the initial engagement. In this article, I share the insights that helped me successfully move open-source projects from the initial developer-engagement stage to the category-leader stage with community-led growth, drawing on my developer community-building work and my developer relations consulting for open-source projects. View the full article
  16. Edge AI is transforming the way that devices interact with data centres, challenging organisations to stay up to speed with the latest innovations. From AI-powered healthcare instruments to autonomous vehicles, there are plenty of use cases that benefit from artificial intelligence on edge computing. This blog will dive into the topic, capturing key considerations when starting an edge AI project, the main benefits, the challenges, and how open source fits into the picture.

     What is edge AI?

     AI at the edge, or edge AI, refers to the combination of artificial intelligence and edge computing. It aims to execute machine learning models on interconnected edge devices, enabling devices to make smarter decisions without always connecting to the cloud to process the data. It is called edge because the machine learning model runs near the user rather than in a data centre. Edge AI is growing in popularity as industries identify new use cases and opportunities to optimise their workflows, automate business processes or unlock new chances to innovate. Self-driving cars, wearable devices, security cameras, and smart home appliances are among the technologies that take advantage of edge AI capabilities to deliver information to users in real time when it is most essential.

     Benefits of edge AI

     Nowadays, algorithms are capable of understanding different inputs such as text, sound or images. They are particularly useful in places occupied by end users with real-world problems. These AI applications would be impractical or even impossible to deploy in a centralised cloud or enterprise data centre due to issues related to latency, bandwidth and privacy. Some of the most important benefits of edge AI are:
     • Real-time insights: Since data is analysed in real time, close to the user, edge AI enables real-time processing and reduces the time needed to complete activities and derive insight.
     • Cost savings: Depending on the use case, some data can often be processed at the edge where it is collected, so it doesn’t all have to be sent to the data centre for training the machine learning algorithms. This reduces the cost of storing the data as well as of training the model. At the same time, organisations often utilise edge AI to reduce the power consumption of the edge devices by optimising the time they are on and off, which again leads to cost reduction.
     • High availability: A decentralised way of training and running the model enables organisations to ensure that their edge devices benefit from the model even if there is a problem within the data centre.
     • Privacy: Edge AI can analyse data in real time without exposing it to humans, increasing the privacy of the appearance, voice or identity of the objects involved. For example, surveillance cameras do not need someone to look at them; instead, machine learning models send alerts depending on the use case or need.
     • Sustainability: Using edge AI to reduce the power consumption of edge devices doesn’t just minimise costs, it also enables organisations to become more sustainable, since devices are only utilised when they are needed.

     Use cases in the industrial sector

     Across verticals, enterprises are quickly developing and deploying edge AI models to address a wide variety of use cases. To get a better sense of the value that edge AI can deliver, let’s take a closer look at how it is being used in the industrial sector. Industrial manufacturers struggle with large facilities that often use a significant number of devices. A survey fielded in the spring of 2023 by Arm found that edge computing and machine learning were among the top five technologies that will have the most impact on manufacturing in the coming years.

     Edge AI use cases are often tied to the modernisation of existing manufacturing factories. They include production scheduling, quality inspection, and asset maintenance – but applications go beyond that. Their main objective is to improve the efficiency and speed of automation tasks like product assembly and quality control. Some of the most prominent use cases of edge AI in manufacturing include:
     • Real-time detection of defects as part of quality inspection processes that use deep neural networks to analyse product images. Often, this also enables predictive maintenance, helping manufacturers minimise the need to reactively fix their components by instead addressing potential issues preemptively.
     • Execution of real-time production assembly tasks based on low-latency operations of industrial robots.
     • Remote support of technicians on field tasks based on augmented reality (AR) and mixed reality (MR) devices.

     Low latency is the primary driver of edge AI in the industrial sector. However, some use cases also benefit from improved security and privacy. For example, 3D printers can use edge AI to protect intellectual property through a centralised cloud infrastructure.

     Best practices for edge AI

     Compared to other kinds of AI projects, running AI at the edge comes with a unique set of challenges. To maximise the value of edge AI and avoid common pitfalls, we recommend following these best practices (a small sketch of one of them, model-size optimisation, follows at the end of this item):
     • Edge device: At the heart of edge AI are the devices that end up running the models. They all have different architectures, features and dependencies. Ensure that the capabilities of your hardware align with the requirements of your AI model, and ensure that the software – such as the operating system – is certified on the edge device.
     • Security: Both in the data centres and on the edge devices there are artefacts that could compromise the security of an organisation. Whether we talk about the data used for training, the ML infrastructure used for developing or deploying the ML model, or the operating system of the edge device, organisations need to protect all these artefacts. Take advantage of the appropriate security capabilities to safeguard these components, such as secure packages, secure boot of the OS on the edge device, or full-disk encryption on the device.
     • Machine learning model size: The size of the machine learning model differs depending on the use case. It needs to fit on the end device it is intended to run on, so developers need to optimise the model size, which dictates the chances of deploying it successfully.
     • Network connection: The machine learning lifecycle is an iterative process, so models need to be periodically updated. The network connection therefore influences both the data collection process and the model deployment capabilities. Organisations need to check and ensure there is a reliable network connection before deploying models or building an AI strategy.
     • Latency: Organisations often use edge AI for real-time processing, so latency needs to be minimal. For example, retailers need instant alerts when fraud is detected and cannot ask customers to wait at the cashiers for minutes before confirming payment. Depending on the use case, latency needs to be assessed and considered when choosing the tooling and the model update cadence.
     • Scalability: Scale is often limited by the cloud bandwidth needed to move and process information, which leads to high costs. To ensure a broader range of scalability, the data collection and part of the data processing should happen at the edge.
     • Remote management: Organisations often have multiple devices or multiple remote locations, so scaling to all of them brings new challenges related to their management. To address these challenges, ensure that you have mechanisms in place for easy remote provisioning and automated updates.

     Edge AI with open source

     Open source is at the centre of the artificial intelligence revolution, and open source solutions can provide an effective path to addressing many of the best practices described above. When it comes to edge devices, open source technology can be used to ensure the security, robustness and reliability of both the device and the machine learning model. It gives organisations the flexibility to choose from a wide spectrum of tools and technologies, benefit from community support and quickly get started without a huge investment. Open source tooling is available across all layers of the stack, from the operating system that runs on the edge device, to the MLOps platform used for training, to the frameworks used to deploy the machine learning model.

     Edge AI with Canonical

     Canonical delivers a comprehensive AI stack with all the open source software organisations need for their edge AI projects. Canonical offers an end-to-end MLOps solution that enables you to train your models. Charmed Kubeflow is the foundation of the solution, and it is seamlessly integrated with leading open source tooling such as MLflow for model registry or Spark for data streaming. It gives organisations the flexibility to develop their models on any cloud platform and any Kubernetes distribution, offering capabilities such as user management, security maintenance of the packages used, and managed services.

     The operating system that the device runs plays an important role. Ubuntu Core is the distribution of the open source Ubuntu operating system dedicated to IoT devices. It has capabilities such as secure boot and full disk encryption to ensure the security of the device. For certain use cases, running a small cloud such as Microcloud enables unattended edge clusters to leverage machine learning. Packaging models as snaps makes them easy to maintain and update in production. Snaps offer a variety of benefits including OTA updates, auto rollback in case of failure and no-touch deployment. At the same time, for managing the lifecycle of the machine learning model and for remote management, brand stores are an ideal solution.

     To learn more about Canonical’s edge AI solutions, get in touch.

     Further reading
     • 5 Edge Computing Examples You Should Know
     • How a real-time kernel reduces latency in telco edge clouds
     • MLOps Toolkit Explained

     View the full article
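     As a hedged sketch of the model-size best practice above: post-training quantisation is one common way to shrink a model for an edge device. The snippet below uses TensorFlow Lite's converter API on a small illustrative Keras model; the model architecture and file name are assumptions, not from the original post.

        import tensorflow as tf

        # A small illustrative Keras model standing in for a real edge workload.
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(28, 28)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(10),
        ])

        # Convert to TensorFlow Lite with default post-training quantisation,
        # which typically shrinks the model and speeds up on-device inference.
        converter = tf.lite.TFLiteConverter.from_keras_model(model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        tflite_model = converter.convert()

        with open("model.tflite", "wb") as f:
            f.write(tflite_model)

        print(f"quantised model size: {len(tflite_model) / 1024:.1f} KiB")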
  17. Edge AI is transforming the way that devices interact with data centres, challenging organisations to stay up to speed with the latest innovations. From AI-powered healthcare instruments to autonomous vehicles, there are plenty of use cases that benefit from artificial intelligence on edge devices. This blog will dive into the topic, capturing key considerations when starting an edge AI project, the main benefits, the challenges, and how open source fits into the picture.

     What is edge AI?

     AI at the edge, or edge AI, refers to the combination of artificial intelligence and edge computing. It aims to execute machine learning models on connected edge devices, enabling devices to make smarter decisions without always connecting to the cloud to process the data. It is called edge because the machine learning model runs near the user rather than in a data centre. Edge AI is growing in popularity as industries identify new use cases and opportunities to optimise their workflows, automate business processes or unlock new chances to innovate. Self-driving cars, wearable devices, industrial assembly lines, and smart home appliances are among the technologies that take advantage of edge AI capabilities to deliver information to users in real time when it is most essential.

     Benefits of edge AI

     Algorithms are capable of understanding different inputs such as text, sound or images. They are particularly useful in places occupied by end users with real-world problems. These AI applications would be impractical or even impossible to deploy in a centralised cloud or enterprise data centre due to issues related to latency, bandwidth and privacy. Some of the most important benefits of edge AI are:
     • Real-time insights: Since data is analysed in real time, close to the user, edge AI enables real-time processing and reduces the time needed to complete activities and derive insight.
     • Cost savings: Depending on the use case, some data can be processed at the edge where it is collected, so it doesn’t all have to be sent to the data centre for training the machine learning algorithms. This reduces the cost of storing the data as well as of training the model. At the same time, organisations often utilise edge AI to reduce the power consumption of the edge devices by optimising the time they are on and off, which again leads to cost reduction.
     • High availability: A decentralised way of training and running the model enables organisations to ensure that their edge devices benefit from the model even if there is a problem within the data centre.
     • Privacy: Edge AI can analyse data in real time without exposing it to humans, increasing the privacy of the appearance, voice or identity of the objects involved. For example, surveillance cameras do not need someone to look at them; instead, machine learning models send alerts depending on the use case or need.
     • Sustainability: Using edge AI to reduce the power consumption of edge devices doesn’t just minimise costs, it also enables organisations to become more sustainable, since devices are only utilised when they are needed.

     Use cases in the industrial sector

     Across verticals, enterprises are quickly developing and deploying edge AI models to address a wide variety of use cases. To get a better sense of the value that edge AI can deliver, let’s take a closer look at how it is being used in the industrial sector. Industrial manufacturers struggle with large facilities that often use a significant number of devices. A survey fielded in the spring of 2023 by Arm found that edge computing and machine learning were among the top five technologies that will have the most impact on manufacturing in the coming years.

     Edge AI use cases are often tied to the modernisation of existing manufacturing factories. They include production scheduling, quality inspection, and asset maintenance – but applications go beyond that. Their main objective is to improve the efficiency and speed of automation tasks like product assembly and quality control. Some of the most prominent use cases of edge AI in manufacturing include:
     • Real-time detection of defects as part of quality inspection processes that use deep neural networks to analyse product images. Often, this also enables predictive maintenance, helping manufacturers minimise the need to reactively fix their components by instead addressing potential issues preemptively.
     • Execution of real-time production assembly tasks based on low-latency operations of industrial robots.
     • Remote support of technicians on field tasks based on augmented reality (AR) and mixed reality (MR) devices.

     Low latency is the primary driver of edge AI in the industrial sector. However, some use cases also benefit from improved security and privacy. For example, 3D printers can use edge AI to protect intellectual property through a centralised cloud infrastructure.

     Best practices for edge AI

     Compared to other kinds of AI projects, running AI at the edge comes with a unique set of challenges. To maximise the value of edge AI and avoid common pitfalls, we recommend following these best practices:
     • Edge device: At the heart of edge AI are the devices that end up running the models. They all have different architectures, features and dependencies. Ensure that the capabilities of your hardware align with the requirements of your AI model, and ensure that the software – such as the operating system – is certified on the edge device.
     • Security: Both in the data centres and on the edge devices there are artefacts that could compromise the security of an organisation. Whether we talk about the data used for training, the ML infrastructure used for developing or deploying the ML model, or the operating system of the edge device, organisations need to protect all these artefacts. Take advantage of the appropriate security capabilities to safeguard these components, such as secure packages, secure boot of the OS on the edge device, or full-disk encryption on the device.
     • Machine learning model size: The size of the machine learning model differs depending on the use case. It needs to fit on the end device it is intended to run on, so developers need to optimise the model size, which dictates the chances of deploying it successfully.
     • Network connection: The machine learning lifecycle is an iterative process, so models need to be periodically updated. The network connection therefore influences both the data collection process and the model deployment capabilities. Organisations need to check and ensure there is a reliable network connection before deploying models or building an AI strategy.
     • Latency: Organisations often use edge AI for real-time processing, so latency needs to be minimal. For example, retailers need instant alerts when fraud is detected and cannot ask customers to wait at the cashiers for minutes before confirming payment. Depending on the use case, latency needs to be assessed and considered when choosing the tooling and the model update cadence.
     • Scalability: Scale is often limited by the cloud bandwidth needed to move and process information, which leads to high costs. To ensure a broader range of scalability, the data collection and part of the data processing should happen at the edge.
     • Remote management: Organisations often have multiple devices or multiple remote locations, so scaling to all of them brings new challenges related to their management. To address these challenges, ensure that you have mechanisms in place for easy remote provisioning and automated updates.

     Edge AI with open source

     Open source is at the centre of the artificial intelligence revolution, and open source solutions can provide an effective path to addressing many of the best practices described above. When it comes to edge devices, open source technology can be used to ensure the security, robustness and reliability of both the device and the machine learning model. It gives organisations the flexibility to choose from a wide spectrum of tools and technologies, benefit from community support and quickly get started without a huge investment. Open source tooling is available across all layers of the stack, from the operating system that runs on the edge device, to the MLOps platform used for training, to the frameworks used to deploy the machine learning model.

     Edge AI with Canonical

     Canonical delivers a comprehensive AI stack with all the open source software your organisation might need for its edge AI projects. Canonical offers an end-to-end MLOps solution that enables you to train your models. Charmed Kubeflow is the foundation of the solution, and it is seamlessly integrated with leading open source tooling such as MLflow for model registry or Spark for data streaming. It gives organisations the flexibility to develop their models on any cloud platform and any Kubernetes distribution, offering capabilities such as user management, security maintenance of the packages used, and managed services.

     The operating system that the device runs plays an important role. Ubuntu Core is the distribution of the open source Ubuntu operating system dedicated to IoT devices. It has capabilities such as secure boot and full disk encryption to ensure the security of the device. For certain use cases, running a small cloud such as Microcloud enables unattended edge clusters to leverage machine learning. Packaging models as snaps makes them easy to maintain and update in production. Snaps offer a variety of benefits including OTA updates, auto rollback in case of failure and no-touch deployment. At the same time, for managing the lifecycle of the machine learning model and for remote management, brand stores are an ideal solution.

     Get started with edge AI

     Explore the Canonical solution further with our MLOps Toolkit to discover the key factors to consider when building your machine learning toolkit, which includes:
     • Hardware and software that is already tested and validated on the market
     • Open source machine learning tools for data processing and model building
     • Container solutions for orchestration
     • Cloud computing with multiple options
     • Production-grade solutions that can be rolled out within an enterprise

     Download the MLOps Toolkit here. To learn more about Canonical’s edge AI solutions, get in touch.

     Further reading
     • 5 Edge Computing Examples You Should Know
     • How a real-time kernel reduces latency in telco edge clouds
     • MLOps Toolkit Explained

     View the full article
  18. If you're contemplating the daring act of open sourcing your projects, here are some things to know before you set out. View the full article
  19. As we announced at DockerCon, we’re now providing a free Docker Scout Team subscription to all Docker-Sponsored Open Source (DSOS) program participants. If your open source project participates in the DSOS program, you can start using Docker Scout today. If your open source project is not in the Docker-Sponsored Open Source program, you can check the requirements and apply. For other customers, Docker Scout is already generally available. Refer to the Docker Scout product page to learn more.

     Why use Docker Scout?

     Docker Scout is a software supply chain solution designed to make it easier for developers to identify and fix supply chain issues before they hit production. To do this, Docker Scout:
     • Gives developers a centralized view of the tools they already use to see all the critical information they need across the software supply chain
     • Makes clear recommendations on how to address those issues, including for security issues and opportunities to improve reliability efforts
     • Provides automation that highlights new defects, failures, or issues

     Docker Scout allows you to prevent and address flaws where they start. By identifying issues earlier in the software development lifecycle and displaying information in Docker Desktop and the command line, Docker Scout reduces interruptions and rework.

     Supply chain security is a big focus in software development, with attention from enterprises and governments. Software is complex, and when security, reliability, and stability issues arise, they’re often the result of an upstream library. So developers don’t just need to address issues in the software they write but also in the software their software uses. These concerns apply just as much to open source projects as proprietary software. But the focus on improving the software supply chain results in an unfunded mandate for open source developers. A research study by the Linux Foundation found that almost 25% of respondents said the cost of security gaps was “high” or “very high.” Most open source projects don’t have the budget to address these gaps. With Docker Scout, we can reduce the burden on open source projects.

     Conclusion

     At Docker, we understand the importance of helping open source communities improve their software supply chain. We see this as a mutually beneficial relationship with the open source community. A well-managed supply chain doesn’t just help the projects that produce open source software; it helps downstream consumers through to the end user. For more information, refer to the Docker Scout documentation.

     Learn more
     • Join our “Improving Software Supply Chain Security for Open Source Projects” webinar on Wednesday, February 7, 2024 at 1 PM Eastern (1700 UTC). Watch on LinkedIn or on the Riverside streaming platform.
     • Try Docker Scout. Looking to get up and running? Use our Quickstart guide.
     • Vote on what’s next! Check out the Docker Scout public roadmap.
     • Have questions? The Docker community is here to help.
     • Not a part of DSOS? Apply now.

     View the full article
  20. CloudBees co-founder buzzes about open source drama and AI. View the full article
  21. What does it mean for a new technology to go mainstream? First released in 2005, Git was still a new open source version control system when we founded GitHub. Today, Git is a foundational element of the modern developer experience—93% of developers use it to build and deploy software everywhere.[1]

In 2023, GitHub data highlighted how another technology has quickly begun to reshape the developer experience: AI. This past year, more and more developers started working with AI, while also experimenting with building AI-powered applications. Git has fundamentally changed today’s developer experience, and now AI is setting the stage for what’s next in software development.

At GitHub, we know developers love to learn by doing, and open source helps developers more rapidly adopt new technologies, integrate them into their workflows, and build what’s next. Open source also powers nearly every piece of modern software—including much of the digital economy. As we explore how technologies become mainstream, GitHub continues to play a pivotal role in bridging the gap between experimentation and the widespread adoption of open source technologies, which underpin the foundations of our software ecosystem.

In this year’s report, we’ll study how open source activity around AI, the cloud, and Git has changed the developer experience and is increasingly driving impact among developers and organizations alike. We uncover three big trends:

Developers are building with generative AI in big numbers. We’re seeing more developers experiment with foundation models from OpenAI and other AI players, with open source generative AI projects even entering the top 10 most popular open source projects by contributor count in 2023. With almost all developers (92%) using or experimenting with AI coding tools, we expect open source developers to drive the next wave of AI innovation on GitHub.[2]

Developers are operating cloud-native applications at scale. We’re seeing an increase in declarative languages using Git-based infrastructure as code (IaC) workflows, greater standardization in cloud deployments, and a sharp increase in the rate at which developers use Dockerfiles and containers, IaC, and other cloud-native technologies.

2023 saw the largest number of first-time open source contributors. We continue to see commercially backed open source projects capture the largest share of first-time contributors and overall contributions—but this year, we also saw generative AI projects enter the top 10 most popular projects for first-time contributors. We’re also seeing notable growth in private projects on GitHub, which increased 38% year over year and account for more than 80% of all activity on GitHub.

Kyle Daigle
Chief Operating Officer // GitHub

Oh, and if you’re a visual learner, we have you covered.

A global community of developers building on GitHub

Globally, developers are using GitHub to build software and collaborate in larger numbers than ever before—and that spans public and private projects. This not only proves the foundational value of Git in today’s developer experience, but also shows the global community of developers using GitHub to build software. With 20.2 million developers and a 21% increase in developer growth over the past year, the U.S. continues to have the largest developer community globally. But since 2013, we’ve continued to see other communities account for more growth across the platform, a trend we expect to continue.
This worldwide distribution of developers on GitHub shows which regions have the most developers.

Who do we consider to be a developer? We define “developer” as anyone with a GitHub account. Why? The open source and developer communities are an increasingly diverse and global group of people who tinker with code, make non-code contributions, conduct scientific research, and more. GitHub users drive open source innovation, and they work across industries—from software development to data analysis and design.

Developer communities in Asia Pacific, Africa, South America, and Europe are getting bigger year over year—with India, Brazil, and Japan among those leading the pack.

Explore our data with the GitHub Innovation Graph

To help researchers build their own insights from GitHub data, we have released the GitHub Innovation Graph. With the GitHub Innovation Graph, researchers, policymakers, and developers can now access valuable data and insights into global developer impact to assess the influence of open source on the global economy. Through a dedicated webpage and repository, it offers quarterly data that dates back to 2020 and includes Git pushes, developers, organizations, repositories, languages, licenses, topics, and economic collaborators.

Explore the GitHub Innovation Graph >

Projecting the top 10 developer communities over the next five years

To understand which developer communities are poised to grow the most over the next five years, we built projections based on current growth rates. Under this rubric, we anticipate that India will overtake the United States as the largest developer community on GitHub by 2027. These projections assume linear growth to forecast which developer communities will be the largest on GitHub by 2028 (a rough sketch of this kind of extrapolation follows the Asia Pacific figures below).

Fastest growing developer communities in Asia Pacific

We continue to see considerable growth in the Asia Pacific region, driven by economic hubs in India, Japan, and Singapore.

     Region             # of developers      YoY growth
01   Singapore          >1M developers       39%
02   India              >13.2M developers    36%
03   Hong Kong (SAR)    >1.6M developers     35%
04   Vietnam            >1.5M developers     34%
05   Indonesia          >2.9M developers     31%
06   Japan              >2.8M developers     31%
07   The Philippines    >1.3M developers     31%
08   Thailand           >857K developers     25%
09   South Korea        >1.9M developers     22%
10   Australia          >1.4M developers     21%

Table 1: Developer growth by total developers in 2023, % increase from 2022.

India’s developer community continues to see massive year-over-year growth. In last year’s Octoverse, we predicted that India would overtake the United States in total developer population, and that’s still on track to happen. India saw a 36% year-over-year increase in its developer population, with 3.5 million new developers joining GitHub in 2023. As part of the UN-backed Digital Public Goods Alliance, India has been building its digital public infrastructure with open materials—ranging from software code to AI models—to improve digital payments and ecommerce systems. Here’s a list of open source software (OSS) projects that Indian developers have built and are contributing to on GitHub.

Singapore saw the most growth in developer population this year in APAC, and ranks first globally with the highest ratio of developers to overall population. The National University of Singapore’s School of Computing incorporates GitHub into its curriculum, and high growth may also be attributable to the country’s regulatory significance in Southeast Asia.
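To make the projection concrete, here is a minimal TypeScript sketch of this kind of extrapolation. The report describes its own model only as assuming “linear growth”, so the compounding used below, like the rounded starting figures taken from this section, is an illustrative assumption rather than the report’s actual methodology; it happens to reproduce the 2027 crossover.

```typescript
// Illustrative projection of GitHub developer communities. Starting
// figures are the ones quoted in this section (US: 20.2M, +21% YoY;
// India: 13.2M, +36% YoY). Compounding each community's current growth
// rate is an assumption for illustration only.

let usDevs = 20.2e6;
let indiaDevs = 13.2e6;

let year = 2023;
while (indiaDevs < usDevs) {
  year += 1;
  usDevs *= 1.21;    // +21% per year
  indiaDevs *= 1.36; // +36% per year
}

console.log(`India overtakes the US around ${year}`); // -> around 2027
```

Under these assumptions the crossover lands in 2027, matching the report’s headline; a model with different growth assumptions would shift that date.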
We’re also likely to see continued developer growth in Japan over the next year as a result of its investments in technology and startups.

Fastest growing developer communities in Africa

With the fastest growing population in the world and an increasing pool of developers, African regions have been identified as promising hubs for technology companies. (For example, in Kenya, teaching programming in primary and secondary school is mandatory.)

     Region          # of developers     YoY growth
01   Nigeria         >868K developers    45%
02   Ghana           >152K developers    41%
03   Kenya           >296K developers    41%
04   Morocco         >446K developers    35%
05   Ethiopia        >94K developers     32%
06   South Africa    >539K developers    30%

Table 2: Developer growth by total developers in 2023, % increase from 2022.

Nigeria is a hot spot for OSS adoption and technological investment, and its 45% year-over-year growth rate—the largest increase worldwide—reflects this. There’s also a collection of at least 200 projects on GitHub made by Nigerian developers, which can be found under the “Made in Africa” collection.

Fastest growing developer communities in South America

Developer growth rates in South America are on par with some of the fastest-growing developer communities in Asia Pacific and Africa.

     Region       # of developers     YoY growth
01   Argentina    >925K developers    33%
02   Bolivia      >105K developers    33%
03   Colombia     >872K developers    31%
04   Brazil       >4.3M developers    30%
05   Chile        >437K developers    26%

Table 3: Developer growth by total developers in 2023, % increase from 2022.

In 2023, Brazil’s developer population was the largest in this region and continues to grow by double digits, with a 30% year-over-year increase. This follows continued investment by private and public organizations in Brazil. Check out the list of OSS projects that Brazilian developers made and are contributing to on GitHub.

We’re also seeing continued growth in Argentina and Colombia, which have emerged over the last few years as popular investment targets for organizations.

Open banking systems have helped to accelerate global growth—and developer activity. Such systems have enabled Indian citizens in their country’s welfare system to receive direct benefit transfers to their bank accounts, and helped to disburse emergency funds during the pandemic. Mercado Libre serves as Latin America’s largest e-commerce and digital payments ecosystem; by using GitHub to automate deployment, security tests, and repetitive tasks, its developers stay focused on their mission to democratize commerce. Meanwhile, 70% of Brazil’s adult population and 60% of its businesses have used Pix, the country’s real-time payments infrastructure. The Central Bank of Brazil recently open sourced Pix’s communication protocols.

The bottom line: developers want to build great software and rank designing solutions to novel problems among the top things that positively impact their workdays. When investments are made to optimize the developer experience, developers can drive real-world impact that they’re proud of.

Fastest growing developer communities in Europe

Communities throughout Europe continue to see increases in their overall developer populations, but their growth now more closely mirrors that of the United States in aggregate, as communities in South America, Africa, and Asia Pacific outpace them.
     Region            # of developers     YoY growth
01   Spain             >1.5M developers    25%
02   Portugal          >410K developers    24%
03   Poland            >1.2M developers    24%
04   Germany           >2.9M developers    22%
05   Italy             >1.1M developers    22%
06   France            >2.3M developers    22%
07   United Kingdom    >3.4M developers    21%

Table 4: Developer growth by total developers in 2023, % increase from 2022.

Notably, the growth in France follows its government’s push to attract more tech startups. We’re also seeing an uptick in growth in Spain and Italy, which speaks to efforts in these two regions to bolster their domestic technology markets.

The explosive growth of generative AI in 2023

While generative AI made a splash in news headlines in 2023, it’s not entirely new to developers on GitHub. In fact, we’ve seen several generative AI projects emerge on GitHub over the past several years—and plenty of other AI-focused projects, too. But GitHub data in 2023 reflects how these AI projects have progressed from specialist-oriented work and research to more mainstream adoption, with developers increasingly using pre-trained models and APIs to build generative AI-powered applications.

Just halfway through this past year, we saw more than twice the number of generative AI projects in 2023 as in all of 2022. And we know this is just the tip of the iceberg. As more developers experiment with these new technologies, we expect them to drive AI innovation in software development and continue to bring the technology’s fast-evolving capabilities into the mainstream.

Developers are increasingly experimenting with AI models. Where in years past we saw developers building projects with machine learning libraries like tensorflow/tensorflow and pytorch/pytorch, we now see far more developers experimenting with AI models and LLMs, such as the ChatGPT API. Stay smart: we anticipate that businesses and organizations will also leverage pre-trained AI models—especially as more and more developers become familiar with building with them.

Open source AI innovation is diverse, and some of the top AI projects are owned by individual developers. Analyzing the top 20 open source generative AI projects on GitHub, some of the top projects are owned by individuals. That suggests that open source projects on GitHub continue to drive innovation and show us all what’s next in the industry, with the community building around the most exciting advancements.

Generative AI is driving a significant and global spike in individual contributors to generative AI projects, with 148% year-over-year growth—and a 248% year-over-year increase in the total number of generative AI projects, too. Notably, the United States, India, and Japan are leading the way among developer communities, with other regions, including Hong Kong (SAR), the United Kingdom, and Brazil, following.

The massive uptick in the number of developers learning about generative AI will impact businesses. As more and more developers gain familiarity with building generative AI-powered applications, we expect a growing talent pool to bolster businesses that seek to develop their own AI-powered products and services.

What will the impact of generative AI be on developers? Earlier this year, we partnered with Harvard Business School and Keystone.AI to conduct research on the economic and productivity impacts that AI will have on the developer landscape.
One of the more striking key findings we uncovered is that the productivity gains developers stand to realize from generative AI could contribute an estimated $1.5 trillion USD to the global economy, as well as an additional 15 million “effective developers” to worldwide capacity by 2030.

Learn more >

The bottom line: over the past year, we have seen exponential growth in applications built on top of foundation models, like ChatGPT, as developers use these LLMs to develop user-facing tools such as APIs, bots, assistants, mobile applications, and plugins. Developers globally are helping to lay the groundwork for mainstream adoption, and experimentation is helping to build a talent pool for organizations.

The most popular programming languages

Since the massive growth in cloud-native development we saw in 2019, IaC has continued to grow in open source. In 2023, Shell and HashiCorp Configuration Language (HCL) once again emerged as top languages across open source projects, indicating that operations and IaC work are gaining prominence in the open source space. HCL adoption registered 36% year-over-year growth, which shows that developers are building out infrastructure for their applications. The increase in HCL suggests developers are increasingly using declarative languages to describe their cloud deployments.

JavaScript has once again taken the crown as the #1 most popular language, and we continue to see familiar languages, such as Python and Java, remain in the top five year over year.

TypeScript rises in popularity. This year, TypeScript overtook Java for the first time as the third most popular language across OSS projects on GitHub, with 37% growth in its user base. A language, type checker, compiler, and language service all in one, TypeScript was launched in 2012 and marked the dawn of gradual types, which allow developers to adopt varying levels of static and dynamic typing in their code (a short sketch follows at the end of this section). Learn more about TypeScript >

There has been a notable increase in popular languages and frameworks for data analytics and operations. Venerable languages, such as T-SQL and TeX, grew in 2023, which highlights how data scientists, mathematicians, and analysts are increasingly engaging with open source platforms and tooling.

The bottom line: programming languages aren’t confined to the realm of traditional software development anymore. We see remarkable parity between the most popular languages used in projects created in 2023 and the overall most popular languages used across GitHub. Some notable outliers include Kotlin, Rust, Go, and Lua, which have seen larger growth across newer projects on GitHub.

Rust continues to rise

Amid comments from industry leaders about how systems programming should be conducted in Rust, and its inclusion in the Linux kernel, Rust continues to attract more and more developers. While its overall usage remains comparatively low, it is growing at 40% year over year and was named by the 2023 Stack Overflow developer survey as the most admired language for the eighth year in a row. Learn why Rust is so admired >

Both Rust and Lua are notable for their memory safety and efficiency—and both can be used for systems and embedded systems programming, which may account for their growth. The recent growth of Go is driven by cloud-native projects, such as Kubernetes and Prometheus.
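As promised in the TypeScript discussion above, here is a small, hypothetical sketch of what gradual typing looks like in practice. The function and interface names are invented, and the untyped version assumes a compiler configuration with noImplicitAny disabled:

```typescript
// Gradual typing: the same logic can start out effectively untyped and
// gain static types incrementally, file by file or function by function.

// Untyped: parameters default to `any` (assumes `noImplicitAny` is off),
// so nothing stops a caller from passing the wrong shape.
function totalUntyped(items) {
  return items.reduce((sum, item) => sum + item.price, 0);
}

// Typed: identical logic, now checked at compile time.
interface LineItem {
  name: string;
  price: number;
}

function total(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.price, 0);
}

console.log(total([{ name: "book", price: 12 }])); // 12
// total([{ name: "book" }]); // compile-time error: `price` is missing
```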
Defining a language vs. a framework

A programming language is a formal means of defining the syntax and semantics for writing code, and it serves as the foundation for development by specifying the logic and behavior of applications. A framework is a pre-built set of tools, libraries, and conventions designed to streamline and structure the development process for specific types of applications.

Developer activity as a bellwether of new tech adoption

In early 2023, we celebrated a milestone of more than 100 million developers using GitHub—and since last year we’ve seen a nearly 26% increase in global developer accounts on GitHub. More developers than ever collaborate across time zones and build software. Developer activity, in both private and public repositories, underscores which technologies are being broadly adopted—and which are poised for wider adoption.

Developers are automating more of their workflows. Over the past year, developers used 169% more GitHub Actions minutes to automate tasks in public projects, develop CI/CD pipelines, and more. On average, developers used more than 20 million GitHub Actions minutes a day in public projects. And the community keeps growing, with the number of GitHub Actions in the GitHub Marketplace passing the 20,000 mark in 2023. This underscores growing awareness across open source communities of automation for CI/CD and community management. (A small sketch of what a custom Action looks like appears at the end of this section.)

More than 80% of GitHub contributions are made to private repositories. That’s more than 4.2 billion contributions to private projects and more than 310 million to public and open source projects. These numbers show the sheer scale of activity happening across public, open source, and private repositories through free, Team, and GitHub Enterprise accounts. The abundance of private activity suggests the value of innersource, and how Git-based collaboration benefits the quality not just of open source but also of proprietary code. In fact, all developers in a recent GitHub-sponsored survey said their companies have adopted at least some innersource practices, and over half said there’s an active innersource culture in their organization.

GitHub is where developers operate and scale cloud-native applications. In 2023, 4.3 million public and private repositories used Dockerfiles—and more than 1 million public repositories used Dockerfiles for creating containers. This follows the increased use of Terraform and other cloud-native technologies we’ve seen over the past few years. The increased adoption of IaC practices also suggests developers are bringing more standardization to cloud deployments.

Generative AI makes its way into GitHub Actions. The early adoption and collaborative power of AI among the developer community is apparent in the 300+ AI-powered GitHub Actions and 30+ GPT-powered GitHub Actions in the GitHub Marketplace. Developers not only continue to experiment with AI, but are also bringing it to more parts of the developer experience and their workflows through the GitHub Marketplace.

How will AI change the developer experience?

92% of developers are already using AI coding tools both in and outside of work. That’s one of our key findings in a 2023 developer survey GitHub sponsored. Moreover, 81% of developers believe that AI coding tools will make their teams more collaborative. Developers in our survey indicate that collaboration, satisfaction, and productivity are all positioned to get a boost from AI coding tools.
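As promised above, this is roughly what a minimal custom GitHub Action implemented in TypeScript looks like, using the official @actions/core toolkit. The input and output names here are invented for illustration, and the action.yml metadata file that would declare them is omitted:

```typescript
// A minimal custom GitHub Action in TypeScript, using the official
// @actions/core toolkit. The "who-to-greet" input and "greeting-time"
// output are hypothetical and would be declared in an action.yml file.
import * as core from "@actions/core";

async function run(): Promise<void> {
  try {
    const name = core.getInput("who-to-greet"); // read a declared input
    core.info(`Hello, ${name}!`);               // write to the step log
    core.setOutput("greeting-time", new Date().toISOString());
  } catch (err) {
    // Mark the step as failed so the workflow surfaces the error.
    core.setFailed(err instanceof Error ? err.message : String(err));
  }
}

run();
```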
Learn more about AI’s impact on the developer experience >

The bottom line: developers experiment with new technologies and share their learnings across public and private repositories. This interdependent work has surfaced the value of containerization, automation, and CI/CD for packaging and shipping code across open source communities and companies alike.

The state of security in open source

This year, we’re seeing developers, OSS communities, and companies alike respond faster to security events with automated alerts, tooling, and proactive security measures—which is helping developers get better security outcomes, faster. We’re also seeing responsible AI tooling and research being shared on GitHub.

More developers are using automation to secure dependencies. In 2023, open source developers merged 60% more automated Dependabot pull requests for vulnerable packages than in 2022—which underscores the community’s shared dedication to open source and security. Developers across open source communities are fixing more vulnerable packages and addressing more vulnerabilities in their code thanks to free tools on GitHub, such as Dependabot, code scanning, and secret scanning.

We calculate the top 1,000 public projects by a rubric called Mona Rank, which evaluates the number of stars, forks, and unique Issue authors. We take all public, non-forked repositories with a license, calculate a rank for each of the three metrics, and then use the sum of those ranks (a minimal sketch of this rubric appears at the end of this section).

More open source maintainers are protecting their branches. Protected branches give maintainers more ways to ensure the security of their projects, and more than 60% of the most popular open source projects use them. Managing these rules at scale should get even easier now that repository rules reached general availability on GitHub earlier this year.

Developers are sharing responsible AI tooling on GitHub. In the age of experimental generative AI, we’re seeing a development trend in AI trust and safety tooling. Developers are creating and sharing tools around responsible AI, fairness in AI, responsible machine learning, and ethical AI. The Center for Security and Emerging Technology at Georgetown University is also identifying which countries and institutions are the top producers of trustworthy AI research, and sharing its research code on GitHub.

AI redefines “shift left”

AI will usher in a new era of writing secure code, according to Mike Hanley, GitHub’s Chief Security Officer and Senior Vice President of Engineering. Traditionally, “shift left” meant getting security feedback as early as possible and catching vulnerable code before it reached production. That definition is set to be radically transformed by AI, which is fundamentally changing how we can prevent vulnerabilities from ever being written. Tools like GitHub Copilot and GitHub Advanced Security bring security directly to developers as they turn their ideas into code in real time.

The bottom line: to help OSS communities and projects stay more secure, we’ve invested in making Dependabot, protected branches, CodeQL, and secret scanning available for free to public projects. New adoption metrics in 2023 show how these investments are helping more open source projects improve their overall security. We’re also seeing interest in creating and sharing responsible AI tools among software developers and institutional researchers.
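As referenced above, here is a minimal TypeScript sketch of the Mona Rank rubric as described in this section: each repository is ranked on stars, forks, and unique Issue authors, and the three per-metric ranks are summed, with lower totals ranking higher. The field names, sample data, and tie-breaking behavior are assumptions for illustration; GitHub’s actual implementation is not published here.

```typescript
// Sketch of the "Mona Rank" rubric: rank every repository on three
// metrics, then sum the per-metric ranks; lower totals rank higher.

interface Repo {
  name: string;
  stars: number;
  forks: number;
  issueAuthors: number; // count of unique Issue authors
}

function monaRank(repos: Repo[]): Repo[] {
  const metrics: (keyof Omit<Repo, "name">)[] = ["stars", "forks", "issueAuthors"];
  const totals = new Map<string, number>(repos.map((r) => [r.name, 0]));

  for (const metric of metrics) {
    // Rank 1 = highest value on this metric (ties keep input order here).
    const ordered = [...repos].sort((a, b) => b[metric] - a[metric]);
    ordered.forEach((repo, i) => {
      totals.set(repo.name, totals.get(repo.name)! + i + 1);
    });
  }
  return [...repos].sort((a, b) => totals.get(a.name)! - totals.get(b.name)!);
}

// Made-up sample data for illustration.
const repos: Repo[] = [
  { name: "alpha", stars: 50_000, forks: 9_000, issueAuthors: 4_000 },
  { name: "beta", stars: 80_000, forks: 5_000, issueAuthors: 2_500 },
  { name: "gamma", stars: 30_000, forks: 12_000, issueAuthors: 6_000 },
];
monaRank(repos).forEach((r) => console.log(r.name)); // gamma, alpha, beta
```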
The state of open source

In 2023, developers made 301 million total contributions to open source projects across GitHub, ranging from popular projects like Mastodon to generative AI projects like Stable Diffusion and LangChain. Commercially backed projects continued to attract some of the most open source contributions—but 2023 was the first year that generative AI projects also entered the top 10 most popular projects across GitHub. Speaking of generative AI, almost a third of open source projects with at least one star have a maintainer who uses GitHub Copilot.

Commercially backed projects continue to lead. In 2023, the largest projects by total number of contributors were overwhelmingly commercially backed. This continues the trend from last year, with microsoft/vscode, flutter/flutter, and vercel/next.js making our top 10 list again in 2023.

Generative AI is growing fast in open source and public projects. In 2023, we saw generative AI-based OSS projects, like langchain-ai/langchain and AUTOMATIC1111/stable-diffusion-webui, rise to the top projects by contributor count on GitHub. More developers are building LLM applications with pre-trained AI models and customizing AI apps to user needs.

Open source maintainers are adopting generative AI. Almost a third of open source projects with at least one star have a maintainer who uses GitHub Copilot. This follows our program to offer GitHub Copilot for free to open source maintainers, and shows the growing adoption of generative AI in open source.

Did you know that nearly 30% of Fortune 100 companies have Open Source Program Offices (OSPOs)? OSPOs encourage an organization’s participation in, and compliance with, open source. According to the Linux Foundation, OSPO adoption across global companies has increased by 32% since 2022, and 72% of companies are planning to implement an OSPO or OSS initiative within the next 12 months. Companies such as Microsoft, Google, Meta, Comcast, JPMorgan Chase, and Mercedes-Benz, for example, have OSPOs. We founded GitHub’s OSPO in 2021 and open sourced github-ospo to share our resources and insights. (By our count, GitHub depends on over 50K open source components.)

Learn more about OSPOs >

Developers see benefits in combining packages and containerization. As we noted earlier, 4.3 million repositories used Docker in 2023. On the other side of the coin, the Linux distribution NixOS/nixpkgs has been on the top list of open source projects by contributor count for the last two years.

First-time contributors continue to favor commercially backed projects. Last year, we found that the brand recognition of popular, commercially backed projects drew more first-time contributors than other projects. This continued in 2023, with some of the most popular open source projects among first-time contributors backed by Microsoft, Google, Meta, and Vercel. But community-driven open source projects ranging from home-assistant/core to AUTOMATIC1111/stable-diffusion-webui, langchain-ai/langchain, and Significant-Gravitas/Auto-GPT also saw a surge in activity from first-time contributors. This suggests that open experimentation with foundation models increases the accessibility of generative AI, opening the door to new innovations and more collaboration.

2023 saw the largest number of first-time contributors to open source projects. New developers became involved with the open source community through programs like freeCodeCamp, First Contributions, and GitHub Education.
We also saw a large number of developers taking part in online, open sourced education projects from the likes of Google and IBM.

Other trends to watch

Open source projects focused on front-end development continue to grow. With the continued growth of vercel/next.js and nuxt/nuxt (which came within the top 40 projects by contributor growth), we’re seeing more developers in open source and public projects engage with front-end development work.

The open source home automation project home-assistant/core hits the top contributors list again. The project has been on the top list nearly every year since 2018 (with the exception of 2021). Its continued popularity shows the strength of the project’s community-building efforts.

The bottom line: developers are contributing to open source generative AI projects, open source maintainers are adopting generative AI coding tools, and companies continue to rely on open source software. These are all indications that developers who learn in the open and share their experiments with new technologies lift an entire global network of developers—whether they’re working in public or private repositories.

Take this with you

Just as Git has become foundational to today’s developer experience, we’re now seeing evidence of the mainstream emergence of AI. In the past year alone, a staggering 92% of developers have reported using AI-based coding tools, both inside and outside of work. This past year has also seen an explosive surge in AI experimentation across open source projects hosted on GitHub. We leave you with three takeaways:

GitHub is the developer platform for generative AI. Generative AI evolved from a specialist field into mainstream technology in 2023—and an explosion of activity in open source reflects that. As more developers build and experiment with generative AI, they’re using GitHub to collaborate and collectively learn.

Developers are operating cloud-native applications at scale on GitHub. In 2019, we started to see a big jump in the number of developers using container-based technologies in open source—and the rate at which developers use Git-based IaC workflows, container orchestration, and other cloud-native technologies increased sharply in 2023. This enormous amount of activity shows that developers are using GitHub to standardize how they deploy software to the cloud.

GitHub is where open source communities, developers, and companies are building software. In 2023, we saw a 38% increase in the number of private repositories—which account for more than 81% of all activity on GitHub. But we are also seeing continued growth in the open source communities using GitHub to build what’s next and push the industry forward. With the data showing the increase in new open source developers and the rapid pace of innovation that is possible in open communities, it’s clear that open source has never been stronger.

Methodology

This report draws on anonymized user and product data taken from GitHub from October 1, 2022 through September 30, 2023. We define AI projects on GitHub by 683 repository topic terms, which you can learn more about in research we conducted in 2023 (page 25, to be exact). We also evaluate open source projects by a metric we call “Mona Rank,” a rank-based analysis of the community size and popularity of projects. More data is publicly available on the GitHub Innovation Graph—a research tool GitHub offers for organizations and individuals curious about the state of software development across GitHub.
For a complete methodology, please contact press@github.com.

Glossary

2023: a year in this report is the last 365 days since the last Octoverse release, ranging from 10/1/2022 to 9/30/2023.
Developers: individual, non-spammy user accounts on GitHub.
Public projects: any project on GitHub that is publicly available for others to contribute to, fork, clone, or engage with.
Open source projects and communities: open source projects are public repositories with an open source license.
Location: geographic information is based on the last known network location of individual users and organization profiles. We only study anonymized and aggregated location data, and never look at location data beyond the geographic region and country.
Organizations: organization accounts represent groups of people on GitHub; they can be paid or free, and big or small.
Projects and repositories: we use “repositories” and “projects” interchangeably, but recognize that larger projects can sometimes span multiple repositories.

Notes

[1] Stack Overflow, “Beyond Git: The other version control systems developers use.” January 2023.
[2] GitHub, “Survey reveals AI’s impact on the developer experience.” June 2023.
  22. Linux, the open-source operating system, has a rich and fascinating history. While many are familiar with its core principles and widespread use, there are lesser-known facts. The post 25 Interesting GNU/Linux Facts You Probably Didn’t Know first appeared on Tecmint: Linux Howtos, Tutorials & Guides. View the full article
  23. A reverse proxy server is a type of proxy server that is deployed between clients and back-end/origin servers, for example, an HTTP server such as NGINX or Apache. The post 9 Top Open Source Reverse Proxy Servers for Linux first appeared on Tecmint: Linux Howtos, Tutorials & Guides. View the full article
  24. It’s been a long journey since the first web server was released back in 1991. For quite a long time, Apache was the only mention-worthy option. The post 8 Best Open Source Web Servers in 2023 first appeared on Tecmint: Linux Howtos, Tutorials & Guides. View the full article
  25. We will explore their use cases, key features, performance metrics, supported programming languages, and more to provide a comprehensive and unbiased overview of each database.View the full article