Jump to content

Search the Community

Showing results for tags 'apache'.

  • Search By Tags

    Type tags separated by commas.
  • Search By Author

Content Type


Forums

  • General
    • Welcome New Members !
    • General Discussion
    • Site News
  • DevOps & SRE
    • DevOps & SRE General Discussion
    • Data Engineering, Data Science & AI
    • Development & Programming
    • CI/CD & GitOps
    • Docker, Containers, Microservices & Serverless
    • Infrastructure-as-Code
    • Kubernetes
    • Linux
    • Monitoring, Observability & Logging
    • Security
  • Cloud Providers
    • Amazon Web Services
    • Google Cloud Platform
    • Microsoft Azure
    • Red Hat OpenShift

Calendars

  • DevOps Events

Find results in...

Find results that contain...


Date Created

  • Start

    End


Last Updated

  • Start

    End


Filter by number of...

Joined

  • Start

    End


Group


Website URL


LinkedIn Profile URL


About Me


Development Experience


Cloud Experience


Current Role


Skills


Certifications


Favourite Tools


Interests

Found 25 results

  1. This post is the second part of our two-part series on the latest performance improvements of stateful pipelines. The first part of this... View the full article
  2. Data Engineering Tools in 2024 The data engineering landscape in 2024 is bustling with innovative tools and evolving trends. Here’s an updated perspective on some of the key players and how they can empower your data pipelines: Data Integration: Informatica Cloud: Still a leader for advanced data quality and governance, with enhanced cloud-native capabilities. MuleSoft Anypoint Platform: Continues to shine in building API-based integrations, now with deeper cloud support and security features. Fivetran: Expands its automated data pipeline creation with pre-built connectors and advanced transformations. Hevo Data: Remains a strong contender for ease of use and affordability, now offering serverless options for scalability. Data Warehousing: Snowflake: Maintains its edge in cloud-based warehousing, with improved performance and broader integrations for analytics. Google BigQuery: Offers even more cost-effective options for variable workloads, while deepening its integration with other Google Cloud services. Amazon Redshift: Continues to be a powerful choice for AWS environments, now with increased focus on security and data governance. Microsoft Azure Synapse Analytics: Further integrates its data warehousing, lake, and analytics capabilities, providing a unified platform for diverse data needs. Data Processing and Orchestration: Apache Spark: Remains the reigning champion for large-scale data processing, now with enhanced performance optimizations and broader ecosystem support. Apache Airflow: Maintains its popularity for workflow orchestration, with improved scalability and user-friendliness. Databricks: Expands its cloud-based platform for Spark with advanced features like AI integration and real-time streaming. AWS Glue: Simplifies data processing and ETL within the AWS ecosystem, now with serverless options for cost efficiency. Emerging Trends: GitOps: Gaining traction for managing data pipelines with version control and collaboration, ensuring consistency and traceability. AI and Machine Learning: Increasingly integrated into data engineering tools for automation, anomaly detection, and data quality improvement. Serverless Data Processing: Offering cost-effective and scalable options for event-driven and real-time data processing. Choosing the right tools: With this diverse landscape, selecting the right tools depends on your specific needs. Consider factors like: Data volume and complexity: Match tool capabilities to your data size and structure. Cloud vs. on-premises: Choose based on your infrastructure preferences and security requirements. Budget: Evaluate pricing models and potential costs associated with each tool. Integration needs: Ensure seamless compatibility with your existing data sources and BI tools. Skillset: Consider the technical expertise required for each tool and available support resources. By carefully evaluating your needs and exploring the strengths and limitations of these top contenders, you’ll be well-equipped to choose the data engineering tools that empower your organization to unlock valuable insights from your data in 2024. The post Data Engineering Tools in 2024 appeared first on DevOpsSchool.com. View the full article
  3. In enterprises, SREs, DevOps, and cloud architects often discuss which platform to choose for observability for faster troubleshooting of issues and understanding about performance of their production systems. There are certain questions they need to answer to get maximum value for their team, such as: Will an observability tool support all kinds of workloads and heterogeneous systems? Will the tool support all kinds of data aggregation, such as logs, metrics, traces, topology, etc..? Will the investment in the (ongoing or new) observability tool be justified? In this article, we will provide the best way to get started with unified observability of your entire infrastructure using open-source Skywalking and Istio service mesh. View the full article
  4. The post 16 Apache Web Server Security and Hardening Tips first appeared on Tecmint: Linux Howtos, Tutorials & Guides .Apache web server is one of the most popular and widely used web servers for hosting files and websites. It’s easy to install and configure The post 16 Apache Web Server Security and Hardening Tips first appeared on Tecmint: Linux Howtos, Tutorials & Guides.View the full article
  5. Today, we are excited to announce that Amazon EMR on EKS now supports managed Apache Flink, available in public preview. With this launch, customers who already use EMR can run their Apache Flink application along with other types of applications on the same Amazon EKS cluster, helping improve resource utilization and simplify infrastructure management. For customers who already run big data frameworks on Amazon EKS, they can now let Amazon EMR automate provisioning and management. View the full article
  6. We are excited to launch two new features that help enforce access controls with Amazon EMR on EC2 clusters (EMR Clusters). These features are supported with jobs that are submitted to the cluster using the EMR Steps API. First is Runtime Role with EMR Steps. A Runtime Role is an AWS Identity and Access Management (IAM) role that you associate with an EMR Step. An EMR Step uses this role to access AWS resources. The second is integration with AWS Lake Formation to apply table and column-level access controls for Apache Spark and Apache Hive jobs with EMR Steps. View the full article
  7. Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 3.1.1 and 3.2.0 for new and existing clusters. Apache Kafka 3.1.1 and Apache Kafka 3.2.0 includes several bug fixes and new features that improve performance. Some of the key features include enhancements to metrics and the use of topic IDs. MSK will continue to use and manage Zookeeper for quorum management in this release for stability. For a complete list of improvements and bug fixes, see the Apache Kafka release notes for 3.1.1 and 3.2.0. View the full article
  8. Amazon EMR release 6.6 now supports Apache Spark 3.2, Apache Spark RAPIDS 22.02, CUDA 11, Apache Hudi 0.10.1, Apache Iceberg 0.13, Trino 0.367, and PrestoDB 0.267. You can use the performance-optimized version of Apache Spark 3.2 on EMR on EC2, EKS, and recently released EMR Serverless. In addition Apache Hudi 0.10.1 and Apache Iceberg 0.13 are available on EC2, EKS, and Serverless. Apache Hive 3.1.2 is available on EMR on EC2 and EMR Serverless. Trino 0.367 and PrestoDB 0.267 are only available on EMR on EC2. View the full article
  9. AWS Glue can now connect to Apache Kafka using additional client authentication mechanisms. AWS Glue now supports SASL (Simple Authentication and Security Layer) using either SCRAM (Salted Challenge Response Authentication Mechanism) or GSSAPI (Kerberos). View the full article
  10. GoAccess is an interactive and real-time web server log analyzer program that quickly analyze and view web server logs. It comes as an open-source and runs as a command line in Unix/Linux operating systems. The post GoAccess (A Real-Time Apache and Nginx) Web Server Log Analyzer first appeared on Tecmint: Linux Howtos, Tutorials & Guides. View the full article
  11. Amazon EMR on Amazon EKS provides a new deployment option for Amazon EMR that allows you to run Apache Spark on Amazon Elastic Kubernetes Service (Amazon EKS). If you already use Amazon EMR, you can now run Amazon EMR based applications with other types of applications on the same Amazon EKS cluster to improve resource utilization and simplify infrastructure management across multiple AWS Availability Zones. If you already run big data frameworks on Amazon EKS, you can now use Amazon EMR to automate provisioning and management, and run Apache Spark up to 3x faster. With this deployment option, you can focus on running analytics workloads while Amazon EMR on Amazon EKS builds, configures, and manages containers. View the full article
  12. We are pleased to announce the general availability of Amazon MSK Serverless, a type of Amazon MSK cluster that makes it easier for developers to run Apache Kafka without having to manage capacity. MSK Serverless automatically provisions and scales compute and storage resources and offers throughput-based pricing, so you can use Apache Kafka on demand and pay for the data you stream and retain. View the full article
  13. Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 2.7.1 for new and existing clusters. Apache Kafka 2.7.1 includes several bug fixes. For a complete list of fixes, see the Apache Kafka release notes for 2.7.1. View the full article
  14. Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 2.8.0 for new and existing clusters. Apache Kafka 2.8.0 includes several bug fixes and new features that improve performance. Some of the key features include connection rate limiting to avoid problems with misconfigured clients (KIP-612) and topic identifiers which provides performance benefits (KIP-516). There is also an early access feature to replace zookeeper with a self-managed metadata quorum (KIP-500), however this is not recommended for use in production. For a complete list of improvements and bug fixes, see the Apache Kafka release notes for 2.8.0. View the full article
  15. Developing your website from scratch can be a daunting task. It’s time-consuming and expensive if you are planning to hire a developer. An easy way to get your blog or website off the ground The post How to Install Drupal with Apache on Debian and Ubuntu first appeared on Tecmint: Linux Howtos, Tutorials & Guides. View the full article
  16. Apache Flink Kinesis Consumer now supports Enhanced Fan Out (EFO) and the HTTP/2 data retrieval API for Amazon Kinesis Data Streams. EFO allows Amazon Kinesis Data Streams consumers to scale by offering each consumer a dedicated read throughput up to 2MB/second. The HTTP/2 data retrieval API reduces latency of data delivery from producers to consumers to 70 milliseconds or better. In combination, these two features allow you to build low latency Apache Flink applications that utilize dedicated throughput from Amazon Kinesis Data Streams. View the full article
  17. Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 2.7.0 for new and existing clusters. Apache Kafka 2.7.0 includes several bug fixes and new features that improve performance. Some key features include the ability to throttle create topic, create partition, and delete topic operations (KIP-599) and configurable TCP connection timeout (KIP-601). For a complete list of improvements and bug fixes, see the Apache Kafka release notes for 2.7.0. View the full article
  18. Amazon Managed Workflows is a new managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as “workflows”. View the full article
  19. Amazon Kinesis Data Analytics for Apache Flink now provides access to the Apache Flink Dashboard, giving you greater visibility into your applications and advanced monitoring capabilities. You can now view your Apache Flink application’s environment variables, over 120 metrics, logs, and the directed acyclic graph (DAG) of the Apache Flink application in a simple, contextualized user interface. View the full article
  20. You can now build and run streaming applications using Apache Flink version 1.11 in Amazon Kinesis Data Analytics for Apache Flink. Apache Flink v1.11 provides improvements to the Table and SQL API, which is a unified, relational API for stream and batch processing and acts as a superset of the SQL language specially designed for working with Apache Flink. Apache Flink v1.11 capabilities also include an improved memory model and RocksDB optimizations for increased application stability, and support for task manager stack traces in the Apache Flink Dashboard. View the full article
  21. Amazon Neptune now supports Apache TinkerPop 3.4.8 in the latest engine release, 1.0.4.0, improving the development experience for Gremlin users. View the full article
  22. Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 2.6.0 for new and existing clusters. Apache Kafka 2.6.0 includes several bug fixes and new features that improve performance. Some key features include native APIs to manage client quotas (KIP-546) and explicit rebalance triggering to enable advanced consumer usecases (KIP-568). For a complete list of improvements and bug fixes, see the Apache Kafka release notes for 2.6.0. View the full article
  23. Streaming extract, transform, and load (ETL) jobs in AWS Glue can now read data encoded in the Apache Avro format. Previously, streaming ETL jobs could read data in the JSON, CSV, Parquet, and XML formats. With the addition of Avro, streaming ETL jobs now support all the same formats as batch AWS Glue jobs. View the full article
  24. Streaming extract, transform, and load (ETL) jobs in AWS Glue can now ingest data from Apache Kafka clusters that you manage yourself. Previously, AWS Glue supported reading specifically from Amazon Managed Streaming for Apache Kafka (Amazon MSK). With this update, AWS Glue allows you to perform streaming ETL on data from Apache Kafka whether it is deployed on-premises or in the cloud. View the full article
  25. Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 2.6.2 for new and existing clusters. Apache Kafka 2.6.2 includes several bug fixes and security fixes. Version 2.6.2 will replace 2.6.1 as the default recommended version for new clusters created in Amazon MSK. For a complete list of fixes, see the Apache Kafka release notes for 2.6.2. View the full article
  • Forum Statistics

    39.7k
    Total Topics
    39.9k
    Total Posts
×
×
  • Create New...