Showing results for tags 'delta lake'.

Found 19 results

  1. What is type widening and why does it matter? Continue reading on Towards Data Science » View the full article
  2. Data + AI Summit
    Experience everything that Summit has to offer. Attend all the parties, build your session schedule, enjoy the keynotes, and then watch it all again on demand. Highlights include Expo access to 150+ partners and hundreds of Databricks experts, 500+ breakout sessions and keynotes, 20+ hands-on trainings, four days of food and beverage, networking events and parties, and on-demand session streaming after the event. Join leading experts, researchers, and open source contributors from Databricks and across the data and AI community who will speak at Data + AI Summit. Over 500 sessions cover everything from data warehousing and governance to the latest in generative AI. Join thousands of data leaders, engineers, scientists, and architects to explore the convergence of data and AI, and the latest advances in Apache Spark™, Delta Lake, MLflow, PyTorch, dbt, Presto/Trino, and much more. You'll also get a first look at new products and features in the Databricks Data Intelligence Platform. Connect with thousands of data and AI community peers and grow your professional network in social meetups, on the Expo floor, or at the event party. Register: https://dataaisummit.databricks.com/flow/db/dais2024/landing/page/home Further details: https://www.databricks.com/dataaisummit/
  3. Delta Lake is an open-source optimized storage layer that provides a foundation for tables in lakehouses, bringing reliability and performance improvements to existing data lakes. It sits on top of your data lake storage (such as cloud object stores) and provides a performant, scalable metadata layer over data stored in the Parquet format. Organizations use BigQuery to manage and analyze all data types, structured and unstructured, with fine-grained access controls. In the past year, customer use of BigQuery to process multiformat, multicloud, and multimodal data using BigLake has grown over 60x. Support for open table formats gives you the flexibility to use existing open-source and legacy tools while getting the benefits of an integrated data platform. This is enabled via BigLake, a storage engine that lets you store data in open file formats on cloud object stores such as Google Cloud Storage and run Google-Cloud-native and open-source query engines on it in a secure, governed, and performant manner. BigLake unifies data warehouses and lakes by providing an advanced, uniform data governance model. This week at Google Cloud Next '24, we announced that this support now extends to the Delta Lake format, enabling you to query Delta Lake tables stored in Cloud Storage or Amazon Web Services S3 directly from BigQuery, without having to export or copy the data or maintain manifest files. Why is this important? If you have existing dependencies on Delta Lake and prefer to continue using it, you can now leverage BigQuery's native support. Google Cloud provides an integrated, price-performant experience for Delta Lake workloads, encompassing unified data management, centralized security, and robust governance. Many customers already harness the capabilities of Dataproc or Serverless Spark to manage Delta Lake tables on Cloud Storage.
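To make the "metadata layer over Parquet" idea concrete, here is a deliberately simplified toy model in plain Python. It is not the real Delta Lake protocol or the Spark/delta-rs implementation, and all names are illustrative: immutable data files are listed by an ordered JSON transaction log (mimicking Delta's `_delta_log` directory), and reading the table means replaying that log.

```python
import json
import pathlib
import tempfile

# Toy sketch of Delta Lake's core idea (illustrative only): data lives in
# immutable Parquet files, while an ordered JSON transaction log records
# which files make up each version of the table.

def commit(table_dir: pathlib.Path, version: int, add_files: list) -> None:
    """Write one commit to the transaction log, listing newly added files."""
    log_dir = table_dir / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    actions = [{"add": {"path": p}} for p in add_files]
    # Commit files are zero-padded so lexicographic order == commit order.
    (log_dir / f"{version:020d}.json").write_text(
        "\n".join(json.dumps(a) for a in actions)
    )

def snapshot(table_dir: pathlib.Path) -> list:
    """Replay the log in order to list the data files in the latest snapshot."""
    files = []
    for commit_file in sorted((table_dir / "_delta_log").glob("*.json")):
        for line in commit_file.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.append(action["add"]["path"])
    return files

table = pathlib.Path(tempfile.mkdtemp()) / "events"
commit(table, 0, ["part-000.parquet"])
commit(table, 1, ["part-001.parquet"])
print(snapshot(table))  # ['part-000.parquet', 'part-001.parquet']
```

Because readers consult only the log, a table version is an atomic, consistent list of files; the real protocol layers transactions, schema, and statistics on the same mechanism, which is what engines like BigQuery read to query Delta tables in place.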
Now, BigQuery's native Delta Lake support enables seamless delivery of data to downstream applications such as business intelligence and reporting, as well as integration with Vertex AI. This lets you do a number of things, including:

    • Build a secure and governed lakehouse with BigLake's fine-grained security model
    • Securely exchange Delta Lake data using Analytics Hub
    • Run data science workloads on Delta Lake using BigQuery ML and Vertex AI

How to use Delta Lake with BigQuery

Delta Lake tables follow the same table creation process as BigLake tables.

Required roles: to create a BigLake table, you need the following BigQuery identity and access management (IAM) permissions: bigquery.tables.create and bigquery.connections.delegate.

Prerequisites: before you create a BigLake table, you need a dataset and a Cloud resource connection that can access Cloud Storage.

Table creation using DDL: here is the DDL statement to create a Delta Lake table:

    CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.DELTALAKE_TABLE_NAME`
    WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
    OPTIONS (
      format = "DELTA_LAKE",
      uris = ['DELTA_TABLE_GCS_BASE_PATH']);

Querying Delta Lake tables: after creating a Delta Lake BigLake table, you can query it using GoogleSQL syntax, the same as you would a standard BigQuery table. For example:

    SELECT FIELD1, FIELD2 FROM `PROJECT_ID.DATASET.DELTALAKE_TABLE_NAME`;

You can also enforce fine-grained security at the table level, including row-level and column-level security. For Delta Lake tables based on Cloud Storage, you can also use dynamic data masking.

Conclusion: We believe that BigQuery's support for Delta Lake is a major step forward for customers building lakehouses using Delta Lake.
This integration will make it easier for you to get insights from your data and make data-driven decisions. We are excited to see how you use Delta Lake and BigQuery together to solve your business challenges. For more information on how to use Delta Lake with BigQuery, please refer to the documentation. Acknowledgments: Mahesh Bogadi, Garrett Casto, Yuri Volobuev, Justin Levandoski, Gaurav Saxena, Manoj Gunti, Sami Akbay, Nic Smith and the rest of the BigQuery Engineering team. View the full article
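The CREATE EXTERNAL TABLE statement in the article above is purely parametric, so when creating many Delta Lake BigLake tables it can be assembled programmatically. The helper below is hypothetical (build_delta_ddl is not part of any Google client library), shown only to make the DDL's structure explicit; you would pass the resulting string to BigQuery (for example via the bq CLI or a client library) to execute it.

```python
# Hypothetical helper (not a Google client-library API) that assembles the
# CREATE EXTERNAL TABLE DDL for a Delta Lake BigLake table from its parts.

def build_delta_ddl(project: str, dataset: str, table: str,
                    region: str, connection: str, gcs_path: str) -> str:
    return (
        f"CREATE EXTERNAL TABLE `{project}.{dataset}.{table}`\n"
        f"WITH CONNECTION `{project}.{region}.{connection}`\n"
        "OPTIONS (\n"
        '  format = "DELTA_LAKE",\n'
        f"  uris = ['{gcs_path}']);"
    )

# Example values are placeholders, not real resources.
ddl = build_delta_ddl("my-project", "analytics", "events",
                      "us", "my-conn", "gs://my-bucket/delta/events")
print(ddl)
```

Note that the connection is referenced as project.region.connection_id, matching the Cloud resource connection created as a prerequisite, while uris points at the Delta table's base path rather than at individual Parquet files.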
  4. This week we hosted Google Cloud Next in San Francisco! It was great to be back together in person with the Google Cloud community and ecosystem. Highlights included the opening keynote, packed with breakthrough product announcements and customer stories, which you can also watch in just 13 minutes. Also check out the fun and inspirational developer keynote, complete with an original song and cool demos. We hosted 250+ educational breakout sessions, and thousands of developers were buzzing around our Innovators Hive. On top of that, we loved having partners integrated throughout Next, from the show floor to the sessions, to the evening parties throughout the week. [Images from Google Cloud Next '23] Generative AI was a big focus throughout, particularly the many new offerings across Vertex AI and Duet AI, highlighting all the new ways to cloud. [Opening video for the Google Cloud Next '23 keynote: "The new way to cloud"] Too much happened this week to share it all here, but we've pulled together a summary of all of the announcements so you can feel like you were there. Read on for the complete list of 161 (!) announcements. Data and AI Cloud. AI platforms and tools: 1. Duet AI in Google Cloud is now in preview with new capabilities, and general availability coming later this year. 2. Vertex AI Search and Conversation, formerly Enterprise Search on Generative AI App Builder and Conversational AI on Generative AI App Builder, are both generally available to make it fast and easy to build generative chatbots and custom search engines. 3. New multi-turn search in Vertex AI Search supports follow-up questions without starting the interaction over. In addition, we built out the Vertex AI platform: 4.
We added new models to Vertex AI Model Garden, including Meta's Llama 2 and Code Llama and Technology Innovation Institute's Falcon LLM, and pre-announced Anthropic's Claude 2. 5. The PaLM 2 foundation model now supports 38 languages, and 32,000-token context windows that make it possible to process long documents in prompts. 6. The Codey chat and code generation model offers up to a 25% quality improvement in major supported languages for code generation and code chat. 7. The Imagen image generation model features improved visual appeal, image editing, captioning, a new tuning feature to align images to guidelines with 10 or fewer samples, and visual question answering, as well as digital watermarking functionality powered by Google DeepMind SynthID. 8. Adapter tuning in Vertex AI is generally available for PaLM 2 for text. 9. Reinforcement Learning from Human Feedback (RLHF) is now in public preview. 10. New Vertex AI Extensions let models take actions and retrieve specific information in real time and act on behalf of users across Google and third-party applications like Datastax, MongoDB, and Redis. 11. New Vertex AI data connectors help ingest data from enterprise and third-party applications like Salesforce, Confluence, and JIRA. 12. Grounding in Vertex AI roots generative outputs in your enterprise data, to increase confidence in your generative AI search and conversational applications. 13. Vertex AI now supports Ray, an open-source unified compute framework to scale AI and Python workloads. 14. We announced Colab Enterprise, a managed service in public preview that combines the ease of use of Google's Colab notebooks with enterprise-level security and compliance support capabilities. 15. Next month, we'll make Med-PaLM 2, our medically tuned version of PaLM 2, available as a preview to more customers in the healthcare and life sciences industry. And to enhance MLOps for generative AI, we announced: 16.
Automatic Metrics in Vertex AI lets you evaluate a model based on a defined task and "ground truth" dataset. 17. Automatic Side by Side in Vertex AI uses a large model to evaluate the output of multiple models being tested, helping to augment human evaluation at scale. 18. A new generation of Vertex AI Feature Store, now built on BigQuery, helps you avoid data duplication and preserve data access policies. Data analytics: We announced a wealth of features across our data analytics offerings: 19. Duet AI is in preview across a variety of products in the data analytics portfolio such as Looker, BigQuery, and Dataplex. 20. BigQuery Studio, in preview, is a single interface for data engineering, analytics, and predictive analysis to simplify end-to-end data workflows. 21. We announced enhanced support for open source formats like Hudi and Delta Lake within BigLake, and added performance acceleration for Apache Iceberg. 22. BigLake can now be your single lakehouse, with cross-cloud materialized views and cross-cloud joins in BigQuery Omni. 23. Spark integration on Google Distributed Cloud extends the power of fast analytical query processing to on-premises environments to support data residency requirements. 24. New governance capabilities in Dataplex for data lineage, quality, and metadata management help users understand what data to analyze, and train ML models on trusted data sources to help improve accuracy. 25. BigQuery data clean rooms can help you understand your Google and YouTube campaign performance. 26. Now you can access Vertex AI foundation models, including PaLM 2, directly from BigQuery. 27. New model inference in BigQuery lets you run model inferences across formats like TensorFlow, ONNX, and XGBoost, and new capabilities for real-time inference can identify patterns and automatically generate alerts. 28. For model tuning, we added vector and semantic search in BigQuery. 29.
You can automatically synchronize vector embeddings in BigQuery with Vertex AI Feature Store for model grounding. 30. Now you can access thousands of datasets from hundreds of providers including Acxiom, Bloomberg, Equifax, Nielsen, and Zoominfo directly in BigQuery. Databases: Transactional and operational databases are the lifeblood of the organization. 31. AlloyDB AI is available in preview via downloadable AlloyDB Omni to help you build enterprise gen AI applications everywhere. 32. AlloyDB Omni, the downloadable edition that runs on Google Cloud, AWS, Azure, on premises, and on your laptop, is moving from technology preview to public preview. 33. Duet AI in Database Migration Service provides AI-assisted code conversion to automate the last-mile conversion of Oracle database code to PostgreSQL. Sign up for the preview. 34. Duet AI in Cloud Spanner is in preview to help with generating code to structure, modify, or query your data using natural language. 35. Support for Oracle in Database Migration Service with Cloud SQL for PostgreSQL as the target is now GA. 36. Cloud Spanner Data Boost, now GA, lets you analyze your Spanner data via services such as BigQuery, Spark on Dataproc, or Dataflow with virtually no impact on your transactional workloads. 37. Support for auto-generated keys in Cloud Spanner allows schema authors to push down critical identity logic into the database. 38. A new BigQuery Export to Bigtable Reverse ETL feature in preview lets you serve analytical insights from your applications without having to touch any ETL tools. 39. The fully managed Memorystore for Redis Cluster is available in preview. This is an easy-to-use, open-source-compatible Redis Cluster service that provides up to 60 times more throughput than Memorystore for Redis, with microsecond latencies. 40. The new Google Cloud Ready for Cloud SQL program recognizes partner solutions that have met integration requirements with Cloud SQL. 41.
The Bigtable change streams feature, now GA, allows you to capture data changes to a Bigtable table as the changes happen, letting you stream them for processing or analysis. 42. Cloud Bigtable request priorities, in private preview, let you execute large workloads that are not time-sensitive (e.g., analytical queries) as low-priority jobs on a Bigtable cluster, minimizing the impact of batch processing on serving workloads. 43. You can now create a copy of a Cloud Bigtable backup and store it in any project or region where you have a Bigtable instance, and retain your backups for up to 90 days. Business Intelligence: We extended our Looker business intelligence ecosystem and showed that the future of Looker is powered by AI. 44. Duet AI in Looker is now in preview. 45. The Looker semantic layer is now open to Tableau in preview, and to Microsoft Power BI in general availability. 46. Looker Studio users now have native access to the Alteryx Designer Cloud for data preparation, and enhanced cloud connectivity, starting with Microsoft Excel and CSV files from storage platforms including SharePoint and OneDrive. 47. An integration with Looker partner Sisu Data will be generally available later this quarter, helping customers determine the root cause of changes, spot data outliers, and identify next steps for analysis. Dev Cloud: Google Cloud's Dev Cloud covers all our infrastructure — compute, network, storage, regions, etc. — as well as platforms and developer tools. Compute: We talked about chips that are optimized for AI workloads: 48. A3 VMs, based on NVIDIA H100 GPUs and delivered as a GPU supercomputer, will be generally available next month. 49. The new Google Cloud TPU v5e, in preview, has up to 2x higher training performance per dollar and up to 2.5x inference performance per dollar for LLMs and generative AI models compared to Cloud TPU v4. 50.
New Multislice technology in preview lets you scale AI models beyond the boundaries of physical TPU pods, with tens of thousands of Cloud TPU v5e or TPU v4 chips. 51. Support for Cloud TPUs in GKE is now available for Cloud TPU v5e and Cloud TPU v4. 52. Support for AI inference on Cloud TPUs is in preview. We enhanced our general-purpose Compute Engine families with new models and features: 53. The Arm-based C3A, powered by AmpereOne processors, will be in preview next month. 54. A preview of AMD-based C3D VMs is coming next month, and will offer the largest instance sizes in our general-purpose portfolio, with up to 360 vCPUs and 2.8TB of DDR5 memory. 55. We unveiled Titanium, a system of purpose-built, custom silicon and multiple tiers of offloads that enables offerings like Hyperdisk and our 3rd-generation C3 VMs. 56. C3 VM support for Hyperdisk Extreme is now in preview and supports 500K IOPS. 57. The SSD-based Hyperdisk Balanced is now in preview, with up to 2x the performance of the previous-generation Persistent Disk SSD. 58. Hyperdisk Storage Pools is now in preview, allowing customers to provision capacity and performance in aggregate pools, and then thinly provision Hyperdisk volumes from those pools. 59. Our VMs have new uptime SLAs: a 99.95% uptime SLA for memory-optimized VMs, and 99.9% (up from 99.5%) for all other VM families. 60. Future reservations, now in preview, is a new Compute Engine feature that allows you to reserve compute capacity for a future date. 61. C3 is now certified for SAP. 62. Workload Manager has new capabilities for securing and deploying SAP workloads, in preview. 63. In preview in October, we'll offer a new VMware Engine node with 2TB RAM, options from 64 to 128 vCPUs, and up to 50TB storage for Google Cloud VMware Engine, as well as three new storage options: NetApp Volumes, Filestore, and storage-only (vSAN) nodes. Networking 64. We announced Cross-Cloud Network, an open and programmable global cloud networking platform. 65.
Private Service Connect now supports 20+ different Google and partner managed services. 66. Cloud Firewall Plus, a new cloud-first next-generation firewall (NGFW) tier, is in preview. 67. The new Network Service Integration Manager simplifies the setup and operation of partner NGFWs from Check Point, Cisco, Fortinet, and Palo Alto Networks. 68. VPC spokes support in Network Connectivity Center, in preview, lets you scale VPC connectivity, providing reachability between a large number of VPC spokes. 69. The internal Application Load Balancer now supports global access, which allows private clients from any Google Cloud region to access internal load balancers residing in any other Google Cloud region, and global backends, which allow internal Application Load Balancers to health-check and send traffic to globally distributed backend services. 70. New Service Extensions callouts for Cloud Load Balancers let you customize services such as specialized monitoring, logging, traffic steering, or authentication. 71. A new automation solution toolkit for the Google Cloud Load Balancers global frontend lets you integrate and automate products including Cloud Armor, Cloud Load Balancing, and Cloud CDN into popular CI/CD platforms. 72. Cloud Application Load Balancers now support cross-project service referencing. 73. New mTLS client-side authentication is available for global external Application Load Balancers. 74. Auto-deployment for Cloud Armor Adaptive Protection is now generally available. 75. Palo Alto Networks is integrating its Prisma Access natively in Google Cloud. 76. Broadcom is integrating its Secure Web Gateway natively in Google Cloud. 77. The Titanium Network Adapter, part of our Titanium system of offloads and available in A3, C3, and H3 VMs, delivers up to twice the throughput and three times the packet processing speed of prior VM generations.
Hybrid and multi-cloud: We added capabilities and features to our Google Distributed Cloud family, GDC Edge and GDC Hosted: 78. GDC Hosted now offers pre-trained models for speech, translation, optical character recognition (OCR), and Workbench. 79. Vertex Prediction and Vertex Pipelines are coming to GDC Hosted in preview in Q2 2024, as is the Document Translation API service. 80. Database Service for GDC Hosted will now support AlloyDB Omni as a new managed database engine in preview. 81. GDC Edge will support Dataproc in preview in Q4 2023. 82. The new GDC hardware stack features 4th Gen Intel® Xeon® Scalable Processors and high-performance network fabrics with up to 400 Gbps throughput. 83. We introduced new hardware configurations for GDC Edge that are ruggedized and optimized for retail stores and restaurants. 84. GDC Hosted now offers support for new guest operating systems, customer-managed VM images, a package repository for VM runtime configuration, simplified networking, backup and restore tooling, a CLI, and streamlined APIs. 85. The new GKE Enterprise will be included in all Google Distributed Cloud deployments at no additional cost. 86. Elasticsearch, MongoDB Enterprise Advanced, and SAP on Google Distributed Cloud are available through Google Cloud Marketplace. 87. Boundary proxy on GDC Edge, in preview, provides visibility and auditability by inspecting and logging all management traffic between Google Cloud and GDC Edge. 88. A new Bastion host on GDC Edge, in preview, enables you to easily view and control Google Cloud's access to GDC Edge for troubleshooting purposes, supporting you with operational sovereignty and compliance requirements. 89. Now you can automatically create and provide the resource mapping for all your managed keys with key tracking on GDC Hosted, in preview. 90. BYO-certificate on GDC Hosted, in preview, enables you to upload certificates issued by a third-party certificate authority. 91.
With new survivability mode on GDC Edge, now in preview, you can continue on-prem operations on GDC Edge even if you lose connectivity to Google Cloud. Platforms 92. GKE Enterprise, a new premium edition of our flagship container orchestration platform, lets platform teams manage large Kubernetes deployments across multiple clusters and multiple teams around the world, all from a single management console. 93. GKE now supports Cloud TPU v5e, A3 VMs with NVIDIA H100 GPUs, and Google Cloud Storage FUSE on GKE (GA). 94. Duet AI in GKE (preview) provides generative AI assistance specifically trained on GKE documentation to help platform teams cut down on the time it takes to learn and operate Kubernetes. Developer experience: We took our commitment to an exceptional developer experience up a notch: 95. Jump Start Solutions, application and infrastructure solutions, are now GA. 96. Application Integration — a no-code integration platform as a service (iPaaS) designed to empower you to weave together your applications — is now GA. 97. We partnered with GitLab to offer a secure DevOps solution with integrated source management, artifact management, CI/CD, and enhanced security features. 98. Duet AI is available console-wide, including in GKE, Cloud Run, and the Cloud operations suite. It will also be available in Apigee API Management and Application Integration in preview for trusted testers. Migration and Management: 99. We've revamped (re-RaMPed?) our Rapid Migration Program, including the new Cloud Capability Assessment (CCA) for RaMP online assessment and a detailed new migration-planning dashboard. 100. Migration Center, a unified migration service within the Google Cloud console, is now GA, and includes the new SAP Cost Estimator. 101. Google Cloud VMware Engine is now available in 18 regions, most recently in Turin, Italy (europe-west12), Santiago (southamerica-west1), and Delhi (asia-south2). 102.
The new flexible ve2 node platform powered by 3rd Generation Intel® Xeon® Scalable Processors (formerly code-named Ice Lake) and next-generation VMware Engine networking are now available for Google Cloud VMware Engine. 103. Event-driven transfer in Storage Transfer Service is now GA. 104. Storage Transfer Service now offers support for transferring data from on-premises Hadoop Distributed File System (HDFS) sources to Cloud Storage, currently available to trusted testers. 105. SAP Security Validation, now available in public preview for Google Cloud Workload Manager, provides SAP-specific infrastructure and application-level security checks. 106. Workload Manager's Guided Deployment Automation service is now available in public preview and generates custom automation code to modernize to S/4HANA. Geo and sustainability: 107. The Google Maps Platform team introduced a suite of Environment APIs for solar, air quality, and pollen. Security Cloud: 108. Duet AI in Mandiant Threat Intelligence, in preview, helps surface the most prevalent tactics, techniques, and procedures (TTPs) used by threat actors against organizations by summarizing our frontline threat intelligence. 109. Duet AI in Chronicle Security Operations, in preview, aids in threat detection, investigation, and response for cyber defenders by simplifying search, complex data analysis, and threat detection engineering. 110. Duet AI in Security Command Center, in preview, offers near-instant analysis of security findings and possible attack paths. 111. Mandiant Hunt for Chronicle, now in preview, provides continual threat hunting by Mandiant experts on Chronicle data. 112. Agentless vulnerability scanning, powered by Tenable, has been integrated into Security Command Center to detect operating system, software, and network vulnerabilities on Compute Engine virtual machines, now in preview. 113.
Security Command Center now allows organizations to design their own customized posture findings (GA) and threat detectors (in preview). 114. Confidential Computing running on 4th Gen Intel Xeon Scalable CPUs with TDX technology is now in private preview. 115. We expanded the coverage footprint of our Sensitive Data Protection offerings with enhanced integration for Dataplex and Dialogflow (now GA), and in preview for Cloud SQL. 116. Assured Workloads Japan Regions is now in preview, offering customers controlled environments that enforce data residency in our Japanese regions, options for local control of encryption keys, and administrative access transparency. 117. The official Google Cloud Certified Professional Cloud Security Engineer Exam Guide is now available. Collaboration Cloud: Google Workspace 118. Duet AI for Google Workspace is generally available, and you can get started with a no-cost trial. 119. Duet AI for Google Workspace can create a whole new presentation in Slides, complete with text, charts, and images, based on your relevant content in Drive and Gmail. 120. Duet AI in Google Meet helps you look and sound your best with new AI-powered enhancements, including studio look, studio lighting, and studio sound, as well as dynamic tiles and automatic face detection so remote attendees can see everyone in a meeting room, with each in-person attendee getting their own video tile with their name. 121. Duet AI in Google Meet can capture notes, action items, and video snippets in real time with the "take notes for me" feature, and send a recap to attendees after the meeting. It can even catch latecomers up with "summary so far," or attend a meeting in your place with "attend for me." 122. We're enhancing smart reply in Gmail with Duet AI, allowing you to draft longer personalized replies with a single tap. 123. The enhanced Google Chat experience is available with powerful new features, including Duet AI in Google Chat.
To help you keep up with messaging, we're bringing direct messages and spaces together into a unified conversation list with a chronological home view, @-mentions, and starred conversations; intelligent prioritization of your messages, based on your communication patterns, is coming early next year. 124. Google Chat will soon support up to 500,000 participants in a single space, to help build thriving communities, even in the largest organizations. 125. Huddles in Chat are a new way for teams to communicate in real time, using quick-to-join audio and video conversations, powered by Google Meet. With huddles, instead of jumping out of the conversation into a meeting, the meeting integrates directly and smoothly into the Chat experience. 126. More third-party and Google Workspace apps are supported in Chat, including an updated Google Drive app that lets you respond to comments and sharing requests, as well as new support for Workday, Loom, Zoho, and LumApps. Mio will provide support for message interoperability with other major platforms. 127. In the lead-up to Next, we announced new capabilities, including those powered by AI, to help prevent cyber threats, provide safer work with built-in zero-trust controls, and better support our customers' digital sovereignty and compliance needs. Customers 128. We announced the winners of our Google Cloud Customer Awards. 129. AdoreMe, a direct-to-consumer brand, uses generative AI in Google Workspace to improve customer experiences, create production-worthy marketing materials, and speed up innovation. 130. Bayer Pharmaceuticals is working with Google Cloud to use gen AI, including testing Med-PaLM 2. 131. Canoo, a leading advanced mobility company, will deploy a range of Google Cloud's AI, data management, and security technologies to maximize the value of the data coming from its electric vehicles. 132.
CAPCOM is harnessing Google Cloud’s infrastructure to support its global player base for AAA game launches like Street Fighter 6. 133. Dun & Bradstreet is collaborating with Google Cloud on gen AI to drive innovation across multiple applications. 134. The Government of El Salvador announced a multi-year deal with Google Cloud to digitally transform the country with Google Distributed Cloud (GDC) Edge, as well as to establish an office presence and Center of Excellence (CoE) in the country. 135. The Estée Lauder Companies Inc. (ELC) and Google Cloud announced an expansion of their strategic partnership to pioneer new uses of gen AI for online consumers. 136. Fox Sports will use Google Cloud’s gen AI capabilities to quickly search footage from more than 1.9 million videos and create custom video content in near-real-time to delight and engage its audiences. 137. GE Appliances (GEA) is expanding its partnership with Google Cloud to enhance and personalize consumer experiences, like recipe creation and appliance maintenance, with gen AI. 138. General Motors and Google Cloud will share new details on how the two companies are collaborating to bring conversational AI technology into millions of GM vehicles. 139. Ginkgo Bioworks, which is building the leading platform for cell programming and biosecurity, and Google Cloud announced a five-year strategic cloud and AI partnership, intended to enable Ginkgo to develop and deploy AI tools for biology and biosecurity. 140. Hackensack Meridian Health will leverage Google Cloud’s gen AI technologies to develop and deploy solutions that will help clinical staff to focus more on care, improve overall decision-making, and better personalize the patient experience. 141. HCA Healthcare and Google Cloud will improve workflows on time-consuming tasks such as clinical documentation so physicians and nurses can focus more on patient care with gen AI. 142. 
Huma will use Google Cloud's gen AI to enhance its regulated disease management platform. 143. Infinitus is using Google Cloud's gen AI to streamline provider-payor interactions, simplify resource-intensive operations, and improve response times. 144. Meditech is working with Google Cloud to empower employees, and ultimately better serve its healthcare provider customers with gen AI, including Med-PaLM 2. 145. MSCI is expanding its partnership with Google Cloud to accelerate the development of gen AI solutions for the financial services industry. 146. Runway has chosen Google Cloud to build, accelerate, and manage model deployment and bring gen AI to more creatives. 147. Six Flags will use Google Cloud's gen AI to create an in-park app that helps improve visitor experience and park operations in real time. Startups and Public Sector 148. Eleven more leading generative AI startups have chosen Google Cloud to develop gen AI and bring their products to market. 149. The new Google for Startups Startup School: AI is a six-week in-depth online AI training course that will launch in January 2024. 150. A new VC AMA ebook for startups, Startup Advice from VCs: 10 Lessons from the Trenches, contains valuable takeaways from the VC AMA Series and is available for download. 151. Nine startups announced new products based on Google Cloud in the Startup Lounge. 152. We announced the winners of the 2023 Google Cloud Customer Awards in Government and Education. Congratulations to: Federal Emergency Management Agency (FEMA), United States Postal Service (USPS), United States Air Force Research Lab (AFRL), State of Hawaii DHS, Salk Institute for Biological Studies, Hawaii Department of Transportation (HDOT), New York State Department of Environmental Conservation (NY DEC), and Minnesota Department of Public Safety (MN DPS). Partners 153. We recognized our annual Partners of the Year. 154.
A new Google Cloud Generative AI Partner Initiative provides partners with AI journey maps and learning assets, and separate “Services” and “Build” tracks for partners working with gen AI. 155. DocuSign is working with Google Cloud to pilot how Vertex AI could be used to help generate smart contract assistants that can summarize, explain and answer what’s in complex contracts and other documents. 156. SAP is working with us to develop new solutions utilizing Google Cloud foundation models via Vertex AI and SAP data, beginning with solutions to enhance customers’ sustainability initiatives and improve safety and manufacturing processes for the automotive industry. 157. Workday’s applications for Finance and HR are now live on Google Cloud, and they are working with us to develop new gen AI capabilities within the flow of Workday. 158. We forged new or expanded partnerships with data providers such as Acxiom, Bloomberg, CoreLogic, Dun & Bradstreet, Equifax, NIQ and TransUnion to provide access to their datasets from BigQuery. 159. Technology and solutions in the new Data and AI Cloud for Supply Chain helps customers build data-driven supply chains. 160. The new Industry Value Networks (IVN) initiative combines expertise and offerings from systems integrators (SIs), independent software vendors (ISVs) and content partners to create comprehensive, differentiated, repeatable, and high-value solutions that accelerate time-to-value and reduce risk for customers. Next ‘24!Last but not least, we’re excited to do it all over again, but even bigger next year so please save the date: 161. Google Cloud Next ‘24 will take place April 9-11 in Las Vegas. That was a lot! A huge thank you to our Google Cloud team, customers, and partners who came together to make this all happen. Together, we’re building the new way to cloud.
  5. Organizations that are effective at using data and AI are more profitable than their competitors and see improved performance across a variety of business metrics, according to recent research. Already, 81% of organizations have increased their data and analytics investments over the previous two years. However, many organizations still struggle to extract the full business value of their data, with over 40% citing disparate analytics tools and data sources, and poor data quality, as their biggest challenges. Google Cloud is in a unique position to offer a unified, intelligent, open, and secure data and AI cloud for organizations. Thousands of customers across industries worldwide use Dataproc, Dataflow, BigQuery, BigLake, and Vertex AI for data-to-AI workflows. Today, we are excited to announce BigQuery Studio — a unified, collaborative workspace for Google Cloud’s data analytics suite that helps accelerate data-to-AI workflows, from data ingestion and preparation to analysis, exploration, and visualization, all the way to ML training and inference. It allows data practitioners to:

Use SQL, Python, Spark, or natural language directly within BigQuery, and reuse those code assets easily across Vertex AI and other products for specialized workflows
Extend software development best practices such as CI/CD, version history, and source control to data assets, enabling better collaboration
Uniformly enforce security policies and gain governance insights through data lineage, profiling, and quality, right inside BigQuery

Single interface for all data teams

Disparate tools create inconsistent experiences for analytics professionals, requiring them to use multiple connectors for data ingestion, switch between coding languages, and transfer data assets between systems. This significantly impacts the time-to-value of organizations’ data and AI investments.
BigQuery Studio addresses these challenges by bringing an end-to-end analytics experience to a single, purpose-built platform. It provides a unified workspace with both a SQL and a notebook interface (powered by Colab Enterprise, currently in preview), allowing data engineers, data analysts, and data scientists to perform end-to-end tasks, including data ingestion, pipeline creation, and predictive analytics, all using the coding language of their choice. For example, data scientists can now use Python in a familiar Colab notebook environment for data analysis and exploration at petabyte scale, right inside BigQuery. BigQuery Studio’s notebook environment supports browsing of datasets and schemas, autocompletion of dataset and column names, and querying and transformation of data. Furthermore, the same Colab Enterprise notebook can be accessed in Vertex AI for ML workflows such as model training, customization, deployment, and MLOps.

Notebook experience in BigQuery Studio

Additionally, by leveraging BigLake, with built-in support for Apache Parquet, Delta Lake, and Apache Iceberg, BigQuery Studio provides a single pane of glass for working with structured, semi-structured, and unstructured data of all formats across cloud environments such as Google Cloud, AWS, and Azure. Shopify, a leading commerce platform, has been exploring how BigQuery Studio complements its existing BigQuery environment. "Shopify has invested in employing a team with a diverse array of skill sets to remain ahead of trends for data science and engineering. In early testing with BigQuery Studio, we liked Google's ability to connect different tools for different users within a simplified experience.
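The notebook workflow described above mostly comes down to composing SQL and handing it to the BigQuery client from Python. A minimal sketch of that pattern follows; the project, dataset, and column names are hypothetical placeholders, and the actual client call is shown commented out so the snippet stays self-contained:

```python
# Minimal sketch of the notebook workflow described above: compose an
# exploration query, then hand it to the BigQuery client. The table and
# column names here are invented for illustration.
def build_exploration_query(table: str, dimension: str, limit: int = 10) -> str:
    """Return a simple group-by query counting rows per dimension value."""
    return (
        f"SELECT {dimension}, COUNT(*) AS n\n"
        f"FROM `{table}`\n"
        f"GROUP BY {dimension}\n"
        f"ORDER BY n DESC\n"
        f"LIMIT {limit}"
    )

sql = build_exploration_query("my_project.analytics.events", "event_type")
print(sql)

# Inside a BigQuery Studio notebook you would then run something like:
#   from google.cloud import bigquery
#   df = bigquery.Client().query(sql).to_dataframe()
```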
We see this as an opportunity to reduce friction across our team without sacrificing the scale we expect from BigQuery." - Zac Roberts, Data Engineering Manager, Shopify

Maximize productivity and collaboration

BigQuery Studio improves collaboration among data practitioners by extending software development best practices such as CI/CD, version history, and source control to analytics assets, including SQL scripts, Python scripts, notebooks, and SQL pipelines. Additionally, users will be able to securely connect to their favorite external code repositories, so that their code never falls out of sync.

Version control for data assets

In addition to enabling human collaboration, BigQuery Studio provides an AI-powered collaborator for contextual chat and code assistance. Duet AI in BigQuery understands the context of each user and their data, and uses it to auto-suggest functions and code blocks for SQL and Python. Through the new chat interface, data practitioners can use natural language to get personalized, real-time guidance on performing specific tasks, reducing the need for trial and error and for searching documentation for a needle in a haystack.

Code composition, code completion, and chat interface in Duet AI in BigQuery

Unified security and governance

BigQuery Studio lets organizations derive trusted insights from trusted data by helping users understand data, identify quality issues, and diagnose problems. Data practitioners can track data lineage, profile data, and enforce data-quality constraints to help ensure that data is high-quality, accurate, and reliable. Later this year, BigQuery Studio will surface personalized metadata insights, such as summaries of datasets or recommendations for deeper analysis. Additionally, BigQuery Studio allows admins to uniformly enforce security policies for data assets, reducing the need to copy, move, or share data outside of BigQuery for advanced workflows.
With unified credential management across BigQuery and Vertex AI, policies are enforced with fine-grained security, without the need to manage additional external connections or service accounts. For example, using simple SQL in BigQuery, data analysts can now use Vertex AI’s foundation models for images, videos, text, and language translation for tasks like sentiment analysis and entity detection over BigQuery data, without having to share data with third-party services.

Data quality, lineage, and profiling

What BigQuery Studio customers are saying

"Our data & analytics team is ceaselessly committed to staying ahead of the curve of data engineering and data science. During our initial trials with BigQuery Studio, we were particularly impressed by Google's prowess in integrating diverse tools into a singular, streamlined experience. This fusion not only diminishes friction but also significantly amplifies our team's efficiency, a testament to the power of BigQuery." - Vinícius dos Santos Mello, Staff Data Engineer, Hurb

"As an early adopter of BigQuery Studio, we were impressed with its ability to not only minimize friction but also ensure robust data protection and centralization. The added support for Pandas DataFrames will further streamline our processes, saving valuable time for our team to collaborate and stay ahead of the curve." - Sr. Director Analytics Engineering, Aritzia

“Duet AI in BigQuery has helped our data team at L’Oréal accelerate our transformation by making it easier for us to explore, understand, and use our data. With Duet AI, we can quickly query our data to get the insights we need to make better decisions for our business. We are excited to continue working with Duet AI to further our transformation and achieve our business goals.” - Antoine Castex, Data Platform Architect, L’Oréal

Getting started

BigQuery Studio is now available for customers in preview. Check out the documentation to learn more, and sign up to get started today.
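Calling a Vertex AI foundation model over BigQuery data, as described above, is typically done with BigQuery's `ML.GENERATE_TEXT` table function. The sketch below composes such a statement as a string; the project, dataset, model, and table names are hypothetical, and the exact option names are abbreviated from the documented syntax, so treat this as illustrative rather than copy-paste-ready:

```python
# Compose an illustrative ML.GENERATE_TEXT statement that asks a remote
# Vertex AI foundation model to classify sentiment, row by row. All object
# names below are invented placeholders.
def build_sentiment_query(model: str, table: str, text_col: str) -> str:
    prompt_expr = (
        "CONCAT('Classify the sentiment of this review as "
        "positive, negative, or neutral: ', " + text_col + ")"
    )
    return (
        "SELECT ml_generate_text_llm_result AS sentiment\n"
        "FROM ML.GENERATE_TEXT(\n"
        "  MODEL `" + model + "`,\n"
        "  (SELECT " + prompt_expr + " AS prompt FROM `" + table + "`),\n"
        "  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output))"
    )

sql = build_sentiment_query("my_proj.my_ds.palm_model",
                            "my_proj.my_ds.reviews", "review_text")
print(sql)
```

Because the model is referenced as a BigQuery object, the data never leaves BigQuery, which is the security point the paragraph above makes.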
  6. The emergence of generative AI is poised to become one of the most significant technological shifts in modern memory, opening up endless transformative possibilities for enterprises. Our customers are already seeing incredible benefits with AI. Organizations like TIME are exploring new possibilities with gen AI to engage with customers and build a more robust community. Wendy’s is innovating fast-food order management, Orange is exploring next-generation contact centers, and Priceline uses BigQuery’s AI capabilities with its own proprietary algorithms to offer customers personalized services and product recommendations. The new functionality we are announcing today unlocks even more ways for customers to innovate. Data is at the center of gen AI, which is why we are bringing new innovations to our Data and AI Cloud to help companies activate their data with AI. First, we are interconnecting your data and workloads by announcing BigQuery Studio, a single interface for data engineering, analytics, and predictive analysis that simplifies end-to-end data workflows. Additional features provide data teams with a simplified data foundation, including enhanced support for unstructured data, cross-cloud analytics, secure data sharing, and governance. Second, we are bringing AI to your data in BigQuery through integration with Vertex AI foundation models. New innovations for real-time model inference and vector embeddings allow you to securely run generative AI at scale on your business data. Lastly, we are boosting the productivity of your data teams with a preview of Duet AI in Google Cloud, reimagining data work in products such as Looker, BigQuery, and Dataplex. These innovations will help organizations harness the potential of data and AI to realize business value — from personalizing customer experiences, improving supply chain efficiency, and reducing operating costs, to driving incremental revenue.
Interconnect end-to-end workflows and data in BigQuery

Data teams work with different tools for managing data warehouses, data lakes, governance, and machine learning, which can slow productivity. To interconnect the ways teams work with data, we are announcing BigQuery Studio. Now available in preview, BigQuery Studio provides customers with a single interface for data analytics in Google Cloud. Now you can bring your data engineering, analytics, and predictive analysis together, simplifying how data teams work across end-to-end workflows without having to switch between tools. "Shopify has invested in employing a team with a diverse array of skill sets to remain ahead of trends for data science and engineering. In early testing with BigQuery Studio, we liked Google's ability to connect different tools for different users within a simplified experience. We see this as an opportunity to reduce friction across our team without sacrificing the scale we expect from BigQuery." - Zac Roberts, Data Engineering Manager, Shopify

BigQuery Studio also allows data teams to write SQL, Python, Spark, and other languages to easily run analytics at petabyte scale without additional infrastructure management overhead. Notebooks are the preferred environment for writing and editing Python, so we’ve integrated BigQuery Studio with Colab Enterprise, a new offering that brings Google Cloud enterprise security and compliance support to the popular Colab data science notebook developed by Google Research.

BigQuery Studio provides a single interface for data engineering, analytics, and predictive analysis

To provide flexibility of choice in your data science notebook, we are extending partnerships with Hex, Deepnote, and Jupyter. In addition, DataFrames in BigQuery provide the data structure for teams to use their favorite notebook with large datasets that would typically surpass memory limits. Disconnected and unstructured data present additional challenges for data teams.
Large amounts of valuable data are contained in videos, documents, log files, and audio recordings, and this data can be used with generative AI. Today, we are adding new capabilities to unify your structured business data with unstructured data, and to help provide secure access to it, without the need to move it. These innovations include:

Enhanced support for open source formats like Hudi and Delta Lake within BigLake, which unifies data lakes and warehouses to break down data silos. Customer use of BigLake to combine data lake and warehouse workloads across clouds has grown 27x, to hundreds of petabytes, in the past six months. Innovations also include performance acceleration for Apache Iceberg, which provides continuous data optimization for large-scale ingestion.
Enhancements to analyze and train on your data without moving it, with cross-cloud materialized views and cross-cloud joins in BigQuery Omni. Now companies can bring together data across multiple clouds in a single lakehouse. In addition, Spark integration on Google Distributed Cloud extends fast analytical query processing to on-premises environments to support data residency requirements.
New governance capabilities in Dataplex for data lineage, quality, and metadata management, which help users understand what data to analyze and train ML models on trusted data sources to improve accuracy.
New privacy-centric connections, including BigQuery data clean rooms and Ads Data Hub for Marketers, which can help you understand your Google and YouTube campaign performance.

Bring AI to your data to manage, create, and scale generative AI

The importance of AI and data continues to be a major focus for our customers. Many data teams are using their analytical data warehouses and lakes to build ML models, with BigQuery ML as their starting point. In fact, customer use of BigQuery ML has seen over 250% query growth in the past two years.
This year, customers have run hundreds of millions of prediction and training queries in BigQuery ML. To help you get improved insights from your data with generative AI, we are announcing access to Vertex AI foundation models, including PaLM 2, directly from BigQuery. This removes complexity and allows data teams to scale simple SQL statements securely against large language models, opening up endless possibilities for insights. Using new model inference in BigQuery, customers can run model inferences across formats like TensorFlow, ONNX, and XGBoost. In addition, new capabilities for real-time inference can identify patterns and automatically generate alerts. Faraday, a leading customer prediction platform, previously had to build data pipelines and join multiple datasets. Now, not only can they simplify sentiment analysis, but they can also take the customer sentiment, join it with additional customer first-party data, and feed it back into the LLMs to generate hyper-personalized content — all within BigQuery.

BigQuery integration with Vertex AI foundation models helps data teams manage and securely scale generative AI.

For model tuning, we are adding vector and semantic search in BigQuery. Vector and text embeddings allow for efficient search and retrieval of unstructured data, such as text or images. This capability powers gen AI applications to more efficiently retrieve unstructured data and provide context to LLMs. In addition, customers can automatically synchronize vector embeddings in BigQuery with Vertex AI Feature Store for model grounding. Access to trusted data is critical to building and training new AI models — particularly specialized models for industries like financial services, retail, and manufacturing. We offer datasets in BigQuery from leading data providers including CoreLogic, Dun & Bradstreet, and TransUnion.
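The vector search capability described above rests on a simple idea: embed documents and queries as vectors, then rank documents by similarity to the query vector. A self-contained sketch with tiny invented 4-dimensional "embeddings" (a real system would use vectors from an embedding model and an index, not a linear scan):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 4-dimensional "embeddings" standing in for vectors a real embedding
# model would produce for each document.
documents = {
    "refund policy":    [0.9, 0.1, 0.0, 0.2],
    "shipping times":   [0.1, 0.8, 0.3, 0.0],
    "product warranty": [0.7, 0.2, 0.1, 0.4],
}

def top_k(query_vec, k=2):
    """Return the k document names most similar to the query vector."""
    ranked = sorted(documents.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Pretend embedding of the question "can I get my money back?"
query = [0.85, 0.15, 0.05, 0.25]
print(top_k(query))  # most similar documents first
```

In a retrieval-augmented setup, the top-ranked documents would then be passed to the LLM as context, which is the grounding pattern the paragraph describes.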
Now, customers can use thousands of datasets from hundreds of providers, including Acxiom, Bloomberg, Equifax, Nielsen, and ZoomInfo. The availability of these datasets helps make Google Cloud the best place for enterprises to build and train new AI models.

Boost data team productivity with Duet AI

To help data teams of all skill levels solve their everyday work challenges and boost productivity, we’re announcing that our always-on, generative AI-powered collaborator, Duet AI, is in preview across a variety of products in our portfolio, such as Looker, BigQuery, and Dataplex. Powered by Google's state-of-the-art foundation models, these innovations can help data teams clean data, prepare it for analysis, answer questions, and predict trends. To provide insights to non-technical users via natural language, we are announcing Duet AI in Looker, which enables fast, simple conversational queries that let you get answers and refine results into visuals and reports. In addition, Duet AI provides automatic presentation creation with intelligent summaries, formula and visual assist to quickly create calculations, and the ability to rapidly create code using LookML with an understanding of intent. By bringing generative AI and natural language search together, our vision for Duet AI in Looker is to allow you to “talk” with your business data, much in the same way you “ask Google” a question. It’s like having a brilliant data analyst available to every employee.

Duet AI in Looker provides conversational queries for non-technical users to gain rapid insight from visuals and reports

Duet AI in BigQuery is a collaborative experience integrated directly into the BigQuery interface. It provides contextual assistance for writing SQL queries and Python code, allowing data teams to focus more on analyses and outcomes. It can auto-suggest code in real time, generate full functions and code blocks, and recommend fixes.
And, for improved access to trusted data, Duet AI in Dataplex provides metadata search using natural language for a view of your ML assets and datasets. Sign up now to try Duet AI. “Duet AI in BigQuery provides contextual awareness and extends our investment in Google Cloud's integrated data platform. We see this as an architectural advantage, eliminating the need to train, host, and manage custom models.” - VP of Data Engineering, Aritzia

Simplicity and scale built for the era of AI

Google has changed the way the world accesses information. Now, with Google’s Data and AI Cloud, we can bring new levels of simplicity, scale, security, and intelligence to your business data. To learn more about product innovations, hear customer stories, and gain hands-on knowledge from our developer experts, join our data analytics spotlights and breakout sessions at Google Cloud Next, or watch them on-demand.
  7. This post was co-authored by Dan Russ, Associate Director, and Sacha Abinader, Managing Director, from Accenture. The year 2022 was a notable one in the history of our climate—it stood as the fifth warmest year ever recorded[1]. An increase in extreme weather conditions, from devastating droughts and wildfires to relentless floods and heat waves, made its presence felt more than ever before—and 2023 seems poised to shatter still more records. These unnerving circumstances demonstrate the ever-growing impact of climate change as the planet continues to warm.

Microsoft’s sustainability journey

At Microsoft, our approach to mitigating the climate crisis is rooted both in addressing the sustainability of our own operations and in empowering our customers and partners on their journey to net-zero emissions. In 2020, Microsoft set out with a robust commitment: to be a carbon-negative, water-positive, and zero-waste company, while protecting ecosystems, all by the year 2030. Three years later, Microsoft remains steadfast in its resolve. As part of these efforts, Microsoft has launched Microsoft Cloud for Sustainability, a comprehensive suite of enterprise-grade sustainability management tools aimed at supporting businesses in their transition to net zero. Moreover, our contributions to several global sustainability initiatives aim to benefit every individual and organization on this planet. Microsoft has accelerated the availability of innovative climate technologies through our Climate Innovation Fund and is working hard to strengthen our climate policy agenda. Microsoft’s focus on sustainability forms the backdrop for the topic tackled in this blog post: our partnership with Accenture on applying AI technologies to the challenging problem of methane emissions detection, quantification, and remediation in the energy industry.
“We are excited to partner with Accenture to deliver methane emissions management capabilities. This combines Accenture’s deep domain knowledge with Microsoft’s cloud platform and expertise in building AI solutions for industry problems. The result is a solution that solves real business problems and also makes a positive climate impact.”—Matt Kerner, CVP, Microsoft Cloud for Industry, Microsoft.

Why is methane important?

Methane is approximately 85 times more potent than carbon dioxide (CO2) at trapping heat in the atmosphere over a 20-year period. It is the second most abundant anthropogenic greenhouse gas after CO2, accounting for about 20 percent of global emissions. The global oil and gas industry is one of the primary sources of methane emissions. These emissions occur across the entire oil and gas value chain, from production and processing to transmission, storage, and distribution. The International Energy Agency (IEA) estimates that it is technically possible to avoid around 75 percent of today’s methane emissions from global oil and gas operations. These statistics drive home the importance of addressing this critical issue.

Microsoft’s investment in Project Astra

Microsoft has signed on to the Project Astra initiative—together with leading energy companies, public sector organizations, and academic institutions—in a coordinated effort to demonstrate a novel approach to detecting and measuring methane emissions from oil and gas production sites. Project Astra entails an innovative sensor network that harnesses advances in methane-sensing technologies, data sharing, and data analytics to provide near-continuous monitoring of methane emissions across oil and gas facilities. Once operational, this kind of smart digital network would allow producers and regulators to pinpoint methane releases for timely remediation.
Accenture and Microsoft—the future of methane management

Attaining the goal of net-zero methane emissions is becoming increasingly possible. The technologies needed to mitigate emissions are maturing rapidly, and digital platforms are being developed to integrate complex components. As Accenture’s recent methane thought-leadership piece, “More than hot air with methane emissions,” argues, what is needed now is a shift from a reactive paradigm to a preventative one, in which the critical issue of leak detection and remediation is transformed into leak prevention by leveraging advanced technologies.

Accenture’s specific capabilities and toolkit

To date, the energy industry’s approach to methane management has been fragmented, comprising a host of costly monitoring tools and equipment siloed across various operational entities. These siloed solutions have made it difficult for energy companies to accurately analyze emissions data at scale and remediate problems quickly. What has been lacking is a single, affordable platform that can integrate these components into an effective methane emissions mitigation tool. These components include enhanced detection and measurement capabilities, machine learning for better decision-making, and modified operating procedures and equipment that make “net-zero methane” happen faster. These platforms are being developed now and can accommodate a wide variety of technology solutions that will form the digital core necessary to achieve a competitive advantage. Accenture has created a Methane Emissions Monitoring Platform (MEMP) that integrates multiple data streams and embeds key methane insights into business operations to drive action (see Figure 1 below).

Figure 1: Accenture’s Methane Emissions Monitoring Platform (MEMP).
The cloud-based platform, which runs on Microsoft Azure, enables energy companies both to measure baseline methane emissions in near real-time and to detect leaks using satellites, fixed-wing aircraft, and ground-level sensing technologies. It is designed to integrate multiple data sources to optimize venting, flaring, and fugitive emissions. Figure 2 below illustrates the aspirational end-to-end process incorporating Microsoft technologies. MEMP also facilitates connectivity with back-end systems responsible for work order creation and management, including the scheduling and dispatching of field crews to remediate specific emission events.

Figure 2: The Methane Emissions Monitoring Platform workflow (aspirational).

Microsoft’s AI tools powering Accenture’s Methane Emissions Monitoring Platform

Microsoft has provided a number of Azure-based AI tools for tackling methane emissions, including tools that support sensor placement optimization, a digital twin for methane Internet of Things (IoT) sensors, anomaly (leak) detection, and emission source attribution and quantification. These tools, when integrated with Accenture’s MEMP, allow users to monitor alerts in near real-time through a user-friendly interface, as shown in Figure 3.

Figure 3: MEMP landing page visualizing wells, IoT sensors, and work orders.

“Microsoft has developed differentiated AI capabilities for methane leak detection and remediation, and is excited to partner with Accenture in integrating these features into their Methane Emissions Monitoring Platform, to deliver value to energy companies by empowering them on their path to net-zero emissions.”—Merav Davidson, VP, Industry AI, Microsoft.

Methane IoT sensor placement optimization

Placing sensors in strategic locations to ensure maximum potential coverage of the field and timely detection of methane leaks is the first step toward building a reliable end-to-end IoT-based detection and quantification solution.
Microsoft’s solution for sensor placement uses geospatial, meteorological, and historical leak-rate data, together with an atmospheric dispersion model, to model methane plumes from sources within the area of interest and obtain a consolidated view of emissions. It then selects the best locations for sensors using either a mathematical programming optimization method, a greedy approximation method, or an empirical downwind method that considers the dominant wind direction, subject to cost constraints. In addition, Microsoft provides a validation module to evaluate the performance of any candidate sensor placement strategy. Operators can evaluate the marginal gains offered by additional sensors in the network through sensitivity analysis, as shown in Figure 4 below.

Figure 4: Left: Increase in leak coverage with the number of sensors. As the number of sensors available for deployment increases, the leak detection ratio (i.e., the fraction of leaks detected by deployed sensors) increases. Right: Source coverage for 15 sensors. The arrows map each sensor (red circles) to the sources (black triangles) that it detects.

End-to-end data pipeline for methane IoT sensors

To achieve continuous monitoring of methane emissions from oil and gas assets, Microsoft has implemented an end-to-end solution pipeline in which streaming data from IoT Hub is ingested into a Bronze Delta Lake table using Structured Streaming on Spark. Sensor data cleaning, aggregation, and transformation to the algorithm’s data model are then performed, and the resulting data is stored in a Silver Delta Lake table in a format optimized for downstream AI tasks. Methane leak detection is performed using both univariate and multivariate anomaly detection models for improved reliability. Once a leak has been detected, its severity is computed, and the emission source attribution and quantification algorithm then identifies the likely source of the leak and quantifies the leak rate.
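The greedy approximation method for sensor placement can be sketched as a set-cover problem: each candidate site detects some subset of simulated leak scenarios, and we repeatedly pick the site that covers the most scenarios not yet covered, up to a budget. The coverage sets below are invented placeholders; in the real system they would come from the atmospheric dispersion model:

```python
# Hypothetical coverage data: for each candidate sensor site, the set of
# simulated leak scenarios (from a dispersion-model run) it would detect.
coverage = {
    "site_a": {1, 2, 3},
    "site_b": {3, 4},
    "site_c": {4, 5, 6},
    "site_d": {1, 6},
}

def greedy_placement(coverage, budget):
    """Pick up to `budget` sites, each time choosing the site that detects
    the most not-yet-covered leak scenarios (greedy set cover)."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(coverage, key=lambda s: len(coverage[s] - covered))
        gain = coverage[best] - covered
        if not gain:  # no remaining site adds coverage; stop early
            break
        chosen.append(best)
        covered |= gain
    return chosen, covered

sites, covered = greedy_placement(coverage, budget=2)
print(sites, covered)
```

The greedy choice here mirrors the cost-constrained selection described above: a classical result is that greedy set cover achieves coverage within a constant factor of the optimum, which is why it is a reasonable stand-in for the full mathematical programming formulation.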
This event information is sent to the Accenture Work Order Prioritization module to trigger appropriate alerts, based on the severity of the leak, to enable timely remediation of fugitive or venting emissions. The quantified leaks can also be recorded and reported using tools such as the Microsoft Sustainability Manager app. The individual components of this end-to-end pipeline are described in the sections below and illustrated in Figure 5.

Figure 5: End-to-end IoT data pipeline that runs on Microsoft Azure, demonstrating methane leak detection, quantification, and remediation capabilities.

Digital twin for methane IoT sensors

Data streaming from IoT sensors deployed in the field needs to be orchestrated and reliably passed to the processing and AI execution pipeline. Microsoft’s solution creates a digital twin for every sensor. The digital twin comprises a sensor simulation module that is leveraged at different stages of the methane solution pipeline. The simulator is used to test the end-to-end pipeline before field deployment, to reconstruct and analyze anomalous events through what-if scenarios, and to enable the source attribution and leak quantification module through a simulation-based, inverse modeling approach.

Anomaly (leak) detection

A methane leak at a source can manifest as an unusual rise in the methane concentration detected at nearby sensor locations, which requires timely mitigation. The first step toward identifying such an event is to trigger an alert through the anomaly detection system. A severity score is computed for each anomaly to help prioritize alerts. Microsoft provides the following two methods for time series anomaly detection, leveraging Microsoft’s open-source SynapseML library, which is built on the Apache Spark distributed computing framework and simplifies the creation of massively scalable machine learning pipelines:

Univariate anomaly detection: based on a single variable, for example, methane concentration.
Multivariate anomaly detection: used in scenarios where multiple variables, including methane concentration, wind speed, wind direction, temperature, relative humidity, and atmospheric pressure, are used to detect an anomaly.

Post-processing steps are implemented to reliably flag true anomalous events so that remedial actions can be taken in a timely manner, while reducing false positives to avoid unnecessary and expensive field trips for personnel. Figure 6 below illustrates this feature in Accenture’s MEMP: the “hover box” over Sensor 6 documents a total of seven alerts resulting in just two work orders being created.

Figure 6: MEMP dashboard visualizing alerts and resulting work orders for Sensor 6.

Emission source attribution and quantification

Once deployed in the field, methane IoT sensors can only measure compound signals in the proximity of their location. For an area of interest that is densely populated with potential emission sources, the challenge is to identify the source(s) of an emission event. Microsoft provides two approaches for identifying the source of a leak:

Area of influence attribution model: given the sensor measurements and location, an “area of influence” is computed for a sensor location at which a leak is detected, based on the real-time wind direction and asset geolocation. The asset(s) that lie within the computed “area of influence” are then identified as potential emission sources for that flagged leak.

Bayesian attribution model: with this approach, source attribution is achieved through inversion of the methane dispersion model. The Bayesian approach comprises two main components—a source leak quantification model and a probabilistic ranking model. It can account for uncertainties in the data stemming from measurement noise and statistical and systematic errors, and it provides the most likely sources for a detected leak, the associated confidence level, and the leak rate magnitude.
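A toy illustration of the detect-then-attribute flow above, with every threshold, coordinate, and name invented for the example: a rolling z-score stands in for the univariate anomaly detector (the production system uses SynapseML models), and a simple upwind-sector test stands in for the "area of influence" computation (the production system uses a dispersion model):

```python
import math
from statistics import mean, stdev

def detect_anomaly(readings, window=5, z_threshold=3.0):
    """Flag indices where a reading exceeds the trailing-window mean
    by more than z_threshold standard deviations (univariate check)."""
    flagged = []
    for i in range(window, len(readings)):
        hist = readings[i - window:i]
        mu, sigma = mean(hist), stdev(hist)
        if sigma > 0 and (readings[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

def in_area_of_influence(sensor_xy, source_xy, wind_from_deg, half_angle=30.0):
    """True if the source lies upwind of the sensor, within a sector of
    +/- half_angle degrees around the direction the wind blows from
    (angles measured counterclockwise from the +x axis)."""
    dx = source_xy[0] - sensor_xy[0]
    dy = source_xy[1] - sensor_xy[1]
    bearing = math.degrees(math.atan2(dy, dx)) % 360
    diff = abs((bearing - wind_from_deg + 180) % 360 - 180)
    return diff <= half_angle

# Simulated methane concentrations (ppm) with a spike at the end.
ppm = [2.0, 2.1, 1.9, 2.0, 2.1, 2.0, 2.05, 1.95, 2.0, 6.5]
alerts = detect_anomaly(ppm)

# Hypothetical layout: sensor at the origin, two candidate wells,
# wind blowing from bearing 0 degrees (the +x direction).
sources = {"well_a": (100, 10), "well_b": (-50, 80)}
candidates = [name for name, xy in sources.items()
              if in_area_of_influence((0, 0), xy, wind_from_deg=0)]
print(alerts, candidates)
```

Only `well_a` sits inside the upwind sector, so only it would be passed on as a candidate source for the flagged spike; the Bayesian model described above would then rank candidates and estimate the leak rate.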
Considering the high number of sources, the low number of sensors, and the variability of the weather, this poses a complex but highly valuable inverse modeling problem. Figure 7 provides insight regarding leaks and work orders for a particular well (Well 24). Specifically, the diagrams provide well-centric and sensor-centric assessments that attribute a leak to this well. Figure 7: Leak Source Attribution for Well 24. Further, Accenture’s Work Order Prioritization module, using the Microsoft Dynamics 365 Field Service application (Figure 8), enables energy operators to initiate remediation measures under the Leak Detection and Repair (LDAR) paradigm. Figure 8: Dynamics 365 Work Order with emission source attribution and CH4 concentration trend data embedded. Looking ahead In partnership with Microsoft, Accenture is looking to continue refining MEMP, which is built on the advanced AI and statistical models presented in this blog. Future MEMP capabilities aim to move from “detection and remediation” to “prediction and prevention” of emission events, including enhanced event quantification and source attribution. Microsoft and Accenture will continue to invest in advanced capabilities with an eye toward both: Integrating industry-standard platforms such as Azure Data Manager for Energy (ADME) and Open Footprint Forum to enable both publishing and consumption of emissions data. Leveraging generative AI to simplify the user experience. Learn more Case study Duke Energy is working with Accenture and Microsoft on the development of a new technology platform designed to measure actual baseline methane emissions from natural gas distribution systems. Accenture Methane Emissions Monitoring Platform More information regarding Accenture’s MEMP can be found in “More than hot air with methane emissions”. Additional information regarding Accenture can be found on the Accenture homepage and on their energy page. 
Microsoft Azure Data Manager for Energy Azure Data Manager for Energy is an enterprise-grade, fully managed OSDU Data Platform for the energy industry that is efficient, standardized, easy to deploy, and scalable for data management—ingesting, aggregating, storing, searching, and retrieving data. The platform will provide the scale, security, privacy, and compliance expected by our enterprise customers. The platform offers out-of-the-box compatibility with major service company applications, which allows geoscientists to use domain-specific applications on data contained in Azure Data Manager for Energy with ease. Related publications and conference presentations Source Attribution and Emissions Quantification for Methane Leak Detection: A Non-Linear Bayesian Regression Approach. Mirco Milletari, Sara Malvar, Yagna Oruganti, Leonardo Nunes, Yazeed Alaudah, Anirudh Badam. The 8th International Online & Onsite Conference on Machine Learning, Optimization, and Data Science. Surrogate Modeling for Methane Dispersion Simulations Using Fourier Neural Operator. Qie Zhang, Mirco Milletari, Yagna Oruganti, Philipp Witte. Presented at the NeurIPS 2022 Workshop on Tackling Climate Change with Machine Learning. [1] https://climate.nasa.gov/news/3246/nasa-says-2022-fifth-warmest-year-on-record-warming-trend-continues/ The post Microsoft and Accenture partner to tackle methane emissions with AI technology appeared first on Azure Blog. View the full article
  8. In this blog we will give you an overview of the manifest support for BigQuery and explain how it enables querying open table formats like Apache Hudi and Delta Lake in BigQuery. Open table formats rely on embedded metadata to provide transactionally consistent DML and time travel features. They keep different versions of the data files and are capable of generating manifests, which are lists of data files that represent a point-in-time snapshot. Many data runtimes like Delta Lake and Apache Hudi can generate manifests, which can be used for load and query use cases. BigQuery now supports manifest files, which makes it easier to query open table formats with BigQuery. BigQuery supports manifest files in SymlinkTextInputFormat, which is simply a newline-delimited list of URIs. Customers can now set the file_set_spec_type flag to NEW_LINE_DELIMITED_MANIFEST in table options to indicate that the provided URIs are newline-delimited manifest files, with one URI per line. This feature also supports partition pruning for hive-style partitioned tables, which leads to better performance and lower cost. Here is an example of creating a BigLake table using a manifest file:

CREATE EXTERNAL TABLE IF NOT EXISTS `my-project.mydataset.myTable`
WITH CONNECTION `my-project.us.bl_connection`
OPTIONS (
  uris = ['gs://demo/myTable/manifest/latest-manifest.csv'],
  format = 'PARQUET',
  file_set_spec_type = 'NEW_LINE_DELIMITED_MANIFEST');

Querying Apache Hudi using Manifest Support Apache Hudi is an open-source data management framework for big data workloads. It's built on top of Apache Hadoop and provides a mechanism to manage data in a Hadoop Distributed File System (HDFS) or any other cloud storage system. Hudi tables can be queried from BigQuery as external tables using the Hudi-BigQuery Connector. 
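The manifest referenced above is nothing more than a text file with one data-file URI per line. A minimal sketch of producing one (the file names and bucket paths are made up for illustration):

```python
def write_manifest(data_file_uris, manifest_path):
    """Write a newline-delimited manifest: one data-file URI per line."""
    with open(manifest_path, "w") as f:
        for uri in data_file_uris:
            f.write(uri + "\n")

# Example: URIs of the Parquet files in the current table snapshot
# (hypothetical paths for illustration only).
snapshot_files = [
    "gs://demo/myTable/part-00000.parquet",
    "gs://demo/myTable/part-00001.parquet",
]
write_manifest(snapshot_files, "latest-manifest.csv")
```

In practice you would not write this by hand: Hudi and Delta Lake generate these manifests for you, as the following sections show.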
The Hudi-BigQuery integration only works for hive-style partitioned Copy-On-Write tables. The implementation precludes the use of some important query processing optimizations, which hurts performance and increases slot consumption. To overcome these pain points, the Hudi-BigQuery Connector has been upgraded to leverage BigQuery’s manifest file support. Here is a step-by-step process to query Apache Hudi workloads using the Connector. Step 1: Download and build the BigQuery Hudi connector Download and build the latest hudi-gcp-bundle to run the BigQuerySyncTool. Step 2: Run the Spark application to generate a BigQuery external table Here are the steps to use the connector with the manifest approach: Drop the existing view that represents the Hudi table in BigQuery [if the old implementation is used]. The Hudi connector looks for the table name, and if one exists it just updates the manifest file; queries will start failing because of a schema mismatch. Make sure you drop the view before triggering the latest connector. Run the latest Hudi Connector to trigger the manifest approach: run the BigQuerySyncTool with the --use-bq-manifest-file flag. If you are transitioning from the old implementation, append the --use-bq-manifest-file flag to the current spark-submit that runs the existing connector. Using the same table name is recommended, as it will allow keeping the existing downstream pipeline code. Running the connector with the --use-bq-manifest-file flag will export a manifest file in a format supported by BigQuery and use it to create an external table with the name specified in the --table parameter. Here is a sample spark-submit for the manifest approach. 
spark-submit \
  --master yarn \
  --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
  --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
  hudi/packaging/hudi-gcp-bundle/target/hudi-gcp-bundle-0.14.0-SNAPSHOT.jar \
  --project-id bq-hudi \
  --dataset-name demo \
  --dataset-location us \
  --table nyc_taxi_hudi \
  --source-uri gs://demo-bucket/hudi/taxi-trips/EventDate=* \
  --source-uri-prefix gs://demo-bucket/hudi/taxi-trips/ \
  --base-path gs://demo-bucket/hudi/taxi-trips/ \
  --partitioned-by EventDate \
  --use-bq-manifest-file

Step 3: Recommended: Upgrade to an accelerated BigLake table Customers running large-scale analytics can upgrade external tables to BigLake tables to set appropriate fine-grained controls and accelerate the performance of these workloads by taking advantage of metadata caching and materialized views. Querying Delta Lake using Manifest Support Delta Lake is an open-source storage framework that enables building a lakehouse architecture. It extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling. It also provides an option to export a manifest file that contains a list of data files representing a point-in-time snapshot. With manifest support, users can create a BigLake table to query a Delta Lake table on GCS. It is the user's responsibility to regenerate the manifest whenever the underlying Delta Lake table changes, and this approach only supports querying Delta Lake reader v1 tables. Here is a step-by-step process to query Delta Lake tables using manifest support. Step 1: Generate the Delta table’s manifests using Apache Spark Delta Lake supports exporting manifest files. The generate command generates manifest files at <path-to-delta-table>/_symlink_format_manifest/. 
The files in this directory contain the names of the data files (that is, Parquet files) that should be read for a snapshot of the Delta table. Generating the manifest files using Python:

from delta.tables import DeltaTable

deltaTable = DeltaTable.forPath(spark, "<path-to-delta-table>")
deltaTable.generate("symlink_format_manifest")

Step 2: Create a BigLake table on the generated manifests Create a manifest-file-based BigLake table using the manifest files generated in the previous step. If the underlying Delta Lake table is partitioned, you can create a hive-style partitioned BigLake table.

CREATE EXTERNAL TABLE IF NOT EXISTS `my-project.mydataset.myDeltaTable`
WITH PARTITION COLUMNS (EventDate string)
WITH CONNECTION `my-project.us.bl_connection`
OPTIONS (
  hive_partition_uri_prefix = "<path-to-delta-table>/",
  uris = ['<path-to-delta-table>/_symlink_format_manifest/*/manifest'],
  file_set_spec_type = 'NEW_LINE_DELIMITED_MANIFEST',
  format = "PARQUET");

Step 3: Recommended: Upgrade to an accelerated BigLake table Customers running large-scale analytics on Delta Lake workloads can accelerate performance by taking advantage of metadata caching and materialized views. What’s Next? If you are an OSS customer looking to query your Delta Lake or Apache Hudi workloads on GCS, please leverage the manifest support; if you are looking to further accelerate performance, you can do so by taking advantage of metadata caching and materialized views. For more information: Accelerate BigLake performance to run large-scale analytics workloads. Introduction to BigLake tables. Visit BigLake on Google Cloud. 
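In the partitioned case described above, the partition values live in the manifest paths themselves (directories like _symlink_format_manifest/EventDate=2023-01-01/manifest), which is what makes hive-style partition pruning possible. A small sketch of recovering a partition value from such a path, assuming that layout (the column name and paths are illustrative):

```python
import re

def partition_value(manifest_uri, column="EventDate"):
    """Extract a hive-style partition value (column=value) from a path,
    or return None if the path carries no such partition segment."""
    match = re.search(rf"{re.escape(column)}=([^/]+)", manifest_uri)
    return match.group(1) if match else None
```

BigQuery performs this pruning itself when hive_partition_uri_prefix and WITH PARTITION COLUMNS are set; the snippet only illustrates where the partition information comes from.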
Acknowledgments: Micah Kornfield, Brian Hulette, Silvian Calman, Mahesh Bogadi, Garrett Casto, Yuri Volobuev, Justin Levandoski, Gaurav Saxena and the rest of the BigQuery Engineering team.
  9. Amazon Athena for Apache Spark now supports open-source data lake storage frameworks Apache Hudi 0.13, Apache Iceberg 1.2.1, and Linux Foundation Delta Lake 2.0.2. These frameworks simplify incremental data processing of large data sets using ACID (atomicity, consistency, isolation, durability) transactions and make it simpler to store and process large data sets in your data lakes. View the full article
  10. AWS Glue crawlers now have enhanced support for Linux Foundation Delta Lake tables, increasing operational efficiency to extract meaningful insights from analytics services such as Amazon Athena, Amazon EMR, and AWS Glue. This feature enables analytics services to scan Delta Lake tables without requiring Glue crawlers to create manifest files. Newly cataloged data is now quickly made available for analysis using your preferred analytics and machine learning (ML) tools. View the full article
  11. You can now query Delta Lake tables seamlessly in Amazon Athena, giving you the benefit of increased operational efficiency, improved query performance and reduced cost. Delta Lake is an open-source table format that helps implement modern data lake architectures commonly built on Amazon S3. Prior to this launch, reading Delta Lake tables in Athena required a complex process of generating and managing additional metadata files. Now you can use Athena to query Delta Lake tables directly without this additional effort. View the full article
  12. AWS Glue for Apache Spark now supports three open source data lake storage frameworks: Apache Hudi, Apache Iceberg, and Linux Foundation Delta Lake. These frameworks allow you to read and write data in Amazon Simple Storage Service (Amazon S3) in a transactionally consistent manner. AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. This feature removes the need to install a separate connector and reduces the configuration steps required to use these frameworks in AWS Glue for Apache Spark jobs. View the full article
  13. Version 2.0 of the Delta Lake storage framework was recently released, adding multiple features that make it even easier to manage your data lake. Delta Lake is an open project that’s committed to the Linux Foundation’s desire to unlock the value of shared technology. It has shown strong growth in popularity since its initial 1.0 release, as is evidenced by the number of monthly downloads (presented in the Data + AI Summit 2022 Day 1 keynote by Michael Armbrust). The post Delta Lake 2.0: An Innovative Open Storage Format appeared first on Linux.com. View the full article
  14. Welcome to Google Cloud Next ’22 — an event where we come together from around the world to learn, engage, and solve tomorrow’s challenges. With customer expectations and global markets rapidly changing, organizations need to make sure they are prepared for tomorrow by making the right decisions today. This year’s event takes place at an inflection point in the cloud industry. Data and AI are transforming everything around us, and open, connected ecosystems are essential to everything we do. Now is the time where we need to embrace openness and interoperability, and we are incredibly excited about everything we are doing on that front. Helping customers transform for tomorrow We’re proud that industry-leading companies — many of the world’s biggest brands — are choosing Google Cloud. This includes 9 of the top 10 retailers, 8 of the top 10 banks, 8 of the top 10 automotive companies — and they’re driving impressive results. For example, Home Depot is saving 30% on its cloud costs; HSBC is running “what if” risk simulations 16 times faster; and Ford has already delivered more than 5 million software updates to customers automatically, adding new features and upgrades to drivers. In addition to large multinationals, some of the most impressive, new emerging companies are choosing Google Cloud. Tokopedia, one of the largest ecommerce companies in Indonesia, is using Google Cloud to deliver hundreds of terabytes of data in 700 milliseconds; Doordash is helping its 340K merchants better target promotions; and Rent the Runway is working with Google Cloud to migrate onto a more scalable container-based platform to better power its future growth and innovation. We are immensely proud that 70% of the top 100 startup “unicorns” run on Google Cloud, relying on our technology to create entirely new businesses and industries. 
From global brands like MLB in North America, H&M Group in Europe, Banco BV in Latin America, and ASX in Australia, to public sector institutions like the New York State, Central Dutch Government, and the U.S. Forest Service and U.S. Army, organizations worldwide continue to choose Google Cloud to help them build for the future. And, today, we are proud to announce new or expanded relationships with Coinbase, Prudential plc, Rite Aid, Snap, T-Mobile, Toyota, and Wayfair. What’s next in the cloud In the first half of 2022, we delivered approximately 1,300 new product and feature releases. Building on this work, we’re proud to reveal additional product innovations and partnerships to help the people in your organization — the data decision-makers, the developers and builders, the IT teams, the cybersecurity experts, and all employees — make transformation real, impactful, and sustainable. These span four key areas where Google Cloud differentiates: our data cloud, open infrastructure cloud, collaboration cloud, and trusted cloud. An open data cloud We’re announcing significant steps to provide the most open, extensible, and powerful data cloud — ensuring that customers can utilize all their data, from all sources, in all storage formats and styles of analysis, and across the cloud providers and platforms of their choice. We’re also releasing new applications and services to put powerful, state-of-the-art applied Google AI technology in the hands of more data people. Support for unstructured data in BigQuery: We’re launching a new capability to analyze unstructured, streaming data in BigQuery that will significantly expand the ability for people to work with all types of data. Support for major data formats in the industry: We're adding support for widely adopted data formats including Apache Iceberg, Delta Lake, and Apache Hudi, and offering a new integrated experience in BigQuery for Apache Spark. 
Looker Studio and Looker Studio Pro: We’re unifying two popular business intelligence tools, Looker and Google Data Studio, under the Looker umbrella to create a deep integration of Looker, Data Studio, and core Google technologies like AI and ML. This will help every self-service analyst infuse their workflows and applications with the intelligence needed to make data-driven decisions. We’re also introducing a professional version, Looker Studio Pro, that provides organizations support and key governance capabilities. Translation Hub: Organizations can now cost effectively localize content in more than 135 languages with Translation Hub, our new enterprise-scale translation AI Agent for self-serve document translation. Vertex AI Vision: This new service makes powerful computer vision and image recognition AI more accessible to data practitioners. It also reduces the time to create computer vision applications from days to minutes at one-tenth the cost of current offerings. Supporting all major open data platforms: To extend our leadership in driving an open data ecosystem, we’re expanding our integrations with many of the most popular enterprise data platforms, including Collibra, Elastic, MongoDB, Palantir Foundry, and ServiceNow. This will help to remove barriers between data, give customers more choice, and prevent data lock-in. Easy, transformative, open infrastructure To help IT teams build infrastructure for transformation, we're announcing a series of infrastructure and migration updates. These will make it easier than ever for organizations of all types and sizes to run on our cloud, at the edge, or in their data centers. Growing our global footprint: To meet the needs of our growing customer base, we continue to invest in the growth and expansion of our global network. Today, we’re announcing five new Google Cloud regions: Austria, Greece, Norway, South Africa, and Sweden. 
This is in addition to the four regions we announced earlier this year, for a total of 48 live and announced regions serving customers in more than 200 countries and territories. Workload-optimized infrastructure: We’re making a number of infrastructure enhancements tailored to our customers’ workloads, including the C3 machine series powered by the 4th Gen Intel Xeon Scalable processor and Google’s custom Intel Infrastructure Processing Unit (IPU). We’re also delivering AI-optimized infrastructure with the general availability of TPU v4, which runs large-scale training workloads up to 80% faster and up to 50% cheaper. Anthos enhancements: We’re making it easier for our customers to run their cloud where and how they want with enhancements to Anthos, including a more robust user interface, an upgraded fleet management experience, and the general availability of virtual machine support on Anthos clusters for retail edge environments. Dual Run and Migration Center: We’re unveiling several major investments that will assist our customers on their journey to the cloud: Dual Run will help remove the most common roadblocks from migrating 20+ year old mainframes into the cloud, and our new Migration Center brings assessment, planning, migration, and modernization tooling together in one location so organizations can proceed faster through their journey. Increasing our open source AI commitments: Anyone should be able to quickly and easily turn their artificial intelligence (AI) idea into reality. Today, we are taking that commitment a step further with the OpenXLA Project, an open-source ecosystem of ML technologies developed by Google, AMD, Arm, Intel, Meta, NVIDIA, and more. Advanced security tools to protect what’s important We continue to champion invisible security to help organizations move to a future where cloud security is engineered in and operations are simplified. 
Google has always provided a secure cloud infrastructure, and with the acquisition of Mandiant, we extend our cybersecurity leadership and expertise to help customers stay protected at every stage of the security lifecycle. This week we’re announcing new products, partnerships, and solutions to build the most open and extensible trusted cloud offering, including: Chronicle Security Operations: We’re unveiling a modern, cloud-born software suite that better enables cybersecurity teams to detect, investigate, and respond to threats with the speed, scale, and intelligence of Google. Confidential Space: The next solution in our groundbreaking Confidential Computing portfolio, Confidential Space will help unlock the value of secure data collaboration. Google Cloud Ready - Sovereign Cloud: This new program for partners and customers advances digital sovereignty on Europe's terms, to address growing demand for cloud solutions with the highest levels of control, transparency, and sovereignty. Software Delivery Shield: Protecting the integrity of the software that underpins the business is one of the top priorities for every organization today. To help our customers address this challenge, we’re introducing Software Delivery Shield, a fully managed, end-to-end software supply chain security solution from Google Cloud. New and expanded Google Cloud partnerships with leaders across the security ecosystem: We’re announcing a significant expansion of our trusted cloud ecosystem, featuring new integrations and offerings with more than 20 partners focused on digital sovereignty and cybersecurity. An integrated, AI-powered collaboration hub With hybrid work, the physical office is no longer the sole space in which people work together. Organizations need a digital workspace that helps their teams communicate and collaborate, create and share ideas, and get work done safely no matter where they’re working from. 
We're announcing innovations in Google Workspace and our partner ecosystem that will help organizations transform the way they work and unlock the full talent and productivity of their people: Immersive connections: For the past few years, we've reimagined Google Meet to deliver on our vision of enabling truly immersive connections so that a video call feels every bit as engaging as an in-person meeting. Now, we’re taking it a step further with new capabilities, including adaptive framing with AI-powered cameras from Huddly and Logitech that lets everyone in a conference room be seen clearly, and speaker spotlight, which enables presenters to better engage their audience by embedding their video directly in Google Slides. Beyond meetings, we’re helping teams have more authentic and expressive communications with custom emojis and inline threaded conversations for Google Chat. Smart canvas: We’ve invested heavily in smart canvas, our next-generation collaboration experience, that brings documents to life with intelligence and co-creation capabilities so teams can stay focused instead of switching between tools. The smart canvas experience is getting richer with custom building blocks in Google Docs, enabling companies to build their own templates that can be easily accessed by all users. We've also opened up smart canvas for 3rd-party applications. New smart chips for AODocs, Atlassian, Asana, Figma, Miro, Tableau, and ZenDesk will be released next year, so users can view and engage with rich third-party data without switching tabs or context. Work safer: We’re extending ways to keep people and data safe across more of the apps teams use on a day-to-day basis, including new data loss prevention (DLP) rules for Chat and the extension of Client Side Encryption to Gmail and Calendar. 
Extending the power of Workspace: For developers, we’re announcing new APIs for Meet and Chat, giving programmatic access to common functions like creating and starting meetings or initiating messages directly from a third-party app. Asana, Lumapps, and ZenDesk will be the first partners to leverage these in their apps. Developers can also embed their app directly into the Meet experience with a new Meet add-on SDK. Delivering next-generation computing to enterprises Our customers and partners put their trust in our team to deliver next-generation cloud technologies to help them become the best tech company in their industry. The combination of Google’s technical strengths, backed by its unique scale and deep experience in connecting that technology with consumer products and ecosystems, enables Google Cloud to put the tools of tomorrow in the hands of organizations today. A few examples of these developments include: The next frontier of collaborating and connecting from afar: Last year we announced Project Starline, a technology project that enables coworkers to feel like they are together, even when they are cities apart. It creates a 3D model of a person, making it feel like you are sitting with someone in the same room — not at the other end of a video call. After thousands of hours of testing in our own offices, including demos with enterprise partners, we’re seeing promising results. Today, we're announcing our next phase of testing with an early access program with enterprise partners such as Salesforce, T-Mobile, and others. Starting this year, we'll be deploying units in select partner offices for regular testing. As we build the future of hybrid work together with our enterprise partners, we look forward to seeing how Project Starline can help people form strong ties with one another. 
Powering the future of the Web3 ecosystem: We’re also helping Web3 founders and developers with scalability, reliability, security, and data, so they can spend the bulk of their time on innovation. Many of the largest exchanges in the world use Google Cloud to bring scale and speed to always-open spot markets for cryptocurrency trading. Today, we’re announcing a new partnership with Coinbase, who has selected Google Cloud to build advanced exchange and data services. We will also enable select customers to pay for cloud services via select cryptocurrencies by using Coinbase Commerce. This news builds on our recent Web3 announcements with Nansen, BNB Chain, Sky Mavis and NEAR Protocol. Powerful new ways for organizations to develop sustainably A critical part of helping customers make lasting change is by creating a more sustainable future, and we are committed to helping customers do this through products and services that minimize environmental impact. We proudly operate the cleanest cloud in the industry, and customers who run on Google Cloud can instantly improve their sustainability profiles. For example, Salesforce has reduced its cloud carbon emissions for certain workloads by up to 80%; Etsy migrated in record time and is speeding up innovation to meet sustainability goals; and Carrefour reduced energy consumption by 45% by moving from on-premises data centers to our public cloud infrastructure. To continue helping companies reach their sustainability goals and address climate change demands, we’re announcing today that Google Cloud Carbon Footprint is now Generally Available, and provided at no cost for every customer in the cloud console. In addition, eco-friendly routing is coming soon to the Google Maps Platform for developers, to help ridesharing and delivery companies embed eco-friendly routes into their driver applications. All of this work can’t be done alone. 
To scale our efforts, we are building our partner network across industries and since we announced the Google Cloud Ready - Sustainability designation this summer, we’ve seen incredible momentum with more than 20 partners achieving the designation. Next ’22 is a unique opportunity to share insights and learn together from customers, thought leaders and developers around the world. Throughout the week, we’ll share many more exciting product announcements and customer stories on our Google Cloud blog and press room, and replays of our more than 125 sessions will be available on the Next website after the event. We hope you join us! Stay up to date on Google Cloud's biggest event of the year. Related Article Read Article
  15. Today we are more than thrilled to welcome PyTorch to the Linux Foundation. Honestly, it’s hard to capture how big a deal this is for us in a single post but I’ll try. TL;DR — PyTorch is one of the most important and successful machine learning software projects in the world today. We are excited to work with the project maintainers, contributors and community to transition PyTorch to a neutral home where it can continue to enjoy strong growth and rapid innovation. We are grateful to the team at Meta, where PyTorch was incubated and grew into a massive ecosystem, for trusting the Linux Foundation with this crucial effort. The journey will be epic. The AI Imperative, Open Source and PyTorch Artificial Intelligence, Machine Learning, and Deep Learning are critical to present and future technology innovation. Growth around AI and ML communities and the code they generate has been nothing short of extraordinary. AI/ML is also a truly “open source-first” ecosystem. The majority of popular AI and ML tools and frameworks are open source. The community clearly values transparency and the ethos of open source. Open source communities are playing and will play a leading role in development of the tools and solutions that make AI and ML possible — and make it better over time. For all of the above reasons, the Linux Foundation understands that fostering open source in AI and ML is a key priority. The Linux Foundation already hosts and works with many projects that are either contributing directly to foundational AI/ML projects (LF AI & Data) or contributing to their use cases and integrating with their platforms. (e.g., LF Networking, AGL, Delta Lake, RISC-V, CNCF, Hyperledger). PyTorch extends and builds on these efforts. Obviously, PyTorch is one of the most important foundational platforms for development, testing and deployment of AI/ML and Deep Learning applications. 
If you need to build something in AI, if you need a library or a module, chances are there is something in PyTorch for that. If you peel back the cover of any AI application, there is a strong chance PyTorch is involved in some way. From improving the accuracy of disease diagnosis and heart attack prediction, to machine learning frameworks for self-driving cars, to image quality assessment tools for astronomers, PyTorch is there. Originally incubated by Meta’s AI team, PyTorch has grown to include a massive community of contributors and users under their community-focused stewardship. The genius of PyTorch (and a credit to its maintainers) is that it is truly a foundational platform for so much AI/ML today, a real Swiss Army Knife. Just as developers built so much of the technology we know today atop Linux, the AI/ML community is building atop PyTorch – further enabling emerging technologies and evolving user needs. As of August 2022, PyTorch was one of the five-fastest growing open source software communities in the world alongside the Linux kernel and Kubernetes. From August 2021 through August 2022, PyTorch counted over 65,000 commits. Over 2,400 contributors participated in the effort, filing issues or PRs or writing documentation. These numbers place PyTorch among the most successful open source projects in history. Neutrality as a Catalyst Projects like PyTorch that have the potential to become a foundational platform for critical technology benefit from a neutral home. Neutrality and true community ownership are what has enabled Linux and Kubernetes to defy expectations by continuing to accelerate and grow faster even as they become more mature. Users, maintainers and the community begin to see them as part of a commons that they can rely on and trust, in perpetuity. By creating a neutral home, the PyTorch Foundation, we are collectively locking in a future of transparency, communal governance, and unprecedented scale for all. 
As part of the Linux Foundation, PyTorch and its community will benefit from our many programs and supporting communities, from training and certification programs (we already have one in the works) to community research (like our Project Journey Reports) and, of course, community events. Working inside and alongside the Linux Foundation, the PyTorch community also has access to our LFX collaboration portal, which enables mentorships and helps the PyTorch community identify future leaders, find potential hires, and observe shared community dynamics.

PyTorch has gotten to its current state through sound maintainership and open source community management. We’re not going to change any of the good things about PyTorch. In fact, we can’t wait to learn from Meta and the PyTorch community so we can improve the experiences and outcomes of other projects in the Foundation. For those wanting more insight into our plans for the PyTorch Foundation, I invite you to join Soumith Chintala (co-creator of PyTorch) and Dr. Ibrahim Haddad (Executive Director of the PyTorch Foundation) for a live discussion on Thursday entitled PyTorch: A Foundation for Open Source AI/ML.

We are grateful for Meta’s trust in “passing us the torch” (pun intended). Together with the community, we can build something (even more) insanely great and add to the global heritage of invaluable technology that underpins the present and the future of our lives. Welcome, PyTorch! We can’t wait to get started!

The post Welcoming PyTorch to the Linux Foundation appeared first on Linux Foundation. View the full article
  17. Today, the Delta Lake project announced the Delta Lake 2.0 release candidate, which includes a collection of new features with vast performance and usability improvements. The final release of Delta Lake 2.0 will be made available later this year... View the full article
  19. Organizations today build data lakes to process, manage and store large amounts of data originating from different sources, both on-premises and in the cloud. As part of their data lake strategy, organizations want to leverage leading OSS frameworks such as Apache Spark for data processing and Presto as a query engine, along with open formats for storing data such as Delta Lake, for the flexibility to run anywhere and avoid lock-in... Read Article