Search the Community
Showing results for tags 'spanner'.
-
In today's fast-paced business environment, startups need to leverage the power of the cloud to achieve scale, performance, and consistency for their apps. Google Cloud provides three popular cloud databases that enable reliable PostgreSQL: Spanner, AlloyDB and Cloud SQL. In this article, we will explore the features and benefits of these databases, focusing on AlloyDB and Spanner and how startups can use them — together or separately — to simplify infrastructure, reduce operational costs, and maximize performance. Spanner: a scalable and globally distributed database Spanner is a fully managed database for both relational and non-relational workloads that is designed to scale horizontally across multiple regions and continents. Combining strong consistency, high availability, and low latency, it stands as the ideal solution for mission-critical applications demanding high throughput and rapid response times. Spanner provides a PostgreSQL interface, ensuring your schemas and queries are portable to other environments within or outside of Google Cloud. This also allows developers to leverage many of the tools and techniques they already know, flattening the learning curve when transitioning to Spanner. One of the key features of Spanner is its ability to replicate data across multiple regions while maintaining strong, ACID (atomicity, consistency, isolation, durability) transactions and a familiar SQL interface. On top of that, Spanner offers schema changes without downtime, fully automatic data replication, and data redundancy. As a result, developers can build applications that operate seamlessly across multiple regions without worrying about data consistency issues, regional failures, or planned maintenance. When is Spanner the right fit? Spanner also offers automatic horizontal scaling, from an inexpensive slice of one compute node to thousands of nodes (see in graph below), making it easy for a startup to increase or decrease their query and data capacity based on their workload demands. Spanner allows you to resize elastically without downtime or other disruption, so you can better align your usage with the workload. As a result, startups save money by paying only for the resources needed. In contrast to legacy scale-up databases, changing capacity typically involves 1) standing up new infrastructure, 2) migrating the schema and all of the data, and 3) a big-bang cutover in coordination with downstream applications. Spanner allows you to adjust capacity — read and write — on the fly with no downtime. A built-in managed autoscaler adjusts the capacity for you based on signals, such as CPU usage. Spanner scales linearly from tiny workloads—100 processing units, the equivalent of 0.1 node, and 400GB of data—to thousands of nodes, handling PB of data and millions of queries per second. Recent improvements have raised the storage capacity to 10TB per node and increased the throughput by 50%. For example, Niantic runs 5,000 node instances handling the traffic for Pokémon GO. This elasticity saves you money, reduces risks, and provides scale insurance. Even if you aren’t there today, rest assured you can grow to Niantic or Gmail-sized workloads without disruptive re-architecture with Spanner. Start small and scale with Spanner AlloyDB: A cloud-native and managed PostgreSQL database Google Cloud AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible database service that's designed for your most demanding workloads, including 1) transactional, 2) analytical, and 3) hybrid transactional and analytical processing (HTAP). In Google’s performance tests, AlloyDB delivers up to 100X faster analytical queries than standard PostgreSQL and is more than 2X faster than Amazon’s comparable PostgreSQL-compatible service for transactional workloads. AlloyDB also offers a number of features designed to simplify application development. For example, it supports standard PostgreSQL syntax and extensions, making it easy to write queries and manipulate data. Another important consideration: AlloyDB may be the better choice if you’re planning to build GenAI apps, thanks to AlloyDB AI, a built-in set of capabilities for working with vectors, models, and data. AlloyDB uses columnar storage for its columnar engine, which is designed to accelerate analytical queries. The columnar engine stores frequently queried data in an in-memory, columnar format, which can significantly improve the performance of these queries. Intelligent, workload-aware dynamic data organization leverages both row-based and column-based formats. Multiple layers of cache ensure excellent price-performance. Choosing the right database for your startup When it comes to choosing the right database for your startup, there are several factors to consider. First and foremost, you need to consider your application's requirements in terms of performance, availability, global consistency, and scalability. Are you building a consumer app for millions of concurrent users? Maybe a corporate app that will be used for real-time analytics? Each database has its own strengths. Feature AlloyDB Spanner Type Cloud-native, managed PostgreSQL database Globally distributed scalable database Supported engines PostgreSQL PostgreSQL, GoogleSQL Security Data encryption at rest and in transit Data encryption at rest and in transit Data residency Single region by default, multi-region available Multi-region by default Best for Hybrid transactional & analytical workloads, AI applications Mission-critical apps with high data consistency & global reach (multi-writer across regions) Spanner is the ideal choice for mission-critical applications that demand high scalability, unwavering consistency, and 99.999% SLA availability. Teams building applications that are evaluating sharding or active-active configurations, to work around scaling limitations can benefit from Spanner’s built-in, hands-free operations. Spanner enables development teams with its familiar SQL interface (including PostgreSQL dialect support) for seamless large-scale data processing. This ensures portability, flexibility, and simplifies use cases requiring high write scaling, global consistency, and adaptability to variable traffic. AlloyDB is a good choice for applications that need a high-performance, reliable, and scalable database with built-in support for advanced analytics and full PostgreSQL compatibility. AlloyDB supports real-time analytics applications because of its automatic data placement across tiers (e.g., buffer cache, ultra-fast cache, and block storage), and its ability to process up to 64 TiB of data per cluster in real time. AlloyDB is also reliable and offers a 99.99% SLA, including maintenance. Another option to consider is Cloud SQL, an enterprise-ready, fully managed relational database service that offers PostgreSQL, MySQL, and SQL Server engines. It is user-friendly as it provides a straightforward user interface with the familiar SQL interface with PostgreSQL, MySQL and SQL Server for easy interaction and only takes minutes to get your database up and running. Additionally, another important factor to keep in mind is your team's expertise and familiarity with different database technologies. If your team is already familiar with relational databases and the Google Cloud ecosystem, then Spanner may be the easier choice. If your team is more comfortable with PostgreSQL, then AlloyDB may be the better fit. Conclusion In conclusion, Spanner and AlloyDB are two powerful databases that offer different benefits and features for startups and can be used together or separately, depending on your needs. Together, AlloyDB and Spanner are a dynamic duo with which you can achieve performance and scalability based on Google’s innovations, delivering both responsive user interactions and robust, scalable back-end functionalities. With PostgreSQL and Google Cloud as the unifying threads, both services can co-exist seamlessly, forming a powerful combination for any application demanding high performance and unwavering reliability. For example, Character.ai uses AlloyDB and Spanner together in the same app that is at core of their business: AlloyDB for powering the interactive experience: At the user-facing front-end, AlloyDB shines as the engine behind quick, responsive interactions. Its unparalleled speed and performance ensure a smooth and intuitive user experience, critical for engaging with the AI model. Spanner as the backbone of history and workflow: Behind the scenes, Spanner maintains the complete history and workflow data integral to the AI integration. Its unshakeable scale and availability guarantee seamless data management, regardless of load or complexity. Both Spanner and AlloyDB operate within the familiar PostgreSQL ecosystem, offering a consistent and unified development experience. This empowers developers to leverage their existing skills and knowledge, accelerating integration and workflow. Additionally, the Google Cloud Platform provides a robust and secure environment for both services, ensuring seamless data management and operational efficiency. View the full article
-
Editor’s note: In the previous post in this serieswe introduced primary key default values in Spanner. We also showed how to use the UUIDs and integer SEQUENCEs to automatically generate keys in the database. In this post we’ll show how to use these new capabilities to migrate schemas and data from other databases to Spanner, minimizing changes to downstream applications and ensuring Spanner best practices. Spanner is a distributed relational database that is designed for the highest levels of availability and consistency at any scale. It allows users to seamlessly scale resources up and down to optimize costs based on their real-time needs, while maintaining continuous operation. Customers in gaming, retail, financial services, and many other industries rely on Spanner for their most demanding workloads. Migrating to SpannerMany of these workloads did not start on Spanner, though. Customers come to Spanner from different relational and non-relational databases, looking to take advantage of Spanner’s seamless scaling and fully managed experience. Spanner provides a set of tools and best practices to facilitate migrations. The Spanner Migration Tools include assessment, schema translation, and data movement for terabyte-sized databases coming from MySQL and PostgreSQL. For broader migration guidance, you can refer to the Spanner documentation. In this post we’ll focus specifically on migrating databases that use auto-generated keys, in particular, auto-incrementing sequential integers and UUIDs. Each of the migration strategies below addresses the key requirements: Ensure the fidelity and correctness of the migrated keysMinimize downstream application changes, such as changing types or values of the keys themselvesSupport replication scenarios where either the source or target database generates the keys and data is synchronized between them, for example, to do a live cutover between systemsImplement Spanner best practices for performance and scalabilityMigrating sequential keysWe’ll start with the most common scenario for relational workloads coming to Spanner: migrating from a single-instance database that uses sequential monotonic keys, for example AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL, or the standard IDENTITY type in SQL Server or Oracle. For databases that manage writes on a single machine, a counter to provide sequential keys is simple. However, ordered keys can cause performance hotspots in a distributed system like Spanner. At a high level, the strategy to migrate sequential keys to Spanner is: Define a copy of the table in Spanner using an integer primary key, just as in the source database.Create a sequence in Spanner and set the table’s primary key to use it for its default value.Load the data with its keys as-is from the source database into Spanner, for example using the Spanner Migration Tool or the lower-level Dataflow templates.Optionally set foreign key constraints for any dependent tables.Before inserting new data, configure the Spanner sequence to skip values in the range of the existing keys.Insert new data, as before, allowing the sequence to generate keys by default.Let’s start by defining the table and related sequence. In Spanner you define a new SEQUENCE object and set it as the default primary value of the destination table, for example using the GoogleSQL dialect: code_block<ListValue: [StructValue([('code', "CREATE SEQUENCE singer_id_sequence OPTIONS (\r\n sequence_kind = 'bit_reversed_positive’\r\n);\r\n\r\nCREATE TABLE singers (\r\n singer_id INT64 DEFAULT\r\n (GET_NEXT_SEQUENCE_VALUE(SEQUENCE singer_id_sequence)),\r\n name STRING(1024),\r\n biography STRING(MAX),\r\n) PRIMARY KEY (singer_id);\r\n\r\nCREATE TABLE albums (\r\n album_id INT64,\r\n singer_id INT64\r\n album_nam STRING(1024),\r\n song_list STRING(MAX),\r\n CONSTRAINT FK_singer_album\r\n FOREIGN KEY (album_id) REFERENCES singers (singer_id)\r\n) PRIMARY KEY (album_id);"), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef96b79f250>)])]>The requiredbit_reversed_positiveoption indicates that the numbers generated by the sequence will be greater than zero, but not ordered (see theintroductory postfor more information on bit-reversed sequences). Generated values are of type INT64. As you migrate existing rows from your source database to Spanner, the rows’ keys remain unchanged. For new inserts that don’t specify a primary key, Spanner automatically calls theGET_NEXT_SEQUENCE_VALUE()function to retrieve a new number. Since these values distribute uniformly across the range[1, 263],there could be collisions with the existing keys. If this occurred your insert would fail with a “key already exists” error. To prevent this, you can configure the sequence toskipthe range of values covered by the existing keys. For example, assuming that the tablesingerswas migrated from PostgreSQL, where its key,singer_id, was inSERIALtype: code_block<ListValue: [StructValue([('code', 'CREATE TABLE singers (\r\n singer_id SERIAL PRIMARY KEY,\r\n name varchar(1024),\r\n biography varchar\r\n);'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef96b79fa60>)])]>The column values are monotonically increasing. After migration, we retrieve the maximum value of thesinger_id: code_block<ListValue: [StructValue([('code', 'SELECT MAX(singer_id) FROM singers;'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bcab80>)])]>Assuming the returned value is 20,000, you configure the sequence in Spanner to skip the range[1, 21000]. The extra 1,000 serves as a buffer to accommodate writes to the source database after the initial bulk migration. These values would typically be replicated later and we want to ensure they also will not conflict. code_block<ListValue: [StructValue([('code', 'ALTER SEQUENCE singer_id_sequence SET OPTIONS (\r\n skip_range_min = 1,\r\n skip_range_max = 21000\r\n);'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bca430>)])]>The diagram below illustrates a few migrated rows along with new rows inserted in Spanner after migration: Now new keys generated in Spanner are guaranteed to not conflict with the range of keys generated in the source PostgreSQL database. Multi-database usageYou can take this skipped range concept one step further to support scenarios where either Spanner or the upstream database generates primary keys, for example to enable replication in either direction for disaster recovery during a migration cutover. To support this, you can configure each database to have a non-overlapping key value range. When you define a range for the other database, you can tell the Spanner sequence to skip over that range with the skipped range syntax. For example, after the bulk migration of our music tracking application, we’ll replicate data from PostgreSQL to Spanner to reduce the amount of time it takes to cut over. Once we’ve updated and tested the application on Spanner, we’ll cut over from PostgreSQL to Spanner, making it the system of record for updates and new primary keys. When we do, we‘ll reverse the flow of data between databases and replicate data back to the PostgreSQL instance just in case we need to revert if there’s a problem. In this scenario, sinceSERIALkeys in PostgreSQL are 32-bit signed integers, while our keys in Spanner are larger 64-bit numbers, we will do the following steps: 1. In PostgreSQL, alter the key column to be a 64-bit column, orbigint, code_block<ListValue: [StructValue([('code', 'ALTER TABLE singers ALTER COLUMN singer_id TYPE bigint;'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bca5e0>)])]>2. Since the sequencesingers_singer_id_seqused bysinger_idis still of typeint, its maximum value is already 231-1. To be safe, we can optionally set a CHECK constraint to the table in the source PostgreSQL database to ensure thatsinger_idvalues are always smaller or equal to 231-1: code_block<ListValue: [StructValue([('code', 'ALTER TABLE singers ADD CHECK (singer_id <= 2147483647);'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bca3d0>)])]>3. In Spanner, we’ll alter the sequence to skip the range [1, 231-1]. code_block<ListValue: [StructValue([('code', 'ALTER SEQUENCE singer_id_sequence SET OPTIONS (\r\n skip_range_min = 1,\r\n skip_range_max = 2147483647 -- 2^31-1\r\n);'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bca640>)])]>4. Deploy and test your usage, including from PostgreSQL to Spanner and vice versa. Using this technique, PostgreSQL will always generate keys in the 32-bit integer space, while Spanner’s keys are restricted to the 64-bit integer space, larger than all of the 32-bit numbers and wide enough for future growth. This ensures that both systems can independently generate keys that are guaranteed not to conflict. Migrating UUIDsUUID primary keys are generally easier to migrate than sequential integer keys.UUIDs, v4 in particular, are effectively unique regardless of where they are generated. (The math behind this is an interesting application of thebirthday problemin statistics.) As a result, UUID keys generated elsewhere will integrate easily with new UUID keys generated in Spanner and vice versa. The high-level strategy for migrating UUID keys is as follows: Define your UUID keys in Spanner using string columns with a default expression,GENERATE_UUID()orspanner.generate_uuid()in the PostgreSQL dialect. Export data from the source system, serializing the UUID keys as strings. Import the keys into Spanner as-is. Optionally enable foreign keys. In Spanner, you define a UUID primary key column as aSTRINGorTEXTtype, and assignGENERATE_UUID()as its default value. During migration, you bring all values of existing rows from the source database to Spanner, including key values. (See thismigration guidefor more details.) After migration, as new rows are inserted, Spanner callsGENERATE_UUID()to generate new UUID values for them. For example, the primary keyFanClubIdwill get a UUIDv4 value when we insert a new row into the table,FanClubs: code_block<ListValue: [StructValue([('code', 'CREATE TABLE fan_clubs (\r\n fan_club_id STRING(36) DEFAULT (GENERATE_UUID()),\r\n club_name STRING(1024),\r\n) PRIMARY KEY (fan_club_id);'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bcaa60>)])]>code_block<ListValue: [StructValue([('code', 'INSERT INTO fan_clubs (club_name) VALUES ("SwiftFanClub");'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bca1f0>)])]>Migrating your own primary keysBit-reversed sequences and UUIDs provide unique values that won’t hotspot at scale when used as a primary key in Spanner. But they don’t provide any guarantees on the ordering of their values… by design! However, some applications rely on the order of the keys to determine recency or to sequence newly created data. Databases manually sharded for scale typically rely on a global counter, coordinated outside of any independent database instances. To use ordered keys generated externally in Spanner you create acomposite keythat combines a uniformly distributed value, such as a shard ID or a hash, as the first component and a sequential number as the second component. This preserves the ordered key values, but won’t hotspot at scale. In this example, we are migrating a MySQL table with anAUTO_INCREMENTprimary key,students, to Spanner. The downstream application generates student IDs, and the IDs are shared to end users (students, faculty, etc.) code_block<ListValue: [StructValue([('code', '// This is the table to be migrated from MySQL\r\nCREATE TABLE students (\r\n student_id INT NOT NULL AUTO_INCREMENT,\r\n info VARCHAR(2048),\r\n PRIMARY KEY (student_id)\r\n);'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bcae20>)])]>In Spanner, we add a generated column containing a hash of theStudentIdcolumn: code_block<ListValue: [StructValue([('code', 'CREATE TABLE student (\r\n student_id_hash INT64 AS (FARM_FINGERPRINT(student_id)) STORED,\r\n student_id INT64 NOT NULL,\r\n info STRING(2048),\r\n) PRIMARY KEY (student_id_hash, student_id);'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3ef969bcab50>)])]>Get started todayWe recently introduced new capabilities that help users implement best practices for primary keys in Spanner using the SQL concepts they already know. The strategies detailed above minimize downstream application changes and maximize performance and availability in Spanner when migrating auto-incrementing and UUID from other relational databases. You can learn more about what makes Spanner unique and how it’s being used today. Or try it yourself for free for 90-days or for as little as $65 USD/month for a production-ready instance that grows with your business without downtime or disruptive rearchitecture.
-
Forum Statistics
70.4k
Total Topics68.3k
Total Posts