Creating a "national registry of internet users" is not a good idea. In fact, it would still need to improve a lot just to come close to being merely terrible.
It is not technically feasible,
it is not economically feasible,
and it does not solve the problem it set out to solve, while creating new ones.
PUBLIC STATEMENT expressing disagreement with the Bill that proposes the creation of
If you’re building a SaaS application, you probably already have the notion of tenancy built in your data model. Typically, most information relates to tenants/customers/accounts and your database tables capture this natural relation.
With smaller amounts of data (10s of GB), it’s easy to throw more hardware at the problem and scale up your database. As these tables grow however, you need to think about ways to scale your multi-tenant database across dozens or hundreds of machines.
After our blog post on sharding a multi-tenant app with Postgres, we received a number of questions on architectural patterns for multi-tenant databases and when to use which. At a high level, developers have three options:
The option you pick has implications on scalability, how you handle data that varies across tenants, isolation, and ease-of-maintenance. And these implications have been discussed in detail across many StackOverflow questions and database articles. So, what is the best solution?
In practice, each of the three design options (with enough effort) can address questions around scale, data that varies across tenants, and isolation. The decision depends on the primary dimension you’re building and optimizing for. The tl;dr:
In this blog post, we’ll focus on the scaling dimension, since most users who talked to us had questions in that area. (We also intend to describe considerations around isolation in a follow-up blog post.)
To expand on this further: if you’re planning to have 5 or 50 tenants in your B2B application and your database is running into scalability issues, then you can create and maintain a separate database for each tenant. If, however, you plan to have thousands of tenants, then sharding your tables on a tenant_id/account_id column will help you scale much more gracefully.
Common benefits of having all tenants share the same database are:
Resource pooling (reduced cost): If you create a separate database for each tenant, then you need to allocate resources to that database. Further, databases usually make assumptions about the resources available to them. For example, PostgreSQL has shared_buffers, makes good use of the operating system cache, comes with connection count settings, runs processes in the background, and writes logs and data to disk. If you’re running 50 of these databases on a few physical machines, resource pooling becomes tricky even with today’s virtualization tech.
If you have a distributed database that manages all tenants, then you’re using your database for what it’s designed to do. You could shard your tables on tenant_id and easily support thousands or tens of thousands of tenants.
Google’s F1 paper is a good example that demonstrates a multi-tenant database that scales this way. The paper talks about technical challenges associated with scaling out the Google AdWords platform; and at its core describes a multi-tenant database. The F1 paper also highlights how best to model data to support many tenants/customers in a distributed database.
The data model on the left-hand side follows the relational database model and uses foreign key constraints to ensure data integrity in the database. This strict relational model introduces certain drawbacks in a distributed environment however.
In particular, most transactions and joins you perform on your database, and constraints you’d like to enforce across your tables, have a customer/tenant dimension to them. If you shard your tables on their primary key column (in the relational model), then most distributed transactions, joins, and constraints become expensive. Network and machine failures further add to this cost.
The diagram on the right-hand side proposes the hierarchical database model. This model is the one used by F1 and resolves the previously mentioned issues. In its simplest form, you add a customerid/tenantid column to your tables and shard them on customer_id. This ensures that data from the same customer gets colocated together – co-location dramatically reduces the cost associated with distributed transactions, joins, and foreign key constraints.
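A minimal sketch of the hierarchical model (table and column names here are illustrative, not taken from the F1 paper): every table carries the tenant column, and primary and foreign keys are prefixed with it, so that sharding on that column colocates each tenant’s rows.

```sql
CREATE TABLE accounts (
    tenant_id bigint PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE orders (
    tenant_id bigint NOT NULL REFERENCES accounts (tenant_id),
    order_id  bigint NOT NULL,
    total     numeric,
    -- tenant_id leads the key so rows shard and colocate by tenant
    PRIMARY KEY (tenant_id, order_id)
);

CREATE TABLE order_items (
    tenant_id bigint NOT NULL,
    order_id  bigint NOT NULL,
    item_id   bigint NOT NULL,
    PRIMARY KEY (tenant_id, order_id, item_id),
    -- composite foreign key stays within a single shard
    FOREIGN KEY (tenant_id, order_id)
        REFERENCES orders (tenant_id, order_id)
);
```

With a sharding extension such as Citus, each of these tables would then be distributed on tenant_id, so that transactions, joins, and the composite foreign keys above stay local to a single shard.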
Ease of maintenance: Another challenge associated with supporting 100 to 100K tenants is schema changes (ALTER TABLE) and index creation (CREATE INDEX). As your application grows, you will iterate on your database model and make improvements.
If you’re following an architecture where each tenant lives in a separate database, then you need infrastructure that ensures each schema change either succeeds across all tenants or eventually gets rolled back. For example, what happens when you’ve changed the schema for 5,000 of your 10K tenants and then hit a failure? How do you handle that?
When you shard your tables for multi-tenancy, the database does this work for you. It will either ensure that an ALTER TABLE goes through across all shards, or it will roll the change back.
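As a sketch (table and index names illustrative), the difference is that a single statement covers every tenant’s data:

```sql
-- One DDL statement; the distributed database either applies it to
-- every shard holding tenant data, or rolls the whole change back.
ALTER TABLE orders ADD COLUMN discount numeric DEFAULT 0;
CREATE INDEX orders_discount_idx ON orders (discount);
```

Contrast this with the database-per-tenant model, where the application has to loop over every tenant database and track which ones have already been migrated.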
What about data that varies across tenants? Another challenge with scaling to thousands of tenants relates to handling data that varies across tenants. Your multi-tenant application will naturally include a standard database setup with default tables, fields, queries, and relationships that are appropriate to your solution. But different tenants/organizations may have their own unique needs that a rigid, inextensible default data model won’t be able to address. For example, one organization may need to track their stores in the US through their zip codes. Another customer in Europe might not care about US zip codes, but may be interested to keep tax ratios for each store.
This used to be an area where having a tenant per database offered the most flexibility, at the cost of extra maintenance work from the developer(s). You could create separate tables or columns per tenant in each database, and manage those differences across time.
If you then wanted to scale your infrastructure to thousands of tenants, you’d create a huge table with many string columns (Value0, Value1, … Value500). Probably the best-known example of this model is Salesforce’s multi-tenant architecture.
In this database model, your tables have a preset collection of custom columns, labeled in this image as V1, V2, and V3. Dates and Numbers are stored as strings in a format such that they can be converted to their native types. When you’re storing data associated with a particular tenant, you can then use these custom columns and tailor them to each tenant’s special needs.
Fortunately, designing your database to account for “flexible” columns is significantly easier with the introduction of semi-structured data types. PostgreSQL has a rich set of semi-structured data types, including hstore, json, and jsonb. You can now represent the previous database schema by simply declaring a jsonb column, and scale to thousands of tenants.
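A minimal sketch of that approach (table and column names illustrative): a single jsonb column absorbs whatever varies per tenant, replacing the preset V1…V500 string columns.

```sql
CREATE TABLE stores (
    tenant_id  bigint NOT NULL,
    store_id   bigint NOT NULL,
    name       text,
    -- tenant-specific fields live in one semi-structured column
    attributes jsonb  NOT NULL DEFAULT '{}',
    PRIMARY KEY (tenant_id, store_id)
);

-- One tenant tracks US zip codes for their stores...
INSERT INTO stores VALUES (42, 1, 'Downtown', '{"zip_code": "94107"}');
-- ...while a European tenant keeps tax ratios instead.
INSERT INTO stores VALUES (7, 1, 'Berlin Mitte', '{"tax_ratio": 0.19}');

-- A GIN index makes containment queries on arbitrary keys fast.
CREATE INDEX stores_attributes_idx ON stores USING GIN (attributes);
SELECT name FROM stores WHERE attributes @> '{"zip_code": "94107"}';
```

Because jsonb values are typed and indexable, you get per-tenant flexibility without giving up query performance or storing dates and numbers as strings.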
Of course, these aren’t the only design criteria and questions to be aware of. If you shard your database tables, how do you handle isolation or integrate with ORM libraries? What happens if you have a table to which you can’t easily add a tenant_id column? In this blog post, we focused on building multi-tenant databases with scaling as the primary consideration, and skipped over certain points. If you’re looking to learn more about designing multi-tenant databases, please sign up for our upcoming webinar on the topic!
The good news is, databases have advanced quite a bit in the past ten years in accommodating SaaS applications at scale. What was once only available to the likes of Google and Salesforce with significant engineering effort, is now becoming accessible to everyone with open-source technologies such as PostgreSQL and Citus. If you’re thinking of modernizing your multi-tenant architecture, drop us a line and we’d be happy to chat.
This is Part 1 of Bible Design Blog’s extended look at the new 6-Volume Reader’s Bible published by Crossway. This post gives an overview of the project and shares my general assessment of its success. In later posts I will dig deeper into some of the details, from the typography and paper to slipcases. For a complete list of articles, scroll to the bottom of this post.
The ESV Reader’s Bible, Six-Volume Set arrived on my doorstep on the fifteenth of August at 12:53 p.m. It was a warm day, pleasantly windy. The box felt heavy in my arms. I set it on the dining table and went in search of a knife. Before I opened the package, I studied the printing on the side: Legatoria Editoriale Giovanni Olivotto, with an address in Vicenza. The moment felt a bit momentous, so I did something I never do: snapped a photo of the unopened box. I hate unboxing videos. I’m temperamentally opposed to watching a grown person open a package online and linger adoringly over invoice, brochure, and packing peanuts. I resisted the urge to violate this conviction. But only just.
I don’t get excited about Bibles anymore. That’s what I kept telling myself, anyway. For almost a decade I’ve been writing about quality editions of the Bible, poring over details of print and paper and binding. Publishers send review copies, and if I’m interested in what I see, I write about them. When I meet my readers in person, two questions always come up: “Why don’t you post more often?” and “How cool is it that publishers send you free Bibles?” Well, it is cool, but not screaming-like-a-kid-on-a-rollercoaster cool. I’m a professional, after all. Sort of.
The cloth-over-board set (left) is available for $110 from EvangelicalBible.com, and the leather-over-board set (right) costs $300.
So I opened the box and lifted the wooden slipcase from its cushioned berth, pretending that I wasn’t jumping up and down on the inside. I was, though. A lot. And the more time I spend with the 6-Volume Reader’s Bible, the less reserved I get. This is a beautiful concept executed beautifully. It’s one of the best editions I have ever covered at Bible Design Blog.
Good book design should be reader-friendly. Some texts present more of a challenge than others. Novels are easy. Bibles are hard. Scripture consists of sixty-six separate books of various lengths (more if you include the Apocrypha). That’s a lot of words. The simple task of designing a single volume to hold all that and still be readable is a challenge. Then you add all the chapter and verse numbers, the cross-references, the concordances, and the task becomes rather difficult. No matter how good the designer, certain compromises are inevitable: minuscule text, two columns, ant-like armies of references crawling down the margins.
This is what we’re used to.
The history of the printed Bible began in the mid-to-late fifteenth century and quickly became the history of the reference Bible. As Glenn Paauw relates the story in his excellent Saving the Bible from Ourselves, the steady creep of extra-biblical material onto the page resulted by the mid-sixteenth century in the reference edition more or less as we know it today. “It was the death knell for a certain kind of Bible,” Paauw writes, “a Bible that presented something closer to what the Scriptures inherently were.”
Reader’s Bibles are an attempt to unring that bell. They remove the extras and give the biblical text room to breathe. They offer up Scripture in a flowing single column paragraphed layout. They design the Bible like the kind of book you actually read, instead of the sort you only use for looking things up.
Crossway released an excellent ESV Reader’s Bible in 2014. In my review, I expressed the hope that the format would catch on. “I’d love to see one of these on everyone’s shelf, regardless of your preferred translation,” I wrote. “This is a format to spend some time with in the hope of recapturing a less mediated experience of reading the Bible.” Crossway has also published reader-friendly formats of The Psalms and The Gospels (see below).
For some people, the idea of a multi-volume edition of Scripture might be a hard sell. Why would I want a Bible in six volumes, far too heavy and cumbersome for easy portability, when I can have the whole epic story under one cover? Well, dividing the text into multiple volumes actually solves one of the greatest challenges associated with Bible printing: the necessity for sheer, ultra-thin paper. Compare the original single-volume Reader’s Bible with the new ESV Reader’s Bible, Six-Volume Set and you’ll notice one thing right away: the pages in the new set are much more opaque. See, as long as you’re fitting all those words under one cover, thin specialty paper is a must. Dividing the sixty-six books into six separate volumes frees you from that necessity. Whereas the one-volume Reader’s Bible was printed on 30 gsm Apple Thin Opaque paper, this one is printed on 80 gsm Munken Premium Cream. The sales literature describes it as “opaque and soft without being too bulky,” which is right on. Another way of putting it: this just feels like a nicely made book. You won’t think about the paper at all. You’ll think about the words on the page.
It helps to stop and consider what kind of set this is. The 6-Volume Reader’s Bible isn’t going to replace your fine print all-in-one edition. That’s not the point. Rather, it fills a niche that has largely gone unaddressed in the past: the need for a Bible designed for a lifetime of reading.
When I develop a love for a particular author, one of the things I do is search for nice editions of that writer’s work. Last year a friend pulled me into a reading challenge: together we would make our way through all of John Buchan’s Richard Hannay novels, from The Thirty-Nine Steps to The Island of Sheep. Since I was planning to spend a lot of time with Buchan, I hunted online for a set of the Folio Society edition of the novels. The five novels are beautifully designed and bound, grouped together in a sturdy slipcase. (Sound familiar?) When you look at the Folio Society set side by side with the 6-Volume Reader’s Bible, a light bulb should illuminate above your head: “Ah ha! So, that’s the kind of thing this is.”
And the 6-Volume Reader’s Bible is quite a good example of that sort of thing, too. In all its details, from design to printing to binding, it compares favorably to the work of high end publishers like the Folio Society.
There are two versions of the set: one bound in cloth-covered boards with a slipcase ($110) and another bound in leather over boards with a dovetailed walnut slipcase ($300). The leather-over-boards set is an EvangelicalBible.com exclusive, by the way. Considering the cost of high quality Bibles these days, the leather set feels like value for money. Both options ooze distinction, though.
The interior design is new for this edition. The text is set in 12 pt. Trinité No. 2, a typeface “inspired by the ideal harmony found in Renaissance incunabula,” and the lines of text are generously leaded. A single page in the original Reader’s Bible contained 42 lines of text. In the 6-Volume Reader’s Bible, there are just 28. Apart from the occasional section heading, running headers at the top of the page, and the actual page numbers, there is nothing on the landscape but a gloriously proportioned single column text.
Trade-offs: the original ESV Reader’s Bible (right) is much more portable, but the new 6-Volume set is much more readable.
Compared to the one-volume Reader’s Bible (above), the new 6-Volume Reader’s Bible has larger type, more opaque paper, and almost half as many lines of text per page.
In other words, when you open the 6-Volume Reader’s Bible, what you see is just a well-designed book. No clutter, nothing to call attention to itself. Here’s a telling observation: when EvangelicalBible.com posted the first photos of these sets online, creating a bit of a social media sensation, I snapped a photo of the one I happened to be reading and posted it on Instagram. No feeding frenzy, though, because I photographed the book opened on a table, where it is pretty much indistinguishable from any other book — which is the point. (One commenter did get wise: “That looks suspiciously like a Bibliotheca volume.” Well, close.)
Crossway has produced a video that gives us a look inside the production process:
A wealth of production information is included in the booklet accompanying the set, too. The books are printed and bound in Italy by Legatoria Editoriale Giovanni Olivotto — L.E.G.O. for short. Printed on a Timson T48 offset web press, the 48-page signatures are gathered into books and Smyth-sewn. The cover cloth is Manifattura Tasmania 7107 stretched over 2.25 mm board and the ink, in case you’re wondering, is Inkredible Revolution Black. The leather bindings are done in lightly grained black cowhide with a nice sheen.
A Beautiful Read
All of which means little if the 6-Volume Reader’s Bible isn’t a delight to read. Well, it is. It truly is. Each volume, thick or thin, feels good in the hand. They have a trim size of 8” x 5.5” — the same as the original ESV Reader’s Bible — which makes them comfortable to hold. Unlike the leather-over-boards edition of The Gospels, they open flat and are not too bulky. The boards are relatively thin and the leather sufficiently pared to avoid extra thickness.
A deeper look at the paper is coming soon. Suffice to say, the 80 gsm sheets strike a pleasing balance between opacity and suppleness. As much as I love The Psalms and The Gospels, I find the paper in each volume a bit thick. Not here. I can hold these books open with one hand, read for a long period, and never be distracted by bleed-through or the feel of the pages. A well made book doesn’t call attention to itself, and these are well made books. In comparison to the leather-bound editions of those earlier reader-friendly volumes, too, L.E.G.O. has brought an extra level of refinement to the binding.
The Gospels (above) is quite a nice edition, but the thicker paper prevents it from opening flat out of the box. The Reader’s Bible (below) offers a more refined experience.
Compared to earlier L.E.G.O. leather-over-boards editions like The Gospels (below), this binding is trimmer, more elegant, and has a pleasing gloss finish.
Each volume has a single ribbon for marking progress. I’m used to having two or three ribbons, so at first I wanted more. Then I remembered that this Bible actually has six ribbons, one in each volume. That’s plenty, right? You will need that ribbon, too, because a Bible like this invites deeper reading. I’m still amazed how much more I read, and how much more I notice in what I read, compared to traditional reference formats.
The question is, do you go with the clothbound set or spring for the leather? On aesthetics, the leather-over-board option wins. The deep black and warm brown combination of leather and wood is ridiculously handsome, not to mention ridiculously photogenic. I’m not as big a fan of the earth-tone cloth-over-board covers with their intricate design … until I handle them. The cloth has a nice tactile feedback, and the volumes feel great in the hand. There really isn’t a bad option here. If you can swing the leather set, though, it’s heirloom quality and I doubt you’ll regret it.
But here’s my real recommendation: find yourself a good reading chair. You will need it. The 6-Volume Reader’s Bible doesn’t want to sit on the shelf. It wants a special nook next to a comfy chair and a lamp.
The ESV Reader’s Bible, Six-Volume Set Complete Series
More to come!
In an earlier post I commented on a post by Evan Klitzke on his reasons for recommending a move from PostgreSQL to MySQL.
The summary was that the technical details were incorrect, apart from two points. This post returns to those points to discuss what we’ve done about them.
1. When one indexed column is updated, all indexes on the table currently need to be maintained. When you have lots of indexes, this causes additional write traffic to disk and to the transaction log. The effect was described as “write amplification,” though that term is emotionally charged and implies something non-linear; it would be better to say simply that this use case could be optimized much more fully than it is today.
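To make the cost concrete, here’s a sketch (table and index names illustrative). Because the updated column is itself indexed, the update doesn’t qualify for the HOT optimization, so the new row version must be inserted into every index on the table, not just the one covering the changed column:

```sql
CREATE TABLE accounts (
    id         bigint PRIMARY KEY,
    email      text,
    balance    numeric,
    updated_at timestamptz
);
CREATE INDEX ON accounts (email);
CREATE INDEX ON accounts (balance);
CREATE INDEX ON accounts (updated_at);

-- Only updated_at changes, but the new row version gets a new heap
-- TID, so all four indexes receive a new entry.
UPDATE accounts SET updated_at = now() WHERE id = 1;
```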
My colleague Pavan Deolasee has written a patch to optimize this case better, which he calls the Write Amplification Reduction Method, or WARM. That’s a great name because in technical terms the optimization is a relaxation of the HOT optimization, so it’s quite literally a cooler name. But most importantly it works very well: measured at 77% better performance for UPDATEs on tables with 4 indexes, and well over 100% improvement for cases with more indexes. The patch has been submitted to the PostgreSQL project for review.
2. PostgreSQL indexes refer to the heap location (via Tuple Identifier, or TID) directly, whereas MySQL secondary indexes refer to the tuple they index indirectly via the Primary Key. For MySQL, this capability avoids some, but not all of the penalty associated with write amplification, though at the cost of slowing down MySQL index reads.
My colleague Alvaro Herrera has developed a prototype of Indirect Indexes for PostgreSQL, based on enhancements to the btree index type. That seems a straightforward feature we can add to PostgreSQL, though it looks like it will work best with integer primary keys, much as in MySQL. We’re seeing a 46% improvement on updates over the worst case. We have more work to do before we submit, but that is a pretty good start.
Yes, Evan highlighted some cases where PostgreSQL could benefit from tuning, and we genuinely appreciate that. I would again stress that these are not all cases, nor even the common case for most applications.
What I’d like to point out is that it’s about 8 weeks since Evan’s blog was published and we’ve already got two useful and effective solutions to the areas of poor performance highlighted. And what that shows is that these problems are not architectural limitations in the very heart of Postgres, they are just simple use cases that can be tuned, like many others. We’re hopeful that at least one of the above mentioned solutions is likely to get into the next release, PostgreSQL 10.0.
If Evan had come to us with those concerns earlier then we could have fixed them sooner. PostgreSQL is rapidly moving forwards – we made more than 500 improvements in PostgreSQL 9.6 and will be making even more in the future. It’s evolving quickly because we have lots of happy users experimenting with new and interesting use cases, including many technically savvy people who outline what they want to see (like Evan!).
Oh, and if you haven’t seen it, you really should see “Cool Runnings”, the film that is.
As an open source project, the Postgres community has always had great difficulty in measuring many aspects of Postgres adoption. For example, how many people use Postgres? We don't know, because people can get Postgres from so many sources, and we have no easy way to track them. Surveys tell us that Postgres is probably the fourth most popular database, but more detailed information has proven elusive. We do get detailed Postgres case studies occasionally, e.g. Yandex Mail, but these are, of course, single-user reports.
Fortunately, EnterpriseDB commissioned IDC to study its own Postgres customers. The recently-released report has some valuable information, both for users considering Postgres and for Postgres support companies trying to convince users to choose Postgres.
The report only surveyed seven EnterpriseDB customers, but that is probably a representative sample of enterprises using Postgres. The study interviewed each of them and got interesting statistics about administrative overhead, deployment flexibility, and the amount of money saved. It also has some nice charts and customer quotes.
Timing can often be extremely fortuitous. Yesterday marked the official release of Postgres 9.6!
I’ve covered 9.6 previously, but that was a beta and clearly doesn’t count. Besides, while the beta was undoubtedly high quality, the frequency of patch turnover is enough to produce a significantly different final release. So let’s skim through the release notes a bit for stuff that really stands out or seems different from last time.
Oddly enough, this is all pretty much the same as previous tests suggested. The final release reflects almost exactly the same parallel performance as the beta. A simple query that relied on a sequential scan kept improving until we hit a total of six background workers addressing the query. They did rename the max_parallel_degree setting to max_parallel_workers_per_gather for some reason, but there’s no accounting for taste.
What is surprising is that nested loop performance still seems awful. This full test easily demonstrates that a parallel nested loop is much slower than a standard one:
```sql
\timing ON

CREATE TABLE para_test AS
SELECT a.id, repeat(' ', 20) AS junk
  FROM generate_series(1, 20000000) a(id);

CREATE INDEX idx_test_id ON para_test (id);
ANALYZE para_test;

SET max_parallel_workers_per_gather TO 1;

EXPLAIN ANALYZE
SELECT p1.id
  FROM para_test p1
  JOIN para_test p2 USING (id)
 WHERE id BETWEEN 1 AND 100000;
```
Admittedly our contrived query might confuse the planner sufficiently that performance worsens for only this specific case. But I was hoping they would “fix” nested loops before releasing the final iteration of 9.6. For now, I’m not sure whether or not to enable this, knowing what it seems to do to nested loops. It would be nice if there were a knob to have it only use parallelism on sequential scans until they fix this particular snag.
VACUUM no longer sucks!
Since tables require regular maintenance to mark dead tuples for reuse, VACUUM is the tool of choice. The release notes suggest that “frozen” pages are not revisited in subsequent vacuums. The key word here is “page”. Since Postgres’ page size is (almost always) 8KB, each page will generally contain several tuples. In any table that contains a large amount of static pages, these will be skipped in all subsequent vacuums. The implication here is that vast warehouse-style tables just got a lot easier to maintain.
This is a huge step forward. Interestingly, it goes well especially with partitioned tables, as older partitions tend to just sit there and gather dust. Vacuuming these over and over again is a massive waste of resources. Good riddance!
Synchronous replication got a notable improvement that brings it to parity with many other engines that provide high availability.
In Postgres synchronous replication, no transaction may return from commit until the first online replica in the configured list acknowledges and writes the data. If another candidate is listed, it might as well not exist until the first one becomes unavailable for some reason. And that’s it: one other server receives writes from the primary in what could be considered real time.
Well, in 9.6 things are a bit different. Now multiple synchronous standby servers can act in conjunction. If we have four synchronous standby servers for example, we can specify that writes are confirmed by at least two of them before proceeding. Or as in the case above, two standby servers and one must confirm.
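Configuration-wise this lives in synchronous_standby_names; the standby names below are illustrative. In 9.6 syntax, a leading count means that many standbys, taken in the listed priority order, must confirm each commit:

```
# postgresql.conf (9.6)
# The two highest-priority connected standbys from this list must
# acknowledge a commit before it returns to the client.
synchronous_standby_names = '2 (sb1, sb2, sb3, sb4)'
```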
What I’d like to see next is for Postgres to actually utilize the other listed servers as active alternates. Imagine there’s some network hiccough that disrupts or delays communication sufficiently without cutting the connection entirely. Suddenly the master can no longer commit transactions. Ouch. If any replica may confirm—not just the first available in the list—suddenly whichever is fastest wins. As long as at some minimum of synchronous systems report back, the state of the cluster is secure.
Besides that, there’s also the new remote_apply setting. Replication usually works by the replica acknowledging the data was written to disk as a WAL file. For situations where this isn’t sufficient, now the confirmation won’t go through until the changes are actually replayed in the replica. In practice, this is almost the same thing. However, sufficiently high throughput could expose edge cases where it wasn’t. Not anymore.
Some of these line-items are easy to miss, but fantastically important. In previous versions, when Postgres was ready to write data to tables, it simply flushed working memory to disk in its current order. Sure, it implemented various throttles and spreading to prevent overwhelming the disk subsystem, but we have something better now: write ordering. Checkpoints are now sorted in page order before being written to disk, replacing mostly random writes with sequential ones.
Along this same line, writes can now be batched in smaller groups. Dirty buffers get flushed to the kernel page cache, which will make it to storage… eventually. Unfortunately, some write strategies are better than others, and many operating systems optimize for throughput instead of latency. As a result, the kernel may completely saturate write bandwidth for a few seconds at a time, which is fine for most workloads. Of course, this strategy is awful for databases.
Now Postgres has a stick to poke the kernel with, telling it to flush to disk more often. Previously, admins needed to tweak very crude kernel parameters that operated either on percentages of memory or on overly restrictive byte counts. The first is terrible for systems with a lot of RAM, as flushing even 1% of 512GB would be a disaster for practically all types of storage hardware. The second is better, but is often turned down too far, resulting in performance loss because memory that could cache dirty pages for faster access goes underutilized.
Table extensions are also allocated in wider swaths in the event multiple simultaneous requests come in at once. Instead of one linear extent per request and the implied locking, extents are multiplicative. For quickly growing tables, this should do an admirable job of reducing write contention.
It’s too bad spinning rust is on the way out, especially in server environments. Until those disks are finally retired, though, these changes should greatly improve write metrics on disk-based systems.
Ever had a connection that was idle in transaction that prevented some DDL, which in turn blocked a bunch of pending updates? Well, no more! The very instant 9.6 goes live on our systems, I’m setting idle_in_transaction_session_timeout to 3600 and never looking back. Timeouts like this are long overdue, really. Webservers have had this right for decades: never trust the client. The client is a big poopie-head.
The pgbench utility is great, as we’ve already explored, but we’ve still only really scratched the surface. That article took advantage of the fact that we could write custom scripts for testing, but did not mention that multiple scripts can be specified for the same test. Servers have mixed workload, so why not run mixed scripts? Each could contain a different access vector, set of tables, or read/write balance.
It’s not a perfect replacement for a full production harness that replays the previous day of activity or something equally grandiose, but it’s an excellent facsimile. Especially since it’s possible to individually set the probability of each script.
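For instance, pgbench in 9.6 accepts an @weight suffix on each script to set its relative probability (script names here are illustrative):

```shell
# 80% read script, 20% write script, 16 clients for 5 minutes
pgbench -c 16 -j 4 -T 300 \
    -f reads.sql@8 \
    -f writes.sql@2 \
    mydb
```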
Beyond our tests with the beta, the fact the Postgres foreign data wrapper has advanced so much from the previous iteration in 9.5 is impressive. Foreign joins, sorts, and UPDATE or DELETE all give the remote system sufficient leverage to perform its own optimizations. The more data federation Postgres supports, the closer we get to a native sharding extension that doesn’t carry an interminable list of caveats and incompatibilities with the core Postgres feature set.
Postgres 9.6 is looking good so far. Results are highly consistent with our earlier tests of the beta back in May. There’s a story in there about the reliability of the Postgres development model: several commitfests can produce a beta that performs roughly the same as the final release after months of battle testing and squashed bugs.
This is also the last release of Postgres 9.x. We hate to see it go, but the 9.x tree has already lived longer than any previous iteration. Still, it’ll be a weird feeling to see the new numbering scheme in action.
I, for one, welcome our Postgres 10 overlords.
PostgreSQL 9.6, the latest version of the world's leading open source database, was released today by the PostgreSQL Global Development Group. This release will allow users to both scale up and scale out high-performance database workloads. New features include parallel query, synchronous replication improvements, phrase search, and many other improvements to performance and usability.
Version 9.6 adds support for parallelizing some query operations, enabling utilization of several or all of the cores on a server to return query results faster. This release includes parallel sequential (table) scan, aggregation, and joins. Depending on details and available cores, parallelism can speed up big data queries by as much as 32 times.
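As a sketch (the table name is invented), parallelism is disabled by default in 9.6 and is enabled per session or globally through max_parallel_workers_per_gather:

```sql
-- 9.6 ships with max_parallel_workers_per_gather = 0, i.e. disabled.
SET max_parallel_workers_per_gather = 4;

-- With enough rows, the plan shows Gather -> Partial Aggregate ->
-- Parallel Seq Scan instead of a single-process Seq Scan.
EXPLAIN
SELECT count(*) FROM big_table WHERE status = 'active';
```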
"I migrated our entire genomics data platform - all 25 billion legacy MySQL rows of it - to a single Postgres database, leveraging the row compression abilities of the JSONB datatype, and the excellent GIN, BRIN, and B-tree indexing modes. Now with version 9.6, I expect to harness the parallel query functionality to allow even greater scalability for queries against our rather large tables," said Mike Sofen, Chief Database Architect, Synthetic Genomics.
Two new options have been added to PostgreSQL's synchronous replication feature which allow it to be used to maintain consistent reads across database clusters. First, it now allows configuring groups of synchronous replicas. Second, the "remote_apply" mode creates a more consistent view of data across multiple nodes. These features support using built-in replication to maintain a set of "identical" nodes for load-balancing read workloads.
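In postgresql.conf terms, the two options look roughly like this (the standby names are invented):

```
# Commit waits for confirmation from any two of these three standbys.
synchronous_standby_names = '2 (standby_a, standby_b, standby_c)'

# remote_apply makes commit wait until the standbys have applied the WAL,
# so a read routed to a standby immediately afterwards sees the new data.
synchronous_commit = remote_apply
```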
The PostgreSQL-to-PostgreSQL data federation driver, postgres_fdw, has new capabilities to execute work on remote servers. By "pushing down" sorts, joins, and batch data updates, users can distribute workload across multiple PostgreSQL servers. These features should soon be added to other FDW drivers.
"With the capabilities of remote JOIN, UPDATE and DELETE, Foreign Data Wrappers are now a complete solution for sharing data between other databases and PostgreSQL. For example, PostgreSQL can be used to handle data input going to two or more different kinds of databases," said Julyanto Sutandang, Director of Business Solutions at Equnix.
PostgreSQL's full text search feature now supports "phrase search." This lets users search for exact phrases, or for words within a specified proximity to each other, using fast GIN indexes. Combined with new features for fine-tuning text search options, PostgreSQL is the superior option for "hybrid search" which puts together relational, JSON, and full text searching.
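A sketch of the new query operators, with an invented table: `<->` matches immediately adjacent words and `<N>` matches words within N positions of each other:

```sql
-- Exact phrase: 'open' immediately followed by 'source'.
SELECT title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'open <-> source');

-- Proximity: 'graham' within two positions of 'greene'.
-- phraseto_tsquery('english', 'open source') builds the <-> form directly.
SELECT title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'graham <2> greene');
```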
Thanks to feedback and testing by PostgreSQL users with high-volume production databases, the project has been able to improve many aspects of performance and usability in this release. Replication, aggregation, indexing, sorting, and stored procedures have all been made more efficient, and PostgreSQL now makes better use of resources with recent Linux kernels. Administration overhead for large tables and complex workloads was also reduced, especially through improvements to VACUUM.
Version 9.6 includes many other features added over the last year of development.
Additionally, the project has changed and improved the API for binary hot backups. As such, developers of custom backup software for PostgreSQL should do additional testing around the new version. See the Release Notes for more detail.