MariaDB CONNECT Storage Engine JSON Autodiscovery
The MariaDB CONNECT storage engine offers access to JSON file and allows you to see a external JSON file as a MariaDB table. A nice feature of the CONNECT storage Engine is its capability to auto discover a table structure when the table correspond to external data. In our case the CONNECT storage engine will automatically [...]
Good Leaders are game changers: Raft & Paxos
Consensus is a key component to providing fault-tolerant services such as synchronously replicated data stores, non-blocking atomic commitment and Paxos and Raft are among the most popular consensus algorithms. Paxos has been widely studied by researchers while Raft has become very popular among engineers.…
Use MySQL and Perl to automatically find, follow and unfollow twitter users
A friend of mine asked me how they could automatically follow and unfollow people on Twitter. But they didn’t want to follow just anyone and everyone. He had a Twitter account which they used for recruiting in a very narrow construction industry.…
Become a ClusterControl DBA: Safeguarding your Data
In the past four posts of the blog series, we covered deployment of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health and in the last post, how to make your setup highly available through HAProxy and MaxScale.
So now that you have your databases up and running and highly available, how do you ensure that you have backups of your data?
You can use backups for multiple things: disaster recovery, to provide production data to test against development or even to provision a slave node. This last case is already covered by ClusterControl. When you add a new (replica) node to your replication setup, ClusterControl will make a backup/snapshot of the master node and use it to build the replica. After the backup has been extracted, prepared and the database is up and running, ClusterControl will automatically set up replication.
Creating an instant backup
In essence creating a backup is the same for Galera, MySQL replication, Postgres and MongoDB. You can find the backup section under ClusterControl > Backup and by default it should open the scheduling overview. From here you can also press the “Backup” button to make an instant backup.
As all these various databases have different backup tools, there is obviously some difference in the options you can choose. For instance with MySQL you get choose between mysqldump and xtrabackup. If in doubt which one to choose (for MySQL), check out this blog about the differences and use cases for mysqldump and xtrabackup.
On this very same screen, you can also create a backup schedule that allows you to run the backup at a set interval, for instance, during off-peak hours.
Backing up MySQL and Galera
As mentioned in the previous paragraph, you can make MySQL backups using either mysqldump or xtrabackup. Using mysqldump you can make backups of individual schemas or a selected set of schemas while xtrabackup will always make a full backup of your database.
In the Backup Wizard, you can choose which host you want to run the backup on, the location where you want to store the backup files, and its directory and specific schemas.
If the node you are backing up is receiving (production) traffic, and you are afraid the extra disk writes will become intrusive, it is advised to send the backups to the ClusterControl host. This will cause the backup to stream the files over the network to the ClusterControl host and you have to make sure there is enough space available on this node.
If you would choose xtrabackup as the method for the backup, it would open up extra options: desync, compression and xtrabackup parallel threads/gzip. The desync option is only applicable to desync a node from a Galera cluster.
After scheduling an instant backup you can keep track of the progress of the backup job in the Settings > Cluster Jobs. After it has finished, you should be able to see the backup file in the configured location.
Backing up PostgreSQL
Similar to the instant backups of MySQL, you can run a backup on your Postgres database. With Postgres backups the are less options to fill in as there is one backup method: pg_dump.
Backing up MongoDB
Similar to PostgreSQL there is only one backup method: mongodump. In contrary to PostgreSQL the node that we take the backup from can be desynced in the case of MongoDB.
Now that we have played around with creating instant backups, we now can extend that by scheduling the backups.
The scheduling is very easy to do: you can select on which days the backup has to be made and at what time it needs to run.
For xtrabackup there is an additional feature: incremental backups. An incremental backup will only backup the data that changed since the last backup. Of course, the incremental backups are useless if there would not be full backup as a starting point. Between two full backups, you can have as many incremental backups as you like. But restoring them will take longer.
Once scheduled the job(s) should become visible under the “Current Backup Schedule” and you can edit them by double clicking on them. Like with the instant backups, these jobs will schedule the creation of a backup and you can keep track of the progress via the Cluster Jobs overview if necessary.
You can find the Backup Reports under ClusterControl > Backup and this will give you a cluster level overview of all backups made. Also from this interface you can directly restore a backup to a host in the master-slave setup or an entire Galera cluster.
The nice feature from ClusterControl is that it is able to restore a node/cluster using the full+incremental backups as it will keep track of the last (full) backup made and start the incremental backup from there. Then it will group a full backup together with all incremental backups till the next full backup. This allows you to restore starting from the full backup and applying the incremental backups on top of it.
Offsite backup in Amazon S3 or Glacier
Since we have now a lot of backups stored on either the database hosts or the ClusterControl host, we also want to ensure they don’t get lost in case we face a total infrastructure outage. (e.g. DC on fire or flooded) Therefore ClusterControl allows you to copy your backups offsite to Amazon S3 or Glacier.
To enable offsite backups with Amazon, you need to add your AWS credentials and keypair in the Service Providers dialogue (Settings > Service Providers).
Once setup you are now able to copy your backups offsite:
This process will take some time as the backup will be sent encrypted and the Glacier service is, in contrary to S3, not a fast storage solution.
After copying your backup to Amazon S3 or Glacier you can get them back easily by selecting the backup in the S3/Glacier tab and click on retrieve. You can also remove existing backups from Amazon S3 and Glacier here.
An alternative to Amazon S3 or Glacier would be to send your backups to another data center (if available). You can do this with a sync tool like BitTorrent Sync. We wrote a blog article on how to set up BitTorrent Sync for backups within ClusterControl.
We showed you how to get your data backed up and how to store them safely off site. Recovery is always a different thing. ClusterControl can recover automatically your databases from the backups made in the past that are stored on premises or copied back from S3 or Glacier. Recovering from backups that have been moved to any other offsite storage will involve manual intervention though.
Obviously there is more to securing your data, especially on the side of securing your connections. We will cover this in the next blog post!
Blog category: ClusterControlTags: backupclustercontrolMariaDBMongoDBMySQLpostgresPostgreSQLxtrabackup
Read, write & space amplification - pick 2
Good things come in threes, then reality bites and you must choose at most two. This choice is well known in distributed systems with CAP, PACELC and FIT. There is a similar choice for database engines. An algorithm can optimize for at most two from read, write and space amplification. These are metrics for efficiency and performance. This means one algorithm is unlikely to be better than another at all three. For example a B-Tree has less read amplification than an LSM while an LSM has less write amplification than a B-Tree. I abbreviate the metrics as read-amp, write-amp and space-amp. I also abbreviate this as the framework.The framework assumes a database workload that consists of point-queries, range-queries of length N and writes. Were I to add a delete operation then this would match the RocksDB and LevelDB API. The write is a blind-write as it doesn't imply a read prior to the write.This is part one of a topic that requires several blog posts. The second post will compare a B-Tree and LSM using the framework. The third post will argue that an algorithm cannot be optimal for all three metrics.PurposeRead, write and space amplification explain performance and efficiency when evaluating algorithms for real and potential workloads. They aren't a replacement for Big O notation. They usually assume a specific workload and configuration including RAM size, database size and type of storage.We began using the framework to compare InnoDB and RocksDB because better performance is an insufficient metric on which to choose an algorithm. Endurance (write amp) and capacity (space amp) matter when using flash. IOPs (read amp for point and range queries, write amp for writes) matters when using disk.The framework is useful for understanding the compromises made in search of better QPS. It is easy to trade write for space or read efficiency in write-optimized algorithms but these trades should be disclosed because they are not free. New algorithms can show better write throughput than RocksDB by making range reads less efficient but the Linkbench workload needs efficient writes and efficient range reads.The framework is useful because key comparisons aren't created equal. Traditional algorithm analysis is great for understanding in-memory performance via bounds on the number of key comparisons. But big-O notation is harder to use when some keys are read from cache, others from RAM and some from disk. Constant factors matter. The difference between 1.2 and 1.5 disk reads per query can be a big deal.Read amplificationRead-amp is the amount of work done per logical read operation. This can be defined for in-memory databases, persistent databases assuming no cache (worst-case behavior) and persistent databases assuming some cache (average-case behavior). The work done in-memory can be the number of key comparisons and traditional algorithm analysis can be used. The work done on-disk includes the number of bytes transferred and seeks (seeks matter on disks, not on NVM). The work done can also include the cost of decompressing data read from storage which is a function of the read block size and compression algorithm.Read-amp is defined separately for point and range queries. For range queries the range length matters (the number of rows to be fetched). In Linkbench the average range query fetches about 20 rows.Read-amp can also be defined for point queries on keys that don't exist. Some algorithms use a bloom filter to avoid disk IO for keys that don't exist. Queries for non-existent keys is common in some workloads. Bloom filters can't be used for a range query. The most frequent query in Linkbench is a range query that includes an equality predicate on the first two columns of the range query index. With RocksDB we define a prefix bloom filter to benefit from that.Write amplificationWrite-amp is the amount of work done per write operation. This can include the number of bytes written to storage and disk seeks per logical write operation. This can be split into in-memory and on-disk write-amp but I frequently ignore in-memory write-amp.There is usually a cost to pay in storage reads and writes following a logical write. With write-amp we are ignoring the read cost. The read cost is immediate for an update-in-place algorithm like a B-Tree as a page must be read to modify it. The read cost is deferred for a write-optimized algorithm like an LSM as compaction is done in the background and decoupled from the logical write. There is usually some write cost that is not deferred - updating in-memory structures and writing a redo log.With flash storage there is usually additional write-amp from the garbage collection done by the FTL to provide flash blocks that can be rewritten. Be careful about assuming too much about the benefit of sequential and large writes from a write-optimized database engine. While the physical erase block size on a NAND chip is not huge, many storage devices have something that spans physical erase blocks when doing GC that I will call a logical erase block. When data with different lifetimes ends up in the same logical erase block then the long-lived data will be copied out and increase flash GC write-amp (WAF greater than 1). I look forward to the arrival of multi-stream to reduce flash GC WAF.Space amplificationSpace-amp is the ratio of the size of the database to the size of the data in the database. Compression decreases space-amp. It is increased by fragmentation with a B-Tree and old versions of rows with an LSM. A low value for space-amp is more important with flash storage than disk because of the price per GB for storage capacity.Efficiency & PerformanceI work on small data systems. Small data is another name for OLTP. Small data workloads are highly concurrent and with concurrency better efficiency usually implies better performance. But performance and efficiency are not always strongly correlated. For example an algorithm with a high read-amp for range queries might hide the extra latency by doing disk reads in parallel. This improves response time but doesn't improve efficiency and the algorithm with less read-amp will sustain more QPS at higher concurrency.