Shinguz: MySQL table Point-in-Time-Recovery from mysqldump backup
Sometimes we face the situation where we have a full MySQL database backup done with mysqldump and then we have to restore and recover just one single table out of our huge mysqldump file.
Further, our mysqldump backup was taken hours ago, so we also want to recover all the changes made to that table between the time of the backup and a later point in time.
In this blog article we cover all the steps needed to achieve this goal for MySQL and MariaDB.
Recommendation: It is recommended to do these steps on a testing system and then dump and restore your table back to the production system. If you do it directly on your production system you have to know exactly what you are doing...
Further, this process should be tested carefully and regularly to get familiar with it and to ensure your backup/restore/recovery procedure works properly.
The table we want to recover is called test.test from our backup full_dump.sql.gz. As a first step we have to do the recovery with the following command to our test database:
shell> zcat full_dump.sql.gz | extract_table.py --database=test --table=test | mysql -u root

The script extract_table.py is part of the FromDual Recovery Manager and extracts one single table from a mysqldump backup.
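If the FromDual script is not at hand, the same single-table extraction can be approximated with awk. This is a minimal sketch, assuming the standard `-- Table structure for table `name`` section markers that mysqldump writes; note that it deliberately skips the dump header, so session settings such as SET NAMES from the header are not replayed:

```shell
# Sketch only: copy everything from the target table's section marker up to
# the next table's marker. Assumes mysqldump's default section comments;
# the dump header (SET statements etc.) is intentionally not forwarded.
zcat full_dump.sql.gz | awk '
  /^-- Table structure for table `/ { emit = ($0 ~ /`test`$/) }
  emit
' | mysql -u root test
```

The `$`-anchored pattern avoids accidentally matching tables whose names merely start with `test`.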
As a next step we have to extract the binary log file and its position where to start recovery from out of our dump:
shell> zcat full_dump.sql.gz | head -n 25 | grep CHANGE
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000216', MASTER_LOG_POS=1300976;

Then we have to find out where we want to stop our Point-in-Time-Recovery. The need for recovery is possibly due to a TRUNCATE TABLE command or a similar operation executed on the wrong system, or it is simply a point in time somebody has asked us to recover to. The position to stop at can be found as follows:
shell> mysqlbinlog -v mysql-bin.000216 | grep -B5 TRUNCATE --color
#150123 19:53:14 server id 35622 end_log_pos 1302950 CRC32 0x24471494 Xid = 3803
# at 1302950
#150123 19:53:14 server id 35622 end_log_pos 1303036 CRC32 0xf9ac63a6 Query thread_id=54 exec_time=0 error_code=0
TRUNCATE TABLE test

And as a last step we have to apply all the changes from the binary log to our testing database:
shell> mysqlbinlog --disable-log-bin --database=test --start-position=1300976 --stop-position=1302950 mysql-bin.000216 | mysql -u root --force

Now the table test.test is recovered to the wanted point in time and we can dump it and restore it back to its final location in the production database.
shell> mysqldump --user=root --host=testing test test | mysql --user=root --host=production test

This process has been tested on MySQL 5.1.73, 5.5.38, 5.6.22 and 5.7.5 and also on MariaDB 10.0.10 and 10.1.0.
So long, and thanks for all the help.
Today is my last day at Tokutek. On Monday I'm starting a new opportunity as VP/Technology at CrunchTime!. If you are a web developer, database developer, or quality assurance engineer in the Boston area and looking for a new opportunity, please contact me or visit the CrunchTime! career page.

I've really enjoyed my time at VoltDB and Tokutek. Working for Mike Stonebraker (at VoltDB) was on my career "bucket list", and in these past 3.5 years at Tokutek I've experienced the awesomeness of the MySQL ecosystem and the surging NoSQL database market. But now I'm ready to go back to consuming databases, not creating them. It's probably a good idea for anyone in technology to work on the other side of the producer/consumer line; nothing beats "real-world" experience.

If you like what I've been writing here then keep this blog in your RSS reader and/or follow me on Twitter using @tmcallaghan and @acmebench. My goal is to keep blogging twice a month. I hope there will be less concern about my benchmarking objectivity, as I'll no longer have a stake in the benchmarked technologies.

Here are some topics I've already started working on:

- Creating a brand new benchmark that has more "real world" appeal than existing ones. I've started work on an "email server" workload with operations like send, read, label, search, forward, reply, etc. MongoDB has hinted that they are creating an "inbox fan in/out" benchmark which might be similar (I wish they'd make it public); mine will be available on my GitHub and will likely support multiple database technologies.
- A lot of MongoDB 2.8 benchmarks comparing the MMAPV1, WiredTiger, and TokuMXse storage engines as well as TokuMX.
- MySQL benchmarks including TokuDB plus the 5.7 enhancements.
- Analysis of good and bad benchmarks that I've seen.
Please email me or comment on this blog if you've found an interesting new benchmark that could use review. And, of course, expect blogs about the practice of benchmarking itself.

Lastly, I'd like to give a shout-out to the many people and companies I've worked with over the past 6+ years. You've all been welcoming and supportive, and really made my job fun. Specifically:

- My "greater-known but nonethemore smart" brother, Mark Callaghan.
- My amazing team at Tokutek, a true bunch of wall-breakers. I'll miss Rich, Zardosht, Leif, John, Christian, Joel, Joe, and Abdelhak.
- Gerry and Sheeri for letting me sing the jingle on the OurSQL Podcast.
- The entire MySQL ecosystem.
- Vadim, Peter, and Rob at Percona, Amrith at Tesora, Robert at Continuent [now VMware], Baron at VividCortex, Shlomi at Booking.com, and Henrik at MongoDB.
- Too many people to name from Percona, Oracle/MySQL, DbHangOps, VoltDB, and MariaDB.
- And lastly, thanks to everyone who has attended one of my webinars or presentations, commented on my blogs, or used TokuDB/TokuMX (commercially or community).

My personal email address is available by clicking the disclaimer widget on the right-hand side of the screen.

So long... and stay tuned.
Using Percona Cloud Tools to solve real-world MySQL problems
For months when speaking with customers I have been positioning Percona Cloud Tools (PCT) as a valuable tool for the DBA/Developer/SysAdmin, but only recently have I truly been able to harness the data and make a technical recommendation to a customer that I feel would have been very difficult to accomplish otherwise.

Let me provide some background: I was tasked with performing a Performance Audit for one of our customers (Performance Audits are extremely popular as they allow you to have a MySQL Expert confirm or reveal challenges within your MySQL environment and make your database run faster!), and as part of our conversation we discussed and agreed to install Percona Cloud Tools. We let the site run for a few days, and then I started my audit. What I noticed was that at regular intervals there was often a CPU spike, along with a corresponding drop in Queries Per Second (QPS), but it lasted only for a few seconds. We decided that further investigation was warranted, as the customer was concerned the spikes impacted their users’ experience with the application.

Here are the tasks that Percona Cloud Tools made easy while I worked to identify the source of the CPU spike and QPS drop:

- The per-second granularity data capture of PCT allowed me to identify how significant the spike and QPS drop actually were. If I had been looking at 1-minute or higher average values (such as Cacti would provide) I probably wouldn’t have been able to detect the spike or stall as clearly in the first place; it would have been lost in the average. The current PCT graphs group at the 1-minute range, but you can view the min and max values within each 1-minute group, since they are the true highest and lowest observed 1-second intervals during that minute.
- The ability of all graphs to maintain the same resolution time allowed me to zero in on the problematic time period and then quickly look across all graphs for corresponding deflections.
This analysis led me to discover a significant spike in InnoDB disk reads.

- The Query Analytics functionality let me zoom in again on the problematic query. Adjusting Query Analytics to an appropriate time period narrowed down the range of unique queries that could be considered the cause. This task, in my opinion, is the best part of using PCT.
- Query Analytics allowed me to view the total Rows Examined for each query based on just this shortened interval. I then tagged those that had more than 10k Rows Examined (an arbitrary threshold, but most queries for this customer seemed to fall below it) so that I could review them in real time with the customer before deciding what to do next. We can only view this sort of information by leveraging the slow query log; this data is not available via Performance_Schema or via network sniffing.

Once we were able to identify the problematic queries, the rest was routine query optimization: 10 minutes' work using Percona Cloud Tools for what might have been an hour using traditional methods!

For those of you wondering how else this can be done: assuming you detected the CPU spike / QPS drop (perhaps you are using Graphite or another tool that can deliver per-second resolution), you would also need to be capturing the slow query log at a fine enough resolution (I prefer long_query_time=0 to just get it all), and then be adept at leveraging pt-query-digest with the --since and --until options to narrow down your range of queries. The significant drawback to this approach is that each time you want to tweak your time range you probably need to stream through a fairly large slow log file multiple times, which is both CPU- and disk-intensive and can take some time (probably minutes, maybe hours) depending on the size of your log file. Certainly a workable approach, but nowhere near as quick as reloading a page in your browser. So what are you waiting for?
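As a rough illustration of the slow-log narrowing step described above, the time-range filter itself can be sketched with plain awk. This assumes the old-style `# Time: YYMMDD HH:MM:SS` slow-log headers (newer servers write different timestamp formats), and `slow.log` is a placeholder file name:

```shell
# Sketch: keep only slow-log entries whose "# Time:" header falls inside the
# window. Entries logged in the same second carry no new "# Time:" line and
# so inherit the previous header's verdict, which is the desired behavior.
# String comparison works because the fields are zero-padded and fixed-width.
awk -v since='150123 19:00:00' -v until='150123 20:00:00' '
  /^# Time: / { ts = $3 " " $4; keep = (ts >= since && ts <= until) }
  keep
' slow.log
```

The filtered output can then be fed to pt-query-digest, or the --since/--until options can do the same job directly.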
Start using Percona Cloud Tools today, it’s free! Register for the free beta here.

The post Using Percona Cloud Tools to solve real-world MySQL problems appeared first on MySQL Performance Blog.
Proposal to change additional defaults in MySQL 5.7
Following on from my earlier proposal to change Replication + InnoDB settings, in the MySQL team, we are proposing to make the following additional changes to defaults in MySQL 5.7:
Enable events_statements_history and events_transactions_history consumers by default
Change the default MySQL CLI prompt to:
prompt="\u@\h [\d] > "
To explain the motivation behind these changes:
The binlog_error_action variable was introduced in MySQL 5.6 after the GA release, and for compatibility it defaulted to the previous behavior (ignore errors). Since some applications depend on reading from slaves after writing to masters, silently continuing without the binary log can leave slaves missing data, so we feel that the better behavior is to abort.
The InnoDB CRC32 checksum was introduced in MySQL 5.6, and offers CPU accelerated page checksum computation. Since previous versions of MySQL were unable to read the new checksum, enabling this feature would have restricted downgrades to dump-and-restore. With MySQL 5.6 supporting the CRC32 checksum, the timing now makes sense to make this the default in 5.7.
In some workloads the InnoDB purge and page cleaning operations will not be able to keep up with modifications made inside InnoDB. In combination with newer storage devices that offer very high throughput and accommodate parallel writes, we feel that having multiple threads is a better default.
Currently the InnoDB storage engine ignores CREATE TABLE options that it does not understand (innodb_strict_mode=0). We feel this negatively impacts the user experience, and are proposing to enable InnoDB strict mode by default. The downside of this change is that some DDL statements that worked in 5.6 will fail by default in 5.7. Thank you to Daniël van Eeden for providing this suggestion.
Since the InnoDB buffer pool now supports online resizing (and the innodb log files do not), we believe it makes sense to increase the size of the log files slightly in anticipation of an increased buffer pool size. This comes at a small cost of additional disk space usage. We felt that 128M was a good balance in size, since while larger log files are better for performance, there is also a trade off point where the log files benefit from filesystem caches. Thank you to Justin Swanhart for providing this suggestion.
With the increased availability of large memory, waiting for caches to warm back up after restarts has been an increased operational concern. MySQL 5.6 introduced a feature to be able to dump buffer pool page addresses (at a small fraction of total size) to assist in warming caches after restarts. MySQL 5.7 introduces an additional feature to dump only a percentage of the buffer pool, and we felt that by limiting this to 25% was a good tradeoff to enable this by default. Thank you to Daniel Black and Daniël van Eeden for this suggestion.
The InnoDB Barracuda file format was first introduced in the InnoDB plugin for MySQL 5.1, and enables InnoDB to use the new DYNAMIC row format (with improved BLOB handling) and COMPRESSED tables. Setting the InnoDB file format to Barracuda depended on innodb-file-per-table=1 (default changed in MySQL 5.6), and introduced an incompatibility that would prevent downgrades to earlier releases. With both MySQL 5.5 and 5.6 supporting Barracuda, the timing now makes sense to enable this by default. Thank you to Daniël van Eeden for providing this suggestion.
The innodb_large_prefix option was introduced for Barracuda tables to support much longer index key prefixes (up to 3072 bytes). It was disabled by default so that users did not accidentally create Barracuda tables that would be incompatible on Antelope systems (which would not allow downgrading). With Barracuda becoming the default, and our friends at WordPress and Drupal wanting to create much larger indexes to support utf8mb4-encoded columns, we decided to change this default to on.
The binlog_gtid_recovery_simplified option was introduced after MySQL 5.6 went GA, and defaulted to OFF for backwards compatibility. We feel that ON is the better default, and plan to make this change for MySQL 5.7.
With the addition of binary log group commit in MySQL 5.6, the impact from enabling sync_binlog=1 on many non-synthetic workloads should be reduced considerably. This change makes MySQL binary logs durable/crash safe by default.
The slave_net_timeout defaults to one hour, which is too long for most modern systems. In addition: MySQL 5.7 now uses this value divided by two to configure the default master heartbeat period. We believe that a one minute default is more appropriate. Thank you to Miguel Angel Nieto for providing this suggestion.
In MySQL 5.7, performance_schema can now instrument transactions, which can be incredibly useful when enabled by default. In addition, enabling statements history (introduced in 5.6) provides a lot of useful meta-data for diagnosing performance problems. Thank you to Daniël van Eeden for providing this suggestion.
The MySQL command line client has long supported the ability to change the default prompt from "mysql >" to include additional details such as the username and database selected. This is mostly a cosmetic change, but it is useful when you are managing multiple servers. Thank you to Daniël van Eeden for providing this suggestion.
In MySQL 5.7 we have already made changes to the default SQL mode: enabling ONLY_FULL_GROUP_BY and STRICT_TRANS_TABLES by default. There is a long standing usability issue with user grants, where a typo in the username could result in a new user being created. We have decided that it is best to disable this behavior by enabling the NO_AUTO_CREATE_USER sql mode by default. Thank you to Daniël van Eeden for providing this suggestion.
In addition to these changes, we have decided to withdraw our proposal to change the default isolation level to READ-COMMITTED. We will re-evaluate this decision for future releases.
For those wishing to test out these new configuration settings, I have configuration files available to try out:
For MySQL 5.7
For MySQL 5.6 (enables all 5.7 options where applicable)
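For reference, the proposed settings discussed above might be collected into a my.cnf fragment roughly like the following. This is a sketch of the proposal, not the shipped 5.7 defaults; option names and values should be verified against your server version, and the GTID recovery option is omitted because its final name was still in flux:

```ini
[mysqld]
# Sketch of the proposed 5.7 defaults discussed above (verify per version)
binlog_error_action              = ABORT_SERVER
innodb_checksum_algorithm        = crc32
innodb_purge_threads             = 4
innodb_page_cleaners             = 4
innodb_strict_mode               = ON
innodb_log_file_size             = 128M
innodb_buffer_pool_dump_pct      = 25
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
innodb_file_format               = Barracuda
innodb_large_prefix              = ON
sync_binlog                      = 1
slave_net_timeout                = 60
performance-schema-consumer-events-statements-history   = ON
performance-schema-consumer-events-transactions-history = ON
sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER

[mysql]
prompt = "\u@\h [\d] > "
```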
Please let us know what you think of these changes!
You can leave a comment here, or get in touch via email.
MaxScale Firewall Filter
Fri, 2015-01-23 15:14 markusmakela

MaxScale’s filter system is very flexible and enables a new way of interacting with queries. The upcoming firewall filter shows just one of the many ways that you can control and manage the flow of queries through MaxScale.
The firewall filter is meant to offer finer and more varied control over queries and their execution. The main idea of the filter is to work as a base to build and improve upon and to show just how that can be done with MaxScale.
The filter provides a variety of ways to control which kinds of queries get executed. The simplest ones block queries that happen during a certain time range or perform a certain operation. The more complex ones can match queries using a regular expression, check for the existence of a WHERE clause in the query, or deny the query based on the current session’s query rate. These rules can be applied to specific users, to network ranges, or to a combination of the two. This allows the filter’s rules to be integrated more easily into an existing environment that already has rules in the database.
Rules and Users
To define rules for the filter, first the name of the rule must be given; this name is used when the rules are applied to users. After that, the keyword ‘deny’ marks the beginning of the rule’s content. The mandatory part of the content is one of the following keywords:

- wildcard: requires no other parameters and checks whether the query uses the wildcard character.
- columns: requires a list of columns and checks whether the query targets one of them.
- regex: expects a regular expression enclosed in single or double quotes. All quotes inside the regular expression need to be escaped with the backslash character.
- limit_queries: requires three parameters: the number of queries, the time period in seconds during which the queries are counted, and the time in seconds for which further queries are denied if the number of queries exceeds the first parameter during the period defined by the second.
- no_where_clause: checks whether the query lacks a WHERE clause, i.e. it matches queries that have no WHERE clause.
Syntax for defining the rules
rule NAME deny [wildcard | columns VALUE ... | regex REGEX |
limit_queries COUNT TIMEPERIOD HOLDOFF | no_where_clause]
[at_times VALUE...] [on_queries [select|update|insert|delete]]
Syntax for applying rules to users
users NAME ... match [any|all] rules RULE ...
To allow easy modification of rules without having to touch MaxScale’s configuration file, this filter stores the rules in an external text file. All the rules, and the users they are applied to, need to be in this file. The file is provided to the filter by entering its path into the configuration file. Here is an example configuration of the firewall filter, with the ‘rules’ variable being the one that points to the file containing the rules.
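A sketch of such a configuration section follows; the module name (fwfilter), the service definition, and the file path are assumptions based on MaxScale 1.x conventions, so check them against your MaxScale version’s documentation:

```ini
# Sketch only: module name, service definition and rule-file path are
# assumptions, not taken from this article.
[Firewall]
type=filter
module=fwfilter
rules=/home/maxscale/rules.txt

# The filter takes effect when referenced from a service definition:
[MyService]
type=service
router=readconnroute
servers=server1
user=maxuser
passwd=maxpwd
filters=Firewall
```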
The contents of the rule file could be as follows.
rule peak_hour1 deny limit_queries 1000 1.0 5 at_times 17:00:00-17:30:00
rule peak_hour2 deny wildcard at_times 17:00:00-17:30:00
rule personal deny columns salary phone on_queries update|delete
users maxuser@192.168.% match any rules personal
users %@% match any rules peak_hour1 peak_hour2
This defines three rules and applies one of them to the user ‘maxuser’ from the address 192.168.% and the other two to all users from any network. The first rule states that if the speed of the incoming queries exceeds 1000 queries per second then further queries are blocked for the next five seconds. The second one denies the usage of the wildcard. These two rules are only active between the times 17:00 and 17:30. The third rule denies updates and deletes that target the columns ‘salary’ or ‘phone’.
What does this allow me to do?
Let’s say I’m facing a problem with a couple of large and complex read queries taking too much time and slowing down a large database. Now let’s say we can’t route them to a secondary slave during peak hours because resources are limited, and we still do not want to allow these queries to slow the database down. If we knew exactly who was issuing these queries we could just block those users and be done with it. But then every query from those users would be blocked.
This is where the firewall filter comes in. If we know what the heavy queries look like (let’s say we used MaxScale’s topfilter to find out), we could come up with a regular expression that matches one, some, or all of them. We could also make the rule active only during peak hours, when the performance of the database is critical. By blocking only these queries we can still allow normal queries from these users during peak hours, while the heavy queries remain allowed outside of peak hours.
This is just one example of a tailored solution to a specific problem and how the modularity of MaxScale enables you to expand your horizons. The firewall filter could be seen as a surgeon’s blade that operates only on things that need to be operated on and is only brought out when needed.
About the Author
Markus Mäkelä is a Software Engineer working on MaxScale. Markus is finishing his studies at Metropolia University of Applied Sciences in Helsinki, Finland.