Planet MySQL - http://www.planetmysql.org/

  • Alternatives for chunking bulk deletes in common_schema
    I've blogged about common_schema multiple times in the past, and it's a tool I use frequently. Last week I had a project to delete millions of rows from multiple rollup tables in a star schema. Since the tables are not partitioned I needed to use DELETE instead of DROP PARTITION, but I didn't want to delete millions of rows in a single transaction. My first instinct was to use common_schema's split() function to break the deletes into chunks. So I ran a query on INFORMATION_SCHEMA to generate a batch of statements like this:

        call common_schema.run("split(delete from rollup_table1 where the_date > '2013-03-30') pass;");
        call common_schema.run("split(delete from rollup_table2 where the_date > '2013-03-30') pass;");
        call common_schema.run("split(delete from rollup_table3 where the_date > '2013-03-30') pass;");
        ...

    That's the simplest way to do deletes with split(), and the tool will automatically determine which index and what chunk size to use. If I were running this on an active database (or a master) I would probably use throttle to control the speed of the deletes, but in this case it was running on passive replicas, so I just used pass to run the deletes with no sleep time in between them.

    I sorted the deletes by table size, from smallest to largest, and had a total of 33 tables to process. I started running the SQL on a stage database and it deleted data from the first 32 tables with no problem, but it got stuck on the 33rd table. I checked the process list and found this query running (table and column names have been changed for simplicity):

        select REQ_DATE, DIMENSION1, DIMENSION2, DIMENSION3, DIMENSION4, DIMENSION5
        from `flite`.`rollup_table33`
        order by REQ_DATE ASC, DIMENSION1 ASC, DIMENSION2 ASC, DIMENSION3 ASC, DIMENSION4 ASC, DIMENSION5 ASC
        limit 1
        into @_split_column_variable_min_1, @_split_column_variable_min_2, @_split_column_variable_min_3,
             @_split_column_variable_min_4, @_split_column_variable_min_5, @_split_column_variable_min_6

    The relevant table has about 100 million rows, but I would expect the above query to be fairly fast since there is a unique index on the columns being selected. I ran an explain and found that the query was doing a full table scan. I'm not sure exactly why it was doing a full table scan, but the table does have some quirks:

    - Two of the dimension columns use the TEXT data type, and thus only a prefix of each of those columns is indexed
    - The default charset for the table is latin1, but for the TEXT columns it is utf8
    - The table uses ROW_FORMAT=DYNAMIC

    Rather than trying to figure out exactly why that query was doing a full table scan, I checked the common_schema documentation to see if there were any options I could use to avoid running this particular query. The parameters for split() are fairly limited, but I did try using start to see if that would get around it:

        call common_schema.run("split({start:'2013-03-30,0,,0,0,', size: 10000} : delete from rollup_table3 where the_date > '2013-03-30') pass;");

    I thought telling split() where to start and what chunk size to use might help, but it still ran the same query with the full table scan. At that point I started looking for alternatives to split(). I remembered seeing an example in the documentation using WHILE to do a delete with a WHERE clause and a LIMIT. Here's the example:

        while (DELETE FROM world.Country WHERE Continent = 'Asia' LIMIT 10)
        {
          throttle 2;
        }

    So I decided to try that with my table.
    Here's the command I ran:

        call common_schema.run("while (delete from rollup_table33 where req_date > '2013-06-14' LIMIT 10000) { pass; }");

    As I expected, the delete started fast and slowed down over time, but within 3 hours the deletes were done! At some point I intend to go back and figure out why my first approach was so slow, and I might file a bug report against common_schema. But for now I'm happy that I could get the tool to do what I needed, thanks to its flexibility.
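    Had this been running on an active master rather than a passive replica, the same loop could have been throttled; a minimal sketch combining the author's WHILE command with the throttle example quoted above (the value 2 is taken from that example, not a tuned recommendation — see the common_schema docs for the exact throttle semantics):

        -- pause between each 10,000-row chunk to limit load on a busy server
        call common_schema.run("while (delete from rollup_table33 where req_date > '2013-06-14' LIMIT 10000) { throttle 2; }");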

  • Announcing iiBench for MySQL in Java
    I just pushed the new Java-based iiBench for MySQL (and Percona Server and MariaDB); the code and documentation are available now in the iibench-mysql GitHub repo. Pull requests are welcome!

    The history of iiBench goes back to the early days of Tokutek. Since "indexed insertion" is a strength of Fractal Tree indexes, the first iiBench was created by Tokutek in C++ back in 2008. Mark Callaghan rewrote iiBench in Python, adding several features along the way. His version of iiBench is available on Launchpad.

    So why did I create a new iiBench in Java?

    Raw speed: By eliminating the insert calls from each version, I tested how many inserts per second each version could generate. Any time spent waiting for the next batch of inserts is time that could be put toward inserting rows (and yes, I understand that concurrent clients can reduce this concern). All tests were run on my desktop (Intel i7-4790K). As the graph in the original post shows, the 1-thread version of the Java iiBench is almost 4x faster than the 4-threaded Python iiBench, and the 4-thread Java version scales quite nicely.

    Capability: Because of Python's Global Interpreter Lock, I need to run 4 copies of the Python iiBench to create 4 loader "threads". Each of these benchmark clients creates its own log files that need to be aggregated to show cumulative insertion performance. Java threading handles it cleanly, allowing a single client application to run regardless of how many client threads are needed.

    Miscellaneous: Not really relevant to the version or language, but I cringe every time I need to get software from Launchpad. I am not a fan of Bazaar.

    Again, contributions/comments/complaints are always welcome, here or on GitHub!
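    For readers unfamiliar with the benchmark, iiBench measures sustained insertion into a table carrying secondary indexes. A hypothetical sketch of that kind of workload — this is not the actual iibench-mysql schema; the table and column names are purely illustrative:

        -- a table whose secondary index must be maintained on every insert
        CREATE TABLE purchases (
          transactionid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
          customerid    INT NOT NULL,
          price         DECIMAL(10,2) NOT NULL,
          ts            DATETIME NOT NULL,
          KEY idx_customer_price (customerid, price)
        );

        -- the benchmark drives many batched inserts like this, as fast as possible,
        -- and reports how the insert rate degrades as the index grows
        INSERT INTO purchases (customerid, price, ts)
        VALUES (42, 9.99, NOW()), (7, 120.00, NOW()), (42, 3.50, NOW());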

  • Percona Server 5.6.21-70.1 is now available
    Percona is glad to announce the release of Percona Server 5.6.21-70.1 on November 24, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

    Based on MySQL 5.6.21, including all the bug fixes in it, Percona Server 5.6.21-70.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free. Complete details of this release can be found in the 5.6.21-70.1 milestone on Launchpad.

    Bugs Fixed:

    - A slave replicating in RBR mode would crash if a table definition between master and slave differed with an allowed conversion, and the binary log contained a table map event followed by two row log events. This bug is an upstream regression introduced by a fix for bug #72610. Bug fixed #1380010.
    - An incorrect source code function attribute would cause MySQL to crash on an InnoDB row write, if compiled with a recent GCC with certain compilation options. Bug fixed #1390695 (upstream #74842).
    - MTR tests for Response Time Distribution were not packaged in binary packages. Bug fixed #1387170.
    - The RPM packages provided for CentOS 5 were built using a debugging information format which is not supported in the gdb version included with CentOS 5.10. Bug fixed #1388972.
    - A session on a server in mixed-mode binlogging would switch to row-based binlogging whenever a temporary table was created and then queried. This switch would last until the session ended or until all temporary tables in the session were dropped. This was unnecessarily restrictive and has been fixed so that only the statements involving temporary tables are logged in the row-based format, while the rest of the statements continue to use statement-based logging. Bug fixed #1313901 (upstream #72475).

    Other bugs fixed: #1387227 and #1388001.

    Release notes for Percona Server 5.6.21-70.1 are available in the online documentation. Please report any bugs on the Launchpad bug tracker. The post Percona Server 5.6.21-70.1 is now available appeared first on MySQL Performance Blog.
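    To make the last binlogging fix concrete, here is a sketch of the scenario it describes — the table names are hypothetical, and the behavior in the comments is taken from the bug description above:

        SET SESSION binlog_format = 'MIXED';

        CREATE TEMPORARY TABLE tmp_stats (id INT);
        SELECT COUNT(*) FROM tmp_stats;           -- temporary table created, then queried

        -- before the fix: every subsequent write in this session was logged row-based,
        -- until the session ended or all temporary tables were dropped;
        -- after the fix: only statements touching tmp_stats use row-based logging
        INSERT INTO prod_events (id) VALUES (1);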

  • Log Buffer #398, A Carnival of the Vanities for DBAs
    This Log Buffer Edition covers some informative and interesting posts from Oracle, SQL Server and MySQL.

    Oracle:

    - If you're not using Toad DBA Suite, it's sometimes hard to find solutions. Somebody wanted to know how to find indexes that aren't indirect. Indirect indexes are those created for a primary key, because a primary key column or set of columns is both not null and uniquely constrained. You can't drop a unique index for a primary key without dropping the primary key constraint that indirectly created it.
    - At the NoCOUG fall conference at the eBay town hall in San Jose, we got a first-hand look at the workings of the most complex database environments in the world.
    - Is there a feature you would like added to Inventory or Inventory Items? Tired of logging Service Requests with Support to request product enhancements? Well, those days are over. You can now submit Enhancement Requests (ERs) for Logistics (Inventory) and/or Inventory Items (APC/PLM/PIM) directly in their respective Communities.
    - Oracle Database 12c: EXPAND_SQL_TEXT, APPROX_COUNT_DISTINCT, Session Sequences and Temporary Undo.
    - Integrating Cordova Plugin with Oracle MAF – iOS Calendar Plugin, by Chris Muir.

    SQL Server:

    - Learn how to invoke SSRS reports from an SSIS package after the data load is completed.
    - Questions About Using TSQL to Import Excel Data You Were Too Shy to Ask.
    - This article shows a step-by-step guide to move the distribution database to a new SQL Server instance.
    - Monitoring Azure SQL Database.
    - Stairway to SQL Server Agent – Level 2: Job Steps and Subsystems.
    - Where in the Application Should Data Validation be Done?

    MySQL:

    - Creating JSON documents with MariaDB.
    - Geographic replication with MySQL and Galera.
    - Sys Schema for MySQL 5.6 and MySQL 5.7.
    - Logging with MySQL: Error-Logging to Syslog & EventLog.
    - Multi-source Replication with Galera Cluster for MySQL.

  • Schema changes in MySQL for OpenStack Trove users
    People using OpenStack Trove instances can hit a common issue in the MySQL world: how to perform schema change operations while minimizing the impact on the database server? Let's explore the options that can allow online schema changes.

    Summary

    With MySQL 5.5, pt-online-schema-change from Percona Toolkit is your best option for large tables, while regular ALTER TABLE statements are only acceptable for small tables. Also beware of metadata locks. With MySQL 5.6, almost all types of schema changes can be done online. Metadata locks can still be an issue. pt-online-schema-change can still be worth using, as it is also online on read replicas.

    Regular ALTER TABLE with MySQL 5.5

    If you are still using MySQL 5.5, almost all schema changes will require a table rebuild and MySQL will set a write lock. Therefore all writes to the table that gets modified will be blocked. As soon as the table gets large, or if you cannot afford a maintenance window, using ALTER TABLE becomes tricky. The only exception is that secondary indexes can be added or removed without rebuilding the whole table. The table is still write-locked during the operation, but it is much faster. You can spot this 'fast index creation' process by looking at SHOW PROCESSLIST (see 'manage keys' in the State field):

        +----+------+-----------+--------+---------+------+-------------+-----------------------------------------+-----------+---------------+-----------+
        | Id | User | Host      | db     | Command | Time | State       | Info                                    | Rows_sent | Rows_examined | Rows_read |
        +----+------+-----------+--------+---------+------+-------------+-----------------------------------------+-----------+---------------+-----------+
        |  1 | root | localhost | sbtest | Query   |    4 | manage keys | alter table sbtest2 add index idx_k (k) |         0 |             0 |         0 |
        +----+------+-----------+--------+---------+------+-------------+-----------------------------------------+-----------+---------------+-----------+

    It is helpful for small or even medium tables, but it is not fast enough for large tables. A common workaround when you have a master-slave configuration is to perform the schema change on the slave first, promote the slave, and then perform the schema change on the old master. However, replication is not available if you are using Icehouse, and even with Juno, replicas are read-only. So this workaround is not usable with Trove.

    pt-online-schema-change

    A good alternative is to use pt-online-schema-change. In a nutshell, the tool creates a new table with the desired structure, creates triggers on the original table, and copies rows in chunks from the original table to the new table. Nice features of the tool include being able to write to the original table while the schema change is being performed (hence the name of the tool), monitoring replication lag on slaves, and automatic throttling to keep lag under a reasonable threshold.

    Let's assume you want to add a new field called 'city' in the 'user' table. Instead of running:

        mysql> ALTER TABLE mydb.user ADD city VARCHAR(50) NOT NULL DEFAULT '';

    you will run:

        # pt-online-schema-change --progress=percentage,10 --alter="ADD city VARCHAR(50) NOT NULL DEFAULT ''" h=localhost,u=root,D=mydb,t=user --execute

    All ALTER TABLE statements can be converted to an equivalent pt-online-schema-change command.

    Is there any tradeoff? The first one is that pt-online-schema-change will be slower than a plain ALTER TABLE. This is expected, as the tool has extra code to throttle itself if it adds too much load. How much slower is it? As always, it depends. On a lightly loaded server the difference will be minimal; for instance, I measured 3m24s for ALTER TABLE and 3m30s for pt-online-schema-change on a test table. However, on a heavily loaded server, it can be something like 5x slower. The second tradeoff is that pt-online-schema-change adds triggers on the original table to capture data changes. So if you already have triggers, you can't use the tool (this limitation will be lifted in MySQL 5.7).

    Metadata Locks

    Metadata locks were introduced in MySQL 5.5 for better transaction isolation. But one side effect is: if you start a schema change operation on a table while another transaction is reading or writing the same table, ALTER TABLE will be stuck in the 'Waiting for metadata lock' state. Another negative side effect of metadata locks is that in some situations ALTER TABLE will also block all queries to the table that is being altered, even reads. In the worst cases, this may create a query pileup that will freeze the whole server. For more on this topic, have a look at this post.

    What about pt-online-schema-change and metadata locks? It is of course hit by this issue/feature. However, the good thing is that the tool sets the timeout for metadata locks to 60s instead of the default 1 year. So after 60s, pt-online-schema-change will simply retry the operation that failed because of metadata locks.
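    A minimal sketch of how such a metadata lock wait arises, reusing the hypothetical mydb.user table from above (the session split is illustrative):

        -- session 1: an open transaction holds a metadata lock on the table
        START TRANSACTION;
        SELECT * FROM mydb.user LIMIT 1;
        -- (transaction deliberately left uncommitted)

        -- session 2: the schema change now hangs in 'Waiting for metadata lock'
        ALTER TABLE mydb.user ADD city VARCHAR(50) NOT NULL DEFAULT '';

        -- session 3: SHOW PROCESSLIST shows the stuck ALTER; it proceeds only
        -- once session 1 commits or rolls back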
    MySQL 5.6: Online Schema Changes?

    Metadata locks also exist with MySQL 5.6, so the same issues as described above can happen. However, the good news with MySQL 5.6 is that most schema changes can be done online. Now the question is: should you use pt-online-schema-change or a regular online ALTER TABLE statement? Both have pros and cons:

    - ALTER TABLE is easy to use, while being confident with pt-online-schema-change requires some time.
    - There is no way for ALTER TABLE to know if it's overloading a replica, while pt-online-schema-change monitors replication lag and throttles itself if needed.
    - ALTER TABLE is only online for the master, not for replicas, while pt-online-schema-change is online for all servers.

    The last point can be a bit confusing: why would an online ALTER TABLE on the master not be online on the slave as well? The truth is that unless you are using multi-threaded replication, writes on slaves are serialized. So while the ALTER TABLE is running, the slave will not process any other write, which is not much different from an offline ALTER TABLE. So if you are running a single server (all Icehouse users, for instance), ALTER TABLE is probably the right choice. But if you have read replicas, and specifically if the application cannot tolerate replication lag, pt-online-schema-change is definitely a better choice.

    Conclusion

    Performing schema changes is becoming easier with newer versions of MySQL. However, potential issues can be found with any version, so always be careful when you need to run ALTER TABLE. The post Schema changes in MySQL for OpenStack Trove users appeared first on MySQL Performance Blog.
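    One more sketch related to the 5.6 discussion above: when running a regular online ALTER TABLE, you can state your expectations explicitly so the server fails fast instead of silently falling back to a blocking table copy. This reuses the hypothetical mydb.user table; ALGORITHM and LOCK are standard MySQL 5.6 ALTER TABLE clauses:

        -- request an in-place change that permits concurrent reads and writes;
        -- MySQL raises an error if the operation cannot honor these clauses
        ALTER TABLE mydb.user
          ADD city VARCHAR(50) NOT NULL DEFAULT '',
          ALGORITHM=INPLACE, LOCK=NONE;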
