Planet MySQL

  • How to hire for Infrastructure Operations Engineers
    I have been working for a very large Australian website for over six years, and during this period I have been fortunate enough to hire many of the Infrastructure Operations Engineers that now work for that company. I want to detail the evolution of the hiring process and what I have driven it to over the last six years.

    How I was hired

    This is the interview process I went through at my current company:
    - A technical and cultural pre-screening from a recruiter
    - A short phone interview with a small set of ad hoc questions around technical skill set: "How does DNS work?"
    - An hour-long face-to-face interview with the hiring manager and another senior engineer, testing both technical capability and cultural fit
    - Another hour-long face-to-face interview with an HR representative, testing cultural fit
    - Lastly, a reference check over the phone to confirm technical capability and cultural fit

    This sounds pretty standard. Let's dive into some subjective positives and negatives of this process.

    What did work
    - The hiring manager, who was a technical expert in the field, was able to confirm my technical capability with a high degree of confidence

    What didn't work
    - How I might work with my peers wasn't considered in the hiring equation at all
    - The questions from the hiring manager were completely custom, specific to my resume and my responses during the interview. This requires the same manager to interview every candidate in order to compare them, which generates a hiring bottleneck

    First evolution 2009 - 2011

    The first evolution that I was a part of consisted of standardising our pre-screening questions, to enable a larger set of people to pre-screen candidates. Further, we also asked our candidates to self-rate their technical abilities in certain technologies, and then tested that rating during the technical interview: "On a scale of 1 to 5, how would you rate yourself in the administration of MySQL?"
    The interview process had three phases:
    - Recruiter pre-screen
    - Technical phone pre-screen:
      - "On a scale of 1 to 5, how would you rate yourself in the administration of MySQL?"
      - And, depending on the response: "What does the binary log do in MySQL?"
    - A two-stage interview consisting of:
      - A technical interview: "Why, from time to time, does replication fail?"
      - A cultural interview: "What is more important to you: training and development, or autonomy?"

    What did work
    - Standardising the technical pre-screening enabled us to scale our hiring process
    - Self-rating, and testing that self-rating during the technical interview, enabled us to filter out any technical embellishment in the candidate's CV. This really enabled us to assert the technical level of our candidates

    What didn't work
    - Our interview process did not have a practical component, which introduced a level of subjectiveness about a candidate's technical skill set
    - There was minimal focus on the cultural aspect of hiring
    - The hiring process was long, and the candidate had limited feedback on where they were in the process

    Second evolution 2012 - 2014

    I drove the second evolution, and it was largely based around the changes that were made in the developer hiring process at my company. The developer process consisted of:
    - Recruiter pre-screen
    - A cultural pre-screen performed by someone at the company - generally over coffee
    - An offline coding test by the candidate - the famous 'robot test'
    - A review of that coding test by subject matter experts in the company
    - A three-stage interview consisting of:
      - Refactoring the code submitted
      - A technical interview
      - A cultural interview

    Further, the interview was largely run by the individuals in the team that the candidate would join. I was really impressed with the practical components and the cultural focus of this process.
    Overlaying the 'DevOps' and 'configuration as code' movement in the Infrastructure Operations space, and the gaps in the company at the time, myself and other leaders in the company deduced that we wanted a role that consisted of:
    - Knowing configuration as code
    - Someone who would work directly with developers to coach them on Infrastructure Operations
    - Someone who is able to troubleshoot technical problems at a macro and micro level

    I didn't think we were getting much value out of the 'self-ranking', so I decided to drop it in favour of an alternate standard set of pre-screen questions. Thus I changed the hiring process to:
    - Recruiter pre-screen
    - A technical and cultural phone pre-screen by someone at the company
    - An offline configuration-as-code test
    - A two-stage interview consisting of:
      - A technical interview
      - A cultural interview

    We also involved a lot more people in the pre-screening and final stages of the interview process, and ran the final stage as two one-hour interviews back-to-back. This enabled quicker feedback to the candidate and enabled us to 'batch process' the candidate.

    What did work
    - The configuration-as-code test helped filter for candidates that have a technical understanding of where the industry is heading
    - I was able to obtain great insight into the candidate from the way they solved the puzzle. Hint: I actually care more about how you submit it than how you complete it
    - The test itself provided lo-fi filtering, so we could discount candidates quicker
    - Involving the members of the team in the interview process enabled the candidate to interview the team as much as the team interviewed the candidate
    - Empowering the team to own the hiring process generated a 'family-like' feel in the team. Wait, I actually have to work with this person!

    What didn't work
    - Practical troubleshooting wasn't a part of the interview process - how would the candidate handle the stress of responding to a complete outage?
    - Having the team involved in the interview process introduced an element of 'group think', and in the extreme, the team could be 'socially hacked' in the candidate review

    Over that two-year period, I performed a few experiments on candidates by altering our final stage to a three-stage interview consisting of:
    - A break-fix interview
    - A technical interview
    - A cultural interview

    The break-fix was largely based around the RHCE exam: you are given a Linux box running a webserver that is unable to boot. Fix the box, get the box to boot, and get the webserver running. While I think we better confirmed the technical skill set, there were some negative aspects, which is why I abandoned it:
    - In an age of disposable infrastructure, how often would you need to fix GRUB? Are we testing the wrong thing?
    - "The right way" of fixing a problem is in configuration-as-code and re-deploying - are we sending the wrong message with this 'hack-and-slash' break-fix?

    Further, while 70% of the role of an Infrastructure Operations Engineer is consistent across all of the company, there are some areas where the candidate is fully embedded in a development team. To drive the team's empowerment in the interview process, we made some minor refinements, moving to a three-stage interview consisting of:
    - A technical interview from Infrastructure Operations experts
    - A technical interview from developer leads
    - A cultural interview

    We were largely testing the candidate's ability to work with developers, and whether they would be able to coach others in the team from different backgrounds on the importance of Infrastructure Operations. This also enabled the candidate to get a first-hand view of the people they would be working with, if they were successful and still wanted the job.

    Third evolution 2015 onwards

    My peer and I really wanted to turn the dial on practical troubleshooting in our interview process.
    The practical test focuses on one's ability to troubleshoot a broken website in a stressful situation, and which tools they reach for in the troubleshooting process. I get really excited when I conduct this test, as I'm fascinated by how people solve this puzzle in new and unexpected ways. While it still has the same drawbacks as the break-fix experiment, the value in understanding the thought process far outweighs implementing a fix that isn't 'the right way'.

    As a nice byproduct, this test is now used as a training tool for our developers. I was like a kid in a candy store conducting this test with my developer peers, watching how they tackled the problem.

    Further to this, I have adjusted my pre-screening interview to a face-to-face 'coffee catchup' instead of a technical and cultural Q&A on the phone. This pre-screen interview has a higher focus on culture than on technical skills, and is a direct response to candidates in late 2014 that were technically brilliant but culturally shocking.

    Thus the current recruitment process is:
    - Recruiter pre-screen, if the candidate is coming from an external recruiter
    - An internal HR recruiter pre-screen
    - An offline coding test to assess technical skill set
    - A review of the coding test by subject matter experts in the company
    - A cultural and technical 'coffee catchup' assessing cultural fit. Sometimes the coding test and coffee catchup are swapped around - I'm still unsure which provides better filtering
    - A three-stage interview consisting of:
      - A troubleshooting test
      - A technical interview
      - A cultural interview

    What's working
    - I'm seeing interesting data from the new troubleshooting test - I like having the new data, though I'm not sure how much weight the test should carry in the equation. Thank you to the recent hires for being the guinea pigs!
    - A large part of the organisation participates in the interview (~10 people per candidate)

    What's not working
    - I'm not convinced that 'coffee catchups' are a 'good' - what social signals do we send by setting up a 'coffee catchup' and then drilling the candidate about their work history and professional desires and aspirations? Is there a better way to do this? I'm not sure
    - I'm not sure if we have the balance right between pushing the autonomy and accountability of hiring to the team. Is the team hiring someone they like vs hiring someone they need?

    Where to now

    I'm proud that I have been able to refine the interview process over the last 6 years, and I have no doubt that it will continue to be refined. It's critical to reassess the interview process as the role and its responsibilities change over time - don't stand still!

    What I look for in a candidate

    This is what I look at with a candidate (if the role has a web operations element):

    Troubleshooting
    - What is in your toolbox to troubleshoot? You may say that you look at logs, but do you actually look at logs and understand what they say?
    - Do you know what goes into a web request? Can you divide and conquer to narrow down the problem?
    - How do you react in a highly stressful situation?

    Technical
    - Do you know when to use "the right technology"?
    - How do you turn the dial on your productivity?
    - Can you grow others? Can others grow you?

    Cultural
    - Do you complement the team? Do you add a new dimension to the team?
    - Will you work well with the team?
    - Will you stand up and fight for something if it's appropriate to do so? Do you know when it is appropriate?
    - Will you accept feedback from others? Will you give feedback?

    I have found myself increasingly interested in analysing the people conducting the interview, looking at:
    - Do you have self-awareness of how you and your peers respond to the questions you asked, and how your social cues could prime subsequent responses from the candidate?
    - Are you looking for 'the right candidate', or a candidate that is similar to you?
    - Are you conscious of how this candidate might fill the gap that we are hiring for?
    What you should look for

    During the interview process the candidate should be interviewing the company and assessing whether they would want to work there. Here is what I look for:
    - Will you be able to grow yourself in the organisation?
    - Does the organisation subscribe to similar beliefs to yours?
    - How much politics is in the company? Will that prohibit 'getting stuff done'?
    - Will you enjoy going to work?
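The "divide and conquer" troubleshooting idea above - knowing the layers a web request crosses and bisecting them to localise a fault - can be sketched as a checklist. This is my illustration, not part of the original post: the host name is hypothetical and each probe is echoed rather than executed, so it reads as a talking script for the interview question rather than a real diagnostic tool.

```shell
#!/bin/bash
# Dry-run checklist: walk the layers of a web request from the outside in,
# echoing the probe you would run at each layer. Host name is hypothetical;
# swap in the real one and run the commands by hand to actually bisect.
host=www.example.com

# Print one "layer: probe" line per step.
layer() { printf '%-10s %s\n' "$1" "$2"; }

layer "DNS"      "dig +short $host"
layer "TCP"      "nc -zv $host 443"
layer "TLS"      "openssl s_client -connect $host:443 -servername $host"
layer "HTTP"     "curl -sv -o /dev/null https://$host/"
layer "Server"   "ssh $host 'tail -n 50 /var/log/nginx/error.log'"
layer "Database" "ssh $host 'mysqladmin status'"
```

Working top-down, the first layer that fails bounds the problem; everything below it can usually be ignored until that layer is healthy.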

  • on ORDER BY optimization | Domas Mituzas
    An insightful exploration by Domas (Facebook) of how the MySQL optimiser's decision logic can be naive, in this case regarding ORDER BY optimisation. Quite often, "simple" logic can work better than complex logic, as chasing all the corner cases can just make things worse - but sometimes, logic can be too simple. Everything must be made as simple as possible, but no simpler. - Albert Einstein / Roger Sessions

  • Log Buffer #434: A Carnival of the Vanities for DBAs
    This Log Buffer Edition throws the spotlight on some of the salient blog posts from Oracle, SQL Server and MySQL.

    Oracle:
    - STANDARD date considerations in Oracle SQL and PL/SQL
    - My good friend, Oracle icon Karen Morton passed away.
    - Multiple invisible indexes on the same column in #Oracle 12c
    - Little things worth knowing: Data Guard Broker Setup changes in 12c
    - Things that are there but you cannot use

    SQL Server:
    - Dynamic Grouping in SSRS Reports
    - SQL 2014 Clustered Columnstore index rebuild and maintenance considerations
    - SQL Server 2016 CTP2
    - Azure SQL Database Security Features
    - Visualize the timeline of your SQL jobs using Google graph and email

    MySQL:
    - Shinguz: Max_used_connections per user/account
    - Generally in MySQL we send queries massaged to a point where the optimizer doesn’t have to think about anything.
    - Replication is the process that transfers data from an active master to a slave server, which reproduces the data stream to achieve, as best as possible, a faithful copy of the data in the master.
    - Unknown column ‘smth’ in ‘field list’ -> Oldie but goodie error
    - Why base64-output=DECODE-ROWS does not print row events in MySQL binary logs

    Learn more about Pythian’s expertise in Oracle, SQL Server & MySQL.

    The post Log Buffer #434: A Carnival of the Vanities for DBAs appeared first on Pythian - Data Experts Blog.

  • MariaDB automatic failover with MaxScale and MariaDB Replication Manager
    Fri, 2015-07-31 12:14 - guillaumelefranc

    Mandatory disclaimer: the techniques described in this blog post are experimental, so use at your own risk. Neither I nor MariaDB Corporation will be held responsible if anything bad happens to your servers.

    Context

    MaxScale 1.2.0 and above can call external scripts on monitor events. In the case of a classic Master-Slave setup, this can be used for automatic failover and promotion using MariaDB Replication Manager. The following use case is exposed using three MariaDB servers (one master, two slaves) and a MaxScale server. Please refer to my Vagrant files if you want to jumpstart such a testing platform.

    Requirements

    - A mariadb-repmgr binary, version 0.4.0 or above. Grab it from the github Releases page, and extract it into /usr/local/bin/ on your MaxScale server.
    - A working MaxScale installation with the MySQL Monitor set up and whatever router you like. Please refer to the MaxScale docs for more information on how to configure it correctly.

    MaxScale installation and configuration

    The MySQL Monitor has to be configured to send scripts. Add the following three lines to your [MySQL Monitor] section:

        monitor_interval=1000
        script=/usr/local/bin/
        events=master_down

    Failover script

    As of the current MaxScale development branch, custom options are not supported, so we have to use a wrapper script to call MariaDB Replication Manager. Create the following script in /usr/local/bin/:

        #!/bin/bash
        #
        # wrapper script to repmgr

        # user:password pair, must have administrative privileges.
        user=root:admin
        # user:password pair, must have REPLICATION SLAVE privileges.
        repluser=repluser:replpass

        ARGS=$(getopt -o '' --long 'event:,initiator:,nodelist:' -- "$@")
        eval set -- "$ARGS"

        while true; do
          case "$1" in
            --event) shift; event=$1; shift ;;
            --initiator) shift; initiator=$1; shift ;;
            --nodelist) shift; nodelist=$1; shift ;;
            --) shift; break ;;
          esac
        done

        cmd="mariadb-repmgr -user $user -rpluser $repluser -hosts $nodelist -failover=dead"
        eval $cmd

    Make sure to set the user and repluser script variables to whatever your user:password pairs are for the administrative user and replication user. Also make sure to make the script executable (chmod +x), as it's very easy to forget that step.

    Testing that failover works

    Let's check the current status, where I have configured server3 as a master and server1-2 as slaves:

        $ maxadmin -pmariadb "show servers"
        Server 0x1b1f440 (server1)
            Server:
            Status:                    Slave, Running
            Protocol:                  MySQLBackend
            Port:                      3306
            Server Version:            10.0.19-MariaDB-1~trusty-log
            Node Id:                   1
            Master Id:                 3
            Slave Ids:
            Repl Depth:                1
            Number of connections:     0
            Current no. of conns:      0
            Current no. of operations: 0
        Server 0x1b1f330 (server2)
            Server:
            Status:                    Slave, Running
            Protocol:                  MySQLBackend
            Port:                      3306
            Server Version:            10.0.19-MariaDB-1~trusty-log
            Node Id:                   2
            Master Id:                 3
            Slave Ids:
            Repl Depth:                1
            Number of connections:     8
            Current no. of conns:      1
            Current no. of operations: 0
        Server 0x1a7b2c0 (server3)
            Server:
            Status:                    Master, Running
            Protocol:                  MySQLBackend
            Port:                      3306
            Server Version:            10.0.19-MariaDB-1~trusty-log
            Node Id:                   3
            Master Id:                 -1
            Slave Ids:                 1, 2
            Repl Depth:                0
            Number of connections:     2
            Current no. of conns:      0
            Current no. of operations: 0

    Everything looks normal. Let's try failover by shutting down server3.
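As a side note (my addition, not from the original post): before letting MaxScale drive the wrapper, its getopt handling can be exercised in isolation. This sketch replays the arguments MaxScale passes on a master_down event and echoes the mariadb-repmgr command instead of executing it; credentials and host names are illustrative.

```shell
#!/bin/bash
# Dry run of the wrapper's argument parsing: simulate a MaxScale master_down
# invocation and echo the command the wrapper would run. Values illustrative.
user=root:admin
repluser=repluser:replpass

# Arguments as MaxScale would pass them on a master_down event.
ARGS=$(getopt -o '' --long 'event:,initiator:,nodelist:' -- \
    --event master_down --initiator server3 --nodelist server1,server2,server3)
eval set -- "$ARGS"

while true; do
  case "$1" in
    --event)     shift; event=$1; shift ;;
    --initiator) shift; initiator=$1; shift ;;
    --nodelist)  shift; nodelist=$1; shift ;;
    --)          shift; break ;;
  esac
done

# Echo instead of eval: inspect the command line before trusting it in anger.
echo "mariadb-repmgr -user $user -rpluser $repluser -hosts $nodelist -failover=dead"
```

If the echoed command looks right, swapping the echo back to `eval $cmd` restores the real wrapper behaviour.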
        server3# service mysql stop
         * Stopping MariaDB database server mysqld          [ OK ]

    Let's check the server status again:

        $ maxadmin -pmariadb "show servers"
        Server 0x1b1f440 (server1)
            Server:
            Status:                    Slave, Running
            Protocol:                  MySQLBackend
            Port:                      3306
            Server Version:            10.0.19-MariaDB-1~trusty-log
            Node Id:                   1
            Master Id:                 2
            Slave Ids:
            Repl Depth:                1
            Number of connections:     0
            Current no. of conns:      0
            Current no. of operations: 0
        Server 0x1b1f330 (server2)
            Server:
            Status:                    Master, Running
            Protocol:                  MySQLBackend
            Port:                      3306
            Server Version:            10.0.19-MariaDB-1~trusty-log
            Node Id:                   2
            Master Id:                 -1
            Slave Ids:                 1
            Repl Depth:                0
            Number of connections:     8
            Current no. of conns:      1
            Current no. of operations: 0
        Server 0x1a7b2c0 (server3)
            Server:
            Status:                    Down
            Protocol:                  MySQLBackend
            Port:                      3306
            Server Version:            10.0.19-MariaDB-1~trusty-log
            Node Id:                   3
            Master Id:                 -1
            Slave Ids:
            Repl Depth:                0
            Number of connections:     2
            Current no. of conns:      0
            Current no. of operations: 0

    MariaDB Replication Manager has promoted server2 to be the new master, and server1 has been re-slaved to server2. server3 is now marked as down. If you restart server3, it will be marked as "Running" but not as a slave - to put it back in the cluster, you just need to repoint replication at the new master (server2) with GTID, using this command:

        CHANGE MASTER TO MASTER_HOST='server2', MASTER_USE_GTID=CURRENT_POS;

    The failover script could handle this case as well, although it remains to be tested.

    Tags: High Availability, MaxScale, Replication

    About the Author

    Guillaume Lefranc is managing the MariaDB Remote DBA Services Team, delivering performance tuning and high availability services worldwide. He's a believer in DevOps culture, Agile software development, and Craft Brewing.

  • MySQL QA Episode 10: Reproducing and Simplifying: How to get it Right
    Welcome to the 10th episode in the MySQL QA series! Today we'll talk about reproducing and simplifying: how to get it right.

    Note that unless you are a QA engineer stuck on a remote, and additionally difficult-to-reproduce or difficult-to-reduce bug, this episode will largely be non-interesting for you. However, what you may like to see - especially if you watched episodes 7 (and possibly 8 and 9) - is how reducer automatically generates handy start/stop/client (cl) etc. scripts, all packed into a handy bug tarball, in combination with the reduced SQL testcase. This somewhat separate part is covered directly after the introduction (ends at 11:17), as well as with an example towards the end of the video (starts at time index 30:35). The "in between" part (11:17 to 30:35) is all about reproducing and simplifying, which - unless you are working on a remote case - can likely be skipped by most; remember that 85-95% of bugs reproduce & reduce very easily, and for this, episode 7, episode 8 (especially the FORCE_SKIPV/FORCE_SPORADIC parts), and the script-related parts of this episode (start to 11:17 and 30:35 to end) would suffice.

    As per the above, the topics covered in this video are:
    1. percona-qa/reproducing_and_simplification.txt
    2. Automatically generated scripts (produced by Reducer)

    ========= Example bug excerpt for copy/paste - as per the video

    Though the testcase above should suffice for reproducing the bug, the attached tarball gives the testcase as an exact match of our system, including some handy utilities:

        $ vi {epoch}_mybase                   # Update base path in this file (the only change required!)
        $ ./{epoch}_init                      # Initializes the data dir
        $ ./{epoch}_start                     # Starts mysqld (MYEXTRA --option)
        $ ./{epoch}_stop                      # Stops mysqld
        $ ./{epoch}_cl                        # To check mysqld is up
        $ ./{epoch}_run                       # Run the testcase (produces output) using the mysql CLI
        $ ./{epoch}_run_pquery                # Run the testcase (produces output) using pquery
        $ vi /dev/shm/{epoch}/error.log.out   # Verify the error log
        $ ./{epoch}_gdb                       # Brings you to a gdb prompt
        $ ./{epoch}_parse_core                # Create {epoch}_STD.gdb and {epoch}_FULL.gdb: standard and full var gdb stack traces

    Full-screen viewing @ 720p resolution recommended.

    The post MySQL QA Episode 10: Reproducing and Simplifying: How to get it Right appeared first on MySQL Performance Blog.



Copyright © 2001 - 2013 K-Factor Technologies, Inc.
