Percona Toolkit pt-query-digest
Learn why you use the pt-query-digest and some real-world examples to help deepen the reasoning.
Who is this video for?
- Architects
- Developers
- DevOps
Video content
- Learn about pt-query-digest usage
- Learn about the benefits and shortcomings of this Percona Toolkit feature
- Understand the results and learn what possible performance steps should be considered
Transcript
Today’s topic is PTQuery Digest. What’s PTQuery Digest? PTQuery Digest is one of the periclonal tools. It’s basically a periscript that allows you to analyze MySQL queries from logs, processes, and TC dump. Basically, the issue, the pain point we are trying to solve today is, for example, oftentimes your website or an application or experience, it is spike, and as a result of that, a database bottleneck might happen. We open an investigation to dig into the issue in an attempt to basically shed light on what might have happened. PTQuery Digest might come in handy. The benefits of PTQuery Digest are the fact that it summarizes all queries from the slow query log, processes, and TC dump in a very intuitive format. Obviously, it also has shortcomings. One of the shortcomings is the fact that it doesn’t basically tell the full story. That means it doesn’t provide as much details as PT stock. Here’s how we execute PT Digest. We are assuming that you already have Prokono toolkit installed on your server. PTQuery Digest is one of many dozens of Prokono tools that allow you to tackle very complicated, combo-loaded database tasks. We are assuming PTQuery Digest is installed. What you need to do, you just need to invoke it with the slow query log. There are two parameters you can pass to basically narrow the window, because here it’s going to analyze the entire slow query log. It could be 24 hours worth of query. If you want to narrow it, let’s say the bottleneck happened between 8 AM to 10 AM UTC. If you want to narrow it down to this specific window, you can pass two parameters since and until. It’s going to provide this output. Here you have the first query. What does this output mean? It’s going to give you the percentage during this specific window you specify, if you specify in a window. It’s going to tell you the percentage of the percentage of the specific query. In this example, this query basically accounts for 20 percent of all queries during this specific window time range. This is the time range from December 22nd, 6 AM to December 23rd, 1 AM UTC. This is a pretty large window. This query accounts for 26 percent of all queries. There are a total of 209 single queries that executed during this specific window. The most interesting part is this one. You need to know how long the query takes. Basically, on average, it takes three seconds to execute. However, 95 percent of the query, 95 percent of time, it took three seconds. Basically, this could be seven or four seconds. It could be different, but in this specific case, the average and 95 percent of time, the query executed is the same. You have this query here. It took three seconds and there’s a way to optimize it. PD query digest is telling you if a full scan was performed, yes, 100 percent and zero, no. Basically, 100 percent of the time, the query did a full table scan. Basically, we can add an index, a covering index. It’s not the topic, but we can add it to speed up the query. So the PD query digest is going to basically give you some insight as to how to eventually optimize the query. So it’s telling you, it tells you about how many rows that were sent, and if there’s the query size. But again, the most important line is basically the execution time line. The second query is basically the same thing. You focus on this line, but let’s jump to the last one. The last one, as I said before, you see the average is three seconds, whereas the 95 percent metric is different, because 95 percent of this time, the query took seven seconds, which means it’s pretty slow query. There’s a temp table was created 100 percent of the time. So it doesn’t say anything about full scan, but it tells you that 100 percent of the time, the temp table was created. So what you can do is to optimize the query accordingly. What we did is we created covering index on all of the fields that are in the query. Quantity ordered, updated, all of them. We added them to a single index, that’s hence the covering index. In a nutshell, it’s one specific situation like use case of PTQuery digests. The recommendation would be to run it, executed alongside other comments, because again, it doesn’t tell you the full story. It doesn’t tell you there are so many other details that could shed light and could explain what was happening on the database. Thank you for listening.
Code references
Be sure to change to match your logs and time frame
$ pt-query-digest mysql-slow.log.7 > mysql-slow.log.7.DIGEST
Useful resources
recommendation-more-help
3a5f7e19-f383-4af8-8983-d01154c1402f