Query indexes and Excel formulas
Explore query indexes and how to write Excel formulas for manipulating and sorting datasets with Edge Delivery Services.
Transcript
Hello, my name is James Talbot and I’m here with Varun Mitra, who is a cloud architect on the AEM engineering team. Varun, I’d like to just quickly ask some questions. In one of our previous Edge delivery videos, we actually talked about the query index. Can you talk a little bit about how this file is actually generated? Sure, James. Depending upon the backend, that is your storage solution, SharePoint or Google Drive, an Edge delivery project lead can go ahead and create a file called a query index. This is basically a workbook or a spreadsheet. Within this file, they need to create a sheet called as raw underscore index. And whenever you publish any page or any content within Edge delivery, that particular content gets added to this particular sheet. So basically what Edge delivery does is it takes or grabs the data within that particular file and it uses it to update the query index. This query index could be used later on for querying, for running your search queries or for just loading up results from the front end. I see. So with each page published, the query index will be updated. But is there a way for us to define what information is actually captured in that file? Yes, absolutely. By default, Edge delivery services relies on a file called as helix-query.yml. You can add different fees to this YAML file or change the default indexing configuration. I see. So if I were to add a custom sheet in the query index, would it get updated automatically? No. First and foremost, you need to follow a particular naming convention when defining custom sheets. You need to append helix- as a prefix to your sheet name. Secondly, you need to make use of a formula to automatically populate the new sheet using raw index as a source. Interesting. So can you talk a little bit about these Excel formulas and how they pertain to an Edge delivery services project? Sure, James. Let me walk you through the query index file for one of the projects that I’m working on. Let me quickly go ahead and share my screen. Great. OK, so this is the query index up. Before I start, are you able to see my screen, James? Yes, thank you. OK, yeah, I’ll start. So this is the query- index that I’m using for one of my projects. As you can see, this is a very big file. It has a lot of records within this, particularly for this particular project. This particular project, we migrated over to 200,000 pages and all of that data resides within this query index. As you can imagine, this is a very big file. And if you were to query or if you were to run a query over this file, that would mean loading or traversing a lot of records. Whenever you query over such a big result set, what happens is that your overall query execution time goes up and this, in turn, ends up increasing your total blocking time, thereby affecting your Google Lighthouse code. So what you need to do is you need to go ahead and split this particular sheet into different sheets. As you can see, I created different sheets over here. And what this and what I’ve done is I’ve defined a particular formula that would pull in pertinent data into this sheet. Like for example, in Helix-slideshow, I’m only pulling the data for slideshows and I’m using it to populate this sheet. Now, whenever I were to run a query, I will only go ahead and only query upon this particular sheet instead of running the query over raw underscore index. I see. Well, certainly that is a big file, Varun. Do you think you could show us a quick example of maybe a simple Excel formula that our learners can use? Sure, James. What I will go ahead and do is I will again share my screen. Great. All right. So this time around, I have brought up another query index file. Now, this query index file belongs to the weekend project. Learners can download this particular file or the entire weekend project for edge delivery services from demo hub. So as you can see, this is a query index file which contains different records. It is not as big as the previous file that we saw. And that’s okay for our example tool. So what I can do is I can go ahead and define an Excel formula that will basically allow me to split this particular file into different sheets. So I will just go ahead and create a new sheet, maybe give it a new name over here. As I mentioned earlier, you need to append helix hyphen and I will just call it test. You can call it whatever you like. I’m going to also copy the column headers from my raw underscore index. And paste it in over here. Now what I will do is I will quickly go ahead and define an Excel formula. The formula that I’m going to define, it will be defined on cell number A2. Let me make it a bit bigger for learners to see. So I will define that formula in A2 and then later on, as you know, with spreadsheets, you can copy that formula over to other cells. I will type in equal to and then I will remove the filter function. Now filter function allows you to filter on the basis of certain condition. And that’s exactly what I want to do here. The first input for the filter formula is an array. So I’m going to go ahead and grab the data from my raw underscore index as that is the sheet that is automatically updated each time any content is published. And I will define the scope that is from where I want to grab this data. So I want to grab this data from raw underscore index from cell A2 to cell E1088. Now the original raw underscore index has five columns, A to E, and the data resides within A2 to E1088. You can define the range appropriately based on your actual project. Now this was my data set, my array. Now I need to also provide a filtering condition. This filtering condition will define or will allow me to specify what data to grab. And if that particular condition matches, only then that particular data will be fetched in. So let’s start with a simple condition over here. Again, I will specify raw index. And what I will do over here is that I will specify a not null condition. That is from raw underscore index. I want to just check if in the column A, we have any null values. So I will just add a not null condition that is grab everything from column A if the values are not null. And then I will just hit enter and you will see that all the data is automatically populated. And if there were any null values in the column A, that particular data was not crafted over here. So you can use a simple formula such as this to fetch data and use it to populate your new sheet in here. If you want, you can go ahead and create a much more complex condition in here. As a matter of fact, let me quickly also do that. Say for example, I have a description in here. Let me see if I can find a particular value to provide in here. So let me quickly go ahead and modify this again. I’ll go back to A2 and let me quickly modify this over here. That is, I’m going to simply change it to D2 as D is the column I’m working with. And what I want to do over here is I want to check if this particular column D contains a specific value. So I will add it equal to and I’ll provide the string in here. So it is pretty straightforward what I’ve done in here. And the next thing which I will also do is add an additional condition in here. So now this particular condition is optional. What it does is if it doesn’t find that particular value, it will set that particular row to null. And I will hit enter. And as you can see, I’m able to achieve a filtering. I’m able to filter the existing data based on a specific condition. So yeah, so this is how you can filter the data that you’re getting from raw index. And you can achieve different kind of filtering conditions in here. If I had more columns, I could have done a bit more. I could have maybe added a filtering on the basis of category based on template type, et cetera. But yeah, this is how I can achieve some simple filtering. Let me quickly wrap this up by showing one final thing. Right now this data is not sorted. What if you want to sort it out on the basis of a certain condition? What I can do is I can concatenate these different formulas together. That is I can take the output from the filter, that is the filter function, pass it on to the sort function. And then using this, I would be able to also sort my current data. So this is what I will go ahead and do. I will add it, the output from filter, passing it on to the sort function. And I will also add a few additional conditions over here. First I will provide what column I want to filter this data from. I will add the column number, that is five. I want to filter it on the last modified date. Next, I will go ahead and specify the sorting order. Do I want it sorted in ascending or descending order? I will select minus one, which means that the latest record will be on top and then so on and so forth. Finally, I will also specify whether I want to sort by column or sort by row. Pretty straightforward. And then I will hit enter and you will see that the data is also sorted. So this is how learners can simply split their raw underscore index into different sheets using different formulas based on different categories, template types, et cetera. And thereby they can speed up their query execution and only fetch the filter results. Wow, fantastic. Thank you, Varun. This was extremely helpful and you provided a great explanation of query indexes and some great tips for using Excel formulas. So thank you. Sure thing.
4859a77c-7971-4ac9-8f5c-4260823c6f69