
Cloud 5 - Third-party search

Explore how third-party search can best be integrated into Edge Delivery Services.


Transcript
Hey Varun, welcome to the first in our series of interviews with experts on Edge Delivery Services. I know you've had some great experience with Edge Delivery Services, working on various projects, and I'd like you to start by introducing yourself.

Thank you, James. Hello everyone, I'm Varun Mitra. I've been working with Adobe for about 11 years now, and so far I've worked on four different Edge Delivery Services projects. I'm happy to be here talking with James.

Great, let's dive right in. The topic of today's session is, of course, third-party search services and integrating them with Edge Delivery Services. Would you mind giving a quick overview of how search functionality is handled in Edge Delivery Services?

Sure, James. Edge Delivery Services uses a query index. Just like content in Edge Delivery, the query index resides within SharePoint or Google Drive. On publishing, the index is transformed into a JSON file, and you can use the JavaScript Fetch API or ffetch.js to query it and retrieve results. Let me quickly show you what this query index looks like. I'll share my screen. As you can see, this is the query index we are using in one of our projects. This is query-index.json, which is the transformed format. The source file resides within our SharePoint: if I go over to SharePoint, you can see query-index.xlsx, and when I preview and publish that file, it gets transformed into query-index.json. From there I can use the Fetch API to retrieve results directly from the JSON file, or I can use the ffetch.js helper to fetch results from it.

Very cool. How is this query index actually generated? Does it happen automatically?

Yes, James. Each time you publish any content in Edge Delivery, the query index is updated automatically. Whenever you publish, the content's metadata properties are added to the query index. You can also trigger the reindexing job manually using the Admin APIs.

Oh, very cool. Do you have any best practices for structuring the query index to make data access easier?

Yes, James. Let me go back to the actual query index. As you can see, this is a pretty big file; it has about 17,620 records. If I were to run a blank query, that is, fetch everything, the resulting data set would be quite large, it would have to be downloaded over the internet, and the web pages might load slowly. So what we ended up doing in this case was to divide the query index into different sheets. Since it is a spreadsheet, we can create multiple sheets, and as you can see I have quite a few of them here. Each sheet corresponds to a specific formula or condition; for example, the data-centers sheet contains only the results in the data center category. I can add an Excel formula so that those results are pulled from the raw index and stored in that sheet. Now whenever I want to load the data center results, I can go directly to the data-centers sheet and fetch only the results I need. This reduces the overall size of my query index and the number of results I need to fetch at any given time. One of the major advantages of doing this is that your query execution time goes down.

Hmm, I see. Are there any other performance constraints to keep in mind when dealing with such large data sets?

Yes. As I mentioned earlier, iterating over a large data set increases the query execution time. It can also cause the total blocking time to go up, which impacts your Google Lighthouse score. So we try to break the query index into smaller chunks, as we just saw, by dividing it into sheets and making sure it is sorted.

That makes sense, Varun. How about third-party search libraries? Can we use those?

Yes, we can use third-party search. So far I've seen two different JavaScript-based search libraries being used: one of them is called Elasticlunr, and the other one is called Code View.

I see. How do we actually implement search functionality using one of those, for example Elasticlunr?

Sure. Elasticlunr is a lightweight, full-text, JavaScript-based search engine, so let me walk you through the setup process. I'm going to share my screen again. Here I have Visual Studio Code open, and this particular implementation is from an earlier project I worked on. The first step is creating the Elasticlunr database, which is done with the help of a GitHub workflow. I'm sure a lot of our audience is familiar with GitHub workflows, but I'll go over it anyway. The workflow calls a Node.js file, which in our case is called build-search-index.js. To create a GitHub workflow, you set up a YAML file, and this YAML file decides when build-search-index.js is called. As you can see, we have provided an event, resource published, which means that each time we publish anything, this search index is updated, just like the normal query index, which is updated each time we publish a resource. You will also see a bunch of steps here, such as installing dependencies. Now let me move over to build-search-index.js, the JavaScript file that gets called. As you can see at line number six, we call query-index.json and fetch its records, then we iterate over those records and add them to a new search index, which in our case is Elasticlunr. So we fetch all of this data and add it to a search index, or search database as it is called in Elasticlunr. This search index is committed to our GitHub repository; in this case it is search-index.db, and it contains all the data present in the query index, or whatever data we choose to store there. Once you have this set up in your repository, you can retrieve data from the search index directly from your search text box or wherever you have written your search code. In this case, that is search-results.js. From here I invoke a search worker, which retrieves the results from the database and returns whatever data I want.

So you can see that search.js calls search-index.db, which is our Elasticlunr database, and returns all the results back to me; I can iterate over them, arrive at my result, and fetch that particular piece of data. This implementation is much faster than using the plain query index, because we are using a third-party library that is optimized to return search results much more quickly.

Ah, that's fantastic, Varun. Thank you for a very thorough overview of search in Edge Delivery Services. We look forward to seeing you next time.

Thank you, James. Thank you, everyone.
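Example code

The sketches below illustrate the techniques discussed in the transcript. First, querying the published query index from the browser with the Fetch API. The sheet name (data-centers), the field names, and the limit/offset values are assumptions for illustration; match them to the sheets and columns in your own query-index.xlsx.

```javascript
// Minimal sketch: query the published query index from client-side code.
// Published sheets return JSON of the form { total, offset, limit, data }.
async function fetchDataCenterPages(limit = 50, offset = 0) {
  // "data-centers" is an assumed sheet name, created as described in the transcript.
  const url = `/query-index.json?sheet=data-centers&limit=${limit}&offset=${offset}`;
  const resp = await fetch(url);
  if (!resp.ok) throw new Error(`Query index request failed: ${resp.status}`);
  const { data } = await resp.json();
  return data;
}

// Usage: log the path and title of each returned record.
fetchDataCenterPages().then((pages) => {
  pages.forEach((page) => console.log(page.path, page.title));
});
```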
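Next, a sketch of the kind of build script the GitHub workflow could run. This is not the project's actual build-search-index.js; the origin URL, field names, and output file name are assumptions. It shows the core Elasticlunr calls: defining the searchable fields, adding each query-index record as a document, and serializing the index to a file that the workflow could commit back to the repository.

```javascript
// build-search-index.js (illustrative sketch, not the project's actual script).
// Requires Node 18+ for the global fetch API.
const fs = require('fs');
const elasticlunr = require('elasticlunr');

const ORIGIN = 'https://main--mysite--myorg.hlx.live'; // assumption: your published site
const OUTPUT = 'search-index.db';                      // assumption: serialized index path

async function buildSearchIndex() {
  const resp = await fetch(`${ORIGIN}/query-index.json?limit=20000`);
  const { data } = await resp.json();

  // Declare which field identifies a document and which fields are searchable.
  const index = elasticlunr(function configure() {
    this.setRef('path');
    this.addField('title');
    this.addField('description');
  });

  // Add every query-index record to the Elasticlunr index.
  data.forEach((record) => index.addDoc(record));

  // Serialize the index so the workflow can commit it and the client can load it.
  fs.writeFileSync(OUTPUT, JSON.stringify(index));
  console.log(`Indexed ${data.length} records into ${OUTPUT}`);
}

buildSearchIndex().catch((err) => {
  console.error(err);
  process.exit(1);
});
```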
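Finally, a sketch of the search-worker idea mentioned at the end of the transcript: a web worker loads the serialized Elasticlunr index and answers queries off the main thread. The script path, the index file name, and the use of expand for prefix matching are assumptions; the Elasticlunr calls (Index.load, search, documentStore.getDoc) are the library's documented API.

```javascript
// search-worker.js (illustrative sketch of the worker-based lookup described above).
// Assumes a local copy of the Elasticlunr library is served by the site.
importScripts('/scripts/elasticlunr.min.js');

let indexPromise = null;

function loadIndex() {
  if (!indexPromise) {
    // "search-index.db" is the assumed path of the serialized index from the build sketch.
    indexPromise = fetch('/search-index.db')
      .then((resp) => resp.json())
      .then((serialized) => elasticlunr.Index.load(serialized));
  }
  return indexPromise;
}

self.onmessage = async (event) => {
  const index = await loadIndex();
  // expand: true lets partial terms (for example "data cen") match indexed words.
  const hits = index.search(event.data, { expand: true });
  // Resolve each hit's ref (the page path) back to the stored document.
  const results = hits.map((hit) => index.documentStore.getDoc(hit.ref));
  self.postMessage(results);
};
```

On the page side, the search block would create the worker with new Worker('/scripts/search-worker.js'), post the query string from the search box, and render the documents the worker posts back.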

Additional Resources

Watch related videos on the Cloud 5 season 3 page.
