Split SXA Sitemap into Multiple Sitemaps if the Size Limit is Exceeded


Challenge:

A Sitemap best practice is to limit a single sitemap to 50MB (uncompressed) or 50,000 URLs. If the file is larger or contains more URLs, we must break it into multiple sitemaps and reference each of them in a Sitemap index. From Sitecore 10.3 onwards, the SXA Sitemap supports splitting into multiple sitemaps when it exceeds a given URL count limit. In this post, we enhance the SXA Sitemap so that it is also split when it exceeds 50MB or another configured size.
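As a quick illustration of the two limits, the following standalone sketch checks whether a rendered sitemap needs splitting. The class and method names (SitemapLimits, MustSplit) are hypothetical and are not part of SXA; 50MB is taken here as 50 * 1024 * 1024 bytes.

```csharp
using System;
using System.Text;

public static class SitemapLimits
{
    // Limits from the sitemap best practice described above.
    public const int MaxUrls = 50000;
    public const long MaxBytes = 50L * 1024 * 1024; // 50MB, uncompressed

    // Returns true when the sitemap has too many URLs or too many bytes.
    public static bool MustSplit(string sitemapXml, int urlCount)
    {
        // Measure the UTF-8 byte size, not the number of characters.
        long sizeInBytes = Encoding.UTF8.GetByteCount(sitemapXml);
        return urlCount > MaxUrls || sizeInBytes > MaxBytes;
    }
}
```

Either condition alone is enough to require a split, which is why the URL count limit that SXA already provides does not cover the size limit.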

Solution:

The enhancement discussed in this post is developed with Sitecore 10.3 Update 1 but should work with Sitecore 10.3 as well.

SXA Sitemap settings can be found at /sitecore/content/<Tenant>/<site>/Settings/Sitemap.

Please read the Sitecore documentation article "Configure a sitemap" to make sure the sitemap is working properly and to understand the purpose of each field in the sitemap settings item. Then validate that the sitemap is served correctly at /sitemap.xml.

By default, the field “Maximum number of pages per sitemap (if undefined, all URLs will be rendered into single)” has no value. In the class “Sitecore.XA.Foundation.SiteMetadata.Settings.SitemapSettingsProvider”, the SitemapIndexThreshold property that reads this field then falls back to int.MaxValue (2147483647), as shown below.

SitemapIndexThreshold = MainUtil.GetInt(configurationRoot[Sitecore.XA.Foundation.SiteMetadata.Templates.SitemapSettings.Fields.SitemapIndexThreshold], int.MaxValue),

As per the Sitemap best practices, let’s set this field to 50000 so that a single Sitemap holds at most 50,000 URLs. If the URL count exceeds this value, SXA splits the Sitemap into multiple Sitemaps and references them in a Sitemap index.
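To see why an empty field effectively disables splitting, here is a minimal standalone sketch of the fallback. ParseThreshold and NeedsSitemapIndex are hypothetical helpers that only mimic the behaviour described above; they are not Sitecore's MainUtil.GetInt implementation.

```csharp
using System;

public static class ThresholdSketch
{
    // Hypothetical stand-in for the fallback: use the default value
    // when the raw field value is empty or not a valid integer.
    public static int ParseThreshold(string rawFieldValue, int defaultValue)
    {
        int parsed;
        return int.TryParse(rawFieldValue, out parsed) ? parsed : defaultValue;
    }

    // Mirrors the "threshold < count" comparison used in GenerateSitemap:
    // an index is only needed when the URL count exceeds the threshold.
    public static bool NeedsSitemapIndex(string rawFieldValue, int urlCount)
    {
        return ParseThreshold(rawFieldValue, int.MaxValue) < urlCount;
    }
}
```

With an empty field the threshold is int.MaxValue, so no realistic URL count triggers a split; with 50000 the split starts at the 50,001st URL.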

Let’s customize the class “Sitecore.XA.Foundation.SiteMetadata.Services.SitemapManager” to split the Sitemap into multiple Sitemaps if its size exceeds 50MB or another configured size. In other words, every generated Sitemap must stay within the configured size limit.

The customized SitemapManager code follows. Please check the comments for more details. For the complete code, download the CustomSXA.Foundation.SiteMetaData.Services.SitemapManager code and add it to a suitable foundation layer project. Feel free to update the namespace to match your project.

//Existing using directives go here; the ones below are new

using Sitecore;
using Sitecore.XA.Foundation.SiteMetadata.Services;
using System.IO;
using System.Text;
using System.Xml;

//Custom namespace and class. 
namespace CustomSXA.Foundation.SiteMetaData.Services 
{
    public class SitemapManager : ISitemapManager
    {
        //Existing properties here
        
        //New property added for content size limit of Sitemap
        private long SitemapMaxSizeLimit { get; } = StringUtil.ParseSizeString(Sitecore.Configuration.Settings.GetSetting("CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit", "50MB"));

        //Existing methods here

        //customized code for GenerateSitemap
        public SitemapContent GenerateSitemap(SiteContext site)
        {
            Item homeItem = this.GetHomeItem(site);
            //'var' is used so the fully qualified SitemapSettings type name does not need to be spelled out
            var sitemapSettings = this.SitemapSettingsProvider.GetSitemapSettings(homeItem);
            if (sitemapSettings == null)
                return (SitemapContent)null;
            if (sitemapSettings.CacheType == SitemapStatus.Inactive)
                return (SitemapContent)null;
            IList<Item> items = this.SitemapSettingsProvider.GetItemCrawler(sitemapSettings).GetItems(homeItem);
            int count = items.Count;
            bool flag = sitemapSettings.IncludeXdefault && sitemapSettings.GenerateAlternateLinks;
            if (flag)
                count += items.GroupBy<Item, ID>((Func<Item, ID>)(i => i.ID)).Where<IGrouping<ID, Item>>((Func<IGrouping<ID, Item>, bool>)(i => i.Count<Item>() >= 2)).Count<IGrouping<ID, Item>>();
            SitemapContent sitemap;
            if (sitemapSettings.SitemapIndexThreshold < count)
            {
                List<string> stringList = new List<string>();
                IEnumerable<IGrouping<ID, Item>> groupings = items.GroupBy<Item, ID>((Func<Item, ID>)(i => i.ID));
                List<Item> objList = new List<Item>();
                int num1 = 0;
                foreach (IGrouping<ID, Item> grouping in groupings)
                {
                    int num2 = grouping.Count<Item>() < 2 ? 0 : 1;
                    if (flag)
                        num1 += num2;
                    if (objList.Count + grouping.Count<Item>() + num1 <= sitemapSettings.SitemapIndexThreshold || objList.Empty<Item>())
                    {
                        objList.AddRange((IEnumerable<Item>)grouping);
                    }
                    else
                    {
                        SplitSitemap(sitemapSettings, stringList, objList); //New SplitSitemap function is called instead of old code - stringList.Add(this.RenderSitemap((IList<Item>) objList, sitemapSettings));
                        objList.Clear();
                        if (flag)
                            num1 = num2;
                        objList.AddRange((IEnumerable<Item>)grouping);
                    }
                    if (objList.Count + num1 > sitemapSettings.SitemapIndexThreshold)
                    {
                        SplitSitemap(sitemapSettings, stringList, objList); //New SplitSitemap function is called
                        objList.Clear();
                        num1 = 0;
                    }
                }
                if (objList.Any<Item>())
                {
                    SplitSitemap(sitemapSettings, stringList, objList); //New SplitSitemap function is called
                }
                sitemap = new SitemapContent()
                {
                    Values = stringList
                };
            }
            else
            {
                List<string> stringList = new List<string>();
                //The URL count limit is already satisfied here, but we still check the content size limit and split if needed
                SplitSitemap(sitemapSettings, stringList, items); //New SplitSitemap function is called
                sitemap = new SitemapContent()
                {
                    Values = stringList
                };
            }
            this.SetRefreshDate(site);
            return sitemap;
        }

        //Existing code for GetSettings() and RenderSitemap() here. 
        
        //Following are the new functions to split the Sitemap based on the content size limit.

        private void SplitSitemap(SitemapSettings sitemapSettings, List<string> stringList, IList<Item> objList)
        {
            string originalSiteMap = this.RenderSitemap((IList<Item>)objList, sitemapSettings);
            List<string> listOfSiteMap = SplitSitemap(originalSiteMap);
            stringList.AddRange(listOfSiteMap);
        }

        public List<string> SplitSitemap(string originalSitemap)
        {
            List<string> sitemapSegments = new List<string>();
            //Return the original sitemap unchanged if its size is within the given limit.
            //GetByteCount measures the UTF-8 size without allocating a byte array.
            if (Encoding.UTF8.GetByteCount(originalSitemap) <= SitemapMaxSizeLimit)
            {
                sitemapSegments.Add(originalSitemap);
                return sitemapSegments;
            }

            //If not within the size limit, split it.
            StringBuilder currentSegment = new StringBuilder();
            long currentSize = 0;

            using (StringReader stringReader = new StringReader(originalSitemap))
            using (XmlReader xmlReader = XmlReader.Create(stringReader))
            {
                // ReadOuterXml() already moves the reader past </url>, so the reader is
                // advanced manually; calling Read() again after it would skip the next node.
                bool segmentHasUrls = false;
                int newLineBytes = Encoding.UTF8.GetByteCount(Environment.NewLine);
                int closingTagBytes = Encoding.UTF8.GetByteCount("</urlset>") + newLineBytes;

                while (!xmlReader.EOF)
                {
                    if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "url")
                    {
                        if (currentSegment.Length == 0)
                        {
                            throw new InvalidOperationException("Invalid sitemap structure");
                        }

                        string urlElement = xmlReader.ReadOuterXml();
                        // Size of this <url> element in bytes, including the line break appended after it
                        int urlSizeInBytes = Encoding.UTF8.GetByteCount(urlElement) + newLineBytes;

                        // Close the segment only if it already holds a URL, so no empty <urlset> is emitted
                        if (segmentHasUrls && currentSize + urlSizeInBytes + closingTagBytes > SitemapMaxSizeLimit)
                        {
                            // Close the previous <urlset> tag
                            currentSegment.AppendLine("</urlset>");
                            sitemapSegments.Add(currentSegment.ToString());
                            currentSegment.Clear();
                            segmentHasUrls = false;

                            // Start a new <urlset> tag
                            currentSegment.AppendLine("<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>");
                            currentSegment.AppendLine("<urlset xmlns:xhtml=\"http://www.w3.org/1999/xhtml\" xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
                            currentSize = Encoding.UTF8.GetByteCount(currentSegment.ToString());
                        }

                        currentSegment.AppendLine(urlElement);
                        currentSize += urlSizeInBytes;
                        segmentHasUrls = true;
                    }
                    else
                    {
                        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "urlset")
                        {
                            // Start the first segment with the XML declaration and the <urlset> tag
                            currentSegment.AppendLine("<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>");
                            currentSegment.AppendLine("<urlset xmlns:xhtml=\"http://www.w3.org/1999/xhtml\" xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
                            // Track the size of the header in bytes
                            currentSize = Encoding.UTF8.GetByteCount(currentSegment.ToString());
                        }
                        xmlReader.Read();
                    }
                }
            }

            if (currentSegment.Length > 0)
            {
                // Close the last <urlset> tag
                currentSegment.AppendLine("</urlset>");
                sitemapSegments.Add(currentSegment.ToString());
            }

            return sitemapSegments;
        }
    }
}
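The size-based split above is essentially a greedy packing of entries into segments. The following standalone sketch shows that idea on plain strings, without Sitecore or XmlReader. SizeSplitSketch and Pack are hypothetical names used only for illustration, and the header and footer are passed in rather than hardcoded.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SizeSplitSketch
{
    // Greedily packs entries into segments so that
    // header + entries + footer stays within maxBytes (UTF-8).
    public static List<string> Pack(IEnumerable<string> entries, string header, string footer, long maxBytes)
    {
        var segments = new List<string>();
        long overhead = Encoding.UTF8.GetByteCount(header) + Encoding.UTF8.GetByteCount(footer);
        var current = new StringBuilder(header);
        long currentSize = overhead;
        bool hasEntries = false;

        foreach (string entry in entries)
        {
            long entrySize = Encoding.UTF8.GetByteCount(entry);

            // Close the current segment only if it already holds an entry;
            // a single oversized entry still gets a segment of its own.
            if (hasEntries && currentSize + entrySize > maxBytes)
            {
                segments.Add(current.Append(footer).ToString());
                current = new StringBuilder(header);
                currentSize = overhead;
                hasEntries = false;
            }

            current.Append(entry);
            currentSize += entrySize;
            hasEntries = true;
        }

        if (hasEntries)
        {
            segments.Add(current.Append(footer).ToString());
        }
        return segments;
    }
}
```

For example, calling Pack with three 12-byte entries, the header "<urlset>", the footer "</urlset>" and a limit of 41 bytes puts the first two entries into one segment and the third into a second segment.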

Provide the following Sitecore patch config – CustomSXA.Foundation.SiteMetaData.config.

<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <services>
      <register patch:instead="register[@implementationType='Sitecore.XA.Foundation.SiteMetadata.Services.SitemapManager, Sitecore.XA.Foundation.SiteMetadata']"
        serviceType="Sitecore.XA.Foundation.SiteMetadata.Services.ISitemapManager, Sitecore.XA.Foundation.SiteMetadata" implementationType="CustomSXA.Foundation.SiteMetaData.Services.SitemapManager, CustomSXA.Foundation.SiteMetaData" lifetime="Singleton"/>
    </services>
    <settings>
      <setting name="CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit" value="50MB" />
    </settings>
  </sitecore>
</configuration>

Update the CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit setting to fit your needs. Specify the value in KB or MB, or as a plain number for bytes. Sitecore.StringUtil.ParseSizeString(), which processes this setting, also accepts GB, so add validation in the code if you need to cap the value. Also update the namespace to match your project.
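One possible form of that validation is to clamp the parsed value to the 50MB ceiling. This is only a sketch under that assumption; SizeLimitGuard and Clamp are hypothetical names and are not part of Sitecore or of the code above.

```csharp
using System;

public static class SizeLimitGuard
{
    // Upper bound taken from the sitemap best practice: 50MB.
    public const long MaxAllowedBytes = 50L * 1024 * 1024;

    // Keeps a configured size within (0, 50MB].
    // Zero or negative values fall back to the 50MB ceiling.
    public static long Clamp(long configuredBytes)
    {
        if (configuredBytes <= 0)
        {
            return MaxAllowedBytes;
        }
        return Math.Min(configuredBytes, MaxAllowedBytes);
    }
}
```

In the custom SitemapManager, the value returned by StringUtil.ParseSizeString() could be passed through such a guard before it is assigned to SitemapMaxSizeLimit, so that a value such as 1GB cannot raise the limit above 50MB.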

Consider installing the NuGet packages from this list. Then build the solution and deploy it.

Demo

Default SXA Sitemap.

Default SXA Sitemap Output


With the custom SitemapManager code. For demo purposes, we have set the “Sitemap Index Threshold” (URL count limit) to 5 in the Sitemap settings item and the “CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit” setting (content size limit) to 1000, i.e., 1000 bytes. Please update these settings as per your requirements.

SXA Sitemap is split into multiple Sitemaps based on the size and URL count limit


SXA Sitemap split if the URL count is within the given SitemapIndexThreshold and if the size exceeds the given size limit.


It is also worth reading the Sitecore documentation article "Prioritize a page in the search engine sitemap" to manage sitemap settings at the page item level.

Hope this helps. Happy Sitecore Learning!




