In my first weeks at Atlassian, I was looking for a quick win to get a stronger foothold within the company and evangelize SEO. So, I audited our main site and noticed it didn’t have an XML sitemap. What an easy win!
I went to the devs and asked them to activate it in the CMS. To my surprise, they told me that it wasn’t possible; I was baffled.
After some thinking, I remembered that Screaming Frog had an XML sitemap function, so I scraped the site and uploaded the crawl as an XML sitemap. Google ate it within a few seconds, and we saw a noticeable impact on our traffic in the following days.
The moral of the story is that XML sitemaps are important and sometimes underrated.
Here is everything I'm going to cover in this article:
What XML Sitemaps Are and Why You Need to Have One
HTML vs. XML Sitemaps
Different Types of XML Sitemaps
XML Sitemap Minimum Requirements
XML Sitemap Tips for Large Sites
XML Sitemap Best and Worst Practices
XML Sitemap Tools and Generators
What XML Sitemaps Are and Why You Need to Have One
XML sitemaps are digital maps that help Google discover the important pages on your site and learn how often they are updated.
Google states on its help center page:
A sitemap tells the crawler which files you think are important in your site, and also provides valuable information about these files: for example, for pages, when the page was last updated, how often the page is changed, and any alternate language versions of a page.
According to Gary Illyes, XML sitemaps are the second most important source of URLs to be crawled by Googlebot after hyperlinks and previously discovered URLs. That’s massive and shouldn’t be underestimated!
Sitemaps are the second Discovery option most relevant for Googlebot @methode #SOB2019
Enrique Hidalgo (@EnriqueStinson) June 15, 2019
Google started using XML sitemaps in 2005 and shortly after was joined by search engines like MSN or Yahoo. Nowadays, they use them for even more than just URL discovery.
Every website should have an XML sitemap. They are especially important for:
Sites with lots of orphaned pages
Sites that use lots of images and videos
Whereas the robots.txt helps you exclude parts of your site from being crawled by search engines, XML sitemaps do the opposite. They help search engines discover new pages, even when they are not linked from the main site.
Sitemaps come in XML format that Google can quickly parse to find new URLs. XML — eXtensible Markup Language — is lightweight and portable between devices and was made to store data.
The easiest way for you to check if your site has a sitemap is to look in Google Search Console or in Bing Webmaster Tools under “sitemaps.” Most search engines, such as Google or Bing, look for the “Sitemap: <sitemap_location>” entry (or entries) in your site’s robots.txt file. Alternatively, you can also ping your sitemap directly to Google, Baidu, Bing, and Yandex.
XML sitemaps in the Bing Webmaster Tools. 1: Sitemaps report. 2: Adding new sitemap paths. 3: Existing sitemaps Bing found.
XML sitemaps in Google Search Console. 1: Sitemaps report. 2: Adding new sitemap paths. 3: Existing sitemaps Google found.
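To make the robots.txt lookup concrete, here is a minimal Python sketch of how a crawler might read the “Sitemap:” entries. It relies on the standard library's urllib.robotparser (site_maps() needs Python 3.8 or later). The robots.txt content and the example.com URLs are made up for illustration, and this is not a claim about how any search engine implements the lookup.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return every URL listed in a 'Sitemap:' line of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap entry exists
    return parser.site_maps() or []


# Hypothetical robots.txt content, not taken from a real site
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/news-sitemap.xml
"""

print(sitemaps_from_robots(ROBOTS_TXT))
```

For a live site, you would fetch https://yourdomain/robots.txt first and pass its text to the same function.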
HTML vs. XML Sitemaps
There are two types of sitemaps: HTML and XML. What is the difference?
1. They differ in format.
HTML is obviously different from XML. But that implies even more: while HTML sitemaps are visible to site users, XML sitemaps are feeds for search engines.
You could argue that HTML sitemaps are also created for search engines, but while they can be valuable to users, XML sitemaps cannot.
2. They serve the same purpose but in different ways.
Both help search engines discover new URLs, whether pages, videos, or images.
XML sitemaps are custom feeds that help search engines understand the priority of URLs to crawl, how often they change, and which new ones were added to the site. That is especially helpful for search engine schedulers because they can better estimate when and how often to recrawl a URL.
HTML sitemaps also help search engines discover new URLs but through the discovery of links they follow. That means HTML sitemaps can only be an effective URL discovery tool if they are being crawled and if the links are followed. You can understand this by looking at your log files.
3. They have different side-benefits.
XML sitemaps have meta-attributes like <changefreq> or <lastmod> to indicate how often and when a URL changes. They can also carry extensions for videos, images, and news.
HTML sitemaps distribute PageRank throughout a site, and that is what they are nowadays mainly used for, aside from the navigational value for users. Since HTML sitemaps are often linked in the footer of a site, they are usually linked from every page and might distribute that incoming PageRank to other pages with weaker internal linking.
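To show what the <lastmod> and <changefreq> attributes look like in practice, here is a small Python sketch that builds a sitemap with the standard library. The function name build_sitemap, the URLs, and the dates are assumptions for illustration; your CMS or generator will have its own way of producing this file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (bytes) for (loc, lastmod, changefreq) tuples."""
    # Register the sitemap namespace as the default so tags have no prefix
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages; lastmod uses the YYYY-MM-DD form of the W3C datetime format
pages = [
    ("https://www.example.com/", "2019-06-15", "daily"),
    ("https://www.example.com/pricing", "2019-05-02", "monthly"),
]

print(build_sitemap(pages).decode("utf-8"))
```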
Different Types of XML Sitemaps
Even though sitemaps can also be submitted in RSS, mRSS, Atom 1.0, or text format, the “type” of a sitemap refers to its content or “media type”, such as images, videos, or news.
As I will further specify below, you can create sitemaps that contain only one specific media type or integrate them into your regular XML sitemap.
XML Sitemap Minimum Requirements
For your XML sitemaps to work optimally, you have to meet the standards. An XML sitemap should:
Contain only canonical URLs with a 200 status code.
Include up to 50,000 URLs per sitemap and up to 50,000 sitemaps per index sitemap.
Be referenced in the robots.txt.
Be compressed in .gz format.
Be no larger than 50 MB uncompressed (or 50,000 URLs, whichever you hit first).
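The size rules above are easy to enforce in code. The following Python sketch splits a URL list into batches of 50,000 and gzips a sitemap after checking the 50 MB limit. The helper names chunk_urls and compress_sitemap and the generated example.com URLs are my own placeholders, and I am assuming the 50 MB limit applies to the uncompressed file.

```python
import gzip

MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAP_BYTES = 50 * 1024 * 1024  # 50 MB, checked before compression


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into sitemap-sized batches."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def compress_sitemap(xml_bytes):
    """Gzip a sitemap after checking the uncompressed size limit."""
    if len(xml_bytes) > MAX_SITEMAP_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed; split it further")
    return gzip.compress(xml_bytes)


# Hypothetical site with 120,000 URLs
urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
batches = chunk_urls(urls)
print([len(batch) for batch in batches])
```

Each batch would become its own sitemap file, saved with a .xml.gz extension once compressed.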
But there is more you can and should do to get the most out of XML sitemaps. You can signal to Google which URLs are important by including only important pages in your XML sitemaps and by updating them often.
Most CMSs have a function to automatically update sitemaps when a new URL is created or an existing page changes. For Google, the update frequency of the sitemap itself and the lastmod tag of pages can be a signal of freshness. Whether that is important for its ranking depends on the page and the context.
XML Sitemap Tips for Large Sites
There is more you can do to elevate your sitemap game, beyond meeting the standard requirements.
Large sites like news publishers, for example, should make use of index sitemaps, which contain up to 50,000 normal sitemaps and should also be no larger than 50 MB. They are like the XML sitemap mothership that carries lots of smaller sitemaps. Large sites need them because their URLs can’t fit into a single sitemap. You shouldn’t try to fit everything into a single sitemap, anyway.
You can make the most out of these sitemaps by structuring them either per page type or topic. In practice, you would create dedicated XML sitemaps per subdirectory or page template to get an understanding of technical and indexing problems with your site.
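An index sitemap is just a list of sitemap locations. As a rough sketch, this is how you could generate one in Python with one child sitemap per page type. The function name build_sitemap_index, the file names, and the dates are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS_PER_INDEX = 50_000


def build_sitemap_index(sitemaps):
    """Return index sitemap XML (bytes) for (loc, lastmod) pairs."""
    if len(sitemaps) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("an index sitemap can list at most 50,000 sitemaps")
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical child sitemaps, one per page template
child_sitemaps = [
    ("https://www.example.com/sitemap-products.xml.gz", "2019-06-15"),
    ("https://www.example.com/sitemap-blog.xml.gz", "2019-06-14"),
    ("https://www.example.com/sitemap-help.xml.gz", "2019-06-01"),
]

print(build_sitemap_index(child_sitemaps).decode("utf-8"))
```

Submitting each child sitemap separately in Google Search Console then gives you indexing numbers per page type.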
There are specialized XML sitemaps for specific purposes. Sites that operate heavily around rich media (think: Pinterest or YouTube) benefit a lot from image or video sitemaps. Publishers should have news sitemaps.
Image sitemaps increase your site’s chance to be found in Google image search. You don’t have to have a dedicated image sitemap; you can also use image extensions in your regular sitemap.
This is what image extensions look like (XML specifications):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/sample.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://example.com/photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>
Video sitemaps work on the same principle: either create a dedicated sitemap or add extensions to your regular one:
<url>
  <loc>https://example.com/mypage</loc>
  <video:video> ... information about video 1 ... </video:video>
</url>
But be careful with the meta-data you add to video sitemaps or extensions.
Google states, “Google might use text on the video landing page rather than the text you supply in your sitemap if the page text is deemed more useful than the information in the sitemap.” They are speaking about the text delivered through the description. Besides a description, you can feed Google a thumbnail, video length, rating, family-friendliness, and more (full list of video XML sitemap meta-data). For sites that heavily use video, this certainly makes sense. For all others, it is optional.
News sitemaps are different in that you should always have a separate news XML sitemap. Google doesn’t recommend (or offer) extensions in this case. News sitemaps help Google discover and rank new articles, which is especially challenging in the publishing industry because it produces a lot of content. Even though Google states that publishers with news sitemaps are not favored, it does help to get hot news ranking in Google News faster.
News sitemaps have special requirements:
Include articles not older than 2 days.
Don’t add more than 1000 new entries to an existing sitemap at a time.
Update existing sitemaps for article updates.
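Here is a hedged Python sketch of how a publisher could apply these rules: it keeps only articles from the last two days and writes them into a news sitemap using Google's news namespace. The helper names, the publication name “Example Times”, and the article data are invented. I also cap the file at 1,000 entries, which is my own simplification of the 1,000-entry rule above.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
MAX_AGE = timedelta(days=2)
MAX_ENTRIES = 1000  # assumption: treat the 1,000-entry rule as a per-file cap


def recent_articles(articles, now):
    """Keep articles published within the last two days, newest first."""
    fresh = [a for a in articles if now - a["published"] <= MAX_AGE]
    fresh.sort(key=lambda a: a["published"], reverse=True)
    return fresh[:MAX_ENTRIES]


def build_news_sitemap(articles, publication, language):
    """Return news sitemap XML (bytes) for a list of article dicts."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        date = article["published"].isoformat()
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = date
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Fixed "now" so the example is reproducible; use datetime.now(timezone.utc) live
NOW = datetime(2019, 6, 15, 12, 0, tzinfo=timezone.utc)
articles = [
    {"loc": "https://www.example.com/news/a", "title": "Article A",
     "published": NOW - timedelta(hours=3)},
    {"loc": "https://www.example.com/news/b", "title": "Article B",
     "published": NOW - timedelta(days=1)},
    {"loc": "https://www.example.com/news/c", "title": "Article C",
     "published": NOW - timedelta(days=5)},  # too old, gets dropped
]

fresh = recent_articles(articles, NOW)
print(build_news_sitemap(fresh, "Example Times", "en").decode("utf-8"))
```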
You can also use XML sitemaps to define and indicate certain meta-tags for Google. One example is hreflang, which you can add as an extension to a sitemap (full guidelines):
<url>
  <loc>http://www.example.com/english/page.html</loc>
  <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/deutsch/page.html"/>
  <xhtml:link rel="alternate" hreflang="de-ch" href="http://www.example.com/schweiz-deutsch/page.html"/>
  <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/english/page.html"/>
</url>
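Writing hreflang entries by hand gets tedious because every language version needs its own <url> entry that lists all alternates, including itself. A short Python sketch can generate them from one mapping. The function name build_hreflang_sitemap is hypothetical, and the URLs simply reuse the example above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(alternates):
    """Return sitemap XML (bytes) for one page's {hreflang: url} mapping."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # One <url> entry per language version
    for loc in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Each entry repeats the full set of alternates, itself included
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


ALTERNATES = {
    "de": "http://www.example.com/deutsch/page.html",
    "de-ch": "http://www.example.com/schweiz-deutsch/page.html",
    "en": "http://www.example.com/english/page.html",
}

print(build_hreflang_sitemap(ALTERNATES).decode("utf-8"))
```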
Google ignores the priority attribute in XML sitemaps but does pay attention to lastmod, according to John Mueller. Google determines the priority of your pages itself, probably by popularity and authority. Lastmod, however, is a tag that indicates when the URL last changed, which is really interesting to Google.
The URL + last modification date is what we care about for websearch.
John (@JohnMu) August 17, 2017
Also, you don’t need to add XML sitemaps for AMP URLs, according to John Mueller.
@Kfowler325 No need for sitemaps for AMP pages — the rel=amphtml link is enough for us.
John (@JohnMu) October 13, 2016
XML Sitemap Best and Worst Practices
At Atlassian, we solved the missing XML sitemap functionality of our CMS with a third-party XML sitemap provider, and it worked just fine.
Even though the format is text-based instead of XML, it works.
The New York Times references its sitemaps in the robots.txt and separates formats like videos or news. It goes even a step further and has sitemaps for specific categories, such as cooking or elections.
As a publisher, it makes sense to have dedicated XML sitemaps for timely events because you need to understand how fast Google picks up the content and whether everything can be indexed without problems.
Walmart has a similar split by categories that makes a lot of sense for ecommerce sites. It has Master XML sitemaps for topics and categories.
As you can see in the screenshot below, the topic split allows Walmart to see how Google indexes different areas of the site like fashion or entertainment.
If you have a site that is split into topics, categories, or both, I recommend creating specific XML sitemaps for each. There is no known disadvantage of having the same URLs in different sitemaps.
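A simple way to get such a split is to group URLs by their first directory. The Python sketch below does that under a few assumptions of mine: the first path segment equals the category, pages without a subdirectory go into a “root” group, and the sitemap-<section>.xml file names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by first path segment; top-level pages go to 'root'."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Only treat the first segment as a section if something sits below it
        section = segments[0] if len(segments) > 1 else "root"
        groups[section].append(url)
    return dict(groups)


# Hypothetical ecommerce URLs
urls = [
    "https://www.example.com/fashion/shoes",
    "https://www.example.com/fashion/dresses",
    "https://www.example.com/entertainment/movies",
    "https://www.example.com/about",
]

for section, members in group_by_section(urls).items():
    print(f"sitemap-{section}.xml: {len(members)} URLs")
```

Each group can then be written to its own sitemap and listed in an index sitemap.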
Semrush Tip: With the Semrush Site Audit tool, you can audit any website and check for six specific issues related to XML sitemaps. The tool first checks whether a sitemap.xml is present, and then it looks for formatting errors, incorrect pages in the sitemap, and other issues that could be impacting the clarity of your sitemap.
XML Sitemap Tools and Generators
Most content management systems come with prepackaged functions that allow you to create an XML sitemap automatically. But some don’t, and in this case, you need a third-party tool.
These are my personal picks for XML sitemap generators.
Limit: n/a. Features: Drag and drop builder; Custom page type inclusion; Import text file; Cloning; Batch editing; Highly customizable; User permissions; Custom branding
Limit: 200K URLs per crawl. Features: Monitor URLs in sitemaps in Google Analytics; Highly customizable; Custom page type inclusion; Workflow management; URL tagging; Sitemap filtering; User permissions; Custom branding
Limit: n/a. Features: Customizable; Custom groups; Drag and drop builder
3 sitemaps free
Free to 500 URLs
Limit: n/a. Features: Not made for XML sitemaps but good workaround for technical restrictions
Limit: 15K pages. Features: Not made for XML sitemaps but can export a feed into XML format
$4.99 for 1K pages
$189.99 for 1.5m pages
Limit: 1.5m pages. Features: Image and video sitemaps; Email notifications; Mobile app; Detects broken links
Free for 500 pages
Simple Wp Sitemap
Limit: n/a. Features: HTML and XML sitemap; Dynamic sitemaps
Google Sitemap by BestWebSoft
Limit: n/a. Features: Hreflang support; Customizable
Google XML Sitemaps
Limit: n/a. Features: Dynamic sitemaps; Customizable
free (premium available)
Limit: n/a. Features: Basic, dynamic sitemap
WordPress XML Sitemap Plugin
All in One SEO Pack
Limit: n/a. Features: Basic, dynamic sitemap
XML Sitemap & Google News
Limit: n/a. Features: Basic, dynamic sitemap; Customizable; Updates lastmod automatically