Build an internal link recommendation API in 25 lines of code
Published on under the SEO category.
A thoughtful, interconnected site structure is great for both people and search engines. Designing the structure of a site involves many components, from choosing the right URLs to creating breadcrumbs that help people navigate around your site. Linking to relevant content, whether to similar articles that together form a linear track or to related editorial content, helps people understand what you have to offer on your site and helps search engines discover new pages on your site.
In my work managing large websites (in aggregate, representing over 300k web pages for indexing), I have found that having at least four to six internal links pointing to each page on a site is ideal for improving one's ability to rank for key terms. Prioritizing "canonical" content, which represents your most comprehensive page on a given topic, is key, too.
I am working on a Python package called SEOtools with utilities useful for both web developers and SEOs looking to improve and understand the architecture of their websites. In 25 lines of code, you can build a web API that recommends internal links given a keyword. In this article, I will show you how to build your own internal link recommendation API. Let's begin!
Have questions about the API? Need help getting started? Email me at readers@jamesg.blog.
Step #1: Install Dependencies
First, we need to install two Python dependencies: Flask and SEOtools. We will use Flask for running our API, and SEOtools for indexing content in a sitemap and creating a recommendation system. Run the following command to install the requisite dependencies:
pip install flask seotools
With the required dependencies installed, we can start building!
Step #2: Build the API
Open up a new Python file and paste in the following lines of code:
from flask import Flask, request, jsonify
from seotools.app import Analyzer

# Build (or load a previously saved) index of every page listed in the sitemap
analyzer = Analyzer("https://jamesg.blog/sitemap.xml", load_from_disk=True)

app = Flask(__name__)

@app.route("/analyze")
def analyze():
    query = request.args.get("query")
    allowed_directories = request.args.get("allowed_directories", "")

    # Without a query there is nothing to recommend
    if not query:
        return jsonify([])

    # Turn "/2022/,/2023/" into ["/2022/", "/2023/"]
    if allowed_directories:
        allowed_directories = allowed_directories.split(",")

    recommendations = analyzer.recommend_related_content(query, allowed_directories)

    return jsonify(recommendations)

if __name__ == "__main__":
    app.run(debug=True)
In this code, we:
- Import the libraries we are using (flask and seotools).
- Create an Analyzer() object, which we will discuss below.
- Create a new /analyze web endpoint that takes in a query and a list of allowed directories. The allowed directories option restricts recommendations to links that contain at least one of the strings in the list.
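Based on the description above, the allowed directories option behaves like a substring check on each recommended URL. The sketch below illustrates that behavior in plain Python, assuming recommendations arrive as a ranked list of URL strings. `filter_by_directories` is a hypothetical helper for illustration, not part of SEOtools, and the library may implement the check differently.

```python
def filter_by_directories(urls, allowed_directories):
    """Keep URLs that contain at least one allowed directory string.

    Hypothetical helper illustrating the documented behavior; SEOtools
    may implement this check differently.
    """
    # An empty or missing list means "no restriction"
    if not allowed_directories:
        return list(urls)
    # Preserve the original ranking order while filtering
    return [url for url in urls if any(d in url for d in allowed_directories)]


ranked = [
    "https://jamesg.blog/2021/09/10/search-engine-direct-answers",
    "https://jamesg.blog/2022/01/10/scaling-indieweb-search",
    "https://jamesg.blog/2023/04/05/search-engine-position",
]

print(filter_by_directories(ranked, ["/2022/"]))
# ['https://jamesg.blog/2022/01/10/scaling-indieweb-search']
```

Because this is a plain substring test, a value such as /2022/ would match any URL containing that string anywhere, not only URLs whose path starts with it.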
The Analyzer() object crawls the specified sitemap (and all sitemaps linked in the sitemap) and builds a link index. Every page in the link index is crawled, and all of the text on the page is "embedded." An "embedding" is a numeric representation of a document (a list of numbers, or vector) that captures semantic information about the document, so documents with similar meanings end up with similar embeddings. You can do a lot more with the Analyzer() object; read the SEOtools documentation to learn more.
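Following a sitemap "and all sitemaps linked in the sitemap" means handling sitemap index files, which list other sitemaps rather than pages. The sketch below shows that general technique with the standard library's XML parser. It is not SEOtools' code: `collect_page_urls` and the `fetch` callable are hypothetical names, and the example uses an in-memory site at example.com so it runs without network access.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def collect_page_urls(sitemap_url, fetch, seen=None):
    """Return every page URL reachable from a sitemap or sitemap index.

    `fetch` is any callable that takes a URL and returns its XML as a string
    (hypothetical; in practice it would wrap an HTTP client).
    """
    # Track visited sitemaps so a sitemap that links back to itself cannot loop forever
    seen = set() if seen is None else seen
    if sitemap_url in seen:
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    pages = []

    # A <sitemapindex> lists other sitemaps: recurse into each one
    for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
        pages.extend(collect_page_urls(loc.text.strip(), fetch, seen))

    # A <urlset> lists pages directly
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        pages.append(loc.text.strip())

    return pages


# In-memory stand-in for a real site (placeholder URLs)
fake_site = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    ),
}

print(collect_page_urls("https://example.com/sitemap.xml", fake_site.__getitem__))
# ['https://example.com/a', 'https://example.com/b']
```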
When you first run an analysis, all of the results are saved to JSON files on your computer. The load_from_disk=True argument lets you load those files instead of re-crawling the specified sitemap every time your application runs. Thus, the content index is built once, then reused in future runs. To recreate the index, run Analyzer("https://example.com/sitemap.xml") without the load_from_disk argument.
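The caching behavior described above is a common build-once, load-later pattern. The sketch below shows that pattern in plain Python under simplifying assumptions: it stores the whole index in a single JSON file, whereas SEOtools may split its saved data across several files with different names. `load_or_build_index` and `build_index` are hypothetical names, not SEOtools APIs.

```python
import json
import os
import tempfile


def load_or_build_index(index_path, build_index, load_from_disk=True):
    """Reuse a saved index if one exists; otherwise build it and save it.

    `build_index` is any zero-argument callable returning JSON-serializable
    data (for example, the result of crawling a sitemap). Hypothetical names.
    """
    if load_from_disk and os.path.exists(index_path):
        with open(index_path, "r", encoding="utf-8") as f:
            return json.load(f)

    # Expensive step: runs on the first launch, or whenever a rebuild is forced
    index = build_index()
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index


# Demo: count how many times the "crawl" actually runs
calls = []

def fake_crawl():
    calls.append(1)
    return {"https://example.com/a": "page text"}

path = os.path.join(tempfile.mkdtemp(), "index.json")
load_or_build_index(path, fake_crawl)  # builds and saves
load_or_build_index(path, fake_crawl)  # loads from disk
print(len(calls))  # 1
```

In this sketch, passing load_from_disk=False forces a fresh build and overwrites the saved file, which mirrors the rebuild step described above.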
The analyzer.recommend_related_content function, used above, accepts a text query, embeds the query, then compares the embedding to all of the articles in your sitemap. The most similar results are returned.
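Cosine similarity, a common way to compare embeddings, can illustrate the comparison step. This sketch assumes embeddings are plain lists of floats and uses made-up three-dimensional vectors with placeholder URLs. It is not SEOtools' implementation, and the library may use a different similarity measure or embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, page_embeddings, top_k=10):
    """Return up to top_k page URLs, most similar to the query first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), url)
        for url, embedding in page_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:top_k]]


# Made-up 3-dimensional vectors; real embedding models typically output hundreds of dimensions
page_embeddings = {
    "https://example.com/search-engines": [0.9, 0.1, 0.0],
    "https://example.com/coffee": [0.0, 0.2, 0.9],
    "https://example.com/indexing": [0.7, 0.3, 0.1],
}
query_embedding = [1.0, 0.0, 0.0]

print(rank_by_similarity(query_embedding, page_embeddings, top_k=2))
# ['https://example.com/search-engines', 'https://example.com/indexing']
```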
To run the API, save the file as app.py and run the following command:
python3 app.py
The API will be available at http://localhost:5000.
We can generate a recommendation for a keyword by visiting the /analyze route on the API:
http://localhost:5000/analyze?query=search engine
Here are the top ten results for the query when run on an index of pages on this site (jamesg.blog):
[
"https://jamesg.blog/2021/09/10/search-engine-direct-answers",
"https://jamesg.blog/2021/08/06/weighing-search-results",
"https://jamesg.blog/2022/01/10/scaling-indieweb-search",
"https://jamesg.blog/2022/08/25/the-chase-firefox",
"https://jamesg.blog/2021/09/20/thoughts-on-building-a-search-engine",
"https://jamesg.blog/2021/08/04/how-my-search-engine-works",
"https://jamesg.blog/2021/07/20/building-a-blog-search-engine",
"https://jamesg.blog/2023/04/05/search-engine-position",
"https://jamesg.blog/2021/09/06/indieweb-search",
"https://jamesg.blog/2021/07/22/search-engine-textrank"
]
All of the top ten results are related to our query! Our recommendation API is working as expected.
To limit our search to specific directories, we can use the allowed_directories parameter. The following request limits our search to pages in the /2022/ directory:
http://localhost:5000/analyze?query=search engine&allowed_directories=/2022/
Here are the results:
[
"https://jamesg.blog/2022/01/10/scaling-indieweb-search",
"https://jamesg.blog/2022/08/25/the-chase-firefox"
]
Our API has successfully limited the results to the specified directory. You can specify as many directories as you want; separate each directory with a comma (e.g. /2022/,/2023/).
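If you call the API from a script rather than a browser, the query string should be URL-encoded, since a raw space (as in search engine) is not valid in a URL. The sketch below uses only the Python standard library. `build_analyze_url` and `fetch_recommendations` are hypothetical helper names, and the base URL assumes the Flask development server from Step #2 is running locally.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_analyze_url(base_url, query, allowed_directories=None):
    """Build an encoded /analyze URL; spaces, slashes, and commas are escaped."""
    params = {"query": query}
    if allowed_directories:
        # The API expects a single comma-separated string
        params["allowed_directories"] = ",".join(allowed_directories)
    return f"{base_url}/analyze?{urlencode(params)}"


def fetch_recommendations(base_url, query, allowed_directories=None):
    """Call the running API and return the decoded list of URLs."""
    url = build_analyze_url(base_url, query, allowed_directories)
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_analyze_url("http://localhost:5000", "search engine", ["/2022/", "/2023/"]))
# http://localhost:5000/analyze?query=search+engine&allowed_directories=%2F2022%2F%2C%2F2023%2F
```

With the server running, fetch_recommendations("http://localhost:5000", "search engine", ["/2022/"]) should return the same list you see in the browser. Flask decodes the escaped characters, so the endpoint receives the original query and comma-separated directories.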
Conclusion
In this guide, I have walked through how to create an internal linking API with the SEOtools package. You can use this API to recommend content related to a query. This is useful for creating "Related articles" sections on your website given a keyword on a page, or for generating candidates for pillar content that you want to link to on all pages.
For more SEO utilities, check out the SEOtools package documentation. I also made a package called getsitemap, which makes it easy to retrieve all of the URLs in a sitemap. getsitemap has a website, getsitemapurls.com, which lets you retrieve URLs in sitemaps and export them into CSV files for further processing.