About a year ago, I was tasked with greatly expanding our URL rewrite capabilities. Our file-based nginx rewrites were becoming a performance bottleneck, and we needed to make an architectural leap that would take us to the next level of SEO wizardry.
In comparison to the total number of product categories in our database, Stylight supports a handful of “pretty URLs” – those understandable by a human being. Take http://www.stylight.com/Sandals/Women/ – pretty obvious what’s going to be on that page, right?
Our web application, however, only understands http://www.stylight.com/search.action?gender=women&tag=10580&tag=10630. So, nginx needs to translate pretty URLs into something our app can find, fetch and return to your page. And this needs to happen as fast as computationally possible. Click on that link and you’ll notice we redirect you to the pretty URL. This is because we’ve found out women really love sandals so we want to give them a page they’d like to bookmark.
We import and update millions of products a day, so the vast majority of our links start out as “?tag=10580”. Googlebot knows how dynamic our site is, so it’s constantly crawling and indexing these functional links to feed its search results. As we learn from our users and ad campaigns which products are really interesting, we dynamically assign pretty URLs and inform Google with 301 redirects.
This creates two layers of redirection and doubles the number of URLs our webserver needs to know about:
- 301 redirects for the user (and search engines): ?gender=women&tag=10580&tag=10630 -> /Sandals/Women/
- internal rewrites for our app: /Sandals/Women/ -> ?gender=women&tag=10580&tag=10630
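To make the two layers concrete, here is a minimal sketch in Python. It is not our production code: the `REDIRECTS` and `REWRITES` tables and the `handle` function are hypothetical names, the only entry is the sandals example from above, and it assumes an exact string match on the request URI (so the query parameters must arrive in this order).

```python
from typing import Dict, Tuple

# Hypothetical table: functional URL -> pretty URL (answered with a 301).
REDIRECTS: Dict[str, str] = {
    "/search.action?gender=women&tag=10580&tag=10630": "/Sandals/Women/",
}

# The internal rewrites are the same pairs in the opposite direction.
REWRITES: Dict[str, str] = {pretty: functional for functional, pretty in REDIRECTS.items()}


def handle(request_uri: str) -> Tuple[int, str]:
    """Return (status, uri): a 301 to the pretty URL, or 200 with the URI the app should serve."""
    if request_uri in REDIRECTS:
        # Layer 1: users and search engines are sent to the pretty URL.
        return 301, REDIRECTS[request_uri]
    if request_uri in REWRITES:
        # Layer 2: the pretty URL is translated back for the app, invisibly to the client.
        return 200, REWRITES[request_uri]
    # No rule known: pass the request through unchanged.
    return 200, request_uri
```

Every pretty URL therefore costs two entries, one per direction, which is why the number of rules doubles.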
So, how can we provide millions of pretty URLs to showcase all facets of our product search results?
The problem with file-based nginx rewrites: memory & reload times
With 800K rewrites and redirects (or R&Rs for short) spread across more than 12 country-specific rewrite.conf files, our “next level” initially means roughly 8 million R&Rs. But we could barely cope with our current requirements.
File-based R&Rs are loaded into memory for all 16 nginx workers. Besides eating up 3GB of RAM, it took almost 5 seconds just to reload or restart nginx! As a quick test, I doubled the number of rewrites for one country. 20 seconds later, nginx had successfully reloaded and was running with 3.5GB of memory. Talk about “scale fail”.
What are the alternatives?
Google searching for nginx with millions of rewrites or redirects didn’t give a whole lot of insight, but digging through what I found eventually led me to OpenResty. Not being a full-time sysadmin, I don’t care to build and maintain custom binaries.
My next search for OpenResty on Ubuntu Trusty led me to lua-nginx-redis – perhaps not the most performant solution, but I’d take the compromise for community-supported patches. A sudo apt-get install lua-nginx-redis gave us the basis for our new architecture.
As an initial test, I copied our largest country’s rewrites into Redis, wrote a quick Lua script for handling the rewrites and ran my first head-to-head test:
I included network round trip times in my test to get an idea of the complete performance improvement we hoped to realize with this re-architecture. Interesting how quite a few URLs (those towards the bottom of the rewrite file) caused significant spikes in response times. From these initial results, we decided to make the investment and completely overhaul our rewrite and redirect infrastructure.
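One plausible explanation for those spikes is that nginx evaluates rewrite rules in the order they appear in the file, so a URL matched by a rule near the bottom has to be tested against every rule above it first, while a key-value lookup costs about the same for every URL. The Python sketch below only illustrates that difference; the `/Category-N/` URLs, the rule count and all function names are made up for the example, and a plain dict stands in for Redis.

```python
import re
from typing import Dict, List, Optional, Tuple


def build_rules(count: int) -> List[Tuple[re.Pattern, str]]:
    """Hypothetical ordered rule list, like the lines of a rewrite.conf file."""
    return [
        (re.compile(f"^/Category-{i}/$"), f"/search.action?tag={i}")
        for i in range(count)
    ]


def build_table(count: int) -> Dict[str, str]:
    """The same rules as a hash table (a dict standing in for Redis)."""
    return {f"/Category-{i}/": f"/search.action?tag={i}" for i in range(count)}


def match_sequential(
    rules: List[Tuple[re.Pattern, str]], path: str
) -> Tuple[Optional[str], int]:
    """Test rules top to bottom; return the target and how many rules were checked."""
    checked = 0
    for pattern, target in rules:
        checked += 1
        if pattern.match(path):
            return target, checked
    return None, checked


rules = build_rules(1000)
table = build_table(1000)

# A URL matched by the first rule needs a single regex test ...
first_target, first_checked = match_sequential(rules, "/Category-0/")
# ... while one matched by the last rule needs all 1000 of them.
last_target, last_checked = match_sequential(rules, "/Category-999/")
# The table lookup is one hash probe, wherever the rule used to sit in the file.
table_target = table.get("/Category-999/")
```

With the sequential list, the work grows with a rule's position in the file; with the table, it does not.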
The 301 redirects lived exclusively on the frontend load balancers while the internal rewrites were handled by our app servers. First order of business would be to combine these, leaving the application to concentrate on just serving requests. Next, we set up a cronjob to incrementally update R&Rs every 5 minutes. I gave the R&Rs a TTL of one month to keep the Redis db tidy. Weekly, we run a full insert which resets the TTL. And, yes, we monitor the TTLs of our R&Rs – don’t want all of them disappearing overnight!
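The TTL scheme can be sketched as follows. This is an illustration in Python rather than our actual cronjob: `TtlStore` is a small in-memory stand-in for Redis keys with an expiry (modelled on the SETEX and TTL commands), “one month” is assumed to mean 30 days, the one-week alert threshold is an arbitrary choice, and time is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Dict, Iterable, List, Optional, Tuple

DAY = 24 * 3600
MONTH = 30 * DAY  # assumption: "one month" taken as 30 days
WEEK = 7 * DAY    # assumption: alert threshold for the TTL monitor


class TtlStore:
    """In-memory stand-in for Redis keys with expiry; not a real Redis client."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def setex(self, key: str, ttl: float, value: str, now: float) -> None:
        # Writing a key (again) replaces its value and restarts its TTL.
        self._data[key] = (value, now + ttl)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        return entry[0] if entry and entry[1] > now else None

    def ttl(self, key: str, now: float) -> Optional[float]:
        entry = self._data.get(key)
        return entry[1] - now if entry and entry[1] > now else None


def upsert(store: TtlStore, rules: Dict[str, str], now: float) -> None:
    """Used for both the incremental update and the weekly full insert."""
    for source, target in rules.items():
        store.setex(source, MONTH, target, now)


def expiring_soon(store: TtlStore, keys: Iterable[str], now: float,
                  threshold: float = WEEK) -> List[str]:
    """Monitoring check: keys whose remaining TTL has dropped below the threshold."""
    result = []
    for key in keys:
        remaining = store.ttl(key, now)
        if remaining is not None and remaining < threshold:
            result.append(key)
    return result
```

As long as the weekly full insert runs, no rule should ever get close to the threshold, so a non-empty result from `expiring_soon` is a sign that the full insert has stopped working.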
The performance of Lua and Redis
We launched the new solution in the middle of July this year – just over three months ago. To give some historical perspective, here’s a look at our pageviews over the last 18 months:
And our average response time during the same period:
As you can see, despite rapidly growing traffic, we saw the first significant improvements to our site’s response time just by moving the R&Rs out of files and into Redis. Reload times for nginx are instant – there are no more rewrites for it to load and distribute per worker – and memory usage has dropped below 900MB.
Since the launch, we’ve doubled our number of R&Rs (check out how the memory scales):
Soon we’ll be able to serve all our URLs like http://www.stylight.com/Dark-Green/Long-Sleeve/T-Shirts/Gap/Men/ by default. No, we’re not quite there yet, but if you need that kinda shirt…
We’ve got a lot of SEO work ahead of us which will require millions more rewrites. And now we have a performant architecture which will support it. If you have any questions or would like to know more details, don’t hesitate to contact me @danackerson.