I am passing along a tip I put into practice today: I stopped, at least temporarily, site robots from rummaging through obsolete URLs. The performance gains on my website have been impressive thus far.
As background, I have a website which I have operated off and on for over 10 years. Certain robots, such as Google's, cling to its URLs like some people cling to Elvis; they just can't let them go. I have seen Google bots crawl links from Catalook, which I haven't used in years.
I used to think that the activity was harmless, but I discovered that isn't the case. When 404 errors are generated, DNN runs an expensive process to handle them; I have taken up that issue in a separate forum thread, so I won't address it here. At one point 404s accounted for 5% or more of my site requests. While that doesn't seem like much, it has a snowballing effect: my website would become intermittently slow and molasses-like. Even though the overall performance numbers were quite good, these flies in the ointment ruined the picnic.
I have been measuring my performance with LeanSentry's server tool, which is designed to monitor, detect, isolate, troubleshoot, and diagnose IIS / ASP.NET application performance issues, down to the line of code causing the problems (although I can't afford that edition). While running it, I noticed that it was reporting 404 errors for 4.5% or more of all requests. So I started looking around and, lo and behold, found Google Search Console, where my website was registered years ago. It has a feature to remove URLs or suspend crawling by Google's spiders. There is of course the old standby, robots.txt, but in this case I wanted to go to the source.
After submitting many stale links, the Googlebot activity died down within the same day, and my satisfaction score has soared from 87.x% to 94.1% as of now. The number of sluggish-to-slow pages has dropped from 9.5% to 5.5%, and I expect more gains over the next 12 hours. These seemingly small percentage changes have produced quantifiable and palpable performance gains on the website: 95% of my requests are now handled within 1 second, which makes a huge difference in the browsing experience. (1 second is the threshold I set in LeanSentry as satisfactory, although it defaults to, and recommends, 2 seconds.)
This 404 information is in the DNN administrator logs as well, but I kept saying mañana to it because, in my mind, there was no correlation between the failures and the intermittent but persistent and annoying slowdowns on the site. LeanSentry penalizing me for them goaded me into taking action.
Whether you use Google Search Console, robots.txt, redirects, or all of the above, resolving 404 responses is time well spent because they are expensive to process. My advice: do not spend any money on hardware or software fixes until your spider-driven 404s are at a bare minimum. You can't get rid of all of them, but they should be managed.
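For anyone who hasn't looked at a robots.txt file in a while, the entries I'm adding look roughly like this. The paths are made-up examples of retired store-module URLs, not my actual site structure, so substitute whatever your own 404 logs report:

    # block crawlers from retired sections of the site
    # (example paths only - use the stale URLs from your own logs)
    User-agent: *
    Disallow: /DesktopModules/OldStoreModule/
    Disallow: /old-store/
    Disallow: /Products/discontinued-item.aspx

Bear in mind robots.txt is only a request; well-behaved crawlers like Googlebot honor it, but not every spider does.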
Interesting, thanks.
We have the SEO-Redirect module on GitHub to help out with resolving 404s, but I'll definitely give it some thought to see if it could be improved with this experience.
Posted By Stefan Kamphuis on 08 Feb 2021 08:34 AM Interesting, thanks. We have the SEO-Redirect module on GitHub to help out with resolving 404s, but I'll definitely give it some thought to see if it could be improved with this experience.
I would love a utility or module feature which would take 404s and write them to a file in robots.txt Disallow: format. The URLs could then either be copied and pasted into the live robots.txt, or used to create redirects to the new URL. My situation is that a lot of search engines are hanging on to a bunch of stale, old URLs which are not eligible for redirects. The search engines need to stop pestering me about those URLs. They are never coming back. Get over it.
I could probably select the 404s from one of the log tables to get what I need for transformation into Disallow: directives (a rough sketch of that kind of query is below).
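Something along these lines is what I have in mind. The table and column names here are placeholders - which table actually holds the 404 entries, and whether it stores the full URL or just the path, varies by DNN version, so check your own database first:

    -- Hypothetical sketch: turn logged 404 URLs into robots.txt Disallow: lines.
    -- dbo.Http404Log, RequestedUrl and LogDate are placeholder names, not real DNN schema.
    -- Disallow: expects a path, so strip the scheme/host if the log stores absolute URLs.
    SELECT DISTINCT 'Disallow: ' + RequestedUrl AS DisallowDirective
    FROM dbo.Http404Log
    WHERE LogDate >= DATEADD(DAY, -30, GETDATE())
    ORDER BY DisallowDirective;

The output could then be pasted under the User-agent: * section of the live robots.txt, or fed into a redirect list.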
Maybe this isn't the best solution, but it is the one I am following now, and it is for site performance reasons as much as for SEO.
Right, the SEO Redirect module works a lot faster than normal processing, since it just takes the URL and sees if it needs to be redirected.
I can imagine we could add an option to the module to cover this scenario.
I'll ask my colleagues to share their thoughts on this too...
Greetz, Stefan
Another use case would be to take 404s and route them to a specific page where I could offer options on how to proceed. That would probably still need to return at least a 301, which is a non-200 status; I'm not sure how DNN handles that. But if your module does that, I would be a buyer.

However, I have uncovered a deeper cause of my performance issues, and it may not really be DNN error processing so much. From an article on Infragistics' website, I got the tip to increase the number of worker threads, which is very low by default and may have been a contributor to my sluggish response times during error conditions. I increased the minimum to 100 and the maximum to 200 in the .NET 4 machine.config (a sketch of the change is at the end of this post). It ate a lot of memory, but so far it has been well worth it; fortunately I could easily allocate another gigabyte of memory, and more if needed. Slow and sluggish request counts have dropped off dramatically, even in the face of 404 errors. I also increased the IIS cache size, though I'm not sure how meaningful that was.
PS - the vast majority of my 404s are due to spiders, not to people.
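For anyone wanting to try the same tweak, the change looks roughly like this inside the <system.web> section of the .NET 4 machine.config (on a 64-bit server that is typically under C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config). As far as I know, processModel can only be set in machine.config, autoConfig has to be false for the explicit values to take effect, and the thread counts are interpreted per CPU, so treat these numbers as a starting point to test rather than a recommendation:

    <!-- .NET 4 machine.config, inside <system.web> -->
    <!-- autoConfig must be false or the explicit values are ignored -->
    <!-- minWorkerThreads / maxWorkerThreads are per-CPU values -->
    <processModel
        autoConfig="false"
        minWorkerThreads="100"
        maxWorkerThreads="200" />

Back up machine.config before editing it, since processModel settings affect every ASP.NET application on the box.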
Posted By tony bonn on 09 Feb 2021 02:10 PM Another use case would be to take 404s and route them to a specific page where I could offer options on how to proceed. That would probably still need to return at least a 301, which is a non-200 status; I'm not sure how DNN handles that. But if your module does that, I would be a buyer.
That's exactly what our module does and there's no need to "buy" it :-)
My suggestion would be to take on the root cause: tell Google not to index old pages. https://www.google.com/webmasters/tools/url-removal
Posted By Tycho de Waard (SU) on 11 Feb 2021 09:25 AM My suggestion would be to take on the root cause: tell Google not to index old pages. https://www.google.com/webmasters/tools/url-removal
Thank you - I found that page a couple of days ago and have been using it. Unfortunately, Google is by no means the only crawler of stale links, so I also add entries to the robots.txt file in the hope that some of the other crawlers will honor it.