DNN Forums

DNN CPU Spikes Investigation - Need Community Input

New Around Here | Posts: 5

## Issue Description

The site experiences consistent CPU spikes to 100% every 10 minutes. While normal CPU usage hovers around 50%, these spikes cause server response times to increase up to 10 seconds. Users who access the website during these spike periods experience significant performance degradation and slow page loads. Although not all users are affected (depending on their timing of access), the impact on user experience during these spikes is substantial. This pattern seems indicative of an underlying issue that scaling alone might not solve.
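
To check whether the spikes really recur on a fixed interval (which would point toward a timer or background job rather than traffic), here is a minimal sketch. It assumes the CPU metric has been exported as (timestamp, percent) pairs; the function name `spike_intervals`, the 95% threshold, and the synthetic data are illustrative assumptions, not anything taken from Azure or DNN.

```python
from datetime import datetime, timedelta

def spike_intervals(samples, threshold=95.0):
    """Return the gaps (in seconds) between the starts of consecutive spikes.

    samples: list of (datetime, cpu_percent) tuples sorted by time.
    A spike starts when CPU goes from below `threshold` to at or above it.
    """
    starts = []
    previous_high = False
    for ts, cpu in samples:
        high = cpu >= threshold
        if high and not previous_high:
            starts.append(ts)
        previous_high = high
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]

# Synthetic data (assumption): one sample per minute, a spike every 10 minutes.
base = datetime(2024, 1, 1, 0, 0)
samples = [
    (base + timedelta(minutes=m), 100.0 if m % 10 == 0 else 50.0)
    for m in range(31)
]
intervals = spike_intervals(samples)
print(intervals)
```

If the intervals on real data are near-identical, a scheduled or timer-driven task is a more likely cause than visitor traffic; widely varying intervals would suggest the opposite.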

## Environment

- DNN Version: 09.13.03 (0)

- Hosting: Azure App Service

- Plan: Standard S1 (ACU 100, vCPU 1, 1.75GB Memory)

- Scale: Up to 2 instances (issue persists with single instance)

## Site Details

- Single site with approximately 100 tabs and 700 modules

- Less than 100 unique daily users

- Cache Configuration:

  - Caching Provider: SimpleWebFarmCachingProvider

  - Module/Page Output Cache Provider: Memory

  - Cache Setting: Moderate

  - Cacheability: ServerAndNoCache (both authenticated/unauthenticated)

## Investigation Steps Taken

1. Ran DNN_Site_Evaluation.sql script - all metrics within recommended ranges

2. Disabled all schedulers

3. Disabled EventLogBuffer via HostSettings

4. Set httpRuntime.fcnMode="Disabled" in web.config (Note: I was unable to disable file system watching to test whether it is related)

5. Added logging to different namespaces but there was nothing alarming or properly aligned with the spikes' timeframe.

## Questions

1. Has anyone encountered similar periodic CPU spikes?

2. Which background tasks could be triggering the spikes?

3. What additional diagnostics would help identify the root cause?

4. Could moving from local file system to Azure storage be beneficial? The logs show extensive file watching activity across all application subdirectories, and I'm wondering if shifting to Azure storage might reduce CPU overhead from I/O operations and potentially resolve these spikes. Logs show even /Portals/_default/Logs directory being watched.

5. Any other recommendations to help pinpoint the cause?

Thank you in advance for any insights or suggestions!

Veteran Member | Posts: 546
Most likely, this is caused by a scheduler job.
Please check the History tab in PersonaBar > Administration > Scheduler.

Growing Member | Posts: 116

The plan you are running this on is underpowered. Are you sure the "700 modules" figure is correct?

New Around Here | Posts: 7
Since you've already disabled all scheduler jobs, I would suggest the issue is either an Application Pool crash (and consequently a recycle) or a particular page/module/feature causing high CPU usage.

The first case is usually caused by stack overflow exceptions or exceptions that occur on background threads that some modules or DNN would run.
These are easy to spot in the OS Event Log but more difficult to debug and track down.

The second case also divides in two:
- things that run on the main thread which would be visible to the user, i.e. the page would not complete loading until the CPU goes down.
- again, background jobs that do heavy work - for example older versions of our Search Boost module would run a background job every X minutes to discover new content to index.

If you can share the list of modules and versions that you have installed on that instance, it might help pinpoint some of the usual suspects.

Finally, since you are on Azure, did you try to configure Azure App Insights?

Advanced Member Send Private Message
Posts: 133
Advanced Member
As an idea to try: a client once set up a link incorrectly and failed to add the protocol to the href. This results in a reload of the page with added params, and hence it loops until it fails. When a user clicks the link it doesn't cause a big CPU issue, but we found that when a robot crawled the site, the CPU peaked at 100% for a long time and slowed the site.

To check for this situation, look in your logs for a very long line entry and then check the links on that page.

New Around Here | Posts: 5

My apologies, I may have used the wrong term. I was referring to the Modules/TabModules tables. The ModuleDefinitions count is 32.

Here is the result from the DB evaluation script:


------- General Table Sizes -------
1 rows in the Portals table       -- Usually less than 100 rows
11 rows in the PortalAlias table       -- Usually between 1-2 times the number of portals
1 rows in the PortalLanguages table       -- Usually between 1-3 times the number of portals
0 rows in the PortalGroups table       -- Usually zero, take notice if it is not
6 rows in the Users table       -- Usually less than 5,000 rows
9 rows in the Roles table       -- Usually 10-20 times the number of portals
101 rows in the Tabs table       -- Usually less than 2,000 rows
30 rows in the DesktopModules table       -- Usually less than 200 rows
1 rows in the WebServers table       -- Usually less than 4 rows
 
------- Performance Related Table Sizes -------
341 rows in the EventLog table       -- Usually less than 10,000
0 rows in the SiteLog table       -- Usually zero
710 rows in the ScheduleHistory table       -- Usually less than 10,000 rows
1 rows in the ContentWorkflows table       -- At least 3 times the number of portals
0 deleted users       -- As close to zero as possible
0 deleted pages       -- As close to zero as possible
0 deleted modules from pages       -- As close to zero as possible
0 unsent email notifications       -- As close to zero as possible

As for the plan remark, what would you consider to be a minimum configuration to run an instance like this? I understand it's a very basic plan, but most of the time (when not spiking) the usage stays below 60%.

New Around Here | Posts: 5

I've just set up App Service Platform logging so that I can confirm if the OS Logs can add to the investigation.

Here is a list of all installed modules:

DDRMenu Version 9.13.3
DotNetNuke.Modules.CoreMessaging Version 9.13.3
DNN_HTML Version 9.13.3
Journal Version 9.13.3
DotNetNuke.Modules.MemberDirectory Version 9.13.3
DNNCorp.RazorHost Version 9.13.3
Social Groups Version 9.13.3
DNNGo.Megamenu Version 1.0.6
DNNGo.SkinObject.Megamenu Version 1.0.6
DNNGo.SkinPlugin Version 1.1.0
20063-UnlimitedColorsPack-045-Skins Version 5.8.0
20063-UnlimitedColorsPack-045-Containers Version 5.8.0
DNNGo.SliderRevolution3D Version 1.1.2
DNNGo.PowerForms Version 5.8.0
DotNetNuke.HtmlEditorManager Version 9.13.3
SecurityAnalyzer Version 8.0.0
Skin.Xcillion Version 9.13.3
Container.Xcillion Version 9.13.3
DNNConnect.CKEditorProvider Version 9.13.3
33568_0_UnZip_DNNGo20063_Unlimited045_AllModules_2.3.0 Version 1.0.0
33568_0_UnZip_DNNGo20063_Unlimited045_AllModules_2.3.0 Version 1.0.0
DNNGo.xPlugin Version 4.0.2
DNNGo.SkinObject.xPlugin Version 4.0.2
DNN_IFrame Version 8.0.0
Dnn.EditBar.UI Version 9.13.3
Dnn.PersonaBar.UI Version 9.13.3
DotNetNuke.Console Version 9.13.3
SiteExportImport Version 9.13.3
Dnn.PersonaBar.Extensions Version 9.13.3
DotNetNuke.Newtonsoft.Json Version 13.0.1
Dnn.AzureConnector Version 9.13.3
AspNetClientCapabilityProvider Version 9.13.3
DotNetNuke.Providers.FolderProviders Version 9.13.3
DNNGo.DNNGallery Version 7.0.1
DotNetNuke.AspNetMvc Version 5.2.61129
DotNetNuke.AspNetWebApi Version 5.2.61129
DotNetNuke.AspNetWebPages Version 3.0.61129
DotNetNuke.MailKit Version 2.15.0
DotNetNuke.SharpZipLib Version 1.3.3
Microsoft.Extensions.FileSystemGlobbing Version 5.0.20
DotNetNuke.WebFormsMvp Version 1.4.5
ResourceManager Version 9.13.3
TelerikRemoval Version 9.13.3
DNN.Connectors.GoogleAnalytics4 Version 9.13.3
DNN.Connectors.GoogleAnalytics Version 9.13.3
DNN.Connectors.GoogleTagManager Version 9.13.3
jQuery.iframe-transport Version 10.7.0
jQuery Version 3.7.1
Knockout Version 3.5.1
Selectize Version 0.12.6
DotNetNuke.Providers.Caching.SimpleWebFarmCachingProvider Version 9.13.3
Dnn.ExchangeOnlineAuthProvider Version 9.13.3
Dnn.GoogleMailAuthProvider Version 9.13.3
HoverIntent Version 1.10.1
jQuery-Migrate Version 3.4.1
jQuery-UI Version 1.13.2
DnnPlugins Version 9.13.3
jQuery.Fileupload Version 10.7.0
AppInsights Version 4.0.2

Regarding Application Insights, it's connected and working, but I can't see anything out of the ordinary in the logs. It doesn't log background tasks, only requests/responses. I was able to get more details by including some namespaces in the log4net configuration (DNN.Trace, DotNetNuke.Web.Common.Internal.DotNetNukeShutdownOverload, DotNetNuke.Services.GeneratedImage, DotNetNuke.Web.Client, DotNetNuke.Services.Cache, DotNetNuke.Services.ModuleCache, DotNetNuke.Services.OutputCache), but it's almost as if the logs go silent during the spikes. You can see logs before and after, never quite aligning with the spikes.
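
To pin down exactly when the logs go silent, a rough sketch like the following could list every stretch with no log entries so the windows can be compared against the CPU chart. It assumes each entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp (adjust the format string if your log4net pattern differs); the function name `find_silent_gaps`, the 30-second threshold, and the sample lines are made up for illustration.

```python
from datetime import datetime, timedelta

def find_silent_gaps(lines, min_gap=timedelta(seconds=30)):
    """Yield (start, end) pairs where consecutive entries are at least min_gap apart."""
    previous = None
    for line in lines:
        try:
            # Assumption: the first 19 characters hold the entry timestamp.
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # continuation line, e.g. a stack trace frame
        if previous is not None and ts - previous >= min_gap:
            yield previous, ts
        previous = ts

# Synthetic log lines (assumption), not real DNN output.
sample_log = [
    "2024-01-01 10:09:50,120 [INFO] Example.Logger - synthetic entry",
    "2024-01-01 10:09:58,431 [INFO] Example.Logger - synthetic entry",
    "   at System.Web.UI.TemplateControl.LoadControl(VirtualPath virtualPath)",
    "2024-01-01 10:10:47,902 [WARN] Example.Logger - synthetic entry",
    "2024-01-01 10:10:49,015 [INFO] Example.Logger - synthetic entry",
]
gaps = list(find_silent_gaps(sample_log))
for start, end in gaps:
    print(f"silent from {start} to {end} ({(end - start).total_seconds():.0f}s)")
```

If the silent windows line up with the spikes, that would hint that the worker process is starved or restarting at those moments rather than simply not logging the responsible code path.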

New Around Here | Posts: 5

David, I appreciate your suggestion, but I'm not sure I understand exactly what I should look for in the logs. Is it logged under a specific namespace or with a specific message?

The only outstanding logs I can identify:

 

[WARN] DotNetNuke.Services.Localization.LocalizationProvider.LocalizeControlTitle(0) - Missing localization key. key:ControlTitle_.Text resFileRoot:/Portals/_default/Containers/20063-UnlimitedColorsPack-045//App_LocalResources/ threadCulture:en-US userlan:
=> Once I set up log4net to log other levels, I started seeing this. At first the resFileRoot showed actual files (App_GlobalResources/GlobalResources.resx, /DesktopModules/HTML/App_LocalResources/HtmlModule.ascx.resx), but after some research it looks like this is not an actual issue.

 

[ERROR] DotNetNuke.Services.Exceptions.Exceptions.LoadSkin(0) - DotNetNuke.Services.Exceptions.PageLoadException: Unhandled error loading page. ---> System.Web.HttpException: The file '/Portals/_default/skins/printerfriendlypage.ascx' does not exist.
   at System.Web.UI.Util.CheckVirtualFileExists(VirtualPath virtualPath)
   at System.Web.Compilation.BuildManager.GetVPathBuildResultInternal(VirtualPath virtualPath, Boolean noBuild, Boolean allowCrossApp, Boolean allowBuildInPrecompile, Boolean throwIfNotFound, Boolean ensureIsUpToDate)
   at System.Web.Compilation.BuildManager.GetVPathBuildResultWithNoAssert(HttpContext context, VirtualPath virtualPath, Boolean noBuild, Boolean allowCrossApp, Boolean allowBuildInPrecompile, Boolean throwIfNotFound, Boolean ensureIsUpToDate)
   at System.Web.UI.TemplateControl.LoadControl(VirtualPath virtualPath)
   at DotNetNuke.UI.ControlUtilities.LoadControl[T](TemplateControl containerControl, String controlSrc)
   at DotNetNuke.UI.Skins.Skin.LoadSkin(PageBase page, String skinPath)
   --- End of inner exception stack trace ---
=> I have no idea which module is trying to use the printerfriendlypage.ascx file. I've looked in DNN's installation package, in my skin package and I didn't find it. The stacktrace also doesn't give any clue on which module is triggering it.

Besides these, I can also see some image caching issues but they happen every couple of hours so I don't think it's correlated.

Advanced Member | Posts: 133

Sorry, I was not very clear. I will try and explain a little more, but it's a rare situation.

I was talking about the IIS logs, but you're running in an Azure App Service. I'm unsure how that deals with logs.

To explain the issue further: one of our clients had this link (I cannot get this editor to show the raw HTML).

test

will cause a simple reload of the "blog" page as normal, with a param of test.

We found the problem when the link was written incorrectly, with a URL like this:
test

The page displays as normal, and a normal user sees the blog page with a parameter. However, it adds the duplicate params:

http://www.mysite.me/en-us/blog/blog/test

This is not a problem for users in a browser. However, we found that some robots kept looping on the page by following the link, and hence we get a long URL:

http://www.mysite.me/en-u.../blog/blog/blog/test

This creates a CPU spike as the robot keeps looping on the page and creates a very long URL in the logs.
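
As a rough way to scan request logs for this pattern, the sketch below flags URLs whose path repeats the same segment several times in a row, or that are unusually long. The function names `max_segment_run` and `looks_like_loop` and both thresholds are assumptions chosen for illustration, and the looping URL in the example is synthetic.

```python
from urllib.parse import urlsplit

def max_segment_run(url):
    """Return the longest run of identical consecutive path segments in url."""
    segments = [s.lower() for s in urlsplit(url).path.split("/") if s]
    longest = run = 0
    previous = None
    for seg in segments:
        run = run + 1 if seg == previous else 1
        longest = max(longest, run)
        previous = seg
    return longest

def looks_like_loop(url, min_run=3, max_length=200):
    """Flag URLs with min_run or more repeated segments, or longer than max_length."""
    return max_segment_run(url) >= min_run or len(url) > max_length

urls = [
    "http://www.mysite.me/en-us/blog/test",
    "http://www.mysite.me/en-us/blog/blog/test",
    "http://www.mysite.me/en-us/blog/blog/blog/blog/test",  # synthetic loop
]
flagged = [u for u in urls if looks_like_loop(u)]
print(flagged)
```

The same functions work on bare paths taken from a log's URL column, since `urlsplit` handles input without a scheme or host. Lowering `min_run` to 2 would also catch the first duplicated link before a robot extends it.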





 

Advanced Member | Posts: 133
Another thought is that the AppPool may have a maximum limit, which makes it spin up a new AppPool. Maybe you are hitting the memory limit for an AppPool on your App Service, and that is causing a recycle of the AppPool, which will cause a CPU spike. I am not the Azure expert; I leave that to others. But adding max limits in IIS will cause this if the threshold is crossed, and I suppose Azure has the same functionality. Maybe the hosting plan just needs to be increased?