DNN Forums

Ask questions about your website to get help learning DNN and help resolve issues.

Noindexing






New Around Here





    Hi,

    We have a webpage on our website that I have marked in the settings to prevent Google and other search engines from crawling it. The webpage has to stay on the site, just not be searchable by a web crawler. I have also asked Google to disallow indexing, which can be done temporarily. This is the webpage: https://www.nafo.int/Libr...rking-Papers/STACFAD

    I also included noindex meta tags in the page header tags:

     <meta name="https://www.nafo.int/Portals/0/PDFs/Working%20Papers/STACFAD/STACFAD-2021/stacfadwp21-01.pdf" content="noindex">
    <meta name="https://www.nafo.int/Portals/0/PDFs/Working%20Papers/STACFAD/STACFAD-2021/stacfadwp21-05.pdf" content="noindex">
    <meta name="https://www.nafo.int/Library/Working-Papers/STACFAD" content="noindex">
    <meta name="googlebot" content="noindex">
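For reference, crawlers only honor a robots meta tag whose `name` is `robots` or a crawler token such as `googlebot`; a URL in the `name` attribute is not a recognized directive. A meta tag also applies only to the HTML page that contains it, so PDFs and images would need the `X-Robots-Tag: noindex` HTTP response header instead. The following is a minimal sketch, not a definitive audit tool, that separates recognized from ignored noindex tags in a header snippet. The names `RobotsMetaAudit` and `audit` are hypothetical, and the accepted name list is deliberately limited to the two tokens relevant here.

```python
from html.parser import HTMLParser

# Meta names treated as robots directives in this sketch. "robots" applies to
# all crawlers; "googlebot" targets Google only. Other names are ignored.
ROBOTS_META_NAMES = {"robots", "googlebot"}


class RobotsMetaAudit(HTMLParser):
    """Collect noindex <meta> tags and split them into recognized and ignored."""

    def __init__(self):
        super().__init__()
        self.recognized = []  # (name, content) pairs a crawler would honor
        self.ignored = []     # (name, content) pairs with a non-robots name

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip()
        content = (attr.get("content") or "").strip()
        if "noindex" not in content.lower():
            return
        if name.lower() in ROBOTS_META_NAMES:
            self.recognized.append((name, content))
        else:
            self.ignored.append((name, content))


def audit(html):
    """Parse an HTML fragment and return the populated audit object."""
    parser = RobotsMetaAudit()
    parser.feed(html)
    return parser


# Two of the tags from the post, reproduced as written.
HEAD = """
<meta name="https://www.nafo.int/Library/Working-Papers/STACFAD" content="noindex">
<meta name="googlebot" content="noindex">
"""

result = audit(HEAD)
print("recognized:", result.recognized)
print("ignored:", result.ignored)
```

In this sketch only the `googlebot` tag lands in `recognized`; the URL-named tag is reported as ignored.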

    I have also added rules to the config files/robots.txt to disallow these pages:

    Disallow: /portals/0/Images/Secretariat/
    Disallow: /Working-Papers/STACFAD
    Disallow: /stacfadwp21-05.pdf
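As a quick way to test rules like these, the sketch below feeds them to Python's standard `urllib.robotparser` and asks whether a given URL would be blocked. It rests on several assumptions: the `User-agent: *` line is added because the excerpt does not show a crawler group, `is_blocked` is a hypothetical helper, and the standard-library parser only does simple case-sensitive prefix matching from the start of the path, so it approximates rather than reproduces Google's matcher (which also supports wildcards).

```python
from urllib.robotparser import RobotFileParser

# Rules from the post. The "User-agent: *" line is an assumption: the excerpt
# does not show which crawler group the rules belong to.
ROBOTS_TXT = """\
User-agent: *
Disallow: /portals/0/Images/Secretariat/
Disallow: /Working-Papers/STACFAD
Disallow: /stacfadwp21-05.pdf
"""


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text disallows crawling url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# URLs as they appear in the post.
PAGE = "https://www.nafo.int/Library/Working-Papers/STACFAD"
PDF = ("https://www.nafo.int/Portals/0/PDFs/Working%20Papers/"
       "STACFAD/STACFAD-2021/stacfadwp21-05.pdf")

for url in (PAGE, PDF):
    print(is_blocked(ROBOTS_TXT, url), url)

# Hypothetical rule that starts at the root of the path, for comparison.
FULL_PATH_RULES = "User-agent: *\nDisallow: /Library/Working-Papers/STACFAD\n"
print(is_blocked(FULL_PATH_RULES, PAGE), PAGE)
```

Under prefix matching, a rule has to begin with the full path from the site root, so `/Working-Papers/STACFAD` does not cover `/Library/Working-Papers/STACFAD`, and `/stacfadwp21-05.pdf` does not cover a PDF stored under `/Portals/0/PDFs/...`. Paths are also case-sensitive, so `/portals/...` and `/Portals/...` are different prefixes. Note too that a crawler that is blocked from a page by robots.txt cannot read a noindex tag on that page.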

    Why do the images from these pages still show up in Google, especially when I use the search term "NAFO LOGO"? I do not want them to show up. I thought I had set everything up in these areas to prevent this from happening.

    Any suggestions are welcome. Old images are showing up as well, and people seem to access them; we would prefer that they not be accessible via the internet.

    Alexis






    Senior Member





      First, it sounds like you're doing all of the correct things. 

      In my personal and anecdotal experience, these things should still be done whenever it makes sense. However, there are nuances to this too. I've also found that Google will still crawl and keep a record of everything it can, but "indexing" is treated differently from "knowing the content is there."

      The case of the images, specifically, is very interesting. It looks like you are, again, doing the right things. I'd recommend going into Google Search Console to try to remove them from the index and search results.






      Veteran Member





        For what it is worth, we run into some of the same issues with Google from time to time and it sounds like you are doing all the right things. Will's suggestions are great.

        David Poindexter









        Growing Member





          For exactly this situation two years ago, we wrote IIS Rewrite rules that send requests for specific files to the 403 error page when HTTP_USER_AGENT is Googlebot. It seems to have worked. We still needed to request removal of the images that had already been crawled and indexed. Also check what the Internet Archive (Wayback Machine) has indexed.
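The actual IIS Rewrite rules are not shown in the thread. Purely as an illustration of the same decision logic, here is a minimal Python sketch written as WSGI middleware: when the user agent contains a listed crawler token and the path falls under a protected prefix, it answers 403 instead of serving the file. The prefix `/portals/0/thumbnails/` and the names `should_block` and `CrawlerBlock` are hypothetical, and matching on the user agent string is only as reliable as the crawler's honesty about its identity.

```python
# Hypothetical path prefixes to protect; not taken from the thread.
BLOCKED_PREFIXES = ("/portals/0/thumbnails/",)

# Lowercase substrings to look for in the User-Agent header. "googlebot"
# also matches image crawler agents such as "Googlebot-Image/1.0".
BLOCKED_AGENT_TOKENS = ("googlebot",)


def should_block(user_agent, path):
    """True when a listed crawler requests a protected path."""
    agent = (user_agent or "").lower()
    is_crawler = any(token in agent for token in BLOCKED_AGENT_TOKENS)
    is_protected = path.lower().startswith(BLOCKED_PREFIXES)
    return is_crawler and is_protected


class CrawlerBlock:
    """WSGI middleware that answers 403 instead of calling the wrapped app."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "")
        if should_block(agent, path):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)
```

Regular visitors are passed through to the wrapped application unchanged, so thumbnails would still display on the site while listed crawlers receive a 403 for them.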





          Veteran Member





            Interesting and creative solution James - thanks for sharing!

            David Poindexter









            Growing Member





              I think that robots.txt ONLY requests "Please don't crawl this area". In my mind that's not "Please don't crawl this area AND delete anything you already have." So I assume that if those pages have EVER been crawled then, in principle, Google has them.

              It's never been entirely clear to me *why* we would block a crawler. The pages/images presumably are not secret/confidential. If they were then I'd expect they'd be behind a password challenge. Why not just let the crawlers crawl?
              Thanks,
              Richard
              www.dynamisys.co.uk





              Growing Member





                In our case there were JPG thumbnails of copyrighted PDFs. The PDFs were protected behind registration sign-up, and the copyright holder did not want anything indexed by Google (not even the thumbnail images), but we still had to show the thumbnails on the site to encourage people to sign up.






                Veteran Member






                  Posted By RichardHowells on 2/9/2024 5:32 PM
                  I think that robots.txt ONLY requests "Please don't crawl this area". In my mind that's not "Please don't crawl this area AND delete anything you already have." So I assume that if those pages have EVER been crawled then, in principle, Google has them.

                  It's never been entirely clear to me *why* we would block a crawler. The pages/images presumably are not secret/confidential. If they were then I'd expect they'd be behind a password challenge. Why not just let the crawlers crawl?


                  There are many reasons to keep a page from being indexed (avoid duplicate content SEO issues, marketing landing pages, special purpose pages that should be visited only from a specific user journey, etc.).

                  David Poindexter









                  Advanced Member





                    You certainly want to check what's indexed; you can type "site:yourdomain.com" in Google search and it'll show the pages Google has indexed from the specified domain. I use this a lot whenever I have to double-check what's public and what's not.

                    Ing. Marco Alvarado Gómez MSc | Globalode
                    Phone. +506 6049-1880 | WhatsApp. +506 6049-1880 | Email. [email protected]
                    Address. Costa Rica (A Pura Vida place!).





                    Advanced Member





                      Hi everybody! I just found this article that will help you delete an image from the Google index:

                      https://www.searchenginej...-search-index/508458

                      Ing. Marco Alvarado Gómez MSc | Globalode
                      Phone. +506 6049-1880 | WhatsApp. +506 6049-1880 | Email. [email protected]
                      Address. Costa Rica (A Pura Vida place!).