SEO Tip: Watch What’s Getting Indexed
Search Engine Optimization hasn’t been my strong suit, but the more I look into it, the more I realize how sloppily my websites have been getting indexed. Just for fun, do a Google search for the following:
site:yourdomain.com
This limits the search to whatever domain you substitute for “yourdomain.com” and lists the pages Google has indexed within that site. The results are often surprising, especially if you run a WordPress blog. When I first started taking SEO seriously on my own blog, I found that Google had indexed my login page, contact form, plugin directory, a bunch of dead test pages, and the comment feeds for several posts.
In fact, running the same search against my other subdomains showed that Google had found websites of mine that nothing linked to yet! I don’t know how it got to this stuff, but it had indexed websites I was still working on and didn’t want the world to see.
When this happens, there is, fortunately, a way to take that junk out of search results. Using Google Webmaster Tools, you can submit a request to remove certain URLs or even an entire site from Google by going to Site Configuration > Crawler Access > Remove URL.
In order for a removal request to go through, though, you’ll have to make sure the URL is blocked by your robots.txt file. There’s already a ton of information available on how to write a robots.txt file, so I won’t repeat what’s already been established. I will, however, stress the importance of having one from the very start. Web crawlers can get access to some strange places, and if you don’t want those strange places showing up in Google, put a robots.txt file in place the moment you upload your site to the Internet.
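Before submitting a removal request, it can help to double-check that the URL really is blocked. Here is a rough sketch of one way to do that in Python with the standard library's urllib.robotparser. The rules and URLs are made-up examples for a WordPress-style site, and blocked_urls is just a helper name I picked for illustration, not part of any Google tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a WordPress-style blog; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /contact/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules forbid user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Placeholder URLs; swap in pages from your own site: search results.
candidates = [
    "http://yourdomain.com/wp-login.php",
    "http://yourdomain.com/wp-content/plugins/",
    "http://yourdomain.com/2010/01/some-post/",
]

for url in blocked_urls(ROBOTS_TXT, candidates):
    print("blocked:", url)
```

The standard-library parser does simple path-prefix matching, so it may not agree with Googlebot in every case (wildcard rules, for instance). Treat it as a sanity check rather than the final word.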

