Posts tagged impact

Are you misusing Robots.txt?

 

Lindsay at SEOmoz explores the idea that many of the Internet’s best pages are being effectively blocked by robots.txt files, and describes some of the more common flawed implementations of the strategy. Describing the history of robots.txt, Lindsay says, “The robots.txt protocol was established in 1994 as a way for webmasters to indicate which pages and directories should not be accessed by bots. To this day, respectable bots adhere to the entries in the file… but only to a point.”

What if…Your pages are still showing up in SERPs?

Though Google and other engines won’t crawl the content of a page blocked by robots.txt, they may still list the page’s URL in their index. See a couple of prominent examples below:

Cisco’s Login Page

As you can see, Cisco’s page shows up for the search term “login” on Google. Because the page is blocked by robots.txt, the listing lacks a meta description as well as a text snippet.

WordPress’ Next Blog Page

Shown below, WordPress’ Next Blog page is also indexed by Google but lacks a full SERP result. Clearly, these examples show that robots.txt isn’t effective at keeping pages out of Google’s index.
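To make the crawling-versus-indexing distinction concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the bot name, and the example.com URLs are made up for illustration. Note that the parser can only answer "may this bot fetch this URL?" and has nothing to say about whether the URL ends up listed in a search index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
"""

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# "ExampleBot" is an invented user agent name.
login_ok = may_crawl(ROBOTS_TXT, "ExampleBot", "https://www.example.com/login")
about_ok = may_crawl(ROBOTS_TXT, "ExampleBot", "https://www.example.com/about")
print(login_ok, about_ok)
```

A well-behaved bot would skip /login here, but an engine that finds links pointing to that URL can still show the bare URL in its results.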

What if…Robots.txt blocks your inbound link juice?

When you use robots.txt to block a page, the engines can’t crawl it, so none of the links on that page can pass any juice. Inbound links dead-end there as well, which means you’re not using your links to their fullest potential and you’re wasting any SEO hosting products you’re paying for.

Here are a couple of the worst offenders of the robots.txt sort:

Digg.com

Digg blocked a page using robots.txt with an amazing 425,000 unique linking domains! Digg has since fixed the issue, but Google has yet to catch up with its indexing. A better solution would be to use a noindex meta tag, like this:

<meta name="robots" content="noindex, follow">
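If you want to confirm that a page really carries the tag, a rough check can be written with Python's built-in html.parser. This is only a sketch: the class name, function name, and sample page below are invented for the example, and it looks solely at meta tags whose name is "robots".

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())

def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Made-up sample page for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
directives = robots_directives(PAGE)
print(sorted(directives))
```

Keep in mind that a crawler has to be able to fetch the page to see this tag, so the page must not also be disallowed in robots.txt.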

Blogger and Blogspot

These sites are losing juice between each other, and ironically, they’re owned by Google! As Lindsay says,

“Blogger.com is the brand behind Google’s blogging platform, with subdomains hosted at ‘yourblog.blogspot.com’. The link juice blockage and robots.txt issue that arises here is that www.blogspot.com is entirely blocked with the robots.txt. As if that wasn’t enough, when you try to pull up the home page of Blogspot, you are 302 redirected to Blogger.com.”

A better way to do this is to replace the 302 with a 301 redirect from www.blogspot.com to Blogger.com and get rid of the robots.txt block altogether.
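For readers who haven't set one up, a permanent redirect is simply a response with status 301 and a Location header. The sketch below shows the idea as a tiny WSGI app in Python. The destination host is a placeholder and the helper names are invented; in practice a redirect like this is usually configured in the web server rather than in application code.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical destination host, used only for illustration.
CANONICAL_HOST = "https://www.example.com"

def permanent_redirect_app(environ, start_response):
    """Answer every request with a 301 to the same path on CANONICAL_HOST."""
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")
    location = CANONICAL_HOST + path + ("?" + query if query else "")
    start_response("301 Moved Permanently",
                   [("Location", location), ("Content-Length", "0")])
    return [b""]

def simulate_request(app, path):
    """Call the app in-process and return (status, Location header)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    list(app(environ, start_response))
    return captured["status"], captured["headers"]["Location"]

status, location = simulate_request(permanent_redirect_app, "/blog/")
print(status, location)
```

The status code is the part that matters here: a 301 tells engines the move is permanent, while a 302 marks it as temporary.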

Better Ideas:

Noindex

301 Redirect

Canonical Tag

Password Protection

Two issues that make robots.txt even less effective…

Bad Bots – who don’t adhere to the “rules” in your robots.txt at all

Competitors – who are digging through your blocked content to see what they can uncover

And here’s what Lindsay has to say about non-HTML and system content:

  • It isn’t necessary to block .js and .css files in your robots.txt. The search engines won’t index them, but sometimes they like the ability to analyze them so it is good to keep access open.
  • To restrict robot access to non-HTML documents like PDF files, you can use the X-Robots-Tag HTTP header. (Thanks to Bill Nordwall for pointing this out in the comments.)
  • Images! Every website has background images or images used for styling that you don’t want to have indexed. Make sure these images are displayed through the CSS and not using the <img> tag as much as possible. This will keep them from being indexed, rather than having to disallow the “/style/images” folder from the robots.txt.
  • A good way to determine whether the search engines are even trying to access your non-HTML files is to check your log files for bot activity.
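As a starting point for the log check in the last bullet, here is a small Python sketch that counts bot requests for non-HTML files. It assumes your server writes the common "combined" access log format; the bot names, file extensions, and sample log lines are all made up for illustration and should be adjusted to your own setup.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust for your server.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
BOT_MARKERS = ("googlebot", "bingbot")  # illustrative, not exhaustive
NON_HTML = (".pdf", ".js", ".css", ".png", ".jpg", ".gif")

def bot_hits_on_assets(lines):
    """Count requests for non-HTML files made by known bot user agents."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(NON_HTML) and any(bot in agent for bot in BOT_MARKERS):
            hits[path] += 1
    return hits

# Invented sample lines; the IPs come from documentation address ranges.
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2010:13:55:36 +0000] "GET /docs/guide.pdf HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2010:13:56:01 +0000] "GET /docs/guide.pdf HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '192.0.2.10 - - [10/Oct/2010:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
hits = bot_hits_on_assets(SAMPLE)
print(dict(hits))
```

User agent strings are easy to fake, so treat the counts as a rough signal rather than proof of which bot visited.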

Summarized by Heather Hendrick

Google Makes Changes That Could Affect Rankings

This WebProNews article, originally from August 2010, focuses on some of the latest Google algorithm changes that may impact rankings. We’ve summarized it for you here so you and your SEO can make the best decisions accordingly.

Search remains the keystone of Google, even in the midst of all the social media, advertising, and video initiatives the company has going on at the moment.

Algorithm changes are some of Google’s most talked-about items, especially when the company makes a big announcement beforehand. It did so recently with the news that multiple pages from one domain will now be displayed for relevant queries. Google is going to attempt to determine the intent behind certain queries and decide whether the SERP is best populated with multiple pages from one site, which it has never done in the past. Not all SEOs and SEO hosting companies are happy with this news, as it means big sites may further dominate listings.

As with algorithm changes, Google’s experimentation is also much talked about. The company was recently spotted trying a format in which the autosuggest function more or less takes over the SERP, which could definitely wreak havoc on search results as a whole. As we now know, though, this function is fully in play and is called Google Instant. SEOs are currently evaluating searchers’ habits to determine the best ways to manage this change.

Google is also currently testing the idea of crawling from multiple IP addresses. This means that Googlebot is now able to spider sites faster than ever and with greater accuracy, and webmasters are monitoring this development closely to see how it affects rankings.

Along with all these changes, Google is also acquiring companies such as Like.com. It remains to be seen how these acquisitions will affect search as a whole.

Thanks to Heather Hendrick for the summary!

What impact does server location have on rankings?

Rob Lewicki from Toronto, Ontario asks: “What impact does server location have on Google rankings?”