On Tags and Google

I wrote this absurdly long email the other day about the proper use of tags. Specifically, it discussed tags being indexed by the site’s search engine and Google. I realized that for the most part, none of the recipients would fully read it, so I decided to strip out the site-specific details and share it with the rest of the world. It was initially about twice this long. At least now I don’t feel like I wasted my time.

Searching tags on the site

Tags are a filter mechanism. They provide an easy way to list all articles that are tagged with the same word. In this way, they serve as an ad-hoc extension to categories, letting websites add category-level organization without altering the existing architecture. This is handy because sites generally do better with a condensed taxonomy. You don’t want it to be any more complex than necessary, so tags do a great job of adding the useful function of category-level navigation without mucking everything up. Because this is their intended use, tag pages are often intentionally excluded from Google’s index with a noindex meta tag or the robots.txt file, so that they don’t alter Google’s understanding of the site’s taxonomy. You can’t search for tags in the search box for the same reason you generally don’t include the category in the search index.
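If you want to see whether a tag page carries that noindex meta tag, here is a rough sketch in Python. It uses only the standard library, and the two sample pages are made up for illustration; they are not markup from the actual site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # content looks like "noindex, follow"; split it into directives
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def is_noindexed(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical pages, invented for this example.
TAG_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
ARTICLE_PAGE = "<html><head><title>Some article</title></head><body></body></html>"

print(is_noindexed(TAG_PAGE))
print(is_noindexed(ARTICLE_PAGE))
```

The "follow" part in the sample tag page is my own assumption about how you would typically pair it: keep the page out of the index, but still let spiders follow its links through to the articles.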

Search algorithms are complex, and while I can’t speak intelligently on the existing algorithm’s logic, I’m sure it’s difficult to modify it to properly weight and rank for tags. My assumption, however, would be that in most cases an article is tagged with a word that exists in its body, so it will still show up in search, weighted appropriately.

Searching tags on the sister site

There are multiple reasons that you cannot search for the site’s tags on the sister site. The first is that the sister site uses a very intricate, custom-built search mechanism. It’s fairly fragile and extremely complex. Modifying this to also index content from the site would be very difficult and risky. Search has been a contentious issue for years on the site, and the community is happy with the current setup. I would not want to risk disturbing the current search mechanism. The second reason has to do with the specific separation between the sister site and the site. Since the forum is so large, search is one of the primary means of navigation. If we insert site content into the sister site’s search results, we will likely receive pushback from the community. Finally, if we included site results in the sister site’s search results, we’d have to reciprocate. This would undoubtedly end up favoring the sister site in results over the site. For example, if I do a search for “[keyword]”, there might be a dozen site posts in the results versus 1,000 results from the sister site threads. Also, reciprocation would require rebuilding the search mechanism of the site. The current implementation can’t handle the mass of the sister site.

Searching for tags on Google

Getting our pages to rank in Google is something that we do rather well. However, when it comes to getting specific terms to rank well in Google, it really depends on the term. For example, it’s easier to rank for “Idaho Scotch-Tape distributors” than it is for “Cheap Las Vegas Hotels”. One is just a more competitive space. Depending on what keywords we wish to target, we may or may not be able to effectively rank for them. Regardless, there are few people out there with more knowledge and understanding of search engine optimization than our team here.

We need to make a distinction between an individual article ranking for a specific term, and our tag page ranking for a specific term. Generally, tag pages don’t rank. This is because people don’t link to tag pages; they link to articles, and links drive rankings. This isn’t a bad thing, as we often optimize our articles for conversions, but rarely optimize our tag pages. The thought being, “why would we optimize a ‘portal’ page?”

My final thought on Google ranking for keywords relates to the nature of tags. They are typically what we call “head terms”: short one- or two-word phrases. Targeting head terms is always more difficult than targeting “tail terms”, which are phrases with three or more words in them. We generally see better results when targeting tail terms, as you can target multiple variants of a tail term in a single article, tail terms are less competitive, and visitors who enter for tail terms are often more engaged.
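To make the head/tail split concrete, here is a tiny Python sketch that sorts phrases using nothing but the word-count rule above (one or two words is a head term, three or more is a tail term). The function name and the sample keywords are mine, and real keyword research looks at competition and search volume too, so treat this as an illustration of the definition only.

```python
def classify_term(phrase):
    """Label a keyword phrase as a 'head' or 'tail' term by word count."""
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    # One or two words: head term. Three or more: tail term.
    return "head" if len(words) <= 2 else "tail"


# Hypothetical keywords, invented for this example.
for keyword in ["diving", "scuba diving", "cold water scuba diving gear"]:
    print(keyword, "->", classify_term(keyword))
```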

I want this shirt so bad

If anyone is looking for something to buy me for Christmas, I really really really want this shirt.
It’s supposed to be available at this site, but they appear to be closed. :-(

Johnny Utah

note: I posted this from my iPhone while having a conversation with Amanda on speakerphone. I totally wasn’t paying attention.

Eff MJ

I wasn’t shocked by the media frenzy about the death of Michael Jackson. He was a very famous man who unexpectedly died in mysterious circumstances. You know the news channels are going to go batshit crazy. Hell, even I was interested in the story.

What did end up shocking me was the outpouring of people talking about how awesome he was. People poured out into the streets to celebrate his talented life. It all kind of made me sick.

It really hit home when I heard someone say, “He may have been a pederast, but he was still the King of Pop.”

That statement couldn’t be more backwards. He might have been the King of Pop, but he was a pederast! I don’t care how well you dance or how many records you sell; you don’t do what he did. I don’t care about his childhood or his hardships; you don’t do what he did. That’s like saying, “Well, Hitler may have killed a lot of Jews, but that Volkswagen is one hell of a car, so he’s cool in my book.” Absolute and utter bullshit.

I’m by no means a religious person, but I do have a sense of morality. I’d never wish death on anybody and I’m sorry that we lost MJ; my feelings go out to his family. However, his legacy doesn’t exist to me. Everything good he’s done has been negated.

From my phone, for no good reason

I’m posting this from my phone for no good reason other than because I can. I realize that it’s retarded, but I upgraded WordPress just so I could, and I feel obligated to do so.
Here’s what I’m looking at right now:

Blogging Sucks

I was blogging before blogging was a word. I wrote online as my sole source of income for three years. Now, not only can I not find time to log in here once a week, but I can’t even remember to tweet more than once every couple days. It’s pathetic. I keep saying that it’s because I don’t have the time, or that I’m too busy, but that’s just a bullshit excuse. Truthfully, I just can’t seem to make it a priority.

The only thing that I can seem to give a shit about is diving, but with Chris just having a kid and the weather being crappy, I haven’t been wet in two weeks.

I guess what I’m saying is that I think I’ll be taking this site down. Unless I can get motivated enough to do something with it, there isn’t much of a point in it.

Edit: I have got to give a quick shout out to my buddy Jeff’s moving company, Moving Ahead Services.

An Idiot’s Intro to Robots.txt

When I started this blog, I intended to use it to document my exploits on the web. I have been building websites for quite a long time now, and while I can’t say that I know everything, I’m pretty darn good at my job.

Last July I took a position as a Project Manager at a very big web development company. I was given a handful of sites to manage– sites that were in pretty bad decay– and tasked with fixing them up. Along the way, I’ve learned a hell of a lot of stuff… but most importantly, that there are an amazing number of “webmasters” out there who don’t understand the fundamentals of search engine optimization. So today, we’re going to address the highly ignored and forgotten robots.txt.

The robots.txt file sits on your server and tells spiders where they’re allowed or not allowed to go on your site. It doesn’t exactly stop them from going there, but it tells them where you’d like them to not go. This is important to remember… anything you put in there doesn’t necessarily do anything, since a spider can decide to completely ignore the robots.txt file. 
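Here is what that looks like from the spider’s side, sketched with Python’s standard urllib.robotparser module. The rules, the bot name, and the example.com URLs are all made up for illustration. Notice that it’s the crawler doing the checking; nothing on your server enforces any of it.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
"""


def allowed(robots_txt, user_agent, url):
    """Ask the rules whether this user agent may fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite spider runs a check like this before every request.
# A rude one simply skips it and fetches the page anyway.
print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/tag/diving/"))
print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/articles/got-wet/"))
```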

Now that we know what the robots.txt file does, why would anyone want to block spiders from accessing parts of their site? Well… Maybe we should start with the wrong reasons people block spiders, and why those reasons are wrong.

1.) I have private sections of my site that I don’t want everyone to see. If you have something that you want to keep private, the last place that you want to put it is on the internet. Even if you password protect the area, someone out there can get in to see it. In fact, the more juicy the area is, the more inclined people will be to try to get in. Still, all the time you see people not password protecting sections of their site and listing them in the robots.txt file. When you do this, you’re basically telling everyone exactly where you keep your private information. This is a hard lesson learned a long time ago by The White House. If you want to keep your private data private, keep it off the web, or at least password protect it and don’t broadcast to everyone where it’s hiding.

2.) I don’t want spam bots stealing my email address. Remember how I said that spiders can choose to ignore the robots.txt file? As it turns out, the people who build those shady email address stealing bots… well, they just happen to be shady people! And, what do shady people allow their shady spiders to do? You guessed it… They allow their shady spiders to ignore your robots.txt file. Basically, when you block these spiders, you do nothing to stop them and just clutter up your robots.txt file. In fact, if you want to find a place for your email-harvesting bot to start gathering, a simple Google search will tell you everyone who’s got something to hide.

3.) Spiders use too much of my bandwidth. Nowadays, bandwidth is cheap, and if your site is seriously being bogged down by search engine bots, you should really look into upgrading your hosting. If your servers can’t handle the spiders, they definitely can’t handle a significant volume of productive traffic… which you’ll be guaranteeing never to see en masse if you block the spiders. Upgrade your hosting package and take a class on monetizing traffic.

4.) Google image traffic is garbage and I don’t want it. This one is hard to argue against, because Google image traffic pretty much is garbage. Still, I shy away from anything having to do with banning Google from your site, regardless of the capacity. There is, however, something to say about garbage traffic. In volume, garbage traffic costs you money in hosting and bandwidth, but it’s rarely worthless. If your site is properly set up to monetize, the quality of traffic becomes less important (for example, if you can sell your ads on a CPM basis, free garbage traffic can make big money). By catering to Google images you also open yourself up to great site branding opportunities– domain names are speckled all over the Google image results pages. Finally, if Google sends you 10,000 garbage visits a month, and from that you get one bookmark, incoming link, or registered user, aren’t you at least a little better off?
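Going back to the first reason for a second: it takes almost no effort to turn a robots.txt file into a list of places to go poking around. Here is a rough Python sketch that pulls every Disallow path out of a file. The sample file and its paths are made up; the point is only that anything you list in there is readable by anyone who asks for it.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path listed in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# A hypothetical robots.txt, invented for this example.
SAMPLE = """\
User-agent: *
Disallow: /private-reports/   # supposedly secret
Disallow: /admin/
Disallow:
"""

print(disallowed_paths(SAMPLE))
```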

Now that we know why not to block spiders with robots.txt, why would we ever want to? It really comes down to optimizing the time spiders spend on your site, and making sure that the pages they index are the pages you want them to index. Let’s use a couple of examples:

Read More »

Got Wet

Man, I really didn’t think that finding a little time to write on here would be so damn hard. I figured it would give me a nice relief from working on other people’s sites and help release those creative juices! It’s just that after staring at a computer screen for 10 hours a day, the last thing I want to do is sit down here and work on this garbage.

Anywho… I ran into this article last week and had to post it. Seems that Hyde is still the hardest club to get into. Thanks again Sarah; I feel like Puff Daddy.

I did get to finally go diving this weekend. I have never dived on the West Coast, and for one reason: it’s cold. After my amazing time diving in the Keys last spring, I decided I had to give it a shot. I went and bought a 7mm wetsuit and all the cold water gear that I need… and then I sat on my ass and looked at it for 4 months. Well, I finally worked up the nerve and got wet last Saturday. One thing’s for certain: It’s cold. Still, it was nice to get back in the water.

I went out with a guy I work with, who’s very experienced with the area, to Veteran’s Park in Redondo Beach. It’s a popular dive site for beginners because it’s very accessible. You throw your gear on at your car and walk down 68 steps onto the sand and into the water. There’s not much to look at– it’s a sandy bottom with not a ton of fish. It’s also very easy to go deep; we ended up at 101 ft pretty damn fast. As far as dives go, it was fairly uneventful, but as I said earlier, it’s nice to get wet. Maybe next time I’ll try not to run out of air on the way up. (I think Chris was trying to kill me.)

The Best Pic Ever

Dick Bradley at Hyde

I’ll admit that I thought it was cool as shit when Sarah took us to Hyde the other night. Who hasn’t seen the videos of all the celebs getting denied entrance into the place? Yeah, like I’m the only one who reads TMZ… Don’t act shocked I read that garbage; I am one of the founders of Vanity Spy. What made it even cooler is that we pretty much got right in. In the end, I was amazingly underwhelmed with the place; it was the size (and shape) of a quonset hut with a tiny bar in the back. The perimeter is a long-ass couch with wanna-be models dancing on top of it. Sounds cool, but it was 85% guys with their shirts off, and I haven’t seen so many fedoras since my grandfather’s funeral. I’m shocked that this is where all the famous people want to go. Still, it was kinda cool.

Trying to be Pimpy

My old friend Sergey was in town this weekend from Ohio. His GF lives out here, so he flies out whenever he gets a chance, and I love seeing him. It’s like a piece of Ohio, without the bullshit of being in Ohio. Sergey’s woman is a club promoter, so when he’s out here we go out to Hollywood and pretend we’re famous. 

I’m waiting for some pics, so I’ll save the stories for then. I guess I don’t actually have a point in writing this then. Oh well, thanks for the killer weekend, Serge and Sarah! Owww… and if the hot French chick from the limo reads this, hit me up.

Class over Cheese

I’ve finally got some kind of layout up here. It took the better part of an hour to get something up that was fairly acceptable. I had made two header images the other day and I passed them around to a couple of friends. The ones I knew would pick this one, did. The ones I knew would pick the other one, did. This is what we’re going to go with, at least for this part…. Class over Cheese. For anyone interested, here’s the other logo I designed: