Archive for Search

Today’s link roundup: Nielsen’s UK search data, Scholarly Search from MS, whois.sc changes name, Orkut big in Brazil, certified email changes tack, Google using what to gauge quality?, Web 2.0: Underpants Gnomes business model

Nielsen’s UK Search Data
“It is important not to forget that Google’s phenomenal success has had implications and benefits for others far beyond Google itself. Take the university sector for example – Google drove over 5.6 million visitors to university websites in January alone, which represented 62% of the entire audience for that sector. This means that only the military sector (63%) owed a greater percentage of its entire audience to Google.”
Report
Article

New Scholarly Search from Microsoft
“Unlike Google Scholar, which crawls the web for academic content, Windows Live Academic Search works closely with publishers and uses structured feeds to build its index. As such, all content accessed through the service comes directly from a trusted source—namely, the publisher of a scholarly journal. The new service addresses two needs of the academic community that have traditionally been under-served, according to Danielle Tiedt, general manager of Windows Live Premium Search. Academic users want tools to help them fine tune search results, and are interested in getting more information on a search result before clicking off to specific article.”

Whois.sc Changes Name To DomainTools.com
“One of ResourceShelf’s favorite WHOIS domain name resources has changed its name and has a new look. Result pages are full of data. Also, all of the inexpensive ($15/month) but powerful services like historical WHOIS info (back to 2002) and the Mark Alert domain name tracking service are available.”

Orkut explodes in Brazil
“About 11 million of Orkut’s more than 15 million users are registered as living in Brazil — a remarkable figure given that studies have estimated that only about 12 million Brazilians use the Internet from home.”

Certified email not intended to reduce spam after all
“Rather than helping to reduce spam Gingras claimed that the point is to allow users to verify who important messages are really from, like a message from your bank or credit card company. “To suggest that the introduction of CertifiedEmail is going to prevent spammers from sending spam or phishers from trying to phish — we have not said it, nor would any expert say it,” Gingras told DM News. The Goodmail CEO’s statement at the hearing, however, differs from recent remarks by AOL concerning the program’s benefits. “As we get ready to testify at the hearing … we are also working diligently to protect our members’ safety and security by preparing implementation of the anti-spam, anti-phishing CertifiedEmail program,” AOL spokesman Nicholas J. Graham said in a March 30 report on DMNews.com.”
Bill comments: I find this interesting as it indicates to me that perhaps trust is becoming a larger issue than spam. When a big fish like this changes stance so radically, they’ve smelled something in the wind. Or something is broken. Hehe.

Google using traffic analysis and user feedback to gauge quality?
“I don’t think Google would want to base a ton of the overall relevancy algorithm on site popularity (and clearly they don’t since the top results are not always the most popular sites), but they can and may use traffic patterns and searcher feedback to filter out junk sites. And it may help certain types of link spam stick out (ie: a site that just picked up 50,000 backlinks but few of them drive any traffic) may be a red flag for spam.”
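To make the "lots of backlinks, little traffic" idea in the quote above concrete, here is a minimal Python sketch. It is not Google's method, which the quote only speculates about. The `SiteStats` record, the `looks_like_link_spam` function, and both thresholds are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site record; the field names are assumptions."""
    domain: str
    new_backlinks: int    # backlinks gained during the observation window
    referral_visits: int  # visits that arrived through those backlinks


def looks_like_link_spam(site, min_backlinks=10_000, min_visits_per_link=0.01):
    """Flag a site that gained many links while those links sent almost no traffic.

    Both thresholds are illustrative guesses, not known search engine values.
    """
    if site.new_backlinks < min_backlinks:
        # Too few new links to count as a suspicious burst.
        return False
    visits_per_link = site.referral_visits / site.new_backlinks
    return visits_per_link < min_visits_per_link


burst = SiteStats("example-burst.test", new_backlinks=50_000, referral_visits=120)
organic = SiteStats("example-organic.test", new_backlinks=50_000, referral_visits=40_000)
print(looks_like_link_spam(burst), looks_like_link_spam(organic))
```

A real system would presumably treat a signal like this as one red flag among many rather than a verdict, which is also how the quote frames it.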

Web 2.0: Underpants Gnomes Business Models
“Here’s the audio file if you missed the South Park episode that’s from, but basically it’s the following business model:
1. Collect underpants.
2. ????????
3. Profit!
It’s become way too clear now that a lot of new web companies are following this business model, and the majority of them will probably fail at some point. For example:
1. Share photos
2. ??????
3. Profit!”

Today’s link roundup: Adversarial Information Retrieval workshop, ACM SIGIR conference, Searcher Behavior Research Update, DNS cache poisoning and link hijacking are the new black!

AIRWeb (Adversarial Information Retrieval) Call for Papers
The workshop will be held on 10 or 11 August, during the ACM SIGIR Conference on Research and Development on Information Retrieval on 6-11 August in Seattle.
“The attraction of hundreds of millions of web searches per day provides significant incentive for many content providers to do whatever is necessary to rank highly in search engine results, while search engine providers want to provide the most accurate results. The conflicting goals of search and content providers is adversarial, and the use of techniques that push rankings higher than they belong is often called search engine spam. Such methods typically include textual as well as link-based techniques, or their combination. This, the second AIRWeb workshop, builds on last year’s successful meeting in Chiba, Japan as part of WWW2005. This year we solicit submissions on any aspect of adversarial information retrieval on the Web.”
Matt Cutts article
AIRWeb workshop site

29th Annual International ACM SIGIR - Conference on Research & Development on Information Retrieval
“SIGIR is the major international forum for the presentation of new research results and the demonstration of new systems and techniques in the broad field of information retrieval. The 29th Annual International ACM SIGIR Conference will be held at the University of Washington Campus in Seattle, WA, August 6-11, 2006.”

Searcher Behavior Research Update
“Two new studies examining how people search show that internet users are becoming more discriminating, with important implications for search marketers. As part of ongoing work conducted by Jupiter Research and sponsored by iProspect, “The iProspect Search Engine User Behavior Study” found that 62% of search engine users click on a search result within the first page of results, and a full 90% of users click on a result within the first three pages of search results. These figures were just 48% and 81% in 2002, based on similar research iProspect did at the time.”

DNS cache poisoning is the new black?
“Are you open for DNS Cache Poisoning? Are your SERPs being redirected because an attacker has compromised your DNS and is now stealing your traffic? And possibly destroying your domain in the process?”

Or maybe link hijacking is the new black…
“I was notified by a person in New Jersey (I’m in Washington state and my web site is hosted in Virginia) that when he clicked on my link in the search engine results sometimes he would be sent to my site, but other times he would be sent to an adult site. Sometimes the adult site would come up first and other times it would take several clicks on the same link, but the adult site would come up eventually.”

Today’s link roundup: a week of Battelle links, new text search algorithm for Google, Google health?, Keyword volume data, Gary Flake, MSN search down, MSN master of the obvious?

John Battelle’s Friday Update
Link roundup from John Battelle.

Google gets new text search algorithm
“Orion finds pages where the content is about a topic strongly related to the key word. It then returns a section of the page, and lists other topics related to the key word so the user can pick the most relevant. The results of the query are displayed immediately in the form of expanded text extracts, giving the searcher the relevant information without having to go to the website - although there is still that option.”
SearchEngineWatch comments
Local news article
WebMaster World comments

Google Health, maybe
“Health has been an area of interest at Google for some time. We (including Adam) have been doing a variety of research in this area, including how to improve the quality of health-related search results.”

Keyword search volume data
“SEO Question: Are you aware of a tool or a service that can provide reliable search volume history for certain keywords? Like how many searches there were for “keyword phrase” in each month of 2005.”

A Frank Interview with Gary Flake
“The format is simple - I send an opening question to the luminary in question via email, they respond, and we go from there. First up is Gary Flake, a veteran of Overture, Yahoo and now Microsoft’s vaunted research labs (he’s founder and director of the new “Live Labs.”) Gary and I have known each other since I first began work on the book, and he’s always had a refreshingly frank outlook. I expected that to be tempered by a year at the world’s largest (and oft-criticized) software company, but I was wrong. If anything, Gary has become more outspoken. I’ve bolded the really juicy bits, but see for yourself….”

MSN Search down for four hours
“Microsoft spokesman said the outage began at about 8 a.m. Pacific time. At noon, the service was still down, but shortly afterward began working again.
The spokesman said company technicians have been working on the problem, but have not yet concluded what caused the shutdown in the first place.”

MSN, master of the obvious?
“Well, it turns out if you have clearly computer generated content with exactly the same number of links on each page, all pages the same size and over a million pages from 116,654 hosts that all share the same IP, they can detect you. Wow! That’s some pretty sophisticated spam detection technology they’ve got there.
In addition, this revolutionary new search technology can detect:
1. Corn in Nebraska
2. If a politician is lying”
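The signals mocked in the quote above (many hosts on one IP, every page with the same link count and the same size) are simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not MSN's detection code. The `suspicious_ips` function, the tuple layout, and the `min_hosts` threshold are all hypothetical.

```python
from collections import defaultdict


def suspicious_ips(pages, min_hosts=1000):
    """Return IPs that serve many hosts whose pages all look identical.

    pages: iterable of (ip, host, link_count, size_bytes) tuples.
    This input shape is an assumption made for the sketch.
    """
    hosts_by_ip = defaultdict(set)
    shapes_by_ip = defaultdict(set)
    for ip, host, link_count, size_bytes in pages:
        hosts_by_ip[ip].add(host)
        # A page's "shape" is its (link count, size) pair.
        shapes_by_ip[ip].add((link_count, size_bytes))
    # Suspicious: many distinct hosts on one IP, yet only one page shape.
    return sorted(
        ip
        for ip, hosts in hosts_by_ip.items()
        if len(hosts) >= min_hosts and len(shapes_by_ip[ip]) == 1
    )


crawl = [("10.0.0.1", f"h{i}.example", 40, 2048) for i in range(5)]
crawl += [("10.0.0.2", "a.example", 3, 100), ("10.0.0.2", "b.example", 7, 900)]
print(suspicious_ips(crawl, min_hosts=3))
```

Requiring exactly one page shape is deliberately strict. A less naive version would tolerate small variations, which is presumably where the actual work lies.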

Job satisfaction

Today’s link roundup: a year of John Battelle, the last name test, Zeitgeist inaccurate?, Tim!, Google Related Links, April Fool’s on the web, and porn, porn, porn

John Battelle: a year of news in one post
“Here’s my first cut take on things which mattered in search, media, and technology over the past year. What have I missed?!”

The Last Name Test
“Turns out, for my name, there is arguably a more relevant result - a well known research and development company bearing the same name. It may not be more relevant for *me* - but web search, at present, is still a brute force application. The question is not whether it’s relevant for me alone (though it should be…), the question, at present, is whether it’s most relevant for the *most* people. Thanks to how Ask works, what I’ve learned is that for the query “Battelle,” more people find the institute relevant than my blog.”

Just How Accurate Is The Google Zeitgeist?
“Accuracy of Google Zeitgeist over at our Search Engine Watch Forums is a nice “what gives” about some oddities in the international version of the Google Zeitgeist, where it gives you a rundown on search behavior in various countries. For instance, why is viagra so hot in Singapore — and why do links from the Zeitgeist actually bring up Google South Africa? It got me to give the lists for each country a second look, and I was scratching my head as well. For example, “national lottery” is the top popular query for the United Kingdom in February 2006? Really? Somehow, I doubt it.”

More Tim Berners-Lee than you can shake a router at
“Tim Berners-Lee, inventor of the Web, gave a stirring speech at the Oxford Internet Institute that makes subjects related to Internet freedom accessible to non-geeks and geeks alike. The audio is available as an MP3.”
Tim Berners-Lee has a blog

Google Related Links
“The service puts a little box on your page where Google analyzes the content to show related searches, news and web pages to your visitors. And for helping promote Google in this way, you get …. a little box to put on your page.”
Article
Site

Every single April Fool’s day internet joke
Truly the good, the bad, and the ugly.
The short list (less complete, more funny)
The full list (more complete, less funny)
April Fool’s in the world of search

Onanism 2.0
“In a long and thoughtful article on the internet pornography phenomenon, Adrian Turpin notes that the number of porn pages on the web reached 260 million in 2003, up from 14 million in 1998, and that online porn sales hit $2.5 billion in 2005, well over double the sales of music downloads.”

Today’s link roundup: the Spider of Doom, WSJ picks Ask.com over Google, Google accused of “bio-piracy”, is Google gunning for eBay?

The Spider of Doom
“Things went pretty well for a few days after going live. But, on day six, things went not-so-well: all of the content on the website had completely vanished and all pages led to the default “please enter content” page. Whoops. Josh was called in to investigate and noticed that one particularly troublesome external IP had gone in and deleted *all* of the content on the system. The IP didn’t belong to some overseas hacker bent on destroying helpful government information. It resolved to googlebot.com, Google’s very own web crawling spider. Whoops.”

Wall Street Journal prefers Ask.com to Google
“Ask Jeeves, a largely failed search service, has been overhauled and renamed Ask.com. I’ve been testing the new Ask.com against the search champ, Google. I’ve found that in terms of relevant results and ease of use, Ask holds its own with Google, and even beats the champ on some searches. It has some very nice features Google lacks, including previews of the sites it finds, an easy way to narrow or broaden your search results, and frequent top-of-the-screen answers that lead you directly to core information.”

Google accused of “bio-piracy”
“Search giant Google has been accused of being the “biggest threat to genetic privacy” for its alleged plan to create a searchable database of genetic information. […] Biopiracy refers to the “monopolisation of genetic resources” according to the show’s organisers. It is also defined as the unauthorised use of biological resources by organisations such as corporations, universities and governments. According to the award’s Web site, Google is guilty of biopiracy because plans for a searchable database could make it easier for private genetic information to be abused.”

Is Google Gunning For EBay?
“When Google began testing a payment system in February for Google Base, its virtual catalog product, Internet pundits assumed the company was moving ahead with efforts to go head-to-head with eBay, the online marketplace giant. Not so, Google executives claimed. That’s standard operating procedure for team Google, which is constantly rolling out new products while insisting they are not meant to compete with established players. But in this case, the company may be more ingenious.”

kaboom

First comment spam today, on a post from yesterday afternoon. Damnit, I remember the good old days when spam only showed up on posts that were over a month old and only if you used a popular keyword somewhere in that post.
Hmm, betcha it’s from saying “britney noodz” the other day. AND I JUST SAID IT AGAIN! OH NOES!
Can’t sleep, spam will eat me… can’t sleep, spam will eat me…

While I’m having a few issues with WordPress in general, I must say that I’m a fan of their comment moderation setup - the first comment from an unknown person gets moderated, after which that person is “known” and can comment unmoderated. So any spammers show up as “unknown” and are immediately flagged for moderation and thus their links are never live on my site or, more importantly, transferred to the index of any search engine crawling this site. Yay.
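For the curious, the moderation rule described above boils down to a small state machine. The Python sketch below models the behavior as described in this post. It is not WordPress's actual code, and the `CommentModerator` class, its method names, and the use of an email address as the author's identity are all assumptions made for illustration.

```python
class CommentModerator:
    """Hold comments from unknown authors; publish known authors directly."""

    def __init__(self):
        self.known_authors = set()  # authors with an approved comment
        self.pending = []           # held (author, text) pairs, not visible
        self.published = []         # live (author, text) pairs

    def submit(self, author, text):
        if author in self.known_authors:
            self.published.append((author, text))
            return "published"
        # Unknown author: the comment and any links in it never go live
        # until a human approves it.
        self.pending.append((author, text))
        return "held"

    def approve(self, author):
        """Publish this author's held comments and mark the author as known."""
        self.known_authors.add(author)
        approved = [c for c in self.pending if c[0] == author]
        self.pending = [c for c in self.pending if c[0] != author]
        self.published.extend(approved)


mod = CommentModerator()
print(mod.submit("reader@example.com", "Nice roundup"))   # held
mod.approve("reader@example.com")
print(mod.submit("reader@example.com", "One more thing")) # published
print(mod.submit("spammer@example.com", "cheap pills"))   # held
```

Because held comments never reach `published`, a spammer's links are never rendered for a crawler to pick up, which is the property praised above.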

Today’s link roundup - too big for metatags in the subject line

Indexing Speed from the Major Search Engines
“I thought it would be interesting, 24 hours after we launched the new domain (0awards.org) yesterday, to see how many search engines had us indexed (along with the few thousand links & mentions that popped up). Here’s the results:”

Yahoo Search Index Update & Increased Slurp Activity Expected
“The Yahoo blog announced yesterday that there was a new index update this past weekend.”

Search share
“From a Bear Stearns report on comScore data, Google continues to gain ground in search share in the US. Given all that’s going on in search and related media, that’s impressive.”

Aaron Wall’s favorite paper about search (from 1945)
“A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.
Our ineptitude in getting at the record is largely caused by the artificiality of the systems of indexing. … Having found one item, moreover, one has to emerge from the system and re-enter on a new path.
The human mind does not work this way. It operates by association. … Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it. In minor ways he may even improve, for his records have relative permanency.”
The paper.

The internet is for porn - .xxx domain blocked
“What would the Internet be today without porn? Nothing. It wouldn’t even exist!
ARPANET was only invented so some pocket-protector-wearing, sexually frustrated, tech geeks at the DOD could figure out how to send ASCII babes like this back and forth to each other. Before that, there was just no interest in the project.”

DOJ subpoenaed 34 companies
“In its effort to uphold the Child Online Protection Act, the U.S. Department of Justice is leaving no stone unturned. In addition to America Online, MSN, and Google, the government has demanded information from at least 34 Internet service providers, search companies, and security software firms, InformationWeek learned through a Freedom of Information Act request.”

Matt Cutts answers a few questions
“Q: “What about the problem of directories and shopping comparison spam overriding real pages?”
A: Fair feedback. I heard that recently from a Googler, too. Sometimes we think of spam as strictly things like hidden text, cloaking, etc. But users think of spam as noise: things that they don’t want. If they’re trying to get information, fix a problem, read reviews, etc., then sites like that aren’t as helpful.”

Google aggregate search
“People can: publish chunks of data. People may not need to: publish websites.”
The mon(k)ey shot.

A sneak peek at the new Google UI
“…the changes are minimal, but they give some insight into Google’s plans. The biggest change is the relocation of Google’s search categories. Originally on top of the page in a horizontal layout, Google has now placed them on the left alongside visual representations of the search query’s relevance in these other categories.
Google’s intentions aren’t clear, but in the last day of using this interface, I’ve noticed myself repeatedly looking directly at the leftmost column. It’s where the results used to be, and perhaps more importantly, it’s a natural place to start scanning the page for left-to-right language types. Since the relocation surely serves a purpose, I’ll take a stab at what that purpose is.”

AdsBlackList Offers AdSense Filter URL List
“Nathan Weinberg reports on a service named AdsBlackList.com. The service provides a list of predefined MFAs (Made for AdSense sites) that generate low quality click throughs and low CPCs. I have discussed at SER how adding URLs to the AdSense filter URL list can help increase your daily income with AdSense. That is the whole premise behind a central location for publishers to go and fetch a list of URLs to block. AdsBlackList.com is just that list and it hopes to become a community effort.”

Today’s link roundup: interview with Tim Berners-Lee, cooking with Google, most expensive AdSense keywords, and 42!

An interview with Tim Berners-Lee
“Sir Tim Berners-Lee, long considered the father of the Internet, is a Distinguished Chartered member of BCS, is the director of the World Wide Web Consortium, senior researcher at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory, and professor of computer science at Southampton ECS. Here are his thoughts on software patents, US and ICANN control of the Internet, and browser security changes. The interview was conducted by Brian Runciman from the British Computer Society.”

Cooking with Google
“I couldn’t find a good recipe for Chinese Beef & Broccoli at Epicurious this weekend, so I headed to Google, where, lo and behold, they have what appears to be another new feature - query refinement via recipes.”

Highest-paying AdSense keywords
“Google has released a great tool to search for the current CPC for keywords which can be found here. I have used this tool to compile an updated list of some of the current highest paying keywords. It seems that lawyers are still paying the most out of all. It’s a bit concerning that some of the highest paying keywords are for “Wrongful Death”, and “DUI”, but oh well..”

The answer to Life, the Universe, and Everything *is* 42!
“This unexpected connection with physics has given us a glimpse of the mathematics that might, ultimately, reveal the secret of these enigmatic numbers. At first the link seemed rather tenuous. But the important role played by the number 42 has recently persuaded even the deepest skeptics that the subatomic world might hold the key to one of the greatest unsolved problems in mathematics.”

Today’s link roundup: Google as the web, Google as website host, Google and valid HTML, BigDaddy almost go

Web 3.0: Google as the Web
“The trick for Google as they consume verticals is for them to find the balance of what they can take while fostering relevant efficient business models (ie: turning legacy publishing business models into always on web friendly models). Until legacy models are reformed or displaced Google will promote some trashy stuff as a casualty on the way to their end goal. Each new market Google creates will have holes that act as a marketing mechanism to market the marketplace.”

Google Pages launched
“Google released the first public beta of its Google Pages service Wednesday, allowing users who signed up for the service in January and February to begin creating personal websites using an easy-to-use, browser-based tool. The service gives each user 100 MB of free storage space on Google’s servers.”

Valid HTML - Does Google Care?
“I decided to test whether valid HTML can actually help your rankings in Google. A lot of website owners talk about how their non-compliant websites do well in Google and how their compliant sites may not be doing as well. The implied suggestion here is that Google either simply did not care about errors in HTML, or even more extreme, that Google preferred non-compliant websites - a charge that would certainly be puzzling if it were true.”

Bigdaddy status update: almost there
“We’re down to just 1-2 data centers left in the switchover to Bigdaddy. It’s possible that the Bigdaddy switchover will be complete in the next week or two.”
