Monday, May 21, 2007

SEO and Latent Semantic Indexing Analysis

Latent Semantic Analysis

Latent semantic analysis, commonly abbreviated as LSA, is a technique used in SEO to optimize a target keyword or keyphrase so that a page returns relevant results when search engines index it. The technique is used to generate synonyms and related terms for the specific keyword a page is optimized for. Most SEOs never treat this as a necessary consideration when targeting and optimizing a certain keyword or keyphrase, although I have noticed lately that this step has become a big factor in keyword optimization. According to Wikipedia, LSA can also be described as "analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms."

A good example of latent semantic analysis is the word "java", which could mean either Java coffee or the Java programming language. But when you search for "java" on Google, the programming language comes up as the result. In the SEO world we live in, this suggests that, to the search engines, the programming sense of "java" is more "java" than the coffee. It also shows that if we want a keyword we are targeting to rank high in the three major search engines (Google, Yahoo and MSN), we ought to be using this semantic analysis technique as well. You can usually see the technique at work in most SEO contests.

Latent Semantic Indexing

Latent semantic indexing was patented back in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In applied use, latent semantic analysis is commonly referred to as LSI, or latent semantic indexing. One major use of this technique is to avoid keyword stuffing and over-optimized pages, where a target keyword appears so often in a document that it exceeds a reasonable keyword density and the search engines analyze the content as spammy.
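To make the keyword-density idea concrete, here is a minimal sketch of how density could be measured. The `keyword_density` helper, the sample page text, and the counts are all hypothetical, purely for illustration:

```python
import re

def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# A deliberately over-optimized snippet: "java" is 5 of the 11 words.
page = "Java coffee is strong. Java, java, java beans make java drinks."
print(round(keyword_density(page, "java"), 2))  # → 0.45
```

A density that high on a real page is exactly the kind of pattern LSI-aware ranking is meant to discount.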

LSA and The Term-Document matrix

Although the argument is clear, many webmasters have mistaken this technique as referring only to keywords, with documents left out. This is very wrong! In my opinion, as per the example I mentioned above, LSA works on a single term-document matrix: each row corresponds to a term and each column corresponds to a document.
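As a sketch of that matrix in action, here is a toy LSA computation (the term counts and the choice of k=2 are invented for illustration): build a small term-document matrix and keep only the top-k singular values, which gives the low-rank "concept" approximation LSA relies on.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Counts are invented purely for illustration.
terms = ["java", "script", "coffee", "bean"]
A = np.array([
    [3, 2, 2],   # "java" shows up in programming docs and a cafe doc
    [2, 3, 0],   # "script" only in the programming docs
    [0, 0, 3],   # "coffee" only in the cafe doc
    [0, 1, 2],   # "bean" in both senses (JavaBeans / coffee beans)
], dtype=float)

# LSA: truncated SVD keeps only the k strongest "concepts".
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of A
print(np.round(A_k, 1))
```

The reconstructed matrix `A_k` smooths the raw counts so that terms sharing a concept (like "java" and "script") reinforce each other even in documents where one of them never appears.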
Latent Semantic Indexing Applications (from Wikipedia)
  • Compare the documents in the concept space (data clustering, document classification).
  • Find similar documents across languages, after analyzing a base set of translated documents (cross language retrieval).
  • Find relations between terms (synonymy and polysemy).
  • Given a query of terms, translate it into the concept space, and find matching documents (information retrieval).
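The last application above (translate a query into the concept space and find matching documents) can be sketched as follows. The matrix, the query counts, and k=2 are toy assumptions for illustration only:

```python
import numpy as np

# Toy term-document matrix (rows: java, script, coffee, bean; invented counts).
A = np.array([[3, 2, 2],
              [2, 3, 0],
              [0, 0, 3],
              [0, 1, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_concepts = Vt[:k, :].T                  # each row: one document in concept space

# Fold the query "java script" into the same space: q_k = q @ U_k @ S_k^-1.
q = np.array([1.0, 1.0, 0.0, 0.0])          # term counts for the query
q_concepts = q @ U[:, :k] @ np.linalg.inv(np.diag(s[:k]))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(q_concepts, d) for d in doc_concepts]
best = int(np.argmax(scores))
# The programming documents (columns 0 and 1) outscore the cafe document (column 2).
print(scores, best)
```

Cosine similarity in the reduced concept space, rather than raw keyword matching, is what lets the query match documents that use related terms instead of the exact keyword.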

