Trending Low-Volume Google Searches with Gootrude

CmdrTaco posted more than 4 years ago | from the stuff-to-play-with dept.

michaelrash writes "The Google Trends project provides some visibility into how the popularity of search terms like 'Myspace' or '2008 Election' changes over time, and points out relevant news articles that cause jumps in search volume. This is a handy tool, but there are many search terms for which Google Trends displays no results. According to the error message Google Trends generates, such terms ('Linux Firewalls', with the quotes, for example) have insufficient search volume to graph. Fair enough. Google sets an internal threshold on search volume, and the reasons for it could range anywhere from Google Trends still being experimental to Google not wanting to reveal how it builds its massive search index for emerging search terms. Either way, I would like a way to see search term trends that Google doesn't currently make available. So I've released an open source project called 'Gootrude' to do just this. For the past year, Gootrude has collected data on a set of low-volume search terms and used Gnuplot to visualize it."
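How Gootrude actually stores and plots its data isn't shown on this page. As a rough sketch of the "collect counts, then hand them to Gnuplot" step, the Python below assumes each term's data is a list of (date, count) pairs and writes a whitespace-separated data file plus a Gnuplot script that reads it. The function names and file layout are hypothetical; only `set xdata time`, `set timefmt` and `using 1:2 with lines` are standard Gnuplot commands.

```python
from collections.abc import Iterable
from datetime import date
from pathlib import Path


def format_data(samples: Iterable[tuple[date, int]]) -> str:
    """One 'YYYY-MM-DD count' line per sample, oldest first."""
    return "".join(f"{d.isoformat()} {count}\n" for d, count in sorted(samples))


def format_script(term: str, data_name: str, png_name: str) -> str:
    """Gnuplot commands that read column 1 as a date and plot column 2 against it."""
    lines = [
        "set xdata time",
        'set timefmt "%Y-%m-%d"',
        'set format x "%m/%d"',
        "set terminal png",
        f'set output "{png_name}"',
        # Single-quoted title so phrases containing double quotes still parse.
        f"plot \"{data_name}\" using 1:2 with lines title '{term}'",
    ]
    return "\n".join(lines) + "\n"


def write_gnuplot_files(term: str, samples: Iterable[tuple[date, int]],
                        out_dir: str = ".") -> tuple[Path, Path]:
    """Write <slug>.dat and <slug>.gp for one term; returns both paths."""
    # Turn e.g. '"Linux Firewalls"' into 'Linux_Firewalls' for file names.
    slug = "".join(c if c.isalnum() else "_" for c in term).strip("_")
    out = Path(out_dir)
    data_path, script_path = out / f"{slug}.dat", out / f"{slug}.gp"
    data_path.write_text(format_data(samples))
    script_path.write_text(format_script(term, data_path.name, f"{slug}.png"))
    return data_path, script_path
```

Running `gnuplot fwsnort.gp` from the output directory would then render `fwsnort.png`. A term containing an apostrophe would need escaping before it goes into the single-quoted title.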

37 comments

Graph colors (-1, Redundant)

Anonymous Coward | more than 4 years ago | (#23810129)

Ugh, why would he choose to use a red fill with green lines for the graphs? Color blindness aside, that's horrible to try to look at.

Neat project, though. Definitely has some potential.

Re:Graph colors (0, Flamebait)

Ihmhi (1206036) | more than 4 years ago | (#23815519)

How can this be Redundant? It's the first post and it damn well makes some sense. My vision is fine and the red/green color scheme burns my retinas.

But I'm sure some fanboys of this project/Google will mod me down because the project is ignoring little things like aesthetics or making the data viewable to someone without sunglasses.

wow (2, Insightful)

Gewalt (1200451) | more than 4 years ago | (#23810141)

wow, um...congrats I think? I mean, after you get over your pat on the back, can anyone explain why this matters?

Re:wow (0, Redundant)

Gewalt (1200451) | more than 4 years ago | (#23810321)

It's not a troll. His data is not what Google Trends reports, and isn't even remotely comparable to it. In short, his results have no use at all. So really, can anyone explain why this matters?

Re:wow (0)

Anonymous Coward | more than 4 years ago | (#23813921)

I guess the work is informative in that it shows that the number of results reported by Google is rather inaccurate. My guess is that there is a deliberate part of the number coming from /dev/random as well.

Why the author of TFA fails to reach this conclusion is strange. Maybe somebody should tell him.

HB.

Impressive (0)

SplatMan_DK (1035528) | more than 4 years ago | (#23810151)

I took the time to look through the work - looks impressive for a "hobby project".

The only thing I feel is missing is more options to narrow the searches and statistics on geographical information.

Does anybody have some thoughts on how reliable this tool is? And what are the terms for using the data (read: distributing the data/results)?

- Jesper

Re:a few different results... (2, Informative)

lpq (583377) | more than 4 years ago | (#23813657)

Just did searches on all of the terms the author mentions and got a few different numbers (the author's figures are in parentheses):

1. "iptables attack visualization" -- 19 results (~35) (close)
2. "single packet authentication" -- 93 (1,300) -- off by more than an order of magnitude
3. "linux firewalls attack detection" -- 9,290
3a. "Linux Firewalls Attack Detection" -- 9,240 (~9,000) (close)
4. cipherdyne -- 85,200 (~70,000) -- off a bit
4a. Cipherdyne -- 84,500 (~70,000)
5. gpgdir (same)
6. fwsnort (same)
-------
Note: capitalization made no difference on 1, 2 and 5, but for terms 3 and 4 it made a slight difference. Anyone know why? I thought capitalization was supposed to be ignored.

Most were close, but cipherdyne differed by roughly 20%, and the worst was "single packet authentication", which was off by more than 10x! I wonder what's up with that.
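The comparison boils down to one ratio per term. A short Python sketch using the figures from the comment, taking each parenthesized value as the author's approximate count (so the percentages are only rough) and using the capitalized variant for term 3:

```python
# Figures copied from the comment: (term, commenter's count, author's approx. count).
COUNTS = [
    ("iptables attack visualization", 19, 35),
    ("single packet authentication", 93, 1_300),
    ("Linux Firewalls Attack Detection", 9_240, 9_000),
    ("cipherdyne", 85_200, 70_000),
]


def compare(mine: int, theirs: int) -> tuple[float, float]:
    """Return (larger/smaller ratio, percent difference relative to `theirs`)."""
    ratio = max(mine, theirs) / min(mine, theirs)
    pct = (mine - theirs) / theirs * 100
    return ratio, pct


for term, mine, theirs in COUNTS:
    ratio, pct = compare(mine, theirs)
    print(f"{term:35s} ratio {ratio:5.2f}x  diff {pct:+6.1f}%")
```

By this measure cipherdyne's counts differ by a factor of about 1.2 and "single packet authentication" by a factor of about 14.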

Interesting curiosities...

Is it only me.... (4, Insightful)

vidarh (309115) | more than 4 years ago | (#23810165)

... or does the author of this tool seemingly not realize that Google Trends reports the volume of searches, while what he's tracking is the number of documents indexed for a search term, and that there's no basis for assuming the two are correlated in a meaningful way?

Re:Is it only me.... (5, Interesting)

Gewalt (1200451) | more than 4 years ago | (#23810235)

I find it highly unlikely that someone who can make the page in question would not be smart enough to also understand what it is that google/trend is really doing, and as such, I choose to believe instead that the author is being intentionally deceptive.

Re:Is it only me.... (1)

Idimmu Xul (204345) | more than 4 years ago | (#23810581)

The perspective he seems to be taking is not so much 'what users search for' but more 'what users post about or publish' with a view to studying the correlation of a large site publishing something and then the number of other websites or pages picking it up and running with it.

I'm pretty sure he understands what he's doing, the article summary is just a bit twisted.


Re:Is it only me.... (1)

kestasjk (933987) | more than 4 years ago | (#23811855)

I find it highly unlikely that someone who can make the page in question would not be smart enough to also understand what it is that google/trend is really doing, and as such, I choose to believe instead that the author is being intentionally deceptive.
It's a trap!

Re:Is it only me.... (3, Insightful)

aleph42 (1082389) | more than 4 years ago | (#23810403)

Agreed, the summary is misleading, as is the comparison (from TFA) to Google Trends.

This aside, the interest of Gootrude is that it's not provided by Google, and so it's part of the many efforts to reverse engineer how Google comes up with its numbers.

Specifically, it appears from TFA that the "number of results" stated by Google is a wild guess for low numbers (1,000-10,000), with very sharp variations which hint at an iterative process.

So as I understand it, it's not a tool for you and me, but rather for Google specialists.

But wait... that's not it at all (0, Redundant)

Gewalt (1200451) | more than 4 years ago | (#23810179)

Google Trends plots how popular a search phrase is. This mashup of Google results is not that at all. It is nothing more than a count of pages in Google's database; it has nothing to do with how often a phrase is searched for.

Different data (2, Informative)

UnHolier than ever (803328) | more than 4 years ago | (#23810251)

Google Trends plots the frequency of queries, i.e. the number of times information is asked about a subject. Gootrude plots the number of pages found, or the quantity of information google can retrieve on this subject. These are completely different.

Re:Different data (1)

alnicodon (685283) | more than 4 years ago | (#23813957)

Many thanks for making this clear: this is also what I had fathomed from the very clear summary, but I wasn't too sure.

Well.. we might actually be the two wrong ones :)

Al.

Singular works okay. (1, Informative)

palegray.net (1195047) | more than 4 years ago | (#23810265)

Such terms (such as "Linux Firewalls", with the quotes) have insufficient search volumes to display graphs according to the error message that Google Trends generates.
Try Linux Firewall [google.com] in quotes as the search term for some results.

Spore (0, Offtopic)

Chemisor (97276) | more than 4 years ago | (#23810277)

Have you noticed how "spore demo" is the 77th top search? On the WHOLE INTERNET! :)

Re:Spore (0)

Anonymous Coward | more than 4 years ago | (#23810819)

have you noticed that "hot asian luv" is 99th? with 99% of the searches coming out of McAllen, TX?

Not at all the same! (0, Redundant)

molo (94384) | more than 4 years ago | (#23810279)

Google Trends measures what people are searching for, while Gootrude measures how many results are in the Google database for a given term. These are not even remotely the same thing.

-molo

Re:Not at all the same! (0)

Anonymous Coward | more than 4 years ago | (#23811153)

Right. Not only is it not the same, the data is clearly teaching us very little except the quirks of Google. For example, he shows us a graph of number of search results for a particular query, going up and down in a periodic fashion. This makes absolutely no sense as real data, because on today's Internet, pages mostly get added and very few are deleted. The one example with the huge spike makes even less sense (and he admits to it) - what, 50,000 pages got created on some topic and then got deleted? No - this just probably shows us Google quirks: the already famous "Google Dance" (Google switching between indices), crawler bugs, and so on. Maybe it's an interesting topic to discuss, but it has nothing to do with Google Trends, and doesn't tell me much about the trend of a given word (well, except being able to tell us when a hype on a given word *started*).

Not allowed by google (3, Informative)

swarsron (612788) | more than 4 years ago | (#23810545)

Besides not being the same as Google Trends, this tool violates Google's TOS: automated querying of their services without prior permission is forbidden. But since it probably won't put any noticeable load on their network, they most likely won't care.

Re:Not allowed by google (0, Offtopic)

vrmlguy (120854) | more than 4 years ago | (#23811621)

Since I'm always forgetting to log my business driving, I've got a program that uses Google maps to figure out the driving distance between various pairs of points. It uses two files, one consisting of about 250 lines like this:
    home, office, client-a, restaurant-x, client-b, home
    home, client-b, restaurant-y, client-b, home
and the other listing street addresses for everyone. I'm sure it's a big violation of Google's ToS, but it tries to play fair: it caches the distances that it discovers (e.g. so that the distance from client-b to home is only requested once), it waits one-to-two minutes between queries, and I only use it once a year at tax time when I'm calculating my business expenses.
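The "play fair" pattern described above (cache each leg, wait between real lookups) can be sketched in Python. `fetch` is a hypothetical placeholder for whatever distance service is queried; no real Google Maps call is shown. The 60-120 second delay mirrors the comment, and legs are cached as ordered pairs because driving distance from A to B need not equal B to A.

```python
import random
import time
from typing import Callable


class PoliteDistanceCache:
    """Cache leg distances and pause between uncached lookups (illustrative only)."""

    def __init__(self, fetch: Callable[[str, str], float],
                 min_delay: float = 60.0, max_delay: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep):
        self._fetch = fetch              # hypothetical lookup: (origin, dest) -> miles
        self._min, self._max = min_delay, max_delay
        self._sleep = sleep              # injectable so tests need not really wait
        self._cache: dict[tuple[str, str], float] = {}
        self.lookups = 0                 # number of uncached fetches performed

    def distance(self, origin: str, dest: str) -> float:
        key = (origin, dest)
        if key not in self._cache:
            if self.lookups:
                # Wait one to two minutes before every lookup after the first.
                self._sleep(random.uniform(self._min, self._max))
            self._cache[key] = self._fetch(origin, dest)
            self.lookups += 1
        return self._cache[key]

    def route_length(self, line: str) -> float:
        """Sum the legs of a line such as 'home, client-b, restaurant-y, client-b, home'."""
        stops = [s.strip() for s in line.split(",") if s.strip()]
        return sum(self.distance(a, b) for a, b in zip(stops, stops[1:]))
```

Because the cache persists across route lines, a leg such as client-b to home is fetched once no matter how many trips include it.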

Re:Not allowed by google (1)

icyslush (1162497) | more than 4 years ago | (#23812811)

Google has a relatively simple API you can apply for that allows a fixed number of automated queries of their system. It doesn't actually give you new functionality, but it does make automated queries of their databases "authorized". Without the API license key, you run the risk of getting noticed by them and ban-hammered if they think you're just a bot scraping their data, something they do NOT like. I think this article just got in because it had both Google and Open Source as subjects. If they have figured out a clean way to find SEARCH volume (which is hard) as opposed to RESULTS volume (which is stupidly easy), get back to me. :)

Re:Not allowed by google (2, Informative)

swarsron (612788) | more than 4 years ago | (#23813595)

Google doesn't give out any more keys for this API; only old keys continue to work. So if you don't already have a key, you're out of luck.

Time for me... (1)

jalet (36114) | more than 4 years ago | (#23811103)

to do something similar with my parody of google [librelogiciel.com] where search terms can be looked at in real time (empty or spammy search terms are replaced with fake words on display, but not in the history).

Privacy anyone? (0)

Anonymous Coward | more than 4 years ago | (#23811255)

So, nobody really likes the amount of data that Google collects on everybody, and there's a constant trickle of scandal about "anonymized" search results not being anonymous enough. I myself have stopped using Google as much as possible due to these shenanigans....

But then stuff like this gets written AND slashdotted. What's the deal? I'd much rather know NOTHING about Google's web search trends than inch even one micrometer closer to living in a panopticon.

In a funny coincidence, my CAPTCHA was "lynched", so, flame on!

Suggestions for improvements (0)

vrmlguy (120854) | more than 4 years ago | (#23811281)

Everyone has already noted that this only tracks hits, not searches. I'd like to suggest a few code improvements.

At a high level, use RRD (http://search.cpan.org/~nicolaw/RRD-Simple-1.43/lib/RRD/Simple.pm [cpan.org]) for the underlying database. RRD is used by MRTG to track time-varying data over multiple time scales, keeping details for recent data and summaries for historical data. RRD also comes with its own plotting module, although you could keep using Gnuplot if you wish.
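To illustrate the round-robin idea only (this is not the RRD::Simple API, which isn't reproduced here): a fixed-size archive of recent raw samples, plus a coarser fixed-size archive fed by averaging every N raw samples, so storage never grows. The class name, sizes, and use of averaging are assumptions; real RRD files also offer MIN, MAX and LAST consolidation and time-stepped slots.

```python
from collections import deque


class RoundRobinArchive:
    """Two bounded archives: recent raw samples and averaged older ones."""

    def __init__(self, raw_slots: int, summary_slots: int, per_summary: int):
        self.raw = deque(maxlen=raw_slots)           # oldest raw sample falls off
        self.summary = deque(maxlen=summary_slots)   # oldest average falls off
        self._per_summary = per_summary
        self._pending: list[float] = []              # raw values not yet consolidated

    def add(self, value: float) -> None:
        self.raw.append(value)
        self._pending.append(value)
        if len(self._pending) == self._per_summary:
            # Consolidate with an average (RRDtool's AVERAGE consolidation function).
            self.summary.append(sum(self._pending) / self._per_summary)
            self._pending.clear()
```

With one sample per day, `raw_slots=90, per_summary=7, summary_slots=104` would keep about three months of daily counts and two years of weekly averages.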

In the code itself, there are places where there are "elsif" clauses without an "else" clause. One seems alright, but should have a null "else" to document that fact. The other, however, is testing keywords from the config file, and should flag any that are unrecognized.
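The config-keyword point, sketched in Python for illustration. Gootrude's actual parser and keyword names aren't shown in the thread, so `SEARCH_TERM` and `DATA_DIR` below are made up; the point is that the final `else` turns a typo in the config file into an error instead of silently ignoring it.

```python
def parse_config(lines):
    """Parse 'KEYWORD value' lines; reject keywords the program doesn't know."""
    config = {"terms": [], "data_dir": None}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank lines and comments
        parts = line.split(None, 1)           # split on the first run of whitespace
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "SEARCH_TERM":          # hypothetical keyword
            config["terms"].append(value)
        elif keyword == "DATA_DIR":           # hypothetical keyword
            config["data_dir"] = value
        else:
            # The missing "else": report the bad keyword instead of ignoring it.
            raise ValueError(f"line {lineno}: unrecognized keyword {keyword!r}")
    return config
```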

Finally (and this is probably nit-picking), instead of this:
    return unless $expr;
    do_something();
    return;
I'd use this:
    if ($expr) {
        do_something();
    }
    return;

Over 2 hours (1)

TheCycoONE (913189) | more than 4 years ago | (#23812251)

This article has been on /. for almost 3 hours and "Linux Firewalls" still isn't a significant enough search query for Google Trends? Well THAT is surprising.

OT : Moving average and graphs (1)

4D6963 (933028) | more than 4 years ago | (#23813671)

Every time I see graphs with a moving average, be it in TFA or some stock market graph, it makes me cringe. OK, the moving average isn't the best filter out there; there's a whole range of finite impulse response filters that have a more desirable frequency response than a moving average (which is convolution with a rectangle, so its frequency response is essentially a sinc function, which means a shitload of ripples). But why on Earth don't they compensate for the delay induced by the convolution?

Why do they let it lag by half the rectangle's width when they could just compensate for it so that the curve wouldn't look offset compared to the original data? And most mind-bogglingly, why on Earth do the same sort of people add another curve that is the difference between the original data and the delayed moving average?? Why oh why? It's senseless: if the moving average were compensated, you could call that difference a high-pass filter and directly look at the high-frequency components of the original data, without adding a parasitic low-frequency component that doesn't correspond to anything desirable.

Someone enlighten me please.
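The complaint can be made concrete. An N-point trailing average has linear phase with a constant delay of (N-1)/2 samples, so for odd N, centering the window removes the offset, and subtracting the centered average from the data leaves a high-pass residual. A plain-Python sketch with illustrative function names; edge points without a full window are left as None rather than padded.

```python
from typing import Optional


def trailing_ma(x: list[float], n: int) -> list[Optional[float]]:
    """Mean of x[i-n+1..i]: causal, so it lags the data by (n-1)/2 samples."""
    out: list[Optional[float]] = [None] * len(x)
    for i in range(n - 1, len(x)):
        out[i] = sum(x[i - n + 1:i + 1]) / n
    return out


def centered_ma(x: list[float], n: int) -> list[Optional[float]]:
    """Mean of x[i-h..i+h], h = (n-1)//2: the trailing average shifted back by h."""
    if n % 2 == 0:
        raise ValueError("use an odd window so the shift is a whole number of samples")
    h = (n - 1) // 2
    out: list[Optional[float]] = [None] * len(x)
    for i in range(h, len(x) - h):
        out[i] = sum(x[i - h:i + h + 1]) / n
    return out


def high_pass(x: list[float], n: int) -> list[Optional[float]]:
    """Data minus its centered moving average: the fast-varying residual."""
    return [None if m is None else v - m for v, m in zip(x, centered_ma(x, n))]
```

On a straight ramp the centered average reproduces the data exactly and the residual is zero, while the trailing version sits one sample behind for n=3. Centering needs samples from after each point, so it only works on recorded data, not a live feed; that is one plausible reason charts tend to show the trailing version.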

Re:OT : Moving average and graphs (0)

Anonymous Coward | more than 4 years ago | (#23814837)

I was told there would be no math.

Privacy? (2, Insightful)

Temporal (96070) | more than 4 years ago | (#23816627)

Google sets an internal threshold on search volume, and this threshold could be set for reasons that range anywhere from Google Trends is still experimental to Google not wanting to provide data on how it builds its massive search index for emerging search terms.
Or maybe for privacy reasons? Some search queries implicitly reveal the identity of the person making them. Such queries are naturally low-volume, so refusing to show low-volume queries is an effective way to protect the privacy of the searchers.
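The suggested mechanism is simple to sketch: publish a query's volume only if it clears a minimum, and drop it otherwise, so rare (possibly identifying) queries never appear. The cutoff below is an arbitrary assumption; the thread doesn't say what threshold Google Trends uses.

```python
MIN_VOLUME = 50  # hypothetical cutoff, not Google's actual threshold


def publishable_volumes(volumes: dict[str, int], minimum: int = MIN_VOLUME) -> dict[str, int]:
    """Keep only queries whose volume clears the threshold.

    Low-volume queries are dropped outright rather than shown as small numbers,
    since a rare query may be tied to a single, identifiable searcher.
    """
    return {query: n for query, n in volumes.items() if n >= minimum}
```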

michaelrash (1)

michaelrash (715609) | more than 5 years ago | (#23818859)

I have updated my original post to address some of the comments made here on Slashdot. Peer review is always good, and thank you all for the insights.