SiliconBeat’s Michael Bazeley featured Glenbrook Networks co-founders Julia and Edward Komissarchik, and the Glendor showcase, in a great piece about “Deep Web” search and information extraction. Michael summarized it quite well:
Komissarchik and her father, Edward Komissarchik, say they have figured out how to analyze the forms on Web pages and understand the type of information the sites are looking for. Then, Glenbrook's Web crawlers use artificial intelligence to walk themselves through sometimes complex Web forms, answering questions, such as the location of their desired job, in the same way a human would.
Julia Komissarchik likens the process to cracking a safe.
“The way to think of it is, you case the joint,” she said. “The scout goes through the form and tries a few options to see what the results will be. Then you have a mastermind or safecracker who gets all this information from the scout and devises a method to open the forms.”
Finally, she said, the “harvesters” spring into action to gather up all the information.
Just to clarify: the “safe” analogy does not imply that the company is breaking passwords or accessing private information. It refers to getting a machine to access, generically, information stored behind interactive forms.
I posted about the launch of the Glendor showcase a couple of months ago. It features the first (and, I guess, still the only) mashup positioning job listings on Google Maps, with a second one coming from our friends at SimplyHired.
A longer post about the “web trawling” concept implemented by the company is on its way.
Thanks to all of you who have emailed us since this morning. We are grateful for your reports of issues with different browser/OS combinations and are working on fixing them. Sorry, we are not hiring at this time. And yes, we can build large-scale custom search and data aggregation solutions (feel free to send me a note at jeff [dot] clavier [at] gmail [dot] com).
And we are delighted that you like this showcase.
Disclosure: I am a shareholder in and consultant to Glenbrook Networks, and I am the editor of the Glendor.com blog (my first WordPress blog; WordPress really rocks).
Update: Gary Price, who was also quoted by Michael, posted an analysis on Search Engine Watch that I want to comment on briefly. First, Glenbrook’s technology does not (and cannot) extract information directly from corporate databases; it goes through the public, manual interface that companies have set up to access that data. The innovation lies in a suite of algorithms that automatically figure out the parameters needed to extract that data, without requiring any templating of the targeted sites.
On server load: queries are paced sensibly, based on response times and similar factors, to avoid overloading servers. Data can be refreshed daily, possibly several times a day if the dataset is small enough. However, extracting and caching data that changes too frequently would not be appropriate.
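For readers curious what pacing queries “based on response times” can look like in general, here is a minimal sketch. It is a generic adaptive-throttling illustration, not Glenbrook’s implementation; the class name `AdaptiveThrottle`, the `crawl` helper, and the `min_delay`, `max_delay`, and `factor` parameters are all hypothetical.

```python
import time


class AdaptiveThrottle:
    """Hypothetical polite-crawling throttle (not Glenbrook's code):
    the slower the server responds, the longer we wait before the next query."""

    def __init__(self, min_delay=1.0, max_delay=60.0, factor=4.0):
        self.min_delay = min_delay  # never query faster than this (seconds)
        self.max_delay = max_delay  # never wait longer than this (seconds)
        self.factor = factor        # pause = factor * last response time
        self.delay = min_delay

    def record(self, response_time):
        # Scale the pause with the observed response time, clamped to [min_delay, max_delay].
        self.delay = min(self.max_delay, max(self.min_delay, self.factor * response_time))
        return self.delay

    def wait(self):
        time.sleep(self.delay)


def crawl(urls, fetch, throttle):
    """Fetch each URL in turn, timing each request and pausing accordingly."""
    results = []
    for url in urls:
        start = time.monotonic()
        results.append(fetch(url))
        throttle.record(time.monotonic() - start)
        throttle.wait()
    return results
```

With the defaults, a 0.2 s response is clamped up to the 1 s minimum pause, a 2.5 s response yields a 10 s pause, and a 30 s response is capped at 60 s. The design choice is simply that a slow server is treated as a busy server, so the crawler backs off.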
On usability and searchability of the data: this is actually where aggregating structured data delivers its value, by making it possible to search on a position and a location across a wide range of sources (in this case, job listings across companies).
I would be delighted to show you the technology at your convenience, Gary…
Tag: Glendor
Jeff,
Congratulations! I saw this earlier today.
When will TechCrunch get a green light on a profile? :-)
Posted by: Michael Arrington | August 17, 2005 at 12:56 PM
congrats on both the company & the story jeff.
love the 'case the joint' soundbite, altho perhaps an analogy involving russians and theft isn't the most desirable online startup positioning... still, no bad PR i guess? ;)
(ps - not the only jobs/mapping mashup in town anymore, tho i do give you credit for getting out there first)
- dave mcclure
www.simplyhired.com
Posted by: DaveMc500Hats | August 18, 2005 at 01:49 AM