Matt and I attended a small demo at Powerset's offices last Thursday night. We drank their beer, ate their hors d'oeuvres and learned what the 70-person company was all about.
Powerset has licensed their natural language technology from Xerox. They are not building this thing from scratch. Powerset's secret sauce, aside from the NLP, is their indexing strategy. It is also important to point out up front that their solution will be NLP results with a keyword-based index to fall back on.
Powerset has taken an approach of radical transparency to combat their hype problem. Their goal for the night was to be as open as humanly possible and they succeeded. We learned that their processing time for a single sentence with the PARC technology is down to around one second, which is impressive. Powerset is building its own datacenters in addition to utilizing computing power from Amazon's EC2 web service. At the moment they have around 1000 cores, and since they are bound by CPU time, their process was seeing a 30% boost from using the latest Intel processors on their servers. COO Steve Newcomb stated that they were buying the latest and greatest in CPU technology to deal with their computational needs, and that their standing will only improve as processor technology advances. This is an interesting point because it directly opposes Google's theory on scaling, where piece of shit commodity machines rule the world.
We got a chance to talk in depth with a group of Powersetters about their technology after the demo. They are using the Hadoop project for most of their distributed computing. Hadoop is an open source implementation of the Map/Reduce paradigm that is prevalent throughout Google's infrastructure. Powerset is a heavy user of and major contributor to the project. They are using Hadoop Map/Reduce jobs to do their indexing across the grid. The startup has also been a heavy user of Amazon's EC2 service, running 800-900 instances at one point. Newcomb cautioned us about extreme EC2 use and said that while EC2 is a great utility, at a certain point there are diminishing returns. The best solution for Powerset appears to be datacenter-first deployment, with EC2 as a fallback in a pinch.
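For readers who haven't seen the Map/Reduce paradigm, here is a minimal sketch of how an indexing job can be split into map and reduce steps. This is not Powerset's code and it does not use Hadoop's API; it is plain single-process Python, and every function name in it (map_phase, shuffle, reduce_phase, build_index) is made up for illustration. It builds a simple keyword index, which is far cruder than whatever Powerset's NLP pipeline actually emits.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(doc_id, text):
    """Map step: emit a (term, doc_id) pair for each distinct term in one document."""
    for term in sorted(set(text.lower().split())):
        yield term, doc_id

def shuffle(pairs):
    """Group mapper output by term, standing in for what a framework does between phases."""
    ordered = sorted(pairs, key=itemgetter(0))
    for term, group in groupby(ordered, key=itemgetter(0)):
        yield term, [doc_id for _, doc_id in group]

def reduce_phase(term, doc_ids):
    """Reduce step: collapse one term's document ids into a sorted posting list."""
    return term, sorted(set(doc_ids))

def build_index(docs):
    """Run map, shuffle and reduce in sequence over a dict of {doc_id: text}."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs))

# Toy input; the documents are invented for the example.
docs = {1: "natural language search", 2: "keyword search index"}
index = build_index(docs)
```

On a real grid the map calls would run in parallel on separate machines, one batch of documents each, and the framework would handle the grouping. That is why the map and reduce steps are written to depend only on their own inputs.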
A good part of the evening was spent discussing Powerset's social network: PowerLabs. Yes, they have a social network. Its role is to allow users to dictate the direction of Powerset through a conglomeration of Digg- and Facebook-like features. PowerLabs allows users to "digg" product ideas and earn a reputation for good contributions to Powerset. A reputation is quantified by a point score based on contributed feedback, including votes on search result comparisons between Powerset and Google. Powerset is trying to harness the "wisdom of the crowds" through PowerLabs.
Contrary to popular Web 2.0 belief, we don't buy into the "wisdom of the crowds" theory. PowerLabs seems to be a stretch on Powerset's part to achieve uber-transparency. Not only are they letting everyone in on the process, but you can decide where the company goes! It's important to gather feedback and focus on quality, but I feel that they are relying on PowerLabs to deliver more than it should.
Have an opinion about the software you're building. Don't throw your hands up in the air and say that PowerLabs will direct where the software goes. At the end of the day, it's all about the product Powerset produces. If their search is a significant improvement over existing solutions and saves people time, the product will get noticed. It doesn't need a social network to drum up support.
Powerset's demo showed the value of natural language search over statistical approaches. Their index size at the moment is minuscule compared to where it needs to be. It should be interesting to see how well they deal with billions of documents, spam and internationalization. Powerset is shooting for a full release in September.
Powerset and Meebo have both raised $12 million to fund their operation. Natural Language Search vs. AJAX+Gaim chatting. Which investment seems more justified to you?