The time-consuming tedium of file search is San Francisco-based startup Peruse’s jumping-off point. It’s aiming to simplify locating files by changing how people search so they don’t have to remember exactly where to look or exactly what the file was called.
Peruse’s fix for this age-old tech problem is to use a natural-language question-and-answer interface, rather than a narrow keyword search, to allow for contextually relevant info to simplify and speed up your search. It’s launching its SaaS enterprise search product today onstage at TechCrunch Disrupt NY.
Founder Luke Gotszling describes this as “more of a human approach” to search. So you could, for instance, ask Peruse to locate “PowerPoints I edited last week” or “PDFs that Matthew Panzarino sent me” — saving a whole lot of guesswork about what the file was named and likely also reducing the amount of manual combing of keyword file search results you have to do.
Peruse plugs into existing cloud storage services, rather than requiring you to install (yet) another piece of software. You can type your search into Peruse’s interface, or speak it if you prefer, whichever is quicker and easier.
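To make the idea concrete, here is a minimal sketch of how a plain-English request such as the two examples above could be turned into structured filters. It is not Peruse's method, which the company has not disclosed. The function name `parse_query`, the `FILE_TYPES` table, and the reading of "last week" as the previous seven days are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical mapping from everyday words to file extensions.
FILE_TYPES = {
    "powerpoint": (".ppt", ".pptx"),
    "pdf": (".pdf",),
    "spreadsheet": (".xls", ".xlsx", ".csv"),
}


def parse_query(query, today):
    """Turn a plain-English request into a dict of structured filters."""
    text = query.lower()
    filters = {}

    for word, extensions in FILE_TYPES.items():
        # Accept the plural form too ("PDFs", "PowerPoints").
        if re.search(r"\b" + word + r"s?\b", text):
            filters["extensions"] = extensions
            break

    if "last week" in text:
        # Assumption: "last week" means the seven days ending today.
        filters["modified_after"] = today - timedelta(days=7)

    # Assumption: senders are phrased as "that <name> sent me".
    sender = re.search(r"that (.+?) sent me", query, flags=re.IGNORECASE)
    if sender:
        filters["sender"] = sender.group(1)

    return filters
```

Calling `parse_query("PDFs that Matthew Panzarino sent me", date.today())` would yield a PDF extension filter plus a sender filter, which a storage service's own listing could then be narrowed by. A real system would need far more than three hand-written patterns, which is presumably where the NLP work comes in.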
“File search is very unsatisfying,” Gotszling tells TechCrunch. “I’ll type in a few keywords and then whatever those keywords happen to match I will just get a list of files. That’s basically where things were in 1995 and that’s also basically where things are nowadays.
“For me that’s really unsatisfying. And we can go beyond that: as people that’s not how we think about information. I don’t think about these 20 keywords that will match this file exactly. We think about things like ‘oh this PowerPoint I edited last week.’”
“Anyone who has worked in a medium-sized company or above typically finds out how difficult it is to locate information,” he adds. “People are used to spending hours a month just looking for information.”
Peruse’s natural language file search works for business documents of any file type, although the NLP tech currently works only for English. The service is also initially limited to documents stored in either Box or Dropbox cloud storage repositories, but the company intends to integrate with more such services.
That’s the first plank of Peruse. The second core feature it has in the works takes these NL search capabilities further — or rather, deeper — by allowing users to search for specific facts from inside a document, rather than having the tech simply point to the file itself.
The latter “deep insights” feature is not yet launched, but Peruse is now offering a waiting list for interested sign-ups. Initially the feature will work only with spreadsheets. Searches run in real time, but Peruse indexes the spreadsheets themselves in the background, which is why it’s operating a waiting list.
Who is this for? Gotszling gives the example of a benefits company that deals with a lot of small businesses, and needs to find the annual hourly labor cost from a restaurant’s spreadsheet. Instead of locating the file, loading it and using a control-F approach to try to locate the “annual hourly labor cost” figure, Peruse’s natural language tech parses the indexed spreadsheet data itself to return an answer to that specific query without the user having to dive into the file at all.
So, in other words, this is a machine that reads spreadsheets so you don’t have to. (The system does also specify where in the spreadsheet the data is located, so a human can go in and check.)
“You could say something like ‘what were the cocktail sales at the bar on Tuesday,’” says Gotszling, detailing the kind of complex spreadsheet search query Peruse can tackle. “No one is going to do this with control F because they’re not going to expect that to be a label somewhere. In fact if you were looking for this information and using the traditional search methodology it may be difficult for you to figure out do I search for ‘cocktail sales,’ do I search for ‘bar,’ do I search for ‘Tuesday,’ do I search for ‘sales’?
“In this case it’s so multifaceted that it may be hard for you to even know where to begin. And then if you pick something that’s too generic, then the next thing you know is you have 100 results and you’re paging through a 22-page spreadsheet for this information.”
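One way to picture the difference from a control-F lookup is to score every cell by how many words in the question appear in its row label and column header, then report the best match along with its location. The sketch below does only that. It is an illustrative guess, not Peruse's implementation, and the `answer` function, the layout rules and the figures in `GRID` are all invented for the example.

```python
import re

# Made-up sample data: row 0 holds column headers, column 0 holds row labels.
GRID = [
    ["Item", "Monday", "Tuesday"],
    ["Bar cocktail sales", 410, 365],
    ["Bar beer sales", 220, 198],
    ["Kitchen food sales", 900, 870],
]


def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(question, grid):
    """Return a list of (value, cell reference) for the best-scoring cells."""
    asked = tokens(question)
    best_score, best = 0, []
    for r in range(1, len(grid)):
        for c in range(1, len(grid[0])):
            # A cell is described by its row label plus its column header.
            labels = tokens(grid[r][0]) | tokens(grid[0][c])
            score = len(asked & labels)
            if score == 0:
                continue
            # Spreadsheet-style reference; only valid for columns A to Z.
            cell = (grid[r][c], f"{chr(ord('A') + c)}{r + 1}")
            if score > best_score:
                best_score, best = score, [cell]
            elif score == best_score:
                best.append(cell)  # Ties are all returned for a human to check.
    return best


print(answer("what were the cocktail sales at the bar on Tuesday", GRID))
# prints [(365, 'C2')]
```

A vaguer question such as "bar sales on Monday" ties between two rows and returns both candidates, and a sheet with no text labels at all gives this approach nothing to match against.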
There are obviously limits to this spreadsheet reader software. A spreadsheet comprising only numbers, without any text labels sign-posting the data, isn’t going to be intelligently parseable to anyone — not even a machine. And spreadsheets with limited labeling may also trouble it.
Data represented in two different formats within a spreadsheet can also cause confusion. Gotszling notes the system can serve up multiple possible answers to a query, requiring the human to step back in and apply their own intelligence to figure out that one answer is actually a proportion and the other is an absolute value, say.
Above all, the data you want to locate also has to be contained within a spreadsheet in the first place, although Gotszling says Peruse is considering expanding this feature to support regular documents and PDFs too. Other areas it’s also contemplating include team chat logs, emails and calendar entries. Nothing is confirmed as yet though.
Also on the slate for Peruse’s future: an Apple Watch app. “I think something like this can be particularly well suited to that form factor,” he says. “Especially when you’re not looking for a whole file but when you’re looking for a piece of information inside of that file… that can be easily presented in a sentence then I think this would work really well.”
“Right now it’s just going to work with spreadsheets,” he adds. “We built software that tries to understand this type of content in the same way that a person would understand it… To try to imitate the visual understanding because one thing we don’t want people to have to do is to reformat stuff so that it works for this.
“Perhaps ironically the more machine readable it is, the less understandable it is by our system because we built it so that it parses this information in the way that a human would.”
Gotszling says Peruse hasn’t been building the software with a particular industry in mind but argues it’s well suited to analysis-focused businesses, such as hedge funds and HR and benefits companies, i.e. those which are dealing with large amounts of information and often need to pick out a particular piece of data from large haystacks of info. (The NSA springs to mind on that — but clearly has its own specialized data-mining software.)
“I’ve had one discussion with a person who said that just scrolling the spreadsheet is something that takes them minutes of time. So generally people that deal with huge volumes of information, it has resonated more with them,” says Gotszling.
In terms of the competitive landscape beyond plain old keyword search, he points to Microsoft’s Power BI product, specifically for “spreadsheets intelligence,” although he also suggests that product is more focused on visualization than data retrieval. He also cites IBM’s Watson as another potential rival, but adds: “I haven’t really seen them tackle business documents with that technology but I could certainly see that being competitive.”
Gotszling was employee No. 1 at About.me, where he worked on infrastructure, including its structured search feature — so he’s evidently feeding some of that expertise into his new venture. He’s been bootstrapping and building the Peruse prototype since late October 2014, ramping up on the hiring front with core team members in the past month.
Questions & Answers
Maqubela: I’m curious if you thought about going into mail? One of the things I find interesting here is how commonly I encounter this use-case.
Gotszling: Actually mail is our most requested integration next, and then Google Drive after that.
Maqubela: How are you going to acquire customers?
Gotszling: We’ll be available on the official app repositories for Box and Dropbox. So naturally people who are looking for this and who use Box and Dropbox will be able to find us. And of course we can target people who are users of these services through advertisers, other methods or build an internal sales team if needed.
Turck: There’s a long history around search companies. From individual search, which used to be desktop search, from Google, that they discontinued… there’s all those companies of the ’90s and 2000s, all the way to enterprise search, the Endecas… and in the middle you had a bunch of companies trying to do cloud search. How do you position? It seems to be in between those very large enterprise companies and individual search. Who are your early customers?… Is it a tool for me, is it a tool for an SMB? Who uses it?
Gotszling: I guess the great thing about this is you don’t need to be a hedge fund analyst to use it. Everyone has questions about information in their files. SMBs are obviously a good starting point because it’s easier for them to be able to use this but there’s nothing stopping a large company from adopting it internally. Obviously the sales process for that is a lot different.
Vardi: Why only cloud? Why not servers, desktops etc?
Gotszling: We feel it’s a good starting point. We can potentially in the future index your desktop as well but we feel like with millions of users, and over 190,000 businesses on those services, it’s a very good starting point for us. But obviously we can expand in the future.
Vardi: Tell us a little bit about yourself.
Gotszling: I was the first employee at About.me. I actually did the search product for About.me. I’ve done some research on neural networks. I have an engineering background. And I’ve built a team of three super talented people. Including people that have won international algorithm competitions for content extraction. An NLP expert. Ex-Google engineer.
Vardi: From the heads of the 20 most revered Internet companies who is your personal model?
Gotszling: I would probably say Google — the founders.
Vardi: You want to be the next Google?
Gotszling: Potentially. Obviously it would be great to be the next Google but I’m trying to think about the short term.
Tantoco: Can you talk a bit more about the natural language processing piece of it, the AI part of it. That’s something that a lot of companies are now starting to talk about.
Gotszling: We’re just getting started with NLP. Like you said there are other companies doing it. I think that’s also one of the things that differentiates us. A lot of that technology we built in house and so it’s not available off the shelf. You can’t just grab it from somewhere, there’s no open source product to do that kind of stuff. I think that’s really a big part of what makes us, us is the NLP. In terms of specifics, I don’t really want to talk about particular implementation methods.
Maqubela: What’s your starting price point?
Gotszling: We’re looking around a typical SaaS model so maybe $10 per user per month with up-charges for enterprise features or analytics. Or if you have any particular stringent security requirements we can provide those assurances to you as well.
Turck: I’m still trying to hone in on what makes you special in this long history. Would you say that it’s the ability to do structured data search as opposed to unstructured — an ability to use NLP to run searches in database?
Gotszling: Yes potentially we can do that. We’re starting with just regular files. File search is live, and then the question and answering service which we’re starting for spreadsheets… There’s other data repositories that we can tap into. It would be nice to be able to ask a question in some giant data warehouse type of product without having to learn the language or email somebody in your company.
Turck: I think it would be nice to be able to find that one little thing that makes you — in search so many companies are trying to penetrate that market you really need to find that tip of the spear to get you into the market. I’m curious — you’re doing file search like everybody sort of claims to do this, maybe not well. But I’m just curious to hear from you how you’re going to conquer the market? Which market are you trying to conquer?
Gotszling: Initially search for businesses. But eventually it will depend on what our initial beta customers want. They’re going to drive the product forward. Ultimately if they say we really need this for this data warehouse then we’re going to build it for them.
Jason Kincaid: Do you have to train the software? It seems super cool but I would be wary of trusting that it knows where to look for this sort of thing without telling it explicitly.
Gotszling: We didn’t have to train it. The software ran on that spreadsheet without ever having seen it before. I was able to extract all that information… It’s still not perfect. It still will occasionally make mistakes depending on the formatting but we didn’t have to train it. It ran completely unsupervised. And we were able to ask those questions on that content.