This week Google and Yahoo announced that, more than 10 years after web users were first haunted by Flash intro splash screens, their search engines will finally be able to index the content of SWF files. Adobe Flash is the most prevalent web platform today, available on 98% of desktop browsers, yet content locked up in binary SWF files has been part of a big black hole in the web that search engines and other services have not been able to read and understand.
The solution Adobe has offered to both Google and Yahoo (and probably to that other search-providing company) is a special ‘Flash player’ that allows the search engine to dive into existing SWF files. It might be akin to a decompiler, in that the raw objects are extracted and the text is then parsed out of them (decompiling Flash 9 is very possible).
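To make the "extract the objects, then parse out the text" idea concrete, here is a rough sketch in Python. It is not Adobe's player and not how Google or Yahoo do it; it only relies on the documented SWF header (a 3-byte signature, `FWS` for uncompressed or `CWS` for a zlib-compressed body, followed by a version byte and a 4-byte length) and then scans the body for runs of printable ASCII, much like the Unix `strings` tool. The function names and the sample bytes are made up for illustration, and the sample is not a valid SWF tag stream.

```python
import re
import struct
import zlib


def swf_body(data: bytes) -> bytes:
    """Return the uncompressed bytes that follow the 8-byte SWF header."""
    # Header layout: 3-byte signature, 1-byte version, 4-byte little-endian length.
    signature, version, length = struct.unpack("<3sBI", data[:8])
    if signature == b"FWS":  # body stored uncompressed
        return data[8:]
    if signature == b"CWS":  # body compressed with zlib
        return zlib.decompress(data[8:])
    raise ValueError(f"not a SWF file: {signature!r}")


def printable_runs(body: bytes, min_len: int = 4) -> list:
    """Crude text extraction: every run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(body)]


# Hypothetical stand-in for a real file: a CWS header plus a compressed payload.
payload = b"\x00\x01Welcome to our site\x00\x02\x03Contact us\x00"
fake_swf = b"CWS" + bytes([9]) + struct.pack("<I", 8 + len(payload)) + zlib.compress(payload)

print(printable_runs(swf_body(fake_swf)))
# prints ['Welcome to our site', 'Contact us']
```

A real extractor would walk the tag stream rather than scan for byte runs, but even this toy version shows what comes out the other end: a flat list of strings with nothing to say which one was a title and which one was a footer.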
What Google and Yahoo have now is simply access to the text-based content within Flash applets. That access does not guarantee that the search engines will treat the content equally with well-formed text-based markup. While text can be extracted, the contents still do not have the same structure and context as a text-based page, such as a header, metadata, inbound links, headings, other markup tags and everything else. Further, if your SWF files use graphic-based text, the search engines still won’t be able to see it.
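For contrast, here is a small sketch of what a crawler gets almost for free from plain markup, using Python's standard `html.parser`. The `OutlineParser` class and the sample page are invented for this example; real crawlers are far more involved. The point is only that the markup itself labels what each piece of text is.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect meta tags, headings and link targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}      # name -> content from <meta> tags
        self.outline = []   # (tag, text) pairs for headings
        self.links = []     # href values from <a> tags
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3"):
            self._current = tag
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def handle_data(self, data):
        # Only keep text that sits inside a heading.
        if self._current and data.strip():
            self.outline.append((self._current, data.strip()))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


# Hypothetical page, made up for illustration.
page = (
    '<html><head><meta name="description" content="Product tour"></head>'
    "<body><h1>Welcome</h1><p>Intro text</p>"
    '<h2>Contact us</h2><a href="/contact">Contact</a></body></html>'
)

parser = OutlineParser()
parser.feed(page)
print(parser.meta)     # {'description': 'Product tour'}
print(parser.outline)  # [('h1', 'Welcome'), ('h2', 'Contact us')]
print(parser.links)    # ['/contact']
```

The same words pulled out of a SWF would arrive without the `h1`, the description or the link target attached.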
There seems to be a lot of misunderstanding about just what this means and how important it is. First of all, in the context of web applications, search engine optimization is not important when offering a private user application view. In that case, such as with an email application, there is no public search or index. The important part here is public-facing Flash applications (or websites) where the main site content is locked up in a binary container running on a proprietary runtime/virtual machine. In these cases, up until now most site owners have replicated that same content with a proper URI structure in HTML to gain the most out of search engine indexes and referrals. This remains the better solution, as it gives sites and content more structure that the crawlers from Google and Yahoo readily understand and can interpret. Being able to grep out the text components of a SWF file adds little by way of structure or organization to the web.
I strongly believe that it is almost impossible to build a true semantic web within binary file formats and proprietary virtual machines. We can hack some way towards it, but it will never be close to what plain text markup can offer.