Keywords: Data Mining
One of the first things I thought when I learned about Google Print was "Cool! They will be able to improve their automatic translation technology.".
Why is that?
Most of the recent advances in Natural Language Processing have been based on statistical approaches. Given enough data, you can build statistical models that "learn" from the data instead of trying to encode complex rules to solve some of NLP's difficult problems.
So the reason I think Google might improve its automatic translation technology is that it will have access to a huge collection of books with translations in different languages (something referred to as parallel texts). You can see this as a sort of Rosetta Stone. Given a big enough collection of books with different translations of the same text, you should have enough good-quality data to improve automatic translation.
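To make the parallel-text idea concrete, here is a minimal sketch of how translation probabilities can be estimated from aligned sentence pairs by simple co-occurrence counting. The tiny corpus and the relative-frequency scoring are my own toy assumptions, not Google's actual method; real statistical translation systems use far more sophisticated alignment models.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical English-French sentence pairs).
pairs = [
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("a house", "une maison"),
]

# Count how often each (source, target) word pair co-occurs in
# aligned sentences, and how many target words each source word
# was paired with overall.
cooc = defaultdict(int)
src_count = defaultdict(int)
for en, fr in pairs:
    for e in en.split():
        for f in fr.split():
            cooc[(e, f)] += 1
            src_count[e] += 1

# Estimate p(target | source) by simple relative frequency.
def translation_prob(e, f):
    return cooc[(e, f)] / src_count[e] if src_count[e] else 0.0

# "house" co-occurs with "maison" in both sentences that contain it,
# so "maison" scores highest among the candidate target words.
best = max(("la", "maison", "le", "livre", "une"),
           key=lambda f: translation_prob("house", f))
print(best)  # -> maison
```

The point is that the signal comes purely from the data: with enough aligned book translations, even this crude counting starts to separate true translations from noise.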
I had not heard anything from Google about this (except maybe in one of Peter Norvig's talks, where he mentioned they would use Google Print to get better data) until today, with a post from .
Given Google's track record of creating new markets based on their technology, I have no doubt they will be able to get a piece of the multi-billion-dollar translation market. It would be cool to search sites whose language I do not understand and have them translated on the fly, something Yahoo is also working on.