I think this is a great first stab at the problem, but for two reasons I think a robust solution needs more work:
- The first is that, as someone else pointed out, Google is almost certainly logging your translation queries.
- Secondly, even if you do it offline (as someone else suggested), the approach itself might not work. Success in linguistic forensics isn't based (as we might naively assume) on catching obscure words that a particular individual tends to overuse. It's based on subtle shifts in the relative frequency of function words ("the", "of", "but" and the like). Depending on how close the source and target languages are, round-trip machine translation might not change this.
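To make the function-word point concrete, here is a rough sketch of how one could compare two texts by the relative frequency of common function words. The word list, the function names and the choice of cosine distance are my own assumptions for illustration, not a standard tool; real studies use much longer word lists and more careful distance measures.

```python
import re
from collections import Counter
from math import sqrt

# Tiny illustrative list (an assumption); real work uses a few hundred words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "it", "is", "but", "on", "with"]

def function_word_profile(text):
    """Relative frequency of each function word, per token of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_distance(p, q):
    """0.0 for profiles pointing the same way; larger means less alike."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm else 1.0
```

If you run the original text and its round-trip translation through `function_word_profile` and the distance between them stays small, the translation probably hasn't hidden much.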
In forensic linguistics you typically measure a lot of metrics, not just word frequencies: use of punctuation and whitespace, sentence lengths and structures, etc. Attribution also isn't the only use of forensic linguistics. You can also look at influences: ideas, people, publications, etc., for instance in order to infer something about the reader or to analyze influence networks.
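As a rough illustration of the non-lexical metrics, a sketch like the following pulls out a few surface features. The particular features, the naive sentence splitting and the name `surface_features` are assumptions on my part, just to show the kind of thing that gets measured.

```python
import re
from statistics import mean, pstdev

def surface_features(text):
    """A handful of punctuation, whitespace and sentence-length measurements."""
    # Naive sentence split on ., ! and ?; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    n_chars = len(text) or 1  # avoid division by zero on empty input
    return {
        "mean_sentence_len": mean(lengths) if lengths else 0.0,
        "sd_sentence_len": pstdev(lengths) if lengths else 0.0,
        "commas_per_char": text.count(",") / n_chars,
        "semicolons_per_char": text.count(";") / n_chars,
        # Habit of typing two spaces after a full stop.
        "double_space_after_period": len(re.findall(r"\.  ", text)),
    }
```

Note that whitespace habits like the double space are exactly the kind of thing a translation round trip would normalize away, while sentence length and structure may well survive.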
I got interested in forensic linguistics many years ago when an article in a somewhat shady publication mentioned me. I got curious and started reading anything I could find on the topic. I was eventually able to identify the author, but mostly by tricking him into admitting it after I had a ranked list of candidates. He was second on a shortlist of four or five people (out of a candidate set of perhaps 300). Not half bad for the rather crude methods I used. I was rather pleased with myself.
I've since used similar techniques to look at influence networks in companies.
Translation history will soon only be available when you are signed in and will be centrally managed within My Activity. Past history will be cleared during this upgrade, so make sure to save translations you want to remember for ease of access later.