<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Α Try αt Hαskeλλ in Λinguistics</title>
	<atom:link href="http://haskell.krowland.net/?feed=comments-rss2" rel="self" type="application/rss+xml" />
	<link>http://haskell.krowland.net</link>
	<description>Functional programming and linguistics</description>
	<lastBuildDate>Mon, 24 Oct 2011 19:10:21 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>Comment on When God Created the Coffee Break by Koen Roelandt</title>
		<link>http://haskell.krowland.net/?p=494#comment-229</link>
		<dc:creator>Koen Roelandt</dc:creator>
		<pubDate>Mon, 24 Oct 2011 19:10:21 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=494#comment-229</guid>
		<description>Daniel,

First and foremost, thank you for your reaction.

&lt;strong&gt;The transformation rules&lt;/strong&gt;
I&#039;m still working on the transformation rules and especially the relation between their frequency and error reduction score. I based my post on the 1992 article &lt;a href=&quot;http://acl.ldc.upenn.edu/H/H92/H92-1022.pdf&quot; rel=&quot;nofollow&quot;&gt;A simple rule-based part of speech tagger&lt;/a&gt; (bold is mine): 
&lt;blockquote&gt;&quot;The initial tagger was trained on 90% of the corpus (the training corpus). 5% was held back to be used for the patch acquisition procedure (the patch corpus) and 5% for testing. Once the initial tagger is trained, it is used to tag the patch corpus. A list of tagging errors is compiled by comparing the output of the tagger to the correct tagging of the patch corpus. This list consists of triples &lt; taga, tagb, number &gt;, indicating the number of times the tagger mistagged a word with &lt;em&gt;taga&lt;/em&gt; when it should have been tagged with &lt;em&gt;tagb&lt;/em&gt; in the patch corpus.  &lt;strong&gt;Next, for each error triple, it is determined which instantation of a template from the prespecified set of patch templates results in the greatest error reduction&lt;/strong&gt;.&quot;&lt;/blockquote&gt;

The initial tagger Brill refers to in the quote is the frequency-based tagger. Brill&#039;s method differs from the one you (and I) use, but it seems quite clear that the list and scores of the instantiation rules are calculated using the patch corpus and not the original training corpus. This seems logical to me, because the rules have to correct the initial tagger and the mechanism of tagging unknown words as nouns. If you derive the rules from (tagging) the original training corpus, however, there will be no unknown words. Isn&#039;t that a problem, even if the scoring function will select the most effective rules?

&lt;strong&gt;Unknown words&lt;/strong&gt;
I have no problem whatsoever with tagging unknown words as &quot;NN&quot;. It&#039;s an efficient and elegant solution. At www.nlpwp.org, however, you seem to do the following:

1. Train the tagger using a training corpus
2. Run the tagger on the training corpus
3. Tag unknown words as &quot;NN&quot;
4. Distil transformation rules from the tagged file.

My point is that step 3 is unnecessary, because there are no unknown words. Every word in the training corpus is - by definition - in the initial tagger. I tried this (but didn&#039;t publish the results) and calculating the transformation rules with or without your function &lt;code&gt;backoffTagger&lt;/code&gt; will yield exactly the same result, i.e. the same list of rules, in the same order, with the same frequency.

&lt;strong&gt;One or two files?&lt;/strong&gt;
I understand your decision to work with one corpus. But you do not create a set of rules that takes into account unknown words (cf. my previous point). I have no idea whether this influences the performance of the finished tagger; it&#039;s something I could try to find out along the way. But maybe you could use 95% of the corpus as a training file and 5% as a patch file? I&#039;m sure your readers won&#039;t mind...</description>
		<content:encoded><![CDATA[<p>Daniel,</p>
<p>First and foremost, thank you for your reaction.</p>
<p><strong>The transformation rules</strong><br />
I&#8217;m still working on the transformation rules and especially the relation between their frequency and error reduction score. I based my post on the 1992 article <a href="http://acl.ldc.upenn.edu/H/H92/H92-1022.pdf" rel="nofollow">A simple rule-based part of speech tagger</a> (bold is mine): </p>
<blockquote><p>&#8220;The initial tagger was trained on 90% of the corpus (the training corpus). 5% was held back to be used for the patch acquisition procedure (the patch corpus) and 5% for testing. Once the initial tagger is trained, it is used to tag the patch corpus. A list of tagging errors is compiled by comparing the output of the tagger to the correct tagging of the patch corpus. This list consists of triples &lt; taga, tagb, number &gt;, indicating the number of times the tagger mistagged a word with <em>taga</em> when it should have been tagged with <em>tagb</em> in the patch corpus.  <strong>Next, for each error triple, it is determined which instantation of a template from the prespecified set of patch templates results in the greatest error reduction</strong>.&#8221;</p></blockquote>
<p>The initial tagger Brill refers to in the quote is the frequency-based tagger. Brill&#8217;s method differs from the one you (and I) use, but it seems quite clear that the list and scores of the instantiation rules are calculated using the patch corpus and not the original training corpus. This seems logical to me, because the rules have to correct the initial tagger and the mechanism of tagging unknown words as nouns. If you derive the rules from (tagging) the original training corpus, however, there will be no unknown words. Isn&#8217;t that a problem, even if the scoring function will select the most effective rules?</p>
<p><strong>Unknown words</strong><br />
I have no problem whatsoever with tagging unknown words as &#8220;NN&#8221;. It&#8217;s an efficient and elegant solution. At <a href="http://www.nlpwp.org" rel="nofollow">http://www.nlpwp.org</a>, however, you seem to do the following:</p>
<p>1. Train the tagger using a training corpus<br />
2. Run the tagger on the training corpus<br />
3. Tag unknown words as &#8220;NN&#8221;<br />
4. Distil transformation rules from the tagged file.</p>
<p>My point is that step 3 is unnecessary, because there are no unknown words. Every word in the training corpus is &#8211; by definition &#8211; in the initial tagger. I tried this (but didn&#8217;t publish the results) and calculating the transformation rules with or without your function <code>backoffTagger</code> will yield exactly the same result, i.e. the same list of rules, in the same order, with the same frequency.</p>
<p><strong>One or two files?</strong><br />
I understand your decision to work with one corpus. But you do not create a set of rules that takes into account unknown words (cf. my previous point). I have no idea whether this influences the performance of the finished tagger; it&#8217;s something I could try to find out along the way. But maybe you could use 95% of the corpus as a training file and 5% as a patch file? I&#8217;m sure your readers won&#8217;t mind&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When God Created the Coffee Break by Daniël de Kok</title>
		<link>http://haskell.krowland.net/?p=494#comment-225</link>
		<dc:creator>Daniël de Kok</dc:creator>
		<pubDate>Sat, 22 Oct 2011 17:27:40 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=494#comment-225</guid>
		<description>&quot;you should end up with a pretty strong tagger&quot;

More recent research has shown that HMM, maxent, and SVM taggers outperform transformation-based tagging by a pretty wide margin. We chose to discuss a TBL tagger, because it is conceptually nice and simple, and allows us to show off some Haskell magic.</description>
		<content:encoded><![CDATA[<p>&#8220;you should end up with a pretty strong tagger&#8221;</p>
<p>More recent research has shown that HMM, maxent, and SVM taggers outperform transformation-based tagging by a pretty wide margin. We chose to discuss a TBL tagger, because it is conceptually nice and simple, and allows us to show off some Haskell magic.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When God Created the Coffee Break by Daniël de Kok</title>
		<link>http://haskell.krowland.net/?p=494#comment-224</link>
		<dc:creator>Daniël de Kok</dc:creator>
		<pubDate>Sat, 22 Oct 2011 17:22:50 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=494#comment-224</guid>
		<description>I think you are missing an important point: the initial tagger (which tags words by its most likely tag) and the transformation tagger are not overlapping. The transformation rules act as corrections to the initial tagger. As such, it is not problematic to use the same training corpus for training the initial tagger and for learning the initial rules. In fact, this is what Eric Brill does in his 1991 paper:

&quot;This very simple algorithm has an error rate of about 7.9% when trained on 90% of the tagged Brown Corpus [...] The initial tagger was trained on 90% of the corpus (the training corpus).&quot;

With respect to tagging unknown words in the initial tagging state in our examples: we do this, because the user could have trained the most-likely word tagger using another (large) corpus. In such a case you&#039;d have to handle unknown words. If you use one training corpus for training the initial tagger and the contextual rules, that is obviously not necessary.</description>
		<content:encoded><![CDATA[<p>I think you are missing an important point: the initial tagger (which tags words by its most likely tag) and the transformation tagger are not overlapping. The transformation rules act as corrections to the initial tagger. As such, it is not problematic to use the same training corpus for training the initial tagger and for learning the initial rules. In fact, this is what Eric Brill does in his 1991 paper:</p>
<p>&#8220;This very simple algorithm has an error rate of about 7.9% when trained on 90% of the tagged Brown Corpus [...] The initial tagger was trained on 90% of the corpus (the training corpus).&#8221;</p>
<p>With respect to tagging unknown words in the initial tagging state in our examples: we do this, because the user could have trained the most-likely word tagger using another (large) corpus. In such a case you&#8217;d have to handle unknown words. If you use one training corpus for training the initial tagger and the contextual rules, that is obviously not necessary.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The beft n-grams. Ever. by KrisDS</title>
		<link>http://haskell.krowland.net/?p=507#comment-179</link>
		<dc:creator>KrisDS</dc:creator>
		<pubDate>Thu, 22 Sep 2011 05:59:15 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=507#comment-179</guid>
		<description>You&#039;re welcome. :-)</description>
		<content:encoded><![CDATA[<p>You&#8217;re welcome. :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When God Created the Coffee Break by KrisDS</title>
		<link>http://haskell.krowland.net/?p=494#comment-165</link>
		<dc:creator>KrisDS</dc:creator>
		<pubDate>Thu, 08 Sep 2011 05:59:38 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=494#comment-165</guid>
		<description>Looking forward to seeing the Brill tagger in action.

It may be a bit too soon, but could you not persist the rules discovered by the Brill tagger so that it can reuse them in the future? I would think that if you keep repeating that, you should end up with a pretty strong tagger. Except that previously learned rules may not be entirely correct (or ideal) and start dominating later rules...

Anyway, those nlpwp guys should consider hiring you to write their book! ;-)</description>
		<content:encoded><![CDATA[<p>Looking forward to seeing the Brill tagger in action.</p>
<p>It may be a bit too soon, but could you not persist the rules discovered by the Brill tagger so that it can reuse them in the future? I would think that if you keep repeating that, you should end up with a pretty strong tagger. Except that previously learned rules may not be entirely correct (or ideal) and start dominating later rules&#8230;</p>
<p>Anyway, those nlpwp guys should consider hiring you to write their book! ;-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Taggart by Koen Roelandt</title>
		<link>http://haskell.krowland.net/?p=397#comment-55</link>
		<dc:creator>Koen Roelandt</dc:creator>
		<pubDate>Tue, 17 May 2011 17:17:44 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=397#comment-55</guid>
		<description>Interesting stuff. It took me a while to understand what a sound change applier does exactly :-), but &lt;a href=&quot;http://www.chrisdb.me.uk/redmine/projects/haskell-sound-change/repository/changes/doc/HaSCDoc.pdf&quot; rel=&quot;nofollow&quot;&gt;the friendly manual (pdf)&lt;/a&gt; helped a lot. I will certainly check it out more thoroughly in the future!</description>
		<content:encoded><![CDATA[<p>Interesting stuff. It took me a while to understand what a sound change applier does exactly :-), but <a href="http://www.chrisdb.me.uk/redmine/projects/haskell-sound-change/repository/changes/doc/HaSCDoc.pdf" rel="nofollow">the friendly manual (pdf)</a> helped a lot. I will certainly check it out more thoroughly in the future!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Taggart by Chris Bates</title>
		<link>http://haskell.krowland.net/?p=397#comment-54</link>
		<dc:creator>Chris Bates</dc:creator>
		<pubDate>Sat, 14 May 2011 09:41:34 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=397#comment-54</guid>
		<description>Hi, I&#039;ve also been using Haskell to do linguistics, although in a different area. Specifically, I&#039;ve been writing a program to perform sound changes (historical linguistics). I have some pages about the project here:

http://www.chrisdb.me.uk/redmine/projects/haskell-sound-change/repository</description>
		<content:encoded><![CDATA[<p>Hi, I&#8217;ve also been using Haskell to do linguistics, although in a different area. Specifically, I&#8217;ve been writing a program to perform sound changes (historical linguistics). I have some pages about the project here:</p>
<p><a href="http://www.chrisdb.me.uk/redmine/projects/haskell-sound-change/repository" rel="nofollow">http://www.chrisdb.me.uk/redmine/projects/haskell-sound-change/repository</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The crux of the biscuit by KrisDS</title>
		<link>http://haskell.krowland.net/?p=342#comment-49</link>
		<dc:creator>KrisDS</dc:creator>
		<pubDate>Tue, 03 May 2011 06:19:12 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=342#comment-49</guid>
		<description>Mmm, cookies! :-)</description>
		<content:encoded><![CDATA[<p>Mmm, cookies! :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on It&#8217;s the economy, stupid by Koen Roelandt</title>
		<link>http://haskell.krowland.net/?p=274#comment-12</link>
		<dc:creator>Koen Roelandt</dc:creator>
		<pubDate>Sun, 20 Mar 2011 21:03:19 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=274#comment-12</guid>
		<description>Thank you for the comment.

For the sake of clarity I would also stick to the normal sorting methods for the book. On the other hand, you could add a footnote with the information in your comment for the sake of completeness. That way, the reader has both a clear explanation and a pointer to a more efficient sorting method (in C).</description>
		<content:encoded><![CDATA[<p>Thank you for the comment.</p>
<p>For the sake of clarity I would also stick to the normal sorting methods for the book. On the other hand, you could add a footnote with the information in your comment for the sake of completeness. That way, the reader has both a clear explanation and a pointer to a more efficient sorting method (in C).</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on It&#8217;s the economy, stupid by Daniël de Kok</title>
		<link>http://haskell.krowland.net/?p=274#comment-9</link>
		<dc:creator>Daniël de Kok</dc:creator>
		<pubDate>Sat, 19 Mar 2011 20:03:41 +0000</pubDate>
		<guid isPermaLink="false">http://haskell.krowland.net/?p=274#comment-9</guid>
		<description>Normal sorting methods are not very efficient for suffix arrays, especially if some suffixes are very frequent. In such cases there are more efficient methods, for instance the one described by Manber and Myers. McIlroy and McIlroy provide a sample implementation in C:

http://www.cs.dartmouth.edu/~doug/sarray/

There is also more recent work, such as Kärkkäinen et al., 2005.

For the book, when we have to choose between performance and clarity, the latter usually wins ;).</description>
		<content:encoded><![CDATA[<p>Normal sorting methods are not very efficient for suffix arrays, especially if some suffixes are very frequent. In such cases there are more efficient methods, for instance the one described by Manber and Myers. McIlroy and McIlroy provide a sample implementation in C:</p>
<p><a href="http://www.cs.dartmouth.edu/~doug/sarray/" rel="nofollow">http://www.cs.dartmouth.edu/~doug/sarray/</a></p>
<p>There is also more recent work, such as Kärkkäinen et al., 2005.</p>
<p>For the book, when we have to choose between performance and clarity, the latter usually wins ;).</p>
]]></content:encoded>
	</item>
</channel>
</rss>

