<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>バカな火星人 &#187; Haskell</title>
	<atom:link href="http://martian.org/marty/tag/haskell/feed/" rel="self" type="application/rss+xml" />
	<link>http://martian.org/marty</link>
	<description>Marty was here!</description>
	<lastBuildDate>Wed, 11 Jan 2012 13:16:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Haskell sometime soon</title>
		<link>http://martian.org/marty/2010/05/12/haskell-sometime-soon/</link>
		<comments>http://martian.org/marty/2010/05/12/haskell-sometime-soon/#comments</comments>
		<pubDate>Tue, 11 May 2010 15:34:00 +0000</pubDate>
		<dc:creator>Marty</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://martian.org/marty/?p=215</guid>
		<description><![CDATA[Way back in October last year (but only 2 entries ago) I posted some Perl code and wrote that I&#8217;d post the Haskell sometime soon. Well, that is now; the code is below. I intend to rewrite it to use Parsec at some point, but I haven&#8217;t tried that yet since this little hacky script [...]]]></description>
			<content:encoded><![CDATA[<p>Way back in October last year (but only 2 entries ago) I posted <a href="http://martian.org/marty/2009/10/14/perls-xmltwig/">some Perl code</a> and wrote that I&#8217;d post the Haskell sometime soon.  Well, that is now; the code is below.  I intend to rewrite it to use Parsec at some point, but I haven&#8217;t tried that yet since this little hacky script works well enough; and look how long it has taken me to blog it!</p>

<pre><code>module Main where

import Text.XML.HaXml.SAX

data ParserState = FindEntry | FindKeb | FindText

scan :: ParserState -> [SaxElement] -> [String]
scan _ [] = []
scan FindEntry ( (SaxElementOpen "entry" _) : es ) =
    scan FindKeb es
scan FindKeb   ( (SaxElementClose "entry") : es ) =
    "(none)" : (scan FindEntry es)
scan FindKeb   ( (SaxElementOpen "keb" _) : es ) =
    scan FindText es
scan FindText  ( (SaxCharData "\n") : es ) =
    scan FindText es
scan FindText  ( (SaxCharData txt) : es ) =
    txt : (scan FindEntry es)
scan st ( _ : es ) = scan st es

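-- Parse the input into a lazy list of SAX events and scan it for keb text.
-- The second component of saxParse's result (a possible error) is ignored.
findKebs :: String -> [String]
findKebs i =
    let (es, _) = saxParse "&lt;stdin&gt;" i in
    scan FindEntry es

-- Assumed entry point (not part of the original post): one result per line.
main :: IO ()
main = interact (unlines . findKebs)
</code></pre>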

<p>To understand how it works, the most important line is the type declaration &#8220;<code>scan :: ParserState -> [SaxElement] -> [String]</code>&#8221;, which Haskell does not actually require, since the compiler can infer it.  From that line we know that &#8220;scan&#8221; is a function that takes a ParserState as its first parameter and a list of SaxElements as its second parameter, and returns a list of Strings.  Everything else is a simple matter of recursion and pattern matching :-)</p>
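
<p>The recursion and pattern matching can be tried without HaXml installed.  What follows is only a rough sketch: the <code>Event</code> type is a made-up stand-in for HaXml&#8217;s <code>SaxElement</code>, and the <code>sample</code> data is invented for illustration.</p>

<pre><code>module ScanSketch where

-- Hypothetical stand-in for HaXml's SaxElement, so only base is needed.
data Event = Open String | Close String | Chars String

data ParserState = FindEntry | FindKeb | FindText

-- Same shape as the real scan: the state picks which patterns apply,
-- and the final clause skips any event the current state ignores.
scan :: ParserState -> [Event] -> [String]
scan _ [] = []
scan FindEntry (Open "entry" : es)  = scan FindKeb es
scan FindKeb   (Close "entry" : es) = "(none)" : scan FindEntry es
scan FindKeb   (Open "keb" : es)    = scan FindText es
scan FindText  (Chars "\n" : es)    = scan FindText es
scan FindText  (Chars txt : es)     = txt : scan FindEntry es
scan st (_ : es) = scan st es

-- Invented input: one entry with a keb, one entry without.
sample :: [Event]
sample =
    [ Open "entry", Open "keb", Chars "abc", Close "keb", Close "entry"
    , Open "entry", Open "reb", Chars "xyz", Close "reb", Close "entry"
    ]
</code></pre>

<p>In GHCi, <code>scan FindEntry sample</code> evaluates to <code>["abc","(none)"]</code>: the first entry yields its keb text, and the second reaches its closing tag while still in <code>FindKeb</code>.</p>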
]]></content:encoded>
			<wfw:commentRss>http://martian.org/marty/2010/05/12/haskell-sometime-soon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beating down the XML</title>
		<link>http://martian.org/marty/2009/10/07/beating-down-the-xml/</link>
		<comments>http://martian.org/marty/2009/10/07/beating-down-the-xml/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 18:04:38 +0000</pubDate>
		<dc:creator>Marty</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://martian.org/marty/?p=199</guid>
		<description><![CDATA[XML is still a huge mess, but at least now I have managed to get a few programs that can handle it with reasonable-ish memory requirements. For Perl, as I had thought, the XML::Twig module gave me a pleasant interface and was able to easily handle the document. For Haskell it was a little bit [...]]]></description>
			<content:encoded><![CDATA[<p>XML is still <a href="http://martian.org/marty/2009/09/30/xml-is-a-huge-mess/">a huge mess</a>, but at least now I have managed to get a few programs that can handle it with reasonable-ish memory requirements.</p>

<p>For Perl, as I had thought, the XML::Twig module gave me a pleasant interface and was able to easily handle the document.</p>

<p>For Haskell it was a little bit trickier.  I used the SAX parser in HaXml, but it is not like a regular SAX parser, since Haskell is so unlike any regular language.  The parser returns a lazy list of SAX events, so I had to make sure I processed the list without evaluating the whole thing into memory.</p>
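
<p>A minimal sketch of what processing the list lazily means, assuming a made-up <code>Event</code> type in place of HaXml&#8217;s <code>SaxElement</code> (none of these names come from the real program).  The consumer emits each result before it recurses, so it works even on an endless stream of events, and the counter uses a strict fold so that it does not pile up unevaluated thunks.</p>

<pre><code>module LazySketch where

import Data.List (foldl')

-- Hypothetical stand-in for HaXml's SaxElement.
data Event = Open String | Close String | Chars String

-- Produces each result as soon as it is found, then carries on with the
-- rest of the list; it never needs to see the end of the input.
kebTexts :: [Event] -> [String]
kebTexts (Open "keb" : Chars t : es) = t : kebTexts es
kebTexts (_ : es) = kebTexts es
kebTexts [] = []

-- An endless stream of events, standing in for a very large document.
endlessEvents :: [Event]
endlessEvents =
    cycle [Open "entry", Open "keb", Chars "abc", Close "keb", Close "entry"]

-- Terminates because only as much of the stream as needed is evaluated.
firstThree :: [String]
firstThree = take 3 (kebTexts endlessEvents)

-- Strict left fold: the running count is forced at every step.
countKebs :: [Event] -> Int
countKebs = foldl' step 0
  where
    step n (Open "keb") = n + 1
    step n _            = n
</code></pre>

<p>Here <code>firstThree</code> is <code>["abc","abc","abc"]</code> and <code>countKebs (take 10 endlessEvents)</code> is <code>2</code>.  The memory benefit only holds as long as nothing else keeps a reference to the start of the list; otherwise the events already consumed cannot be garbage collected.</p>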

<p>Now that I&#8217;ve dealt with the memory issue it appears that I have a speed issue to deal with next.</p>
]]></content:encoded>
			<wfw:commentRss>http://martian.org/marty/2009/10/07/beating-down-the-xml/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>XML is a huge mess</title>
		<link>http://martian.org/marty/2009/09/30/xml-is-a-huge-mess/</link>
		<comments>http://martian.org/marty/2009/09/30/xml-is-a-huge-mess/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 16:27:28 +0000</pubDate>
		<dc:creator>Marty</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://martian.org/marty/?p=196</guid>
		<description><![CDATA[I have a 39 MB XML file that I wanted to process. I wasn&#8217;t expecting it to be so difficult. Writing the code, in multiple languages, was not difficult. But running the programs was a big problem. My first attempt was a simple Haskell program, but I had to kill it after it ate over [...]]]></description>
			<content:encoded><![CDATA[<p>I have a 39 MB XML file that I wanted to process.  I wasn&#8217;t expecting it to be so difficult.  Writing the code, in multiple languages, was not difficult.  But running the programs was a big problem.</p>

<p>My first attempt was a simple Haskell program, but I had to kill it after it ate over 1.3 GB (yes, 1.3 GB) of RAM!</p>

<p>Haskell&#8217;s strings are known to be memory hogs, and the HaXml module I was using was making them even worse by not decoding the UTF-8 text sensibly.  I decided to write a leaner Haskell program later, and to switch to Perl to get the job done.</p>

<p>At this point I also decided to set a limit to the amount of memory the programs could consume.  For a 39 MB file I hoped that 10 times that would be enough, so I rounded up and set the limit at 512 MB.</p>

<p>But Perl, using the XML::LibXML module, couldn&#8217;t process the file with that memory limit.  I also ran a quick one-liner in Erlang, just to watch it crash out of memory too.  I&#8217;m going to try some other languages to see if I can find one that can work in 512 MB.</p>

<p>My next useful step is to try the <a href="http://xmltwig.com/">XML::Twig</a> module in Perl.  I&#8217;ve had good experiences with it before.  It won&#8217;t be as fast as LibXML, but it probably has the best chance of surviving within my 512 MB limit.  For Haskell, I think I&#8217;ll have to resort to a SAX-style parser.</p>
]]></content:encoded>
			<wfw:commentRss>http://martian.org/marty/2009/09/30/xml-is-a-huge-mess/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

