<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Finding Suspects Isn&#8217;t the Problem</title>
	<atom:link href="http://techliberation.com/2006/12/18/finding-suspects-isnt-the-problem/feed/" rel="self" type="application/rss+xml" />
	<link>http://techliberation.com/2006/12/18/finding-suspects-isnt-the-problem/</link>
	<description>Keeping politicians&#039; hands off the Net &#38; everything else related to technology</description>
	<lastBuildDate>Tue, 14 Feb 2012 19:26:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Tim Lee</title>
		<link>http://techliberation.com/2006/12/18/finding-suspects-isnt-the-problem/comment-page-1/#comment-36468</link>
		<dc:creator>Tim Lee</dc:creator>
		<pubDate>Tue, 19 Dec 2006 12:54:50 +0000</pubDate>
		<guid isPermaLink="false">http://techliberation.com/2006/12/18/finding-suspects-isnt-the-problem/#comment-36468</guid>
		<description>&lt;p&gt;Will,&lt;/p&gt;

&lt;p&gt;I suggest you take a look at the paper, which does a pretty good job of drawing some important distinctions here:&lt;/p&gt;

&lt;blockquote&gt;There are two loose categories of data analysis that are relevant to this discussion: subject based and pattern based. Subject-based data analysis seeks to trace links from known individuals or things to others. The example just cited and the opportunities to disrupt the 9/11 plot described further above would have used subject-based data analysis because each of them starts with information about specific suspects, combined with general knowledge.

In pattern-based analysis, investigators use statistical probabilities to seek predicates in large data sets. This type of analysis seeks to find new knowledge, not from the investigative and deductive process of following specific leads, but from statistical, inductive processes. Because it is more characterized by prediction than by the traditional notion of suspicion, we refer to it as &quot;predictive data mining.&quot;&lt;/blockquote&gt;

&lt;p&gt;If there are data analysis tools that allow investigators to rank potential suspects from a list of people or activities that are already under investigation, that could conceivably be useful. On the other hand, using large data sets to find brand new suspects is not likely to be useful, because even the best algorithm is likely to find thousands of false leads for every actual suspect it finds. And importantly, predictive data mining requires surveillance of millions of innocent Americans, whereas analysis of existing data doesn&#039;t.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Will,</p>

<p>I suggest you take a look at the paper, which does a pretty good job of drawing some important distinctions here:</p>

<blockquote>There are two loose categories of data analysis that are relevant to this discussion: subject based and pattern based. Subject-based data analysis seeks to trace links from known individuals or things to others. The example just cited and the opportunities to disrupt the 9/11 plot described further above would have used subject-based data analysis because each of them starts with information about specific suspects, combined with general knowledge.

In pattern-based analysis, investigators use statistical probabilities to seek predicates in large data sets. This type of analysis seeks to find new knowledge, not from the investigative and deductive process of following specific leads, but from statistical, inductive processes. Because it is more characterized by prediction than by the traditional notion of suspicion, we refer to it as &#8220;predictive data mining.&#8221;</blockquote>

<p>If there are data analysis tools that allow investigators to rank potential suspects from a list of people or activities that are already under investigation, that could conceivably be useful. On the other hand, using large data sets to find brand new suspects is not likely to be useful, because even the best algorithm is likely to find thousands of false leads for every actual suspect it finds. And importantly, predictive data mining requires surveillance of millions of innocent Americans, whereas analysis of existing data doesn&#8217;t.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Will Dwinnell</title>
		<link>http://techliberation.com/2006/12/18/finding-suspects-isnt-the-problem/comment-page-1/#comment-36467</link>
		<dc:creator>Will Dwinnell</dc:creator>
		<pubDate>Tue, 19 Dec 2006 09:18:14 +0000</pubDate>
		<guid isPermaLink="false">http://techliberation.com/2006/12/18/finding-suspects-isnt-the-problem/#comment-36467</guid>
		<description>&lt;p&gt;Much of the on-line commentary surrounding data mining&#039;s use against terrorism has, in my opinion, taken the technically naive perspective that data mining output is &quot;right&quot; or &quot;wrong&quot;.  Some have dressed up this perspective in terms of &quot;false positives&quot; and &quot;false negatives&quot;.&lt;/p&gt;

&lt;p&gt;In practice, most classification systems yield probabilities, not simple classifications, which can be sorted to prioritize treatment.&lt;/p&gt;

&lt;p&gt;In marketing, for instance, business people are not interested in classifying prospects as &quot;purchasers&quot; and &quot;non-purchasers&quot;. Real-world classification problems seldom yield solutions good enough to simply lump people into two groups. Instead, potential customers are ranked by the predicted probability of purchasing, allowing the finite resource of treatment (advertising, for instance) to be directed at those most likely to respond. It is common in many fields to assess such predictive models in terms of how many target-class individuals (purchasers, for example) end up in the top-ranked 5%, 10%, etc.&lt;/p&gt;

&lt;p&gt;Given the scarce resource of investigation time that you identify, I suggest that data mining is one of a number of useful tools to be used to direct its application.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Much of the on-line commentary surrounding data mining&#8217;s use against terrorism has, in my opinion, taken the technically naive perspective that data mining output is &#8220;right&#8221; or &#8220;wrong&#8221;.  Some have dressed up this perspective in terms of &#8220;false positives&#8221; and &#8220;false negatives&#8221;.</p>

<p>In practice, most classification systems yield probabilities, not simple classifications, which can be sorted to prioritize treatment.</p>

<p>In marketing, for instance, business people are not interested in classifying prospects as &#8220;purchasers&#8221; and &#8220;non-purchasers&#8221;. Real-world classification problems seldom yield solutions good enough to simply lump people into two groups. Instead, potential customers are ranked by the predicted probability of purchasing, allowing the finite resource of treatment (advertising, for instance) to be directed at those most likely to respond. It is common in many fields to assess such predictive models in terms of how many target-class individuals (purchasers, for example) end up in the top-ranked 5%, 10%, etc.</p>

<p>Given the scarce resource of investigation time that you identify, I suggest that data mining is one of a number of useful tools to be used to direct its application.</p>]]></content:encoded>
	</item>
</channel>
</rss>

