Commits

Mark Pilgrim  committed 5144c2d Draft

constrained writing ;-)

  • Parent commits 745fa32

Files changed (7)

File case-study-porting-chardet-to-python-3.html

   </ol>
 </ol>
 <h2 id=divingin>Introducing <code class=filename>chardet</code>: a mini-<abbr>FAQ</abbr></h2>
-<p class=fancy>When you think of &#8220;text,&#8221; you probably think of &#8220;characters and symbols I see on my computer screen.&#8221;  But computers don&#8217;t deal in characters and symbols; they deal in bits and bytes. Every piece of text you&#8217;ve ever seen on a computer screen is actually stored in a particular <em>character encoding</em>. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
+<p class=fancy>Usually, when people talk about &#8220;text,&#8221; they&#8217;re thinking of &#8220;characters and symbols on the computer screen.&#8221;  But computers don&#8217;t deal in characters and symbols; they deal in bits and bytes. Every piece of text you&#8217;ve ever seen on a computer screen is actually stored in a particular <em>character encoding</em>. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
 <p>In reality, it&#8217;s more complicated than that. Many characters are common to multiple encodings, but each encoding may use a different sequence of bytes to actually store those characters in memory or on disk. So you can think of the character encoding as a kind of decryption key for the text. Whenever someone gives you a sequence of bytes and claims it&#8217;s &#8220;text&#8221;, you need to know what character encoding they used so you can decode the bytes into characters and display them (or process them, or whatever).
 <h3 id=faq.what>What is character encoding auto-detection?</h3>
 <p>It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It&#8217;s like cracking a code when you don&#8217;t have the decryption key.
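A minimal sketch of the point made in this passage, that the same characters are stored as different bytes under different encodings and that decoding with the wrong "key" garbles the text. The sample word and the two encodings are illustrative choices, not taken from the chapter.

```python
# Illustrative sample text (an assumption, not from the chapter): "cafe" with an acute accent.
text = "caf\u00e9"

# The same four characters become different byte sequences in different encodings.
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9' (5 bytes)
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'     (4 bytes)

# Decoding with the right encoding recovers the original text.
assert utf8_bytes.decode("utf-8") == text

# Decoding with the wrong encoding "succeeds" but yields the wrong characters.
garbled = utf8_bytes.decode("latin-1")  # 'caf\u00c3\u00a9'

# Some wrong guesses fail outright instead of producing garbage.
try:
    latin1_bytes.decode("utf-8")
    strict_failure = False
except UnicodeDecodeError:
    strict_failure = True

print(utf8_bytes, latin1_bytes, repr(garbled), strict_failure)
```

An auto-detector has to make this guess without being told the encoding, which is why it can only ever be a best effort.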

File native-datatypes.html

   <li><a href=#extendinglists>Adding items to a list</a>
   <li><a href=#searchinglists>Searching for values in a list</a>
   </ol>
+<!--
 <li><a href=#sets>Sets</a>
-<!--
     <ol>
     <li>Creating a new set
     <li>Modifying a set
 <li><a href=#furtherreading>Further reading</a>
 </ol>
 <h2 id=divingin>Diving in</h2>
-<p class=fancy>A short digression is in order. Put aside <a href=your-first-python-program.html>your first Python program</a> for just a minute, and let's talk about datatypes. <a href=your-first-python-program.html#datatypes>Every variable has a datatype</a>, even though you don't declare it explicitly. Based on each variable's original assignment, Python figures out what type it is and keeps tracks of that internally.
+<p class=fancy>Cast aside <a href=your-first-python-program.html>your first Python program</a> for just a minute, and let's talk about datatypes. In Python, <a href=your-first-python-program.html#datatypes>every variable has a datatype</a>, but you don't need to declare it explicitly. Based on each variable's original assignment, Python figures out what type it is and keeps track of that internally.
 <p>Python has many native datatypes. Here are the important ones:
 <ol>
 <li><b>Booleans</b> are either <code>True</code> or <code>False</code>.
 <li>The <code>index()</code> method finds the <em>first</em> occurrence of a value in the list. In this case, <code>'new'</code> occurs twice in the list, in <code>a_list[2]</code> and <code>a_list[4]</code>, but the <code>index()</code> method will return only the index of the first occurrence.
 <li>As you might <em>not</em> expect, if the value is not found in the list, Python raises an exception. This is notably different from most languages, which will return some invalid index (like <code>-1</code>). While this may seem annoying at first, I think you will come to appreciate it. It means your program will crash at the source of the problem instead of failing strangely and silently later.
 </ol>
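The two list notes above can be sketched as follows. The list contents are a hypothetical stand-in, chosen only so that 'new' sits at indexes 2 and 4 as the text describes; the exception raised for a missing value is ValueError.

```python
# Hypothetical list, arranged so 'new' appears at a_list[2] and a_list[4].
a_list = ['a', 'b', 'new', 'mpilgrim', 'new']

# index() reports only the first occurrence.
first = a_list.index('new')

# A missing value raises ValueError rather than returning -1.
try:
    a_list.index('c')
    missing_raised = False
except ValueError:
    missing_raised = True

# When a missing value is an expected case, test membership first.
position = a_list.index('c') if 'c' in a_list else None

print(first, missing_raised, position)
```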
+<!--
 <h2 id=sets>Sets</h2>
 <p>FIXME
+-->
 <h2 id=dictionaries>Dictionaries</h2>
 <p>One of Python's most important datatypes is the dictionary, which defines one-to-one relationships between keys and values.
 <blockquote class="note compare perl5">

File porting-code-to-python-3-with-2to3.html

 </ol>
 </ol>
 <h2 id=divingin>Diving in</h2>
-<p class=fancy>Python 3 comes with a utility script called <code>2to3</code>, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. <a href=case-study-porting-chardet-to-python-3.html#running2to3>Case study: porting <code>chardet</code> to Python 3</a> describes how to run the <code>2to3</code> script, then shows some things it can't fix automatically. This appendix documents what it <em>can</em> fix automatically.
+<p class=fancy>Virtually all Python 2 programs will need at least some tweaking to run properly under Python 3. To help with this transition, Python 3 comes with a utility script called <code>2to3</code>, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. <a href=case-study-porting-chardet-to-python-3.html#running2to3>Case study: porting <code>chardet</code> to Python 3</a> describes how to run the <code>2to3</code> script, then shows some things it can't fix automatically. This appendix documents what it <em>can</em> fix automatically.
 <h2 id=print><code>print</code> statement</h2>
 <p>In Python 2, <code>print</code> was a statement. Whatever you wanted to print simply followed the <code>print</code> keyword. In Python 3, <code>print()</code> is a function &mdash; whatever you want to print is passed to <code>print()</code> like any other function.
 <p class=skip><a href=#skipcompareprint>skip over this table</a>
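As a rough illustration of what "like any other function" means in practice, the sketch below passes keyword arguments to print() and captures the output in an in-memory buffer. The strings and the buffer are illustrative; the sep, end, and file parameters are part of the built-in print() in Python 3.

```python
import io

# Python 2 statement form (a syntax error in Python 3):
#     print "hello", "world"
# Python 3 function form:
print("hello", "world")

# Because print() is an ordinary function, its behavior is controlled with
# keyword arguments instead of special syntax.
buf = io.StringIO()  # in-memory text stream standing in for a real file
print("hello", "world", sep=", ", end="!\n", file=buf)
output = buf.getvalue()
```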

File regular-expressions.html

 <li><a href=#summary>Summary</a>
 </ol>
 <h2 id=divingin>Diving in</h2>
-<p class=fancy>Regular expressions are a powerful and standardized way of searching, replacing, and parsing text with complex patterns of
-characters. If you&#8217;ve used regular expressions in other languages (like Perl), the syntax will be very familiar, and you get by just reading the summary of the <a href=http://docs.python.org/dev/library/re.html#module-contents><code>re</code> module</a> to get an overview of the available functions and their arguments.
-<p>Strings have methods for searching and replacing: <code>index()</code>, <code>find()</code>, <code>split()</code>, <code>count()</code>, <code>replace()</code>, <i class=baa>&amp;</i>c. But these methods are limited to the simplest of cases. For example, the <code>index()</code> method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string <var>s</var>, you must call <code>s.lower()</code> or <code>s.upper()</code> and make sure your search strings are the appropriate case to match. The <code>replace()</code> and <code>split()</code> methods have the same limitations.
-<p>If your goal can be accomplished with string functions, you should use them. They&#8217;re fast and simple and easy to read, and there&#8217;s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with <code>if</code> statements to handle special cases, or if you&#8217;re combining them with <code>split()</code> and <code>join()</code> and list comprehensions in weird unreadable ways, you may need to move up to regular expressions.
-<p>Although the regular expression syntax is tight and unlike normal code, the result can end up being <em>more</em> readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
+<p class=fancy>Every modern programming language has built-in functions for working with strings. In Python, strings have methods for searching and replacing: <code>index()</code>, <code>find()</code>, <code>split()</code>, <code>count()</code>, <code>replace()</code>, <i class=baa>&amp;</i>c. But these methods are limited to the simplest of cases. For example, the <code>index()</code> method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string <var>s</var>, you must call <code>s.lower()</code> or <code>s.upper()</code> and make sure your search strings are the appropriate case to match. The <code>replace()</code> and <code>split()</code> methods have the same limitations.
+<p>If your goal can be accomplished with string methods, you should use them. They&#8217;re fast and simple and easy to read, and there&#8217;s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with <code>if</code> statements to handle special cases, or if you&#8217;re chaining calls to <code>split()</code> and <code>join()</code> to slice-and-dice your strings, you may need to move up to regular expressions.
+<p>Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing text with complex patterns of characters. Although the regular expression syntax is tight and unlike normal code, the result can end up being <em>more</em> readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
+<blockquote class="note compare perl5">
+<p><span>&#x261E;</span>If you&#8217;ve used regular expressions in other languages (like Perl 5), Python&#8217;s syntax will be very familiar. Read the summary of the <a href=http://docs.python.org/dev/library/re.html#module-contents><code>re</code> module</a> to get an overview of the available functions and their arguments.
+</blockquote>
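A small sketch of the contrast drawn above, assuming an illustrative address string. It shows a case-insensitive search done by hand with string methods, then the same search as a regular expression that also requires a whole-word match and carries its own embedded comments.

```python
import re

# Illustrative input (an assumption for this sketch).
s = "100 North Broad Road"

# String methods: case-insensitive search needs manual lowercasing, and
# find() happily matches the "road" inside "Broad".
naive_pos = s.lower().find("road")

# Regular expression: case-insensitive, whole word only, with comments
# embedded in the pattern via re.VERBOSE.
pattern = re.compile(r"""
    \b      # start at a word boundary
    road    # the literal word
    \b      # end at a word boundary
    """, re.IGNORECASE | re.VERBOSE)

match = pattern.search(s)
regex_pos = match.start()
matched_text = match.group()

print(naive_pos, regex_pos, matched_text)
```

The string-method search lands inside "Broad", while the pattern finds the standalone word, which is the kind of special case that otherwise tends to accumulate as extra `if` statements.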
 <h2 id=streetaddresses>Case study: street addresses</h2>
 <p>This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don&#8217;t just make this stuff up; it&#8217;s actually useful.)  This example shows how I approached the problem.
 <pre class=screen>

File table-of-contents.html

   <li><a href=native-datatypes.html#booleans>Booleans</a>
   <li><a href=native-datatypes.html#numbers>Numbers</a>
   <li><a href=native-datatypes.html#lists>Lists</a>
+<!--
   <li><a href=native-datatypes.html#sets>Sets</a>
+-->
   <li><a href=native-datatypes.html#dictionaries>Dictionaries</a>
   <li><a href=native-datatypes.html#none><code>None</code></a>
   <li><a href=native-datatypes.html#furtherreading>Further reading</a>
   </ol>
 <li><a href=case-study-porting-chardet-to-python-3.html>Case study: porting <code>chardet</code> to Python 3</a>
   <ol>
-  <li><a href=case-study-porting-chardet-to-python-3.html#divingin>Introducing <code class=filename>chardet</code>: a mini-FAQ</a>
+  <li><a href=case-study-porting-chardet-to-python-3.html#divingin>Introducing <code class=filename>chardet</code>: a mini-<abbr>FAQ</abbr></a>
     <ol>
     <li><a href=case-study-porting-chardet-to-python-3.html#faq.what>What is character encoding auto-detection?</a>
     <li><a href=case-study-porting-chardet-to-python-3.html#faq.impossible>Isn't that impossible?</a>
     </ol>
   <li><a href=case-study-porting-chardet-to-python-3.html#divingin2>Diving in</a>
     <ol>
-    <li><a href=case-study-porting-chardet-to-python-3.html#how.bom><code>UTF-n</code> with a <abbr title=Byte Order Mark>BOM</abbr></a>
+    <li><a href=case-study-porting-chardet-to-python-3.html#how.bom><code>UTF-n</code> with a <abbr>BOM</abbr></a>
     <li><a href=case-study-porting-chardet-to-python-3.html#how.esc>Escaped encodings</a>
     <li><a href=case-study-porting-chardet-to-python-3.html#how.mb>Multi-byte encodings</a>
     <li><a href=case-study-porting-chardet-to-python-3.html#how.sb>Single-byte encodings</a>

File unit-testing.html

 <li>...
 </ol>
 <h2 id=divingin>(Not) diving in</h2>
-<p class=fancy>In previous chapters, you &#8220;dived in&#8221; by immediately looking at code and trying to understand it as quickly as possible. Now that you have some Python under your belt, you're going to step back and look at the steps that happen <em>before</em> the code gets written.
-<p>In this chapter, you're going to write, debug, and optimize a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in <a href="regular-expressions.html#romannumerals">&#8220;Case study: roman numerals&#8221;</a>. Now let's step back and consider what it would take to expand that into a two-way utility.
+<p class=fancy>How do you know that the code you wrote yesterday still works after the changes you made today? Every seasoned programmer has war stories of an &#8220;innocent&#8221; change that couldn't <em>possibly</em> have affected that other &#8220;unrelated&#8221; module&hellip; If this sounds familiar, this chapter is for you.
+<p>In this chapter, you're going to write and debug a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in <a href="regular-expressions.html#romannumerals">&#8220;Case study: roman numerals&#8221;</a>. Now step back and consider what it would take to expand that into a two-way utility.
 <p><a href="regular-expressions.html#romannumerals">The rules for Roman numerals</a> lead to a number of interesting observations:
 <ol>
 <li>There is only one correct way to represent a particular number as Roman numerals.

File your-first-python-program.html

 <li><a href=#furtherreading>Further reading</a>
 </ol>
 <h2 id=divingin>Diving in</h2>
-<p class=fancy>You know how other books go on and on about programming fundamentals and finally work up to building something useful?  Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
+<p class=fancy>Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
 <p class=download>[<a href=humansize.py>download <code>humansize.py</code></a>]</p>
 <pre><code>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
             1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}