Category Archives: python

https://www.activestate.com/blog/python-vs-java-duck-typing-parsing-whitespace-and-other-cool-differences/

Python Vs. Java: Duck Typing, Parsing On Whitespace And Other Cool Differences

Python has a lot to offer Java developers, and the languages are interesting both in their similarities and their differences. In a prior blog, I discussed the differences between Python and Java at a higher level. This time I’m diving slightly deeper and exploring some of the finer technical differences.

The biggest similarity is their “(almost) everything is an object” design and their reputation for excellent cross-platform support, as well as things like immutable strings and deep, relatively standard libraries.

There are big differences, too. At the community level, Java has always had a single large corporate sponsor. Python support is more distributed. Although both are well within the Algol-like family of languages, Python’s use of syntactically-significant whitespace sets it a little further apart from the mainstream than Java, which is comfortably familiar in its C-like use of braces and semi-colons.

Both languages are compiled down to bytecodes that run on virtual machines, although Python generally does this automatically at runtime and Java has a separate program (javac) that does it. The virtual machines largely isolate the languages from the vagaries of the underlying hardware. Many Java virtual machines (JVMs) have the ability to do just-in-time compilation of parts of the bytecode down to the native instruction set of whatever platform it happens to be running on, which can produce significant speed-ups.
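
For a concrete look at the Python side of this, the standard dis module shows the bytecode that CPython compiles at runtime (a quick sketch; the exact opcode names vary between Python versions):

```python
import dis

def add_one(x):
    return x + 1

# CPython compiled add_one to bytecode when the def statement ran;
# dis lets us inspect the result.
for instr in dis.Bytecode(add_one):
    print(instr.opname, instr.argrepr)
```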

Parsing on whitespace puts some people off Python. As someone who worked in SGML-based text processing, where I learned that “whitespace is not actually evil… it is just misunderstood”, I still came to Python quite reluctantly because I could not see how a language that depended on whitespace was a good idea. Once I got used to parsing on whitespace, it seemed the most natural thing in the world.

 

A Key Difference: Duck Typing

The biggest difference between the two languages is that Java is statically typed and Python is dynamically typed.

Python is strongly but dynamically typed. This means names in code are bound to strongly typed objects at runtime. The only condition on the type of object a name refers to is that it supports the operations required for the particular object instances in the program. For example, I might have two types Person and Car that both support operation “run”, but Car also supports “refuel”. So long as my program only calls “run” on objects, it doesn’t matter if they are Person or Car. This is called “duck typing” after the expression “if it walks like a duck and quacks like a duck, it’s a duck”.
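
A minimal sketch of that Person/Car example (the class and method names are just the hypothetical ones used above):

```python
class Person:
    def run(self):
        return "person running"

class Car:
    def run(self):
        return "engine running"

    def refuel(self):
        return "tank full"

def start(thing):
    # No type declaration needed: any object with a run() method works.
    return thing.run()

print(start(Person()))
print(start(Car()))  # Car's extra refuel() method is simply never asked for
```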

This makes Python very easy to write and not too bad to read, but difficult to analyze. Static type inference in Python is a known hard problem. The lack of type information in function signatures combined with support for operator overloading and just-in-time loading of modules at runtime means that the most common type inference algorithms have nothing to work with until the point in the program’s execution when the types are known anyway. The Hindley-Milner algorithm that is commonly used in functional languages like Haskell and ML depends on being able to know, for example, that certain operations are restricted to particular types. These languages also typically have function signatures that “seed” the algorithm with the type information for their arguments.

In Python, names have no strong binding to their type, and thanks to duck typing, function arguments can be used to pass in any object whose interface supports the operations required by the function. There is no reasonable way to determine the type of an argument in this case, which can be very powerful and convenient, and is a lot like how we use objects in the real world. In the real world I don’t generally care if I have a rock or a hammer: both have “hit()” interfaces that result in similar consequences when called.

Classes in object-oriented languages are meant to model concepts, but concepts are purely mental constructs that are essentially attitudes toward the concrete stuff of reality. Duck typing reflects this fact nicely. An object doesn’t have to “be” a particular type, it just has to be useable where a thing of that type might be useable. This can lead to surprises, but it’s a more accurate reflection of the categorical fluidity of human thought than is the rigid hierarchy imposed by more restrictive type systems.

 

A Downside Of Not Having Type Information

The downside is that not having type information means it can be hard to tell what is going on at any given place in the code, particularly when names are ambiguous, which they frequently are. To take an extreme case:

b.polish()

Does that do some kind of fine-tuning on the object, or translate it into a well-known Eastern European language? Or something else?

If “b” is ten layers down in a call stack it’s going to take a long time to answer that question, and if the upper parts of that call stack branch out and are called from many locations, any one of which could be passing in an object of a different type, you could be at it all day.
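
To make that concrete, here is a contrived sketch (both classes are invented for illustration): the call site alone gives no hint which “polish” runs.

```python
class Gemstone:
    def polish(self):
        return "buffed to a shine"

class Document:
    def polish(self):
        return "translated into Polish"

def deep_in_the_call_stack(b):
    # Reading only this line, the meaning of polish() is undecidable;
    # it depends entirely on what some distant caller passed in.
    return b.polish()

print(deep_in_the_call_stack(Gemstone()))
print(deep_in_the_call_stack(Document()))
```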

Java, in contrast, is statically typed. Names in Java are bound to types at compile time via explicit type declaration. This means many type errors that would result in a runtime error–and often a program crash–in Python get caught at compile time in Java. And you can tell at a glance what type of object a name is associated with in Java, which makes analysis by humans as well as compilers much easier.

The cost of this is that developers have to care about types. Java’s automatic type conversions are extremely limited, and the compiler insists that objects passed through interfaces be of a type convertible to the target type, either by inheritance or automatic type promotion. Java doesn’t permit the kind of implicit type conversion based on constructors that C++ does.

This means that Java depends critically on well-designed types, while Python requires very little type design. This is what makes Python a great prototyping language, and is also what makes it a good teaching language and a good language for people who aren’t software professionals. Professionals care about types, and actually enjoy threading the maze of arcane type rules imposed by strongly typed languages to create clean, powerful systems that are provably type-safe. Everyone else just wants to get their job done.

 

Python Lets You Get The Job Done

Python lets that happen: it is a language that gets out of the way and lets you get the job done. It sometimes feels like it’s a helpful assistant handing you tools. Need a tool to “download stuff” from the Web? Got that. Parse the results? Sure. Run a singular value decomposition on a sparse matrix? No problem. Need antigravity? We’re working on it…
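
A hedged sketch of that “download and parse” case using only the standard library (the fetch helper would need network access, so only the parsing half actually runs here, on a sample page):

```python
from urllib.request import urlopen
from html.parser import HTMLParser

def fetch(url):
    # Downloads a page; requires network access, so it is not called here.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

sample = '<html><body><a href="https://example.com">hi</a></body></html>'
collector = LinkCollector()
collector.feed(sample)
print(collector.links)
```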

Used with appropriate discipline and testing, Python can comfortably scale to large applications and high-powered Web services. We know this because people have done it. It’s also a great “glue” language for bringing together everything from Fortran-era numerical libs to statistical algorithms in R.

Finally, the Java design has an interesting mix of atomic (primitive) types built into the language. Python has no equivalent, but there is a superset of Python called Cython that allows users to specify types where having that information lets the compiler emit faster code. Careful use of this extended version of the language can result in considerable performance improvement.

Really finally, there is a version of Python called Jython that compiles to JVM byte-codes, allowing very simple integration of Python with Java, and which gives Python programmers access to Java’s deep and powerful libraries.

 

Worth A Look For Java Developers

Java was a big step forward in simplicity compared to C++, and many people rightly fell in love with it for that reason. Python is an even bigger step in the same direction, toward a simpler, more human-friendly tool for expressing our ideas in a form that machines can turn into reality.

Given all that, Java developers should give Python a look. It’s a great scripting language for automating boring and repetitive tasks, it’s a great embedded language for Java applications, and it’s a great alternative to Java in many situations. What’s not to love?

[Dynamic Programming] Making change using the fewest coins – recursive and iterative solutions.

http://interactivepython.org/runestone/static/pythonds/Recursion/DynamicProgramming.html

 

class Coins:
    def __init__(self, coinValueList):
        self.coinValueList=coinValueList
        self.minCoins=[]
        self.coinsUsed=[]

    def _recMkChange(self,change):
        minCoins = change+1
        if change in self.coinValueList:
            self.minCoins[change] = 1
            self.coinsUsed[change] = change
            return 1
        elif self.minCoins[change] > 0:
            return self.minCoins[change]
        else:
            for i in [ c for c in self.coinValueList if c <= change ]:
                numCoins = 1 + self._recMkChange(change-i)
                if numCoins < minCoins:
                    minCoins = numCoins
                    self.minCoins[change] = minCoins
                    self.coinsUsed[change] = i
        return minCoins

    def recMkChange(self,change):
        if change < 1: return (0,[])
        self.minCoins=[0]*(change+1)
        self.coinsUsed=[0]*(change+1)
        return (self._recMkChange(change), self._showChange(change))

    def _dpMakeChange(self,change):
        for cents in range(change+1):
            coinCount = cents
            newCoin = 1
            for j in [ c for c in self.coinValueList if c <= cents ]:
                if self.minCoins[cents-j] + 1 < coinCount:
                    coinCount = self.minCoins[cents-j]+1
                    newCoin = j
            self.minCoins[cents] = coinCount
            self.coinsUsed[cents] = newCoin
        return self.minCoins[change]

    def dpMakeChange(self,change):
        if change < 1: return (0,[])
        self.minCoins=[0]*(change+1)
        self.coinsUsed=[0]*(change+1)
        return (self._dpMakeChange(change), self._showChange(change))

    def _showChange(self,change):
        coin = change
        result = []
        while coin > 0:
            thisCoin = self.coinsUsed[coin]
            if thisCoin == 0: 
                break # infinite loop safeguard
            result.append(thisCoin)
            coin = coin - thisCoin
        return result

c = Coins([1, 5, 10, 25])
print(c.recMkChange(63))

print(c.dpMakeChange(63))

 

Finding the Longest Palindromic Substring in Linear Time

https://www.akalin.com/longest-palindrome-linear-time
© Fred Akalin 2005–2015.

Fred Akalin
November 28, 2007
Another interesting problem I stumbled across on reddit is finding the longest substring of a given string that is a palindrome. I found the explanation on Johan Jeuring’s blog somewhat confusing and I had to spend some time poring over the Haskell code (eventually rewriting it in Python) and walking through examples before it “clicked.” I haven’t found any other explanations of the same approach so hopefully my explanation below will help the next person who is curious about this problem.

Of course, the most naive solution would be to exhaustively examine all O(n²) substrings of the given n-length string, test each one to see if it is a palindrome, and keep track of the longest one seen so far. This has complexity O(n³), but we can easily do better by realizing that a palindrome is centered on either a letter (for odd-length palindromes) or a space between letters (for even-length palindromes). Therefore we can examine all 2n+1 possible centers and find the longest palindrome for that center, keeping track of the overall longest palindrome. This has complexity O(n²).
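
That quadratic center-expansion approach can be sketched as follows (my own illustration, separate from the naive version at the end of this post):

```python
def longest_palindrome_quadratic(s):
    """Try all 2n+1 centers, expanding outward from each: O(n^2)."""
    best = ""
    n = len(s)
    for center in range(2 * n + 1):
        left = center // 2          # start of the candidate palindrome
        right = left + center % 2   # odd centers sit on a letter
        while left > 0 and right < n and s[left - 1] == s[right]:
            left -= 1
            right += 1
        if right - left > len(best):
            best = s[left:right]
    return best

print(longest_palindrome_quadratic("abababa"))
```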

It is not immediately clear that we can do better, but if we’re told that a Θ(n) algorithm exists we can infer that the algorithm is most likely structured as an iteration through all possible centers. As an off-the-cuff first attempt, we can adapt the above algorithm by keeping track of the current center and expanding until we find the longest palindrome around that center, then considering the last letter (or space) of that palindrome as the new center. The algorithm (which isn’t correct) looks like this informally:

1. Set the current center to the first letter.
2. Loop while the current center is valid:
   a. Expand to the left and right simultaneously until we find the largest palindrome around this center.
   b. If the current palindrome is bigger than the stored maximum one, store the current one as the maximum one.
   c. Set the space following the current palindrome as the current center unless the two letters immediately surrounding it are different, in which case set the last letter of the current palindrome as the current center.
3. Return the stored maximum palindrome.
This seems to work but it doesn’t handle all cases: consider the string “abababa”. The first non-trivial palindrome we see is “a|bababa”, followed by “aba|baba”. Considering the current space as the center doesn’t get us anywhere but considering the preceding letter (the second ‘a’) as the center, we can expand to get “ababa|ba”. From this state, considering the current space again doesn’t get us anywhere but considering the preceding letter as the center, we can expand to get “abababa|”. However, this is incorrect as the longest palindrome is actually the entire string! We can remedy this case by changing the algorithm to try and set the new center to be one before the end of the last palindrome, but it is clear that having a fixed “lookbehind” doesn’t solve the general case and anything more than that will probably bump us back up to quadratic time.

The key question is this: given the state from the example above, “ababa|ba”, what makes the second ‘b’ so special that it should be the new center? To use another example, in “abcbabcba|bcba”, what makes the second ‘c’ so special that it should be the new center? Hopefully, the answer to this question will lead to the answer to the more important question: once we stop expanding the palindrome around the current center, how do we pick the next center? To answer the first question, first notice that the current palindromes in the above examples themselves contain smaller non-trivial palindromes: “ababa” contains “aba” and “abcbabcba” contains “abcba” which also contains “bcb”. Then, notice that if we expand around the “special” letters, we get a palindrome which shares a right edge with the current palindrome; that is, the longest palindrome around the special letters are proper suffixes of the current palindrome. With a little thought, we can then answer the second question: to pick the next center, take the center of the longest palindromic proper suffix of the current palindrome. Our algorithm then looks like this:

1. Set the current center to the first letter.
2. Loop while the current center is valid:
   a. Expand to the left and right simultaneously until we find the largest palindrome around this center.
   b. If the current palindrome is bigger than the stored maximum one, store the current one as the maximum one.
   c. Find the maximal palindromic proper suffix of the current palindrome.
   d. Set the center of the suffix from (c) as the current center and start expanding from the suffix, as it is palindromic.
3. Return the stored maximum palindrome.
However, unless step 2c can be done efficiently, it will cause the algorithm to be superlinear. Doing step 2c efficiently seems impossible since we have to examine the entire current palindrome to find the longest palindromic suffix unless we somehow keep track of extra state as we progress through the input string. Notice that the longest palindromic suffix would by definition also be a palindrome of the input string so it might suffice to keep track of every palindrome that we see as we move through the string and hopefully, by the time we finish expanding around a given center, we would know where all the palindromes with centers lying to the left of the current one are. However, if the longest palindromic suffix has a center to the right of the current center, we would not know about it. But we also have at our disposal the very useful fact that a palindromic proper suffix of a palindrome has a corresponding dual palindromic proper prefix. For example, in one of our examples above, “abcbabcba”, notice that “abcba” appears twice: once as a prefix and once as a suffix. Therefore, while we wouldn’t know about all the palindromic suffixes of our current palindrome, we would know about either it or its dual.

Another crucial realization is the fact that we don’t have to keep track of all the palindromes we’ve seen. To use the example “abcbabcba” again, we don’t really care about “bcb” that much, since it’s already contained in the palindrome “abcba”. In fact, we only really care about keeping track of the longest palindromes for a given center or equivalently, the length of the longest palindrome for a given center. But this is simply a more general version of our original problem, which is to find the longest palindrome around any center! Thus, if we can keep track of this state efficiently, maybe by taking advantage of the properties of palindromes, we don’t have to keep track of the maximal palindrome and can instead figure it out at the very end.

Unfortunately, we seem to be back where we started; the second naive algorithm that we have is simply to loop through all possible centers and for each one find the longest palindrome around that center. But our discussion has led us to a different incremental formulation: given a current center, the longest palindrome around that center, and a list of the lengths of the longest palindromes around the centers to the left of the current center, can we figure out the new center to consider and extend the list of longest palindrome lengths up to that center efficiently? For example, if we have the state:

<"ababa|??", [0, 1, 0, 3, 0, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]>

where the highlighted letter is the current center, the vertical line is our current position, the question marks represent unread characters or unknown quantities, and the array represents the list of longest palindrome lengths by center, can we get to the state:

<"ababa|??", [0, 1, 0, 3, 0, 5, 0, ?, ?, ?, ?, ?, ?, ?, ?]>

and then to:

<"abababa|", [0, 1, 0, 3, 0, 5, 0, 7, 0, 5, 0, 3, 0, 1, 0]>

efficiently? The crucial thing to notice is that the longest palindrome lengths array (we’ll call it simply the lengths array) in the final state is palindromic since the original string is palindromic. In fact, the lengths array obeys a more general property: the longest palindrome d places to the right of the current center (the d-right palindrome) is at least as long as the longest palindrome d places to the left of the current center (the d-left palindrome) if the d-left palindrome is completely contained in the longest palindrome around the current center (the center palindrome), and it is of equal length if the d-left palindrome is not a prefix of the center palindrome or if the center palindrome is a suffix of the entire string. This then implies that we can more or less fill in the values to the right of the current center from the values to the left of the current center. For example, from [0, 1, 0, 3, 0, 5, ?, ?, ?, ?, ?, ?, ?, ?, ?] we can get to [0, 1, 0, 3, 0, 5, 0, ≥3?, 0, ≥1?, 0, ?, ?, ?, ?]. This also implies that the first unknown entry (in this case, ≥3?) should be the new center because it means that the center palindrome is not a suffix of the input string (i.e., we’re not done) and that the d-left palindrome is a prefix of the center palindrome.

From these observations we can construct our final algorithm which returns the lengths array, and from which it is easy to find the longest palindromic substring:

1. Initialize the lengths array to the number of possible centers.
2. Set the current center to the first center.
3. Loop while the current center is valid:
   a. Expand to the left and right simultaneously until we find the largest palindrome around this center.
   b. Fill in the appropriate entry in the longest palindrome lengths array.
   c. Iterate through the longest palindrome lengths array backwards and fill in the corresponding values to the right of the entry for the current center until an unknown value (as described above) is encountered.
   d. Set the new center to the index of this unknown value.
4. Return the lengths array.
Note that at each step of the algorithm we’re either incrementing our current position in the input string or filling in an entry in the lengths array. Since the lengths array has size linear in the size of the input array, the algorithm has worst-case linear running time. Since given the lengths array we can find and return the longest palindromic substring in linear time, a linear-time algorithm to find the longest palindromic substring is the composition of these two operations.

Here is Python code that implements the above algorithm (although it is closer to Johan Jeuring’s Haskell implementation than to the above description):

def fastLongestPalindromes(seq):
    """
    Behaves identically to naiveLongestPalindrome (see below), but
    runs in linear time.
    """
    seqLen = len(seq)
    l = []
    i = 0
    palLen = 0
    # Loop invariant: seq[(i - palLen):i] is a palindrome.
    # Loop invariant: len(l) >= 2 * i - palLen. The code path that
    # increments palLen skips the l-filling inner-loop.
    # Loop invariant: len(l) < 2 * i + 1. Any code path that
    # increments i past seqLen - 1 exits the loop early and so skips
    # the l-filling inner loop.
    while i < seqLen:
        # First, see if we can extend the current palindrome.  Note
        # that the center of the palindrome remains fixed.
        if i > palLen and seq[i - palLen - 1] == seq[i]:
            palLen += 2
            i += 1
            continue

        # The current palindrome is as large as it gets, so we append
        # it.
        l.append(palLen)

        # Now to make further progress, we look for a smaller
        # palindrome sharing the right edge with the current
        # palindrome.  If we find one, we can try to expand it and see
        # where that takes us.  At the same time, we can fill the
        # values for l that we neglected during the loop above. We
        # make use of our knowledge of the length of the previous
        # palindrome (palLen) and the fact that the values of l for
        # positions on the right half of the palindrome are closely
        # related to the values of the corresponding positions on the
        # left half of the palindrome.

        # Traverse backwards starting from the second-to-last index up
        # to the edge of the last palindrome.
        s = len(l) - 2
        e = s - palLen
        for j in range(s, e, -1):
            # d is the value l[j] must have in order for the
            # palindrome centered there to share the left edge with
            # the last palindrome.  (Drawing it out is helpful to
            # understanding why the - 1 is there.)
            d = j - e - 1

            # We check to see if the palindrome at l[j] shares a left
            # edge with the last palindrome.  If so, the corresponding
            # palindrome on the right half must share the right edge
            # with the last palindrome, and so we have a new value for
            # palLen.
            if l[j] == d:
                palLen = d
                # We actually want to go to the beginning of the outer
                # loop, but Python doesn't have loop labels.  Instead,
                # we use an else block corresponding to the inner
                # loop, which gets executed only when the for loop
                # exits normally (i.e., not via break).
                break

            # Otherwise, we just copy the value over to the right
            # side.  We have to bound l[i] because palindromes on the
            # left side could extend past the left edge of the last
            # palindrome, whereas their counterparts won't extend past
            # the right edge.
            l.append(min(d, l[j]))
        else:
            # This code is executed in two cases: when the for loop
            # isn't taken at all (palLen == 0) or the inner loop was
            # unable to find a palindrome sharing the left edge with
            # the last palindrome.  In either case, we're free to
            # consider the palindrome centered at seq[i].
            palLen = 1
            i += 1

    # We know from the loop invariant that len(l) < 2 * seqLen + 1, so
    # we must fill in the remaining values of l.

    # Obviously, the last palindrome we're looking at can't grow any
    # more.
    l.append(palLen)

    # Traverse backwards starting from the second-to-last index up
    # until we get l to size 2 * seqLen + 1. We can deduce from the
    # loop invariants we have enough elements.
    lLen = len(l)
    s = lLen - 2
    e = s - (2 * seqLen + 1 - lLen)
    for i in range(s, e, -1):
        # The d here uses the same formula as the d in the inner loop
        # above.  (Computes distance to left edge of the last
        # palindrome.)
        d = i - e - 1
        # We bound l[i] with min for the same reason as in the inner
        # loop above.
        l.append(min(d, l[i]))

    return l
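
Recovering the actual substring from the returned lengths array takes one more small helper (my addition, not part of the original code; it relies on entry k of the array describing the center at position k/2 of the string):

```python
def longest_from_lengths(seq, lengths):
    # lengths[k] is the length of the longest palindrome centered at
    # position k/2 (between characters when k is even, on a character
    # when k is odd), so a palindrome of length m at index k starts at
    # (k - m) // 2.
    k = max(range(len(lengths)), key=lambda j: lengths[j])
    pal_len = lengths[k]
    start = (k - pal_len) // 2
    return seq[start:start + pal_len]

# Hand-computed lengths array for "aba": centers |a|b|a| -> [0,1,0,3,0,1,0]
print(longest_from_lengths("aba", [0, 1, 0, 3, 0, 1, 0]))
```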

And here is a naive quadratic version for comparison:

def naiveLongestPalindromes(seq):

    seq = seq.replace(' ', '')
    seq = seq.lower()    

    seqLen = len(seq)
    lLen = 2 * seqLen + 1
    l = []
    maxLen=0
    maxPos=0

    for i in range(lLen):
        # If i is even (i.e., we're on a space), this will produce e
        # == s.  Otherwise, we're on an element and e == s + 1, as a
        # single letter is trivially a palindrome.
        s = i // 2
        e = s + i % 2

        # Loop invariant: seq[s:e] is a palindrome.
        while s > 0 and e < seqLen and seq[s - 1] == seq[e]:
            s -= 1
            e += 1

        l.append(e - s)

        if e - s > maxLen:
            maxLen = e-s
            maxPos = len(l) // 2 + len(l) % 2

    print(maxPos, maxLen, seq[maxPos - maxLen//2 - 1 : maxPos + maxLen//2])

    return l

print(fastLongestPalindromes("Are we not drawn onward to new era?"))

print(naiveLongestPalindromes("Are we not drawn onward to new era?"))

python alternative to grep sort uniq

#!/usr/bin/python
import os
import re
from operator import itemgetter
from collections import defaultdict

count_max = 24000
mail_to = "mailbox@domain"
mail_subj = "spammers"
SENDMAIL = "/usr/sbin/sendmail"
data = defaultdict(int)
notify = False

# Count pickup lines per sender; flag when any sender exceeds count_max.
p = re.compile(r'pickup.*<(\w+)>')
for line in open('/var/log/mail/mail.log.0', 'r'):
    s = p.search(line)
    if s:
        data[s.group(1)] += 1
        if not notify and data[s.group(1)] > count_max:
            notify = True

if notify:
    m = os.popen("%s -t" % SENDMAIL, "w")
    m.write("To: %s\n" % mail_to)
    m.write("Subject: %s\n" % mail_subj)
    m.write("\n")
    for sender, count in sorted(data.items(), key=itemgetter(1), reverse=True):
        if count > count_max:
            m.write("%7d %s\n" % (count, sender))
            # print("%7d %s" % (count, sender))
    m.close()