Saturday, May 30, 2009

Mailman - Python powered mailing list manager

This post is for list administrators who want to add custom handlers to their mailing lists, and for anyone curious about what Mailman is all about.

First, I would like to congratulate all the people at GNU for churning out such useful *and* free software (read: Free Software Foundation). Hats off to Stallman for the phenomenon called GNU and the FSF!

So what is Mailman all about? It's a mailing list manager... something very similar to a Google Group. When you start a Google Group you get an account where you can make changes, moderate and do stuff... but there's a limit to how much you can customize! The more common and professional way to manage mailing lists is to use Mailman (or another mailing list manager)! My way of setting up the server was a complicated one though... it included the following steps:

1. Deploying a Mail Transfer Agent locally. I used Postfix; other choices are Exim and Sendmail.
2. Installing Mailman on the same server. There are a lot of permission issues.... be really aware of them :D
3. Testing the server and the Mailman installation. If everything is right, move on to the next step!
4. Customizing the installation with the filters you would like to have. Mailman has an easy interface for writing plugins in Python.
5. Getting a global domain so that servers like Google and Yahoo can connect to me on port 25 (SMTP).
6. Initializing lists and populating members through the web interface (the coolest part ;) ).

I'll take up step 4 for you guys.... the rest of the steps are more about installing things and negotiating with your ISP to grant you a DNS entry :D (the ISPs on our side are cool enough ;) ).

Mailman's architecture has a feature called the pipeline. Look for the file Defaults.py (mailman/Mailman/Defaults.py, somewhere under /usr/lib, /var/lib or /usr/local/lib); it has a setting variable called GLOBAL_PIPELINE. This is the generic list of handlers every mail has to pass through, in order, though there can be exceptions for particular lists or for mails sent directly to the admin. You just need to place your own handler at a suitable spot. No, no, don't edit the Defaults file; do this instead:

in mm_cfg.py add:
GLOBAL_PIPELINE.insert(GLOBAL_PIPELINE.index('Hold'), 'MyContentHandler')

This adds a handler called MyContentHandler BEFORE Hold. The corresponding file, MyContentHandler.py, should be placed in the Mailman/Handlers/ folder.

[Remember: whenever you make a change to the MyContentHandler.py file, reload Mailman. A workaround could be to compile it locally... do as you wish!]
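To see what that one-liner does without touching a live install, here is a minimal sketch. The list below is a shortened stand-in I made up for illustration, not the real default pipeline, and insert_before is a hypothetical helper, not part of Mailman. In the real mm_cfg.py, GLOBAL_PIPELINE comes from Defaults.py.

```python
# Shortened stand-in for Mailman's GLOBAL_PIPELINE (illustrative only;
# the real list in Defaults.py is longer).
GLOBAL_PIPELINE = ['SpamDetect', 'Approve', 'Moderate', 'Hold', 'MimeDel', 'ToOutgoing']


def insert_before(pipeline, existing, new_handler):
    """Insert new_handler directly before existing, unless it is already there.

    Hypothetical helper for this sketch; Mailman itself just uses list.insert.
    """
    if new_handler not in pipeline:
        pipeline.insert(pipeline.index(existing), new_handler)
    return pipeline


insert_before(GLOBAL_PIPELINE, 'Hold', 'MyContentHandler')
print(GLOBAL_PIPELINE)
```

The "already there" check is just a guard so that running the same config line twice does not add the handler twice.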

I'll post the handler I wrote... :

#Author: Sanket Agarwal
#Email: sketagarwal@gmail.com

"""Determines whether the content of a message passed to Mailman contains any 'restricted' words!"""

#All imports taken from Hold.py
#TODO: Eliminate the unnecessary ones
import email
import email.Utils
import re
from email.MIMEText import MIMEText
from email.MIMEMessage import MIMEMessage
from types import ClassType

from Mailman import mm_cfg
from Mailman import Utils
from Mailman import Errors
from Mailman import Message
from Mailman import i18n
from Mailman import Pending
from Mailman.Handlers import Hold
from Mailman.Logging.Syslog import syslog

# First, play footsie with _ so that the following are marked as translated,
# but aren't actually translated until we need the text later on.
def _(s):
    return s

#The custom error class
class ForbiddenContent(Errors.HoldMessage):
    """This class defines our custom error for filtered content"""
    reason = _('The content posted was caught in the filters!')
    rejection = _('This message cannot be accepted.')

# And reset the translator
_ = i18n._

#The malicious content filter

def is_illegal(thread_text):
    """Return True if a piece of illegal/illicit content is found in the message text. The whole thread has to be parsed, unfortunately, as the sender can easily modify the message in case we neglect the history of the thread."""
    # re.search scans the whole text; re.match would only look at its start
    find_match = re.search(r'\w*bad_word\w*', str(thread_text))
    return find_match is not None  #True means restricted content was found


def process(mlist, msg, msgdata):
    #Experimentation period!
    # A multipart message gives a list of parts (take the first one);
    # a plain message gives the body as a single string.
    payload = msg.get_payload()
    if msg.is_multipart():
        thread_text = payload[0]
    else:
        thread_text = payload
    if is_illegal(thread_text):
        # Hold the post for the list owner's approval.
        # To bounce it outright instead of holding it, use:
        # raise Errors.RejectMessage, "Your message was perhaps marked as containing some restricted content... such strikes might ban you from the list! \n -Sanket \n List Admin"
        Hold.hold_for_approval(mlist, msg, msgdata, ForbiddenContent)


Pretty cool huh ?

You can find a pigmented (syntax-highlighted) version on my website.... http://maillist-cse.iitkgp.ernet.in/kgp/cab/4/

So referring to the pigmented version lemme take you through the code:

Ln: 1-2 , Thank you guys :D
Ln: 6-22: Pretty much from Hold.py. I had to hold the messages that were caught... so that's why Hold.py
Ln: 40-58: This is the logic which checks if the message text contains bad content... using regular expressions is the best bet! Have a look at the Python re docs for details.
Ln: 30-33: Extends Errors.HoldMessage; holding a message means that it will go for moderation!
Ln: 50: A must-have method, every handler has it... so should yours!
Ln: 52: Extracts the text part of the message; look at the Python email.message docs for details
Ln: 54: Here you go! The message shall go to the list-owner for further approval!
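If you want to play with the matching logic outside Mailman first, here is a minimal standalone sketch using only the standard library. The names text_of and RESTRICTED and the sample message are my own assumptions for illustration; the pattern is the placeholder 'bad_word' from the handler, not a real word list.

```python
import email
import re

# Placeholder pattern, same idea as in the handler (assumption: swap in
# whatever your list should actually catch).
RESTRICTED = re.compile(r'\w*bad_word\w*', re.IGNORECASE)


def text_of(msg):
    """Join the decoded text/plain parts of a message into one string."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            payload = part.get_payload(decode=True)
            if payload:
                charset = part.get_content_charset() or 'utf-8'
                chunks.append(payload.decode(charset, 'replace'))
    return '\n'.join(chunks)


def is_illegal(text):
    """True if the restricted pattern appears anywhere in the text."""
    return RESTRICTED.search(text) is not None


raw = "From: a@example.com\nSubject: hi\n\nthis has a bad_word inside\n"
msg = email.message_from_string(raw)
print(is_illegal(text_of(msg)))
```

Walking over all text/plain parts, rather than looking at the first payload only, means quoted history further down a multipart mail gets checked too.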

I liked this plugin thing because it lets me write filters just for Mailman, or, for that matter, restrict a script to a particular mailing list (I'll give the link... keep reading). You might not have the server write privileges to properly set up SpamAssassin or some other mail filter, but you've still got the Power of Python and the Source of Mailman. Go here for a more technically sound version... but you won't find the code anywhere else ;)

http://wiki.list.org/pages/viewpage.action?pageId=4030615


"""May the source be with you"""

Friday, May 29, 2009

Mixed Bag

This is what happens when you think too much about the title of your post... you end up with such a gross one, "Mixed Bag", eh?

But as I always talk about what I've done in the last few days or weeks, perhaps a fortnight... this time I really have some mixed stuff in store.

West Bengal.... the first thing that might come to your mind is Mr Karat, who lost the battleground to the Gandhi Family but what comes to my mind is not the Communist but something way more dreadful, something which you can't escape... not even in your sweetest of sleeps[non wet ;)], a nightmare and day killer.... the HUMIDITY!!!!

When you've got 90-95% average humidity and rainfalls whose relief lasts barely an hour, life's real tough. But as you might have heard, Cyclone Aila hit West Bengal... though it didn't cause any regrettable damage, the trails are still visible, and the roads say it all... here are some pics of the same:

Just outside my hall, RK Hall of Residence

Water cluttering the ground... things were much worse in night

One of the "not so common" areas of the campus!


The trails of the monster are visible, but as the campus is fairly scarcely populated nowadays, the aftermath was not a big deal to handle :D.

But there are more important things in life than L[Aila], and being a bit of a workaholic, I made some progress on my present work.

If you've followed my posts... I am presently working on setting up a mail server, a web server and an NLP project.

Well today I am going to discuss a bit about how domains are maintained, which is of direct consequence to mail/web or any sort of networking!

Consider the familiar www.google.com and www.yahoo.com, or the less familiar cse.mit.edu.
The names above are all in regular use... but the concepts that make them work consistently are pretty interesting.

So we know that when we type an address into Firefox, it gets translated to an IP address.... if you didn't, now you know.... :). The host name in every human-readable URL (Uniform Resource Locator) converts to the computer format xxx.xxx.xxx.xxx, e.g. 203.9.8.17 or 77.45.32.22. I use a tool called 'nslookup' to do such queries... lemme take you through some output:

###############################################
www.google.com canonical name = www.l.google.com.
Name: www.l.google.com
Address: 209.85.153.104
###############################################
Name: google.com
Address: 209.85.171.100
Name: google.com
Address: 74.125.45.100
Name: google.com
Address: 74.125.67.100
###############################################
Name: gmail.com
Address: 64.233.161.83
Name: gmail.com
Address: 74.125.79.83
Name: gmail.com
Address: 209.85.171.83
################################################
mail.google.com canonical name = googlemail.l.google.com.
Name: googlemail.l.google.com
Address: 209.85.153.83
#################################################

So I made four IP addr lookups... www.google.com, google.com, mail.google.com, gmail.com

For a n00b, www.google.com and google.com are the same... aren't they? They don't make a difference when put in a browser ;). So are mail.google.com and gmail.com!

But in fact the above results show that something peculiar is going on!

This is what actually happens when finding the IP address behind a web address. Suppose I put in cse.iitkgp.ernet.in and try to find the IP address... once the root servers have pointed the way, my computer will contact the name servers for "in", which know about the domains that end with .in, e.g. ernet.in, gov.in etc.

As our target name ends in ernet.in, we'll get the IP address of the ernet.in domain.
###########################################
Non-authoritative answer:
Name: ernet.in
Address: 202.41.97.64
###########################################

The DNS (Domain Name System) servers of ernet.in will have entries for the names directly under ernet.in, e.g. iitkgp.ernet.in, BUT NOT cse.iitkgp.ernet.in, because it is two levels deeper!

So we go to: iitkgp.ernet.in

###########################################
Name: iitkgp.ernet.in
Address: 144.16.192.55
###########################################

Then we'll go to cse.iitkgp.ernet.in

###########################################
Name: cse.iitkgp.ernet.in
Address: 144.16.192.57
###########################################

This is the order of the traversal... the IPs are just there to show that the long names are not just a coincidence... Suppose you have the domain myspace.com: anything preceding it falls under your domain, and you decide who gets it! So a name like user1.myspace.com is controlled by myspace.com, and user11.user1.myspace.com by user1... if user1 has been given that full subdomain (there are some caveats here).
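The order of traversal can be sketched with plain string manipulation. This makes no DNS queries at all; delegation_chain is a hypothetical helper name, and it assumes every label is its own level in the hierarchy, which real zones do not always follow.

```python
def delegation_chain(hostname):
    """List the names visited from the top-level domain down to the full name.

    Pure string manipulation for illustration; no network lookups happen here.
    """
    labels = hostname.rstrip('.').split('.')
    # Start with the last label alone, then keep adding one label to the left.
    return ['.'.join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


for name in delegation_chain('cse.iitkgp.ernet.in'):
    print(name)
```

For the example name this prints in, ernet.in, iitkgp.ernet.in and cse.iitkgp.ernet.in, one per line, which is the same walk as in the lookups above.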

This explains the difference b/w google.com and www.google.com. google.com is looked up with the DNS servers of com, and www.google.com with the DNS servers of google.com! It's not compulsory for a website's address to start with www; it's just a convention... I am not sure how Firefox guesses whether www should be prepended, but it should be nothing more than mere string manipulation!

Just as DNS translates the host names in URLs, it also does the job for mail addresses. There's a specific entry in the DNS (the MX record) telling who handles the mail for a particular domain... e.g. gmail.com:

##############################################
gmail.com mail exchanger = 30 alt3.gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 40 alt4.gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 5 gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 10 alt1.gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 20 alt2.gmail-smtp-in.l.google.com.
##############################################

A domain might have many [M]ail e[X]changer (MX) records, so that if one goes down... another picks up the mantle. The number before each host is its preference: senders try the lowest number first.
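To see the order in which a sending server would try these hosts, here is a small sketch that parses the nslookup lines shown above and sorts them by preference. parse_mx and MX_LINE are hypothetical names for this example, and the regular expression only assumes the "mail exchanger = N host" layout seen in that output.

```python
import re

# The nslookup output from above, copied as-is.
NSLOOKUP_OUTPUT = """\
gmail.com mail exchanger = 30 alt3.gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 40 alt4.gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 5 gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 10 alt1.gmail-smtp-in.l.google.com.
gmail.com mail exchanger = 20 alt2.gmail-smtp-in.l.google.com.
"""

MX_LINE = re.compile(r'^(\S+)\s+mail exchanger = (\d+)\s+(\S+)$')


def parse_mx(text):
    """Return (preference, host) pairs, lowest preference number first."""
    records = []
    for line in text.splitlines():
        m = MX_LINE.match(line.strip())
        if m:
            records.append((int(m.group(2)), m.group(3).rstrip('.')))
    return sorted(records)


for pref, host in parse_mx(NSLOOKUP_OUTPUT):
    print(pref, host)
```

The host with preference 5 comes out first, and the alt1 to alt4 hosts follow as fallbacks.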

The above description is not by any means theoretically perfect or practically sound, but it does give an idea of how the Internet is one big database... each domain holds the information for the domains beneath it in the hierarchy! Setting up a mail server has enlightened me about the Internet structure of my institute... and as it is one of those "restricted" and "secure" networks that is handled very carefully, working things out with the ISP was not an easy task at all... my mailing list server is up and running... you can catch a glimpse of the web interface at:

http://maillist-cse.iitkgp.ernet.in/cgi-bin/mailman/listinfo/

Suggestions or accolades on the mailing list? Please do comment [I am looking for content filters, in case you have an idea]!

Well, the second thing I've been on a high about is the website I've been developing with one of my "brilliant" friends... we've worked most of the stuff out... you can catch a glimpse of it at:

http://maillist-cse.iitkgp.ernet.in/kgp/

The URLs might not work properly, but I'll fix them by the time you visit the link.... do have a look :)!




Saturday, May 16, 2009

Knock Knock...... it's the mailman!!

If you had a brush through my last post, you would have noticed that there are a lot of things I have been doing. Though, being practical, one can easily realise that doing all of that simultaneously is GROSS! You just can't do all of that....... if you can, then you're most welcome to write this post!

I took up setting up the mailing list server as my first task. Lemme tell you what it is all about... all you guys interested in techie talk, here we go:

Mailman is a mailing list manager. All of you must have heard of Google Groups... where you can post common mails on different threads. A mailing list is the more generic version of Google Groups, generally with a cruder and less friendly interface. Though it is much more elegant in that you get MUCH more control over who posts, how they post, setting up spam filters, blocking out open relays and all that sort of crap.

I have done quite a bit of fiddling with this tool and have a copy of it running safe and sound on a local server. I used Postfix as the Mail Server( MTA ). There's a request for a global IP to be assigned to me so that I can test my postfix installation with the more common Gmail, Yahoo or Rediff servers on the NET.

It's more of a legacy that will be left behind; the main purpose of setting up this kind of server was to maintain an archive of all the discussions that take place on some topic. It is generally found that these discussions can be helpful later... maybe in some research, or for someone interested in exploring!

Let's hope I can finish it up ASAP :D!

The second thing that I have dived into is Django.... sincerely speaking, it roxx! You cannot get enough, and when I had a completely working site with a Forum, Blog, Code Post Space and generic features like Login/Sign Up..... I was taken aback. Well, Sumit has been quite a handful... having done all the CSS and templates part, he's been the guy to watch out for.

The MaximumEntropy part........ ummm... haven't really got into that much, though Rohit has been pretty upbeat about it.... but I'll get to it once I finish the mailing list business.

Ciao :)

Friday, May 8, 2009

Maximum Entropy

The previous post had a nice time sitting at the top; many factors like End Semester, then pressure to get a project, and the society work have led to a lag in the blog roll! Let's let it roll :D

These few weeks, maybe 1 or 2, have been pretty random in terms of the activities I have done, ranging from complete fart to some real research mathematics. But yeah, I am not sure how things will converge by the month's end, when I plan to leave Kgp and join my friends back home.

I'll take you through a detour of the activities I pursued during these few days, not chronologically but logically :P

1. Well, the first thing that I always had in mind was to do something in web development. I've been pretty much scared of it due to lack of information and having done no server programming! Having your own society and working for it is turning out to be a cool detour. But as you might guess, the web was never a single-handed task... you need to design the backend, the fu**** HTML code and of course use a connecting interface like PHP!

Cut the crap, people, who does all that stuff nowadays? Being a vigilant open source follower, I had heard about a Python web framework called Django (www.djangoproject.com). It has an MVC (Model View Controller) type of architecture. MVC is fancy jargon for a clean and modular way to build applications. In our case the database (Model), HTML (View) and the logic (Controller) parts are forcibly separated from each other.

The coolest thing about Django -> you never need to see database programming <- totally :D. And the hottest thing... the admin interface is already there, a template good enough to populate your blogs or file upload lists. Of course, programmers are most reluctant to build cool admin interfaces for themselves :P, you can't have your cake and eat it too. Django comes to your rescue and gives you a default interface!


Django Admin Interface (database layer developed by me)
I am definitely going to use this framework for the future development of the site. Python is the future, and anything that flows with it becomes gold!


2. Voila, I started poetry again. Well, this is a very unlikely part of me, but I do write some very gross poetry! I won't go into the details because poets have long and tragic detours... anyways. So I have started again and that's what counts... though I have written only one, I hope to gain pace as time progresses... it's tough to get solace in hot and humid Kgp :P

3. Literature again, I am scoring pretty high ;), Dan Brown! Yes, the master plotter is back with all his wits and an enthralling display of writing skills. I have got my hands on the "not so" celebrated book.... Digital Fortress... having completed 90% till date, I hope to gobble up the remaining part in the nick of time. But he totally rox ;).

4. Mailing list and Admin work: If you have been a part of a Google Group or a "mailing list", to be more professional, you might have observed the simplicity of the idea but its great utility. Well, some of the department seniors had suggested establishing a local archive of mails and discussions via a mailing list server. I am onto it right now and can be found sitting in my cabin in the research lab of the CSE department ;) (Seriously, people!). Haven't actually got a great hint of how exactly I will be doing it, but challenges are common, I think, and there's nothing you can do to avoid them... so just live with them.... hehe...

But tweaking the DNS and understanding the principles behind the "firewalls" that so often stop malicious code from breaching in is very enlightening! Kudos to the CSE dept for being supportive (so far) :D

5. Last but not least... the reason I am rubbing my Ass in 45 deg Celsius and 90% humidity. God, yes, a damned project. Coming to the domain, it is in the field of Natural Language Processing. The work is pretty straightforward and I feel it'll turn out to be very interesting to implement the ideas. The best thing is that we've got a target, and a short-term one at that. This helps us see the direction and gain motivation.

But why did I name this post "Maximum Entropy"? Interestingly enough, the NLP (Natural Language Processing) project is aimed at working on a modelling method called MaxEnt (Maximum Entropy Modelling). And a brief definition of the same will explain why this post is named so :)


"When trying to model an environment, try to take all factors into account but assume nothing about what you are not sure. Thus you maximize the entropy, or freedom, of the system. Such a model is perfect as it'll never give strong and incorrect result. It might refuse to give a affirmative answer, but never an incorrect answer"

Well, in this article I think I didn't concentrate on any one topic... rather jumped and bounced over Literature, Web Devel, Mailservers, Poetry and Mathematics! Thus gaining maximum entropy :)
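For the curious, the "assume nothing about what you are not sure of" idea in the definition above can be made a little more concrete. This is only a toy sketch of Shannon entropy, not the MaxEnt modelling method from the project; the two distributions are numbers I made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Four possible outcomes and no information favouring any of them:
uniform = [0.25, 0.25, 0.25, 0.25]
# The same four outcomes, but with an (unjustified) strong opinion:
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))
print(entropy(skewed))
```

With nothing known about the four outcomes, the uniform distribution has the highest entropy, so it is the one that assumes the least. The skewed one scores lower because it commits to a belief it has no evidence for.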