Hi, I'm new to both Pythonista and this forum. I'm also fairly new to Python, having spent the last 15 years or so in the Enterprise Java world. I am very interested in AI in general, and in computational linguistics/natural language analysis at the hobby or after-work level.
I have seen the discussions about NLTK in this forum, and I too would very much like to see it supported. Since it is a pure Python library, I thought I would try it out in Pythonista, so I put together a little test harness. You can see it or grab it here - NLTK Test script
The reason I post this (my first post to this forum) here in New Discussion rather than in the Share Code section is that the general discussion about NLTK is here in this section of the forum.
Here is what I found of interest from my little test script:
1 - You can run the NLTK data set downloader in non-graphical, command-line/interactive mode right from Pythonista, and that's how I downloaded my data.
2 - You don't need to download all of the hundreds of corpora/data sets, only the ones you are interested in, and most are only a few megabytes. The ENTIRE set when unarchived is about 1.5 GB. I have a 120 gig iPad 4, so this was not really an issue.
3 - You can put the data sets anywhere you like, provided you set the NLTK_DATA environment variable to the location of nltk_data. That means even on a non-jailbroken device there should be somewhere you can put them. For my test, I located the data inside the Pythonista app itself, since my device is jailbroken.
4 - I noticed that I only needed to run a script that sets the NLTK_DATA environment variable once. On subsequent runs I could comment out that section of my test case and NLTK was still able to find the data. I even shut down the Pythonista process, started it again, and ran the script without explicitly setting the variable, and it still worked. This leads me to believe that NLTK is persisting the data path somewhere, so it seems feasible to keep a separate script in your library just for setting the data path whenever you move nltk_data or add to it in a new location.
5 - I tried some other more involved sample scripts that used the same (Brown) data set and I found the load and execute times very reasonable. I did not see any of the 30s load times described elsewhere in the forum.
6 - Support for numpy, scipy and a number of other libraries would clearly extend the utility of NLTK, and we can currently only run pure Python libs. Even so, NLTK by itself gives me the tools I need to build some serious applications in Pythonista rather than mere toy apps. Numpy and SciPy will be welcome additions, however.
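For anyone who wants to try points 1 to 4, here is a minimal sketch of the kind of setup script I mean. It is not the test harness linked above, just an illustration. The variable NLTK reads is spelled NLTK_DATA. The DATA_DIR location and the helper names configure_nltk_data and load_brown_sample are my own placeholders, so substitute whatever writable directory works on your device.

```python
import os

# Hypothetical location: any writable directory on your device will do.
DATA_DIR = os.path.join(os.path.expanduser("~"), "Documents", "nltk_data")


def configure_nltk_data(data_dir=DATA_DIR):
    """Create data_dir if needed and point the NLTK_DATA variable at it."""
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    # Set the variable before NLTK is imported so it is seen at import time.
    os.environ["NLTK_DATA"] = data_dir
    return data_dir


def load_brown_sample(data_dir=DATA_DIR, count=10):
    """Download the Brown corpus into data_dir and return its first words."""
    configure_nltk_data(data_dir)
    import nltk  # imported here so the variable above is already set

    # Also register the directory explicitly, in case NLTK was imported earlier.
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    nltk.download("brown", download_dir=data_dir)

    from nltk.corpus import brown
    return list(brown.words()[:count])
```

Calling load_brown_sample() once should fetch only the Brown data set rather than the whole 1.5 GB collection. After that, configure_nltk_data() on its own is the "separate script just to set the data path" idea from point 4.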