JEdit For Mac



Once you've downloaded the Mac OS X package, open jEdit.dmg. When the jEdit disk image appears on your desktop, copy the jEdit folder on it to your Applications folder (or wherever you'd like to install it). Then simply run the jEdit application in the jEdit folder, and code away. As the name suggests, jEdit is a text editor written in Java, aimed specifically at programmers. The GUI is built on the Swing toolkit, and open source plugins let users extend it into a powerful IDE of their own. Because it is written in Java, it runs across Windows, Mac, Unix, and VMS.

Written by: Rasmus Wernersson

Background: data in plain text format


In bioinformatics it's very common to have the data hosted in simple plain text format. For example:
>pigeon_alpha-globin-D
ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCACCCAGACTGTG
GAGCCGAGGCCCTGGAGAGGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACTT
GCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGGCCGCCTTGGGCAACGCTGTCAAG
AGCCTGGGCAACCTCAGCCAAGCCCTGTCTGACCTCAGCGACCTGCATGCCTACAACCTGCGTGTCGACC
CTGTCAACTTCAAGCTGCTGGCGCAGTGCTTCCACGTGGTGCTGGCCACACACCTGGGCAACGACTACAC
CCCGGAGGCACATGCTGCCTTCGACAAGTTCCTGTCGGCTGTGTGCACCGTGCTGGCCGAGAAGTACAGA
TAA

The same approach is usually also used for other kinds of data - lists of gene names, statistics on DNA patterns etc. The main idea is to keep everything simple and open. That way it will be easy to use the data as input for different kinds of programs, and to write simple scripts (small programs) that read some kind of input, perform some sort of analysis and output the result in a readable manner.
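To give an idea of how small such a script can be, here is a minimal sketch in Python that reads sequence data in the format shown above and reports the length of each sequence. The function name `read_fasta` and the shortened example record are made up for this illustration and are not part of the exercise material.

```python
def read_fasta(text):
    """Parse FASTA-style text into a dict mapping sequence name -> sequence."""
    records = {}
    name = None
    # splitlines() copes with Unix, Windows and old Mac line endings alike
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            name = line[1:]          # header line: everything after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence line belonging to 'name'
    # join the individual sequence lines into one string per record
    return {n: "".join(parts) for n, parts in records.items()}


# Shortened, illustrative record (not the full pigeon sequence)
example = """>pigeon_alpha-globin-D
ATGCTGACC
GACTCTGAC
TAA
"""

records = read_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Because the input is plain text, nothing more than basic string handling is needed.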

How difficult can it be? Text is text, right?

There are two main concerns when speaking about text files:
1) Plain text vs. Rich text /MS Word / Word Perfect / etc.
A number of file formats exist that can contain text - usually in a nicely formatted manner, with embedded graphics and other fancy features. The problem here is twofold:
  • A lot of irrelevant information is added (visualized below): We simply don't care if the DNA sequence is in BOLD or a fancy font.
  • Even worse, there is no standard way to ignore this extra information, meaning an MS Word file CANNOT be used as input to our sequence analysis programs.

2) Different interpretations of 'plain text'.
In the most widely used type of text files ('old school' text) each letter is represented by one byte (8 bits) = 256 possible symbols. How each numerical value is interpreted can potentially differ, and this is known as encoding.
Normally a derivative of ASCII/ANSI encoding is used - see the table below. As can be seen from the table, the text 'DNA' would be represented by the three numbers: 68, 78, 65. If we wanted lower-case it would be 100, 110, 97.
Notice that the values 0-31 are reserved for special purpose 'letters' that have no visual representation (more on this later):
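The numbers quoted for 'DNA' can be checked with a few lines of Python. This is only a small sketch using the built-in `ord`, `chr` and `str.encode`; the variable names are chosen for this illustration.

```python
text = "DNA"

# Numerical value of each letter
upper_codes = [ord(ch) for ch in text]
lower_codes = [ord(ch) for ch in text.lower()]
print(upper_codes)  # [68, 78, 65]
print(lower_codes)  # [100, 110, 97]

# Encoding the text as ASCII gives one byte per letter
raw = text.encode("ascii")
print(len(raw))     # 3

# Values below 32 have no visual representation; 10 is the line feed
print(repr(chr(10)))  # '\n'
```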


Since ASCII is an American standard, national characters like 'Æ', 'Ø' and 'Å' are NOT represented in the standard part of the alphabet - some of these characters are found in the range 128-255 (see the full set here: The Extended ASCII Chart - the table above also originates from this page). I will not go further into how the full range of national characters is handled (nor the UNICODE standard) - but rather give a short bit of advice: When creating sequence files always stick to the English letters. While it might be tempting to name your sequence 'Æsel_Insulin' or 'ØrneDNA', there is no guarantee that it will work in all programs.
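A short sketch of why national characters cause trouble: the same name turns into different bytes depending on the encoding, whereas a pure English name does not. The helper `is_safe_name` is a hypothetical name for this illustration, and Latin-1 and UTF-8 are simply two common encodings picked as examples.

```python
def is_safe_name(name):
    """Return True if the name uses only plain ASCII characters (values 0-127)."""
    return name.isascii()


risky = "Æsel_Insulin"
safe = "Donkey_Insulin"

print(is_safe_name(risky))  # False
print(is_safe_name(safe))   # True

# The risky name gives different bytes in different encodings ...
latin1_bytes = risky.encode("latin-1")
utf8_bytes = risky.encode("utf-8")
print(len(latin1_bytes), len(utf8_bytes))  # 12 13

# ... while the safe name is byte-for-byte identical in both
print(safe.encode("latin-1") == safe.encode("utf-8"))  # True
```

A program that assumes one encoding while the file was saved in another will misread the first character of the risky name.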
A second issue is that of Line Endings ('newlines').
Since a text file is basically just a long string of values between 0-255, a special symbol must be reserved to split the text into individual lines. This is done by appending an invisible (value 0-31) 'newline' character at the end of each line. Unfortunately three standards exist for this:
  • UNIX standard:
    • 10 - LF ('Line feed' char).
  • Old Mac (System 9 and before):
    • 13 - CR ('Carriage Return' char).
  • DOS/Windows:
    • 13, 10 - both CR and LF.
Any text editor worth its salt can handle all three standards transparently. However, the most commonly used plain text editor in Windows ('Notepad') CANNOT handle this issue:

FAILS: Notepad trying to open a file with UNIX newlines

WORKS: Same file, now with DOS/Windows style newlines.


(Wikipedia has a very long description of the newline issue here: newline).
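The three newline standards listed above can also be told apart, and converted, with a small script. The following is a minimal sketch that works on raw bytes; the function names `detect_newlines` and `to_unix` and the labels they return are invented for this illustration.

```python
def detect_newlines(data: bytes) -> str:
    """Guess the newline convention of raw file contents by counting bytes."""
    crlf = data.count(b"\r\n")          # 13, 10 pairs (DOS/Windows)
    cr = data.count(b"\r") - crlf       # lone 13 (old Mac)
    lf = data.count(b"\n") - crlf       # lone 10 (UNIX)
    counts = {"Windows (CR LF)": crlf, "Unix (LF)": lf, "Old Mac (CR)": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"


def to_unix(data: bytes) -> bytes:
    """Convert any of the three newline conventions to UNIX style (LF)."""
    # Replace CR LF first, so the pair is not turned into two line feeds
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(detect_newlines(b">seq\nACGT\n"))      # Unix (LF)
print(detect_newlines(b">seq\r\nACGT\r\n"))  # Windows (CR LF)
print(detect_newlines(b">seq\rACGT\r"))      # Old Mac (CR)
print(to_unix(b">seq\r\nACGT\r\n"))          # b'>seq\nACGT\n'
```

To try this on a real file, read it in binary mode, for example `open(path, "rb").read()`, so that Python does not translate the line endings before you get to see them.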

Installing and using jEdit

A large number of good plain text editors exist for various operating systems - for example NEdit for UNIX type systems, BBEdit for the Mac and UltraEdit for Windows. Some editors exist for multiple platforms, like the jEdit program we'll install and test in a moment.
Many such text editors were originally developed with programming in mind, and contain a number of features that make programming easier, such as syntax highlighting that shows various parts of the program being developed in different colors.
For our purpose we will just make use of the most basic functionality for viewing and editing DNA/Protein sequence files: The ability to handle all kinds of newlines, a guarantee of saving the files in plain text format and possibly advanced search-and-replace when creating/cleaning our own sequence files.

Download and Install jEdit

Obviously the first task will be to install jEdit: Go to the jEdit website, www.jedit.org, and locate the latest 'stable' release of jEdit for your platform of choice (for Windows pick the 'Windows installer' - for Mac pick the 'Mac OS X package'). Download & install the program package.
Make sure you know where the program has been installed, and where to find the shortcut to start it.

Taking jEdit for a test run

Download and unpack the following Zip archive which contains three different versions of the same sequence file: SeqExamplesNewlines.zip.
Contents of the archive:
alpha_globin_OldMac.fsa
alpha_globin_Unix.fsa
alpha_globin_Windows.fsa

In this case the files are in FASTA format (much more about FASTA in the later exercises) and have the extension '.fsa' - NOTICE: You can open any file with any extension in jEdit - as long as it contains text.
Open the files one by one in jEdit - they should look the same, and which line endings are used will be indicated by the letters 'U', 'W' or 'M' in the lower right hand corner (you can click the letter to change the format). If you are on the Windows platform, you can also try to open the files in 'Notepad' and see what happens.
QUESTION 1:

  • Note down the FILE SIZE (in bytes) of each of the three files (just use Windows Explorer: right click -> Properties / Mac Finder: CMD + I / Linux: the 'ls -l' command).
  • Are they all the same size? Why/Why not?

On file extensions and default programs.

Download and unpack the following Zip archive which contains the SAME sequence information embedded in various popular document formats: SeqExamplesFormats.zip
Contents of the archive:
Open each of the files by clicking on it to launch the program associated with the file extension (typically Word for a .doc file, a browser for an .html file etc).
QUESTION 2:
  • Can we still find the same information (the DNA sequence) in each of the files?
  • Note down the size of the files - do they differ much?
Now try to open each of the files in jEdit - to see what's really in there.
QUESTION 3:
  • What kind of extra information has been added to the HTML and RTF files? (Is it 'Human readable'?).
  • What kind of extra information has been added to the DOC file? Any surprises here?

Search and Replace & Block selection


Normal - line based - selection

Block selection

From time to time it will be necessary to do a slight bit of editing in order to clean up the data we want to work with. In the following example we will be working with the DNA sequence listed below. The task is to clean it up - get rid of the numbers and spaces - and we want to do as little work as possible.
1 AACGGGCACG GGACGCATGT AGCTGGAACA GTGGCAGCCG TAAATAATAA TGGTATCGGA
61 GTTGCCGGGG TTGCAGGAGG AAACGGCTCT ACCAATAGTG GAGCAAGGTT AATGTCCACA
121 CAAATTTTTA ATAGTGATGG GGATTATACA AATAGCGAAA CTCTTGTGTA CAGAGCCATT
181 GTTTATGGTG CAGATAACGG AGCTGTGATC TCGCAAAATA GCTGGGGTAG TCAGTCTCTG
241 ACTATTAAGG AGTTGCAGAA AGCTGCGATC GACTATTTCA TTGATTATGC AGGAATGGAC
301 GAAACAGGAG AAATACAGAC AGGCCCTATG AGGGGAGGTA TATTTATAGC TGCCGCCGGA
361 AACGATAACG TTTCCACTCC AAATATGCCT TCAGCTTATG AACGGGTTTT AGCTGTGGCC
421 TCAATGGGAC CAGATTTTAC TAAGGCAAGC TATAGCACTT TTGGAACATG GACTGATATT
481 ACTGCTCCTG GCGGAGATAT TGACAAATTT GATTTGTCAG AATACGGAGT TCTCAGCACT
541 TATGCCGATA ATTATTATGC TTATGGAGAG GGAACATCCA TGGCTTGTCC ACATGTCGCC
601 GGCGCCGCC

Open a new jEdit window and paste in the entire block of text. In order to get rid of the numbers we can use a handy feature of jEdit called Block Selection (the difference between 'normal' line selection and block selection is illustrated above) - simply hold down Control (Windows+Linux) / CMD (Mac) while dragging the pointer to select a block. Select the block containing the numbers and hit delete.
Next we want to remove the spaces: Open the find dialog (Control F / CMD F). Notice that there are a ton of advanced options - we can safely ignore them for this simple purpose. Make sure that 'Search in' is set to 'Current buffer' (alternatively you can just select all the text and search in the selection). In the 'Search for' field simply enter a single space, leave the 'Replace with' field empty, and hit 'Replace all' to see all the spaces disappear in a puff of smoke.
QUESTION 4: Paste the cleaned up DNA sequence into your report.
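For comparison, the same kind of clean-up can be scripted. The sketch below uses Python's `re` module on a shortened two-line sample in the same layout as the sequence above; the function name `clean_line` and the sample are chosen for this illustration only, and the exercise itself should still be done in jEdit.

```python
import re


def clean_line(line):
    """Remove position numbers and all whitespace from one sequence line."""
    return re.sub(r"[\d\s]", "", line)


# Shortened sample in the same numbered, space-separated layout
raw = """1 AACGGGCACG GGACGCATGT
61 GTTGCCGGGG TTGCAGGAGG
"""

cleaned = [clean_line(line) for line in raw.splitlines() if line.strip()]
for line in cleaned:
    print(line)
```

The pattern `[\d\s]` matches any digit or whitespace character, which combines the two manual steps (deleting the number block and replacing the spaces) into one.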

Conclusion

This concludes the short introduction to text editors. Whenever you work with 'strange' sequence files during the course, remember that you can always inspect them using jEdit, to find out what's really in there. The same holds true for other text based formats such as the ones used for phylogenetic trees, as we will see later.

jEdit Editor's Review

jEdit is a fully featured text editor written in Java.
I am impressed by the syntax highlighting engine that has been implemented in this program. It can recognize more than 130 file types. For this reason its developers call it 'the programmer's text editor'.
The source code editing engine is simply remarkable, and the syntax highlighting also recognizes a lot of functions that are typical for all kinds of programming languages. The fact that it is a Java based program means portability, which is a good thing if you also program on other platforms. On the other hand, it starts up slowly because of the Java engine.
The keyboard shortcuts are implemented very well. I could complete most text editing jobs without having to learn the shortcuts, because they are the same as in most other text editors.
Another nice thing is the support for unlimited clipboards. You won't have to limit yourself to a single clipboard, which helps especially when you have to paste many advanced structures.
The interface is quite well designed and pretty ergonomic. The only thing that I really miss is a tabbed interface, because the drop-down menu with all the open documents isn't that inspired.
The 'Search and Replace' engine is one of the best that has been implemented in a text editor. The functionality is simply remarkable.
Pluses: it is a full featured text editor with wide syntax highlighting support for more than 130 file types.
Drawbacks / flaws: it has a very slow startup speed because it runs in a Java Virtual Machine.
In conclusion: you can use this tool to complete your everyday text editing jobs. You can also try some other Open Source alternatives such as Carbon Emacs or Smultron, or a commercial solution referred to as 'the missing text editor on Mac OS X', TextMate.
version reviewed: 4.0.1