Origin
The First and Best ESL Grammar Checker
Goal
Our goal from the outset with the Virtual Writing Tutor has been, above all, to enhance ESL pedagogy. Since one of the best ways to learn the structure of English is by writing in English, we figure that good pedagogy will be served by developing a tool that supports teachers in their efforts to get students to write more. Automating corrective feedback could reduce the time teachers spend correcting their students' work and should therefore allow them to set more writing assignments.
Here is the logic. The amount of writing college students do in a course is usually constrained by the time a teacher has for evaluation and feedback, and, as any teacher will attest, most students are loath to write a structured, multi-paragraph text unless it "counts." However, when an assignment counts, students also expect corrective feedback on grammar, word choice, and punctuation on one or more drafts.
I cannot speak for other countries or provinces, but in the Quebec junior college system (a.k.a. the CÉGEP system), teachers routinely teach 125 to 150 students per semester. Since each writing assignment normally requires a minimum of 5 to 10 minutes of underlining, error-coding, or explicit correction per student, each writing assignment adds roughly 10 to 25 hours of correction time to a teacher's otherwise busy week.
125 Ss x 5 min = 625 min = 10 h 25 min
150 Ss x 10 min = 1500 min = 25 h
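The two estimates above are a multiplication followed by a conversion from minutes to hours. A small helper makes it easy to plug in other class sizes or per-student correction times; the function name is illustrative only, and the figures are the ones given above for a single assignment.

```python
def correction_time(students: int, minutes_per_student: int) -> tuple[int, int]:
    """Return the correction time for one assignment as (hours, minutes)."""
    total_minutes = students * minutes_per_student
    # divmod splits the total into whole hours and leftover minutes
    return divmod(total_minutes, 60)


# Reproduce the two estimates above.
for students, minutes in [(125, 5), (150, 10)]:
    hours, mins = correction_time(students, minutes)
    print(f"{students} Ss x {minutes} min = {students * minutes} min = {hours} h {mins} min")
```

The second line prints as 25 h 0 min, the same 25 hours as above. Multiplying by the number of assignments per semester shows how quickly the load grows.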
With such an impact on workload, teachers usually do not assign more than one or two 350- to 500-word writing assignments per semester. Automating corrective feedback on errors should therefore free teachers from the most time-consuming part of writing evaluation and allow them to set additional writing assignments each semester. More assignments will lead to more writing practice, more corrective feedback, and better learning.
Good ESL pedagogy is also served when a student receives corrective feedback in a timely fashion. The fastest a teacher can return hand-coded corrective feedback to a class is the week after students hand in an assignment. Automatic corrective feedback provided online through a website such as the Virtual Writing Tutor can deliver as much corrective feedback, or substantially more, in about one second. Counting a week as 40 working hours (40 × 3,600 = 144,000 seconds), that is 144,000 times faster!
In short, our goal is for the Virtual Writing Tutor to be non-trivial.
Theory
Underpinning the development of a grammar checker for English as a Second Language learners is the assumption that second language writing errors are both predictable and recurrent. Any teacher who has spent more than a year teaching writing can tell you that they see the same errors again and again, from the same student and across groups of students, year after year. Errors are understandable when we look at their causes. The most frequent errors in my students' writing tend to originate as translations of structures and sequences in the learner's first language. An example of one such error is the word choice error I
Another high-frequency type of error comes from learners simply not knowing the correct form, so they use an incorrect form in its place. That seems to be what is going on in sentences such as "We did a lot of sandwiches." French employs faire in contexts where English uses either do or make. When learners do not have extensive knowledge of word collocations, they guess between the two English forms, sometimes correctly and sometimes not.
A third type of error comes from a false analogy. The learner may know how to write sentences such as "It's big" or "It's far," but then falsely assume that "It's depend" is equally correct because it begins with the same word. The learner is not yet aware that big and far are adjectives while depend is a verb that needs to be conjugated ("It depends"). He or she has yet to analyze enough English to know the difference.
A fourth type is the false belief. Sometimes learners think they have understood something about English when in fact they have the wrong idea. For example, one student wrote, "You can describe myself as impulsive." What he meant was "You could describe me as impulsive." The writer seems to have concluded that myself is an object pronoun.
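Because errors like these recur, each one can be caught with a narrowly targeted pattern. The sketch below is a hypothetical illustration using Python regular expressions, not the Virtual Writing Tutor's actual rules: the RULES list, its patterns, and its messages are invented for this example. The third rule is anchored on "You can/could" so that a correct sentence such as "I would describe myself as impulsive" is not flagged.

```python
import re

# Hypothetical rules, one per error type described above; not the tool's real rule set.
# Each rule pairs a compiled pattern with a corrective feedback message.
RULES = [
    # Unknown form: French "faire" covers both "do" and "make"
    (re.compile(r"\b(?:do|does|did|doing|done)\s+(?:a\s+lot\s+of\s+)?sandwich(?:es)?\b", re.IGNORECASE),
     "Word choice: English says 'make sandwiches', not 'do sandwiches'."),
    # False analogy with "It's big" / "It's far" (straight or curly apostrophe)
    (re.compile(r"\bit['\u2019]s\s+depend\b", re.IGNORECASE),
     "'Depend' is a verb, not an adjective. Try 'It depends'."),
    # False belief that "myself" works as an ordinary object pronoun
    (re.compile(r"\byou\s+(?:can|could)\s+describe\s+myself\b", re.IGNORECASE),
     "Use the object pronoun 'me': 'You could describe me as...'."),
]


def check(text: str) -> list[tuple[str, str]]:
    """Return (matched text, feedback message) for every match, rule by rule."""
    return [
        (match.group(0), message)
        for pattern, message in RULES
        for match in pattern.finditer(text)
    ]


print(check("We did a lot of sandwiches. It's depend on the weather."))
print(check("I would describe myself as impulsive."))  # correct sentence: no matches
```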
Other errors are recurrent but not specific to second language learners. They are due rather to overly rapid typing, inattention, or fatigue. An extra keystroke can produce a typo, and a distraction can cause you to type a word twice. An online spell/grammar checker can help locate these errors, but they occur for all writers, whether writing in their first language or their second, so here we might label them mistakes rather than errors. Simply having a classmate proofread a text is enough to eliminate them; no specialized knowledge is required.
While all these errors are recurrent, they are not entirely predictable, are they? We cannot always be sure which forms learners will carry over from their first language, because of the principle of homoiophobia. Nor can we ever know with absolute certainty what learners have forgotten from prior instruction or what false analogies and false beliefs they will come up with. However, the past predicts the future in a probabilistic way: the more frequently an error has occurred in the past, the more likely it is to be committed again in the future.
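One simple way to make "the past predicts the future" concrete is to estimate each error type's likelihood from its relative frequency in past texts. The sketch below assumes errors have already been tagged by type; the tags and counts are hypothetical, not data from the Virtual Writing Tutor.

```python
from collections import Counter


def error_probabilities(observed_errors: list[str]) -> dict[str, float]:
    """Estimate each error type's probability from its past relative frequency."""
    counts = Counter(observed_errors)
    total = sum(counts.values())
    # most_common() orders the result from most to least frequent
    return {error: count / total for error, count in counts.most_common()}


# Hypothetical error tags, not real learner data.
past_errors = ["do/make", "it's depend", "do/make", "myself", "do/make", "it's depend"]
for error, p in error_probabilities(past_errors).items():
    print(f"{error}: {p:.2f}")
```

Ranking error types this way gives a defensible order for writing detection rules: the most frequent past errors are the ones most worth catching next.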
Method
Since errors tend to recur in L2 writing, we assume that the best way to develop error-detection rules is from authentic learner texts. One obvious place to look for high-frequency errors is in learner corpora and our own learners' writing assignments. Of course, we would like our system to be useful to learners all over the world, so we equipped the Virtual Writing Tutor with a script that captures text submitted to the system and stores it in a database for later rule development and refinement.
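The capture script itself is not described here, so the following is only a minimal sketch of the idea, assuming a SQLite database with a single table; the table name, columns, and function names are hypothetical. Once texts are stored, searching them for a candidate error string is a single query.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table of submitted texts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "submitted_at TEXT NOT NULL, "
        "body TEXT NOT NULL)"
    )
    return conn


def capture(conn: sqlite3.Connection, text: str) -> None:
    """Store one submitted text with a UTC timestamp."""
    with conn:  # the connection context manager commits on success
        conn.execute(
            "INSERT INTO submissions (submitted_at, body) VALUES (?, ?)",
            (datetime.now(timezone.utc).isoformat(), text),
        )


def find(conn: sqlite3.Connection, phrase: str) -> list[str]:
    """Return stored texts containing phrase (LIKE ignores ASCII case in SQLite)."""
    rows = conn.execute(
        "SELECT body FROM submissions WHERE body LIKE ? ORDER BY id",
        (f"%{phrase}%",),
    )
    return [row[0] for row in rows]


conn = open_store()
capture(conn, "We did a lot of sandwiches.")
capture(conn, "It's depend on the weather.")
print(find(conn, "did a lot of"))
```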
Some teacher intuition comes into play as we review these texts for errors. High-priority errors for us are those that relate to our lessons on English verb tense, aspect, and prepositions, and to the error-correction practice tasks we give our students.
However, we do not rely on teacher intuition alone to validate our rules. Collocations are checked in a native speaker corpus of more than 3 million words in an attempt to create reliable rules and avoid false positives. Two things have become apparent in the process. First, 3 million words is not enough to capture the full range of expressions and communicative functions that English has to offer; the corpus often comes up short when queried with what seems to me a common enough expression. Second, memory or teacher intuition is not as reliable as you might think. Try it. You might be convinced, as I was, that "to" and "said" can never appear in sequence because infinitive structures must contain to + the base form of a verb, but a concordancer can show you an authentic context where to + said does occur. Creating an error-detection rule that flags to + said would therefore produce false alarms, something we try hard to avoid.
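A concordancer check of this kind can be approximated with a simple keyword-in-context (KWIC) search. In the sketch below, the function name and the two-sentence sample are invented for illustration and stand in for a real native speaker corpus. The sample shows the kind of authentic context where to + said is correct: said used as an adjective meaning "previously mentioned."

```python
import re


def concordance(corpus: str, phrase: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines for each occurrence of phrase."""
    words = [re.escape(word) for word in phrase.split()]
    # Match the words in sequence, allowing any whitespace between them
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(corpus):
        left = corpus[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = corpus[match.end():match.end() + width].replace("\n", " ")
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# A tiny stand-in corpus; a real check would query millions of words.
sample = (
    "The tenant objected to said clause before signing the lease. "
    "She wanted to say more, but the meeting ended."
)
for line in concordance(sample, "to said"):
    print(line)
```

A zero count for a candidate error pattern is only weak evidence that a rule is safe; as noted above, even a 3-million-word corpus can miss legitimate uses.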
Additional Features
The Virtual Writing Tutor is more than just an error detector. It is also an instant curriculum. When learners submit texts with errors in them, the system returns corrective feedback messages along with links to related online error-correction activities. The idea is that motivated learners will follow those links to individual activities and develop error correction as a skill. For an example of the range of errors the system can detect and the type of individualized instruction it can provide, click here. To see a random selection of sentences with errors, visit this random error-correction activity and follow the directions.
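One way to picture this "instant curriculum" is to attach a practice link to every rule, so each feedback message arrives with its own follow-up activity. The sketch below assumes exactly that design; the Feedback class, the two rules, and the example.com URLs are hypothetical placeholders, not the system's real rules or links.

```python
import re
from dataclasses import dataclass


@dataclass
class Feedback:
    matched: str       # the text span that triggered the rule
    message: str       # corrective feedback shown to the learner
    activity_url: str  # link to a related error-correction activity


# Hypothetical rules: (pattern, message, activity URL). The URLs are placeholders.
RULES = [
    (re.compile(r"\bdid\s+a\s+lot\s+of\s+sandwiches\b", re.IGNORECASE),
     "Use 'make' with food: 'made a lot of sandwiches'.",
     "https://example.com/activities/do-vs-make"),
    (re.compile(r"\bhe\s+work\b", re.IGNORECASE),
     "Subject-verb agreement: 'he works' or 'he worked'.",
     "https://example.com/activities/subject-verb-agreement"),
]


def review(text: str) -> list[Feedback]:
    """Return one Feedback item, with its practice link, per rule match."""
    return [
        Feedback(match.group(0), message, url)
        for pattern, message, url in RULES
        for match in pattern.finditer(text)
    ]


for item in review("He work yesterday. We did a lot of sandwiches."):
    print(f"{item.matched!r}: {item.message} Practice: {item.activity_url}")
```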
Limitations
Whenever we show the system to other teachers, problems become apparent. Teachers have two expectations. The first is that the system will detect all errors in a text. The second is that it will not flag any errors in a "correct" piece of writing. By reviewing the database of captured texts, we can recognize these tests as they come along. Telltale signs are overly simplistic sentences that contain little but a single agreement or tense error; "He work" and "He work yesterday" are recent examples. Then there is always some attempt to feed the system nonsense, such as "He work work." Here is a recent example of a test a teacher set for our system and how it bombed.
False positives are a real concern. Recently, I was shocked to discover that the sentence "I have to leave at..." produced a false alarm. Some Koreans and Chinese have the family name I (or Lee, or Rhee), so when the Virtual Writing Tutor detected what it took to be a singular proper name before an uninflected verb, it returned the corrective feedback message that "I have to" should be "I has to." I quickly added the exception. Needless to say, the Virtual Writing Tutor is a work in progress.
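The fix described above amounts to an exception list consulted before a rule fires. The sketch below is a hypothetical reconstruction, assuming a simple token-pair agreement rule and tiny word lists; it is not the Virtual Writing Tutor's code. Without the EXCEPTIONS set, "I have to leave" would be flagged exactly as described.

```python
import re

# Hypothetical, tiny word lists for illustration.
BASE_VERBS = {"have", "go", "need", "work"}
# "I" is here because it is also a family name (like Lee or Rhee),
# which is what caused the false alarm described above.
SINGULAR_NAMES = {"I", "Lee", "Rhee"}
# Tokens that must never trigger the rule: the pronoun "I" takes the base form.
EXCEPTIONS = {"I"}


def agreement_errors(sentence: str) -> list[str]:
    """Flag a singular proper name followed directly by an uninflected verb."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    flagged = []
    for name, verb in zip(tokens, tokens[1:]):
        if name in EXCEPTIONS:
            continue  # checked first, so the exception always wins
        if name in SINGULAR_NAMES and verb.lower() in BASE_VERBS:
            flagged.append(f"{name} {verb}")
    return flagged


print(agreement_errors("I have to leave at noon."))    # exception applies: []
print(agreement_errors("Lee have to leave at noon."))  # flagged: ['Lee have']
```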
Finally, a more serious concern is that we will not be able to provide much help to advanced learners. That is a bridge we will cross once beginner and intermediate errors have been more fully dealt with. Stay tuned.