Wednesday, February 29, 2012

Sanctioned Attorney Nicholas Penkovsky Tried To Resurrect His "Rubber Room" Case in the U.S. Court of Appeals for the Second Circuit, But the Judges Didn't Buy His Argument


by Betsy Combier, Editor, Parentadvocates.org/NYC Rubber Room Reporter

From Editor Betsy Combier, an opinion: Attorney Nick Penkovsky has been sued by his own law firm, sanctioned by the United States District Court, and has repeatedly failed to win 3020-a arbitration hearings while representing "rubber room" teachers. He replaced the infamous, now-disbarred attorney Ed Fagan in the case cited here. Nevertheless, he tried again, presenting the issue of the "rubber rooms" in oral argument to a panel of two judges (a third judge, Susan Carney, recused herself) at the Second Circuit on February 9, 2012. The decision: AFFIRM the lower court.
           
Disbarred Attorney Ed Fagan
'Rubber Room' Teachers Urge Circuit to Reinstate Their Claims
Mark Hamblett, New York Law Journal, February 10, 2012

The lawyer for New York City teachers who were placed in so-called "rubber rooms" for extended periods while awaiting possible disciplinary proceedings tried on Feb. 9 to convince a federal appeals court to reinstate their lawsuits.

Attorney Nicholas Penkovsky told Judges Robert Katzmann and Denny Chin of the U.S. Court of Appeals for the Second Circuit that a lower court was wrong to dismiss claims of employment discrimination, deprivation of a right to a prompt hearing and First Amendment retaliation for complaining about the actions of their supervisors.

The judges on the panel in Ebewo v. New York State Education Department, 10-4989-cv, also heard arguments from Assistant Corporation Counsel Ronald E. Sternberg, representing the city; Assistant Solicitor General Sudarsana Srinivasan for New York state; and a pro se plaintiff in a consolidated case, Josefina Cruz.


Mr. Penkovsky represents five teachers who languished from two to five years in rubber rooms, nicknamed after the padded cells once used in mental hospitals.

In the rooms, officially called "temporary reassignment centers," as many as 600 teachers were placed at one time, according to media reports. There, the teachers spent their work weeks engaged in a variety of activities, including reading, sleeping, running small businesses and playing Scrabble.

The case was dismissed by Southern District Judge Victor Marrero on Nov. 15, 2011, on the report and recommendation of Magistrate Judge Andrew Peck.

The magistrate judge also recommended sanctions of $7,000 against Mr. Penkovsky and $21,000 against Joy Hochstadt, Ms. Cruz's former lawyer, for filing "patently frivolous" claims in a Fourth Amended complaint.

The offending claims, Magistrate Judge Peck said, were that the rubber rooms violated the Thirteenth Amendment's prohibition against involuntary servitude and amounted to a hostile work environment.

The appeal of the sanctions ruling is before Judge Marrero.

At the Feb. 9 arguments, Judge Katzmann asked whether Mr. Penkovsky could cite a single case in which a teacher who had been removed from the classroom but was still being paid full salary was found to have a constitutionally cognizable property interest.

Mr. Penkovsky, a solo practitioner in Riverdale, said he could not. But he cited Parrett v. City of Connersville, 737 F.2d 690 (7th Cir. 1984), where the Seventh Circuit found that a police detective who continued to be paid but was confined to a windowless office with no duties had effectively had his job taken away without due process of law.

And Parrett, Mr. Penkovsky said, was mentioned in a positive light by the Second Circuit in O'Connor v. Pierson, 426 F.3d 187 (2d Cir. 2005), where the circuit, in dicta, kept open the possibility that a suspended, tenured employee who is being paid could raise a claim for violation of substantive due process.

Judge Katzmann asked whether, assuming the teachers were constructively discharged, "Why wouldn't state CPLR Article 78 provide them with adequate process?"

Mr. Penkovsky said that, first, federal courts properly have jurisdiction; second, there are questions about when a four-month statute of limitations for an Article 78 proceeding would start to run; and third, the violations are occurring "while waiting for the state to bring charges," so the launch of an Article 78 proceeding would be, in effect, "compelling" the state to bring charges.

The city announced in 2010 that it was terminating the rubber rooms, but Mr. Penkovsky said after oral arguments that the city has done no more than keep the teachers at district offices.

Ms. Cruz gamely told the circuit judges, "We were waiting in rubber rooms to defend ourselves." She added, "I was deprived of a constitutionally protected property interest," but, she said, the district court ignored her plea to certify a question of state law to the New York Court of Appeals on the procedural protections set forth in Education Law §3020, which she said were eliminated by New York City in 2006.

Arguing for the city, Mr. Sternberg said that "neither the briefs of the plaintiffs nor what I heard this afternoon calls into question" any of the reasoning employed by Magistrate Judge Peck or Judge Marrero.

Judge Chin asked Mr. Sternberg if the city held the position that "as long as they're paid, this can go on indefinitely?"

Mr. Sternberg answered no, but added that "under the circumstances" there are proceedings to challenge both the length of delay and the conditions, and both judges in the lower court "pointed out that Article 78 is available."

Judge Chin explored the claim that some of the teachers were constructively discharged.

"If you are relegated to a room day after day with nothing to do, isn't that constructive discharge?" he asked, raising the issue of the police detective in Parrett.

Mr. Sternberg said he did not have a "response" to Parrett but said the plaintiffs here not only continued to be paid, but they failed to allege constructive discharge in their complaint.

Ms. Srinivasan argued that New York state was not even a proper party to the litigation, because it is merely notified about the charges, which are handled by each school district in its own way. The state, she added, only provides a list of hearing officers and a small amount of the money for the hearings.

When Mr. Penkovsky returned to the lectern, he was again questioned by Judge Katzmann on why he did not bring an Article 78 proceeding.

"We have case law that says Article 78 proceedings are appropriate in this context," the judge said.

Mr. Penkovsky returned to his original point, saying that Article 78 cases only apply when a teacher has been terminated.

"However," he added, "where there is no hearing, there is a procedural due process" claim to be made.

A third judge, Susan Carney, recused herself.

Nick Penkovsky and Joy Hochstadt, Two Lawyers Who Took On NYC "Rubber Room" Cases, Are Reprimanded In Federal Court

Con Man and Snake Oil Salesman Ed Fagan Tries To Shut Down Parentadvocates.org, Lewenstein Serves Subpoena on Gizella Weisshaus

Why Teacher Performance Data Should Be Public (and Why Bill Gates Gets It Wrong)

Bill Gates

February 23, 2012, by RiShawn Biddle

Few have done so much to help finance the school reform movement — and explain the economic reasons why all children need a high-quality education — as Bill Gates has done over the past two decades. For that, the Microsoft billionaire and co-founder of the foundation that shares his (and his wife Melinda’s) name deserves plenty of praise.

But even Bill Gates gets it wrong. This was clear last month when a report from research outfit MDRC vindicated the small high schools effort his foundation championed (during education technology guru Tom Vander Ark’s time running its education philanthropy operations) and then largely abandoned when it didn’t immediately bear fruit. (Your editor also got it partly wrong, as did other reformers.) The same is true for the Gates Foundation’s continued support of including classroom observations in its so-called “multiple measures” approach to evaluating teachers — even as data from its latest Measures of Effective Teaching report reveals that doing so actually makes evaluations even less useful for teachers and students alike. And on the pages of the New York Times, Gates (or, more likely, the Gates Foundation’s communications department) takes the wrong approach on publishing data on the performance of teachers.

The latest round of this discussion was prompted by last week’s ruling by a New York State appellate court allowing New York City to release teacher performance data based on Value-Added analysis of student test score data. The American Federation of Teachers’ Big Apple local had fought tooth and nail against the move. With New York City officials preparing to release the data on a series of Excel spreadsheets (when it could have actually done a better job by creating a simple interactive database), the release of the data should be championed by reformers such as Gates as great news for families — who need to know the quality of the teachers in whose care they leave their children — and good-to-great teachers (who can now get the recognition they deserve for high-quality work that often gets lost in cultures of mediocrity that do more to reward laggard counterparts). And this form of sunlight is the best sort of disinfectant, shedding light on low-quality teaching, the faulty school leadership that enables it in myriad ways, and the ed schools that fail to equip aspiring teachers with the tools needed for successfully helping their students.

Yet in his op-ed, Gates is far more concerned with the feelings of teachers than with promoting such systemic reform. He argues that publishing teacher data does little more than “shame” poor-performing teachers without giving them the “specific feedback” they need in order to either improve or leave the profession. From where he sits, publishing the data will only harm efforts the Gates Foundation is pushing to overhaul teacher evaluation systems.

Gates is hardly alone in this view. Within the past couple of years, Dropout Nation has criticized American Enterprise Institute education czar Rick Hess and others in the Beltway reform crowd (as well as operations-oriented reformers such as Wendy Kopp of Teach For America) for opposing the publishing of Value-Added teacher performance data. The fact that many of these so-called “bold” reformers are unwilling to actually be bold when the opportunity presents itself is one reason why they deserved to be called on the carpet. But another reason why their thinking deserves criticism is that on this subject, they lose focus on the concerns that matter most: those of families and high-quality teachers. They constantly approach this issue from the perspective of school leaders and operators (whose roles in the human capital arena include evaluating performance and fostering strong school cultures) instead of from the perspective of parents (who, as guardians of those who are the consumers in education, only care about helping their kids achieve lifelong success) or good-to-great teachers as individual professionals (who want to be recognized and rewarded for their work as much as they want to get solid feedback from principals). And this is the mistake Gates makes as well.

Gates is right about the need for evaluation systems to provide teachers with “specific feedback”. In fact, the lack of specific feedback (along with the absence of data on how teachers improve student achievement) is why traditional evaluations are so abysmal — and why a multiple measures approach of some form must be at the heart of overall teacher evaluations. (Whether that involves using inaccurate and far too subjective classroom observations is a different matter entirely.) But the specific feedback issue is a concern only for principals and superintendents, and, to a lesser extent, researchers, think tankers, and those involved in training aspiring teachers.

From where the consumers of education — children and the parents who advocate for them — sit, the more-important issue is whether the teacher can actually nurture their inherent genius and help them improve achievement over time. They should be the lead decision-makers in education, but, save for leading school overhauls through using Parent Trigger laws or even starting their own schools, actually structuring operations may not always be a matter with which they want to be concerned. Their bigger concern lies with the ability of teachers to improve student achievement over time, and whether those instructors care for — and empathize with — every child, regardless of who they are or where they live.

Given that the quality of a child’s education can vary from classroom to classroom, being able to choose the right teacher for their child matters more than an instructor’s tender mercies. And when one considers that at the elementary level, teacher quality can vary among teachers from one subject to another, parents should know how well a teacher does in the classroom — and be able to use the data to force schools to improve the quality of instruction in every classroom.

High-quality data on all aspects of education — especially teacher performance — is critical to helping families become real consumers and lead decision-makers in education. It is also key in causing the kind of disruptions that have helped begin the first steps in systemically reforming American public education. And this is what Gates (whose own fortunes were made thanks to consumers making informed choices about computers, software, and operating systems) and other reformers should want.

As for good-to-great teachers? Their concerns about specific feedback being a component of professional development are balanced by their own concerns about actually being recognized for their work. After all, they often go without the proper recognition — in the form of wider arrays of compensation and career opportunities — that they so richly deserve. More importantly, as the Los Angeles Times demonstrated two years ago in its series revealing the performance data of teachers in the City of Angels’ school district, high-quality teachers are often forced by lower-performing colleagues to remain quiet about their achievements. And, as seen in the cases of John Taylor Gatto and the legendary Jaime Escalante, some are forced out of the profession because of jealousy within the ranks. Revealing this data can help push the reforms needed to make teaching a stronger, more sophisticated, and more rewarding profession.

One can understand where Gates comes from on the issue of publishing teacher Value-Added data. But he and other Beltway and operation-oriented reformers are taking the wrong stance on this issue. Our kids, families, and teachers need and deserve to have this data. Because knowledge is key to fostering the very reforms Gates and others are championing that can help all of our children.

Charter Schools' Teachers' Rankings

City releases charter schools' teachers' rankings

Last Updated: 8:54 AM, February 29, 2012
Posted: 2:12 AM, February 29, 2012
Superintendent Seth Andrew, with students Kevin Lassiter and Mbayang Kasse, at Democracy Prep.

The city Department of Education released more data on teachers’ effectiveness in the classroom yesterday — this time ranking hundreds of teachers in charter schools and in the citywide special-education district.
The new ratings follow last week’s release of similar data for 18,000 public-school teachers in grades 4 through 8 — which came after The Post won a protracted legal battle to get the information under the Freedom of Information Law.

The data ranks teachers on a scale from 0 to 99 compared to their peers with similar experience, and is based on a complex formula designed to gauge a teacher’s ability to boost their students’ test scores on state math or reading tests.
The breakdown for 32 charter schools showed the high-performing Democracy Prep Charter School in Manhattan staffing the highest-rated teachers in 2010.
Eight of its 14 teachers — who work their classrooms in pairs — scored between 95 and 99, while not one instructor at the Harlem school ranked below a 60.
On the opposite end, not a single teacher at Sisulu-Walker Charter School, the oldest charter school in the state, ranked higher than “average.”
Teachers at the Harlem school, which was graded with a “C” by the city in 2010, were rated from a low of 3 to a high of just 48 based on a single year of data.
“It’s not good. You want your child to go to the best school. You want the teachers to be above average just like how they expect the students to be above average,” said Kevin Walters, whose 8-year-old son attends Sisulu-Walker. “I do expect better.”
Charter schools voluntarily participated in the ratings program, which launched as a pilot program in 2006 and then expanded citywide.
Comparing charter results to the rest of the city is difficult because relatively few charter teachers were rated. But one analysis shows that charters had a larger percentage of highly rated teachers — 39 percent scored a 75 or higher — than any community school district besides Brooklyn’s District 16.
The teachers union had fought the release of the ratings data in court for 17 months, charging that it was error-filled and based on faulty state tests.
Indeed, the average margin of error for the rankings of charter-school teachers was 22 in math and 44 in English when multiple years of data were used, according to the DOE.
The formula for calculating charter-school rankings was slightly different from that of district schools because it excluded certain variables like student attendance and suspension rates, according to a spokesman.
As for individual instructors, former Harlem Children’s Zone Promise Academy I teacher Monica DeFabio scored the lowest among the two dozen teachers for whom there were multiple years of data.
She ranked in the 3rd percentile for fourth-grade math in 2010, but the error margin put her as high as the 22nd. She also rated in the 5th percentile for fourth-grade English, but with an error range as high as 31 — which is considered average.
She has since moved to a Success Academy charter school, whose director, Eva Moskowitz, had nothing but praise for her.
“Monica DeFabio is one of the most talented teachers I’ve ever seen,” she said.
One of the few highly ranked teachers with more than one year of data plays the dual role of principal and instructor.
Joseph Negron, principal at KIPP Infinity Charter School in Harlem, ranked in the 99th percentile as a fifth-grade math teacher working in tandem with Angela Fascilla.
Additional reporting by Kevin Fasick

Tuesday, February 28, 2012

Gary Rubinstein's Analysis of NYC Value-Added Data, Part 1 and 2

Analyzing Released NYC Value-Added Data Part 1

by Gary Rubinstein
The New York Times yesterday released the value-added data on 18,000 New York City teachers collected between 2007 and 2010.  Though teachers are irate and various newspapers, The New York Post in particular, are gleeful, I have mixed feelings.
For sure the ‘reformers’ have won a battle and have unfairly humiliated thousands of teachers who got inaccurate poor ratings.  But I am optimistic that this will be looked at as one of the turning points in this fight.  Up until now, independent researchers like me were unable to support all our claims about how crude a tool value-added metrics still are, though they have been around for nearly 20 years.  But with the release of the data, I have been able to test many of my suspicions about value-added.  Now I have definitive and indisputable proof, which I plan to write about for at least my next five blog posts.
The tricky part about determining the accuracy of these value-added calculations is that there is nothing to compare them to.  So a teacher gets an 80 out of 100 on her value-added — what does this mean?  Does it mean that the teacher would rank 80 out of 100 on some metric that took into account everything that teacher did?  As there is no way, at present, to do this, we can’t really determine if the 80 was the ‘right’ score.  All we can say is that according to this formula, this teacher got an 80 out of 100.  So what we need in order to ‘check’ how good a measure these statistics are is some ‘objective’ truths about teachers — I will describe three, and we will see if the value-added measures support them.
On The New York Times website they chose to post a limited amount of data.  They have the 2010 rating for the teacher and also the career rating for the teacher.  These two pieces of data fail to demonstrate the year-to-year variability of these value-added ratings.
I analyzed the data to see if they would agree with three things I think every person would agree upon:
1)  A teacher’s quality does not change by a huge amount in one year.  Maybe they get better or maybe they get worse, but they don’t change by that much each year.
2)  Teachers generally improve each year.  As we tweak our lessons and learn from our mistakes, we improve.  Perhaps we slow down when we are very close to retirement, but, in general, we should get better each year.
3)  A teacher in her second year is way better than that teacher was in her first year.  Anyone who has taught will admit that they managed to teach way more in their second year.  Without expending so much time and energy on classroom management, and without having to make all lesson plans from scratch, second year teachers are significantly better than they were in their first year.
Maybe you disagree with my #2.  You may even disagree with #1, but you would have to be crazy to disagree with my #3.
Though the Times only showed the data from the 2009-2010 school year, there were actually three files released, 2009-2010, 2008-2009, and 2007-2008.  So what I did was ‘merge’ the 2010 and 2009 files.  Of the 18,000 teachers in the 2009-2010 data I found that about 13,000 of them also had ratings from 2008-2009.
Looking over the data, I found that half of the teachers had a ‘swing’ of at least 21 points one way or the other.  There were even teachers who had gone up or down as much as 80 points.  The average change was 25 points.  I also noticed that 49% of the teachers got a lower value-added score in 2010 than they did in 2009, contrary to my experience that most teachers improve from year to year.
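For readers who want to reproduce this kind of check, here is a minimal sketch in Python. The file names and column names (teacher_id, score) are assumptions for illustration; the released spreadsheets use their own layouts.

```python
# Minimal sketch of the merge and "swing" computation described above.
# File and column names are hypothetical, not the DOE's actual layout.
import pandas as pd

df09 = pd.read_csv("tdr_2008_2009.csv")  # hypothetical file name
df10 = pd.read_csv("tdr_2009_2010.csv")  # hypothetical file name

# Match teachers appearing in both years and compare their two scores.
merged = df09.merge(df10, on="teacher_id", suffixes=("_2009", "_2010"))
swing = (merged["score_2010"] - merged["score_2009"]).abs()

print("teachers matched in both years:", len(merged))  # ~13,000 reported above
print("median swing:", swing.median())                 # ~21 points reported
print("mean swing:", swing.mean())                     # ~25 points reported
declined = (merged["score_2010"] < merged["score_2009"]).mean()
print(f"share who declined year over year: {declined:.0%}")  # ~49% reported
```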
I made a scatter plot with each of these 13,000 teacher’s 2008-2009 score on the x-axis and their 2009-2010 score on the y-axis.  If the data was consistent, one would expect some kind of correlation with points clustered on an upward sloping line.  Instead, I got:

With a correlation coefficient of .35 (and even that is inflated, for reasons I won’t get into right now), the scatter plot shows that teachers are not consistent from year to year, contrary to my #1, nor do a good number of them go up, contrary to my #2.  (You might argue that 51% go up, which is technically ‘most,’ but I’d say you’d get about 50% with a random number generator — which is basically what this is.)
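A sketch of the corresponding scatter plot and correlation, continuing from the merge above (the ~.35 figure is the value reported in the post, not something this sketch guarantees):

```python
# Continuing the sketch above: year-to-year scatter plot and Pearson r.
import matplotlib.pyplot as plt

r = merged["score_2009"].corr(merged["score_2010"])  # post reports about .35
plt.scatter(merged["score_2009"], merged["score_2010"], s=4, alpha=0.3)
plt.xlabel("2008-09 value-added score")
plt.ylabel("2009-10 value-added score")
plt.title(f"Year-to-year value-added scores (r = {r:.2f})")
plt.show()
```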
But this may not sway you, since you may think a teacher’s ability can change drastically in one year, or that teachers get stale with age, so you are not surprised that about half went down.
Then I ran the data again.  This time, though, I used only the 707 teachers who were first year teachers in 2008-2009 and who stayed for a second year in 2009-2010.  Just looking at the numbers, I saw that they were similar to the numbers for the whole group.  The median amount of change (one way or the other) was still 21 points.  The average change was still 25 points.  But the amazing thing, which definitely proves how inaccurate these measures are, is that the percent of first year teachers who ‘improved’ on this metric in their second year was just 52%, contrary to what every teacher in the world knows — that nearly every second year teacher is better than she was in her first year.  The scatter plot for teachers who were new teachers in 2008-2009 has the same characteristics as the scatter plot for all 13,000 teachers.  Just like the graph above, the x-axis is the value-added score for the first year teacher in 2008-2009 while the y-axis is the value-added score for the same teacher in her second year during 2009-2010.
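In the sketch’s terms, the same check restricted to the rookie subset might look like this; the experience column name is again an assumption about the file layout:

```python
# Restrict the comparison to teachers in their first year during 2008-09.
rookies = merged[merged["experience_2009"] == 1]  # assumed column name
improved = (rookies["score_2010"] > rookies["score_2009"]).mean()
print(len(rookies), "first-year teachers matched")  # 707 reported above
print(f"'improved' in year two: {improved:.0%}")    # ~52% reported
```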
Reformers beware.  I’m just getting started.
Continued in part 2 …

9 Responses

  1. Thanks for the post. Where did you access the raw data? Or did you have to request it?
  2. Sean
    GR, I’m with you. Publishing the scores was a puff-your-chest move. Rushing to include VAM in formal evaluations will turn the tide against a potentially promising tool. Demoralizing teachers is popular for reasons no rational person can understand. Economists and reformers apply high levels of abstraction and little nuance to what is a complex profession.
    But you can’t put this type of research standard on VAM and then completely ignore it for current measures of quality: experience and master’s degrees.
    Take your second assumption: “Teachers generally improve each year.” For the first five years, there’s suggestive evidence. From years 6-10, a bit less. From years 10-30, close to none.
    Dan Goldhaber found nearly identical distributions of teacher quality comparing two groups: those with and those without master’s degrees.
    VAM, by comparison, is considerably more reliable. As a teacher for 15 years or whatever it is, you surely know that there’s (significant) variability in the quality of teachers. I think a better path is to let VAM breathe for a few years, let the modeling improve some, and then we’ll see.
    I ask you: in the tradeoff between type 1 (dismissing an effective teacher) and type 2 (keeping an ineffective one), which do you choose? The current system runs rampant with type 2. VAM obviously has serious potential for type 1 (and type 2).
    • Sean, what’s this with letting VAM breathe for a few years until the models improve?
      It’s not like they’re piloting this system. Starting in 2012-2013, teachers all across New York State will have 40% of their evaluations come from VAM – 20% from state tests, 20% from local tests, third party tests or the state tests measured a different way than the state measured them.
      If VAM is as unreliable as what Gary shows above, we’re going to see thousands of teachers unfairly tarred with the “ineffective” label who wind up in the NY Post with a glossy DOE-provided photo under the headline NY’S WORST TEACHERS.
      Maybe if I thought the Regents and the NYSED and Cuomo and Bloomberg and Gates and Murdoch and the rest of the so-called reformers weren’t trying to rid the system of thousands of teachers, I might trust them to implement this system fairly.
      But since I know that’s exactly what they want to do, I do not trust them or the system they want to implement.
      Given that Bloomberg is on record about wanting to fire 50% of NYC teachers, Gates thinks most teachers suck, and Merryl Tisch believes teachers are THE problem in public education (funny, she’s been a Regent for 15 years, but somehow the problems are never her fault), I think I would be a fool to trust them to fairly implement so complex and easy to manipulate a system.
      Therein lies the problem with VAM for me. I do not trust the people implementing it and it is so complex and non-transparent as it now stands that I would be a fool if I did.
      Perhaps as you say the model will improve later on.
      When that happens, we can then argue the wisdom of basing evaluations on high stakes tests.
      Until then, what we have is a poisoned and toxic environment that suggests teachers should be wary of any “reforms” the powers that be want to implement, especially ones as complex as VAM. The publication of the TDRs after the DOE offered promises that it would never do that is an exclamation point on the need for wariness.
  3. jandh
    crucial typo in conclusion
    should read “is better THAN her first year.”
  4. Rafi Nolan
    An important note that I don’t believe has been mentioned by publishing organizations (and the reason you should not expect a large jump from first to second year): Teachers with one and two years of experience are graded separately, with their percentile rankings representing their performance within the “peer group”. For first year teachers, the peer group consists only of other first year teachers, and likewise for second year teachers; as a result, the expected net improvement in percentile rankings from the first to second year would be close to zero.
    Of course this means that it’s highly inappropriate to compare the percentile rankings of first/second year teachers to those of more experienced teachers—it only makes sense to compare within the same level of experience (>2 years was considered all the same level of experience for peer grouping purposes). No online databases that I have seen have noted this effect.
    Disclosure: I am one of those teachers in your sample group who saw large improvements in value-added scores from the first to second year. I appear to benefit from comparisons to other teachers in my school (for 09-10), when we were in fact not in the same comparison group.
  5. Great work, Gary. I hope others will take the same approach and expose the problems that are apparent here. And frankly, even if the models improve and there’s stronger correlations, I wouldn’t accept those correlations as proof of overall teaching efficacy. There are still too many assumptions built into the models, and too little of our work in the classroom and school accounted for in the tests.

Analyzing Released NYC Value-Added Data Part 2

by Gary Rubinstein
In part 1 [see above - Editor] I demonstrated there was little correlation between how a teacher was rated in 2009 and how that same teacher was rated in 2010.  So what could be crazier than a teacher being rated highly effective one year and then highly ineffective the next?  How about a teacher being rated highly effective and highly ineffective IN THE SAME YEAR.
I will show in this post how exactly that happened for hundreds of teachers in 2010.  By looking at the data I noticed that of the 18,000 entries in 2010, about 6,000 were repeated names.  This is because there are two ways that one teacher can get multiple value-added ratings for the same year.
The most common way this happens is when the teacher is teaching self-contained elementary in 3rd, 4th, or 5th grade.  The students take the state test in math and in language arts and that teacher gets two different effectiveness ratings.  So a teacher might, according to the formula, ‘add’ a lot of ‘value’ when it comes to math, but ‘add’ little ‘value’ (or even ‘subtract’ value) when it comes to language arts.
To those who don’t know a lot about education (yes, I’m talking to you, ‘reformers’), it might seem reasonable that a teacher can do an excellent job in math and a poor job in language arts, and that it should not be surprising if the two scores for that teacher do not correlate.  But those who do know about teaching would expect the amounts students learn in the two subjects to correlate, since someone who is doing an excellent job teaching math is likely to be doing an excellent job teaching language arts, as both rest on common groundwork that benefits all learning in the class.  The teacher has good classroom management.  The teacher has helped her students to be self-motivated.  The teacher has a relationship with the families.  All these things increase the amount of learning of every subject taught.  So even if an elementary teacher is a little stronger in one subject than another, it is more about the learning environment that the teacher created than anything else.
Looking through the data, I noticed teachers like a 5th grade teacher at P.S. 196 who scored 97 out of 100 in language arts and 2 out of 100 in math.  This is with the same students in the same year!  How can a teacher be so good and so bad at the same time?  Any evaluation system in which this can happen is extremely flawed, of course, but I wanted to explore whether this was a major outlier or something quite common.  I ran the numbers and the results shocked me (which is pretty hard to do).  Here’s what I learned:
Out of 5,675 elementary school teachers, the average difference between the two scores was a whopping 22 points.  One out of six teachers, or approximately 17%, had a difference of 40 or more points.  One out of 25 teachers, which was 250 teachers altogether, had a difference of 60 or more points, and, believe it or not, 110 teachers, or about 2% (that’s one out of fifty!) had differences of 70 or more points.  At the risk of seeming repetitive, let me repeat that this was the same teacher, the same year, with the same kids.  Value-added was more inaccurate than I ever imagined.
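Here is a sketch of that same-year, two-subject comparison, again with assumed column and subject labels (teacher_id, subject, score, "math", "ela"):

```python
# Pivot each 2010 teacher's math and ELA scores side by side, then
# measure the within-teacher gap. Labels are assumptions, not the
# DOE's actual column names.
import pandas as pd

df10 = pd.read_csv("tdr_2009_2010.csv")  # hypothetical file name
by_subject = df10.pivot_table(index="teacher_id", columns="subject",
                              values="score")
both = by_subject.dropna(subset=["math", "ela"])
gap = (both["math"] - both["ela"]).abs()

print(len(both), "teachers scored in both subjects")  # 5,675 reported above
print("mean gap:", gap.mean())                        # ~22 points reported
print("gap of 40+ points:", (gap >= 40).mean())       # ~17% reported
print("gap of 70+ points:", (gap >= 70).mean())       # ~2% reported
```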
I made a scatter plot of the 5,675 teachers.  On the x-axis is that teacher’s language arts score for 2010.  On the y-axis is that same teacher’s math score for 2010.  There is almost no correlation.
For people who know education, this is shocking, but there are people who probably are not convinced by my explanation that these should be more correlated if the formulas truly measured learning.  Some might think that this really just means that just like there are people who are better at math than language arts and vice versa, there are teachers who are better at teaching math than language arts and vice versa.
So I ran a different experiment for those who still aren’t convinced.  There is another scenario where a teacher got multiple ratings in the same year.  This is when a middle school math or language arts teacher teaches multiple grades in the same year.  So, for example, there is a teacher at M.S. 35 who taught 6th grade and 7th grade math.  As these scores are supposed to measure how well you advanced the kids that were in your class, regardless of their starting point, one would certainly expect a teacher to get approximately the same score on how well they taught 6th grade math and 7th grade math.  Maybe you could argue that some teachers are much better at teaching language arts than math, but it would take a lot to try to convince someone that some teachers are much better at teaching 6th grade math than 7th grade math.  But when I went to the data report for M.S. 35 I found that while this teacher scored 97 out of 100 for 6th grade math, she only scored a 6 out of 100 for 7th grade math.
Again, I investigated to see if this was just a bizarre outlier.  It wasn’t.  In fact, the spreads were even worse for teachers teaching one subject to multiple grades than they were for teaching different subjects to the same grade.
Out of 665 teachers who taught two different grade levels of the same subject in 2010, the average difference between the two scores was nearly 30 points.  More than one out of four teachers, or approximately 28%, had a difference of 40 or more points.  Ten percent of the teachers had differences of 60 points or more, and a full five percent had differences of 70 points or more.  When I made my scatter plot with one grade on the x-axis and the other grade on the y-axis, I found that the correlation coefficient was a minuscule .24.
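And the two-grade, one-subject version of the check, continuing the sketch above under the same assumed layout:

```python
# One row per teacher-subject pair, one column per grade level taught.
by_grade = df10.pivot_table(index=["teacher_id", "subject"],
                            columns="grade", values="score")
pairs = by_grade.dropna(thresh=2)  # keep pairs rated in two or more grades

spread = pairs.max(axis=1) - pairs.min(axis=1)
print(len(pairs), "teacher-subject pairs")             # 665 reported above
print("mean spread:", spread.mean())                   # ~30 points reported
print("spread of 40+ points:", (spread >= 40).mean())  # ~28% reported
```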
Rather than report about these obvious ways to check how invalid these metrics are and how shameful it is that these scores have already been used in tenure decisions, or about how a similarly flawed formula will be used in the future to determine who to fire or who to give a bonus to, newspapers are treating these scores like they are meaningful.  The New York Post searched for the teacher with the lowest score and wrote an article about ‘the worst teacher in the city’ with her picture attached.  The New York Times must have felt they were taking the high-road when they did a similar thing but, instead, found the ‘best’ teachers based on these ratings.
I hope that these two experiments I ran, particularly the second one where many teachers got drastically different results teaching different grades of the same subject, will bring to life the realities of these horrible formulas.  Though error rates have been reported, the absurdity of these results should help everyone understand that we need to spread the word since calculations like these will soon be used in nearly every state.
I’ve never asked the people who read my blog to do this before since I prefer that it happen spontaneously, but I’d ask for you to spread the word about this post.  Tweet it, email it, post it on Facebook.  Whatever needs to happen for this to go ‘viral,’ I’d appreciate it.  I don’t do this for money or for personal glory.  I do it because I can’t stand when people lie and teachers, and yes those teachers’ students, get hurt because of it.  I write these posts because I can’t stand by and watch it happen anymore.  All you have to do is share it with your friends.

14 Responses

  1. amazing; I have tweeted emailed & Facebooked it. thanks!
  2. KSK
    I found the link from your first post, and put it up on Facebook. This is an outrage — thank you for doing the statistics. (I am one of the elementary teachers in your plot.)
  3. maxine turner
    Yes, the many aspects of classroom environment affect teacher performance, but I wish you or someone would discuss the lack of supports we receive from administrators and the DOE themselves.
    Disruptive behavior, usually by only a couple of students, is never dealt with. And some teachers are set up to have less progress with their students because they get the students with the most emotional and scholastic needs. Add to this a cut in services to these kinds of kids and a cut of supplies and learning materials — it’s a wonder we ever teach anything at all.
  4. Great work!  Thank you for your hard work and dedication!  
    I am a passionate elementary special education teacher.  As a teacher of special education, it is obvious that I am extremely concerned about VAM.  Here are a few thoughts from a special education teacher’s point of view.
    Developmentally, do we expect our children to grow equally in both reading and math at the same time?  It is well documented that when babies/toddlers begin to walk, their speech might decline.  When babies/toddlers begin to speak in sentences, other milestones might maintain status quo.  How can we expect our children to win both the reading AND the math “Races” in the same year?  Olympic sprinters are not expected to win the marathon, also.  
    Success in reading and/or math in school is a team approach.  In my school, I would not, could not (we are celebrating the birthday of Dr. Seuss this week) take credit for a student’s growth without acknowledging the hard work and dedication of the child, family, reading specialist, math specialist, regular education teacher, speech pathologist, OT, lunch server, recess supervisor, secretary, principal, parent volunteers, school custodian, etc.  How can 1 teacher be measured for 1 child’s success?  
    When standardized tests are given in the fall of the school year, how can the current teacher, who has worked with the child for approximately 6 weeks, take credit for the hard work and dedication of the team who worked with the child the previous year?
  5. syoung
    I’m a retired teacher living on Vancouver Island, British Columbia. You reached me – so I’m hoping your incredible work will spread far and wide. Thanks for taking the time.
  6. Hi Gary, maybe I’m just looking in the wrong places, but I can’t seem to find a way to download the entire dataset. Could you give the link that you used? (I’m sure others would appreciate the same.) Thanks.
  7. This reminds me of a scatter plot my sister used in her masters thesis defense that showed no correlation – she connected the dots to make a picture of a donkey! It got a big laugh. Shared, and will disseminate. Thanks for the hard work on this, it’s so valuable.
  8. Ditto on the link. It’s really frustrating trying to find the data.
    And those R^2 values….damn.
  9. Tom
    You failed to run what I think would be the most obvious relationship: that between one year’s scores and the next.
    As a test run, I looked at just 4th grade math teachers in the 08-09 and 09-10 years. The correlation coefficient for this relationship turned out to be .44, which, for one variable in the social sciences, would be considered quite high.
    What this suggests is that, while one year of results should be taken with a grain of salt, after 3 or 4 years of data these numbers will become quite significant.
    While I do agree that no good comes from publishing this data (in particular single year scores), I think you too easily dismiss their usefulness in evaluating teachers over multi-year periods.
  10. Tom
    oops! I see you did that now in pt. 1. I would still argue that the .35 coefficient you found is high enough to draw conclusions from over a multi-year period.
    One other point that I think is getting missed here is whether the NY DOE even wanted to release these scores. As far as I can tell, they were forced to via the Freedom of Information Act.