In the previous post I looked closely at the data on teacher effectiveness for six different high schools. In this follow-up post I look more closely at a range of data from two successful schools and one school identified in the Cuomo Report on Failing Schools. Finally, I draw some important conclusions and recommendations from the data.
A closer look at two successful schools and one unsuccessful school in NYC. As the data in the previous post showed, teacher effectiveness ratings do not appear to be valid measures when checked against other available data. Some of the better schools in NY (as measured by graduation rates and exam scores) have lower teacher effectiveness scores than some of the most struggling high schools in the state, sometimes much lower.
In this post I want to dig deeper, using additional data available for New York City high schools, to show that parent, student, and teacher survey data, School Quality Reports, and narratives from site visits further support the notion that the teacher effectiveness ratings are likely not accurate, especially in struggling schools.
Data on a successful high school. The first results that made this clear were presented in the previous post. School #3 had internal ratings of teacher effectiveness that were much lower than those in most other schools in NY, and especially lower than those of School #5, a struggling school that nonetheless rated all its teachers higher than most other high schools in NYS did. Yet 100% of School #3 students pass the Regents English Exam and its graduation rate is 91%, with a population that is 51% Black and Hispanic. Nor is it one of the well-known highly selective schools.
I happen to know School #3. I have worked there and given feedback to staff, and many of its teachers have attended our UbD workshops. I have seen excellent teaching at this school: challenging work, lots of interaction, and respect for kids. I know the people doing the supervising and the ratings: they hold high standards about the right things, academically and pedagogically.
My impressions can be more objectively supported by other available NYC data.
The bulk of the data below comes from the New York City School Survey Report for 2013-14, which includes surveys of parents, teachers, and students. (In the charts, blue indicates data from parents, orange data from teachers, and maroon data from students.) I chose the same nine pieces of data for each school, and they are listed in the same order for each:

  1. Overall Survey Results
  2. Parent Satisfaction
  3. Student Satisfaction
  4. Teacher Views on Improvement
  5. Student Views on Improvement
  6. Teacher views of school culture
  7. Student views of school culture
  8. Student account of instructional approaches
  9. Selected quotes from the school Quality Review Report, based on site visits

A successful high school (School #3 from the last blog post).

  1. Overall Survey Results

[Chart: School of the Future, overall survey data]

  2. Parent Satisfaction

[Chart: School of the Future, parent satisfaction]

  3. Student Satisfaction

[Chart: School of the Future, student core survey]

  4. Teacher Views on Improvement

[Chart: School of the Future, teacher core survey]

  5. Student Views on Improvement

[Chart: School of the Future, student views on improvement]

  6. Teacher views of school culture

[Chart: School of the Future, teacher views of school culture]

  7. Student views of school culture

[Chart: School of the Future, student views of school culture]

  8. Student account of instructional approaches

[Chart: School of the Future, student views of pedagogy]

  9. Selected quotes from the school Quality Review Report, based on site visits

Impact

The school’s curricula is coherent and engages all students in higher-level thinking that promotes college and career readiness for all learners.

Supporting Evidence

The curriculum supports metacognition and student empowerment. Literacy is taught across the curriculum with claim-based writing throughout all units of study. Grade-wide yearlong argument-writing plans delineate research-based argument units throughout science, social studies and English classes. They include on-demand tasks and performance-based assessments.

All lessons have a clear aim that names the purpose, product and technique to be modeled and practiced. Charts and tools are cited in each unit to support independent thinking and practice. Tasks are engaging and include questions that emphasize accountable-talk discussions. All units begin with a launch that includes an engaging debatable question or an interesting problem or conflict. Tasks require students to make connections, explore and experience. In the molecular genetics unit, students solve crimes by looking at evidence. Students compare the STR length between the crime scene DNA sample and the suspects to identify the assassin in the scenario.

All units are structured and include objectives, essential questions, enduring understandings, skills criteria, vocabulary development, discussion, practice and assessment. The thematic lens of the Imperialism/Colonization in India unit emphasizes students’ deep understanding of the age of imperialism and industrialism and focuses on independence and non-violent resistance.

Impact

Classes were marked by high-level questioning, meaningful student work products and student to student discussion for most learners to accelerate student learning.

Supporting Evidence

Teachers goal-set around the Danielson Framework, identifying goals that are grounded in school-wide practices noted in the school teaching toolkit. These include techniques for asking questions that promote student thinking and give time to observe student understanding. Across classrooms visited, teachers used noted techniques such as turn and talk, think time, stop and jot, pass off, conferencing and questioning to promote metacognition and student empowerment.

Focus questions were asked in all classrooms, which supported students’ conceptual understanding. Students were also involved in structured discussions. Students were prepared with notes and texts from their research. Students developed their own questions, made connections, used evidence to support their claims and analyzed information.

 Flexible grouping and partnerships supported differentiation. Students were placed in guided groups to reteach skills or to provide extension activities to higher level students. Groups received different assignment sheets with tiered questions based on assessments. In the global history class, students examine primary and secondary sources to point out the author’s argument on imperialism.

Findings

The school leader communicates high expectations to the entire staff, students and parents and has created a culture of mutual accountability for these expectations with clear messaging and appropriate supports.

Impact

There is a coherent and thoughtful vision of school improvement replete with achievable goals and clear interim checkpoints to ensure the success of all students at the college and/or career level.

Supporting Evidence

 The principal sets clear expectations to teachers around professional responsibilities and implementing Danielson’s Framework for Teaching. There is a clear culture of high learning expectations for all students to ensure that they own their educational experience and are prepared for the next level. Students participate in advisory goal-setting conferences.

During teacher team meetings, staff communicated a unified set of high expectations for all students and provided clear and effective feedback to students. Teachers also support parents in understanding the expectations, which include requiring four years of math and science for graduation. Additional requirements include “exhibitions” in 11th and 12th grade, work internships, college classes in 11th and 12th grade and attendance at college trips and family college nights.

Findings

All teachers are involved in focused team sessions that collaboratively analyze classroom practice, assessment data, student work, and curricular products for the purpose of making thoughtful adjustments to teacher practice to increase student achievement and ensure Common Core alignment.

Impact

Teacher teams have a clear instructional focus supported by professional structures and protocols. This collaboration results in shared improvements in teacher practice and increased student learning. The inquiry-based teacher teams model distributive leadership and build a sense of communal ownership for student success.

Supporting Evidence

The school has solid infrastructures of distributive leadership that promote effective collaboration and collective monitoring of student progress. During weekly grade-level team meetings, teachers action-plan around students of concern and review student achievement data. Grade-level inquiry team data meetings are also held three times a year based on school-wide assessment. School-wide data teams comprised of grade team leaders cull and prepare data for inquiry teams. Monthly department-head-led meetings are held to share investigations and to plan out school-wide initiatives. Weekly youth development team meetings include college guidance discussions.

A struggling high school. By contrast, a struggling New York City high school that I also know (I have worked there twice) is on the Cuomo list of “failing” schools. Here are its survey and Quality Report data:

  1. Overall Survey Data (and comparative graduation data)

[Chart: Lehman, overall surveys]

  2. Parent Satisfaction

[Chart: Lehman, parent satisfaction]

  3. Student Satisfaction

[Chart: Lehman, students excited about learning]

  4. Teacher Views on Improvement

[Chart: Lehman, teacher views on improvement]

  5. Student Views on Improvement

[Chart: Lehman, student views on improvement]

  6. Teacher views of school culture

[Chart: Lehman, teacher views of school culture]

  7. Student views of school culture

[Chart: Lehman, student views of school culture]

  8. Student account of instructional approaches

[Chart: Lehman, student views of pedagogy]
It is noteworthy that the teachers in this school say these instructional approaches happen “Often”: a student/teacher disconnect that does not occur at the other two schools discussed here.

  9. Selected quotes from the school Quality Review Report, based on site visits

What the school does well

  • The principal organizes resources and time in order to support instructional goals and increase student outcomes from a social-emotional and academic perspective. (1.3)
  • The school uses various assessment practices to analyze student performance, target instruction, and provide students with feedback in order to increase student achievement and academic progress over time. (2.2)
    • The principal expects that all teachers will utilize the school-wide grading policy, more frequent formative assessment strategies, including exit slips, written reflection, and the use of rubrics.
    • Teacher teams meet weekly in each small learning community (SLC) to analyze and discuss student data, construct item analysis and determine where students have gaps in instruction.

What the school needs to improve

  • Increase the alignment of curricula across grades and content areas to Common Core Learning Standards, and refine units of study in order to increase rigor in tasks to advance the post-secondary readiness of all learners. (1.1)
    • With the support of a curriculum director/assistant principal, the school continues to work with teacher teams on aligning curricula to Common Core Learning Standards and to further develop unit plans and lesson plan templates to effectively support students, yet this process of curricula refinement is inconsistently documented and only beginning to emerge in the math and social studies curricula. Additionally, the progress made in curriculum development is not being accurately communicated to the principal in a timely manner.
  • Deepen academic rigor by consistently designing challenging tasks and utilizing effective questioning that elicits higher-order thinking and extends learning for students on all levels. (1.2)
    • The principal believes that students learn best when they are given the opportunity to delve deeper into rigorous content, engage in student-centered instruction, collaborate and discuss evidence and viewpoints with their peers, and reflect on the process and learning. However, these practices, as evidenced in classroom observations, are not being consistently implemented, as the majority of instruction observed was teacher-directed, with many tasks not appropriately challenging, an absence of text-based discussion and a lack of conceptual understanding. For example, in an English class, the teacher’s request that students analyze the influence of X in the text Y was met with disengagement from most. One student stated, “We have been reading this same story over and over again and we need a new story,” while other students ignored the teacher and began talking about things unrelated to class.
    • In addition to the breakdown of classroom management, the lesson lacked rigor, directions for students to follow, and an ultimate objective, leaving the teacher scrambling to gain control and students not engaged in any meaningful learning.
    • In a history class, teachers asked students to annotate, but did not hold them accountable for what they annotated. Students were asked leading and lower-level questions, which resulted in the teacher answering the questions rather than allowing students to engage in productive struggle.
    • In another class, the teacher designed a lesson using the lesson plan template the school devised, yet none of what was written in the plan was executed, preventing students from interacting at high levels and from multiple entry points, which eventually caused the lesson to fall apart.
    • Out of 11 classes observed, there were only three where students were frequently asked to explain their answers. Furthermore, differentiation and multiple entry points for a variety of learners were not observed anywhere, with the majority of lessons requiring all of the students to do the same work.
    • The result is that across classrooms, not all students are consistently provided with the opportunity to engage in higher-level thinking or reflection, which is evident from low-level discussions and the quality of student work products.
  • Enhance the monitoring of curriculum development and teacher team practices to ensure that teachers are effectively meeting the learning needs of all students as they work to meet the expectations of the Common Core. (5.1)
    • The principal monitors student progress and assesses teacher instructional practices across grades and content areas to ensure coherence and to ensure teachers participate in ongoing inter-visitations to learn best practices from each other. However, while there is student work posted with tasks and rubrics that align to Common Core expectations, there is insufficient evidence that academies regularly revise and modify curriculum plans to ensure that the learning needs of all students are being planned for, resulting in only some students being prepared to meet the expectations of the Common Core.
    • While there is a system in place for teacher teams to meet weekly, there is no evidence that the school has an accountability structure in place to regularly evaluate and adjust the SLC’s inquiry team practices and monitor the connection between the work they engage in during team meetings and the alignment to school goals.

What are the teacher effectiveness ratings for this school? Not one teacher is rated Ineffective, and only 5% are rated Developing; 90% are Effective and 5% are Highly Effective.
Yet note above (under #9) that the only positive comments in the site report refer not to teaching but to actions taken by leadership: nothing listed under “what the school does well” relates to teaching per se. (In fact, despite the positive site review of the school leadership, other teacher survey data show that the teachers strongly dislike the work of the principal and other administrators.)
A third comparison. Finally, I went looking for another New York City high school that had the same demographic profile as the struggling high school above but had nonetheless earned good ratings for making adequate improvement in its graduation rates and test scores. (Recall that school accountability in NY is based on improvement, not on absolute scores and graduation rates, so schools with modest absolute scores can still be top-rated in New York. Conversely, the struggling school above failed to make adequate progress on the benchmark measures of graduation rates and exam scores for 10 years.)
This “effective” school, located in Harlem, has the same demographics as the struggling school cited above (more than 90% Black and Hispanic).

  1. Overall Survey Results

[Chart: HRHS, overall survey data]

  2. Parent Satisfaction

[Chart: HRHS, parent satisfaction]

  3. Student Satisfaction

[Chart: HRHS, students excited about learning (core survey)]

  4. Teacher Views on Improvement

[Chart: HRHS, teacher views on improvement]

  5. Student Views on Improvement

[Chart: HRHS, student views on improvement]

  6. Teacher views of school culture

[Chart: HRHS, teacher views of school culture]

  7. Student views of school culture

[Chart: HRHS, student views of school culture]

  8. Student account of instructional approaches

[Chart: HRHS, student views of pedagogy]

  9. Selected data and quotes from the school Quality Review Report, based on site visits

What the school does well

  • School-wide instructional practices are coherent and reflect a common belief to support all learners, thus resulting in greater student participation and ownership. (1.2)
    • Across a vast majority of classrooms, instruction is driven by the shared belief that students learn best by organizing for effort, having clear expectations in order to master core concepts, and using knowledge actively. Moreover, in the vast majority of classrooms, students were able to articulate the purpose behind each lesson, unit and the intended learning outcomes.
  • Administrators and teachers strategically use a variety of measures of student learning data and common assessments to adjust instruction and ensure increased academic achievement. (2.2)
    • All staff members review student work against agreed-upon and established proficiency rubrics. During the interview, students claimed that the use of common rubrics across classrooms has provided them with feedback that leads to clear next steps, which has led to improved outcomes and an increase in successfully completing required coursework, as evidenced by the increase in the percentage of students completing the necessary credits and coursework to graduate, meet college eligibility requirements and/or enter the work force. Departmental teams meet to further refine student tasks, including rubrics, and modify unit plans to ensure Common Core Learning Standards (CCLS) alignment, which are accessible to all students.
  • Across the school, teacher teams engage in well-structured inquiry-based professional collaborations to analyze data in order to strengthen instructional capacity resulting in improved student outcomes. (4.2)
    • Teacher teams work together to support a “three week,” collaborative professional development cycle which involves unit planning, working with outside staff developers and a lead teacher, participating in a “lesson study,” and looking at student work. For example, all teacher teams begin by identifying gaps in student understanding from the previous unit in order to address and plan. The members of the team follow up by observing each other teach a planned lesson from the planned unit, and then provide verbal and written feedback on the observed team member’s instruction. In addition, teachers analyze student work from the observed lessons during both content area and inter-departmental teacher collaborative inquiry.

What the school needs to improve

  • Strengthen school-wide curricula to include rigorous and engaging tasks, and strategically align the instructional shifts to support college and career readiness for diverse learners. (1.1)
    • The school has integrated a “backwards” unit-planning template that is used across all subject areas. A review of lesson plans reflects daily aims to unpack the purpose of each lesson and the daily learning objective for students. Each lesson fits within a unit of study towards a culminating task and includes opportunity for peer-to-peer discussion.
    • Furthermore, lessons included close readings and text analysis and/or problem-based learning activity. However, written prompts and assignments embedded within the lesson plan reflected mostly “yes” or “no” questions.
    • While the school ensures that the units of study and lesson plans in social studies and science as well as ELA include the use of primary resources and a variety of content area texts such as magazine covers, advertisements and current newspaper articles to engage students and promote high levels of thinking, the strategic integration of planned student activities and tasks to include increased Depth of Knowledge (DOK) questions to raise student levels of thinking from content knowledge to analysis across all classrooms is limited. As a result, the goal to promote all students’ facility with an assortment of complex texts, academic vocabulary and speaking, listening and language needed for college and career readiness is limited and impacts increasing student achievement.

By the way, here are the teacher effectiveness ratings for this school; they are almost identical to those of the struggling high school:
[Image: teacher effectiveness ratings for this school]
[I think the data make pretty clear that the struggling school is indeed struggling and that the teacher effectiveness ratings are simply not credible.]
My tentative conclusions, drawn from the data in both blog posts.

  1. Local and state teacher effectiveness ratings are flawed if, as my sample suggests, teachers in less effective high schools generally get high ratings, often much higher ratings than those in more successful high schools (especially when the less effective schools have been struggling to improve for as long as 10 years).
  2. The combination of survey data and Quality Review site visits presents a fairly consistent and credible picture of the relative strengths and weaknesses of schools, but that picture rarely aligns with teacher effectiveness ratings in either successful or unsuccessful schools.
  3. This disconnect between the ratings, on one hand, and the Quality Review and achievement data, on the other, can only be reduced when there are exemplars of teaching to anchor the ratings, supported by calibration meetings across the schools in a district. Without such exemplars, the teacher rating system will rest only on building-level norms (and politics), and will thus be as flawed and non-standards-based as the letter grades given to students. In other words, like student grades, teacher effectiveness appears to be scored on a curve (at best) rather than against any credible standard. (There is credible data to show that the typical percentage of employees in all walks of life whose work is judged to be ineffective is around 8%; across all NY schools, as the data show, the average is 1%.) NYSED (and/or each district) needs to remedy the lack of official models and calibration protocols. At the very least, the state should declare that teacher ratings of 100% effective in ANY school are of questionable validity on their face, and that extra justification should be provided for such ratings. Beyond that audit indicator, NYSED might require schools to submit two or three video excerpts of teachers rated effective, to ensure calibration across the state (as is typically done with scored student work in the AP and IB programs, and with the state’s writing assessments). The Teaching Channel videos could easily jumpstart such a process, as could videos that exist in other states such as California.
  4. For “rating inflation” to be reduced, there must be far better incentives for all administrators to be more accurate and honest in rating teachers (and for teachers to be equally honest in proposing credible value-added SLO measures). Principal and supervisor friends of mine tell me that the politics of school climate and the possible counterweight of exam scores entice everyone to rate teachers higher than they truly believe is warranted. Perhaps, for example, schools should be required to report only aggregate “standards-based” ratings, kept separate from the more delicate teacher-by-teacher “local norm” ratings. At the least, there could be an independent check on local rating standards by NYSED, or via regional peer meetings chaired by BOCES staff, in the same way that IB teachers have their student grades “moderated” by external exam reviewers.
  5. It is high time we critically examined the wisdom of weighting each of the four dimensions in the Danielson Framework equally.

As it stands now, it is quite possible for a teacher in any school to do well on the three non-teaching dimensions of the four, do poorly on the teaching dimension, but still get a good “teacher effectiveness” score. That seems like a highly misleading way to rate “teacher effectiveness” (even as we should, of course, value planning, professionalism, and community relations).

  6. Good schools improve, and improvement is possible in all kinds of schools serving all kinds of students. Despite what many anti-reformers endlessly argue, these data show that it is incorrect (and very fatalistic) to say that poverty plus ethnicity makes ongoing, adequate school and teacher improvement impossible, as I have long argued. Furthermore, there are outlier schools in this and every other state that achieve high absolute levels of student achievement in spite of SES-related obstacles. (See, for example). Alas, the most successful such charter schools in New York are not represented in the NYSED accountability data resources cited in this post.

A final thought. Do I think that Governor Cuomo is up to some crass politics in his report? I do.
Do I think that this is a hard time to be an educator? I do; it is so much harder than when I was a full-time teacher (when we were pretty much left alone, for good and for ill).
Yet I also believe that until we get an honest and credible accounting of teacher effectiveness in all schools (and especially in struggling ones), we will be doing a great disservice to kids in need and, yes, to their teachers, who deserve more accurate feedback than many now receive.
POSTSCRIPT: Numerous people tweeted back after the previous post that many struggling NYS schools are under-funded and that far too much was thus being asked of them. While I agree that many schools are under-funded and that teachers in these schools face very difficult barriers, it seems odd that everyone who responded with this argument failed to engage with the teacher effectiveness data I provided. Many also said that without better funding and other kinds of state support, improvement in such schools is “impossible.” The data simply do not support this fatalistic conclusion, nor does my work in New York City and Toledo, where we, too, have seen solid gains through UbD training and implementation in previously struggling schools.
Let’s get on with improving the schools our students currently attend, doing what is in our control to improve them.

26 Responses

  1. How frequently (or infrequently) are teachers observed in the schools? Are the observations announced or unannounced? More to the point: are the schools using the same system in the same fashion? You allude to the need to examine the weighting of the Danielson system. Perhaps other components of the evaluation system need to be examined as well, namely how often teachers and administrators are observed. However, I’m going to guess that these details won’t change the point you are making. Thanks for this… it is very thought-provoking.

    • The schools clearly do things differently, school by school. I know of one school where teachers are observed multiple times each year; in another, once, briefly. But I agree with your last point: I don’t think the number of visits makes a bigger difference than the tendency to pump up the ratings.

  2. The teachers are probably all getting high ratings because the principals find it difficult to staff the schools. That is, they might not be the best, but they are the best that will work in the school, which is often the case with high-poverty kids who don’t care. Besides, if you’re a hardworking teacher who tries to engage kids who don’t care AND you get a bad rating, why bother? Go somewhere with an administrator who looks for how many kids you engage.
    And some of the criteria are ridiculous. I most certainly do not “hold my students responsible” for annotating. Nor should any school be required to differentiate, or give students “multiple entry points”. That’s a philosophical belief about teaching, not an absolute requirement.
    I think there’s a way to reasonably evaluate teachers who work with difficult kids, something better than just giving them a pass. But I wouldn’t really want to work at a school that was judged well using the given criteria. It sounds unpleasant.

    • I agree that the challenge of staffing is a factor. But working with challenging kids is not really the key issue: there is precious little discussion and modeling to help teachers and supervisors know what “engagement” or “rigor” looks like in their setting, regardless of what its challenges are. All parties need to be clearer on what counts as “effective teaching.”

      • But can that happen, if the school doesn’t have a critical mass of students who want to learn? No, let me put it another way: I think that most students can be reached, and that some teachers are better at reaching students than others. I don’t think that makes the other teachers bad. But I do think that schools need to prioritize getting a critical mass of kids who can behave and are willing to at least go through the motions to get out of school, or all the discussion and modeling won’t help the school get past the feeling that it’s drowning.

        • I don’t think it makes the other teachers “bad” but surely, then, we can rate them “developing” instead of “effective”. The whole thing needs to be more honest if we’re going to serve needy/challenging kids – that’s my only point. If limited-effective teachers think they are good enough, nothing changes…

  3. Rather than just evaluation, I wish all the observations were used to feed into coaching sessions like the ones I see in many Teaching Channel and Inside Mathematics videos. We teachers are like the students: when everything is summative, we don’t like the atmosphere.
    Question: At the end of your blog, did you leave out the example in #6 (“(See, for example, )”)?

  4. As a NYS resident and former educator, I’m bothered by two things: the test score as a component of the teacher ratings AND poor evaluation skills in administrators. My concern is that with the test score weighted highly, teachers will focus on test prep rather than instructional improvement in an effort to improve the ratings (for both the school and the individual teacher). Local measures will come to value teachers who focus on raising test scores rather than those who are working to provide quality instruction through less ‘drill and kill’ methods.
    I have grave concerns about the administrators in both failing and mediocre schools. I have come across few building-level administrators who are well versed in what good instruction looks like. They can say the right buzzwords, but when it comes to looking for them in the classroom I’m not sold that they know what to look for. I know that my evals as a young teacher were all great, but I also know that I wasn’t that good.
    Finally, I think reasonable political discourse is going to be nearly impossible here. Teachers feel attacked by Cuomo, Cuomo/Regents don’t seem interested in changing the model to promote teacher coaching (everything points to test scores and only test scores, on tests that are not released and are scored by Kelly Temps), and parents are bombarded by Anti-Common Core propaganda without developing any understanding of the differences between standards and standardized tests, or how VAM works, or what good teaching looks like.

    • That’s why I called for exemplars (and training in the exemplars) to ensure that admins know what we’re looking for. And they need more training in the kind of actionable feedback I have elsewhere written about.

  5. Thanks for the post. Very insightful, but really not too surprising.
    It stands to reason that effective schools with solid leadership would rate their teachers lower (or honestly) in the interest of giving actionable feedback to teachers. I would imagine that the successful schools with low teacher-effectiveness ratings also find ways to coach teachers and help them grow and improve.
    On the other hand, struggling schools may struggle because there is no clear picture of what good teaching looks like. This is probably an overgeneralization, but it seems a case of the blind leading the blind. You have administrators who may not understand good instruction evaluating teachers who may not understand good instruction, and you get inflated evaluations and low test scores.
    For me the solution to improving schools doesn’t necessarily revolve around the actual categorization of teachers into “effective” or “developing” but around providing the time and resources for teachers to improve wherever they fall. To put it more simply, it’s not the result of the evaluation that really matters, but what we do with the result that will impact teaching.
    That is why I think schools that rate teachers lower end up with better results. Those are the schools that are, more than likely, doing more to provide meaningful professional development to teachers.

  6. I’ll reblog this on my blog and let you know if I receive any comments. In my last full time teaching position, I was formally observed once a year by our Principal but it seemed that I was casually observed almost every day by Mentors, A.P., Title 1 Specialists, etc. We also had to submit all our detailed lesson plans by paper or electronically each week. I am totally convinced that family dynamics play a role in student academic successes, and that more money spent for students may not solve all our problems.

  7. Reblogged this on How can I control my class? and commented:
    Let’s get on with improving the schools our students currently attend, doing what is in our control to improve them. Please read this observation analysis of effective schools in NY vs. the teacher evaluation results; i.e., high teacher evaluations don’t equal high student test results and graduation rates.

  8. I agree with your views of what good teaching should look like. But, by high school, a long history is brought into the building. (not to mention the long history brought to elementary schools) I’m asking – of course you know this better than I – not debating, but if the top teachers in the successful schools were sent as replacements into the unsuccessful schools, would they soon adopt the teacher-directed approach that is the norm in the unsuccessful school?

    • John, to me this is the $64,000 question: what would happen if some of the district’s top teachers were assigned to a struggling school? With district leadership and willpower, cooperative unions, and valid teacher ratings, this might provide a powerful existence proof of what is possible. But think of all the pieces that have to line up. Still, yes, my hunch is that, say, a team of top-flight English teachers could make a considerable difference. It is a worthy and important experiment, regardless of outcome.

      • The value-added gamble was done in lieu of that worthy experiment, when nothing that dangerous should have been contemplated without first conducting those experiments.
        I’d support such a pilot, as long as transferring was voluntary. But such an experiment would say little unless such transfers were involuntary, so we have a conundrum. (To my knowledge, in voluntary experiments the number of teachers who agreed to transfer was far too small to make a difference.)
        If I recall, in the rare cases when such transfers were involuntary, the top teachers resigned and moved to the suburbs.
        I suspect we will find that people have basic personality preferences, and the personalities that make for great teachers in great schools are different than those who choose struggling schools (I also suspect that there’s a big difference between the personalities of the people who choose high school v elementary and different subjects.)
        The better approach is to make teaching in high-challenge schools a team effort, and recruit a second shift of educators with all sorts of different personalities and backgrounds to provide the socio-emotional supports.
        Also, why did the teachers in the struggling school reject the principal’s approach? Was there disagreement over discipline? Was there dubiousness over curriculum alignment? (although you, like most of my colleagues, will probably disagree on this, I don’t see curriculum alignment as viable in high school; getting high school kids with, say, 6th grade skills to master grade level concepts, I believe, is an art not something amenable to prescribed pacing.)

        • I agree, John: the transfer has to be voluntary. I think there would be more takers if –
          1) the time-frame was limited to 2-3 years and seniority at their old school was not lost.
          2) 4-person teams, not individuals only, were hired
          3) additional stipends were paid
          4) a secretary or other resource person was provided
          5) the teams had leeway to alter/personalize grouping, curriculum, and pacing to accommodate the (large) differences in ability. (So, in fact, I don’t disagree with you IF and ONLY IF there are agreed-upon priority outcomes, e.g. literacy targets and argumentation from Common Core.)
          6) the whole thing were set up as an RFP in which teachers also had to partner with other services and/or funders.

  9. Thank you for this analysis. While I tend to agree with your conclusions, and I’m satisfied that you point to flaws in the evaluation system, I’m concerned with the mixing of terms and misunderstanding across our state of how teacher scores are created from district to district.
    In your previous blog post, we saw bar graphs representing Composite, State Growth and Other Comparisons, Locally Selected, and Other Measures. You point to the last two being the most salient because they are based on “locally-assigned ratings by administrators and on locally-developed growth measures proposed by teachers.”
    My understanding of high school teacher performance data is that there are no state-provided growth measures for teachers in grades 9-12. That is, the state-provided growth scores are only provided to teachers of math and ELA in grades 4-8 who have enough student scores (16) to derive a value. While high school principals receive a growth score based on growth in math and ELA and the total number of Regents exams passed, high school teachers’ “State Growth” is actually “Other Comparisons” (i.e., SLOs).
    To further complicate the issue, if we had examined elementary or middle schools and their State Growth scores for teachers, we’d need to acknowledge that PE, social studies, technology, art, music, science, and all other teachers who don’t teach 4-8 ELA and math have scores based on “Other Comparisons.”
    So, in high schools, the State Growth measure is not derived from the state, but is just as subject to the issues in Conclusion #4 above as Locally Selected measures. For elementary and middle schools, the State Growth measures we see are actually a mix of state-provided growth scores and SLOs with every school having a different ratio of state-to-local as part of the recipe.
    To complicate it even further, some schools confuse the concepts of growth and achievement and may utilize achievement measures in their State Growth measures as well as in their Locally Selected measures. There is also a tendency to confuse an increase in the numbers of students reaching proficiency (a 3 or above in 3-8) as growth. While it’s true an increase in students achieving proficiency demonstrates improved achievement for the school, this does not necessarily mean that teacher growth scores improved.
    While I acknowledge that your larger argument about the observation portion of the system addresses the lack of calibration among raters, another flaw in the system relates to district choice of teacher effectiveness rubrics. Conclusion #5 is spot on, but doesn’t acknowledge the myriad rubrics available for districts to choose from and the implications for consistency: not only do teacher evaluators vary, but not all rubrics are created equal.
    I appreciate this post, and support many of the findings within it. I felt compelled to point out additional inconsistencies in the system to support your final thought: “Yet, I also believe that until we get an honest and credible accounting of teacher effectiveness in all schools (and especially in struggling ones) we will perform a great disservice to kids in need – and, yes, to their teachers who deserve more accurate feedback than many now receive.”
    In short, there are so many inconsistencies in the teacher evaluation system that honesty and credibility become lost, because many educators misunderstand what ratings mean and no two schools utilize the same data to derive and support the ratings.

    • Thanks for all these clarifications. I was unaware of the issue of where SLOs fit in HS and unaware of growth scores, which makes me wonder: what is the basis for the state score in high schools, then?
      I am aware of the many rubrics in use, though I have not seen any discussion of what a valid weighting might or might not be as far as NYSED is concerned, which is really my point. If districts can choose any rubric and any weighting, then the system is even more questionable as a unitary rating of “teacher effectiveness.”
      Thanks again for taking the time to enlighten me!

      • I agree with your greater point, and felt it was important to add to it by illustrating that the other components of the system are also subject to a large amount of local control.
        As for your question about the basis of the state score in high schools, this brings about the issue of two different measures used for school success versus teacher effectiveness.
        Schools are rated based on an achievement model which measures how many students reach specified levels of proficiency: for example, students passing Regents exams (65 or above), students reaching mastery on Regents exams (85 or above), and how many students graduate within four years after entering 9th grade. The only concept of “growth” here is getting more students to pass each test from one year to the next based on their performance index to meet AYP.
        The State Growth and Other Comparisons measure in high school is based on a concept of individual growth on an SLO. Strictly speaking, this measure of growth is not concerned with the number of students passing, but the number of students improving over the period of a course based on a teacher-selected target for each student.
        Aside from schools being measured on achievement and teachers being measured on growth and achievement (depending on what was approved in the individual APPR), school report cards only show Regents and graduation results, but the State Growth and Other Comparisons may or may not include growth on the Regents and also includes the SLO scores of every teacher in the school (art, PE, non-regents core subject courses, electives, etc.).
        So, schools are measured on achievement on Regents exams and graduation, but the teacher effectiveness ratings are based on all teachers of all subjects and their course-specific SLOs based on growth (for the state 20%) and possibly achievement (for the local 20%). If that’s not confusing enough, some Regents growth may not even be included in high school teacher effectiveness ratings, because teachers only create SLOs for 50% of their students, starting with their largest course sections. In my experience, it’s not uncommon for a teacher to have his or her Regents courses left out of the SLO process because those courses weren’t among the largest sections needed to reach 50% of students.
        I don’t mean to keep going on, and I don’t mean to say any of these measures are not useful on their own, but the inconsistency of their application, the comparison of measures that don’t necessarily include the same source data, and the overall confusion about the meanings of any of these measures are problematic.
        All this to say there’s even more confusion and inconsistency than most people are aware of.

  10. The study in this EdWeek article addresses some of the points raised in your post and the comments.
    The study found that value-added scores were similar at low- and high-poverty schools in FL and NC. The value-added scores were more similar at the high end than the low end, as one would probably expect.
    An interesting aspect of the study that I have not seen before was an analysis of teachers that switched between high and low poverty schools. There was very little difference in the average scores before and after the switches. In other words, they did not find evidence that these particular value added formulas favored low or high poverty schools.
    Not a big fan of value added scores, but must confess that I would be interested in seeing the average score for NY grade 4-8 teachers in “struggling” schools with state assigned ratings for Math or English. Also would like to see an analysis of the change in scores for NY teachers moving in and out of “struggling” schools.

  11. With all these questions about the validity of implementation of teacher evaluation systems, I am wondering what the research says about merit pay systems for teachers, especially the impact of such systems on improved student achievement.

    • It’s of course a very important question moving forward. As you know, many teachers and most unions are completely opposed to such a plan. However, there were some very interesting TEAM merit pay plans that have been tried out – even endorsed by local unions.
      I don’t consider myself knowledgeable enough on that subject to know what the research says. Readers?
