Cypress Testing Framework Conversation
Jill Farley, Ken Rickard, and Byron Duvall discuss their experiences with the Cypress front-end testing framework.
Podcast Links
Cypress.io
Transcript
George DeMet:
Hello and welcome to Plus Plus, the podcast from Palantir.net where we discuss what's new and interesting in the world of open source technologies and agile methodologies. I'm your host, George DeMet.
Today, we'd like to bring you a conversation between Jill Farley, Ken Rickard, and Byron Duvall about the Cypress front-end testing framework. Cypress is a tool that web developers use to catch potential bugs during the development process. It's one of the ways we can ensure that we're building quality products that meet our clients' needs and requirements.
So, even if you aren’t immersed in the world of automated testing, this conversation is well worth a listen. Without further ado, take it away, Jill, Ken, and Byron.
Jill Farley:
Hi, I'm Jill Farley from Palantir.net. I'm a senior web strategist and UX architect. Today, I'll be discussing Cypress testing with two of my colleagues. I'll let them introduce themselves, and then we can have a relaxed conversation about it.
Ken Rickard:
I'm Ken Rickard. I'm senior director of consulting here at Palantir.net.
Byron Duvall:
And I am Byron Duvall. I'm a technical architect and senior engineer at Palantir.net.
Jill Farley:
Well, thanks for sitting down and talking with me today, you guys. For anybody who doesn't know what Cypress automated testing is, we're going to start this off with a quick, less technical overview. So, Ken, what are Cypress tests in 60 seconds?
Ken Rickard:
In 60 seconds: Cypress is a testing framework that is used to monitor the behavior of a website or app in real time within the browser. It lets you set up test scenarios, record them, and replay them so that you can guarantee that your application is doing what you expect it to do when a user clicks on the big red button.
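For anyone who has never seen one, a Cypress test is just JavaScript or TypeScript that drives a real browser. A minimal sketch, with a made-up URL, button label, and success message rather than anything from a real project:

    // cypress/e2e/smoke.cy.ts
    describe('Big red button', () => {
      it('does what we expect when a user clicks it', () => {
        // Load a page in the app (illustrative URL)
        cy.visit('https://example.com');

        // Click the button and assert on what should happen next
        cy.contains('button', 'Big Red Button').click();
        cy.contains('You pressed the button').should('be.visible');
      });
    });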
Jill Farley:
I did not time you, but that was brief enough. To give a little context for why we're talking about this today: our technical team, led by Ken, recently developed a virtual event platform for one of our clients. Actually, it was first developed a couple of years ago.
We've been iterating on it for a few years, and it's unique in that it debuts for a few intense weeks each year. It's only live for a couple of weeks, specifically to host this virtual event for four days, and then goes offline for the rest of the year. So, we have to get it right, and we specifically have to get it right for the tens of thousands of visitors coming over the course of that four-day event.
So, this year we really went all in on Cypress testing to really ensure the success of the event.
So, Ken, I've heard you say we have 90% test coverage on this particular platform right now after the work that we've done. What does that mean? What does 90% test coverage mean?
Ken Rickard:
"It means we can sleep at night," I think, is what I mean when I say that. It’s simply that roughly 90% of the things that an individual user might try to do on the website are covered by tests. So, I joke a little bit about what happens when you press the big red button.
I mean, we have big orange buttons on the website, and the question becomes, "What happens when you press that button? Does it do the thing you expect it to do?" Also, the content and behavior of that button might change depending on whether or not we're pre-conference, we're during the conference, or we're during a specific session in the conference, and that changes again post-conference.
So, we have all of these conditions that change the way we expect the application to behave for our audience. I'll give you a simple example. During a session, the link of the session title, when you find it in a list, doesn't take you to the session page. It takes you to the video channel that's showing that session at that time. That is true for most sessions for a 30-minute window during the entire conference.
Our testing coverage is able to simulate that so we know, "Yep, during that 30-minute window, this link is going to go to the right place." So when we talk about that sort of 90% coverage, it means, from an engineering standpoint, well, even from a product management standpoint, you can look at the feature list and say, "Well, we have 300 features on this website and we can point to explicit tests for 270 of them."
I just made those numbers up, but you get the point.
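For readers wondering how a time-dependent behavior like that session link can be exercised in a test at all: when the logic lives in client-side code, Cypress can pin the browser's clock to a moment inside the window. A rough sketch, with invented dates, paths, and link text rather than the project's actual code:

    // cypress/e2e/session-links.cy.ts
    describe('Session title link during its time slot', () => {
      it('points at the video channel while the session is live', () => {
        // Freeze the app's Date to a moment inside the session's
        // 30-minute window (hypothetical timestamp)
        cy.clock(new Date('2024-05-01T14:05:00Z').getTime(), ['Date']);

        cy.visit('/sessions');

        // Hypothetical markup: while live, the session title should link
        // to the video channel, not the session detail page
        cy.contains('a', 'Opening Keynote')
          .should('have.attr', 'href')
          .and('include', '/channel/');
      });
    });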
Jill Farley:
That sounds like an incredible amount of work to try to understand what to test and what types of tests to write. I'm actually going to go over to Byron for a second. As a member of the development team, what was it like actually writing these tests, creating them, and using them?
Up front, before the tests were actually doing their job and covering our bases. If it was hard, let's talk about that.
Byron Duvall:
It was interesting and it was different, because we use a different language for the testing. There are a lot of special keywords and things that you have to use in the testing framework, so just learning that was a bit of a curve.
Then, the biggest issue I think we ran into with all of the testing was the timing of the tests. The Cypress browser runs tests as fast as it can, and it runs faster than a human can click on all of the things. So, you start to see issues when the app doesn't have time to finish loading before the test is clicking on things. You have to really work to make sure that you have all of the right conditions in place, that everything has loaded, and you have to specifically wait on things.
I think that was the most challenging part. We usually had an idea of what we were looking for when we were writing a piece of functionality, what we were looking for it to do. That was kind of an easier part because we could write the click commands and write the test for what we're actually looking for in the return on the page.
So, that was the easiest part of it. The trickiest part was just the whole timing issue.
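One common way to deal with the timing problem Byron describes is to wait on the specific network requests a page depends on, rather than sleeping for a fixed amount of time, and to lean on Cypress assertions, which retry until they pass or time out. A generic sketch; the endpoint, alias, and selectors are invented:

    // Wait for the data the page actually needs before clicking anything
    cy.intercept('GET', '/api/sessions*').as('getSessions');
    cy.visit('/schedule');
    cy.wait('@getSessions'); // the app now has its data

    cy.get('[data-test="add-to-schedule"]').click();

    // Assertions retry automatically, which absorbs small timing gaps
    cy.contains('Added to your schedule').should('be.visible');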
Jill Farley:
So Ken, how do we decide what to test if we're doing hundreds of tests? Is there ever really an end to what we can test or how do you do that prioritization?
Ken Rickard:
There is a theoretical end, because we could theoretically cover every single combination of possibilities. You go for what's most important and for what the showstopper bugs are. So, for instance, here are three simple examples of the first tests we wrote. Test #1: Do the pages that we expect to load actually load? And do they have the titles that we expect them to have when we visit them?
Test #2: Those pages all exist in the navigation menu. Does the navigation menu contain the things we expect it to? And when you click on them, do they go to the pages we want them to? Also fairly simple.
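Sketched as Cypress code, tests along the lines of #1 and #2 might look something like the following; the paths, titles, and menu labels are placeholders, not the real site's:

    // Test #1: the pages we expect to load actually load, with the right titles
    const pages = [
      { path: '/', title: 'Home' },
      { path: '/schedule', title: 'Schedule' },
      { path: '/speakers', title: 'Speakers' },
    ];

    pages.forEach(({ path, title }) => {
      it(`loads ${path} with the expected title`, () => {
        cy.visit(path);
        cy.title().should('include', title);
      });
    });

    // Test #2: the navigation menu lists those pages and links to them
    it('navigates from the menu to the Schedule page', () => {
      cy.visit('/');
      cy.get('nav').contains('a', 'Schedule').click();
      cy.location('pathname').should('eq', '/schedule');
    });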
Then we start to layer in the complexity because some of those menu items and pages are only accessible to certain types of users, certain types of conference attendees on the website. So you have to have a special permission or a pass to be able to see it.
So test #3 would say, "Well, we know that this page is only visible to people who are attending a conference in person. What happens when I try to hit that page when I'm not an in-person attendee? And does it behave the way we expect it to?"
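A hedged sketch of what a test like #3 could look like; the cookie name, path, and status codes below are assumptions for illustration, since the project's actual access control isn't described here:

    // Test #3: a page restricted to in-person attendees
    it('keeps a non-in-person visitor off the in-person page', () => {
      // Hypothetical cookie marking the attendee type; in a real project
      // this would be set by an actual login or registration flow
      cy.setCookie('attendee_type', 'virtual');

      // Request the page directly rather than navigating to it, so we can
      // assert on the status code the server returns
      cy.request({ url: '/in-person-lounge', failOnStatusCode: false })
        .its('status')
        .should('be.oneOf', [403, 404]);
    });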
So we start from the showstoppers, right? Because if someone has paid extra money for an in-person ticket, but I let everyone view that page or don't treat that in-person user as special in some way, we're going to have angry clients and angry attendees. So we sort of test that piece first.
Then it's a question of, I would argue, testing the most complicated behaviors first. Like, what is the hardest thing? What is the thing most likely to go wrong that will embarrass us? And in that case, it's that we have a whole bunch of functionality around adding and removing things from a personal schedule.
And we counted it up, and because we have pre-conference, during-conference, and post-conference, there turned out to be 15 different states for every single session, and we have tests that cover all fifteen of those states. That way we know what happens when you, like I say, click the big button at a specific time.
So that's really how we break it down.
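For the fifteen session states Ken mentions, the usual pattern is to parameterize one test over a list of states. A rough, generic sketch; the state names, endpoint, and button labels are invented, not the project's real list:

    // Run the same expectations across every session state
    const sessionStates = [
      { name: 'pre-conference', buttonLabel: 'Add to schedule' },
      { name: 'live', buttonLabel: 'Join now' },
      { name: 'post-conference', buttonLabel: 'Watch recording' },
      // ...the remaining states would follow the same shape
    ];

    sessionStates.forEach(({ name, buttonLabel }) => {
      it(`shows the right button when the session is ${name}`, () => {
        // Stub the session API so the app sees the session in this state
        // (endpoint and response shape are assumptions)
        cy.intercept('GET', '/api/sessions/example-session', {
          body: { id: 'example-session', state: name },
        }).as('getSession');

        cy.visit('/sessions/example-session');
        cy.wait('@getSession');
        cy.contains('button', buttonLabel).should('be.visible');
      });
    });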
Jill Farley:
Makes tons of sense. It brings new meaning to "coverage." We're not just covering the functionality; we're kind of covering our butts too, making sure that, you know, we're not missing any of the big things that could really affect the attendee experience and the business focus of the conference.
Ken Rickard:
Right. And there are two other things as well. It does let us focus on what the actual requirements are, because there are times when you go to write a test and you're like, "Well, wait a minute. I'm not sure what this thing should do if I click on it. Let's go back to the project team and find out." And we did that a number of times.
And then when you have something that's complex and time-sensitive, the biggest risk you run from a development standpoint, I think, is "Oh, we fixed issue A but caused issue B." So, you get a bug report. You fix that bug and it breaks something else.
Complete test coverage helps you avoid that problem. Because we broke tests a lot and you'd see a failing test and be like, "Oh wait, that thing I just touched actually has effects on other parts of the system." And so having those pieces again gives us a better product overall.
Jill Farley:
So test failures could be a good thing in some cases.
Ken Rickard:
Very much so. I actually was reviewing someone's work this morning and they had to change the test cases. I don't think they should have, based on the work they were doing. So, I was reviewing the pull request and I said, "Hey, why did you change this? This doesn't seem right to me because it indicates a behavior change that I don't think should exist."
Jill Farley:
Byron, I want to ask you. I know that you were involved in some of the performance work on this particular platform. What do you think? Did our Cypress tests in any way prevent some performance disasters? Or do you think that it's mostly about functionality? Like, do the two relate in any way?
Byron Duvall:
I don't think we had any tests that uncovered performance issues. I can't think of any specific example. It was mostly about the functionality and it was about avoiding regressions, like Ken was talking about. You change one thing to fix something, and then you break something else over here. I don't think that we had any instances where a performance bug would have been caught.
Ken Rickard:
I would say every once in a while. One of the things that Cypress does is it monitors everything your app is doing, including API requests. I think it came in handy in a couple of cases where we were making duplicate requests. So, we had to refactor a little bit. These were pretty small performance enhancements, so yeah, nothing big around infrastructure scaling or things like that.
But Cypress could catch a few things, particularly, I mean, if you're talking about a test loading slowly. It's like, “Oh, we have to wait for this page to load.” That can be indicative of a performance issue.
Byron Duvall:
Yeah, that's a good point. And then, the Cypress browser itself will show you every request that it's making, so you can tell if it's making lots of requests that you don't believe it should be making, or if it's making them at the wrong times. That could indeed be a way to uncover something, but it's really completely separate from the tool that we use to test performance outside of those other clues that you might get from Cypress testing.
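One way Cypress can turn that duplicate-request observation into an actual test is to spy on an endpoint and assert on how many times it was hit; the endpoint here is a placeholder:

    // Spy on an endpoint and assert it is only requested once per page load
    cy.intercept('GET', '/api/schedule*').as('getSchedule');
    cy.visit('/schedule');
    cy.wait('@getSchedule');

    // '@getSchedule.all' yields every request that matched the intercept
    cy.get('@getSchedule.all').should('have.length', 1);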
Jill Farley:
So we've talked about Cypress testing and functionality. Can it test how things look? How things display?
Ken Rickard:
It can, but it's not a visual testing tool. It's not going to compare screenshot A to screenshot B, but we can write specific tests for markup structure in the HTML. For example, does this class exist inside this other class? It does have some tools for testing CSS properties, which we used in a few cases. Jill, you'll remember this: are we using the right color yellow in one instance? So we have an explicit test for, "Hey, is this text that color?"
Jill Farley:
That would have been the big disaster of the event, if it wasn't the right color yellow.
Ken Rickard:
So, we do have a few of those, which are visual tests, but they are not visual difference tests. That's a whole different matter. However, you can write a test to validate, to go back to my previous example, that the big orange button is, in fact, orange.
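A sketch of that kind of assertion, with a made-up selector and color value; Cypress reports computed styles, so colors come back as rgb values:

    // Not a screenshot comparison, just an assertion on a computed style
    cy.contains('button', 'Register')
      .should('have.css', 'background-color', 'rgb(255, 102, 0)'); // assumed orange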
Jill Farley:
Just a couple more questions. Let's start with: What are a few ways that going all in on Cypress testing this year got in our way? Perhaps there's something we might do a little differently next time to streamline this, or maybe it's always going to get in our way, but it's worth it.
Ken Rickard:
I think the answer is, it can be painful, but it's worth it. The fundamental issue we had when we pushed things up to GitHub and then into CircleCI for continuous integration testing is that we don't have as much control over the performance and timing of the CircleCI run of the app as we do when we're running it locally. So tests that pass routinely locally might fail on CircleCI. That took us a long time to figure out. There are ways to get around that problem, which we are using.
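Ken doesn't spell out the workarounds here, but one common mitigation for CI-only flakiness (not necessarily the one this team used) is Cypress's built-in test retries, configured so they only apply to headless CI runs:

    // cypress.config.ts
    import { defineConfig } from 'cypress';

    export default defineConfig({
      e2e: {
        // Retry failed tests in 'cypress run' (CI) but not in the local UI
        retries: { runMode: 2, openMode: 0 },
      },
    });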
The other significant issue is that to do tests properly, you have to have what are called data fixtures. These are simply snapshots of what the website content looks like at certain points in time. They're called fixtures because they are fixed at a point in time, so they should not change. But because we were transitioning from last year's version of this application to this year's, there was a point where we had to change the content in our fixtures. We're actually about to experience that again next week, when we transition from the test fixtures we were using to a new set of fixtures containing the actual data from the conference: all of the sessions, all of the speakers, all of it.
Updating that is a massive amount of work. So being able to rely on things like, "Hey, I want to test that Jill Farley's speaker name comes across as Jill Farley," means we have to make sure that the content we have maps to that.
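Alongside content snapshots like the ones Ken describes, Cypress has its own lighter-weight notion of fixtures: static files under cypress/fixtures that tests can load and assert against. A small sketch of the kind of check he mentions; the file name and fields are invented:

    // cypress/fixtures/speakers.json holds a snapshot of the speaker list
    it('renders every speaker name from the fixture data', () => {
      cy.fixture('speakers.json').then((speakers: Array<{ name: string }>) => {
        cy.visit('/speakers');
        // Each speaker in the snapshot should appear on the page
        speakers.forEach((speaker) => {
          cy.contains(speaker.name);
        });
      });
    });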
Jill Farley:
Is it fair to say that incorporating this into our development process, again, it's worth it, but it slows it down?
Ken Rickard:
I don't think it overall slowed us down. I believe overall, it might have increased our efficiency.
Byron Duvall:
Indeed, I believe it did increase our efficiency. It allows us to avoid a lot of manual point-and-click testing on things when we're done. We can do some development, write a test or have someone else write the test, and use that tool to do our manual testing while we're coding instead of just sitting there pointing and clicking. Even creating test data, if we have the fixtures in there like Ken was talking about, is a big help as well.
One of the things that did slow us down, however, was being able to differentiate between a timing or an expected failure versus an actual problem or regression with the code. There were instances where we didn't recognize which type of failure it was. We changed the test, but then we inadvertently broke something else in the app, so we should have paid more attention to that specific test failure.
When you're trying to troubleshoot various types of failures, timing failures, and things that may differ on CircleCI or just fail intermittently, figuring out whether it's a real problem or not can slow you down. It can also kind of defeat the purpose of testing as well.
Ken Rickard:
Right. In normal operation, which I would say we're in about 90% of the time, it simply means that on a given piece of work we're doing, we're changing just one thing, and that's the only thing we need to focus on. We can trust the tests to cover everything else. So, we can be assured that nothing else broke. The question then becomes, do we have a good new test for this thing that just got added?
Sometimes, I conduct pull request reviews without actually checking out the code and running it locally. I can just look at it and think, "OK, I see what you did here. You're testing here. You didn't break anything. OK." That's acceptable and, actually, it's a great feeling.
Jill Farley:
This is probably the last question. We've talked a lot about the benefits to the development process and kind of gotten into the specifics of how to do it. I'm curious, for anyone who's considering incorporating this into their process, are there maybe two to three key benefits from a business perspective or a client perspective? Why take the time to do this?
I can actually think of one from the business side. I was sort of the delivery manager on this project, and in layman's terms, the manual QA process on the client side, once we demoed this work to them, was so much shorter. Last year, when we weren't doing this, there was a lot more pressure on the human point-and-click testing, like Byron was saying, not just on our development team's side but on the client side as well. So as they were reviewing the functionality and really testing our work to see if it was ready for prime time, having a safer testing process really decreased the number of issues we found during the final QA phases. It was really nice to sit back at that point and say, "Yeah, we've covered all of our bases."
So Ken, what would you say are the biggest business benefits to incorporating Cypress testing into a product?
Ken Rickard:
The biggest business benefits, I would say, are getting a better definition of what a feature actually does, because the developers have to implement a test that covers that feature. This creates a good feedback loop with the product team regarding definition. Another significant factor that derails projects is either new feature requests that come in at inappropriate times, or regressions caused by making one change that accidentally breaks multiple things we were unaware of. You've seen that happen in past projects when we didn't have test coverage, but now we don't have to deal with that anymore. Those are the big ones.
It also occurred to me, as you were talking about the client, that one of the nice things about Cypress is that it records all the tests it runs and generates video files. Although we didn't share those with the client, we could have sent them the videos and said, "Okay, we just finished this feature. Here's how it plays. Can you make sure this covers all your scenarios?" They could watch the video and provide feedback. This potential is significant from a business standpoint because it allows for various asynchronous testing. It's funny because when you play back the Cypress videos, you actually have to set them to play at around 1/4 speed, otherwise, it's hard to follow along.
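For reference, that recording behavior is controlled by a single config flag (whether it is on by default depends on the Cypress version):

    // cypress.config.ts
    import { defineConfig } from 'cypress';

    export default defineConfig({
      // Save a video of each spec during headless 'cypress run' to cypress/videos
      video: true,
    });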
Jill Farley:
So, that question I had about whether this slows us down: it sounds like we make up the time.
Ken Rickard:
Oh, we definitely make up the time. Yeah, most definitely, again, just in catching regressions.
Jill Farley:
So, for business benefits we've got: saving the client time and heartache on the final QA, getting to a better definition of what a feature does, and safeguarding against regressions. Byron, do you have any other thoughts on the sort of business benefits of Cypress?
Byron Duvall:
I think that pretty well sums it up. I don't think I would add anything to that list; that's pretty comprehensive.
Ken Rickard:
I mean, it's a little selfish to say, but I think it made us better developers, because you have to think through all of the implications of things.
Jill Farley:
Well, I think that's a great statement to end this conversation on. You both feel as though you're now better developers because of this experience. Thank you both!
Hopefully, this gave everyone an idea of how we went about it, some things to plan for, and a bit of guidance on how to approach introducing Cypress into your process. Thank you both so much for your time, and happy Cypress testing!