The most important and also the most demanding aspect of creating a website is undoubtedly creating its content. It must be relevant enough to keep users reading and must come to the main point before they bounce off the page, no matter whether it is a web app or an article. But how relevant is the design of the site? Its beauty?
Why should I care?
When it comes to beauty, I am very ordinary: I believe I can recognize it, but I cannot name its properties. I know there are some websites that are more beautiful than others, even if I cannot tell what makes them more beautiful. And most of the time I don’t need to. But I am currently trying to work out how to reach potential customers for PaaS products. As I learned recently in “sad startup stories”, this can be pretty expensive. Attention is a scarce good, it seems, and also hard to grasp completely. In a discussion about a site’s performance, the question “does it look good enough?” pops up sooner or later: later if you have actionable data, sooner if not. And in the early stages of a new site, data is one of the things that you probably don’t have. I put an ad on AdWords and measured the traffic with Google Analytics to understand my users. Analytics is great. It tells me how many visitors my website has, how much time they spend on it on average, and how many of them perform certain actions and land in the famous conversion funnel.
But it does not tell me why my visitors did things or, more importantly, why they did not do things. It can measure conversions, but it does not explain their absence. What can I improve about my site to attract visitors? More text, less text? A friend suggested: maybe it’s not beautiful enough. There it is again. Is it because my site is not beautiful enough? Not flat enough? Unfortunately there is no metric for this in Google Analytics.
But I will not give in and just assume some answer! I believe this is much more expensive in the long run. I will have to do a small experiment and find out. That sounds like fun!
First I try to verify a supposedly obvious assumption: a beautiful version of a landing page will perform better than an ugly one. My current landing page www.nukapi.com/apphosting_ is based on the commercial Startup Template 2.0 from designmodo, which claims to enable its users to “create beautiful, responsive websites”. The templates look good to me, so I gave it a try. The regular price is $249. For this money I could buy around 625 visitors from Google, since I currently pay around $0.40 per visitor in AdWords. How does such a design perform in comparison to a fast and cheap hack like this: www.nukapi.com/apphosting? It is based on Feeling Responsive, which I use for my general website.
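The comparison between the template price and paid traffic is simple arithmetic, and it can be handy to keep it as a small helper when prices change. The following is only a sketch: the function name `visitors_for_budget` is made up for illustration, and the two numbers are the ones quoted in the text.

```python
import math


def visitors_for_budget(budget_usd: float, cost_per_visitor_usd: float) -> int:
    """Number of whole visitors a budget buys at a fixed cost per visitor."""
    if cost_per_visitor_usd <= 0:
        raise ValueError("cost per visitor must be positive")
    return math.floor(budget_usd / cost_per_visitor_usd)


template_price = 249.00   # regular price of the template, from the text
cost_per_visitor = 0.40   # approximate AdWords cost per visitor, from the text

print(visitors_for_budget(template_price, cost_per_visitor))
```

With these inputs the helper gives 622 visitors, which is in line with the rough figure of around 625 above.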
One easy tool for the evaluation is Google Optimize, the successor of Content Experiments in Google Analytics. It’s a standalone application which shares data with Google Analytics. It also takes care of splitting the traffic, so that each visitor sees only one variant of the landing page. And it does the math: it tells you when you have collected enough data and how probable it is that the result is not just coincidence. Of course you can do the math yourself by importing the data from Google Analytics and using Python or R. But it’s much more convenient to use a tool like this, so I give it a try to get started fast. You can define multiple variants of the page you want to test as well as the objectives you are interested in.
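To give an idea of what doing the math yourself in Python could look like, here is a minimal sketch of a two-proportion z-test on conversion counts. It is a standard textbook test and not necessarily the method Google Optimize uses internally. The function name and the visitor and conversion counts are hypothetical placeholders, not data from this experiment.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under H0 both variants share one conversion rate, estimated from all data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No conversions at all (or only conversions): no evidence either way.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts for illustration only (not measured results).
z, p = two_proportion_z_test(conv_a=12, n_a=300, conv_b=21, n_b=310)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```

The normal approximation behind this test is only reasonable with enough visitors and conversions per variant, which is one more reason to let a tool tell you when enough data has been collected.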
These objectives are defined in Google Analytics. I am interested in the number of visitors who show an intention to buy my product or a willingness to participate in the beta, and in the average time visitors spend on the website. The final experiment description consists of a null hypothesis, that there are no differences between the variants in terms of the defined metrics, and an alternative hypothesis, that there is a difference. This text helps you later when you interpret the results. It is important to stick to the original goals to prevent misinterpretation and wishful thinking.
Beware of false generalization!
When you are doing an experiment like this, there are at least two traps you should be aware of: getting the math wrong and generalizing falsely. Wikipedia states that “the significance level defined for a study, is the probability of the study rejecting the null hypothesis, given that it were true”. That means a higher average conversion in one of the variants is not sufficient to conclude that it generally performs better; the difference could be due to chance. You will never be absolutely certain about this, but you should know how certain you can be. Wikipedia even gives us a name for the measure: “the p-value of a result is the probability of obtaining a result at least as extreme, given that the null hypothesis were true”. The smaller the p-value, the harder it is to explain the result by chance alone. You should define the null hypothesis and decide on the significance level you require before you perform the experiment. Computing the p-value is a well-described process, which can be performed easily in programming languages like R or Python. In this experiment we rely on Google Optimize to do the math for us.
But even when you do the math right, there is the very real danger of misinterpreting the results. You cannot generalize the results to other designs. The experiment is only able to judge these two specific designs. So if you design your site by experiments, never rely on a single experiment. Try to reproduce your results in multiple runs if you have the time and resources. Try multiple versions of the experiment if you can. And even if you do so: beware of false generalization and premature learnings. All you get is a comparison of two specific sites, not a general model for beauty or design.
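The quoted definition of the p-value can be turned almost literally into a small simulation: assume the null hypothesis is true, shuffle the observed conversions randomly between the two variants many times, and count how often the difference is at least as extreme as the one observed. The sketch below does this as a permutation test. The function name, the counts, and the significance level of 0.05 are assumptions chosen for illustration, and the result is a Monte Carlo estimate, not an exact value.

```python
import random


def permutation_p_value(conv_a, n_a, conv_b, n_b, rounds=10_000, seed=0):
    """Estimate the two-sided p-value for a difference in conversion rates."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total_conv = conv_a + conv_b
    # One entry per visitor: 1 = converted, 0 = did not convert.
    outcomes = [1] * total_conv + [0] * (n_a + n_b - total_conv)
    observed = abs(conv_b / n_b - conv_a / n_a)
    at_least_as_extreme = 0
    for _ in range(rounds):
        # Under H0 the variant label is irrelevant, so reassign it at random.
        rng.shuffle(outcomes)
        sim_a = sum(outcomes[:n_a])
        sim_b = total_conv - sim_a
        if abs(sim_b / n_b - sim_a / n_a) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / rounds


ALPHA = 0.05  # significance level, fixed before looking at the data

# Hypothetical counts for illustration only (not measured results).
p_estimate = permutation_p_value(conv_a=12, n_a=300, conv_b=21, n_b=310)
print(f"estimated p-value: {p_estimate:.3f}")
print("reject H0" if p_estimate < ALPHA else "cannot reject H0")
```

Whatever the outcome, it only speaks about the two variants that were actually compared.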
When do I get the results?
The setup was quite easy. What now? Google recommends letting the test run for at least two weeks to get valid data. So I guess I have time to implement some more experiments. I will publish a follow-up article with the results as soon as possible. Stay tuned.
Siavash Sefid Rodi
fun startup stories