Friday, 28 October 2011
Even though the article says that software teams aren't like sports teams or flocks, I want to share a view from one ice-hockey coach, Hannu Aravirta. He described his management style as flying in V-formation, like a flock of cranes. The bird at the front of the formation is changed regularly to give everyone time to breathe. Even though an ice-hockey team (in this case, the national team of Finland) has a formal ranking given by the organization, everyone takes turns at the front of the flock. And everyone must lead the flock in the same direction.
In the real world, the rotation is not random, though. The stronger birds might stay at the front for longer (I'm not sure if this is true). The same happens in software development. The flock, or team, will face a variety of problems and issues. Some of those are technical, some are business related, and so on. So the person at the front of the flock, setting the direction, should be the one best able to resolve the current issue.
Like all analogies, this one isn't complete. It gives the impression that the current leader of the flock makes all decisions alone, which is not the case.
Thursday, 27 October 2011
One important part of administering servers is monitoring. Monitoring tells you when something goes wrong and why it went wrong, preferably before users notice the failure.
There are two somewhat distinct parts to monitoring: alerts and statistics. At a high level, alerts notify operations staff when something is wrong, for example when a server stops responding. Statistics then give you some idea of what went wrong.
We are currently using Icinga as an alerting tool and Munin as a statistics gatherer. Installing and configuring these tools is a little bit complicated, but by following the documentation it is possible to have a pretty decent monitoring system running in a few hours.
Here's an actual case where monitoring proved to be essential. We had a situation where one server would stop responding after running for a few weeks, and the only way to resolve this was to restart the server software. To make things worse, this particular system wasn't monitored at all and was running some business-critical services. So the first step was activating Icinga on that server so that we would get alerts when the software stopped responding. After that, Munin was installed with a pretty basic configuration.
After a few weeks, the following graphs were visible:
These graphs show information about the active network connections. As can clearly be seen from the "Netstat - by month" graph, we had a pretty serious resource leak. We added a new alert to Icinga, which monitored the active connection count and sent a warning early enough that we could restart the server before resources were exhausted. The time window for restarting was pretty easy to deduce, because we could use the graphs from Munin to see how fast the connection count was going up and what the upper limit for connections was.
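The restart window described above can be estimated with a simple linear extrapolation. This is only a minimal sketch of the idea, not what Icinga or Munin do internally; the function name, the sample format and the numbers in the usage note are assumptions made for illustration.

```python
def hours_until_limit(samples, limit):
    """Estimate how many hours remain before the connection count hits `limit`.

    `samples` is a list of (hours_since_start, connection_count) pairs, e.g.
    values read off a monitoring graph (hypothetical input format).
    A least-squares line is fitted through the samples and extended from the
    last observed count. Returns None when the count is not growing or when
    there are too few samples to fit a line.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Slope of the fitted line: connections gained per hour.
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / var_t
    if slope <= 0:
        return None
    _, last_count = samples[-1]
    return max(0.0, (limit - last_count) / slope)
```

For example, with one sample per day such as `[(0, 100), (24, 200), (48, 300)]` and an assumed limit of 1000 connections, the function returns roughly a week's worth of hours, which is the kind of margin an alert threshold can be derived from.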
The actual reason for this is still somewhat of a mystery, but it was tracked down to one specific proxy server that one of our clients used: it just didn't close connections properly. So we added a timeout for idle connections, which should have been there from the beginning. And currently the monthly graph shows this:
Just try to guess when the fix was deployed.
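The idle timeout fix mentioned above was made in our server software, which isn't shown here. As a rough illustration of the idea only, a connection handler can put a timeout on reads and close the socket when the peer stays silent. The echo behaviour, the function name and the 30 second default are all assumptions.

```python
import socket

IDLE_TIMEOUT_SECONDS = 30.0  # hypothetical default, tune for your clients


def serve_connection(conn, idle_timeout=IDLE_TIMEOUT_SECONDS):
    """Echo data back to the peer until it disconnects or goes idle.

    Returns "closed" when the peer closed the connection and "idle-timeout"
    when nothing arrived within `idle_timeout` seconds. The socket is always
    closed, so a misbehaving peer cannot leak a connection.
    """
    conn.settimeout(idle_timeout)
    try:
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                # The peer kept the connection open but sent nothing.
                return "idle-timeout"
            if not data:
                # An empty read means the peer closed its side.
                return "closed"
            conn.sendall(data)
    finally:
        conn.close()
```

The important part is the `finally` block: whatever the peer does, the connection is released.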
Sunday, 18 September 2011
The first thing I noticed was the tremendous number of requests that the initial load generates: more than 50. For comparison, loading the front page of Google and executing a search generates around 20 requests. A lot of these requests were for different CSS files and small images. At least those images would have been easy to combine into CSS sprites, and some of those CSS files could have been combined, too.
The second simple thing missing is compression of responses. Again, the explanation is simple.
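To give a feeling for what response compression buys, here is a small sketch that gzips a repetitive, CSS-like payload and reports the size ratio. The sample payload is made up, and real servers normally do this through configuration rather than application code.

```python
import gzip


def compression_ratio(body: bytes) -> float:
    """Return compressed size divided by original size for a response body."""
    if not body:
        raise ValueError("body must not be empty")
    return len(gzip.compress(body)) / len(body)


# Made-up stylesheet content; text formats like CSS, HTML and JavaScript
# are repetitive and usually compress well.
sample_css = b".menu-item { margin: 0; padding: 0; border: 0; }\n" * 200
ratio = compression_ratio(sample_css)
```

A ratio well below 1.0 means fewer bytes on the wire for the same content, at the cost of some CPU time on the server and the client.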
The third low-hanging fruit would be using a CDN for jQuery. There might be one problem, as they are using a pretty old version of jQuery (I spotted 1.4.2 in one place and 1.3.2 in another; the former was released on February 19th, 2010) and it might not be available on public CDNs (I didn't check this, lazy Sunday as it is).
The backend seems to be implemented with Java and Apache Struts. It is hard to say anything else about the backend without spending a lot more time.
Launching a web service as big as this is always a challenge, and I don't know if implementing these small things would have prevented the problems. It seems that most of the page loading time is spent at the backend, so improvements on the frontend might not have any impact at all. If the application is partitioned into front- and backend, that is.
By the way, at the moment, the online shop is in maintenance mode.
Friday, 16 September 2011
8th Annual VTT SOFTWARE ENGINEERING SEMINAR - New Thinking - Better Software Engineering - Great Experience
Originally published at http://www.sysart.fi/news/64/58/8th-Annual-VTT-software-engineer-seminar/d,devblogi, please comment there.
Lean, transparency, global software development.... in the cloud.
VTT and leading companies challenge you to learn about and discuss productive ways of doing things in a changing environment. Topics include innovation, security, Cloud technologies, lean approach towards efficiency and other software development methods and tools.
The general theme seemed to be transparency and openness in development and innovation. In the keynote, Elektrobit's Jari Partanen talked a lot about transparency in the context of process improvement, saying that transparency is a key enabler for identifying opportunities for improvement in lean and agile approaches - "Transparency attracts". He also emphasized that having a wiki is not enough for information sharing; there must also be activities like workshops and so on.
Another good point in the opening keynote, at least from my point of view, was the comment that Elektrobit doesn't want to stress the word Lean; they want to emphasize their "Way of Working", because it is more attractive to all of the organizational entities. It is true that Lean, just like agile, is considered a thing that development does. But this easily leads to a situation where the process is optimized at the local level, when you should optimize the whole.
The transparency theme continued in the sessions on innovation. Innovation should be open and transparent at all organizational levels and it should be customer oriented, but usually there aren't any communication channels between the different organizational levels, and then the innovations from customers tend to disappear. So there should be an open, social system supporting innovation. Even though social is hype at the moment, in this context it is interesting. As lean methods say, the improvement ideas should come from the actual workers, and here that means the support persons, developers and marketing people who see what the customer wants and how the customer uses the product.
The rest of the sessions were about lean, distributed development, security and user experience. Some of these suffered a little from hype: a few times a presenter talking about some basic software development added, almost as an afterthought, "in the cloud". That term is seriously overused and overloaded. It is funny how some people use cloud to describe how applications can interact with each other and form mashups, while others talk about technical characteristics like provisioning, scaling, availability etc. But that seems to be how it works.
One session caught my interest just because it had a great name, "Flow-based software development on epic level". But the session was somewhat of a disappointment; the main idea can be said in one sentence: "Global development queue of prioritized cross-product epics are assigned to teams eliminating product silos". The problem with this is the lack of domain knowledge: when every team works with every product, no one understands the domains. It might work if all products are in the same domain, as at F-Secure, where the presenter, Arto Saari, was working, but I have my doubts. And F-Secure hasn't implemented this at full scale yet.
A few one-liners from different sessions:
- "EB doesn't want to stress word Lean they want to emphasize their "Way of working" - it is more and more attracting all the organizational entities" - Jari Partanen, Elektrobit
- "Transparency is key enabler to identify what are the opportunities for improvement for lean and agile approaches - "Transparency attracts"" - Jari Partanen, Elektrobit
- "Success is increasingly determined by the ability to embrace the potential of innovation ecosystems" - Timo Koivumäki, VTT
Wednesday, 31 August 2011
That sounds pretty much like an iterative development cycle.
Thursday, 11 August 2011
Sunday, 8 May 2011
Many companies view paying for employees to learn as an unnecessary cost. Anything that is not tactical or does not produce immediate, measurable results is "fluff," and those who endorse an opposing view are not considered serious-minded, action-oriented business people. But the absence of deep technical understanding drives a "more is better" philosophy, which leads to more elaborate gauges, more reviews, more audits, more inspections, and more checkpoints (because it is always "safe" to check and check again).
And after a few lines, where they talk about different quality initiatives like ISO and Six Sigma:
On the surface, these quality initiatives demonstrate a "commitment to quality" and make people feel that they have accomplished something. In truth, however, they fail miserably in achieving those things that guide an intelligent and balanced approach to fostering quality in a PD process: the time, dedication, and hard work required for a deep technical understanding of that process.
First, I'd like to add documentation to those "more is better" things. The same goes for different certifications and process models.
All of those things are used when those who don't have a deep technical understanding try to control the development process. They try to enforce some "best practices" they've seen in some publications without understanding them correctly, and just think that "we must do this because everyone does this". There are at least two problems with this:
- If you only follow best practices, you're only doing what those who developed them did a few years ago.
- "Best practice" might not be the best in your context.
Thursday, 14 April 2011
Everything went pretty smoothly until I tried to boot up my first slave using PXE. Then I got this error message on the console of the new virtual machine:
FATAL: Could not read from the boot medium! System halted.
It seemed that the new virtual machine didn't even try to make a connection to the dhcpd server running on the master. In PXE boot, the first thing that the client does is get an IP address from the dhcpd server. But nothing happened.
After two or three hours of testing and checking configuration files, in a moment of desperation, I tried to change the type of the network card from "Intel PRO/1000 MT Desktop" to "PCnet-FAST III (Am79C973)" and everything started to work. Wicked.
Hopefully this will help someone :)
There seems to be a bug report for this already.
Tuesday, 12 April 2011
The main themes in the sessions I followed were alternative languages on the JVM, functional programming and system stability. There were a lot of talks about J2EE, but I pretty much skipped all of those. It seemed that polyglot programming is the next big thing, especially when you combine it with scalable systems and massive amounts of data.
If I had to say which were the best sessions, I'd say that Michael Nygard's sessions were interesting because I'm currently working in the areas he talked about. Venkat Subramaniam was extremely enthusiastic and his sessions would have been enjoyable even if he had talked about JSP pages, but the themes made his sessions even better. Jevgeni Kabanov managed to make some deep memory handling stuff understandable by implementing a simple processor in Java.
Lately I've been thinking that a lot of articles and seminar sessions are directed at the wrong audience. I'd say that most developers who read articles and attend seminars know that writing tests is good, that using multiple languages will boost your productivity, and so on. It is the management who should be preached to about writing tests or using the right tools for the job. The evil management prohibits developers from doing the Right Thing (tm). But during Venkat Subramaniam's and Ted Neward's final talks I realized that when I think that way, I'm actually avoiding responsibility.
I'll definitely try to participate next year, and I'll convince a few friends to join me.
Below you'll find long, poorly written summaries of the sessions I attended.
Linda Rising - Deception and estimation
In this session, Linda Rising talked about why estimates tend to be optimistic. The main reason seems to be that we are hardwired to deceive ourselves. We also tend to refuse to think about or process information we do not like, and we can even distort data to comply with our own opinions. Of course, we don't see this when we do it.
We tend to overestimate our own abilities. For example, we estimate that we will live ~10 years longer than statistics say. But overestimating our abilities might not be all bad: our ancestors were optimistic, and if they had been realistic about survival and the odds, they would not even have tried.
When doing agile development, estimation is done at short intervals. Then you might make a good guess, but estimates won't get any better. In any case, you'll never get it exactly right. The main thing to remember is that estimations aren't facts.
"There are no facts about the future." David T. Hulett
Matt Raible - Comparing JVM Web Frameworks
Matt Raible's JVM web framework comparison is pretty well known, just google for it. The main takeaway from this session was a method for evaluating which framework is best for your current needs. One interesting evaluation point was passion: if there is someone in your team who is passionate about some framework, you should take that into account when making the decision.
Nathaniel T. Schutta - Hacking Your Brain For Fun and Profit
This was about the brain.
Sleep is important (I should know, I'm writing this after arriving home late from the conference). During sleep, the brain is active and processes the events of the day, sorting what is useful from what is not. Naps are also proven to boost performance: there's been a study showing that a 26-minute nap increases performance by 34%. But the main thing is that you should know your own sleep patterns: know when you are at your best and schedule your time accordingly.
Exercising is another important thing which will improve your performance and, of course, health. By exercising, Nathaniel didn't mean full-fledged marathon training. Even moderate amounts are enough. You could use a standing desk or even a "treadmill desk". One interesting method was the walking meeting: Nathaniel does his one-on-one meetings with his boss while walking around.
The third main point in this session was all about learning and getting better at what you do. Change is constant, so there's always stuff to learn. Learning happens best when there are elaborate, meaningful stories and examples which include context (war stories). Spaced repetition is also important.
You go through different stages in your development when your understanding evolves: from beginner to expert, from simplistic to complex to profoundly simple. For beginners, rules are important, but they kill experts.
In today's world, there's too much information. This leads to "infotention", which means that you give a little bit of attention to many things. Your attention is precious, don't waste it. You should start an information diet, meaning that for some time (days, weeks, months) you select what to ignore.
The last thing was writing down your ideas. This is important because you will forget what you've thought if you don't write it down. And this is meaningful because ideas beget ideas.
Ted Neward - Busy developers guide to Scala: Patterns
Scala is a language which people have said to be the next Java, which I personally do not agree with. Nevertheless, Scala has a lot of interesting language structures that remove the need for some patterns and change others, while creating new ones.
When you have first-class functions, a shocking number of patterns go away (Chain of Responsibility -> list of functions, Visitor -> pattern matching).
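The session used Scala; the sketch below shows the same "Chain of Responsibility becomes a list of functions" idea in Python to keep it short. The handler names and paths are invented for the example.

```python
def handle_with(handlers, request):
    """Try each handler in order and return the first non-None result.

    This replaces the classic Chain of Responsibility class hierarchy:
    the "chain" is just an ordered list of plain functions.
    """
    for handler in handlers:
        result = handler(request)
        if result is not None:
            return result
    return None


def serve_static(path):
    # Handles only static assets; returns None to pass the request on.
    return "static file" if path.startswith("/static/") else None


def serve_api(path):
    return "api response" if path.startswith("/api/") else None


def not_found(path):
    # Catch-all at the end of the chain.
    return "404"


HANDLERS = [serve_static, serve_api, not_found]
```

Adding, removing or reordering responsibilities is now a list edit instead of rewiring successor objects.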
Neal Ford - The Productive Programmer
The last session of day one was a three-hour marathon about programmer productivity. Neal shared a lot of tips, tricks and programs which will make everyday life a lot more productive. He also talked about automation and tools. The main thing here is that you should know your tools and use them as much as possible. But even then, remember that tools aren't the product. Don't shave the yaks.
Jevgeni Kabanov - Do you really get memory?
Jevgeni Kabanov is the CTO and founder of ZeroTurnaround. The session had its roots in two blog posts, http://dow.ngra.de/2008/10/27/when-systemcurrenttimemillis-is-too-slow/ and http://dow.ngra.de/2008/10/28/what-do-we-really-know-about-non-blocking-concurrency-in-java/.
Jevgeni showed us a simple processor model written in Java (no working code, though). During this he talked about how memory is accessed at the operating system (and lower) level, what volatile and synchronized mean, and how the heap and garbage collection work.
A few quotes from this session:
"Digging into Java and found some weird stuff there, memory in Java is weirdest abstraction ever."
"It's all about memory: most performance problems are memory related."
"You are always running in a distributed system."
"There is always something exciting in the garbage collection world."
Venkat Subramaniam - State of Scala
This session was mainly about new features in Scala 2.8, so there's not too much to say about that. Venkat talked about streams, vectors (and tries) etc. The most important feature I hadn't heard of before was the @tailrec annotation. With this annotation, you get a compile-time error if the annotated function is not tail recursive.
Of all the sessions during 33 Degree, Venkat's sessions were amongst the top five. He's just so enthusiastic about programming and knows how to capture the audience.
Steve Freeman - Fractal TDD: Using tests to drive system design
The main point in this session was the division between unit testing and system testing. Unit testing makes the system easier to modify; system testing makes it easier to support. A lot of the stuff that is needed for system testing is extremely useful when running in production. For example, end-to-end testing requires that you can:
- know what the system is doing,
- know when the system has stopped,
- know when the system has gone wrong,
- know why the system has gone wrong,
- restore the system to good state
All of those are required for automated end-to-end testing.
For the previous to be successful, good logging is important. Logs are part of the UI, but usually a lot of decisions about logs are made at too low a level. This leads to what Steve calls Logorrhea, meaning inconsistent log levels, inconsistent formats, duplicated reports etc. The solution is to move logging to the right domain: publish monitoring events with structured messages instead of logging. Then error reporting, self-healing and alerts can listen to these events and act accordingly. This leads to observable system behavior, meaning that you observe stuff that is useful instead of stuff that people write when they don't know what else to do.
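As a rough sketch of what "monitoring events instead of logging" could look like, the code below publishes structured events to listeners. This is my own illustration, not code from the talk; the class names, event names and severity values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MonitoringEvent:
    """A structured event: a stable name plus machine-readable details."""
    name: str
    severity: str  # assumed vocabulary: "info" or "error"
    details: dict = field(default_factory=dict)


class EventBus:
    """Delivers each published event to every subscribed listener."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, event):
        for listener in self._listeners:
            listener(event)


class AlertListener:
    """Collects the names of error events, standing in for a real alerting hook."""

    def __init__(self):
        self.alerts = []

    def __call__(self, event):
        if event.severity == "error":
            self.alerts.append(event.name)
```

Application code only calls `bus.publish(MonitoringEvent(...))`; whether an event ends up in a log file, an alert or a self-healing action is decided by whichever listeners are subscribed.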
One interesting point in this session was that you shouldn't mock third-party APIs. You cannot change such an API, so you lose one important part of TDD: the ability to modify your design. Instead, you should write an adapter and mock that in your unit tests. For the third-party integration itself, you should have separate integration tests that include your adapter.
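A small sketch of the adapter idea follows. The vendor client and its `create_charge` method are hypothetical stand-ins for some third-party library, and `PaymentGateway` and `Checkout` are names invented for this example.

```python
from unittest.mock import Mock


class PaymentGateway:
    """Our own adapter: the only class that talks to the vendor client."""

    def __init__(self, vendor_client):
        # `vendor_client` is a hypothetical third-party object.
        self._client = vendor_client

    def charge(self, customer_id, cents):
        response = self._client.create_charge(customer=customer_id, amount=cents)
        return response["status"] == "ok"


class Checkout:
    """Domain code depends on the adapter's interface, which we control."""

    def __init__(self, gateway):
        self._gateway = gateway

    def pay(self, customer_id, cents):
        if cents <= 0:
            raise ValueError("amount must be positive")
        return "paid" if self._gateway.charge(customer_id, cents) else "declined"


def test_checkout_reports_declined_payment():
    # Unit test: mock the adapter, never the vendor API itself.
    gateway = Mock(spec=PaymentGateway)
    gateway.charge.return_value = False
    assert Checkout(gateway).pay("c-1", 500) == "declined"
    gateway.charge.assert_called_once_with("c-1", 500)
```

A separate integration test would then construct `PaymentGateway` with the real vendor client (against a sandbox, if one exists) to check that the adapter and the third-party API still agree.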
Nathaniel Schutta - HTML 5 Fact and Fiction
Pretty standard HTML 5 material about the development of the standard and the features in HTML 5. You can already start using HTML 5 features if you use feature detection (http://www.modernizr.com/).
Venkat Subramaniam - Programming Clojure
About Clojure syntax:
"If you're used to lisp, it's very easy. If you're not used to lisp, get used to it."
Venkat also said "I program in 8 languages and there wasn't a single language where I didn't cry and kick and scream when learning the syntax", meaning that new syntax will always look ugly.
Steve Freeman - Five years of change, No outages
The example application was a data warehouse for bond state data. The system received updates from different systems and sent those updates forward after some manipulation. This was the 3rd or 4th attempt to develop this system. The main reasons for success were the following:
- Started with clear team culture -> culture has been holding up through 2-3 generations
- The team had a culture which required making things right. All members had experience of projects which had been failures, so they wanted to get this one right.
- People were hired for attitude, degree of productivity. An outside researcher was with the team for a while and commented "Other teams talk about quality, you seem to be doing it"
- First acceptance test two weeks from beginning
- The domain was such that it was easy to take one vertical segment at a time. This way it was possible to demonstrate the methods used and the progress from early on
- There was an existing system, so they could use real data
- Like the previous point, this helped in verifying the system.
- Own deployment environment, no operations people to say what to use -> made possible to script deployment -> easy to set up environments -> Deployment to production is 10 minutes
- Using Fit to show how system works -> Analysts could write new tests and figure out what is happening.
- Right tests on right level makes it possible that you don't have to remember everything.
- A lot of effort went into testing, not in coding tests but in discussions about what the system should be doing
Neal Ford - Functional Thinking
"OOP makes working with state easier. FP makes elimination of state easier"
In this session, Neal went through some basic functional programming methods and styles. He also pointed out that you don't need a functional programming language; you can think and code functionally in Java, too.
In the end, he laid out five principles:
- Immutability instead of state transitions
- Results over steps
- Composition over structure
- Declarative over imperative
- Paradigm over tool
The main thing in this session was that you can use and benefit from functional thinking even if you don't use a language which supports functions as first-class citizens.
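Two of the principles above, "results over steps" and "immutability instead of state transitions", can be illustrated with a few lines of code. The talk used Java; this sketch uses Python for brevity, and the function and type names are made up.

```python
from typing import NamedTuple


def sum_of_even_squares_steps(numbers):
    """Imperative style: describe the steps and mutate an accumulator."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_of_even_squares_result(numbers):
    """Functional style: describe the result as a filter, a map and a sum."""
    return sum(n * n for n in numbers if n % 2 == 0)


class Order(NamedTuple):
    """An immutable value: 'changing' it means creating a new Order."""
    item: str
    price_cents: int


def discounted(order, percent):
    # Returns a new Order; the original is left untouched.
    return order._replace(price_cents=order.price_cents * (100 - percent) // 100)
```

Both sum functions compute the same value; the difference is in how much bookkeeping the reader has to follow.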
Patrycja Węgrzynowicz - Automated Bug Hunting
The last session of the second day was about code quality and the tools which can be used. Despite some technical problems, the presentation went pretty smoothly. There were some interesting techniques and tools, like the usage of test oracles and theorem provers.
Michael Nygard - Failure Comes in Flavors
For me, this was probably the most interesting session. Michael had a few good war stories about weird bugs which affected millions of people.
The most important thing to have is a failure-oriented mindset. Every system, every network cable, everything will try to pull your system down. There are a lot of reasons why different systems are brought to a halt. If every system was unique, there wouldn't be any hope. Luckily we have patterns in failures.
- Integration points, out of process calls
- Every socket, process, pipe or remote procedure call can and will eventually kill your system
- Timeouts, Circuit breakers
- Chain reaction
- Failure in one component raises probability of failure in its peers
- Common in search engines and application servers
- Resource leaks are usual
- Bulkheads -> separate horizontal layer to different pools
- Cascading failure
- Layer has been lost, failure moves vertically
- SOA -> one big failure domain
- "Damage containment"
- It's not realistic to eliminate every bug
- Timeouts, Circuit breakers
- sheer traffic, flash mobs, click-happy
- malicious users
- screen scrapers, badly configured proxy servers
- Attack of Self-Denial
- Good marketing can kill your system at any time
- Two types of "bad" users
- expensive services -> ssl, integrations, pages
- bargain hunters, screen scrapers
- useless sessions
- divert, throttle or avoid creating sessions
- especially for spiders
- Self healing
- Turn off expensive features
- Use lightweight landing sites (static)
- Divert/throttle, good user experience for few users even if you cannot serve everyone
- Reduce burden of serving each user, watch memory
- Only allow the user's second click to reach application servers
- Differentiate people from bots, don't keep sessions for bots
- Minimize memory
- Weird things happen
- Keep lines of communication open
- support the marketers, they'll do what they want if you say no
- i.e. buy that from 3rd party, integrate it to system -> BOOM!
- Blocked threads
- Most common form of crash: all request threads blocked
- Very difficult to test
- Permutation of code pathways
- timing, amount of traffic
- Keep threads isolated/do not use threads
- Unbalancing capacities
- Traffic floods sometimes start inside the data center walls
- Chained systems, where the lower one has fewer resources -> might not be an issue, depends on traffic and usage
- Ratios are different in production vs. development
- Watch out for changes in traffic patterns
- Funneling of traffic
- Slow responses
- Connection refused -> fast failure, thread released
- Slow response -> thread tied down, user wait
- On slow response, systems and users try again (timeout -> retry)
- Too much load, transient network saturation, firewall overloaded, protocol with built-in retries (NFS, DNS); hosts file inside own data center, use configuration management
- Chatty remote protocols
- Unbounded result sets
- Development and QA with small result sets
- Other systems don't restrict result sets, be careful in SOA
- Realistic data volumes, copy data from production
- External systems can change overnight
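The list above mentions circuit breakers several times as a remedy. Here is a minimal sketch of the pattern, written for illustration only: the thresholds, class names and the injectable clock are my assumptions, not details from the talk.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Fail fast after repeated failures instead of tying up threads.

    After `max_failures` consecutive failures the circuit opens and calls are
    rejected immediately. Once `reset_after` seconds have passed, one trial
    call is let through (half-open); success closes the circuit again and
    failure re-opens it.
    """

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock  # injectable so tests need not sleep
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Otherwise fall through: half-open, allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success: close the circuit and forget earlier failures.
        self._failures = 0
        self._opened_at = None
        return result
```

A breaker like this is typically wrapped around each integration point, together with a timeout on the call itself, so that a slow or dead remote system produces a quick error rather than a blocked thread.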
Simon Ritter - The Future of the Java Platform: Java SE 7 and Java SE 8
Pretty basic stuff about new features, mainly just for SE 7. A lot of this was familiar from online articles, but it was still a good session, especially the part about the reasons why Java SE 7 was delayed.
Matthew McCullough - Hadoop: Divide and Conquer Gigantic Datasets
This was also pretty much basic Hadoop stuff, nothing too fancy but a good presentation. The interesting part was the history, structure and usage of Hadoop. It is quite an interesting idea to store all data instead of summarizing it regularly. When you combine this with sensor networks, you might have some interesting stuff going on.
Neal Ford - Abstraction Distractions
Everything we do is an abstraction over another. Abstraction distraction happens when we think something is real although it is just an abstraction.
Don't mistake the abstraction for the real thing
Always understand 1 level below your usual abstraction
Once internalized, abstractions are hard to get rid of
Abstractions are both walls & prisons
Don't name things with underlying details
Your abstraction isn't perfect
Understand the implications of rigidity
Good APIs are not merely high-level or low-level; they're both at once
Generalize 80% cases; get out of the way for the rest
Composability, the One true abstraction?
Michael Nygard - Architect for Scale
System scalability is a hot topic today, and everyone seems to be concentrating on it (even if they don't have to). So it was nice to hear someone who has actually operated large scale systems.
Sizes of systems:
Medium: 1 million requests per hour, 100 nodes, no need to talk about scalability, application server centric
Large scale: 10M/hour, 1000 nodes, data centric, automated operations, async messaging, multiple datastores, caching servers, different views of universe and time per server, "Where do I store data?"
Extreme scale: 10B/hour, 10000 nodes, operations centric, "How do I deploy?"
Purely technical definition: Reduction in elapsed processor time due to parallelization of workload
Workload can be divided into two different sections: pure serial section and parallel section.
Contention and coherency
Contention on serial resources
Coherency = state across multiple processes, needs time
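The two terms above, contention and coherency, match the parameters of Neil Gunther's Universal Scalability Law. My notes don't say which model the talk used, so treating it as that law is an assumption on my part; the function below is just a sketch of that formula with made-up coefficient values in the usage note.

```python
def relative_capacity(nodes, contention, coherency):
    """Relative throughput of `nodes` nodes compared to a single node.

    Uses the Universal Scalability Law:
        C(N) = N / (1 + a * (N - 1) + b * N * (N - 1))
    where `contention` (a) models queueing on serial resources and
    `coherency` (b) models the cost of keeping state consistent between
    nodes. With b = 0 the formula reduces to Amdahl's law.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes / (1 + contention * (nodes - 1) + coherency * nodes * (nodes - 1))
```

With illustrative values such as `contention=0.05` and `coherency=0.001`, capacity peaks at around 30 nodes and then declines: past that point, each added node costs more in coordination than it contributes.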
If you keep adding more nodes, throughput eventually goes down. This is due to coherency: more and more time goes into keeping everything up to date. To keep the number of nodes small enough, big applications must be partitioned. There are two easy partitioning schemes, horizontal and functional. In horizontal partitioning, data is distributed using keys. This is best done in application logic. Functional partitioning means that different functions/transactions are handled on different servers. This can be accomplished on the client side or by using a load balancer.
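Horizontal partitioning by key can be as simple as hashing the key in application code. The sketch below is one common way to do it, not something shown in the talk; the function name and key format are assumptions.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard number in the range 0..shard_count-1.

    A cryptographic hash is used only because it is stable across processes
    and machines, unlike Python's built-in hash() for strings.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Every application node computes the same shard for the same key, so no lookup service is needed. The trade-off of plain modulo hashing is that changing `shard_count` moves most keys to a different shard.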
You can reduce the serial factor too. It can be made smaller by using reverse proxies, web accelerators or CDNs. You can also make responses smaller or use caching. But beware, wrong configurations can actually weaken your application's performance.
One of the most important things in application development is usually forgotten: operations, the people who keep applications running. As applications grow in size, the number of administrators and other support staff rises supra-linearly. You can reduce the number of administrators needed by automating deployment and configuration.
Venkat Subramaniam - It could be heaven or it could be hell: On being a Polyglot Programmer
Venkat had another great session, this time about being a polyglot programmer, i.e. a programmer who uses multiple different languages. The session started with the claim "what language we use moulds our thoughts". This might lead to suboptimal solutions to problems. But if you know multiple languages, you might be able to see a different way to solve the same problem. And it's possible to use language "specific" structures in other languages. So if you only know one type of language, you're at a significant disadvantage.
Java programmers share an unrelenting hope (I can do that in Java too!). But there are a lot of things you can do more easily with different languages, for example XML generation in Groovy. Still, the Java platform has a lot of good features, mainly a powerful VM, good libraries and garbage collection. And since 1995, the virtual machine and libraries have gotten better, but what about the Java language? Luckily other languages on the JVM can use Java's good features.
The hard part of starting to use multiple languages is convincing others to allow it. Change is hard. One way to do this is simply not telling them that you're using a different language and just showing the results. But don't be infatuated with technology.
Some people change when they see the light, others when they feel the heat
Ted Neward - Rethinking "Enterprise"
Teaching follows the same pattern from first grade to industrial courses: first you get a solution (you are taught something) and then you're given a problem to solve. So the problem and the solution are always near each other. This teaches us to always use the latest thing we learned.
Resist the Temptation of the familiar! Because every project is different you should reject the "Goal of Reuse".
It's common to try to find the best solution for a problem by asking others or searching for best practices. But the problem is usually so complicated that just to facilitate any answer at all, we have to use such simplified models of the problem that any result is essentially useless. This is one result of the fact that every project is unique. So eschew the "best practice". Best practice actually gets you not the best, but merely the average. "Best practices" are our attempts to avoid thinking. We're afraid of being wrong, so we try to get answers from others so that we can hide behind their backs.
But there are no shortcuts. You have to do katas, meaning that you have to code small systems using different frameworks, libraries, languages and so on. You have to develop your own evaluation function for every case.
There is no spoon.