The Clockwork Mind
...armed with the knowledge you get by simply paying attention...

Fixing Skipped Updates in JIRA Agile

Just a quick post to get a solution out there for a weird problem that apparently only I’ve had.

Over the weekend, I worked on the long-overdue production JIRA upgrade at my work. (Seriously, we were three major versions behind. By the time I became the admin, we were all afraid to touch it for fear of losing our precious workflows.) I had tested the upgrade process on another server, taking that one from 5.2.7 to 7.12 fairly smoothly, except for when it became apparent our test VM needed more than 2GB of RAM. So, forewarned about add-on incompatibilities and upgrade paths, I was reasonably confident I could handle any upgrade weirdness.

What I wasn’t expecting was for JIRA to refuse to recognize a step in the upgrade process, meaning the carefully planned 5.2.7 to 6.4 to 7.0 to 7.12 path left some pieces behind when the 6.4 upgrade didn’t register or run some of the upgrade steps, even though it appeared to be working when tested. Specifically, the Greenhopper/JIRA Agile/JIRA Software transition only about half worked.

The Symptom: We found our Agile boards had disappeared in 7.12.

The Cause: The JIRA Software application wasn’t starting up due to failed upgrades.

The Log Entry:

2019-08-11 10:55:54,059 JIRA-Bootstrap ERROR      [c.a.s.core.upgrade.PluginUpgrader] Upgrade failed: This version of JIRA Agile requires a minimum build number of #45 in order to proceed. The latest upgrade task to have been successfully completed is #39. You must uninstall this version of JIRA Agile and attempt an upgrade via an intermediary version.

The Reason: Atlassian stopped providing cumulative updates from the beginning of time in Greenhopper plugin 7.1. The 7.12 version of Greenhopper could only start the upgrade process at step 45. We were at step 39.

Suggested Solution: Replace the Greenhopper plugin with the last version that had all cumulative updates, 7.0.11. Restart the server and watch the updates fly. Very reasonable. Easy to find on the Atlassian site.

Result: Failure.

The Symptom: No upgrades. No boards.

The Cause: JIRA Agile never starts up to try to upgrade anything.

The Log Entry:

'com.pyxis.greenhopper.jira' - 'JIRA Agile'  failed to load.
    		Cannot start plugin: com.pyxis.greenhopper.jira
    			Unresolved constraint in bundle com.pyxis.greenhopper.jira [185]: Unable to resolve 185.0: missing requirement [185.0] osgi.wiring.package; (&(osgi.wiring.package=net.java.ao)(version>=0.9.0)(!(version>=2.0.0)))

Suggested Solution: None. That’s weird.

Desire: Not to roll back to the beginning and disrupt all of the users that don’t need the Agile Boards.

Next Step: Digging through documentation and muttering.

The Find: JIRA’s OSGI Browser

The Clue: The OSGI browser made one thing obvious. JIRA 7.12 uses net.java.ao 2.0.0, making that part of the log entry very meaningful.

The Question: What happens if I grab the last version of net.java.ao from the JIRA upgrade backups and temporarily replace the bundled plugin, while running Greenhopper 7.0.11? (Always say yes to backups.)

Result: Me watching the build number tick up one by one in the database. Yes!

Cleanup: Replace the replaced files with their correct versions, restart, and watch the build numbers hit 51. Reindex, and we’re golden.

Lessons Learned: Update JIRA often, and always pay attention to log files.

RIP, Joe Armstrong

Sad news today. Joe Armstrong, co-creator of the Erlang programming language, passed away.

I never met him, but I read his writings in books, Twitter, and his blog, and he always struck me as a person who knew how to be an older programmer. He was constantly teaching people and learning new things, behaving with grace, dignity, and humor, always curious…this is the model I want to follow.

It’s tempting to say the world got a little darker today, but instead we’ll remember how Joe Armstrong’s life made it a little brighter.

For the curious, a collection of links:

Calling Erlang from Elixir

Lately I’ve been working with Elixir, a new-ish language based on the Erlang VM (BEAM). Elixir is designed to take advantage of Erlang’s parallel processing power while presenting the developer with less intimidating syntax.

I’m not going to spend any time here trying to sell Elixir. Elixir-lang.org does that well, and much like with Ruby on Rails, Elixir has the Phoenix Framework leading the way to greater adoption. I will say I got it right off the bat better than any other functional language I’ve seen.

I’ve been doing programming exercises from Exercism.io, which is a great resource for a lot of different languages. While doing the Gigasecond exercise, I had to drop down into the Erlang weeds for some time/date functionality. (Erlang and Elixir relate much as Java and Jython do.)

This exercise calculates the date one billion seconds after a given date. While there might be some Elixir-native libraries that can do this sort of calculation by now, I decided to use Erlang’s Calendar module to transform the data.

def from({year, month, day}) do
  :calendar.datetime_to_gregorian_seconds({ {year, month, day}, {0, 0, 0} })
  |> + 1000000000
  |> :calendar.gregorian_seconds_to_datetime
  |> elem(0)
end

Going through it step-by-step:

The function is called from/1 – defined by the test module Exercism supplied – and accepts a tuple as an argument with the year, month, and day.

(You’ll see a lot of function/n notation in Elixir. It means the function takes n arguments. You can have functions with the same name that take different numbers of arguments…but it’s more complex than that. Elixir does pattern matching to determine which function to call. We’ll talk about that in the future.)

In order to call an Erlang function, you have to call its module using the syntax for an atom, an Elixir constant with a value of itself. In this case, calling the Calendar module is done with :calendar, and invoking the function is through the oft-used dot notation.

We’ll get a new value from the passed-in date – Elixir variables are generally intended to be immutable – by passing the date into the :calendar.datetime_to_gregorian_seconds/1 function, which should return the number of seconds since year 0.

In this case, Erlang’s :calendar.datetime_to_gregorian_seconds/1 function takes one argument, a datetime, which is represented as a tuple consisting of a date tuple {year, month, day} and time tuple {hour, minute, second}. We only care about the date, and not the time, so we pass in {0, 0, 0} as the time.

Next, we pipe the return value of the Erlang function, using the pipe operator |>, to the next step, which returns a new value with 1000000000 added to the previous value. So, the number of seconds since year 0 plus 1 billion. And yes, the pipe operator is very similar to the Unix version.

Passing that new value to another function in the Erlang calendar module: :calendar.gregorian_seconds_to_datetime/1, which takes in an integer value of seconds and converts it back to a datetime calculated from year 0.

We get back a datetime – which, remember, is a tuple of a date tuple and time tuple – but we don’t care about the time. So, we use elem/2 to get back the 0th element of the datetime tuple.

But wait? elem/2? Elem takes a tuple as the first argument, and the index position as the second. Since we are piping the datetime tuple into the elem/2 function, the tuple ends up as the first argument even when not explicitly written out, and the supplied 0 becomes the second argument.

What we get back is the date tuple, and since that is the last value before the end, from/1 returns that date tuple to the calling function.

I mostly wrote this out to remind myself that Erlang modules are available as atoms in Elixir, but I like it as an example of piping through a series of data transformations as well.

I found this very helpful: Calling Erlang code from Elixir: a tale of hubris, strings, and how to read the documentation.

The nice thing about Elixir is that there are a few other ways to do this same thing, and as near as I can tell, they all work about as efficiently. So, you can write code in the way that you find easiest to read.

Stay tuned for more adventures in Elixir.

Magical disappearing session value

Fascinating bug: Intermittent, only reproducible on some computers and not others, only a problem in a certain version of Internet Explorer on those computers, affecting server-side behavior, and faux-consistent. (In other words, the behavior could be consistent, but the consistency was due to a confounding variable rather than the obvious cause.)

I work on a web application that uses SES-friendly URLs for nearly everything. This works fine, even though it’s implemented at the application framework level and is therefore kind of a hack that browsers might or might not understand. We’ve run into a few instances of relative links not working correctly in some browsers, but those were easily fixed.

For searching our resources, on the results page we put all of the search criteria into the URL so people can share the URL with others. Since we use the same search functionality for our topic pages, the topic page URLs usually look something like this, running a search in the background by topic and subtopic ID:

https://www.example.com/resource/list/topic/1111/subtopic/2222

On this topic page, you can also choose to get the presented results in different formats. We started getting bug reports that certain subtopics were presenting the correct resources on the page, but if you clicked to get certain other formats, it would ignore the subtopic and just give you all of the resources under the topic. This behavior would happen on certain subtopics, but not others, under the same topic.

My hypothesis, from experience with the setup, was that our legacy of storing search variables in the session for certain purposes was probably biting us. But how to prove it, and why only one session variable?

I started working with it, and I found out I couldn’t reproduce the behavior on my development machine, even with the exact same install of IE11 with the same general settings. After trying the usual “turn it off and on again” type of phone troubleshooting, I headed over to the client’s office to borrow a computer.

It turned out that it happened pretty consistently with the borrowed computer. Of course, I was testing against production, and my development environment was on my local machine. I didn’t have admin rights to the borrowed computer, so setting up the environment there wasn’t going to happen, and getting the two machines to talk would require one more Ethernet connection than I had. Luckily, we’ve got a decent test server setup that we normally use for acceptance testing, so as long as I could get the same behavior to happen there I could remote into the test server and do the necessary logging there.

I finally found a place where the problem happened consistently, so, reloading the page on the borrowed computer while remoted into the test server, I started logging the session variables in question, working my way back from the endpoint in question. Yep, the session variable for the subtopic wasn’t disappearing, but it was null. Checking at the start of the process, session.subtopic was 2222, like it should have been.

Logging all the way through, I discovered something troubling. The session.subtopic value was disappearing during or before an event that had nothing to do with session variables. The usual debugging method of staring at the code while muttering didn’t reveal anything, so I went back to the network tab in IE’s Developer Tools and reloaded the Ajax call again and again. Still nothing.

We do have two mutually exclusive JavaScript files that could potentially load into that page. Maybe we somehow loaded both of them, occasionally creating a heisenbug due to precedence? Let’s reload the one page where everything worked fine and look for those two JS files. Nope, only one.

Wait, what’s that? Way at the end of the list of files going over the network is something that starts with https://www.example.com/resource/list…. That’s not right.

Full URL:

https://www.example.com/resource/list/topic/1111/subtopic/resources/images/icons/favicon.ico

Tried it on my computer, and that load never happens. Try it on the client’s computer, and it happens pretty regularly, but not all the time.

Searching through the code base, I found a place where I had added the site’s favicon months before:

<link rel="shortcut icon" href="resources/images/icons/favicon.ico" type="image/vnd.microsoft.icon" />

Not sure where I found it originally, but I’m sure I copied and pasted from somewhere. Unfortunately, my search for the proper code didn’t lead me to rel=”shortcut icon” considered harmful.

See any problems? Yep, it’s a relative link, so it requires the browser to make a guess at the proper place in the URL from which it is relative. Plus, I’d also dropped favicon.ico in the normal root directory default location, so it shouldn’t have been necessary at all.

Chrome and Firefox just ignored the whole thing and pulled the favicon.ico from the root. IE followed it, but only sometimes. I’m not sure, and probably never really will be sure, why there was a difference between the client IE and mine, but I suspect mine had cached the file at some point while the client’s IE requested it sometimes. The sometimes was just consistent enough to appear to happen on certain subtopic values and not others.

So, the effect, many times, was that the page would load with the proper URL and proper information, that it would store in the session variables. The page would then reload into the browser using a URL without a valid subtopic value and ending in a filename. IE would pull all of the data into that presumed file, and then discard it as not being a legitimate .ico file, never actually reloading the visible page. That second run through the process overwrote the session variables with all of the same information, except for that one variable at the end that had an invalid value. Validation would quietly set it to null, since usually the page results and URL would clue you in on what was wrong.

I removed the offending line, and we were all good. Even the client’s IE was fine with the root version of the favicon.ico.

I suspect that, if we had a trailing slash after the subtopic value, or anything else less important was in that last position, we would have never noticed it until questioning our server usage.

Conclusion:

  • We were often running searches twice for months. I can’t imagine how many server resources we were using, even with caching the results.
  • I really need to rearchitect the search process to simplify it and remove some of our old behavior. We have better tools now, and there’s little excuse for a simple process that uses very different code for the same results.
  • Examining the actual traffic more often would have brought this bug to light.
  • 90% of the users are using a slightly different system than I am. We need one of those for testing.
  • No matter how innocuous it seems, copying and pasting deserves extra testing attention.
  • Validation really needs to send a message, even if it seems like it could just be quiet.

Exporting variables in Go

This is going to be one of those classic “I’m writing this down because I know I’m going to forget and do it again” posts.

I’m currently playing around in Go, and I was wondering about using an external config file in a Go program. Heading to Stack Overflow gave me several suggestions of packages and config file types, but I decided to try a basic JSON version.

(BTW, this seems likely: Computer Programming To Be Officially Renamed “Googling Stackoverflow”.)

I pretty much just copied the code from the top answer, using my preferred camelCasing for the struct fields instead of what the top answer had.

package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"os"
)

type Configuration struct {
	apiURL         string
	clientId       string
	clientSecret   string
}

func main() {
	file, _ := os.Open("conf.json")
	decoder := json.NewDecoder(file)
	configuration := Configuration{}
	err := decoder.Decode(&configuration)
	if err != nil {
		fmt.Println("error:", err)
	}

	fmt.Println(configuration.apiURL)
}

The problem was, whenever I ran the code, I just got blank in the console. No error, just blank. I went to look at why, and remembered that Go sets variables that start with lowercase to be private to the package, while uppercase variables are public outside of the package. This is called Exporting.

Changing the struct fields to start with uppercase, like so, made everything work.

type Configuration struct {
    ApiURL          string
    ClientId        string
    ClientSecret    string
}

But why? This was all happening in the same package, right? Private vs. public shouldn’t matter. And even if it did, shouldn’t there be an error thrown when we’re assigning the values?

Stack Overflow to the rescue again. I fed the encoding/json library the configuration struct and expected it to just match JSON fields to struct fields. To do that, it uses reflect to view the fields in the struct.

See the problem yet? The encoding/json library isn’t part of this package. Therefore, can it see and write to the private fields in the configuration struct? Nope. And since Decode doesn’t require a one-to-one field match between the JSON object and the struct, and Go will happily fill out ignored fields in the struct with null values, I was getting a configuration struct back with null values for every field. So, fmt.Println helpfully printed out the value of the field to console: blank.

According to one of the Stack Overflow answers above, if you want to keep the struct private, but the fields accessible for reflection, you can do it this way with a lowercase struct and uppercase fields:

type configuration struct {
	ApiURL         string
	ClientId       string
	ClientSecret   string
}

Lessons learned. If I’m incorrect about any of this, hit me up @shanethacker.

Update: Tags in Go structs look interesting. I’d bet Decode can use them for matching.