Building a Realtime Election Tracker

Jan 2020

On January 11th, 2020, Taiwan went to the polls to elect its next President and Legislature. This was only the 7th presidential election in Taiwan’s short history as a democracy, and it took place after months of unrest in Hong Kong and amid a rising trend of populism around the world. At The News Lens, we formed a small team of designers, engineers, and editors to build this year’s election tracker.

2020年1月11號,台灣舉辦了總統立委大選。從台灣民主化到現在的時間不長,這只是第七次總統大選。這段時期,香港正發生反送中事件,以及世界許多國家有往民粹主義發展的趨勢。我和在關鍵評論網的厲害設計師、編輯和產品長藉著這個機會做了一個特別的專題

Product

產品規劃

Before we started planning the technical details, we began by thinking about election coverage as a product: What do readers care about? What are their reading habits / usage patterns? Who is our audience?

開始討論技術的細節之前,我想先從產品的角度來談這個選舉報導專題:我們必須思考我們讀者所最關注的東西、閱讀習慣和他們的背景。按照這些需求,我們設計的頁面是左側的表格與右側的地圖互相連動。上方是整體的總統或立委選情。

Election Tracker Devices

For starters, we expected that 80% of our readers would access our content from a mobile device, and that the large majority would be looking for Chinese content, although we would also make a version for our English readers. We knew that we wanted to display the results as close to realtime as we possibly could, and that we wanted to cover the presidential race as well as the district, party, and indigenous legislator races. We also knew that we wanted to create an experience that gave readers content before, during, and after the election, and to make content that could be built upon and referenced in the future.

第一個考量是我們預期 80% 的讀者會用手機看我們的開票網站,再者,雖然大部分讀者是中文使用者,但也會有英文的讀者對台灣大選有興趣。我們也知道頁面需要最即時的資訊,包含總統、分區立委、不分區立委和原住民立委。而且我們也知道給讀者的體驗不只是選舉開票的過程,在選舉之前跟之後的資訊都很重要。另外,我們也想做出一些未來還可以持續沿用的元件。這開票網站總共花了差不多兩個月的時間完整。包含的團隊主要是一位工程師和一位設計師。

To begin, we launched our election coverage by doing an aggregation of political polls in Taiwan, giving readers a more holistic view of trends of support for each of the presidential candidates. This not only gave us a way to promote our election coverage, but also gave us a way to test our new, Svelte-based system for building interactive articles and work out some of the kinks before election day. In total, building this tracker took us about two months, with a dedicated engineer and designer working nearly full time on this project.

我們首先製作的選舉專題,是台灣民意調查結果的綜合呈現,讓我們的讀者能比較完整地了解每位候選人的支持度與支持度的變化趨勢。這個民調專題不但讓我們多增加一個方式推播相關的選舉報導,也讓我們在選舉之前可以測試數位報導的新系統。(下面有補充我們用的 Svelte 框架)

Maps

地圖

Given that the main interaction users have with the election tracker is through the map, the map had to be responsive to both user interaction and data updates. To allow users to explore as much of the data as possible, we decided that our map would have three different layers to interact with.

因為我們的開票頁面中,與使用者最重要的互動是透過地圖的方式,所以我們的地圖一定要有很好的操作體驗,更新資料的反應要迅速。為了讓讀者能夠探索最大量的資料,我們決定頁面的地圖資訊需要分成三個不同層次。

The topmost layer is the whole country, including the outer islands that are part of Taiwan’s jurisdiction, followed by 22 counties (縣市). Each of the 22 counties can be further divided into towns (鄉鎮區), and the smallest unit of locality which we did not display is the village (里).

最上層是全台灣的圖層,用 22 個縣市來呈現。各個縣市的下一圖層是鄉鎮區。而我們決定不要分到里的層級,因為我們怕讀者會搞不清,現在他在哪一層。

All of this map data is obtained from Taiwan’s government website in the form of a 114MB shapefile. Even if we were to directly convert the shapefile to topojson (a format for browsers to draw maps, more on this later), we’d still be looking at a file that is ~90MB in size, far too big to have every user download to their device.

地圖圖資的原始檔案是台灣政府提供的 shapefile。很明顯地,如果我們直接把這個檔案轉到 topojson 給 D3 畫地圖,檔案大小大約 90 MB,這麼大的檔案會下載太久,也會影響網站呈現的效能。

Moreover, the map needs to be processed to handle special areas (areas that are part of Taiwan’s jurisdiction but may not be part of an official locality, such as 釣魚台) and to relocate the outer islands so that Kinmen and Lienchiang don’t make the map look too big. At first, I processed a few topojson files by hand using QGIS and the shapefile obtained from the government. After many failed attempts (e.g., incorrect island relocation, missing districts, a resulting file that was too large, or too many details removed from the map during compression), I decided to create an API endpoint that would generate the map I wanted on the fly.

還有一個考量是我們的地圖必須把特殊的地區整理出來,有一些屬於台灣控制但被管制沒有人居住的地方,例如釣魚台。地圖還需要把金門和連江移到離本島近一點,不然台灣地圖中間都會是海。一開始的時候,我手動用 QGIS 產生一些 topojson 的檔案,但因為每次做的都有一點不對(外島移的地方不太對、地區不對、檔案還是太大、壓縮的太過度等等),最後我決定用程式依據設定即時產生 topojson 地圖。

Dynamic Map Code

In essence, the API endpoint takes a few parameters:

  • Map Version: Due to the fact that these borders will change based on redistricting or otherwise, I wanted to future-proof this API by allowing future maps to be easily versioned by year.
  • Locality Depth: Do we want the borders of the county, town, village, or district?
  • Specific Locality: Do we want the borders of a specific county, town, village, or district?
  • Simplification Level: How simplified do we want the resulting JSON file to be? More simplification and quantization means smaller file sizes but also a loss of detail.
  • Condensation of Outer Islands: Do we want a map where Kinmen and Lienchiang are where they are geographically or do we want to see them closer to mainland Taiwan?

基本上,這個 API 有幾個設定:

  • 地圖版本:因為台灣地區的邊界是變動的,我的系統必須能按照年度去抓不一樣的地圖。
  • 地區層級:我們的地圖要縣市、鄉鎮、里還是選區的邊界線?
  • 特定的地區:我們只要某一個地區的地圖嗎?
  • 壓縮程度:我們想要把地圖簡化和壓縮到什麼程度?雖然簡化會產出更小的檔案,但地圖細節也會相對變少。
  • 外島移動:我們要不要把金門和連江移到比較靠近本島的地方?
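
The parameters above could be modeled as a small typed request object. The following is only a sketch to make the list concrete: the `/api/map` path, the query parameter names, and the `MapRequest` type are all assumptions made for illustration, not the tracker's actual API.

```typescript
// Hypothetical request shape for the map endpoint; every name here is illustrative.
type LocalityDepth = "county" | "town" | "village" | "district";

interface MapRequest {
  version: number;          // map vintage, e.g. the year the borders were published
  depth: LocalityDepth;     // which level of borders to return
  localityId?: string;      // restrict to one locality (omit for the whole country)
  simplification: number;   // 0 = full detail, 1 = most simplified
  condenseIslands: boolean; // move Kinmen and Lienchiang closer to the main island
}

// Build the URL for an assumed `/api/map` endpoint from a typed request.
function buildMapUrl(req: MapRequest): string {
  if (req.simplification < 0 || req.simplification > 1) {
    throw new RangeError("simplification must be between 0 and 1");
  }
  const params: Array<[string, string]> = [
    ["version", String(req.version)],
    ["depth", req.depth],
    ["simplification", String(req.simplification)],
    ["condenseIslands", String(req.condenseIslands)],
  ];
  if (req.localityId !== undefined) {
    params.push(["locality", req.localityId]);
  }
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `/api/map?${query}`;
}
```

Keeping the version in the request means a map generated for one election year can stay cacheable and unchanged when borders are redrawn later.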

However, because legislative districts do not map exactly onto any one of the existing locale levels, we had to separately work out which locales make up each legislative district. Because a whole day of searching could not turn up a shapefile for Taiwan’s legislative districts, we decided to create the map ourselves based on which counties, towns, or villages belonged to which district.

然而,因為區域立委的選區劃分跟我們上面提的地圖地區不一樣,所以我們需要額外去處理選區的地圖圖層。在政府的網站找了老半天,都沒找到選區的地圖,所以就只好自己依據中選會公布的選區劃分資料來做一個選區的地圖。
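
As a rough sketch of the first step in building such a map, the localities can be grouped by district using a lookup table. The lookup table and the IDs below are made up for illustration; the real assignments come from the published districting data. Once the localities are grouped, the shapes in each group still need to be dissolved into a single outline (topojson-client's `merge` is one tool that can do this), which is not shown here.

```typescript
// Hypothetical lookup: locality ID -> legislative district ID.
type DistrictLookup = Record<string, string>;

// Group locality IDs by the district they belong to.
// Throws if a locality has no district, so gaps in the table surface early.
function groupLocalitiesByDistrict(
  localityIds: string[],
  lookup: DistrictLookup,
): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const id of localityIds) {
    const district = lookup[id];
    if (district === undefined) {
      throw new Error(`No district assigned to locality ${id}`);
    }
    if (groups[district] === undefined) {
      groups[district] = [];
    }
    groups[district].push(id);
  }
  return groups;
}
```

Failing loudly on an unassigned locality is a deliberate choice in this sketch, since a silently dropped village would show up as a hole in the district map.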

Custom Map Making

SVG vs Canvas

必須考慮 SVG 的缺點, Canvas 的優點

After constructing a reliable way to generate our topojson, the next question was how to display it in the browser. The naive approach is to simply render the topojson to an SVG. The problem is that SVGs live in the DOM, which means that changes to, or user interactions with, a complex SVG are slow and resource intensive. Add locale labels and mouse interactions and suddenly your SVG implementation slows to a crawl.

做完我們產生地圖的方式之後,我們下一個問題是怎麼顯示到瀏覽器裡。其實如果很單純得把 topojson 直接用 D3 顯示 SVG 是很容易,但用 SVG 對效能不是很好。因為 SVG 用的是瀏覽器的 DOM,這變成每次跟地圖互動或者更新資料時,都要用 DOM 去更新。再加上地區的標籤與滑鼠的互動,網頁就會突然變得很慢。

The alternative is to use a canvas for display and an invisible SVG for interaction. This allows us to avoid the expensive DOM operations of maintaining and manipulating the color of each locale on the map and instead use the canvas API to paint our map, while maintaining an SVG just to keep track of when users interact. In my very unscientific benchmarking on my own machine, I found that SVGs needed to be about twice as simplified as the canvas version in order to achieve similar performance. More about this method here.

效能比較好的作法是用 canvas 顯示,但用 SVG 接互動的事件。這個做法可以避免 DOM 因為計算每一個不同地區的顏色和位置而變慢。大略的實驗之後,我發現用這種做法大約可以提昇兩倍的效能。如果想要更深入了解這個做法,可以參考這個教學
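
To illustrate the painting half of this approach, here is a minimal sketch that traces a polygon onto a canvas-like context. The `PathSink` interface and the `tracePolygon` function are names invented for this example; `PathSink` only lists the path methods of `CanvasRenderingContext2D` that the sketch needs, and the coordinates are assumed to be already projected to pixels. The invisible SVG overlay that receives the mouse events is not shown.

```typescript
// The subset of CanvasRenderingContext2D used below, declared separately so the
// function can also be exercised with a stand-in object outside the browser.
interface PathSink {
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  closePath(): void;
}

// One ring of a polygon, already projected to pixel coordinates.
type ProjectedRing = Array<[number, number]>;

// Trace one polygon (outer ring plus any holes) as a single path.
// The caller sets the fill style and calls fill() afterwards.
function tracePolygon(ctx: PathSink, rings: ProjectedRing[]): void {
  ctx.beginPath();
  for (const ring of rings) {
    ring.forEach(([x, y], index) => {
      if (index === 0) {
        ctx.moveTo(x, y);
      } else {
        ctx.lineTo(x, y);
      }
    });
    ctx.closePath();
  }
}
```

In a browser, a real 2D context can be passed in directly; recoloring a locale then means setting `fillStyle` and repainting its path, with no DOM nodes to update.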

The optimizations that I could have done on the map are endless. If we were to further optimize the map, I would recommend using some kind of WebGL implementation, potentially MapboxGL to make map rendering performance even better, but a canvas approach was good enough for this use case.

如果還有更多的時間,其實還可以繼續地優化我們的地圖,例如使用一種 WebGL 的套件、MapboxGL 等等,但我覺得這個專題用 Canvas 就夠好了。

Data Pipeline

資料流程

The data for our election tracker was obtained from the Central Election Commission, which gives us a 3MB JSON file containing all of the data each time it is updated. Because this JSON file is so large and can take up to 5 seconds to download from the Central Election Commission, we decided to fetch it only at a regular cadence and to process and store it in a database that we can control and scale.

因為我們的資料來源是中選會,每次串入的資料是一個 3MB 的 JSON 檔案。因為檔案尺寸這麼大,有時候會需要到五秒的時間才能完成下載,所以我們決定用 cronjob 一分鐘取一次,然後存在我們自己的資料庫裡(Google Datastore)。這樣我們才比較能應對網站的高流量。


To do this, we used a cron job that would fetch the latest data from the Central Election Commission every minute and separate out the presidential and legislative results before storing them in Google Datastore.
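
The separation step might look something like the sketch below. The record shape (`RaceResult`) and its field names are assumptions made for illustration, since the layout of the source file is not described here; the download itself and the writes to the database are left out.

```typescript
// Hypothetical shape of one record in the downloaded results file.
interface RaceResult {
  race: "president" | "legislator";
  areaId: string;
  candidateId: string;
  votes: number;
}

interface SplitResults {
  presidential: RaceResult[];
  legislative: RaceResult[];
}

// Separate one downloaded snapshot into the two groups that are stored separately.
function splitResults(records: RaceResult[]): SplitResults {
  const split: SplitResults = { presidential: [], legislative: [] };
  for (const record of records) {
    if (record.race === "president") {
      split.presidential.push(record);
    } else {
      split.legislative.push(record);
    }
  }
  return split;
}
```

Doing this once per fetch, rather than once per reader request, is what lets the public-facing API serve small, pre-split documents.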

然後為了讓編輯們能加上當選註記,我們開了一個 Google Spreadsheet 讓編輯填寫。這個 Spreadsheet 直接連到我們的系統。這個做法後來有出問題,我會在下面一段解釋。

Then, in order to indicate when a candidate has won, we connected a Google Spreadsheet that allowed our editors to manually mark when a race had been won. The system would then use that data to display the winner on the frontend. As it turns out, this became a problem on the night of the election, which I expand on in the “Lessons Learned” section.

Election Rules Edge Cases

選舉規則特殊狀況

In terms of electoral rules, Taiwan elects its president by simple plurality (the candidate with the most votes wins), which is easy to display on a map. However, legislative races follow a fairly complex system:

  • Taiwan’s legislature has a total of 113 seats.
  • 73 district legislators are elected by plurality, one in each of Taiwan’s 73 designated districts.
  • 34 party legislators are elected through a party-vote system, in which voters vote for a party, and all parties that obtain over 5% of the overall vote get to assign legislators based on the proportion of votes they receive. At least half of the legislators a party elects through this system must be women, so seats are not always filled strictly in party-list order.
  • 6 indigenous legislators are elected by two three-member constituencies, with the top three candidates winning in each of the plains and highlands indigenous constituencies.

台灣選總統是最高票數的候選人贏,這種規則很容易被顯示在地圖上。但是,選立委的規則很複雜,要用地圖呈現是一個蠻大的挑戰:

  • 台灣立委總共有 113 席。
  • 73 席是分區立委,總共有 73 選區選出各個選區的最高票數候選人。
  • 34 席是不分區立委,投給各個政黨。政黨票超過總票數 5% 的政黨,會按照比例去分配這 34 席。但是用這個方法選上的候選人,有女性保證名額,必須超過50% 是女性。
  • 6 席的原住民立委中,3 席給山地原住民,3 席給平地原住民。

This logic poses a few challenges. First of all, the topmost level we can display for district legislators is the district itself, because we can’t calculate the color for a county that has multiple districts with different races.

這些邏輯要怎麼呈現呢?首先,分區立委能夠呈現的最高層不是縣市,因為一個縣市如果有多個選區,你沒辦法在無關的選區選出哪一個政黨領先,所以最高層只能顯示各個選區的地圖。

Second, because indigenous legislative races elect the top three candidates in each of the plains and highlands indigenous constituencies, simply coloring a district with the party of the first-place candidate does not reflect the true breakdown of the vote. Thus, we decided to take the top three candidates, add up their votes by political party, and use the color of the party with the most votes.

第二,因為原住民立委是山地和平地各選出最高票的三位,我們不能直接用最高票的候選人去呈現一個地區,因為這不見得真正代表一個選區的支持。我們討論到後來決定把前三高票按照政黨加起來,然後再選最高票政黨的顏色去呈現。
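
The coloring rule for indigenous races can be sketched as a small function. The `CandidateTally` type and the function name are illustrative; the logic follows the description above: take the top three candidates, sum their votes by party, and pick the party with the largest total. Ties go to whichever party appears first among the top candidates, which is a choice made only for this sketch.

```typescript
interface CandidateTally {
  candidate: string;
  party: string;
  votes: number;
}

// Sum the votes of the top `seats` candidates by party and return the party
// with the largest combined total (undefined when there are no candidates).
function leadingPartyAmongWinners(
  tallies: CandidateTally[],
  seats: number = 3,
): string | undefined {
  // Copy before sorting so the caller's array is left untouched.
  const winners = tallies.slice().sort((a, b) => b.votes - a.votes).slice(0, seats);
  const totals: Record<string, number> = {};
  for (const winner of winners) {
    totals[winner.party] = (totals[winner.party] ?? 0) + winner.votes;
  }
  let best: string | undefined;
  for (const party of Object.keys(totals)) {
    if (best === undefined || totals[party] > totals[best]) {
      best = party;
    }
  }
  return best;
}
```

For example, if the top three candidates are from parties X (100 votes), Y (90 votes), and Y (80 votes), the area is colored for party Y even though the first-place candidate belongs to party X.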

Third, because of the gender rule for party votes, the legislators a party elects through this method do not necessarily follow the order in which they are placed on the party list. As a result, we had to build an algorithm that determines when to skip over the next male legislator in the list because too many men have already been seated.

第三,因為不分區立委有女性保證名額,每個政黨所排得不分區席次順序,不一定是選上席次的順序。因此,我們再特別寫一個邏輯來判斷什麼時候需要跳過男性的立委,以滿足因為那個政黨選上的女性比例。
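
One way to express the skipping logic is shown below. This is a sketch rather than the tracker's actual code, and it assumes the quota means that at least half of a party's seats go to women; if the threshold is defined differently, only the `maxMen` line needs to change. The `ListCandidate` type and function name are illustrative.

```typescript
interface ListCandidate {
  name: string;
  gender: "F" | "M";
}

// Pick `seats` winners from a ranked party list so that at least half are women.
// Men are skipped once the men's share of the seats has been used up.
// If the list runs out of eligible candidates, fewer than `seats` are returned.
function selectPartyListWinners(
  rankedList: ListCandidate[],
  seats: number,
): ListCandidate[] {
  const maxMen = Math.floor(seats / 2); // leaves at least ceil(seats / 2) seats for women
  const winners: ListCandidate[] = [];
  let men = 0;
  for (const candidate of rankedList) {
    if (winners.length === seats) {
      break;
    }
    if (candidate.gender === "M") {
      if (men === maxMen) {
        continue; // no seats left for men, move on to the next candidate
      }
      men += 1;
    }
    winners.push(candidate);
  }
  return winners;
}
```

With a list ranked M1, M2, F1, M3, F2 and three seats won, the function seats M1, F1, and F2: the second and third men are skipped because only one of the three seats may go to a man.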

Post-Election Graphics

選後圖表

Another large element of our election coverage was the graphics that we post to social media such as Facebook and Instagram. Because we wanted to be able to generate graphics optimized for social media as fast as possible, we needed a method faster than having designers plug election results into each graphic by hand. To achieve this, we built an internal website that would dynamically generate SVGs with the latest data from the Central Election Commission.

我們選舉專題另外一個很大的部分是要提供在網站、FB 和 IG 要發布的圖表。如果等到選舉之後才請設計師出資訊圖表,他們一定會做得又趕又累。因此,我們寫了一個內部用的網站,可以即時用中選會的資料產生他們事先設計好的圖表。

Dynamic Graphic Generation

To achieve this, designers would first come up with the design of the graphics in the form of an Adobe Illustrator document. Then we took those graphics and exported each one into a separate SVG. Because SVG is an XML-based format, we can port these files into our web application and make the content within each SVG dynamic.

我們的流程是先把設計師設計的圖表輸出成 SVG。因為 SVG 只是一個 XML 類型的檔案,我們可以很容易用網頁框架動態地產生 SVG。用 SVG 這樣操作的好處是用什麼程式語言或框架都可以。在我們的用途中,是將產出的 SVG 設定為可以下載成 SVG 以及PNG 兩種檔案。

The great thing about using SVGs in this way is that it is language and tool agnostic, which means you can use whatever existing templating language / framework you use to generate these graphics. In our case, we fitted it into an internal website that the social media team can directly access to download the generated SVG or a PNG format to be posted on social media platforms like Facebook and Instagram.
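
As a framework-free illustration of the idea, the sketch below fills placeholders in an exported SVG string. The `{{key}}` placeholder convention and both function names are assumptions made for this example; in practice the same thing can be done with whatever templating system is already in use.

```typescript
// Escape characters that are special in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replace {{key}} placeholders in an exported SVG with escaped values.
// Unknown placeholders are left untouched so missing data is easy to spot.
function fillSvgTemplate(
  svg: string,
  values: Record<string, string | number>,
): string {
  return svg.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const value = values[key];
    return value === undefined ? match : escapeXml(String(value));
  });
}
```

Escaping matters here because candidate or party names containing characters such as `&` would otherwise produce an SVG that fails to parse.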

Monitoring

系統監控

Another critical piece of election preparedness was making sure that we knew everything that was happening with our web application at all times. To monitor the health of the application and make sure that everything scaled properly, we set up a Grafana dashboard that kept track of server errors, response latency, upstream API connections, and cache hit ratios, as well as the standard CPU, memory, and database metrics.

另一個很重要的準備是確定我們隨時都能知道系統健不健康,這樣我們才能夠在出事時,有更快的反應。為了系統監控,我們在 Kubernetes Cluster 上運用 Prometheus 和 Grafana 紀錄錯誤、回答速度、往外連線、cache hit 率和比較基本的 CPU、memory 和 DB 等等的資訊。

Grafana Monitoring

By having the web application report metrics directly to Prometheus, it was much easier to get the fine-grained, election-specific metrics that we cared about. Because we had taken the time to integrate key metrics into Prometheus, when errors began to show up during the election, we were able to immediately know what was causing them.

把 prometheus 的資訊套在程式上可以比較容易監控一些客製化的資訊。選舉當天 Google spreadsheet 出問題的時候,我們因為有套 prometheus ,所以馬上就知道是 Google 那邊的問題。

Svelte

紐約時報開發的前端框架

Although React has been possibly the most popular front-end framework to use these days, one of the questions I find interesting to ask during interviews is “Under what circumstances is React not a good framework to use? When would you recommend not using React?” This is a question that stumps many less-experienced engineers and people without a good grasp of what React really does in the browser under the hood. In fact, I believe that there are many situations under which React is not a good choice, and building interactives in a news environment is one of them.

最近在前端工程師的世界裡 React 變得很紅,但我最近常常很喜歡問的問題是「怎麼樣的狀況之下你覺得 React 不適合?什麼時候會建議不要用 React 開發一個專案?」這是一個需要比較有經驗的前端工程師才能回答的問題。我認為有很多狀況下 React 不適合。而媒體的特殊數位報導就是其中一個。

News organizations have a very different development and deployment cycle than tech companies do, requiring tighter deadlines, faster feedback loops, and more bespoke flexibility. Thus, I decided to use Svelte, giving us the following main advantages among many:

  • Development speed: Given the amount of work to be done within a short amount of time, I decided that Svelte would be much faster to write than React, giving faster development and feedback times. Using Sapper with Svelte would also allow me to write the server-side data APIs quickly and easily.
  • Bundled JS Size: Because Svelte runs as a compiler, it’s able to build and include only the code that your application needs, making the compiled bundle much smaller than a React application.

新聞對數位報導的開發需求,需要更優雅的程式設計去即時地傳遞與回應訊息,這是一件非常有挑戰性的事情。因此,我這次開發即時開票頁面使用的框架是 Svelte。為什麼要選擇用 Svelte,它有哪裡比 React 好?

  • 開發速度。Svelte 大致上比 React 要寫的程式碼少 ~30%。這個聽起來可能沒有很多,但 30% 的差別也代表 30% 的效率。
  • 程式大小。因為 Svelte 是一個 Compiler,最終寫完的程式碼檔案很小,因為他只會打包你的程式所需要的東西。

If you’re interested in learning more about Svelte, there are many tutorials to help you get started.

如果你有興趣學 Svelte,這裏有很多可以參考的教學。

Lessons Learned

學到的心得

Load Test

壓力測試

The biggest problem we saw on election night was that the site was down for ~1 hour due to high traffic. Our own servers were able to handle the load; the failure came from our use of Google Sheets, because Google locked our spreadsheet once the volume of requests became too large. After Google locked out requests to the spreadsheet, our system started responding to users with 500 errors, making the entire site inoperable.

我們在選舉當天遇到關鍵評論網的最高流量。這個高流量造成網站有差不多一小時的時間沒辦法使用。而問題是出在我們打 Google Spreadsheets 的流量太高,導致 Google 把我們的 Spreadsheet 封鎖,所以我們的 API 一開始有很大量的錯誤訊息。

The solution was to first disconnect the spreadsheet from our services, which brought both our service and the Google spreadsheet back online. Then, we implemented a quick function that prevented the service from requesting data from the spreadsheet so often. With that in place, everything returned to normal.

排除這個問題的第一步,是把我們系統跟 Google Spreadsheet 的連線斷開,讓 Google Spreadsheet 回復到正常。接下來,我們加了一個很簡單的邏輯,讓我們的程式不要那麼常去打 Google Spreadsheet。這個作法使我們的系統回復到正常狀態。
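
The kind of quick fix described above can be as simple as a time-based cache in front of the spreadsheet lookup. The sketch below is not the code we shipped; `cacheWithTtl` and its parameters are illustrative names, and the clock is injectable only so that the behavior can be checked without waiting.

```typescript
// Wrap an expensive lookup so it runs at most once per `ttlMs` milliseconds.
// Calls made inside that window get the previously loaded value.
function cacheWithTtl<T>(
  load: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): () => T {
  let cached: { value: T; loadedAt: number } | undefined;
  return () => {
    const current = now();
    if (cached === undefined || current - cached.loadedAt >= ttlMs) {
      // If load() throws, the previous cached value (if any) is kept.
      cached = { value: load(), loadedAt: current };
    }
    return cached.value;
  };
}
```

With a wrapper like this, the number of requests sent to the third-party service depends on the cache lifetime rather than on reader traffic. If `load` returns a promise, the promise itself is cached, so concurrent callers share a single request.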

The lesson here is that, unless you fully load test your entire system before the election, you never know what problems will be caused when traffic increases by a few orders of magnitude. Often, third-party services such as Google Docs will have their own unwritten throughput and bandwidth limits.

這個事件讓我們學到的是,你永遠不知道高流量可能會造成什麼樣的問題,而你也需要完整地做一個壓力測試,你才能知道第三方服務會有可能出現什麼樣的問題。

Have a Clear Incident Response Plan

規劃一個明確的反應流程

No matter how much planning you do in preparation for the big day, incidents are always likely to happen and things will never go exactly according to plan. That said, after you’ve made all the preparations you can possibly make, the next important thing to plan for is how to respond when things do not go as planned. In our case, dividing up responsibilities ahead of time gave us a basic structure to fall back on when things went awry, but it would have been more advantageous to rehearse a few scenarios.

不論你做了多少準備,當下發生的事情通常都會跟你預設的不一樣。這代表你做完所有能夠想像的情況準備之後,你還要預設一個當事件發生時的反應流程。這個流程包含跑一些練習,預設誰負責溝通,誰負責修問題。這兩個角色一定需要準備,因為修問題的人需要專心的修問題,沒辦法同時跟大家回報現在的狀況。

Namely, when the site goes down, it is important to define who will fix the problem and who will report the status to the rest of the team. It is hard for one person to do both: the person trying to fix the problem gets bombarded with inquiries about the current status and cannot concentrate on the fix.

Wrapping Up

專案結論

In conclusion, I hope the process we used and the lessons we learned can be built upon by the community, allowing future elections and similar events to be faster, more efficient, and better experiences for our readers and our organizations. I’m sure that our processes will continue to evolve, and I hope we can continue to learn better workflows and collaboration processes.

我希望我們在關鍵評論遇到的經驗可以幫助大家的思考,讓以後的選舉我們都可以給讀者更好的閱讀體驗。我也希望我們的流程會不斷的進步,讓我們能繼續分享我們學到的東西。