Difference between revisions of "User:Dekarl"

From XMLTV
m (Known Issues with Character Encoding: update notes)
(Proper TV Metadata Schema Bits and Pieces (Hi TVBrainz): Sesame Street started with shared seasons to branch off into a localized german branch)
 
(20 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
= Known Issues with Character Encoding =
 
= Known Issues with Character Encoding =
Which grabbers' status will turn red if we add character encoding checks?
+
*Need to verify that we can pass Perl strings to XMLTV::Writer and that it does the right thing by escaping anything outside $encoding into XML entities.
*{{grabber|ch_search}} data source sends windows-1252 labelled as iso-8859-1 (e.g. the euro sign). Newer HTML readers are supposed to handle this correctly. Need to verify that we can pass Perl strings to XMLTV::Writer and that it does the right thing by escaping anything outside $encoding into XML entities.
+
*{{grabber|hr}} {{grabber|no_gfeed}} {{grabber|se_swedb}} [http://repo.or.cz/w/nonametv.git/commitdiff/17bfeec55a6cc01adb9db4d8a78f0fb17cfde11d fix committed upstream]
*{{grabber|se_swedb}} data source sends windows-1252 as iso-8859-1 (e.g. single right quotation mark in actor name from data source Viasat)
+
*{{grabber|re}} writes iso-8859-1 header and programs but adds utf-8 encoded categories.
+
 
*{{ticket|1910245}} should add a test for HTML entities in the generated XML. (Hint: &acute; is invalid XML!)
 
*{{ticket|1910245}} should add a test for HTML entities in the generated XML. (Hint: &acute; is invalid XML!)
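
The three checks in the list above can be sketched in a few lines. This is a minimal Python sketch and not how XMLTV::Writer (which is Perl) works; the function names <code>repair_cp1252</code>, <code>encode_for_xml</code> and <code>html_entities_in</code> are made up for illustration. It assumes the input was already decoded as iso-8859-1, so windows-1252 punctuation shows up as C1 control characters (U+0080 to U+009F).

```python
import re

# The five entities XML predefines; any other named entity needs a DTD
# declaration and is most likely leaked HTML.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def repair_cp1252(text: str) -> str:
    """Map C1 control characters back to their windows-1252 meaning."""
    out = []
    for ch in text:
        if "\x80" <= ch <= "\x9f":
            try:
                ch = ch.encode("latin-1").decode("cp1252")
            except UnicodeDecodeError:
                pass  # byte has no windows-1252 meaning; keep it unchanged
        out.append(ch)
    return "".join(out)


def encode_for_xml(text: str, encoding: str) -> bytes:
    """Encode text, replacing unrepresentable characters with &#NNNN; references.

    This does not escape &, < or >; that still has to happen elsewhere.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


def html_entities_in(xml_text: str) -> list:
    """Return named entities that XML does not predefine, e.g. 'acute'."""
    return [name for name in _NAMED_ENTITY.findall(xml_text)
            if name not in XML_PREDEFINED]
```

For example, the bytes <code>b"10 \x80"</code> decoded as iso-8859-1 and passed through <code>repair_cp1252</code> yield the euro sign, which <code>encode_for_xml(..., "iso-8859-1")</code> then writes as <code>&amp;#8364;</code>. A grabber test could fail whenever <code>html_entities_in</code> returns a non-empty list.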
  
 
= Issues with Time Zones =
 
= Issues with Time Zones =
*{{ticket|2483562}} {{grabber|huro}} in the Netherlands
+
*{{grabber|dk_dr}} DST issues
 +
*{{grabber|il}} DST issues
 +
*{{grabber|it}} DST issues
 +
*{{grabber|pt_meo}} DST issues
 +
*{{grabber|uk_bleb}} DST issues, floating start>stop leads to wrong date calculation and time offsets
  
 
= Memory Leaks?? =
 
= Memory Leaks?? =
Line 18: Line 20:
 
** {{ticket|2837668}} site changes: unexpected hash references
 
** {{ticket|2837668}} site changes: unexpected hash references
 
** {{ticket|2858285}} close it, was "no channels found", the grabber does not fail completely anymore (see status)
 
** {{ticket|2858285}} close it, was "no channels found", the grabber does not fail completely anymore (see status)
** {{ticket|2880092}} close it, was "no programs at all", test_grabbers gets some programs
 
 
** {{ticket|2910015}} close it, was "no programs on channel 1", test_grabbers runs successfully with that channel
 
** {{ticket|2910015}} close it, was "no programs on channel 1", test_grabbers runs successfully with that channel
* {{ticket|1938073}} close it, [[XMLTVFormat]] doesn't allow floating time stamps! So shifting stuff around to please some local time is up to the consuming program. (Better yet, just store everything in UTC and add some intelligent local time handling... /rant)
 
You can also append a timezone to the end; '''if no explicit timezone is given, UTC is assumed.'''
 
  
 
= Cleanup SourceForge Project =
 
= Cleanup SourceForge Project =
Line 33: Line 32:
  
 
= Potential Data Sources =
 
= Potential Data Sources =
* {{ticket|3015240}} pt_meo dedicated API, seems clear for personal use (and _pt is a scraper)
+
Candidates for wrapping into [[User:Dekarl/Static_File_Grabber_Template|Static File Grabbers]]
* {{ticket|2825794}} mx, user medisoft likes to run his own xmltv service, maybe suggest to contact the stations and use nonametv ;)
+
* _cz_arcao: XMLTV export from [http://xmltv.arcao.com/ arcao.com]. Provides explicit time offsets.
 +
* _dk_ontv: XMLTV export from [http://ontv.dk/xmltv/ ontv.dk].
 +
* _eu_phazer: XMLTV service from tvprofil.net aka [http://tvprofil.net/xmltv/ Phazer XMLTV Service]. Notice that they provide timestamps in their local time as floating time, which is interpreted as UTC...
 +
* <strike>_fr_kazer: XMLTV service from [http://kazer.org/ kazer.org].</strike> [http://xmltv.cvs.sourceforge.net/viewvc/xmltv/xmltv/grab/fr_kazer/ done]
 +
* _it_ambrosa: XMLTV export from [http://www.ambrosa.net/index.php/contents/XMLTV.html ambrosa.net]. Explicit about non-commercial use only.
 +
* _ru_teleguide: XMLTV export from [http://www.teleguide.info/article1.html teleguide.info]. Provides explicit time offsets.
 +
 
 +
= Configuration API =
 +
* [http://sourceforge.net/mailarchive/forum.php?thread_name=49B6B9F2.80905%40holmlund.se&forum_name=xmltv-devel] possible extensions/clarifications from a consumer's point of view
 +
* Add a list to the supplementary files that maps DVB/ATSC IDs to grabber/channel. Then let --list-channels & co. enrich the channel list with the related IDs. [http://code.mythtv.org/trac/wiki/TaskBrowserBasedSetup MythTV's Browser Based Setup] might be a consumer for this, to allow automatic mapping of channels in the guide to channels on the video source.
 +
 
 +
= Data Sinks =
 +
* Check [http://www.cse.unsw.edu.au/~willu/w/xmltv/grabbers/index.html] and see if all are mentioned here
 +
 
 +
= Best Practices =
 +
== Consumers of XMLTV Data ==
 +
* Be prepared for xmltv ids that really are similar to an FQDN (255 characters max.). The longest I've seen in the wild is 69 characters (_es_laguiatv).
 +
 
 +
= Random Pieces of Information =
 +
== You might receive the same transport stream on multiple frequencies ==
 +
TS 101 211 - DVB Guidelines on implementation and usage of Service Information (SI)
 +
NOTE 1: The cell_id cannot be used to identify a service. The combination of service_id and original_network_id
 +
remains a unique identification of a service.
 +
 
 +
It is recommended to make all receivable multiplexes with the same transport_stream_id but with different
 +
cell_ids available to the user, and only when a service (not a transport stream) is available through multiple
 +
multiplexes to select a preferred multiplex based on e.g. reception quality.
 +
 
 +
Any reference resolution from a transport_stream_id or a service_id (e.g. from a linkage_descriptor
 +
transport_stream_id/service_id pair) to a multiplex/frequency requires consideration to handle the potential multiplicity
 +
 
 +
Note that in networks deploying the service_availability_descriptor, the unique identification of a transport stream by
 +
the tuple (transport_stream_id, original_network_id), can often be sensibly replaced by identification through the triplet
 +
(transport_stream_id, original_network_id, cell_id).
 +
 
 +
== Proper TV Metadata Schema Bits and Pieces (Hi TVBrainz) ==
 +
*Some series have multiple sets of episode titles per locale, usually it's one set per broadcasting company
 +
** the Exes http://www.fernsehserien.de/the-exes/episodenguide/staffel-1/16232
 +
*Some series have a different title per season
 +
**Elephant Princess http://de.wikipedia.org/wiki/Elephant_Princess
 +
*Some series have alternate titles
 +
**The Killing http://de.wikipedia.org/wiki/Kommissarin_Lund_%E2%80%93_Das_Verbrechen this series also uses roman or arabic numbers in addition to the title as season specific series title
 +
*Usually the episode title is unique per series, but some series have multiple episodes with the same title
 +
**Lindenstraße http://de.wikipedia.org/wiki/Lindenstra%C3%9Fe/Episodenliste#Mehrfache_Folgentitel
 +
*some series started on radio and continued on tv
 +
** Die Hesselbachs http://de.wikipedia.org/wiki/Die_Hesselbachs
 +
*some series got rebranded
 +
** Pusteblume -> Löwenzahn http://en.wikipedia.org/wiki/L%C3%B6wenzahn
 +
*many series don't do seasons
 +
**Eisenbahnromantik http://www.swr.de/eisenbahn-romantik/ (lots of repeats on bags of tv stations and lots of full episodes on youtube, good for testing VOD integration due to no regional restrictions http://www.youtube.com/user/Eisenbahnromantik )
 +
*some episodes belong to multiple series
 +
**it is common for documentary brands to buy unrelated documentary movies and mini series that are run under a bigger brand
 +
** - example of the two episodes that belong to two series goes here -
 +
*some series have a different order per country/station
 +
**Wickie und die starken Männer http://forums.thetvdb.com/viewtopic.php?f=41&t=18059
 +
*some series contain the same episode multiple times / with multiple episode numbers
 +
**Eisenbahnromantik http://www.swr.de/eisenbahn-romantik/
 +
*for some series / collections of movies it is unclear whether they should be a series, a collection of movies, or both
 +
**Varg Veum, has a main cast and seasons, also used a translated and the original season title http://de.wikipedia.org/wiki/Der_Wolf_%28Fernsehreihe%29
 +
**Rosamunde Pilcher, shares no cast http://de.wikipedia.org/wiki/Rosamunde_Pilcher#Verfilmungen
 +
**Inga Lindström, shares no cast http://de.wikipedia.org/wiki/Inga_Lindstr%C3%B6m#Inga-Lindstr.C3.B6m-Reihe
 +
**Utta Danella, shares no cast http://de.wikipedia.org/wiki/Utta_Danella#Verfilmungen
 +
**Harry Potter, a feature film series http://en.wikipedia.org/wiki/Harry_Potter_%28film_series%29
 +
*some series are coproduced for multiple locales with most of an episode being shared internationally, but parts being replaced locally (completely different shots/actors)
 +
**Fraggle Rock, see http://muppet.wikia.com/wiki/Fraggle_Rock#Co-Productions
 +
*some series started out internationally as dubs and later branched off into unrelated shows of the same brand
 +
**Sesame Street, see http://muppet.wikia.com/wiki/Sesamstrasse
 +
*some episodic movies are basically short movies pasted together
 +
**http://en.wikipedia.org/wiki/Love_at_Twenty
 +
 
 +
The set of titles (series/season/episode) should be marked as being a true alternate or just a typo/search hint.
 +
Some titles are international titles (used for all languages without localization). Showing the poster of the international title is better than defaulting to a specific locale.

Latest revision as of 06:13, 14 October 2014

Known Issues with Character Encoding

Issues with Time Zones

Memory Leaks??

Cleanup List

Feel free to take anything from the list

  • #1880681 the bug was solved (not a bug), the suggestion for Supplementary Files is turning one can of worms into another...
  • tv_grab_huro has no maintainer?
    • #2748362 site changes: holes in the collected programs
    • #2837668 site changes: unexpected hash references
    • #2858285 close it, was "no channels found", the grabber does not fail completely anymore (see status)
    • #2910015 close it, was "no programs on channel 1", test_grabbers runs successfully with that channel

Cleanup SourceForge Project

  • remove group/status/category examples from the tracker (might check for other unused stuff while there)

Check for Breakage caused by LWP::Simple

  • silent uncompression
  • silent code page conversion
  • silent proxy handling

Maybe it's best to move most uses over to our own Get_nice.
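
To make the three silent behaviours concrete, here is a hedged Python sketch of what doing each step explicitly could look like. It is not XMLTV's Get_nice (that is Perl) and says nothing about how LWP::Simple is implemented; <code>make_opener</code> and <code>decode_body</code> are hypothetical helper names, and the utf-8 fallback is an assumption.

```python
import gzip
import urllib.request
import zlib
from typing import Optional


def make_opener(use_environment_proxy: bool) -> urllib.request.OpenerDirector:
    """Build an opener whose proxy behaviour is an explicit decision."""
    if use_environment_proxy:
        # Default ProxyHandler reads http_proxy and friends from the environment.
        return urllib.request.build_opener(urllib.request.ProxyHandler())
    # An empty mapping disables proxy auto-detection.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


def decode_body(raw: bytes,
                content_encoding: Optional[str],
                charset: Optional[str],
                fallback_charset: str = "utf-8") -> str:
    """Uncompress and decode a response body, failing loudly on surprises."""
    coding = (content_encoding or "identity").strip().lower()
    if coding == "gzip":
        raw = gzip.decompress(raw)
    elif coding == "deflate":
        raw = zlib.decompress(raw)
    elif coding != "identity":
        raise ValueError("unsupported Content-Encoding: " + coding)
    # Decode with the charset the server declared; never guess silently.
    return raw.decode(charset or fallback_charset)
```

A grabber would pass the <code>Content-Encoding</code> header and the charset from <code>Content-Type</code> into <code>decode_body</code>, so a site change that breaks either one raises an error instead of producing mojibake.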

Potential Data Sources

Candidates for wrapping into Static File Grabbers

  • _cz_arcao: XMLTV export from arcao.com. Provides explicit time offsets.
  • _dk_ontv: XMLTV export from ontv.dk.
  • _eu_phazer: XMLTV service from tvprofil.net aka Phazer XMLTV Service. Notice that they provide timestamps in their local time as floating time, which is interpreted as UTC...
  • _fr_kazer: XMLTV service from kazer.org. done
  • _it_ambrosa: XMLTV export from ambrosa.net. Explicit about non-commercial use only.
  • _ru_teleguide: XMLTV export from teleguide.info. Provides explicit time offsets.
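
The difference between explicit offsets and floating time matters when wrapping these sources. Below is a minimal Python sketch of a timestamp parser, assuming the full 14-digit <code>YYYYMMDDhhmmss</code> form with an optional numeric offset; shortened timestamps and named zones are deliberately not handled. <code>parse_xmltv_time</code> and its <code>floating_tz</code> parameter are illustrative names, not part of any XMLTV library.

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo

_STAMP = re.compile(r"^(\d{14})(?:\s*([+-])(\d{2})(\d{2}))?$")


def parse_xmltv_time(value: str, floating_tz: tzinfo = timezone.utc) -> datetime:
    """Parse 'YYYYMMDDhhmmss [+-]hhmm' into an aware datetime.

    A timestamp without an offset is floating; by default it is read as UTC.
    Pass floating_tz to state the source's local zone instead.
    """
    match = _STAMP.match(value.strip())
    if match is None:
        raise ValueError("unsupported timestamp: " + repr(value))
    digits, sign, hours, minutes = match.groups()
    naive = datetime.strptime(digits, "%Y%m%d%H%M%S")
    if sign is None:
        return naive.replace(tzinfo=floating_tz)
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    return naive.replace(tzinfo=timezone(offset))


def format_xmltv_time(moment: datetime) -> str:
    """Write an aware datetime back with an explicit numeric offset."""
    return moment.strftime("%Y%m%d%H%M%S %z")
```

A static file grabber for a source like _eu_phazer could parse each timestamp with the source's local zone as <code>floating_tz</code> and write it back with <code>format_xmltv_time</code>. The standard library's <code>zoneinfo.ZoneInfo</code> is one way to get a DST-aware zone object, where time zone data is installed.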

Configuration API

  • [1] possible extensions/clarifications from a consumer's point of view
  • Add a list to the supplementary files that maps DVB/ATSC IDs to grabber/channel. Then let --list-channels & co. enrich the channel list with the related IDs. MythTV's Browser Based Setup might be a consumer for this, to allow automatic mapping of channels in the guide to channels on the video source.

Data Sinks

  • Check [2] and see if all are mentioned here

Best Practices

Consumers of XMLTV Data

  • Be prepared for xmltv ids that really are similar to an FQDN (255 characters max.). The longest I've seen in the wild is 69 characters (_es_laguiatv).

Random Pieces of Information

You might receive the same transport stream on multiple frequencies

TS 101 211 - DVB Guidelines on implementation and usage of Service Information (SI)

NOTE 1: The cell_id cannot be used to identify a service. The combination of service_id and original_network_id
remains a unique identification of a service.

It is recommended to make all receivable multiplexes with the same transport_stream_id but with different
cell_ids available to the user, and only when a service (not a transport stream) is available through multiple
multiplexes to select a preferred multiplex based on e.g. reception quality.

Any reference resolution from a transport_stream_id or a service_id (e.g. from a linkage_descriptor
transport_stream_id/service_id pair) to a multiplex/frequency requires consideration to handle the potential multiplicity

Note that in networks deploying the service_availability_descriptor, the unique identification of a transport stream by
the tuple (transport_stream_id, original_network_id), can often be sensibly replaced by identification through the triplet
(transport_stream_id, original_network_id, cell_id).
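
The identification rules quoted above can be sketched as plain data. This is an illustrative Python sketch only: the <code>Multiplex</code> class, its fields and the 0 to 1 <code>signal_quality</code> scale are assumptions made for the example, not taken from TS 101 211.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Multiplex:
    original_network_id: int
    transport_stream_id: int
    cell_id: int
    frequency_khz: int
    signal_quality: float          # assumed scale: 0.0 (worst) to 1.0 (best)
    service_ids: Tuple[int, ...]   # services carried on this multiplex

    @property
    def stream_key(self) -> Tuple[int, int]:
        """(transport_stream_id, original_network_id): same on every cell."""
        return (self.transport_stream_id, self.original_network_id)

    @property
    def cell_key(self) -> Tuple[int, int, int]:
        """Triplet that also tells the cells apart."""
        return (self.transport_stream_id, self.original_network_id, self.cell_id)


def preferred_multiplexes(muxes: Iterable[Multiplex]) -> Dict[Tuple[int, int], Multiplex]:
    """Pick, per service, the multiplex with the best reception.

    Services are keyed by (original_network_id, service_id); cell_id is
    never part of the service key.
    """
    best: Dict[Tuple[int, int], Multiplex] = {}
    for mux in muxes:
        for service_id in mux.service_ids:
            key = (mux.original_network_id, service_id)
            current = best.get(key)
            if current is None or mux.signal_quality > current.signal_quality:
                best[key] = mux
    return best
```

All scanned multiplexes stay available in the input list; only the per-service choice prefers one of them, which matches the recommendation to select a preferred multiplex per service rather than per transport stream.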

Proper TV Metadata Schema Bits and Pieces (Hi TVBrainz)

The set of titles (series/season/episode) should be marked as being a true alternate or just a typo/search hint. Some titles are international titles (used for all languages without localization). Showing the poster of the international title is better than defaulting to a specific locale.
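
One way to read the two requirements above, sketched in Python. The names <code>TitleKind</code>, <code>Title</code> and <code>display_title</code> are hypothetical, and using <code>locale=None</code> to mean "international" is an assumption of this sketch, not an existing schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class TitleKind(Enum):
    PRIMARY = "primary"
    ALTERNATE = "alternate"      # a true alternate title
    SEARCH_HINT = "search_hint"  # typo or search aid, never displayed


@dataclass(frozen=True)
class Title:
    text: str
    kind: TitleKind
    locale: Optional[str]        # None marks an international title


def display_title(titles: Iterable[Title], locale: str) -> Optional[str]:
    """Prefer the requested locale, then fall back to the international title."""
    shown: List[Title] = [t for t in titles if t.kind is not TitleKind.SEARCH_HINT]
    for wanted in (locale, None):
        for kind in (TitleKind.PRIMARY, TitleKind.ALTERNATE):
            for title in shown:
                if title.locale == wanted and title.kind is kind:
                    return title.text
    return None


def search_terms(titles: Iterable[Title]) -> List[str]:
    """Every title, hints included, is usable for matching."""
    return [t.text.lower() for t in titles]
```

With this shape a misspelling can be stored as a <code>SEARCH_HINT</code> so that lookups still match, while the display logic skips it and falls back to the international title when the viewer's locale has no title of its own.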